Short, simple projects produce a PR containing a single, small commit. However, it is often the case that a project grows into multiple PRs, or a PR grows to contain multiple commits.

This happens mainly in two situations:

Table of contents:

PR organization philosophy

A pull request (PR) is a package of work that engages three parties:

It is important to understand that a PR is not just about changing the code. It is really about creating a dialogue, via the history of commits, that helps both reviewers and future maintainers understanding the concepts and the components inside the code.

In a nutshell, the programmer’s job is to organize the PR to be easily reviewable and to make the code more easily maintainable. To achieve reviewability, the best way is to make each commit cover one conceptual unit of work towards the goal, and explain intent (in commit messages) separately from the code. To achieve maintainability, the programmer should explain their choices and spell out (in commit messages) what they left out, when applicable.

How PRs help reviewers and maintainers

To do their job properly, the reviewer wants to know “what is going on” in the change. The reviewer needs to evaluate whether there is a difference between what the programmer intended to do (i.e. the goal) and what the code actually does. To do this work properly, the reviewer needs to understand what the programmer thinks they were doing—separately from the code.

It’s easy to conflate the two; programmers often like to think “my code output is, by definition, what I wanted to do”. But that is not true. To err is human; a mistake is literally a difference between intent and result, and the reviewer cannot spot mistakes if the programmer doesn’t tell them (in prose) what they wanted to do separately from the code. It’s also easy to say ”well I have a google doc which explains intent already, why do I need to explain again?”. This is unfriendly to the future maintainer of the code, who will likely not know where to find this documentation. Also, sometimes, the best reviewer comes from another team and doesn’t know about the project yet. We wish the commit messages to speak for themselves.

Then separately, to do their job properly, the maintainers want to know “what the programmer did not put in the code”. Very often, in a project, a programmer cannot change their code in a perfect way towards the goal. The programmer makes choices, or leaves certain aspects away, on purpose, with the intent to tweak/optimize things later. The maintainers need to know what are the possible/acceptable follow-up changes, even if the code (that may be incomplete/imperfect) does not tell that on its own.

This is why we wish commit messages to tell a story that goes beyond the current change. We want the commit messages to talk about the goals, and all the ways in which the current commit/PR is not achieving the goal yet. The descriptions in prose should talk about choices (“I did X instead of Y because…” or “I did not do X even though it was requested, because…”) and link to related public issues, so that later maintainers can trace the decisions and revisit them when circumstances change.

So in summary, the commit messages and PR descriptions should talk about intent separately from the result, and document the choices. This does not need many sentences, or advanced use of the English language. It is not a complicated literary exercise.

It does need that the change has well-defined boundaries though. It is very had to talk about intent vs results and document choices if the commits are too fine-grained. If the programmer makes too many small commits (e.g. “fixed a comma”, or “added a function there” then another commit “added another function there”), there is nothing of value for the reviewer and maintainer to look at!

This is also why we encourage each commit to cover one conceptual unit of work towards the goal. Not more (do not combine too many changes / decisions into one commit), but not less either.

We call this “one commit per logical change”.

How to decompose PRs into logical changes

For example, a change to add a new SQL statement could have the following commits:

  1. one commit to add support for the SQL syntax, with parsing unit tests. In this commit, the planning code is changed to report an "unimplemented error" when the new statement is planned. A planning unit test is added that verifies that the unimplemented error is reported

  2. another commit to plan the statement, with planning unit tests. In this commit, the execution code is changed to report an "unimplemented error" when the new statement is executed, and the planning unit test is updated to check the result of logical planning.

  3. a final commit to execute the statement, with execution unit tests. In this commit, the execution unit test is updated to check the results of execution.

All these 3 commits can be combined in a single PR.  However, we do not encourage PRs with numerous commits. After 3 logical changes in a single PR, consider splitting the PR into sub-PRs to be reviewed separately.

When using the "one commit per logical change" principle, each commit must individually have a standalone explanatory commit message, as per the page Git Commit Messages.

One aspect we’re strongly aiming for is that each standalone commit can be built into a CockroachDB executable, and pass unit tests. Achieving this strengthens the confidence we have that the commits are, in fact, pertaining to just one logical change. This also simplifies the work of analyzing the git history for regressions using the git bisect command. There are exceptions, of course, that can be negotiated on a per-PR basis with the reviewer or your team’s TL.

Sometimes, you realize that you wish to split the PR into multiple logical change only after starting the implementation and touching many files across multiple areas. This is still possible! You can use git add -p for this purpose.

Moving blocks of code

Changes may require moving blocks of code (or tests) to new packages or files, and then making edits to the moved code. We recommend splitting this type of refactoring into two commits. The first commit relocates the code without logical changes. The second makes the required logical changes to the code in its new location. This makes it easier for reviewers to scrutinize the moved code and logical changes separately. If both are contained in a single commit the logical changes will be “hidden” from the reviewer in a large block of newly added lines.

Aesthetic changes

Sometimes an author feels compelled to update spelling, grammar, or the overall structure of the code for legibility or other subjective purposes. When doing so, the author should be mindful that moving/editing code for aesthetic reasons also has a downside: it pollutes the output of git blame, which connects code to its last author/maintainer.

It's nice for a blame to be as clean as possible. It's nice for the history of a stanza to have fewer entries. Being mindful of blame is just common hygiene. There is a bar for changing a line of code; we don't do it gratuitously, and, generally speaking, simply fixing an obvious typo (in a comment) does not meet that bar.

Besides the effect of blame churn on humans, it also has an effect on machines. For example, GitHub might start suggesting this person as a reviewer for future patches.

Authors should thus carefully weigh the legibility gains of aesthetic changes with the maintainability cost of impairing history navigation.

Updating the PR during reviews

During a review, the reviewer requests additional change to a PR.

How to modify the PR to address reviewer comments? There are two main ways to do this: git commit --amend and git commit --fixup. Both are detailed below.

The approaches proposed below reflects our practice from past experience. We have observed that approaching iterating through PR changes this way during review tends to produce PRs of higher quality, makes the iteration easier for the programmer, and tends to let the review cycles complete faster.


PR containing just 1 logical change commit

PR containing a small number of logical changes (e.g. less than 3)

PR containing a larger number of logical changes (e.g. more than 3)

Answer one round of reviewer's comments.

Suggestions:

  • git commit --amend

  • git commit --fixup

The latter is slightly easier but requires an extra step at the end (see below).

Suggestions:

  • git rebase -i then edit separate commits at the point of the reviewer's comment.
    In Emacs, use magit-commit-instant-fixup.

  • git commit --fixup

The latter is slightly easier but requires an extra step at the end (see below).

Preferably git commit --fixup.

Otherwise, the reviewer tool (e.g reviewable) quickly becomes confused with the amount of combined changes.

What to do if the review has more iterations (e.g. more than 5)

Same as above.

Prefer git commit --fixup.

Consider breaking down the PR into sub-PRs.

Break down the PR into sub-PRs.

Merge the “prefix” PRs that the reviewer has approved already.

At the end of the review, when the PR has been approved.

Squash all fixup commits together, so that only 1 commit per logical change remains.

Polish the commit message that results from multiple fixups. Ensure it describes the change accurately and add relevant release notes.

Squash fixup commits to their respective logical change.

Polish each commit message that results from multiple fixups. Ensure it describes the change accurately and add relevant release notes.

Ensure each commit builds  into a cockroach executable and passes its unit tests.


Squash fixup commits to their respective logical change..

Polish each commit message that results from multiple fixups. Ensure it describes the change accurately and add relevant release notes.

Ensure each commit builds  into a cockroach executable and passes its unit tests.


Additional notes

This is the most common case, also the case we expect from first-time external contributors.

New Cockroach Labs team members should learn how to operate PRs with a small number of logical commits. This is a required skill.

We try to avoid letting external contributors submit multi-commit PRs. When they do, we encourage them to break down the PR into sub-PRs upfront.


PRs with numerous commits are generally frowned upon in the Cockroach Labs team. They introduce significant risk in the engineering process; either reviews are unable to conclude, or can let more mistakes slip through.

The skill required to manipulate numerous commits through review rounds simultaneously is more advanced and we don't commonly expect it from new co-workers.

Likewise, the skill required to review numerous commits in one PR is also advanced. It is also more expensive: it costs most energy and time to review the PR.

Amending commits

Note: This section is intended to teach how to use git to achieve the goals outlined above, based on what the team has observed works well from past experience.

The command git commit --amend "edits" an existing commit in-place with new changes, after git add.

For a single-commit PR, this is the preferred and most simple way to edit the PR during reviews.

When there are multiple commits in the PR already, git commit --amend only edits the latest commit in the sequence.

To edit an earlier commit, you may proceed as follows:

  1. Stash your changes (git stash).

  2. Run git rebase -i origin/master (or the branch base if you’re forking something else than master).

  3. Mark the commit(s) you want to amend with "edit" in the first column

  4. Save the edit plan. Git will present you the commits to edit one by one.

  5. For each commit to amend, make the changes you wish (or git stash pop if you stashed some changes above). Then check the resulting code builds and passes its unit tests.

  6. Run git add -p on the desired changes, then git commit --amend; polish the commit message for that commit.

  7. Stash the rest.

  8. Once you finish editing each commit, run git rebase --continue to go to the next commit in the list you've selected at step 2. Iterate steps 5-8 until completion.

After you are done editing all commits, the branch has been rewritten and you can push it to the PR again.

Note that for multi-commit PRs, amending commits in this way is more error-prone. Consider fixup commits (next section) if unsure.

Fix-up commits

Note: This section is intended to teach how to use git to achieve the goals outlined above, based on what the team has observed works well from past experience.

The command git commit --fixup   takes the SHA of another commit on the command line, which you intend to amend/modify. Then it creates an additional commit at the end of the current branch, with only the latest changes included with git add -p. The original commit you want to amend/modify is not yet modified at that point.

You can add multiple fix-up commits to a PR, one per commit that needs to be changed based on reviewer comments.

Then you can push the fix-up commits so the reviewer can look at them. Through the review, you can add more fix-up commits.

This solution is simpler because it does not require rewriting the entire PR branch every review round, and simplifies the display of changes inside the review tool.

After multiple round of review, a PR/branch with fix-up commits can look like this:

ba30a29 my first change
deaf012 my second change
55301bf fixup! my first change
900b0c1 fixup! my second change
9bbbd35 fixup! my second change
4c56fb3 fixup! my first change

Then, at the end of the review, when the reviewer has approved the changes, you may proceed as follows:

  1. run the following command: git rebase -i --autosquash ... (give the branch base SHA as argument). The argument --autosquash recognizes all the fixup! markers and re-organizes the entire history so that the fixup commits are in the right order and "squashed" into the commit they are amending.

  2. Check the proposed ordering, and add edit markers to the commits that have fix-up commits attached to them.

  3. Save the rebase plan and let git run it.

  4. For each commit with fix-up commits, check:

Once all the fixups have been squashed, you can push the resulting changes to the PR for final approval and for merging.

How to split amends/fix-ups across multiple commits

Say, you have a multi-commit PR, and you have started editing multiple files, making changes that really belong to separate commits inside your PR. How to make this work easily?

The magic tool is git add -p (the -p argument). It makes it possible to add individual diffs within a file to the current git index, so that you can commit only part of the changes to a file. Pulling a change back from the index into the workspace is via git reset -p.

(To completely cancel part of a change in afile, i.e. revert the code on disk to what was committed earlier, use git checkout -p)

Alternate commands to amend and fix-up PRs

The specific git commands suggested above are sometimes automated by code editors, IDEs, or command-line aliases. The instructions above are informative suggestions more than prescription. We welcome additional approaches.

That said, reusing the common suggestions makes it easier to request help from a co-worker who shares that experience already.

In general, any approach is fine as long as it follows the PR organization philosophy described at the top, and is mindful of the reviewer’s time during review.

Projects with multiple interdependent in-flight PRs

Sometimes, the PR for one change has not been merged yet when we create a follow-up PR that builds upon the first one. This section shares some experience and considerations to keep in mind when this happens.

Note: this is an “advanced” topic and not intended for novice programmers.

Why do in-flight dependent PRs happen

This happens sometimes in more complex projects. For example, a first PRs might create “infrastructure” or a code framework, that the next PRs use to implement a feature.

Generally, we do not favor multiple PRs remaining in-flight concurrently, that pertain to the same project. It makes it harder for TLs and PMs to reason about progress and harder for programmers to keep everything top of mind.

However, it happens sometimes because there is another tension: once an “infrastructure” PR is merged and other PRs are merged on top of it, it makes it harder to change the infrastructure. Also, sometimes we simply do not know exactly how to build this infrastructure until we have some experience using it.

It is the job of the TL to analyze work in advance and propose a project plan that avoids multiple in-flight PRs. So, ideally, this situation remains infrequent. However, it still happens sometimes.

In any case, when this happens we do not consider merging the in-flight PRs together into a single PR. That would make things ever harder to review and iterate on.

Instead, we proceed with caution.

Work extra to keep the PR boundaries clear

When there are multiple related PRs, it is paramount to ensure that each PR speaks for itself. Even the review discussion should be self-contained. Things to keep in mind:

Avoid frequent rebases for multi-PR projects

When working on a standalone PR, it is desirable to rebase the PR’s branch on top of the master branch every now and then, to pick up bug fixes from coworkers, etc.

With multiple inter-dependent open PRs, each rebase makes it extra hard for reviewers of dependent PRs to understand what is going on. Additionally, the rebase process for multiple PRs is technically difficult for the programmmer and thus more error-prone. Therefore, in that situation we prefer to refrain from frequent rebases unless strictly required.

Be mindful of your time and that of the reviewer

Maintaining multiple PRs in-flight costs extra effort and time. The cognitive overhead of context-switching between multiple sub-projects is extremely high.

Be mindful of how much extra effort you are spending. It may be a better investment of your time (and that of your team) to decide to merge one or more of the prefix PRs early, to reduce the number of simultaneous discussions ongoing; then issue additional PRs later to address the shortcomings found in the first PRs.