Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Short, simple projects produce a PR containing a single, small commit. However, it is often the case that a project grows we strongly encourage contributors to split up larger projects into multiple PRs , or a PR grows to contain multiple commitsor commits in order to make code reviews more effective.

This happens mainly in two situations:

  • A larger project may should be decomposed in sub-parts / steps. We recommend one commit per logical change, up to and including when PRs are merged. See below for details.

  • During code review, to answer reviewer comments. We accept seeing fix-up commits during the review, but we request they are merged together (“squashed”) before PRs are merged. See below for more nuance.

Table of contents:

Table of Contents

Table of contents:

Table of Contents

PR organization philosophy

A pull request (PR) is a package of work that engages three parties:

...

In a nutshell, the programmer’s job is to organize the PR to be easily reviewable and to make the code more easily maintainable. To achieve reviewability, the best way is to make each commit cover one conceptual unit of work towards the goal, and explain intent (in commit messages) separately from the code. To achieve maintainability, the programmer should explain their choices and spell out (in commit messages) what they left out, when applicable.

How PRs help reviewers and maintainers

To do their job properly, the reviewer wants to know “what is going on” in the change. The reviewer needs to evaluate whether there is a difference between what the programmer intended to do (i.e. the goal) and what the code actually does. To do this work properly, the reviewer needs to understand what the programmer thinks they were doing—separately from the code.

...

We call this “one commit per logical change”.

How to decompose PRs into logical changes

For example, a change to add a new SQL statement could have the following commits:

...

Studies have shown that as the code size of a change increases, the code review effectiveness decreases [1]. Keeping changes in a review “small, independent, and complete” has been suggested to improve the review process [2]. We strongly encourage contributors to thoughtfully split up larger projects into multiple PRs or commits.

For example, a change to add a new SQL statement could have the following commits:

  1. one commit to add support for the SQL syntax, with parsing unit tests. In this commit, the planning code is changed to report an "unimplemented error" when the new statement is planned. A planning unit test is added that verifies that the unimplemented error is reported

  2. another commit to plan the statement, with planning unit tests. In this commit, the execution code is changed to report an "unimplemented error" when the new statement is executed, and the planning unit test is updated to check the result of logical planning.

  3. a final commit to execute the statement, with execution unit tests. In this commit, the execution unit test is updated to check the results of execution.

...

Sometimes, you realize that you wish to split the PR into multiple logical change only after starting the implementation and touching many files across multiple areas. This is still possible! You can use git add -p for this purpose.

Moving blocks of code

Changes may require moving blocks of code (or tests) to new packages or files, and then making edits to the moved code. We recommend splitting this type of refactoring into two commits. The first commit relocates the code without logical changes. The second makes the required logical changes to the code in its new location. This makes it easier for reviewers to scrutinize the moved code and logical changes separately. If both are contained in a single commit the logical changes will be “hidden” from the reviewer in a large block of newly added lines.

Aesthetic changes

Sometimes an author feels compelled to update spelling, grammar, or the overall structure of the code for legibility or other subjective purposes. When doing so, the author should be mindful that moving/editing code for aesthetic reasons also has a downside: it pollutes the output of git blame, which connects code to its last author/maintainer.

It's nice for a blame to be as clean as possible. It's nice for the history of a stanza to have fewer entries. Being mindful of blame is just common hygiene. There is a bar for changing a line of code; we don't do it gratuitously, and, generally speaking, simply fixing an obvious typo (in a comment) does not meet that bar.Besides the effect of blame churn on humans, it also has an effect on . For example, a one-line change to fix a typo in a comment impairs the output of blame and also has an opportunity cost in your use of time. (But see also the section “Comment updates “ in Code commenting guidelines.)

Besides the effect of blame churn on humans, it also has an effect on machines. For example, GitHub might start suggesting this person as a reviewer for future patches.

Authors should thus carefully weigh the legibility gains of aesthetic changes with the maintainability cost of impairing history navigation.

Updating the PR during reviews

During a review, the reviewer requests additional change to a PR.

How to modify the PR to address reviewer comments? There are two main ways to do this: git commit --amend and git commit --fixup. Both are detailed below.

The approaches proposed below reflects our practice from past experience. We have observed that approaching iterating through PR changes this way during review tends to produce PRs of higher quality, makes the iteration easier for the programmer, and tends to let the review cycles complete faster.

...

PR containing just 1 logical change commit

...

PR containing a small number of logical changes (e.g. less than 3)

...

PR containing a larger number of logical changes (e.g. more than 3)

...

Answer one round of reviewer's comments.

...

Suggestions:

  • git commit --amend

  • git commit --fixup

The latter is slightly easier but requires an extra step at the end (see below).

...

Suggestions:

  • git rebase -i then edit separate commits at the point of the reviewer's comment.
    In Emacs, use magit-commit-instant-fixup.

  • git commit --fixup

The latter is slightly easier but requires an extra step at the end (see below).

...

Preferably git commit --fixup.

Otherwise, the reviewer tool (e.g reviewable) quickly becomes confused with the amount of combined changes.

...

What to do if the review has more iterations (e.g. more than 5)

...

Same as above.

...

Prefer git commit --fixup.

Consider breaking down the PR into sub-PRs.

...

Break down the PR into sub-PRs.

Merge the “prefix” PRs that the reviewer has approved already.

...

At the end of the review, when the PR has been approved.

...

Squash all fixup commits together, so that only 1 commit per logical change remains.

Polish the commit message that results from multiple fixups. Ensure it describes the change accurately and add relevant release notes.

Squash fixup commits to their respective logical change.

Polish each commit message that results from multiple fixups. Ensure it describes the change accurately and add relevant release notes.

Ensure each commit builds  into a cockroach executable and passes its unit tests.

Squash fixup commits to their respective logical change..

Polish each commit message that results from multiple fixups. Ensure it describes the change accurately and add relevant release notes.

Ensure each commit builds  into a cockroach executable and passes its unit tests.

...

Additional notes

...

This is the most common case, also the case we expect from first-time external contributors.

New Cockroach Labs team members should learn how to operate PRs with a small number of logical commits. This is a required skill.

We try to avoid letting external contributors submit multi-commit PRs. When they do, we encourage them to break down the PR into sub-PRs upfront.

...

PRs with numerous commits are generally frowned upon in the Cockroach Labs team. They introduce significant risk in the engineering process; either reviews are unable to conclude, or can let more mistakes slip through.

The skill required to manipulate numerous commits through review rounds simultaneously is more advanced and we don't commonly expect it from new co-workers.

Likewise, the skill required to review numerous commits in one PR is also advanced. It is also more expensive: it costs most energy and time to review the PR.

Amending commits

Note: This section is intended to teach how to use git to achieve the goals outlined above, based on what the team has observed works well from past experience.

The command git commit --amend "edits" an existing commit in-place with new changes, after git add.

For a single-commit PR, this is the preferred and most simple way to edit the PR during reviews.

When there are multiple commits in the PR already, git commit --amend only edits the latest commit in the sequence.

To edit an earlier commit, you may proceed as follows:

  1. Stash your changes (git stash).

  2. Run git rebase -i origin/master (or the branch base if you’re forking something else than master).

  3. Mark the commit(s) you want to amend with "edit" in the first column

  4. Save the edit plan. Git will present you the commits to edit one by one.

  5. For each commit to amend, make the changes you wish (or git stash pop if you stashed some changes above). Then check the resulting code builds and passes its unit tests.

  6. Run git add -p on the desired changes, then git commit --amend; polish the commit message for that commit.

  7. Stash the rest.

  8. Once you finish editing each commit, run git rebase --continue to go to the next commit in the list you've selected at step 2. Iterate steps 5-8 until completion.

After you are done editing all commits, the branch has been rewritten and you can push it to the PR again.

Note that for multi-commit PRs, amending commits in this way is more error-prone. Consider fixup commits (next section) if unsure.

Fix-up commits

Note: This section is intended to teach how to use git to achieve the goals outlined above, based on what the team has observed works well from past experience.

The command git commit --fixup   takes the SHA of another commit on the command line, which you intend to amend/modify. Then it creates an additional commit at the end of the current branch, with only the latest changes included with git add -p. The original commit you want to amend/modify is not yet modified at that point.

You can add multiple fix-up commits to a PR, one per commit that needs to be changed based on reviewer comments.

Then you can push the fix-up commits so the reviewer can look at them. Through the review, you can add more fix-up commits.

This solution is simpler because it does not require rewriting the entire PR branch every review round, and simplifies the display of changes inside the review tool.

After multiple round of review, a PR/branch with fix-up commits can look like this:

...

Code Block
ba30a29 my first change
deaf012 my second change
55301bf fixup! my first change
900b0c1 fixup! my second change
9bbbd35 fixup! my second change
4c56fb3 fixup! my first change

Then, at the end of the review, when the reviewer has approved the changes, you may proceed as follows:

  1. run the following command: git rebase -i --autosquash ... (give the branch base SHA as argument). The argument --autosquash recognizes all the fixup! markers and re-organizes the entire history so that the fixup commits are in the right order and "squashed" into the commit they are amending.

  2. Check the proposed ordering, and add edit markers to the commits that have fix-up commits attached to them.

  3. Save the rebase plan and let git run it.

  4. For each commit with fix-up commits, check:

    • that the source code builds and passes its unit tests (You can use git rebase -i origin/master --exec make test PKG=something TESTS=something to automate the check during the rebase.)

    • run git commit --amend and polish the commit message so it is explanatory, accurate and contains the relevant release notes

Once all the fixups have been squashed, you can push the resulting changes to the PR for final approval and for merging.

How to split amends/fix-ups across multiple commits

Say, you have a multi-commit PR, and you have started editing multiple files, making changes that really belong to separate commits inside your PR. How to make this work easily?

The magic tool is git add -p (the -p argument). It makes it possible to add individual diffs within a file to the current git index, so that you can commit only part of the changes to a file. Pulling a change back from the index into the workspace is via git reset -p.

(To completely cancel part of a change in afile, i.e. revert the code on disk to what was committed earlier, use git checkout -p)

Alternate commands to amend and fix-up PRs

The specific git commands suggested above are sometimes automated by code editors, IDEs, or command-line aliases. The instructions above are informative suggestions more than prescription. We welcome additional approaches.

That said, reusing the common suggestions makes it easier to request help from a co-worker who shares that experience already.

In general, any approach is fine as long as it follows the PR organization philosophy described at the top, and is mindful of the reviewer’s time during review.

Projects with multiple interdependent in-flight PRs

Sometimes, the PR for one change has not been merged yet when we create a follow-up PR that builds upon the first one. This section shares some experience and considerations to keep in mind when this happens.

Note: this is an “advanced” topic and not intended for novice programmers.

Why do in-flight dependent PRs happen

This happens sometimes in more complex projects. For example, a first PRs might create “infrastructure” or a code framework, that the next PRs use to implement a feature.

Generally, we do not favor multiple PRs remaining in-flight concurrently, that pertain to the same project. It makes it harder for TLs and PMs to reason about progress and harder for programmers to keep everything top of mind.

However, it happens sometimes because there is another tension: once an “infrastructure” PR is merged and other PRs are merged on top of it, it makes it harder to change the infrastructure. Also, sometimes we simply do not know exactly how to build this infrastructure until we have some experience using it.

It is the job of the TL to analyze work in advance and propose a project plan that avoids multiple in-flight PRs. So, ideally, this situation remains infrequent. However, it still happens sometimes.

In any case, when this happens we do not consider merging the in-flight PRs together into a single PR. That would make things ever harder to review and iterate on.

Instead, we proceed with caution.

Work extra to keep the PR boundaries clear

When there are multiple related PRs, it is paramount to ensure that each PR speaks for itself. Even the review discussion should be self-contained. Things to keep in mind:

  • Keep discussion about the “base” PRs on the review thread for the base PR itself, not that of the dependent PRs.

  • Use PR number references in comments, to create hyperlinks.

  • If/when you re-arrange the ordering of the PRs or split PRs into multiple sub-PRs, explain this as additional review comments so that a latecomer to the conversation can understand what went on.

Avoid frequent rebases for multi-PR projects

When working on a standalone PR, it is desirable to rebase the PR’s branch on top of the master branch every now and then, to pick up bug fixes from coworkers, etc.

With multiple inter-dependent open PRs, each rebase makes it extra hard for reviewers of dependent PRs to understand what is going on. Additionally, the rebase process for multiple PRs is technically difficult for the programmmer and thus more error-prone. Therefore, in that situation we prefer to refrain from frequent rebases unless strictly required.

Be mindful of your time and that of the reviewer

Maintaining multiple PRs in-flight costs extra effort and time. The cognitive overhead of context-switching between multiple sub-projects is extremely high.

...

Projects with multiple interdependent in-flight PRs

Sometimes, the PR for one change has not been merged yet when we create a follow-up PR that builds upon the first one. This section shares some experience and considerations to keep in mind when this happens.

Note: this is an “advanced” topic and not intended for novice programmers.

Why do in-flight dependent PRs happen

This happens sometimes in more complex projects. For example, a first PRs might create “infrastructure” or a code framework, that the next PRs use to implement a feature.

Generally, we do not favor multiple PRs remaining in-flight concurrently, that pertain to the same project. It makes it harder for TLs and PMs to reason about progress and harder for programmers to keep everything top of mind.

However, it happens sometimes because there is another tension: once an “infrastructure” PR is merged and other PRs are merged on top of it, it makes it harder to change the infrastructure. Also, sometimes we simply do not know exactly how to build this infrastructure until we have some experience using it.

It is the job of the TL to analyze work in advance and propose a project plan that avoids multiple in-flight PRs. So, ideally, this situation remains infrequent. However, it still happens sometimes.

In any case, when this happens we do not consider merging the in-flight PRs together into a single PR. That would make things ever harder to review and iterate on.

Instead, we proceed with caution.

Work extra to keep the PR boundaries clear

When there are multiple related PRs, it is paramount to ensure that each PR speaks for itself. Even the review discussion should be self-contained. Things to keep in mind:

  • Keep discussion about the “base” PRs on the review thread for the base PR itself, not that of the dependent PRs.

  • Use PR number references in comments, to create hyperlinks.

  • If/when you re-arrange the ordering of the PRs or split PRs into multiple sub-PRs, explain this as additional review comments so that a latecomer to the conversation can understand what went on.

Avoid frequent rebases for multi-PR projects

When working on a standalone PR, it is desirable to rebase the PR’s branch on top of the master branch every now and then, to pick up bug fixes from coworkers, etc.

With multiple inter-dependent open PRs, each rebase makes it extra hard for reviewers of dependent PRs to understand what is going on. Additionally, the rebase process for multiple PRs is technically difficult for the programmmer and thus more error-prone. Therefore, in that situation we prefer to refrain from frequent rebases unless strictly required.

Be mindful of your time and that of the reviewer

Maintaining multiple PRs in-flight costs extra effort and time. The cognitive overhead of context-switching between multiple sub-projects is extremely high.

Be mindful of how much extra effort you are spending. It may be a better investment of your time (and that of your team) to decide to merge one or more of the prefix PRs early, to reduce the number of simultaneous discussions ongoing; then issue additional PRs later to address the shortcomings found in the first PRs.

References

Anchor
bosu
bosu
[1] Bosu, A., Greiler, M., & Bird, C. (2015, May). Characteristics of useful code reviews: An empirical study at microsoft. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (pp. 146-156). IEEE.

Anchor
rigby
rigby
[2] Rigby, P., Cleary, B., Painchaud, F., Storey, M. A., & German, D. (2012). Contemporary peer review in action: Lessons from open source development. IEEE software, 29(6), 56-61.