TeamCity Continuous Integration

Overview

We use JetBrains TeamCity for various continuous integration and testing tasks. Our instance is available at teamcity.cockroachdb.com - you may sign in with your GitHub OAuth credentials.

Pull request testing

TeamCity is triggered for each new or updated pull request to this repository. The GitHub CI job runs a set of tests as its dependencies and reports the result back to the pull request as a GitHub status check; it succeeds only when all of its dependencies succeed. The tests are as follows:

  1. make test
  2. make testrace
  3. make check
  4. make acceptance

Each of these is run by a separate build agent in parallel.
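To approximate this fan-out on your own machine before pushing, the four targets can be run concurrently. The overall result passes only when every target exits with status zero, mirroring how the GitHub CI job succeeds only when all of its dependencies do. This is a minimal sketch, not how TeamCity itself schedules work: `run_checks` and `DEFAULT_CHECKS` are hypothetical names, and the sketch assumes the targets can safely run side by side in one checkout. That may not hold: `make acceptance` may need extra setup, and concurrent builds can contend for shared caches.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# The four targets run by the GitHub CI job, as listed above.
DEFAULT_CHECKS = {
    "test": ["make", "test"],
    "testrace": ["make", "testrace"],
    "check": ["make", "check"],
    "acceptance": ["make", "acceptance"],
}


def run_checks(checks):
    """Run each command in parallel; return (all_passed, {name: exit_code})."""

    def run(item):
        name, cmd = item
        # Each command runs as its own process, loosely like a separate build agent.
        return name, subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        results = dict(pool.map(run, checks.items()))
    # Like the parent job, the overall result passes only if every check passed.
    return all(code == 0 for code in results.values()), results


def report(results):
    """Print one status line per check, in a stable order."""
    for name, code in sorted(results.items()):
        status = "ok" if code == 0 else f"FAILED (exit {code})"
        print(f"{name}: {status}", file=sys.stderr)
```

From a Python shell in the repository root, `ok, results = run_checks(DEFAULT_CHECKS)` returns the overall verdict plus each target's exit code, and `report(results)` summarizes them. Output from the four targets is interleaved on the terminal; running them one at a time is slower but easier to read.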

TeamCity represents GitHub pull requests as branches named after the pull request number.

Unearthing hidden test failures

Unfortunately, a failure is not always associated with a failing test that you can click on in the UI. Compile errors, data races, panics (off the test's main goroutine), and other events can leave you looking at a red build without an explicit test failure. In that case, head to the Artifacts tab of the failing build; see the next section for how to get there.

For example, let's start with this unhelpful failed build:

Unless the test runner scripts broke, you can navigate to Artifacts (top right of screenshot above) and find full_output.txt:


from which you can (hopefully; it's not always easy) glean what went wrong, in this case a compile-time error:


Depending on the failure mode, other artifacts might also help. Notably, any log.Scope that is not properly cleaned up will show up in the artifacts and can catch fatal errors (these should also appear in full_output.txt).
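When there is no clickable test failure, searching full_output.txt for the usual Go failure markers is often faster than reading it top to bottom. The sketch below is a heuristic that assumes the file contains standard `go test`, race detector, runtime, and compiler output: `--- FAIL:` lines, `panic:`, `fatal error:`, `WARNING: DATA RACE`, and compiler errors of the form `file.go:LINE:COL: message`. The pattern list and the `scan_output`/`scan_file` helpers are hypothetical and will miss failure modes that print anything else.

```python
import re

# Heuristic patterns for failures that may not show up as a named test in the UI.
# Assumption: full_output.txt contains standard Go toolchain/runtime output.
FAILURE_PATTERNS = {
    "test failure": re.compile(r"^\s*--- FAIL: "),  # subtests are indented
    "panic": re.compile(r"^panic: "),
    "runtime fatal error": re.compile(r"^fatal error: "),
    "data race": re.compile(r"^WARNING: DATA RACE"),
    # Compiler errors start at column 0 as path.go:LINE[:COL]: message;
    # t.Log output is indented, so anchoring at the line start skips it.
    "compile error": re.compile(r"^\S+\.go:\d+(:\d+)?: "),
}


def scan_output(lines):
    """Return (line_number, kind, text) for each line matching a failure pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        for kind, pattern in FAILURE_PATTERNS.items():
            if pattern.search(text):
                hits.append((lineno, kind, text))
                break  # report each line at most once
    return hits


def scan_file(path):
    """Scan a log file such as full_output.txt."""
    # errors="replace" keeps scanning even if the log contains stray bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        return scan_output(f)
```

For example, `for lineno, kind, text in scan_file("full_output.txt"): print(lineno, kind, text)` lists candidate lines to jump to. Each line is reported at most once, under the first pattern it matches.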

Finding the artifacts

This is less trivial than it sounds, because the top-level build is merely a fan-out that pulls in results from the leaf builds, so you need to navigate to the leaf first.

For example, to find the artifacts for a particular job such as testrace, navigate to the actual testrace job, not the top-level GitHub CI job that triggered it.

This matters when the nightly stress job or the master branch test failure detector posts a comment to GitHub, because the comment links to the top-level GitHub CI job. Take #14632: after clicking the link to the failing TeamCity build, click (or hover over) the Artifacts link next to the failing build, not the one on the top right.


From there, you should be able to see the artifacts.

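If you have the numeric ID of the leaf build (shown in the build's URL once you have navigated to it), its artifacts can also be fetched through TeamCity's REST API instead of clicking through the UI. This is a minimal sketch, assuming the standard `/app/rest/builds/id:<id>/artifacts/...` endpoints and a TeamCity access token sent as a bearer token; the helper names are hypothetical, and the JSON shape of the listing response is an assumption noted in the code.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://teamcity.cockroachdb.com"  # the instance named in the overview


def artifacts_list_url(build_id, path=""):
    """URL listing a build's artifacts (the directory at `path`, default root)."""
    quoted = urllib.parse.quote(path.strip("/"))
    suffix = f"/{quoted}" if quoted else ""
    return f"{SERVER}/app/rest/builds/id:{build_id}/artifacts/children{suffix}"


def artifact_file_url(build_id, path):
    """URL downloading a single artifact file, e.g. full_output.txt."""
    quoted = urllib.parse.quote(path.strip("/"))
    return f"{SERVER}/app/rest/builds/id:{build_id}/artifacts/files/{quoted}"


def list_artifacts(build_id, token, path=""):
    """Return artifact names at `path`, authenticating with a TeamCity access token."""
    req = urllib.request.Request(
        artifacts_list_url(build_id, path),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Assumption: the JSON response lists entries under "file", each with a "name".
    return [entry["name"] for entry in data.get("file", [])]
```

For example, `list_artifacts(build_id, token)` returns the top-level artifact names, and `artifact_file_url(build_id, "full_output.txt")` gives a link you can open in a browser session that is already signed in.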

Retriggering GitHub CI

If tests fail on a pull request due to a flaky test or some other transient error, you can retrigger them from TeamCity by running the GitHub CI job against the pull request branch you're trying to rebuild. Make sure to rebuild the GitHub CI job itself and not any of the individual test jobs; the sub-jobs that failed are retriggered automatically when you request a GitHub CI build.

Here's a mini screencast of how to view a test failure and retrigger TeamCity from a failed status check on GitHub:

Triggering GitHub CI for an untriggered pull request

In some rare circumstances, TeamCity will fail to notice a new pull request on GitHub. As a workaround, you can manually trigger the GitHub CI job, which will report status to GitHub.

To do this, navigate to the GitHub CI page and click the ... button next to the Run button to open the Run Custom Build dialog. Under the Changes tab, select the branch named after your pull request number, then click Run. TeamCity will run your tests and report status to GitHub.
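The same custom run can be queued through TeamCity's REST API by POSTing to `/app/rest/buildQueue` with the build configuration ID and the pull request's branch name. This is a sketch under several assumptions: the GitHub CI configuration's ID is not given on this page (it is passed in as a parameter, so copy it from the configuration's settings), the branch name is assumed to be exactly the pull request number as described above, and a TeamCity access token is assumed for authentication. `build_queue_request` and `trigger` are hypothetical helpers.

```python
import json
import urllib.request

SERVER = "https://teamcity.cockroachdb.com"  # the instance named in the overview


def build_queue_request(build_type_id, pr_number, token):
    """Build (but do not send) a REST request that queues a build on a PR branch."""
    body = {
        # Hypothetical placeholder: copy the real ID from the GitHub CI configuration.
        "buildType": {"id": build_type_id},
        # Assumption: PR branches are named by the pull request number.
        "branchName": str(pr_number),
    }
    return urllib.request.Request(
        f"{SERVER}/app/rest/buildQueue",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def trigger(build_type_id, pr_number, token):
    """Queue the build and return TeamCity's JSON description of the queued build."""
    req = build_queue_request(build_type_id, pr_number, token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `trigger("<GitHub CI build configuration ID>", 12345, token)` queues the job. Separating request construction from sending makes it easy to inspect the request before anything is queued.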

Odd situations

If there is a situation with TeamCity (TC) that you can't explain and for which we don't yet have best practices or a troubleshooting guide, file a new issue, explain the situation, and assign it to the Dev Infrastructure team.

Be sure to include a link to the failed TC Build if there is one.

After you have captured a link to the failed build, try running your build again once for good measure. Perhaps the first run was a flake and you can move on with your day.

Examples of odd, transient situations we've encountered in the past:

  • Build appears to fail but really TC was unable to find an agent to run some targets.
  • Build appears to fail but really an agent was preempted and TC was unable to re-start a target.
  • TC configuration was changed recently and one of the CI targets was overlooked in the change.


Release builds

When code is merged to master, TeamCity's Release Build job gets triggered for that branch. This job triggers all of the tests run by GitHub CI and posts any failures as GitHub issues. If there are no failures and the updated branch is master, it will kick off a set of binary builds and push them to Docker Hub and our binary release storage area in AWS S3.

Nightly tests

There is an additional set of tests that runs each night against master. They live under the Nightlies superproject and are triggered by the All Nightly Tests job.

Playbook

The playbook has information about what to do if there are problems.