Large scale testing

workload

Workload refers both to the package and the tool. The package is built around an abstraction, called Generator, and the tool has various bits for working with generators.

Generator is abstraction for a related collection of tables with (optionally) a set of initial data and (also optionally) a query load to run on those tables. Each generator has a small bit of associated metadata and is configurable via flags. There are a few more pieces (splits, hooks for arbitrary code execution, etc) but an exhaustive list would rot, so check out the source or the godoc.

make bin/workload will build the workload tool and put it at bin/workload.

Some notable fixtures:

bank        Bank models a set of accounts with currency balances
json        JSON reads and writes to keys spread (by default, uniformly at random) across the cluster
kv          KV reads and writes to keys spread (by default, uniformly at random) across the cluster
roachmart   Roachmart models a geo-distributed online storefront with users and orders
tpcc        TPC-C simulates a transaction processing workload using a rich schema of multiple tables
tpch        TPC-H is a read-only workload of "analytics" queries on large datasets.
ycsb        YCSB is the Yahoo! Cloud Serving Benchmark

workload run

workload run <generator> [flags] is the fundamental operation of the workload tool. It runs the queryload of the specified generator, printing out per-operation stats every second: opcount, latencies (p50, p90, p95, p99, max). It also prints out some totals when it exceeds any specified --duration or --max-ops or when ctrl-c'd. The output looks something like

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
        1s        0        15753.6        15825.4      0.8      2.0      3.3     15.2 read
        1s        0          822.2          826.0      2.0      4.1      6.3     14.2 write
        2s        0        14688.7        15257.0      0.8      2.2      3.9     17.8 read
        2s        0          792.0          809.0      2.2      4.5      7.3     11.5 write
        3s        0        15569.2        15361.0      0.8      2.0      3.4     16.8 read
        3s        0          857.0          825.0      2.0      4.5      7.3     10.5 write
[...]

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
    11.5s        0         176629        15425.0      0.9      0.8      2.0      3.5     28.3  read
    11.5s        0           9369          818.2      2.5      2.1      5.5      8.9     18.9  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
    11.5s        0         185998        16243.2      1.0      0.8      2.4      4.5     28.3

BenchmarkWorkload/generator=ycsb          185998             61564 ns/op

There are a few common flags and each generator can define its own, so be sure to check out workload run <generator> --help to see what’s available. By default, run expects the cluster to be on the default port on localhost, but this can be overridden: workload run <generator> <crdb uri>.

For ease of working with large datasets, workload run <generator> [flags] by default doesn’t create the necessary tables. The --init flag can be passed to a run invocation to cause it to first create tables and use batched SQL inserts to fill them with the generator-defined initial data before starting up the queryload. This works well for small datasets, but many generators can be configured to make large amounts of data. For that, you may want a fixture.

workload fixtures load

workload fixtures load <generator> [flags] restores a fixture, which is a precomputed enterprise BACKUP of the tables in some generator configuration. Loading a fixture is much faster than batched INSERTs.

Because the schema of the tables in a generator as well as the initial data in those tables can depend on the flags used, a fixture must be created for each desired configuration of the generator. In practice, so far this has worked out to various values of one “size of the dataset” dimension (e.g. --warehouses in tpcc).

Fixtures are all stored under gs://cockroach-fixtures/workload on Google Cloud Storage. This bucket is multi-regional, so it can be used without paying bandwidth costs from any GCE instance in the US. The bandwidth costs for large fixtures are significant, so think twice before using this from other parts of the world.

Fixtures must be precomputed to be used with load. Many interesting ones already exist (you can see them with workload fixtures list) but at some point you'll likely have to make one.

workload fixtures make

workload fixtures make <crdb uri> [flags] precomputes a new fixture (internally using IMPORT CSV). It needs a cluster to do the IMPORT. By default, it writes all the necessary CSVs to GCS, running IMPORT whenever it finishes the CSVs for a table. This works for small-ish fixtures but is slow and wasteful, so most fixtures should be made using the csv-server magic.

workload csv-server starts an http server that returns CSV data for generator tables. If you start one next to each node in a cockroach cluster, you can point them at it in the IMPORT and the CSVs are never written anywhere. I’d love to package this up in one command, but for now the best way to make a large fixture is

roachprod create <cluster> -n <nodes>
roachprod put <cluster> <linux build of cockroach> cockroach
roachprod put <cluster> <linux build of workload> workload
roachprod start <cluster>
for i in {1..<nodes>}; do roachprod ssh <cluster>:$i -- './workload csv-server --port=8081 &> logs/workload-csv-server.log < /dev/null &'; done
roachprod ssh <cluster>:1 -- ./workload fixtures make <generator> [flags] --csv-server=http://localhost:8081

workload check

workload check <generator> [flags] runs any consistency checks defined by the generator. These are expected to pass after initial table data load (whether via batched inserts or fixtures) or after any duration of queryload. In particular, this may be useful in tests that want to validate the correctness of a cluster that is being stressed in some way (crashes, extreme settings, etc).

roachprod

TODO (In the meantime, check out the --help.)

roachtest

TODO (In the meantime, check out the --help.)

Intro to roachprod and workload (raw notes from a presentation Dan gave)

roachprod

Roachprod is designed to be as unopinionated as possible. It's mostly a way to start/stop/run commands on many nodes at once, with a bit of crdb knowledge baked in.

Install roachprod:

go get -u github.com/cockroachdb/roachprod

The best source of documentation about roachprod is definitely the --help:

roachprod --help

Start a local roachprod cluster with 3 nodes:

roachprod create local -n 3

Download cockroach onto it:

roachprod run local "curl https://binaries.cockroachdb.com/cockroach-v2.0.0.darwin-10.9-amd64.tgz | tar -xJ; mv cockroach-v2.0.0.darwin-10.9-amd64/cockroach cockroach"
  • sorry, this should be easier

Run a unix command on one node:

roachprod run local:1 "ls"

Start a cluster!

roachprod start local
  • If export COCKROACH_DEV_LICENSE=crl-<sekret license string in #backup>aW5n is in your .bashrc, start will also apply the license.

View the admin ui for node 1 (note the "1"):

roachprod admin local:1 --open

View the admin ui for all nodes:

roachprod admin local --open

Open a SQL shell:

roachprod sql local:1

Shut it down:

roachprod destroy local

Make one in the cloud (so fast!):

roachprod create dan-crdb -n 4
  • the cluster must start with your username
  • it will be shutdown and collected if you leave it running (the deadline can be increased via roachprod extend)

Download cockroach onto it:

roachprod run dan-crdb "curl https://binaries.cockroachdb.com/cockroach-v2.0.0.linux-amd64.tgz | tar -xvz; mv cockroach-v2.0.0.linux-amd64/cockroach cockroach"

Start it up (leave node 4 for the load generator):

roachprod start dan-crdb:1-3
  • note the 1-3 means nodes 1 to 3

workload

Workload is many things, but it's most important purpose is to run queryloads and report throughput/latency statistics. Each of these (kv, tpcc, roachmart) is a "generator" and defines it's own set of initial data. Generators are configured by flags, which are very powerful and generator implementors can use them to control almost any behavior.

Workload's help is also good:

roachprod ssh dan-crdb:1 -- ./workload --help
roachprod ssh dan-crdb:1 -- ./workload run --help
roachprod ssh dan-crdb:1 -- ./workload run kv --help
roachprod ssh dan-crdb:1 -- ./workload run tpcc --help

Download workload onto your GCE cluster:

roachprod run dan-crdb "curl -L https://edge-binaries.cockroachdb.com/cockroach/workload.LATEST > workload; chmod a+x workload"

Run TPCC warehouses=1:

roachprod ssh dan-crdb:4 -- ./workload run tpcc --init --warehouses=1 --duration=1m  {pgurl:1-3}
  • note the {pgurl} which roachprod expands into the connection url

Wipe your cluster:

roachprod wipe dan-crdb

And restart it:

roachprod start dan-crdb:1-3

Init-ing TPCC with large numbers of warehouses would be slow, so use a fixture:

roachprod admin dan-crdb:1 --open
roachprod ssh dan-crdb:1 -- ./workload fixtures load tpcc --warehouses=10

Now run it:

roachprod ssh dan-crdb:4 -- ./workload run tpcc --warehouses=10 --duration=1m  {pgurl:1-3}