Database Background

This list compiles readings that are valuable for a general understanding of how Cockroach operates. It is extensive (but not exhaustive), so don't feel you need to read everything here. It's provided as a way to drill down into topics you find interesting, if you so choose. The entries in each section are roughly organized in recommended reading order, but the ordering is not strict.

Introduction

General

Storage

Admission Control

Transactions

Linearizability

Consensus

SQL Execution

  • Volcano: https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf

    • The introduction of the general execution model that the planNode and DistSQL engines in Cockroach use. Start here if you know nothing about how SQL statements are executed.

  • MonetDB (MIL Primitives for querying a fragmented world) (1999) (Boncz, Kersten): https://ir.cwi.nl/pub/11183/11183B.pdf

    • This paper discusses MonetDB, a database system that uses column-at-a-time processing. Entire columns are processed at once, with no batching. 

  • MonetDB/X100: Hyper-Pipelining Query Execution (2005) (Boncz, Zukowski, Nes): http://cidrdb.org/cidr2005/papers/P19.pdf

    • This paper is the primary source of the idea to use batched, templated, column-at-a-time execution to avoid the interpretation and type-lookup overhead inherent in the Volcano model. CockroachDB's nascent vectorized execution engine follows the ideas in this paper closely.

    • This paper is really important! Read it if you're interested in CockroachDB's exec package and vectorized execution.

    • It's also co-written by Marcin Zukowski, cofounder of Snowflake.

  • Everything you always wanted to know about compiled and vectorized queries but were afraid to ask (2018) (Kersten, Leis, Kemper, Neumann, Pavlo, Boncz): http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf

    • Really good intro to vectorized execution and how it compares with JIT (compiled) systems like HyPer or MemSQL. Andy Pavlo is a co-author.

  • Balancing vectorized query execution with bandwidth-optimized storage (2009) (Zukowski): https://dare.uva.nl/search?identifier=5ccbb60a-38b8-4eeb-858a-e7735dd37487

    • This is Zukowski's PhD thesis on the VectorWise database system, which came out of MonetDB/X100.

    • Chapters 4, 5, and 6 are especially relevant to CockroachDB.

  • The Design and Implementation of Modern Column-Oriented Database Systems (2012) (Abadi, Boncz, ...): http://db.csail.mit.edu/pubs/abadi-column-stores.pdf

    • Massive survey paper. Good, but there is a lot of information in it.

  • Rethinking SIMD Vectorization for In-Memory Databases (2015) (Polychroniou, Raghavan, Ross): http://www.cs.columbia.edu/~orestis/sigmod15.pdf

    • Interesting stuff about how to actually utilize SIMD for vectorized execution.

  • DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing (2008) (Zukowski, Nes, Boncz): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.1243
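
If the difference between the Volcano model and the batched, column-at-a-time model in the readings above feels abstract, the following Go sketch may help. It is only an illustration: every name in it (Row, Operator, scan, filter, selGreater, sumSelected, runVolcano, runVectorized) is made up for this example and is not CockroachDB's planNode, DistSQL, or exec API. Both halves compute the equivalent of SELECT sum(b) FROM t WHERE a > threshold over in-memory data.

```go
package main

import "fmt"

// Row is one tuple with two integer columns (hypothetical schema).
type Row struct{ A, B int64 }

// Operator is a Volcano-style iterator: each Next call yields one row.
type Operator interface{ Next() (Row, bool) }

// scan yields rows from an in-memory slice, one per call.
type scan struct {
	rows []Row
	pos  int
}

func (s *scan) Next() (Row, bool) {
	if s.pos >= len(s.rows) {
		return Row{}, false
	}
	s.pos++
	return s.rows[s.pos-1], true
}

// filter pulls rows from its input until one satisfies pred.
type filter struct {
	input Operator
	pred  func(Row) bool
}

func (f *filter) Next() (Row, bool) {
	for {
		r, ok := f.input.Next()
		if !ok || f.pred(r) {
			return r, ok
		}
	}
}

// runVolcano sums B over rows with A > threshold, paying an interface call
// and a predicate call for every single row.
func runVolcano(rows []Row, threshold int64) int64 {
	var op Operator = &filter{
		input: &scan{rows: rows},
		pred:  func(r Row) bool { return r.A > threshold },
	}
	var total int64
	for r, ok := op.Next(); ok; r, ok = op.Next() {
		total += r.B
	}
	return total
}

// selGreater is a type-specialized primitive in the spirit of X100: one tight
// loop over a column vector, appending indices of qualifying values to sel.
func selGreater(col []int64, threshold int64, sel []int) []int {
	for i, v := range col {
		if v > threshold {
			sel = append(sel, i)
		}
	}
	return sel
}

// sumSelected adds up the values of col at the selected indices.
func sumSelected(col []int64, sel []int) int64 {
	var total int64
	for _, i := range sel {
		total += col[i]
	}
	return total
}

// runVectorized processes columns a and b in batches of size values.
// Assumes size > 0 and len(a) == len(b).
func runVectorized(a, b []int64, threshold int64, size int) int64 {
	var total int64
	sel := make([]int, 0, size)
	for start := 0; start < len(a); start += size {
		end := start + size
		if end > len(a) {
			end = len(a)
		}
		sel = selGreater(a[start:end], threshold, sel[:0])
		total += sumSelected(b[start:end], sel)
	}
	return total
}

func main() {
	rows := []Row{{1, 10}, {2, 20}, {3, 30}, {4, 40}, {5, 50}}
	a := []int64{1, 2, 3, 4, 5}
	b := []int64{10, 20, 30, 40, 50}
	fmt.Println(runVolcano(rows, 2))       // 120
	fmt.Println(runVectorized(a, b, 2, 2)) // 120
}
```

The two halves return the same answer. What differs is where the per-row overhead goes: the Volcano half makes a dynamic call per row per operator, while the batched half amortizes that cost over a whole batch and leaves simple loops over typed slices. A real engine would generate one such primitive per type and comparison instead of writing each by hand.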

SQL Optimization/Query Planning

 

Systems

This section is oddly important because people at Cockroach talk about things in terms of the Google systems that introduced them.

Other

Copyright (C) Cockroach Labs.
Attention: This documentation is provided on an "as is" basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose.