Database Background

This list is a compilation of readings that are valuable to a general understanding of how Cockroach operates. The list is extensive (but not exhaustive). Don't feel you need to read everything here; it's provided as a way to drill down into topics you find interesting, if you so choose. The entries in each section are roughly organized in recommended order of consumption, but this is not a strict ordering in any sense.



Introduction
General


Transactions


Linearizability

Consensus


SQL Execution

  • Volcano: https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
    • The introduction of the general execution model that the planNode and DistSQL engines in Cockroach use. Start here if you know nothing about how SQL statements are executed.
  • MonetDB (MIL Primitives for querying a fragmented world) (1999) (Boncz, Kersten): https://ir.cwi.nl/pub/11183/11183B.pdf
    • This paper discusses MonetDB, a database system that uses column-at-a-time processing. Entire columns are processed at once, with no batching. 
  • MonetDB/X100: Hyper-Pipelining Query Execution (2005) (Boncz, Zukowski, Nes) : http://cidrdb.org/cidr2005/papers/P19.pdf
    • This paper is the primary source of the idea to use batched, templated, column-at-a-time execution to avoid the interpretation and type-lookup overhead inherent in the Volcano model. CockroachDB's nascent vectorized execution engine follows the ideas in this paper closely.
    • This paper is really important! Read it if you're interested in CockroachDB's exec package and vectorized execution.
    • It's also co-written by Marcin Zukowski, cofounder of Snowflake.
  • Everything you always wanted to know about compiled and vectorized queries but were afraid to ask (2018) (Kersten, Leis, Kemper, Neumann, Pavlo, Boncz): http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf
    • A really good intro to vectorized execution and how it compares with compiled (JIT) systems like HyPer or MemSQL. Andy Pavlo is a co-author.
  • Balancing vectorized query execution with bandwidth-optimized storage (2009) (Zukowski): https://dare.uva.nl/search?identifier=5ccbb60a-38b8-4eeb-858a-e7735dd37487
    • This is Zukowski's PhD thesis, about the VectorWise database system that came out of MonetDB/X100.
    • Chapters 4, 5, and 6 are especially relevant to CockroachDB.
  • The Design and Implementation of Modern Column-Oriented Database Systems (2012) (Abadi, Boncz, ...) : http://db.csail.mit.edu/pubs/abadi-column-stores.pdf
    • Massive survey paper. Good, but it packs in a lot of information.
  • Rethinking SIMD Vectorization for In-Memory Databases (2015) (Polychroniou, Raghavan, Ross): http://www.cs.columbia.edu/~orestis/sigmod15.pdf
    • Interesting stuff about how to actually utilize SIMD for vectorized execution.
  • DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing (2008) (Zukowski, Nes, Boncz): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.1243
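
For readers new to these execution models, the contrast between the Volcano model and batched, column-at-a-time execution described in the entries above can be sketched in a few lines. This is a toy illustration only, not CockroachDB code: the two-column table, the function names (scan_rows, filter_rows, scan_batches, filter_price_gt), and the batch size are all assumptions made up for the example.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, int]                # hypothetical row layout: (id, price)
Batch = Tuple[List[int], List[int]]  # hypothetical batch: (id column, price column)


# Volcano style: every operator pulls one row at a time from its child.
def scan_rows(table: List[Row]) -> Iterator[Row]:
    for row in table:
        yield row


def filter_rows(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:
        if pred(row):  # a generic predicate is dispatched once per row
            yield row


# Vectorized style: operators exchange batches of column vectors.
def scan_batches(ids: List[int], prices: List[int], batch_size: int) -> Iterator[Batch]:
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield ids[start:end], prices[start:end]


def filter_price_gt(child: Iterator[Batch], threshold: int) -> Iterator[Batch]:
    for ids, prices in child:
        # One tight, type-specific loop over a single column builds a
        # selection vector of the positions that pass the filter.
        sel = [i for i, p in enumerate(prices) if p > threshold]
        if sel:
            yield [ids[i] for i in sel], [prices[i] for i in sel]


table = [(1, 5), (2, 20), (3, 15), (4, 1)]

# Row-at-a-time pipeline.
volcano_result = list(filter_rows(scan_rows(table), lambda r: r[1] > 10))

# Batch-at-a-time pipeline over the same data stored as columns.
id_col = [r[0] for r in table]
price_col = [r[1] for r in table]
vectorized_result = []
for b_ids, b_prices in filter_price_gt(scan_batches(id_col, price_col, 2), 10):
    vectorized_result.extend(zip(b_ids, b_prices))
```

Both pipelines return the same rows. The difference is where the per-tuple overhead goes: the row-at-a-time version pays a generic call per row, while the batched version pays it once per batch and spends the rest of its time in a simple loop over one column. Python hides most of the actual performance effect, so treat this purely as a structural sketch.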


SQL Optimization/Query Planning


Systems

This section is surprisingly important because people at Cockroach often talk about things in terms of the Google systems that introduced them.

Other