Database Background

This list is a compilation of readings that are valuable for a general understanding of how Cockroach operates. The list is extensive (but not exhaustive). Don't feel you need to read everything here; it's provided as a way to drill down into topics you find interesting, if you so choose. The entries in each section are roughly organized in recommended reading order, but this is not a strict ordering.






SQL Execution

  • Volcano:
    • Introduces the general execution model that the planNode and DistSQL engines in Cockroach use. Start here if you know nothing about how SQL statements are executed.
  • MonetDB (MIL Primitives for querying a fragmented world) (1999) (Boncz, Kersten):
    • This paper discusses MonetDB, a database system that uses column-at-a-time processing. Entire columns are processed at once, with no batching. 
  • MonetDB/X100: Hyper-Pipelining Query Execution (2005) (Boncz, Zukowski, Nes):
    • This paper is the primary source of the idea to use batched, templated, column-at-a-time execution to avoid the interpretation and type-lookup overhead inherent in the Volcano model. CockroachDB's nascent vectorized execution engine follows the ideas in this paper closely.
    • This paper is really important! Read it if you're interested in CockroachDB's exec package and vectorized execution.
    • It's also co-written by Marcin Zukowski, a cofounder of Snowflake.
  • Everything you always wanted to know about compiled and vectorized queries but were afraid to ask (2018) (Kersten, Leis, Kemper, Neumann, Pavlo, Boncz):
    • A really good introduction to vectorized execution and how it compares with compiled (JIT) systems like HyPer or MemSQL. Andy Pavlo is a co-author.
  • Balancing vectorized query execution with bandwidth-optimized storage (2009) (Zukowski):
    • This is Zukowski's PhD thesis. It describes the VectorWise database system that came out of MonetDB/X100.
    • Chapters 4, 5, and 6 are especially relevant to CockroachDB.
  • The Design and Implementation of Modern Column-Oriented Database Systems (2012) (Abadi, Boncz, ...):
    • Massive survey paper. Good, but there is a lot of information to get through.
  • Rethinking SIMD Vectorization for In-Memory Databases (2015) (Polychroniou, Raghavan, Ross):
    • Interesting stuff about how to actually utilize SIMD for vectorized execution.
  • DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing (2008) (Zukowski, Nes, Boncz)
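
To make the contrast between the Volcano model and batched, column-at-a-time execution concrete, here is a minimal Go sketch. It is not CockroachDB code: the names RowSource, sliceScan, rowFilter, SumRows, and SumVectorized are hypothetical and exist only for illustration. The first half pulls one row per Next call through an interface, as a Volcano-style iterator would. The second half processes a single int64 column in fixed-size batches with a selection vector, in the spirit of the MonetDB/X100 paper.

```go
package main

import "fmt"

// RowSource is a hypothetical Volcano-style iterator: each Next call
// returns one row, or ok=false once the input is exhausted.
type RowSource interface {
	Next() (row []int64, ok bool)
}

// sliceScan yields rows from an in-memory table, one at a time.
type sliceScan struct {
	rows [][]int64
	pos  int
}

func (s *sliceScan) Next() ([]int64, bool) {
	if s.pos >= len(s.rows) {
		return nil, false
	}
	r := s.rows[s.pos]
	s.pos++
	return r, true
}

// rowFilter passes through rows whose column col is greater than min.
type rowFilter struct {
	input RowSource
	col   int
	min   int64
}

func (f *rowFilter) Next() ([]int64, bool) {
	for {
		r, ok := f.input.Next()
		if !ok {
			return nil, false
		}
		if r[f.col] > f.min {
			return r, true
		}
	}
}

// SumRows drains a row-at-a-time pipeline. Every row costs at least
// one dynamically dispatched Next call per operator in the tree.
func SumRows(src RowSource, col int) int64 {
	var sum int64
	for r, ok := src.Next(); ok; r, ok = src.Next() {
		sum += r[col]
	}
	return sum
}

// SumVectorized filters and sums one int64 column in batches.
// batchSize is assumed to be positive. The inner loops are tight and
// type-specific, so there is no per-row interface dispatch.
func SumVectorized(col []int64, min int64, batchSize int) int64 {
	var sum int64
	sel := make([]int, 0, batchSize) // selection vector, reused per batch
	for start := 0; start < len(col); start += batchSize {
		end := start + batchSize
		if end > len(col) {
			end = len(col)
		}
		batch := col[start:end]
		// Filter primitive: record the indices of qualifying values.
		sel = sel[:0]
		for i, v := range batch {
			if v > min {
				sel = append(sel, i)
			}
		}
		// Aggregate primitive: sum only the selected values.
		for _, i := range sel {
			sum += batch[i]
		}
	}
	return sum
}

func main() {
	rows := [][]int64{{1, 10}, {2, 20}, {3, 30}, {4, 40}}
	volcano := SumRows(&rowFilter{input: &sliceScan{rows: rows}, col: 1, min: 15}, 1)
	vectorized := SumVectorized([]int64{10, 20, 30, 40}, 15, 2)
	fmt.Println(volcano, vectorized)
}
```

Both pipelines compute the same filtered sum. The difference is where the per-row work happens: behind an interface call in the first case, inside a simple loop over a typed slice in the second, which is the overhead the vectorized papers above set out to remove.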

SQL Optimization/Query Planning


This section is randomly important because people at Cockroach talk about things in terms of the Google systems that introduced them.