Log and error redactability

Our customers routinely send crash reports and log files to us, but they want confidence that this data does not contain confidential information. How do we achieve this?

For this purpose, the CockroachDB source code uses redactability. This is a crdb-specific combination of data types and APIs on top of Go’s string manipulation, logging and errors APIs.

Redactability makes it possible to remove sensitive information from a string after the string has been constructed.

This wiki page explains how to maintain redactability when adding or modifying CockroachDB’s source code. For more details, see the section References at the bottom.

Main concepts / definition

We use the word “sensitive” or “unsafe” to designate information that's potentially PII-laden or confidential, and “safe” for information that's certainly known to not be unsafe.

Notice the “priority order” in this definition: information is unsafe by default, until proven safe. For example, the basic string type in Go will be considered unsafe.

A confidentiality leak occurs when unsafe information is incorrectly marked as safe.

The APIs discussed below make it possible to annotate information with proofs/promises that things are safe.

In summary:

  • A redactable string (or byte array) is a string where unsafe information is enclosed between special delimiters. For example,
    var s RedactableString = "hello ‹secret›"
    contains the safe word “hello” and unsafe word “secret”.

  • String redaction is a function that deletes the data between delimiters to produce a redacted string.
    For example, RedactableString("hello ‹secret›").Redact() returns "hello ‹×›".

  • The remaining APIs can:

    • introduce the guarantee of safety from scratch,

    • promise that unsafe information is, in fact, safe; and that redactable strings are, in fact, redactable.
      (See below for the definition of “promise”. This is unrelated to the similarly-named JavaScript concept.)

    • transform information and compose redactable strings in a way that is proven to preserve redactability without leaking sensitive information.
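To make the redaction step concrete, here is a minimal sketch that uses only the Go standard library. It is a toy model of the behavior described above, not the implementation in the redact package; the names markedSpan and redactMarkers are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// markedSpan matches one unsafe span: an opening marker, any run of
// characters that are not a closing marker, then the closing marker.
var markedSpan = regexp.MustCompile(`‹[^›]*›`)

// redactMarkers is a hypothetical stand-in for RedactableString.Redact():
// it replaces the contents of every marked span with "×" and leaves the
// text outside the markers untouched.
func redactMarkers(s string) string {
	return markedSpan.ReplaceAllString(s, "‹×›")
}

func main() {
	fmt.Println(redactMarkers("hello ‹secret›")) // prints: hello ‹×›
}
```

Because the markers travel inside the string, redaction can happen long after the string was built, which is the property the rest of this page relies on.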

Where can users observe redactable information?

Redactability (i.e. information where sensitive and safe bits can be separated from each other) can be found:

  • In log files or network log entries produced by CockroachDB with the redactable: true configuration flag set. Here is an example redactable log entry:

    I210413 09:51:03.906798 14 heapprofiler.go:49 ⋮ [n2] 34 writing go heap profiles to ‹/home/kena/cockroach/cockroach-data/logs/heap_profiler› at least every 1h0m0s

    In this entry, the ⋮ symbol means that the rest of the entry is redactable, and the text enclosed between ‹ and › is unsafe information.

  • In crash reports, prior to sending to Cockroach Labs for monitoring and error tracking. The unsafe information is redacted on the way out.
    For example:

    (The filename was marked as redactable-unsafe in the error message, and was redacted into “×” before sending. The web display erased the redaction markers.)

  • In Go error objects produced / managed by the CockroachDB errors library.
    Why? Because errors eventually get translated into log entries and crash reports, see above. If the errors were not redactable to start with, we couldn’t make log entries / crash reports redactable.
    (The redaction markers inside errors are hidden when looking via the .Error() method, but appear when using redact.Sprint() or crdb’s log functions.)

  • At some point in the future (relative to the date of this writing, May 2021), in distributed traces produced by CockroachDB, which can be inspected during troubleshooting.
    (At the time of this writing, traces are not redactable and thus should be considered thoroughly unsafe as per the definition above.)

  • Certain data structures and strings held in RAM inside CockroachDB, when they are likely to be included in log messages or error payloads.

How to make information redactable?

Any data inside CockroachDB’s source code that may be included in an error message or a log entry should be made redactable.

Otherwise, it will be considered as unsafe by our tooling and removed when customers send log entries / errors to technical support.

More redactability = more observability + more troubleshootability.

The various APIs try to minimize the work needed by CockroachDB programmers, but sometimes extra care must be taken.

Simple cases

Redactability is mostly noticeable when emitting a log entry. A good way to check the redactability properties of an object is thus to log it and see what happens.

Here is what CockroachDB’s APIs already provide for you:

  • The constant literal string used as first argument to errors.New / Newf / Wrap etc, as well as log.Info, Infof etc is axiomatically considered safe.

  • The redactable contents of error objects are automatically recognized and propagated when constructing further error objects or log entries from them.
    (Certain common error types from the Go runtime are also properly recognized to separate their safe vs unsafe payloads.)

  • Certain data types outside of CockroachDB’s own source code have been marked as always-safe using the “safe type registry” (redact.RegisterSafeType), because we consider that they can never be traced back to individual customers or PII. This includes, for example:

    • Go’s native booleans, integers & float types

    • time.Time, time.Duration

    • os.Interrupt

  • Remaining data types are considered unsafe by default.

To make more information redactable, the CockroachDB programmer should thus spend extra effort to annotate information as safe or redactable that would be considered as unsafe otherwise.

This is especially the case with struct types and other Go types that alias basic types.
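The sketch below illustrates why types that alias a basic type need their own annotation. It is a toy model using only the standard library: safeTypes, render and nodeID are hypothetical names, and the registry merely imitates the idea behind redact.RegisterSafeType. Marker escaping is left out for brevity.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// safeTypes is a toy registry keyed by exact Go type, in the spirit of
// the "safe type registry" described above.
var safeTypes = map[reflect.Type]bool{
	reflect.TypeOf(false):            true,
	reflect.TypeOf(int32(0)):         true,
	reflect.TypeOf(time.Duration(0)): true,
}

// nodeID is a hypothetical defined type over int32. Its reflect.Type
// differs from int32's, so the registry entry for int32 does not cover it.
type nodeID int32

// render prints v bare when its exact type is registered as safe, and
// encloses it in redaction markers otherwise (unsafe by default).
// NOTE: a real implementation must also escape markers found inside v.
func render(v interface{}) string {
	if safeTypes[reflect.TypeOf(v)] {
		return fmt.Sprint(v)
	}
	return "‹" + fmt.Sprint(v) + "›"
}

func main() {
	fmt.Println(render(int32(7)))        // prints: 7
	fmt.Println(render(2 * time.Second)) // prints: 2s
	fmt.Println(render(nodeID(7)))       // prints: ‹7›
	fmt.Println(render("10.0.0.1"))      // prints: ‹10.0.0.1›
}
```

In this model, nodeID(7) is treated as unsafe even though it holds nothing but an integer, which is the situation the SafeValue and SafeFormatter interfaces are meant to address.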

API Basics

  • To mark a data type as safe or redactable when it would be considered unsafe otherwise, use:

    • For simple types that alias a Go numeric type, you can axiomatically mark the type as always-safe by marking it with the redact.SafeValue interface.

    • For more complex types or types that alias the Go string type, implement a SafeFormat method (i.e. the redact.SafeFormatter interface).
      The primitives available in the body of SafeFormat provably generate redactable strings. See below for examples.

  • Compose RedactableString values upfront, then store them until later, instead of composing a string value, storing it into a struct, and then later trying to include it into an error or log message.

  • To compose a redactable string from a mix of safe and unsafe information, use:

    • redact.Sprint() / redact.Sprintf() to create a RedactableString from various bits using fmt.Print / Printf-like formatting.

    • redact.Join() / redact.JoinTo() to adjoin a list of various bits using a delimiter and form a RedactableString, a bit like strings.Join.

    • redact.StringBuilder to compose a RedactableString programmatically like strings.Builder or bytes.Buffer.

As you start learning about these mechanisms, you will slowly start noticing that Go’s native fmt.Stringer interface (and the String() method) becomes less and less relevant in your code — none of the logging or error code ever uses it if your objects implement SafeFormatter or SafeValue. In fact, we are likely to slowly phase out String() methods over time.

Examples

Before

    // type MetricSnap does not implement SafeFormat and its representation
    // as string is thus considered fully unsafe by default.
    func (m MetricSnap) String() string {
        suffix := ""
        if m.ConnsRefused > 0 {
            suffix = fmt.Sprintf(", refused %d conns", m.ConnsRefused)
        }
        return fmt.Sprintf("infos %d/%d sent/received, bytes %dB/%dB sent/received%s",
            m.InfosSent,
            m.InfosReceived,
            m.BytesSent,
            m.BytesReceived,
            suffix)
    }

After

    // SafeFormat implements the redact.SafeFormatter interface.
    func (m MetricSnap) SafeFormat(w redact.SafePrinter, _ rune) {
        // Notice how similar the code below is to the Before version.
        // The SafePrinter API has been designed to make it easy
        // to “migrate” existing String() methods into SafeFormat().
        //
        // Why this “does the right thing” without special annotations:
        // - The format string for w.Printf() is a literal constant and considered safe.
        // - The numeric arguments are simple integers and thus considered safe.
        // As a result, the entire string produced is automatically considered
        // safe. No special “this is safe” annotations are needed.
        w.Printf("infos %d/%d sent/received, bytes %dB/%dB sent/received",
            m.InfosSent, m.InfosReceived, m.BytesSent, m.BytesReceived)
        if m.ConnsRefused > 0 {
            w.Printf(", refused %d conns", m.ConnsRefused)
        }
    }

    func (m MetricSnap) String() string {
        // StringWithoutMarkers applies the SafeFormat method
        // then removes the redaction markers to produce a “flat” string.
        // This helps avoid code duplication between String()
        // and SafeFormat().
        //
        // Note: The resulting String() method is only rarely
        // called, since most relevant uses of MetricSnap
        // will now use .SafeFormat() directly.
        return redact.StringWithoutMarkers(m)
    }

Before

    // type OutgoingConnStatus does not implement SafeFormat and its representation
    // as string is thus considered fully unsafe by default.
    func (c OutgoingConnStatus) String() string {
        return fmt.Sprintf("%d: %s (%s: %s)",
            c.NodeID, c.Address, roundSecs(time.Duration(c.AgeNanos)), c.MetricSnap)
    }

After

    // SafeFormat implements the redact.SafeFormatter interface.
    func (c OutgoingConnStatus) SafeFormat(w redact.SafePrinter, _ rune) {
        // Notice how similar the code below is to the Before version.
        // The SafePrinter API has been designed to make it easy
        // to “migrate” existing String() methods into SafeFormat().
        //
        // Why this “does the right thing” without special annotations:
        // - The format argument is a literal constant and considered safe.
        // - c.NodeID is a roachpb.NodeID, which aliases a basic integer type,
        //   implements SafeValue() and is thus considered safe.
        // - c.Address is a string and is unsafe.
        // - roundSecs() returns a time.Duration and this type has been registered as safe.
        // - c.MetricSnap implements a SafeFormat method, which is called implicitly
        //   to "do the right thing".
        // The resulting string contains a mix of safe/unsafe information:
        // the address is marked as unsafe, the rest is safe.
        w.Printf("%d: %s (%s: %s)",
            c.NodeID, c.Address, roundSecs(time.Duration(c.AgeNanos)), c.MetricSnap)
    }

    // This String() method is defined via SafeFormat(). See explanation in the other example above.
    func (c OutgoingConnStatus) String() string {
        return redact.StringWithoutMarkers(c)
    }

Before

    type Gossip struct {
        ...
        // lastConnectivity remembers the connectivity details
        // across calls to the LogStatus() method.
        lastConnectivity string
    }

    // LogStatus logs the current status of gossip such as the incoming and
    // outgoing connections.
    func (g *Gossip) LogStatus() {
        // The log call below should only report the connectivity
        // if it is different from the last call to LogStatus().
        var connectivity string
        if s := g.Connectivity().String(); s != g.lastConnectivity {
            g.lastConnectivity = s
            connectivity = s
        }
        log.Infof(ctx, "gossip status: %s", connectivity)
    }

After

    type Gossip struct {
        ...
        // lastConnectivity remembers the connectivity details
        // across calls to the LogStatus() method.
        lastConnectivity redact.RedactableString
    }

    // LogStatus logs the current status of gossip such as the incoming and
    // outgoing connections.
    func (g *Gossip) LogStatus() {
        // g.Connectivity() returns an object that implements SafeFormat().
        // Its redactable representation contains a mix of safe and unsafe information.
        //
        // (Again, notice how the code below is similar to the Before version.)
        var connectivity redact.RedactableString
        if s := redact.Sprint(g.Connectivity()); s != g.lastConnectivity {
            g.lastConnectivity = s
            connectivity = s
        }
        log.Infof(ctx, "gossip status: %s", connectivity)
    }

When to use SafeFormatter vs SafeValue

When in doubt, implement a SafeFormat method (i.e. the redact.SafeFormatter interface). This creates redactable strings that provably do not leak confidential information.

The SafeValue marker interface is reserved for “leaf” data types which are so simple that one can argue, just by looking at the source code, that they can never contain sensitive information. We do this e.g. for roachpb.NodeID, descpb.DescID and other such integer types.

Generally, avoid using the SafeValue interface for non-simple types. The main problem this general rule solves is that nothing prevents a programmer from later adding more data into values of that type and starting to leak confidential information without noticing.

For the same reason, generally avoid using redact.Safe and its aliases errors.Safe / log.Safe. The promise made at the time the call is introduced that its argument is safe can be too easily broken “at a distance” by someone else later, for example by changing the type definition of the argument to start leaking unsafe information.

General rules

Proofs vs promises

A promise is when a person (e.g. a member of the CockroachDB team) expresses in the source code that some information is safe or redactable according to their opinion or understanding.

A proof is a function or algorithm that takes a combination of safe/unsafe information and is guaranteed, by construction (and as long as it compiles without type errors), to avoid confidentiality leaks.

Whenever we have a choice between a “proof API” or a “promise API”, we always prefer the proof, because it ensures that the code is not sensitive to human mistakes.

An axiom is an argument expressed in the code that a bit of information is safe or unsafe in a way that is provably always true, regardless of which data is processed by CockroachDB. Axioms thus have the same general quality as proofs and are superior to promises. We prefer axioms where the argument can be verified locally at the position in the code where it is made, without relying on knowledge pulled from elsewhere.

 

For example:

Bad

    func foo(s string) redact.RedactableString {
        // Casting an arbitrary string to RedactableString
        // is a PROMISE: only the programmer knows
        // that s does not contain redaction markers
        // and that the string concatenation is guaranteed
        // not to leak information.
        //
        // The promise can easily be broken “by accident” if
        // a new call is made to foo() with a broken
        // string as input.
        return redact.RedactableString("hello ‹" + s + "›")
    }

Good

    func foo(s string) redact.RedactableString {
        // The redact.Sprintf function is a PROOF:
        // its algorithm guarantees that the unsafe
        // information in s will be properly annotated
        // in the result, without confidentiality leak.
        return redact.Sprintf("hello %s", s)
    }

Bad

    type myStruct struct {
        s string
    }

    func (m *myStruct) foo(v string) {
        m.s = string(redact.Sprintf("hello %s", v))
    }

    func (m *myStruct) bar() string {
        // PROMISE: m.s has not been modified since foo() was called,
        // and is thus known (only to the programmer) to still be
        // properly redactable.
        //
        // The promise can easily be broken “by accident” if
        // another programmer adds a separate method that modifies
        // m.s.
        return redact.RedactableString(m.s).Redact().StripMarkers()
    }

Good

    type myStruct struct {
        s redact.RedactableString
    }

    func (m *myStruct) foo(v string) {
        m.s = redact.Sprintf("hello %s", v)
    }

    func (m *myStruct) bar() string {
        // PROOF: as long as the type rules are obeyed, m.s
        // will have remained redactable ever since it was
        // constructed.
        return m.s.Redact().StripMarkers()
    }
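To see how the first promise can be broken in practice, consider an input string that itself contains redaction markers. The sketch below is a toy model using only the standard library; redactMarkers, naiveWrap and escapingWrap are hypothetical names, and replacing markers with "?" is an assumption made for this illustration rather than a statement about how redact.Sprintf works internally.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var markedSpan = regexp.MustCompile(`‹[^›]*›`)

// redactMarkers replaces the contents of every marked span with "×".
func redactMarkers(s string) string {
	return markedSpan.ReplaceAllString(s, "‹×›")
}

// markerEscaper neutralizes markers that appear inside unsafe input.
// The "?" replacement is a choice made for this sketch.
var markerEscaper = strings.NewReplacer("‹", "?", "›", "?")

// naiveWrap mirrors the promise-based foo(): it trusts that s contains
// no redaction markers.
func naiveWrap(s string) string {
	return "hello ‹" + s + "›"
}

// escapingWrap escapes markers in s before enclosing it, so that the
// input cannot terminate the unsafe span early.
func escapingWrap(s string) string {
	return "hello ‹" + markerEscaper.Replace(s) + "›"
}

func main() {
	hostile := "a› card=1234 ‹b"
	fmt.Println(redactMarkers(naiveWrap(hostile)))    // prints: hello ‹×› card=1234 ‹×›
	fmt.Println(redactMarkers(escapingWrap(hostile))) // prints: hello ‹×›
}
```

With the naive concatenation, the text between the injected markers survives redaction. A composition function that handles escaping itself removes the need for every caller to remember this.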

 

 

Redactability in error objects

The default error constructors from CockroachDB’s error library (Newf, New, AssertionFailedf, etc.) automatically implement redactability:

  • (axiom) The constant literal string argument to the non-formatting variants (e.g. New, Wrap) is considered safe.

  • (axiom) The characters in the constant literal format string passed as first argument to the formatting variants (e.g. Newf, Wrapf, AssertionFailedf) are considered safe.

  • (proof) The remaining arguments are turned into redactable information as per the rules below.

  • (axiom) The data type name of the error objects (as well as their Go package import path) are considered safe. Although users don’t see this, it is included in crash reports for further troubleshootability.

The points above emphasize “constant literal string”. The fact that a string is a constant literal (i.e. statically embedded in the CockroachDB executable) is what makes it safe. We enforce this property using a linter.

Redactability in log entries

CockroachDB’s log functions first transform their parameters into an error object internally, as per the rules above. Then, that error object is formatted into a log entry.

Therefore, the rules for error objects described above apply equally when constructing log entries:

  • (axiom) The constant literal string argument to the non-formatting variants, as well as the first formatting argument to the formatting variants, is considered safe.

  • (proof) The remaining arguments are turned into redactable information as per the rules below.

The redactability properties of that error object are preserved throughout the logging system.

Redactability through the redact package

Marking things as safe or unsafe

The best way to mark things as safe or unsafe is to implement a SafeFormatter method or use redact.Sprint / redact.Sprintf.

The other ways below are detailed only for reference but are error-prone.

  • (promise) Constant literal strings used to construct redact.SafeString values are considered safe.
    This is not axiomatic because if the constant literal is modified separately from the cast to accidentally contain redaction markers, confidentiality leaks may occur.
    Use redact.Sprintf in case of doubt.

  • (promise) Constant literal strings cast to redact.RedactableString are considered redactable.
    This is not axiomatic because if the constant literal is modified separately from the cast to accidentally contain redaction markers, confidentiality leaks may occur.
    Use redact.Sprintf in case of doubt.

  • (promise) Variable strings cast to redact.SafeString values are considered safe; those cast to redact.RedactableString are considered redactable.
    Again, this is highly dependent on programmer knowledge that the string doesn’t contain redaction markers, or unsafe information not marked as unsafe.

  • (promise) Values of the types marked as always-safe via redact.RegisterSafeType are always considered safe.
    This is used e.g. in CockroachDB to mark bool, integer types, float types, time.Time, time.Duration and os.Interrupt as safe.

  • (promise) Values of the types that implement the redact.SafeValue interface are considered safe.
    (See the section above “When to use SafeFormatter vs SafeValue” about why the SafeValue interface should only be used for the simplest types.)

  • (promise) The result of redact.Safe applied to a value is considered safe.
    Generally, redact.Safe (and its aliases errors.Safe and log.Safe) are not recommended. Implement a SafeFormatter or SafeValue instead.

  • (axiom) The result of redact.Unsafe applied to a safe or redactable value becomes unsafe.
    (This function is provided for symmetry and testing, but is unlikely to be useful in practice.)

Combining things together

  • (proof) redact.Sprintf creates a redactable string using printf-like formatting (see below).
    (NB: CockroachDB’s error library uses redact.Sprintf internally, hence the overlap in rules.)

  • (proof) redact.Sprint formats its positional arguments using the “extra argument” rules of printf-like formatting, see below.

  • (proof) redact.Join, redact.JoinTo preserve the redactability of their arguments, and can be used to create delimited lists of values.

  • (proof) the redact.SafePrinter object available inside SafeFormat methods composes redactability provably.

  • (proof) the redact.StringBuilder type composes redactability provably.

Recursive rules during printf-like formatting

  • (axiom) The format string is considered safe.

  • (proof) An argument of type RedactableString or RedactableBytes is included as-is. Whatever produced a value of that type made it properly redactable.

  • (proof) An argument of type error is formatted using errors.FormatError which takes care of exposing safe bits from known error types.

  • (proof) An argument that implements the SafeFormatter interface is formatted using that interface.

  • (proof) An argument whose Go type has been registered as safe (see above for details) is considered safe.

  • (proof) Certain additional rules apply to make types safe by default (see above “marking things safe or unsafe” for details).

  • (proof) At some point in the future relative to the date of this writing (May 2021), array and struct types whose elements are redactable will be formatted recursively in a way that preserves redactability.
    (At the time of this writing, arrays and structs are considered unsafe unless one of the previous rules applies to them.)

  • (axiom) Other values are considered as unsafe.
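The priority order of these rules can be sketched as a type switch. This is a toy model using only the standard library, not the redact package's code: redactableString, safeFormatter, safeValue, formatArg, nodeID and connStatus are hypothetical names, the set of "registered" types is hard-coded, and the rules for error values and for arrays and structs are left out.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Toy counterparts of the redact package's types (assumptions for this sketch).
type redactableString string

type safeFormatter interface{ safeFormat() redactableString }

type safeValue interface{ safeValue() }

var markerEscaper = strings.NewReplacer("‹", "?", "›", "?")

// formatArg applies the rules in priority order and falls back to
// "unsafe by default" for everything else.
func formatArg(arg interface{}) redactableString {
	switch v := arg.(type) {
	case redactableString:
		return v // already redactable: included as-is
	case safeFormatter:
		return v.safeFormat() // the type describes its own safe and unsafe parts
	case safeValue, bool, int, time.Duration:
		return redactableString(fmt.Sprint(v)) // marked or registered as safe
	default:
		return redactableString("‹" + markerEscaper.Replace(fmt.Sprint(v)) + "›")
	}
}

// nodeID is a simple integer type marked safe via the marker interface.
type nodeID int32

func (nodeID) safeValue() {}

// connStatus mixes a safe field and an unsafe field.
type connStatus struct {
	node nodeID
	addr string
}

func (c connStatus) safeFormat() redactableString {
	// Each field goes through the same rules, so redactability composes.
	return formatArg(c.node) + ": " + formatArg(c.addr)
}

func main() {
	fmt.Println(formatArg(connStatus{node: 3, addr: "10.0.0.1:26257"})) // prints: 3: ‹10.0.0.1:26257›
	fmt.Println(formatArg("plain string"))                              // prints: ‹plain string›
	fmt.Println(formatArg(90 * time.Second))                            // prints: 1m30s
}
```

The default branch is what makes the scheme fail safe: a type that nobody has annotated ends up fully enclosed in markers rather than leaking.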

The future of redactability in CockroachDB

See the last section in the blog post: both our external customers and our internal product team want to introduce a new separation inside log and error data:

  • Operational sensitive data: customer-owned data that can identify one of our customers, but not their end-users or the data that they store inside CockroachDB.
    For example, the IP addresses of our customers' client apps running in the cloud are operational sensitive data.

  • Application sensitive data: customer-owned data that contains PII or can identify end-users of our customers, with whom we don’t have a contractual relationship.
    This is the most protected type of data and access is heavily regulated by law in most jurisdictions.
    For example, the contents of the SQL tables are application sensitive data.

We wish to introduce this distinction because it would enable us to ingest operational data into telemetry without redaction; most of our customers have expressed willingness to share operational data (but not application data) with us.

Until we make this distinction in the code, we are unable to distinguish them so any sensitive data must be considered as application sensitive data by default.

If/when we study this distinction further, we will need to be extra careful about the following:

  • The names of databases, schemas, tables, types and columns inside the SQL schema (the “SQL metaschema”) must be considered application sensitive data by default, because sometimes SQL applications generate these names dynamically from data stored in the tables. We cannot blindly promote the SQL metaschema to the status of operational data without our customers’ explicit consent.

  • Same applies to the application_name field of SQL sessions.

  • The names of SQL users used to log into CockroachDB must be considered application sensitive data by default, because in most deployments they are derived from actual people and thus contain PII.

References