Collecting a "debug zip" manually

When asked for help, the CockroachDB team often requests the output of the command cockroach debug zip.

This utility automatically scrapes metadata about a cluster, which can aid in debugging all kinds of issues.

Sometimes using cockroach debug zip is not practical, or an issue in the tool itself prevents its successful use. When debug zip is not an option, the following steps can be used to collect a subset of the same information, which may be sufficient for troubleshooting.

Note: this page was written ahead of the release of CockroachDB v20.1. Some of the details listed below may not apply to earlier versions, and details introduced in later versions may be missing.

Cluster-level information

This information needs to be retrieved from just one node, because it is shared among all nodes.

  • The contents of all the following tables. The tables should be scraped using a binary format so that special characters are properly preserved.
    crdb_internal.cluster_queries

    crdb_internal.cluster_sessions

    crdb_internal.cluster_settings

    crdb_internal.cluster_transactions

    crdb_internal.jobs

    system.jobs

    system.descriptor

    system.namespace

    system.namespace_deprecated

    crdb_internal.kv_node_status

    crdb_internal.kv_store_status

    crdb_internal.schema_changes

    crdb_internal.partitions

    crdb_internal.zones

  • The data from the following HTTP endpoints (you may need to use cockroach auth-session login to obtain a valid authentication cookie from a user with admin credentials):
    /_admin/v1/liveness
    /_admin/v1/rangelog
    /_admin/v1/settings
    /_admin/v1/events
    /_status/problemranges
    /_status/nodes

  • For every SQL database {database}, the contents of
    /_admin/v1/databases/{database}

  • For every SQL table {table} in database {database}, the contents of
    /_admin/v1/databases/{database}/tables/{table}
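As a sketch, the cluster-level collection above could be scripted as follows. This assumes a secure cluster reachable at localhost, certificates in ./certs, the default HTTP port 8080, and an authentication cookie already saved to cookie.txt (for example via cockroach auth-session login); all of these are placeholders to adapt to your deployment.

```shell
#!/bin/sh
# Sketch: collect cluster-level debug data from a single node.
# HOST, the certs directory, port, and cookie.txt are assumptions.
HOST=localhost
OUT=debug-manual/cluster
mkdir -p "$OUT"

# Scrape each cluster-level table; --format=raw preserves special characters.
for t in \
  crdb_internal.cluster_queries \
  crdb_internal.cluster_sessions \
  crdb_internal.cluster_settings \
  crdb_internal.cluster_transactions \
  crdb_internal.jobs \
  system.jobs \
  system.descriptor \
  system.namespace \
  system.namespace_deprecated \
  crdb_internal.kv_node_status \
  crdb_internal.kv_store_status \
  crdb_internal.schema_changes \
  crdb_internal.partitions \
  crdb_internal.zones
do
  cockroach sql --certs-dir=certs --host="$HOST" --format=raw \
    -e "SELECT * FROM $t" > "$OUT/$t.txt"
done

# Scrape the cluster-level HTTP endpoints, reusing the saved cookie.
for ep in \
  _admin/v1/liveness _admin/v1/rangelog _admin/v1/settings \
  _admin/v1/events _status/problemranges _status/nodes
do
  curl -sk -b cookie.txt "https://$HOST:8080/$ep" \
    > "$OUT/$(echo "$ep" | tr / -).json"
done
```

The per-database and per-table endpoints can be fetched with the same curl invocation, looping over the database and table names reported by your schema.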

Node-level information

This information must be collected separately for every node in the cluster.

  • The log files from that node’s log directory.

  • The heap profile dumps from that node’s log directory.

  • The goroutine dumps from that node’s log directory.

  • The contents of the following tables. The tables must be scraped using a SQL client connected to that node’s SQL address & port,
    and with a binary format so that special characters are properly preserved.
    crdb_internal.feature_usage

    crdb_internal.gossip_alerts

    crdb_internal.gossip_liveness

    crdb_internal.gossip_network

    crdb_internal.gossip_nodes

    crdb_internal.leases

    crdb_internal.node_build_info

    crdb_internal.node_metrics

    crdb_internal.node_queries

    crdb_internal.node_runtime_info

    crdb_internal.node_sessions

    crdb_internal.node_statement_statistics

    crdb_internal.node_transactions

    crdb_internal.node_txn_stats

  • The data from the following HTTP endpoints, preferably using that node’s address & HTTP port number (you may need to use cockroach auth-session login to get a valid authentication cookie from a user with admin credentials).
    They must be scraped using that node’s node ID:
    /_status/enginestats/{nodeID}
    /_status/gossip/{nodeID}
    /_status/nodes/{nodeID}
    /_status/details/{nodeID}
    /_status/profile/{nodeID}
    /_status/stacks/{nodeID}?stacks_type=0
    /_status/stacks/{nodeID}?stacks_type=1
    /_status/ranges/{nodeID}

  • For every range ID reported by /_status/ranges/{nodeID}, the contents of
    /_status/range/{rangeID}
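The node-level collection can be sketched the same way, run once per node. NODE_HOST, NODE_ID, the ports, certificate directory, and cookie.txt are placeholders, and the jq expression is an assumption about the JSON shape of the ranges response; verify it against an actual response before relying on it.

```shell
#!/bin/sh
# Sketch: collect node-level debug data for one node.
# NODE_HOST, NODE_ID, ports, certs dir, and cookie.txt are assumptions.
NODE_HOST=localhost
NODE_ID=1
OUT="debug-manual/node$NODE_ID"
mkdir -p "$OUT"

# Per-node tables, scraped over a SQL connection to *this* node.
for t in \
  crdb_internal.feature_usage \
  crdb_internal.gossip_alerts \
  crdb_internal.gossip_liveness \
  crdb_internal.gossip_network \
  crdb_internal.gossip_nodes \
  crdb_internal.leases \
  crdb_internal.node_build_info \
  crdb_internal.node_metrics \
  crdb_internal.node_queries \
  crdb_internal.node_runtime_info \
  crdb_internal.node_sessions \
  crdb_internal.node_statement_statistics \
  crdb_internal.node_transactions \
  crdb_internal.node_txn_stats
do
  cockroach sql --certs-dir=certs --host="$NODE_HOST:26257" --format=raw \
    -e "SELECT * FROM $t" > "$OUT/$t.txt"
done

# Per-node HTTP endpoints, addressed by node ID.
for ep in \
  "enginestats/$NODE_ID" "gossip/$NODE_ID" "nodes/$NODE_ID" \
  "details/$NODE_ID" "profile/$NODE_ID" \
  "stacks/$NODE_ID?stacks_type=0" "stacks/$NODE_ID?stacks_type=1" \
  "ranges/$NODE_ID"
do
  curl -sk -b cookie.txt "https://$NODE_HOST:8080/_status/$ep" \
    > "$OUT/$(echo "$ep" | tr '/?' '--').json"
done

# For each range ID in the ranges report, fetch the per-range details.
# The jq filter assumes ranges are keyed by range ID in the JSON response.
for r in $(jq -r '.ranges | keys[]' "$OUT/ranges-$NODE_ID.json" 2>/dev/null)
do
  curl -sk -b cookie.txt "https://$NODE_HOST:8080/_status/range/$r" \
    > "$OUT/range-$r.json"
done
```

Log files, heap profiles, and goroutine dumps are plain files in the node’s log directory and can be copied as-is.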