Skip to main content
Skip to main content

Map vs JSON type for ClickStack

ClickStack's default schema stores resource, scope, log, and span attributes as Map(LowCardinality(String), String) columns. ClickHouse also supports a strongly typed JSON type, and ClickStack has beta support for using it in place of Map.

For typical observability workloads we recommend keeping the default Map-based schema. The JSON type is available for users who want to evaluate it on workloads with a small, stable set of attribute keys, but it is not the recommended schema for general use.

Why Map is the recommended default

Observability data is dominated by attributes such as resource attributes, scope attributes, and span and log attributes, and these sets are typically large, high-cardinality, and ingested at high throughput. The schema you pick for those attributes is the dominant factor in ingest cost and storage layout.

Map(LowCardinality(String), String) stores keys and values as a single structure. The historical disadvantage of Map was that reading a single key required reading the entire map column. That's no longer true: ClickHouse now supports bucketed map serialization, which splits the map into buckets so queries only read the buckets they need. Combined with text indexes on map keys and values, which is how ClickStack's default schema is configured, this makes Map selective and fast at read time without paying any ingest penalty for new keys.

In practice this means:

  • Stable ingest cost as keys grow. Adding a new attribute key doesn't change the on-disk column layout or create new column files. Ingest cost is bounded by the data volume, not the key cardinality.
  • No metadata explosion. The number of column files on disk doesn't track the number of unique attribute keys.
  • Selective lookups via indexes. Text indexes on map keys and values give point lookups without scanning every row.
  • Predictable behaviour at high throughput. Map handles bursty, schemaless attribute sets, common in tracing and logs, without per-key overhead.

Why not JSON by default

The JSON type takes a different approach: at insert time, ClickHouse dynamically creates a dedicated, strongly typed subcolumn for each path it sees. At read time this is attractive, since only the requested subcolumns are read, types are preserved, and no query-time casting is needed.

The tradeoff lands at ingest time. Creating and managing many dynamic subcolumns introduces write-time overhead and metadata complexity. On observability workloads, which routinely have very large or highly dynamic attribute sets and high ingest throughput, that overhead is significant. The max_dynamic_paths limit can cap the damage by spilling extra paths into a shared column, but accessing the shared column is slower than dedicated subcolumns, which erodes the read-time advantage that motivated using JSON in the first place.

With bucketed map serialization removing most of the historical read-time overhead of Map, the read-time advantage of JSON no longer outweighs its ingest-time cost for typical observability workloads.

When you might still consider JSON

The JSON type can be a reasonable fit when all of the following hold:

  • Your attribute key-set is small and stable, meaning you are not seeing thousands of unique keys, and new keys appear rarely.
  • Ingest throughput is modest relative to the attribute cardinality.
  • You want strongly typed access to attributes without query-time casts (numbers stay numbers, booleans stay booleans).
  • You are willing to operate a beta feature in ClickStack and accept that the integration may change.

If those conditions don't all hold, stay on the default Map-based schema.

Beta status

Beta feature. Learn more.
Beta feature, not production ready

JSON type support in ClickStack is a beta feature. While the JSON type itself is production-ready in ClickHouse 25.3+, its integration within ClickStack is still under active development and may have limitations, change in the future, or contain bugs.

ClickStack has beta support for the JSON type from version 2.0.4.

Enabling JSON support

To use JSON-typed schemas instead of the default Map-based schemas, set the following environment variables.

VariableSet onPurpose
OTEL_AGENT_FEATURE_GATE_ARG='--feature-gates=clickhouse.json'OTel collectorCreates schemas in ClickHouse using the JSON type.
BETA_CH_OTEL_JSON_SCHEMA_ENABLED=trueHyperDX (ClickStack UI)Enables the application layer to query JSON-typed schemas. ClickStack Open Source only.

Managed ClickStack

To enable JSON support in Managed ClickStack, contact support@clickhouse.com prior to configuring the collector. The feature must also be enabled in the ClickStack UI (HyperDX) in ClickHouse Cloud.

Set OTEL_AGENT_FEATURE_GATE_ARG='--feature-gates=clickhouse.json' on the collector. For example:

docker run -e OTEL_AGENT_FEATURE_GATE_ARG='--feature-gates=clickhouse.json' -e CLICKHOUSE_ENDPOINT=${CLICKHOUSE_ENDPOINT} -e CLICKHOUSE_USER=default -e CLICKHOUSE_PASSWORD=${CLICKHOUSE_PASSWORD} -p 8080:8080 -p 4317:4317 -p 4318:4318 clickhouse/clickstack-otel-collector:latest

Open Source ClickStack

Set OTEL_AGENT_FEATURE_GATE_ARG='--feature-gates=clickhouse.json' on any deployment that includes the collector, and BETA_CH_OTEL_JSON_SCHEMA_ENABLED=true on the HyperDX application layer so it can query the JSON-typed schemas.

For example:

docker run -e OTEL_AGENT_FEATURE_GATE_ARG='--feature-gates=clickhouse.json' -e OPAMP_SERVER_URL=${OPAMP_SERVER_URL} -e CLICKHOUSE_ENDPOINT=${CLICKHOUSE_ENDPOINT} -e CLICKHOUSE_USER=default -e CLICKHOUSE_PASSWORD=${CLICKHOUSE_PASSWORD} -p 8080:8080 -p 4317:4317 -p 4318:4318 clickhouse/clickstack-otel-collector:latest

Migrating from a Map-based schema to JSON

Backwards compatibility

The JSON type is not backwards compatible with existing map-based schemas. Enabling this feature creates new tables using the JSON type and requires manual data migration.

To migrate from the default Map-based schemas, follow these steps:

Stop the OTel collector

Rename existing tables and update sources

Rename existing tables and update data sources in HyperDX.

For example:

RENAME TABLE otel_logs TO otel_logs_map;
RENAME TABLE otel_metrics TO otel_metrics_map;

Deploy the collector

Deploy the collector with OTEL_AGENT_FEATURE_GATE_ARG set.

Restart the HyperDX container with JSON schema support

export BETA_CH_OTEL_JSON_SCHEMA_ENABLED=true

Create new data sources

Create new data sources in HyperDX pointing to the JSON tables.

Migrating existing data (optional)

To move old data into the new JSON tables:

INSERT INTO otel_logs SELECT * FROM otel_logs_map;
INSERT INTO otel_metrics SELECT * FROM otel_metrics_map;
Note

Recommended only for datasets smaller than ~10 billion rows. Data previously stored with the Map type didn't preserve type precision (all values were strings). As a result, this old data will appear as strings in the new schema until it ages out, requiring some casting on the frontend. Type for new data will be preserved with the JSON type.