ClickHouse Exporter

Status
  • Stability: alpha (traces, metrics, logs)
  • Distributions: contrib
  • Code Owners: @hanjm, @dmitryax, @Frapschen

This exporter supports sending OpenTelemetry data to ClickHouse.

ClickHouse is an open-source, high-performance columnar OLAP database management system for real-time analytics using SQL. Throughput can be measured in rows per second or megabytes per second. If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of approximately 2-10 GB/s of uncompressed data on a single server. If 10 bytes of columns are extracted, the speed is expected to be around 100-200 million rows per second.

Note: Always add the batch processor to the collector pipeline. As the ClickHouse documentation says:

We recommend inserting data in packets of at least 1000 rows, or no more than a single request per second. When inserting to a MergeTree table from a tab-separated dump, the insertion speed can be from 50 to 200 MB/s.

Use Cases

  1. Build dashboards with the Grafana ClickHouse datasource or vertamedia-clickhouse-datasource. These support time-series graphs, tables, and logs.

  2. Analyze logs with ClickHouse SQL.

Logs

  • Get log severity count time series.

    SELECT toDateTime(toStartOfInterval(Timestamp, INTERVAL 60 second)) as time, SeverityText, count() as count
    FROM otel_logs
    WHERE time >= NOW() - INTERVAL 1 HOUR
    GROUP BY SeverityText, time
    ORDER BY time;
    
  • Find any log.

    SELECT Timestamp as log_time, Body
    FROM otel_logs
    WHERE Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find logs from a specific service.

    SELECT Timestamp as log_time, Body
    FROM otel_logs
    WHERE ServiceName = 'clickhouse-exporter'
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find logs with a specific attribute.

    SELECT Timestamp as log_time, Body
    FROM otel_logs
    WHERE LogAttributes['container_name'] = '/example_flog_1'
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find logs whose body contains a string token.

    SELECT Timestamp as log_time, Body
    FROM otel_logs
    WHERE hasToken(Body, 'http')
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find logs whose body contains a string.

    SELECT Timestamp as log_time, Body
    FROM otel_logs
    WHERE Body like '%http%'
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find logs whose body matches a regular expression.

    SELECT Timestamp as log_time, Body
    FROM otel_logs
    WHERE match(Body, 'http')
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find logs by a value extracted from a JSON body.

    SELECT Timestamp as log_time, Body
    FROM otel_logs
    WHERE JSONExtractFloat(Body, 'bytes') > 1000
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    

Traces

  • Find spans with specific attribute.

    SELECT Timestamp as log_time,
       TraceId,
       SpanId,
       ParentSpanId,
       SpanName,
       SpanKind,
       ServiceName,
       Duration,
       StatusCode,
       StatusMessage,
       toString(SpanAttributes),
       toString(ResourceAttributes),
       toString(Events.Name),
       toString(Links.TraceId)
    FROM otel_traces
    WHERE ServiceName = 'clickhouse-exporter'
    AND SpanAttributes['peer.service'] = 'tracegen-server'
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find a trace by TraceId (using the time primary index and the TraceId skip index).

    WITH
    '391dae938234560b16bb63f51501cb6f' as trace_id,
    (SELECT min(Start) FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as start,
    (SELECT max(End) + 1 FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as end
    SELECT Timestamp as log_time,
       TraceId,
       SpanId,
       ParentSpanId,
       SpanName,
       SpanKind,
       ServiceName,
       Duration,
       StatusCode,
       StatusMessage,
       toString(SpanAttributes),
       toString(ResourceAttributes),
       toString(Events.Name),
       toString(Links.TraceId)
    FROM otel_traces
    WHERE TraceId = trace_id
    AND Timestamp >= start
    AND Timestamp <= end
    Limit 100;
    
  • Find spans with errors.

    SELECT Timestamp as log_time,
       TraceId,
       SpanId,
       ParentSpanId,
       SpanName,
       SpanKind,
       ServiceName,
       Duration,
       StatusCode,
       StatusMessage,
       toString(SpanAttributes),
       toString(ResourceAttributes),
       toString(Events.Name),
       toString(Links.TraceId)
    FROM otel_traces
    WHERE ServiceName = 'clickhouse-exporter'
    AND StatusCode = 'STATUS_CODE_ERROR'
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    
  • Find slow spans.

    SELECT Timestamp as log_time,
       TraceId,
       SpanId,
       ParentSpanId,
       SpanName,
       SpanKind,
       ServiceName,
       Duration,
       StatusCode,
       StatusMessage,
       toString(SpanAttributes),
       toString(ResourceAttributes),
       toString(Events.Name),
       toString(Links.TraceId)
    FROM otel_traces
    WHERE ServiceName = 'clickhouse-exporter'
    AND Duration > 1 * 1e9
    AND Timestamp >= NOW() - INTERVAL 1 HOUR
    Limit 100;
    

Metrics

Metrics data is stored in different ClickHouse tables depending on the metric type. Each table name carries a suffix that identifies the type of metrics data it stores.

Metrics Type             Metrics Table
sum                      _sum
gauge                    _gauge
histogram                _histogram
exponential histogram    _exponential_histogram
summary                  _summary

Before you write a metrics query, you need to know the type of the metric you wish to use. If your metrics come from Prometheus (or from another source that uses the OpenMetrics protocol), you also need to know how Prometheus (OpenMetrics) metric types map to OTLP metric types.

  • Find a sum metric by name.

    select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
    where MetricName='calls_total' limit 100
    
  • Find a sum metric by name and attribute.

    select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
    where MetricName='calls_total' and Attributes['service_name']='featureflagservice'
    limit 100
    

OTLP metrics define two value types for a data point (integer and double). ClickHouse stores both in a single Float64 value.

Performance Guide

A single ClickHouse instance with 32 CPU cores and 128 GB RAM can handle around 20 TB (20 billion) of logs per day. The data compression ratio is 7 to 11, so the compressed data stored on disk is 1.8 TB to 2.85 TB. Adding more ClickHouse nodes to the cluster increases capacity linearly.

An OpenTelemetry Collector running the OTLP receiver, batch processor, and ClickHouse exporter over TCP can process around 40k log entries per second per CPU core. Adding more collector nodes increases throughput linearly.

Configuration options

The following settings are required:

  • endpoint (no default): The ClickHouse server address. Multiple hosts with ports are supported, for example:
    • tcp protocol: tcp://addr1:port,tcp://addr2:port, or with TLS: tcp://addr1:port,addr2:port?secure=true
    • http protocol: http://addr1:port,addr2:port, or https: https://addr1:port,addr2:port
    • clickhouse protocol: clickhouse://addr1:port,addr2:port, or with TLS: clickhouse://addr1:port,addr2:port?secure=true

Many other ClickHouse-specific options can be configured through query parameters, e.g. addr?dial_timeout=5s&compress=lz4. For a full list of options, see the ClickHouse driver documentation.
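
As a minimal sketch of the endpoint forms listed above, the fragment below points the exporter at two hosts over the native protocol and passes driver options as query parameters. The host names are placeholders chosen for illustration, and port 9000 is assumed to be the native-protocol port of the target servers.

```yaml
exporters:
  clickhouse:
    # Two hosts (hypothetical names) behind one scheme; driver options
    # follow the '?' as ordinary query parameters.
    endpoint: tcp://ch-1.example.internal:9000,ch-2.example.internal:9000?dial_timeout=5s&compress=lz4
```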

Connection options:

  • username (default = empty): The authentication username.
  • password (default = empty): The authentication password.
  • ttl_days (default = 0): Deprecated; use ttl instead. The data time-to-live in days. 0 means no TTL.
  • ttl (default = 0): The data time-to-live, for example 30m or 48h. 0 means no TTL.
  • database (default = otel): The database name.
  • connection_params (default = {}): Extra connection parameters, given as a map.
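
The connection options above might be combined as in the following sketch. The account name, the environment variable name, and the `dial_timeout` key under `connection_params` are assumptions made for illustration. The `${env:...}` form is the collector's environment-variable expansion and assumes the variable is set where the collector runs.

```yaml
exporters:
  clickhouse:
    endpoint: tcp://127.0.0.1:9000
    username: otel_writer                  # hypothetical account name
    password: ${env:CLICKHOUSE_PASSWORD}   # hypothetical variable; keeps the secret out of the file
    database: otel
    ttl: 48h                               # use ttl rather than the deprecated ttl_days
    connection_params:
      dial_timeout: 10s                    # assumed key, mirroring the query parameter of the same name
```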

ClickHouse tables:

  • logs_table_name (default = otel_logs): The table name for logs.
  • traces_table_name (default = otel_traces): The table name for traces.
  • metrics_table_name (default = otel_metrics): The table name for metrics.

Processing:

  • timeout (default = 5s): The timeout for every attempt to send data to the backend.
  • sending_queue
    • queue_size (default = 1000): Maximum number of batches kept in memory before dropping data.
  • retry_on_failure
    • enabled (default = true)
    • initial_interval (default = 5s): The time to wait after the first failure before retrying; ignored if enabled is false
    • max_interval (default = 30s): The upper bound on backoff; ignored if enabled is false
    • max_elapsed_time (default = 300s): The maximum amount of time spent trying to send a batch; ignored if enabled is false
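
To illustrate how the processing options nest, the sketch below spells out the queue size and turns retries off. The values are illustrative rather than recommendations. With `enabled: false`, the three interval settings are ignored, so they are omitted.

```yaml
exporters:
  clickhouse:
    endpoint: tcp://127.0.0.1:9000
    timeout: 5s            # applies to every attempt to send data
    sending_queue:
      queue_size: 1000     # batches held in memory before data is dropped
    retry_on_failure:
      enabled: false       # initial_interval, max_interval, max_elapsed_time are ignored
```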

TLS

The exporter supports TLS. To enable TLS, you need to specify the secure=true query parameter in the endpoint URL or use the https scheme.
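
Both ways of enabling TLS can be sketched as follows. The host name is a placeholder, and ports 9440 and 8443 are assumed to be the secure native and HTTPS ports of the target server. Adjust them to match your deployment. The `clickhouse/<name>` keys use the collector's convention for declaring several instances of one component.

```yaml
exporters:
  # Native protocol, TLS switched on by the query parameter.
  clickhouse/native_tls:
    endpoint: tcp://ch.example.internal:9440?secure=true
  # HTTP interface, TLS selected by the https scheme.
  clickhouse/https:
    endpoint: https://ch.example.internal:8443
```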

Example

This example shows how to configure the exporter to send data to a ClickHouse server. It uses the native protocol without TLS. The exporter will create the database and tables if they don't exist. The data is stored for 72 hours (3 days).

receivers:
  examplereceiver:
processors:
  batch:
    timeout: 5s
    send_batch_size: 100000
exporters:
  clickhouse:
    endpoint: tcp://127.0.0.1:9000?dial_timeout=10s&compress=lz4
    database: otel
    ttl: 72h
    logs_table_name: otel_logs
    traces_table_name: otel_traces
    metrics_table_name: otel_metrics
    timeout: 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
service:
  pipelines:
    logs:
      receivers: [ examplereceiver ]
      processors: [ batch ]
      exporters: [ clickhouse ]
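
The example above wires only a logs pipeline. Since the exporter also supports traces and metrics, the other two signals could be routed the same way. This is a sketch that assumes `examplereceiver` emits traces and metrics as well. The pipelines would sit alongside `logs` under `service.pipelines`.

```yaml
service:
  pipelines:
    traces:
      receivers: [ examplereceiver ]
      processors: [ batch ]
      exporters: [ clickhouse ]
    metrics:
      receivers: [ examplereceiver ]
      processors: [ batch ]
      exporters: [ clickhouse ]
```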