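Integrating Logs, Metrics, and Traces with SigNoz: An OpenTelemetry-Native Observability Solution

Project overview

SigNoz is an open-source APM and observability platform with native OpenTelemetry support. It brings logs, metrics, and traces together in a single interface and serves as an open-source alternative to DataDog and New Relic.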
GitHub Stars: 25K+
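Key features

Unified platform - logs, metrics, and traces in one place
OpenTelemetry - native OTel support
APM - application performance monitoring
Log management - structured log queries
Distributed tracing - end-to-end request tracing

Installation

Docker (quick start)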
git clone https://github.com/SigNoz/signoz.git
cd signoz/deploy/
docker compose -f docker/clickhouse-setup/docker-compose.yaml up -d
Then open http://localhost:3301 in a browser.
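The containers can take a while to come up, so the UI may not answer right after `docker compose up -d` returns. The following is a minimal sketch, using only the standard library, of polling the UI address until it responds. The helper names `http_ok` and `wait_until_ready`, the attempt count, and the delay are illustrative assumptions and not part of SigNoz.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(url, attempts=30, delay=2.0, probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the number of attempts used, or None if every attempt failed.
    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Example (assumes the UI address used in this post):
#   wait_until_ready("http://localhost:3301")
```

A return value of `None` means the UI never answered, in which case `docker compose logs` is usually the next place to look.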
Kubernetes
helm repo add signoz https://charts.signoz.io
helm install signoz signoz/signoz -n signoz --create-namespace
OpenTelemetry setup

Python
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Register a global tracer provider and export spans to SigNoz over OTLP/gRPC
trace.set_tracer_provider(TracerProvider())
tracer_provider = trace.get_tracer_provider()
otlp_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",
    insecure=True,
)
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Usage
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("my-operation"):
    # Business logic goes here
    pass
Auto-instrumentation
opentelemetry-instrument \
    --service_name my-service \
    --exporter_otlp_endpoint http://localhost:4317 \
    python app.py
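The same settings can also be supplied through the standard OpenTelemetry environment variables `OTEL_SERVICE_NAME` and `OTEL_EXPORTER_OTLP_ENDPOINT`, which is convenient when the process is started from another script. The sketch below shows one way to do that; the helper names `otel_env` and `run_instrumented` are hypothetical, and it assumes `opentelemetry-instrument` is on the `PATH`.

```python
import os
import subprocess
import sys


def otel_env(service_name, endpoint, base_env=None):
    """Return a copy of the environment with the OTel settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["OTEL_SERVICE_NAME"] = service_name
    env["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    return env


def run_instrumented(script, service_name, endpoint):
    """Launch `script` under opentelemetry-instrument and return its exit code."""
    cmd = ["opentelemetry-instrument", sys.executable, script]
    result = subprocess.run(cmd, env=otel_env(service_name, endpoint), check=False)
    return result.returncode


# Example, mirroring the command line above:
#   run_instrumented("app.py", "my-service", "http://localhost:4317")
```

Keeping the configuration in the environment means the same application command works unchanged across development and production, with only the variables differing.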
Node.js
npm install @opentelemetry/api \
    @opentelemetry/sdk-node \
    @opentelemetry/auto-instrumentations-node \
    @opentelemetry/exporter-trace-otlp-grpc
const { NodeSDK } = require("@opentelemetry/sdk-node");
const {
  getNodeAutoInstrumentations,
} = require("@opentelemetry/auto-instrumentations-node");
const {
  OTLPTraceExporter,
} = require("@opentelemetry/exporter-trace-otlp-grpc");

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4317",
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: "my-node-service",
});

sdk.start();
Log collection

Python logs
import logging

from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

logger_provider = LoggerProvider()
set_logger_provider(logger_provider)
exporter = OTLPLogExporter(endpoint="http://localhost:4317", insecure=True)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))

handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider)
root_logger = logging.getLogger()
root_logger.addHandler(handler)
# The root logger defaults to WARNING, so lower it or INFO records are dropped
root_logger.setLevel(logging.INFO)

# Usage
logger = logging.getLogger(__name__)
logger.info("This log will be sent to SigNoz")
Fluentd/Fluent Bit
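# fluent-bit.conf
# Fluent Bit's OTLP output plugin is named "opentelemetry" and sends OTLP over
# HTTP, so it targets the collector's HTTP port (4318 by default) rather than
# the gRPC port 4317. Check the option names against your Fluent Bit version.
[OUTPUT]
    Name                         opentelemetry
    Match                        *
    Host                         localhost
    Port                         4318
    Logs_uri                     /v1/logs
    logs_trace_id_message_key    trace_id
    logs_span_id_message_key     span_id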
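The configuration above reads the trace and span IDs from the `trace_id` and `span_id` keys of each record, so the application has to emit logs that carry those keys. The following is a rough sketch of a JSON formatter for the standard `logging` module that does this. `JsonFormatter` and `build_logger` are hypothetical names, and the IDs are assumed to be passed in through `extra=`; how they are obtained is left to the application.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy the correlation keys only when the caller supplied them.
        for key in ("trace_id", "span_id"):
            value = getattr(record, key, None)
            if value is not None:
                payload[key] = value
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Example:
#   log = build_logger("orders")
#   log.info("order created", extra={"trace_id": "abc123", "span_id": "def456"})
```

One JSON object per line keeps the output easy for a log shipper to parse, and the structured fields can then be filtered on in the log query view.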
Metrics collection

Custom metrics
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Export metrics to SigNoz over OTLP/gRPC once per second
exporter = OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True)
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=1000)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)
meter = metrics.get_meter(__name__)

# Counter
request_counter = meter.create_counter(
    name="http_requests_total",
    description="Total HTTP requests",
)

# Histogram
request_duration = meter.create_histogram(
    name="http_request_duration_seconds",
    unit="s",
    description="HTTP request duration",
)

# Usage
request_counter.add(1, {"method": "GET", "path": "/api"})
request_duration.record(0.5, {"method": "GET"})
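In real code the duration is measured rather than hard-coded. One possible pattern is a small context manager that times a block and then records into the histogram and counter. The sketch below only relies on the instruments exposing `record(value, attributes)` and `add(value, attributes)`, as the OpenTelemetry instruments above do; the name `timed` and the injectable `clock` parameter are illustrative assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(histogram, counter=None, attributes=None, clock=time.perf_counter):
    """Time the enclosed block and record the elapsed seconds.

    The measurement is recorded even if the block raises, so failed
    requests still show up in the latency distribution.
    """
    attrs = dict(attributes or {})
    start = clock()
    try:
        yield
    finally:
        histogram.record(clock() - start, attrs)
        if counter is not None:
            counter.add(1, attrs)


# Example, using the instruments defined above:
#   with timed(request_duration, request_counter, {"method": "GET"}):
#       handle_request()
```

Using the same attribute set for both instruments keeps the counter and histogram series aligned when they are plotted together.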
Dashboards

Creating a dashboard

1. Open Dashboard.
2. Add a panel.
3. Choose a data source (Traces, Logs, or Metrics).
4. Configure the query and visualization.

Common queries
-- Service latency, P99 per minute
SELECT
    toStartOfMinute(timestamp) AS minute,
    quantile(0.99)(durationNano / 1000000) AS p99_ms
FROM signoz_traces.signoz_index_v2
WHERE serviceName = 'my-service'
  AND timestamp > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute
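To make clear what the query above computes, here is a small Python sketch of the same idea: bucket spans by the minute they started in, convert nanoseconds to milliseconds, and take the 99th percentile of each bucket. It uses an exact nearest-rank percentile, whereas ClickHouse's `quantile` is approximate, so results can differ slightly. The function names and the `(unix_seconds, duration_nano)` input shape are assumptions made for illustration.

```python
import math
from collections import defaultdict


def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least
    a fraction `q` of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def p99_ms_per_minute(spans):
    """Group (unix_seconds, duration_nano) pairs by minute and
    return {minute_start_seconds: p99 latency in milliseconds}."""
    buckets = defaultdict(list)
    for ts, duration_nano in spans:
        minute = int(ts) - int(ts) % 60
        buckets[minute].append(duration_nano / 1_000_000)
    return {minute: percentile(vals, 0.99) for minute, vals in sorted(buckets.items())}
```

With few spans in a bucket the P99 is effectively the maximum, which is worth keeping in mind when reading per-minute charts for low-traffic services.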
Alerts

Configuring alert rules
# Configured through the UI
alert: High Error Rate
expr: |
  sum(rate(signoz_calls_total{status_code="ERROR"}[5m]))
  /
  sum(rate(signoz_calls_total[5m]))
  > 0.05
for: 5m
labels:
  severity: critical
annotations:
  summary: "High error rate detected"
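The logic of the rule above can be sketched in a few lines of Python: compute the ratio of error calls to total calls, and fire only when the ratio stays above the threshold for several consecutive evaluations, which is what `for: 5m` expresses. This is a simplified model and not SigNoz's evaluator; it assumes one sample per evaluation interval, and the names `error_rate` and `should_fire` are hypothetical.

```python
def error_rate(error_calls, total_calls):
    """Fraction of calls that failed; 0.0 when there was no traffic."""
    if total_calls <= 0:
        return 0.0
    return error_calls / total_calls


def should_fire(samples, threshold=0.05, for_samples=5):
    """Decide whether the alert fires.

    `samples` is a list of (error_calls, total_calls) pairs, oldest first,
    one per evaluation interval. The alert fires only if the most recent
    `for_samples` evaluations all exceed `threshold`.
    """
    if len(samples) < for_samples:
        return False
    recent = samples[-for_samples:]
    return all(error_rate(errors, total) > threshold for errors, total in recent)
```

Requiring the condition to hold across consecutive evaluations is what keeps a single bad minute from paging anyone.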
Notification channels

Slack
PagerDuty
Email
Webhook

Distributed tracing

Service map

SigNoz automatically generates a service topology map.

Trace details

Resource requirements

Minimum configuration
# Development environment
SigNoz:
  CPU: 2 cores
  Memory: 4 GB
  Storage: 20 GB
ClickHouse:
  CPU: 2 cores
  Memory: 4 GB
  Storage: 50 GB
Production configuration
# Production environment
SigNoz:
  CPU: 4 cores
  Memory: 8 GB
ClickHouse:
  CPU: 8 cores
  Memory: 32 GB
  Storage: 500 GB SSD
Data retention

Setting the retention period
# Set through environment variables
RETENTION_PERIOD_TRACES=72h
RETENTION_PERIOD_METRICS=30d
RETENTION_PERIOD_LOGS=15d
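The values above mix hours and days, which makes them awkward to compare at a glance. As a small illustration, the sketch below parses strings of that shape into `timedelta` objects. The `parse_retention` helper is hypothetical, and it only assumes the `h` and `d` suffixes that appear in this post; it makes no claim about which other units SigNoz accepts.

```python
import re
from datetime import timedelta

# Only the suffixes used in this post are handled.
_UNITS = {"h": "hours", "d": "days"}


def parse_retention(value):
    """Parse a string such as '72h' or '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


# Example: the trace retention of 72h is three days.
#   parse_retention("72h") == timedelta(days=3)
```

Normalizing the values this way makes it easy to check, for instance, that traces are kept for a shorter period than logs and metrics.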
Licensed under CC BY-NC-SA 4.0