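SigNoz: Open-Source Observability Platform

Unify logs, metrics, and traces with SigNoz, an OpenTelemetry-native observability solution

Project Overview

SigNoz is an open-source APM and observability platform with native OpenTelemetry support. It brings logs, metrics, and traces together in a single interface and is an open-source alternative to Datadog and New Relic.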

GitHub Stars: 25K+

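Key Features

  • Unified platform - logs, metrics, and traces in one place
  • OpenTelemetry - native OTel support
  • APM - application performance monitoring
  • Log management - structured log querying
  • Distributed tracing - end-to-end request tracing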

Installation

Docker (quick start)

git clone https://github.com/SigNoz/signoz.git
cd signoz/deploy/
docker compose -f docker/clickhouse-setup/docker-compose.yaml up -d

Then open http://localhost:3301 in a browser.
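Before wiring up instrumentation, it helps to confirm the stack is listening. The script below is a minimal standard-library sketch that attempts a TCP connection to each port. It assumes the UI on 3301 (as above), OTLP/gRPC on 4317 (used throughout this guide) and OTLP/HTTP on 4318 (the conventional OTLP/HTTP port, not mentioned above). `SIGNOZ_PORTS`, `is_port_open` and `check_signoz` are hypothetical helpers, and an open port only shows that something is listening, not that SigNoz is healthy.

```python
import socket

# Assumed port layout: UI on 3301, OTLP/gRPC on 4317, OTLP/HTTP on 4318.
# Adjust if your deployment maps them differently.
SIGNOZ_PORTS = {"ui": 3301, "otlp_grpc": 4317, "otlp_http": 4318}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unresolvable hosts.
        return False


def check_signoz(host: str = "localhost") -> dict:
    """Map each named port to whether it currently accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in SIGNOZ_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_signoz().items():
        print(f"{name:10s} {'up' if ok else 'DOWN'}")
```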

Kubernetes

helm repo add signoz https://charts.signoz.io
helm install signoz signoz/signoz -n signoz --create-namespace

OpenTelemetry Setup

Python

pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
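from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# service.name is how SigNoz groups spans by service; without it they appear
# under a default name such as unknown_service
resource = Resource.create({"service.name": "my-service"})
tracer_provider = TracerProvider(resource=resource)
trace.set_tracer_provider(tracer_provider)

# Send spans over OTLP/gRPC to the SigNoz collector (plaintext, local development)
otlp_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",
    insecure=True
)

# Buffer finished spans and export them in batches
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Usage
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("my-operation"):
    # business logic
    pass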
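BatchSpanProcessor does not make one network call per span: it buffers finished spans and hands them to the exporter in batches (in the Python SDK the defaults are a 2048-span queue, 512-span batches and a 5-second schedule delay). The class below is a simplified, hypothetical model of that idea, not the SDK's implementation, which also uses a background thread, a bounded queue and a timer-driven flush. It shows why spans can reach SigNoz a few seconds after they end, and why the final partial batch is only sent when the provider flushes or shuts down.

```python
from typing import Callable, List


class ToyBatchProcessor:
    """Simplified model of span batching: buffer span names, export full batches."""

    def __init__(self, export: Callable[[List[str]], None], max_batch_size: int = 512):
        self._export = export
        self._max_batch_size = max_batch_size
        self._buffer: List[str] = []

    def on_end(self, span_name: str) -> None:
        # Called once per finished span; exports only when a full batch is ready.
        self._buffer.append(span_name)
        if len(self._buffer) >= self._max_batch_size:
            self.force_flush()

    def force_flush(self) -> None:
        # Export whatever is buffered, even a partial batch.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._export(batch)

    def shutdown(self) -> None:
        # Flushing on shutdown keeps the last partial batch from being lost.
        self.force_flush()


if __name__ == "__main__":
    sent: List[List[str]] = []
    proc = ToyBatchProcessor(sent.append, max_batch_size=3)
    for i in range(7):
        proc.on_end(f"span-{i}")
    print(len(sent))  # 2: two full batches, span-6 is still buffered
    proc.shutdown()
    print(len(sent), sent[-1])  # 3 ['span-6']
```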

Automatic instrumentation

opentelemetry-instrument \
    --service_name my-service \
    --exporter_otlp_endpoint http://localhost:4317 \
    python app.py
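Whether you instrument by hand or with `opentelemetry-instrument` (or the Node.js auto-instrumentations below), spans from different services end up in one trace because the caller passes its span context to the callee in a request header. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header. The standard-library sketch below formats and parses that header to show what travels on the wire; it handles only version `00` and ignores `tracestate`, and `SpanContext`, `make_traceparent`, `parse_traceparent` and `child_context` are illustrative names, not OpenTelemetry APIs.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, unique per span
    sampled: bool


def make_traceparent(ctx: SpanContext) -> str:
    """Format a span context as a version-00 traceparent header value."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse a traceparent value; return None if it is malformed or invalid."""
    m = _TRACEPARENT_RE.match(header.strip())
    if not m:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_context(parent: SpanContext) -> SpanContext:
    """Same trace ID, fresh span ID: what the callee does when it starts its span."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


if __name__ == "__main__":
    root = SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)
    header = make_traceparent(root)       # sent by the caller
    received = parse_traceparent(header)  # read by the callee
    assert received is not None
    child = child_context(received)
    print(header)
    print(child.trace_id == root.trace_id)  # True: both spans join one trace
```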

Node.js

npm install @opentelemetry/api \
    @opentelemetry/sdk-node \
    @opentelemetry/auto-instrumentations-node \
    @opentelemetry/exporter-trace-otlp-grpc
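// tracing.js: load this before any other module, e.g. `node -r ./tracing.js app.js`,
// so the auto-instrumentations can patch libraries as they are required.
const { NodeSDK } = require("@opentelemetry/sdk-node");
const {
  getNodeAutoInstrumentations,
} = require("@opentelemetry/auto-instrumentations-node");
const {
  OTLPTraceExporter,
} = require("@opentelemetry/exporter-trace-otlp-grpc");

const sdk = new NodeSDK({
  // OTLP/gRPC exporter pointing at the SigNoz collector
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4317",
  }),
  // Instrument supported libraries (http, express, database clients, ...) automatically
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: "my-node-service",
});

sdk.start();

// Flush buffered spans before the process exits
process.on("SIGTERM", () => {
  sdk.shutdown().finally(() => process.exit(0));
});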

Log Collection

Python logs

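import logging
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk.resources import Resource

# Identify the service these logs belong to
logger_provider = LoggerProvider(resource=Resource.create({"service.name": "my-service"}))
set_logger_provider(logger_provider)

# Batch log records and send them over OTLP/gRPC to the SigNoz collector
exporter = OTLPLogExporter(endpoint="http://localhost:4317", insecure=True)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))

# Bridge the standard logging module to OpenTelemetry
handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider)
root_logger = logging.getLogger()
root_logger.addHandler(handler)
# The root logger defaults to WARNING, which would silently drop the INFO call below
root_logger.setLevel(logging.INFO)

# Usage
logger = logging.getLogger(__name__)
logger.info("This log will be sent to SigNoz")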
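Log correlation in SigNoz depends on each log record carrying the trace and span IDs of the span that was active when it was emitted. The OpenTelemetry `LoggingHandler` attaches these for you; the standard-library sketch below only illustrates the mechanism. `TraceContextFilter`, `build_logger` and the `current_trace` context variable are hypothetical stand-ins for the active span. The sketch also sets an explicit `INFO` level, since a logger left at the root's `WARNING` default drops `info()` calls.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for "the currently active span". With the OpenTelemetry SDK
# these IDs come from the active span, not from a context variable you set yourself.
current_trace = contextvars.ContextVar("current_trace", default=("0" * 32, "0" * 16))


class TraceContextFilter(logging.Filter):
    """Copy the current trace/span IDs onto every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id, record.span_id = current_trace.get()
        return True  # never drop records, only annotate them


def build_logger(stream) -> logging.Logger:
    """Logger whose output lines carry trace_id/span_id for correlation."""
    logger = logging.getLogger("correlation-demo")
    logger.handlers.clear()
    # Without an explicit level, INFO would be dropped by the root's WARNING default
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(logging.Formatter(
        "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of other handlers
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = build_logger(buf)
    current_trace.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
    log.info("order created")
    print(buf.getvalue(), end="")
    # INFO trace_id=4bf92f3577b34da6a3ce929d0e0e4736 span_id=00f067aa0ba902b7 order created
```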

Fluentd/Fluent Bit

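# fluent-bit.conf
# Fluent Bit's OpenTelemetry output sends OTLP/HTTP, so target the collector's port 4318
[OUTPUT]
    Name      opentelemetry
    Match     *
    Host      localhost
    Port      4318
    Logs_uri  /v1/logs
    # Map record fields to OTLP trace/span IDs (option names vary across Fluent Bit versions)
    Logs_trace_id_message_key trace_id
    Logs_span_id_message_key  span_id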
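An agent such as Fluent Bit ultimately delivers OTLP requests to the collector. To see what such a request contains, or to smoke-test the log pipeline without an agent, the sketch below builds an OTLP/JSON logs body by hand (resourceLogs, then scopeLogs, then logRecords) and POSTs it with the standard library. It assumes the collector exposes an OTLP/HTTP receiver on port 4318 at the standard `/v1/logs` path; `OTLP_LOGS_URL`, `build_log_payload` and `send_logs` are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Optional

# Assumption: OTLP/HTTP receiver on the conventional port 4318, standard logs path.
OTLP_LOGS_URL = "http://localhost:4318/v1/logs"


def build_log_payload(service: str, message: str, severity: str = "INFO",
                      trace_id: Optional[str] = None,
                      span_id: Optional[str] = None) -> dict:
    """Build a one-record OTLP/JSON logs request body."""
    record = {
        # The protobuf JSON mapping encodes 64-bit integers as strings
        "timeUnixNano": str(time.time_ns()),
        "severityText": severity,
        "body": {"stringValue": message},
    }
    if trace_id and span_id:
        # OTLP/JSON encodes trace and span IDs as hex strings
        record["traceId"] = trace_id
        record["spanId"] = span_id
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeLogs": [{"scope": {"name": "manual-sketch"}, "logRecords": [record]}],
        }]
    }


def send_logs(payload: dict, url: str = OTLP_LOGS_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_log_payload("my-service", "hello from a sketch")
    print(json.dumps(payload, indent=2))
    # send_logs(payload)  # requires a running collector
```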

Metrics Collection

Custom metrics

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

exporter = OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True)
# Export every second for quick feedback; the SDK default interval is 60 seconds
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=1000)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

meter = metrics.get_meter(__name__)

# Counter: a monotonically increasing total
request_counter = meter.create_counter(
    name="http_requests_total",
    description="Total HTTP requests"
)

# Histogram: distribution of request durations, recorded in seconds
request_duration = meter.create_histogram(
    name="http_request_duration_seconds",
    unit="s",
    description="HTTP request duration"
)

# Usage: attributes become dimensions you can filter and group by in SigNoz
request_counter.add(1, {"method": "GET", "path": "/api"})
request_duration.record(0.5, {"method": "GET"})
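The histogram above records durations in seconds (0.5). With no View configured, the Python SDK aggregates histograms into the OpenTelemetry specification's default explicit buckets, whose boundaries (0, 5, 10, 25, ..., 10000) suit millisecond-scale values. The sketch below reproduces that bucketing with `bisect` to show the consequence: every sub-5-second latency lands in the single (0, 5] bucket, so percentiles derived from buckets carry almost no information. Recording milliseconds, or supplying seconds-scale boundaries (in the Python SDK, a `View` with `ExplicitBucketHistogramAggregation`), avoids this. `bucket_counts` is a hypothetical helper and the second boundary list is an arbitrary example.

```python
from bisect import bisect_left
from typing import List, Sequence

# Default explicit-bucket boundaries from the OpenTelemetry metrics SDK specification.
DEFAULT_BOUNDARIES = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]


def bucket_counts(values: Sequence[float],
                  boundaries: Sequence[float] = DEFAULT_BOUNDARIES) -> List[int]:
    """Count values per bucket: (-inf, b0], (b0, b1], ..., (b_last, +inf)."""
    counts = [0] * (len(boundaries) + 1)
    for v in values:
        # bisect_left sends v == boundary to the bucket whose upper bound is that boundary
        counts[bisect_left(boundaries, v)] += 1
    return counts


if __name__ == "__main__":
    seconds = [0.05, 0.2, 0.5, 1.2, 3.0]
    print(bucket_counts(seconds))  # all five values fall in (0, 5]
    print(bucket_counts(seconds, [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]))  # [1, 0, 1, 1, 0, 1, 1, 0]
```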

Dashboards

Creating a dashboard

  1. Open Dashboards
  2. Add a panel
  3. Choose the data source (Traces, Logs, or Metrics)
  4. Configure the query and the visualization

Common queries

-- P99 latency per minute for one service over the last hour
SELECT
  toStartOfMinute(timestamp) AS minute,
  quantile(0.99)(durationNano / 1000000) AS p99_ms
FROM signoz_traces.signoz_index_v2
WHERE serviceName = 'my-service'
  AND timestamp > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute
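The query above buckets spans by minute and takes the 99th percentile of their durations in milliseconds. For checking results offline or in a unit test, the same aggregation can be written in plain Python over `(timestamp, durationNano)` rows. ClickHouse's `quantile()` is an approximate, sampling-based estimate (`quantileExact()` is the exact variant), while this sketch uses an exact nearest-rank percentile, so small differences are expected. `nearest_rank` and `p99_per_minute` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def nearest_rank(sorted_values: List[float], pct: int) -> float:
    """Exact nearest-rank percentile (pct in 1..100) of a sorted, non-empty list."""
    rank = -(-pct * len(sorted_values) // 100)  # ceil(pct * n / 100) in integer math
    return sorted_values[max(rank, 1) - 1]


def p99_per_minute(rows: Iterable[Tuple[datetime, int]]) -> Dict[datetime, float]:
    """rows of (timestamp, durationNano) -> {minute start: P99 duration in ms}."""
    by_minute: Dict[datetime, List[float]] = defaultdict(list)
    for ts, duration_nano in rows:
        minute = ts.replace(second=0, microsecond=0)  # like toStartOfMinute(timestamp)
        by_minute[minute].append(duration_nano / 1_000_000)  # ns -> ms
    return {m: nearest_rank(sorted(v), 99) for m, v in sorted(by_minute.items())}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # 100 spans in one minute with durations of 1..100 ms
    rows = [(t0.replace(second=s % 60), (s + 1) * 1_000_000) for s in range(100)]
    print(p99_per_minute(rows))  # P99 for 12:00 is 99.0 ms
```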

Alerts

Configuring alert rules

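# Configured through the SigNoz UI; shown here as a Prometheus-style rule.
# Fires when more than 5% of calls fail for 5 consecutive minutes.
alert: High Error Rate
expr: |
  sum(rate(signoz_calls_total{status_code="ERROR"}[5m]))
  /
  sum(rate(signoz_calls_total[5m]))
  > 0.05
for: 5m
labels:
  severity: critical
annotations:
  summary: "High error rate detected"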
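The rule combines two ideas: an error ratio over a 5-minute window, and a `for: 5m` clause that keeps the alert pending until the condition has held continuously for five minutes. The sketch below models both in plain Python under simplifying assumptions: the ratio comes from counter samples at the start and end of one window (ignoring counter resets and Prometheus's extrapolation, so both rates share the window and it cancels out), and the alert is evaluated on a fixed schedule. `error_ratio` and `ForDurationAlert` are illustrative names, not SigNoz or Prometheus APIs.

```python
from dataclasses import dataclass
from typing import Optional


def error_ratio(errors_start: float, errors_end: float,
                total_start: float, total_end: float) -> Optional[float]:
    """increase(errors) / increase(total) over one window; None when there was no traffic."""
    total_increase = total_end - total_start
    if total_increase <= 0:
        return None  # no traffic: treat as "condition not met"
    return (errors_end - errors_start) / total_increase


@dataclass
class ForDurationAlert:
    """Fire only after the condition has held continuously for `for_seconds`."""
    threshold: float
    for_seconds: int
    pending_since: Optional[int] = None

    def evaluate(self, now: int, ratio: Optional[float]) -> str:
        if ratio is None or ratio <= self.threshold:
            self.pending_since = None  # any healthy evaluation resets the timer
            return "inactive"
        if self.pending_since is None:
            self.pending_since = now
        return "firing" if now - self.pending_since >= self.for_seconds else "pending"


if __name__ == "__main__":
    alert = ForDurationAlert(threshold=0.05, for_seconds=300)
    # Evaluate once a minute while 8% of calls fail
    for t in range(0, 360, 60):
        print(t, alert.evaluate(t, error_ratio(0, 8, 0, 100)))
    # pending from t=0 to t=240, firing at t=300
```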

Notification channels

  • Slack
  • PagerDuty
  • Email
  • Webhook

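Distributed Tracing

Service map

SigNoz automatically generates a service topology map that shows:

  • Service dependencies
  • Request flow
  • Latency distribution
  • Error rates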
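A service map can be derived from span parent/child links: whenever a span's parent belongs to a different service, that is one call along a caller -> callee edge. SigNoz's own implementation is not described here; the sketch below only illustrates the idea, counting calls and errors per edge (the raw inputs for the dependency, request-flow and error-rate views listed above). `Span` and `service_edges` are hypothetical, and a real pipeline would also aggregate durations per edge.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    service: str
    is_error: bool = False


def service_edges(spans: Iterable[Span]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Derive (caller, callee) edges with call and error counts from one or more traces."""
    spans = list(spans)
    by_id = {s.span_id: s for s in spans}
    calls: Counter = Counter()
    errors: Counter = Counter()
    for s in spans:
        parent = by_id.get(s.parent_id) if s.parent_id else None
        # An edge exists only where the parent span belongs to a different service
        if parent is not None and parent.service != s.service:
            edge = (parent.service, s.service)
            calls[edge] += 1
            errors[edge] += int(s.is_error)
    return {e: {"calls": calls[e], "errors": errors[e]} for e in calls}


if __name__ == "__main__":
    trace_spans = [
        Span("a", None, "frontend"),
        Span("b", "a", "frontend"),
        Span("c", "b", "orders"),
        Span("d", "c", "db", is_error=True),
        Span("e", "b", "orders"),
    ]
    for (caller, callee), stats in service_edges(trace_spans).items():
        print(f"{caller} -> {callee}: {stats['calls']} calls, {stats['errors']} errors")
```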

Trace details

  • Span timeline
  • Attributes and events
  • Log correlation
  • Error details
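The span timeline places each span by its offset from the start of the trace. A related, commonly used number is self time: a span's duration minus the time spent in its child spans. The sketch below computes both from start and end timestamps in nanoseconds. `TimedSpan` and `timeline` are hypothetical, and the self-time calculation assumes children do not overlap; overlapping children are clamped at zero rather than merged.

```python
from typing import Dict, List, NamedTuple, Optional


class TimedSpan(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ns: int
    end_ns: int


def timeline(spans: List[TimedSpan]) -> List[Dict]:
    """Offset from trace start, duration and self time (all in ms), ordered by start."""
    trace_start = min(s.start_ns for s in spans)
    child_time: Dict[str, int] = {}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0) + (s.end_ns - s.start_ns)
    rows = []
    for s in sorted(spans, key=lambda sp: sp.start_ns):
        duration = s.end_ns - s.start_ns
        rows.append({
            "name": s.name,
            "offset_ms": (s.start_ns - trace_start) / 1e6,
            "duration_ms": duration / 1e6,
            # Simplification: assumes child spans run one after another
            "self_ms": max(0, duration - child_time.get(s.span_id, 0)) / 1e6,
        })
    return rows


if __name__ == "__main__":
    spans = [
        TimedSpan("r", None, "GET /orders", 0, 100_000_000),
        TimedSpan("q", "r", "SELECT orders", 10_000_000, 40_000_000),
        TimedSpan("c", "r", "cache get", 50_000_000, 55_000_000),
    ]
    for row in timeline(spans):
        print(f"{row['offset_ms']:6.1f} ms  {row['duration_ms']:6.1f} ms  "
              f"self {row['self_ms']:5.1f} ms  {row['name']}")
```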

Resource Requirements

Minimum configuration

# Development environment
SigNoz:
  CPU: 2 cores
  Memory: 4 GB
  Storage: 20 GB

ClickHouse:
  CPU: 2 cores
  Memory: 4 GB
  Storage: 50 GB

Production configuration

# Production environment
SigNoz:
  CPU: 4 cores
  Memory: 8 GB

ClickHouse:
  CPU: 8 cores
  Memory: 32 GB
  Storage: 500 GB SSD
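How much ClickHouse storage you need depends mainly on daily ingest volume, retention and compression. The arithmetic below makes that relationship explicit. The per-signal ingest volumes, the 10x compression ratio and the 30% free-space headroom are placeholder assumptions, not SigNoz measurements; the retention periods (3, 30 and 15 days) match the retention example later in this guide (72h, 30d, 15d). `total_storage_gb` is a hypothetical helper.

```python
import math
from typing import Dict, Tuple


def total_storage_gb(signals: Dict[str, Tuple[float, float]],
                     compression_ratio: float, headroom: float = 1.3) -> float:
    """Disk needed to hold every signal for its retention period, plus headroom.

    signals maps a name to (raw GB ingested per day, retention in days);
    compression_ratio is raw size divided by on-disk size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_gb = sum(gb_per_day * days for gb_per_day, days in signals.values())
    return raw_gb / compression_ratio * headroom


if __name__ == "__main__":
    # Hypothetical workload: raw GB/day and retention days per signal
    signals = {"traces": (10, 3), "metrics": (1, 30), "logs": (20, 15)}
    print(round(total_storage_gb(signals, compression_ratio=10), 1))  # 46.8
```

Measuring your real ingest rate and on-disk compression for a few days, then rerunning the estimate, is more reliable than any fixed sizing table.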

Data Retention

Configuring retention periods

# Via environment variables
RETENTION_PERIOD_TRACES=72h
RETENTION_PERIOD_METRICS=30d
RETENTION_PERIOD_LOGS=15d
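The retention values above mix units (hours for traces, days for metrics and logs). The sketch below reads the three variables named above from an environment mapping and normalizes them to `timedelta`, so they can be compared or fed into a storage estimate. Only the `h` and `d` suffixes used here are supported; `parse_retention` and `load_retention` are hypothetical helpers, and whether a given SigNoz version reads these variables should be checked against its documentation.

```python
import os
import re
from datetime import timedelta
from typing import Dict, Mapping, Optional

_UNITS = {"h": "hours", "d": "days"}


def parse_retention(value: str) -> timedelta:
    """Parse strings such as '72h' or '30d' into a timedelta."""
    m = re.fullmatch(r"\s*(\d+)\s*([hd])\s*", value)
    if not m:
        raise ValueError(f"unsupported retention value: {value!r}")
    amount, unit = int(m.group(1)), m.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def load_retention(env: Optional[Mapping[str, str]] = None) -> Dict[str, timedelta]:
    """Read RETENTION_PERIOD_{TRACES,METRICS,LOGS}; unset variables are skipped."""
    env = os.environ if env is None else env
    result = {}
    for signal in ("TRACES", "METRICS", "LOGS"):
        raw = env.get(f"RETENTION_PERIOD_{signal}")
        if raw is not None:
            result[signal.lower()] = parse_retention(raw)
    return result


if __name__ == "__main__":
    example = {
        "RETENTION_PERIOD_TRACES": "72h",
        "RETENTION_PERIOD_METRICS": "30d",
        "RETENTION_PERIOD_LOGS": "15d",
    }
    for signal, period in load_retention(example).items():
        print(f"{signal:8s} {period.days} days")
```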

