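Use Jaeger to trace request flows across a microservice architecture and pinpoint performance bottlenecks and root causes.
Project Overview
Jaeger is an open-source distributed tracing system originally developed by Uber and now a CNCF graduated project. It is used to monitor and troubleshoot microservice architectures, and it supports OpenTelemetry.
GitHub Stars: 22K+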
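Key Features
- Distributed tracing: end-to-end request tracing
- Root cause analysis: locate performance problems
- Service dependencies: automatically generated service graph
- Performance optimization: identify latency bottlenecks
- OpenTelemetry: native integration
Architecture Components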
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Agent    │───▶│  Collector  │───▶│   Storage   │
└─────────────┘    └─────────────┘    └─────────────┘
                                             │
                                             ▼
                                      ┌─────────────┐
                                      │    Query    │
                                      └─────────────┘
                                             │
                                             ▼
                                      ┌─────────────┐
                                      │     UI      │
                                      └─────────────┘
Installation
All-in-One (for development)
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  jaegertracing/all-in-one:latest
Open http://localhost:16686 in a browser to access the Jaeger UI.
Docker Compose
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # UI
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "14268:14268" # Thrift HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
Kubernetes
kubectl create namespace observability
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.50.0/jaeger-operator.yaml -n observability
OpenTelemetry Integration
Python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Identify this service in the Jaeger UI
resource = Resource.create({"service.name": "my-service"})
provider = TracerProvider(resource=resource)
# Export spans in batches to Jaeger's OTLP gRPC endpoint
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Usage
with tracer.start_as_current_span("operation") as span:
    span.set_attribute("user.id", "12345")
    # business logic
Go
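package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	"go.opentelemetry.io/otel/sdk/trace"
	// The semconv version here is an assumption; use the one matching your SDK.
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

// initTracer registers a global tracer provider that exports spans over
// OTLP/gRPC and returns a function that flushes and shuts it down.
func initTracer() func() {
	exporter, err := otlptracegrpc.New(context.Background(),
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("failed to create OTLP exporter: %v", err)
	}
	tp := trace.NewTracerProvider(
		trace.WithBatcher(exporter),
		trace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("my-service"),
		)),
	)
	otel.SetTracerProvider(tp)
	return func() { _ = tp.Shutdown(context.Background()) }
}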
Node.js
const { NodeSDK } = require("@opentelemetry/sdk-node");
const {
  OTLPTraceExporter,
} = require("@opentelemetry/exporter-trace-otlp-grpc");

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4317",
  }),
  serviceName: "my-node-service",
});

sdk.start();
Storage Backends
Elasticsearch
services:
  jaeger-collector:
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
  elasticsearch:
    image: elasticsearch:8.x
Cassandra
services:
  jaeger-collector:
    environment:
      - SPAN_STORAGE_TYPE=cassandra
      - CASSANDRA_SERVERS=cassandra
  cassandra:
    image: cassandra:4.x
Kafka (buffering)
services:
  jaeger-collector:
    environment:
      - SPAN_STORAGE_TYPE=kafka
      - KAFKA_PRODUCER_BROKERS=kafka:9092
  jaeger-ingester:
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - KAFKA_CONSUMER_BROKERS=kafka:9092
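The buffering role Kafka plays in this setup can be illustrated with a toy in-process model. The sketch below is not Jaeger code: `collector`, `ingester`, and `run_demo` are hypothetical names, a `queue.Queue` stands in for the Kafka topic, and a plain dict stands in for the storage backend. It only shows the idea that the collector hands spans to a buffer and returns, while a separate ingester drains the buffer into storage at its own pace.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the ingester to exit


def collector(topic, span):
    """Accept a span and enqueue it without touching storage."""
    topic.put(span)


def ingester(topic, storage):
    """Drain the buffer into storage until the sentinel arrives."""
    while True:
        span = topic.get()
        if span is _STOP:
            break
        # Group spans by trace ID, as a trace store would.
        storage.setdefault(span["trace_id"], []).append(span)


def run_demo(span_count=5):
    topic = queue.Queue(maxsize=1000)  # stand-in for the Kafka topic
    storage = {}                       # stand-in for Elasticsearch
    worker = threading.Thread(target=ingester, args=(topic, storage))
    worker.start()
    for i in range(span_count):
        collector(topic, {"trace_id": f"t{i % 2}", "operation": f"op-{i}"})
    topic.put(_STOP)
    worker.join()
    return storage


stored = run_demo()
print({trace_id: len(spans) for trace_id, spans in stored.items()})
```

Because the collector only writes to the buffer, a slow or temporarily unavailable storage backend delays ingestion rather than blocking span intake, up to the capacity of the buffer.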
Querying and Analysis
Searching Traces
# UI query
Service: my-service
Operation: /api/users
Tags: http.status_code=500
Min Duration: 1s
Limit: 20
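For scripting, the same search can be expressed as a URL. The sketch below assumes the HTTP JSON endpoint `/api/traces` that the Jaeger UI itself calls on the query service; that endpoint is internal rather than a stable public API, and the parameter names used here (`service`, `operation`, `tags`, `minDuration`, `limit`) should be checked against your Jaeger version. `build_trace_search_url` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode


def build_trace_search_url(base_url, service, operation=None, tags=None,
                           min_duration=None, limit=20):
    """Build a trace search URL mirroring the UI search form fields."""
    params = {"service": service, "limit": str(limit)}
    if operation:
        params["operation"] = operation
    if tags:
        # Assumption: tags are sent as a JSON-encoded object of string values.
        params["tags"] = json.dumps(tags, separators=(",", ":"))
    if min_duration:
        params["minDuration"] = min_duration
    return f"{base_url.rstrip('/')}/api/traces?{urlencode(params)}"


url = build_trace_search_url(
    "http://localhost:16686",
    service="my-service",
    operation="/api/users",
    tags={"http.status_code": "500"},
    min_duration="1s",
)
print(url)
```

The resulting URL can be fetched with any HTTP client while the all-in-one container is running.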
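Trace Comparison
- Select multiple traces
- Click Compare
- Analyze the differences
Service Performance
- P50/P95/P99 latency
- Request rate
- Error rate
- Service dependency graph
Span Attributes
Setting Attributes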
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("db-query") as span:
    span.set_attribute("db.system", "postgresql")
    span.set_attribute("db.statement", "SELECT * FROM users")
    span.set_attribute("db.name", "mydb")
Standard Attributes
Attribute          Description
http.method        HTTP method
http.url           Request URL
http.status_code   Status code
db.system          Database type
db.statement       SQL statement
rpc.system         RPC system
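To keep these attribute names consistent across a codebase, one option is to build them in small helpers instead of repeating string literals. The sketch below uses only the keys from the table above; `http_span_attributes` and `db_span_attributes` are hypothetical helper names, not part of OpenTelemetry.

```python
def http_span_attributes(method, url, status_code):
    """Return the standard HTTP attributes for a request span."""
    return {
        "http.method": method,
        "http.url": url,
        "http.status_code": int(status_code),
    }


def db_span_attributes(system, statement):
    """Return the standard database attributes for a query span."""
    return {"db.system": system, "db.statement": statement}


attrs = http_span_attributes("GET", "http://localhost:8080/api/users", "500")
print(attrs)
```

With the OpenTelemetry Python API, a dict like this can be applied in one call with `span.set_attributes(attrs)`.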
Sampling Strategies
Collector Configuration
# sampling.json
{
  "service_strategies": [
    {
      "service": "my-service",
      "type": "probabilistic",
      "param": 0.5
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.1
  }
}
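How a file like this maps a service to a strategy can be sketched in a few lines. The code below is an illustration of the lookup order the file suggests (a matching entry in `service_strategies` wins, otherwise `default_strategy` applies); it is not the collector's implementation and it ignores per-operation strategies. `resolve_strategy` is a hypothetical name.

```python
import json

SAMPLING_JSON = """
{
  "service_strategies": [
    {"service": "my-service", "type": "probabilistic", "param": 0.5}
  ],
  "default_strategy": {"type": "probabilistic", "param": 0.1}
}
"""


def resolve_strategy(config, service):
    """Return the strategy for a service, falling back to the default."""
    for strategy in config.get("service_strategies", []):
        if strategy.get("service") == service:
            return {"type": strategy["type"], "param": strategy["param"]}
    default = config.get("default_strategy", {})
    return {"type": default.get("type"), "param": default.get("param")}


config = json.loads(SAMPLING_JSON)
print(resolve_strategy(config, "my-service"))     # service-specific: 0.5
print(resolve_strategy(config, "other-service"))  # default: 0.1
```

With this configuration, half of the traces started by my-service are sampled, while every other service falls back to 10%.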
Sampling Types
Type            Description
const           Constant sampling (0 or 1)
probabilistic   Probabilistic sampling
ratelimiting    Rate limiting
remote          Remote control
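The three local sampling types differ only in how the keep-or-drop decision is made for a new trace. The sketch below is a simplified illustration, not Jaeger's implementation: the function and class names are hypothetical, and the rate limiter is a basic token bucket with an injectable clock so its behavior can be checked deterministically.

```python
import random
import time


def const_sampler(param):
    """const: param is 0 (never sample) or 1 (always sample)."""
    return lambda: param == 1


def probabilistic_sampler(rate, rng=random.random):
    """probabilistic: sample each new trace with probability `rate`."""
    return lambda: rng() < rate


class RateLimitingSampler:
    """ratelimiting: allow roughly `per_second` sampled traces per second."""

    def __init__(self, per_second, clock=time.monotonic):
        self.per_second = per_second
        self.clock = clock
        self.tokens = float(per_second)  # start with a full bucket
        self.last = clock()

    def __call__(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        refill = (now - self.last) * self.per_second
        self.tokens = min(self.per_second, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


always = const_sampler(1)
half = probabilistic_sampler(0.5)
limited = RateLimitingSampler(per_second=2)
print(always(), half(), limited())
```

The remote type is not modeled here: in that mode the client asks the Jaeger backend which of these strategies to apply, so the decision logic stays the same and only its configuration source changes.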
Jaeger Operator
Deployment
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: production
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  collector:
    replicas: 2
  query:
    replicas: 2
Automatic Injection
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    sidecar.jaegertracing.io/inject: "true"
Licensed under CC BY-NC-SA 4.0