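Build monitoring dashboards with Grafana and integrate Prometheus, Loki, and Tempo for full observability
Project overview
Grafana is one of the most popular open-source observability platforms, providing polished dashboards for visualizing monitoring data. It supports dozens of data sources, including Prometheus, InfluxDB, and Elasticsearch.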
GitHub Stars: 72K+
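Key features
- Dashboards - a rich set of visualization components
- Multiple data sources - integrates with 100+ data sources
- Alerting - notifications over multiple channels
- Log queries - Loki integration
- Distributed tracing - Tempo integration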
Quick deployment
Docker

    docker run -d \
      -p 3000:3000 \
      --name grafana \
      -v grafana-storage:/var/lib/grafana \
      grafana/grafana-oss
Docker Compose (full stack)

    version: '3.8'
    services:
      grafana:
        image: grafana/grafana-oss
        ports:
          - "3000:3000"
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=admin
        volumes:
          - grafana-data:/var/lib/grafana
      prometheus:
        image: prom/prometheus
        ports:
          - "9090:9090"
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
      loki:
        image: grafana/loki
        ports:
          - "3100:3100"
      promtail:
        image: grafana/promtail
        volumes:
          - /var/log:/var/log
          # Container logs read by the Promtail config shown later in this post
          - /var/lib/docker/containers:/var/lib/docker/containers:ro
          - ./promtail-config.yml:/etc/promtail/config.yml
    volumes:
      grafana-data:
Open http://localhost:3000 and log in with admin/admin.
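Before logging in, it can help to confirm that the container is actually serving requests. The following is a minimal sketch, assuming Grafana's `/api/health` endpoint answers with a JSON body whose `database` field is `ok` when the instance is healthy. The helper names `health_url`, `is_healthy`, and `check` are illustrative and not part of Grafana.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health endpoint URL from a base such as http://localhost:3000."""
    return base_url.rstrip("/") + "/api/health"


def is_healthy(body: str) -> bool:
    """Return True when the health response reports a working database."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("database") == "ok"


def check(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        return False
```

Calling `check()` right after `docker run` or `docker compose up` may return `False` for a few seconds while Grafana initializes its database, so a short retry loop around it is usually enough.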
Data source configuration
Prometheus
- Configuration → Data Sources → Add
- Select Prometheus
- URL: http://prometheus:9090
- Save & Test
Loki (logs)
- Configuration → Data Sources → Add
- Select Loki
- URL: http://loki:3100
MySQL

    # datasources/mysql.yaml
    apiVersion: 1
    datasources:
      - name: MySQL
        type: mysql
        url: mysql:3306
        database: mydb
        user: grafana
        secureJsonData:
          password: secret
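The Prometheus and Loki data sources added through the UI earlier can be provisioned from a file in the same way as MySQL. Here is a rough sketch that renders such a file for the service names used in the Compose stack. The `render_datasources` helper and the `access: proxy` setting are assumptions for illustration, not something the original setup prescribes.

```python
def render_datasources(sources: list[dict[str, str]]) -> str:
    """Render a Grafana data source provisioning file as YAML text."""
    lines = ["apiVersion: 1", "datasources:"]
    for src in sources:
        lines.append(f"  - name: {src['name']}")
        lines.append(f"    type: {src['type']}")
        # Assumption: let Grafana's backend proxy the requests to the source.
        lines.append("    access: proxy")
        lines.append(f"    url: {src['url']}")
    return "\n".join(lines) + "\n"


# URLs match the service names from the Compose file.
STACK = [
    {"name": "Prometheus", "type": "prometheus", "url": "http://prometheus:9090"},
    {"name": "Loki", "type": "loki", "url": "http://loki:3100"},
]

if __name__ == "__main__":
    print(render_datasources(STACK), end="")
```

The output can be saved next to `mysql.yaml` in the data source provisioning directory, so a fresh container comes up with all sources already registered.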
Building dashboards
Panel types
| Type | Purpose |
|---|---|
| Time series | Time-series data |
| Stat | Single value |
| Gauge | Gauge display |
| Bar chart | Bar charts |
| Table | Tabular data |
| Heatmap | Heat maps |
| Logs | Log display |
PromQL examples

    # CPU usage (%)
    100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
    # Memory usage (%)
    (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
    # HTTP request rate
    sum(rate(http_requests_total[5m])) by (status)
    # P95 latency
    histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
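The P95 query is the least obvious of these, because `histogram_quantile` does not see individual request durations. It only sees cumulative bucket counts and interpolates linearly inside the bucket that contains the requested rank. The sketch below reproduces that idea for positive bucket bounds and `0 < q <= 1`. It is a simplified model for building intuition, not Prometheus's implementation, and the sample buckets are invented.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must include the +Inf bucket, as Prometheus histograms do.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Invented example: 100 requests, with le = 0.1, 0.5, 1.0 and +Inf buckets.
SAMPLE = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
```

With `SAMPLE`, the 95th request lands halfway through the 0.5 to 1.0 bucket, so the estimate is 0.75 seconds. The accuracy of the result therefore depends on how finely the bucket boundaries are chosen around the latencies of interest.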
LogQL examples (Loki)

    # Query logs for a specific application
    {app="nginx"} |= "error"
    # JSON parsing
    {app="api"} | json | response_code >= 500
    # Error rate by level
    sum(rate({app="api"} |= "error" [5m])) by (level)
Alerting configuration
Alert rules
- Alerting → Alert Rules → Create
- Set the condition

    # Example: high CPU usage alert
    alert: HighCPUUsage
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
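The `for: 5m` clause means the alert does not fire the first time CPU usage crosses 80%. It stays pending until the condition has held for the whole duration, and any evaluation below the threshold resets the timer. The sketch below models that behavior on Prometheus-style rule evaluation. The function name and the sample values are made up for illustration.

```python
from typing import Optional


def alert_states(values: list[float], threshold: float,
                 hold_for: int, interval: int) -> list[str]:
    """Return the alert state after each evaluation.

    values:   one expression result per evaluation
    hold_for: the rule's `for` duration in seconds
    interval: seconds between evaluations
    """
    states = []
    active_since: Optional[int] = None
    for step, value in enumerate(values):
        now = step * interval
        if value > threshold:
            if active_since is None:
                active_since = now  # condition just became true
            held = now - active_since
            states.append("firing" if held >= hold_for else "pending")
        else:
            active_since = None  # any dip below the threshold resets the timer
            states.append("inactive")
    return states
```

With one evaluation per minute, `alert_states([90] * 6, 80, 300, 60)` stays pending for the first five evaluations and fires on the sixth, which is why short CPU spikes do not page anyone.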
Notification channels
Several notification methods are supported:
- Email
- Slack
- PagerDuty
- Webhook
- Microsoft Teams
- Discord
Slack setup
- Alerting → Contact points → New
- Select Slack
- Enter the Webhook URL
Dashboards as code
JSON export
Dashboard Settings → JSON Model → Copy
Provisioning

    # provisioning/dashboards/default.yaml
    apiVersion: 1
    providers:
      - name: 'default'
        folder: ''
        type: file
        options:
          path: /var/lib/grafana/dashboards
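The file provider above loads every dashboard JSON file it finds under the configured `path`. As a rough illustration, the sketch below writes a one-panel dashboard model into such a directory. The fields shown are a small subset of the dashboard JSON model chosen for this example, and the `minimal_dashboard` and `write_dashboard` helpers are hypothetical.

```python
import json
from pathlib import Path


def minimal_dashboard(title: str, uid: str, expr: str) -> dict:
    """Build a one-panel dashboard model (a small subset of the full schema)."""
    return {
        "uid": uid,
        "title": title,
        "panels": [
            {
                "type": "timeseries",
                "title": title,
                "gridPos": {"x": 0, "y": 0, "w": 24, "h": 8},
                "targets": [{"refId": "A", "expr": expr}],
            }
        ],
    }


def write_dashboard(directory: Path, dashboard: dict) -> Path:
    """Write the model as <uid>.json so the file provider can pick it up."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{dashboard['uid']}.json"
    target.write_text(json.dumps(dashboard, indent=2), encoding="utf-8")
    return target
```

In practice the JSON usually comes from the JSON Model export described earlier rather than being built by hand. The point of the sketch is only that the provider treats each file in the directory as one dashboard.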
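Dashboards and alert rules can also be declared with the Grafana Terraform provider:

    resource "grafana_dashboard" "metrics" {
      config_json = file("dashboard.json")
    }

    # Assumes a grafana_folder.alerts resource is defined elsewhere.
    # Current provider releases model alert rules with grafana_rule_group;
    # check the resource name and arguments against your provider version.
    resource "grafana_alert_rule" "cpu" {
      name      = "High CPU"
      folder_id = grafana_folder.alerts.id
      rule_group {
        name     = "cpu_alerts"
        interval = "1m"
      }
    }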
Grafana Loki
Promtail configuration

    # promtail-config.yml
    server:
      http_listen_port: 9080
    positions:
      filename: /tmp/positions.yaml
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: containers
        static_configs:
          - targets:
              - localhost
            labels:
              job: containerlogs
              __path__: /var/lib/docker/containers/*/*.log
        pipeline_stages:
          - json:
              expressions:
                log: log
                stream: stream
                time: time
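The `json` stage makes sense once you see what Docker's json-file log driver writes: each line is a JSON object with `log`, `stream`, and `time` keys. The sketch below mimics the extraction step for top-level keys only. Promtail's real stage evaluates JMESPath expressions, so this is a simplification, and `run_json_stage` is a hypothetical name.

```python
import json


def run_json_stage(line: str, expressions: dict[str, str]) -> dict[str, str]:
    """Copy selected top-level JSON keys into an extracted map."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return {}  # a non-JSON line yields nothing to extract
    if not isinstance(doc, dict):
        return {}
    return {name: str(doc[key]) for name, key in expressions.items() if key in doc}


# One line in the shape written by Docker's json-file driver (values invented).
SAMPLE = '{"log":"GET /health 200\\n","stream":"stdout","time":"2024-01-01T00:00:00.000000000Z"}'

# Same mapping as the `expressions` block in promtail-config.yml.
EXPRESSIONS = {"log": "log", "stream": "stream", "time": "time"}
```

Running `run_json_stage(SAMPLE, EXPRESSIONS)` gives the three extracted values. In a real pipeline, later stages would typically use them to set the log line, labels, or timestamp.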
Querying logs

    # Recent errors; LogQL has no limit stage, so cap results with the query's line limit option
    {job="containerlogs"} |= "error"
    # Count by level
    sum by (level) (count_over_time({app="api"}[1h]))
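When querying Loki outside Grafana, the maximum number of returned lines is a request parameter of the HTTP API rather than part of the LogQL expression. Here is a small sketch that builds a `query_range` URL for the container-log selector. The base URL comes from the Compose service name, and the helper name and default values are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_range_url(base_url: str, logql: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """Build a query_range URL; the line cap travels as `limit`, not in LogQL."""
    params = urlencode({
        "query": logql,
        "start": start_ns,  # Unix epoch in nanoseconds
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    })
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    print(query_range_url("http://loki:3100",
                          '{job="containerlogs"} |= "error"',
                          0, 3_600 * 10**9))
```

`urlencode` takes care of escaping the braces, quotes, and pipe characters in the selector, which is easy to get wrong when assembling the URL by hand.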
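Performance tuning
Data retention

    # prometheus.yml
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    # Retention is classically set with the --storage.tsdb.retention.time and
    # --storage.tsdb.retention.size command-line flags; confirm that your
    # Prometheus version accepts these keys in the config file.
    storage:
      tsdb:
        retention.time: 15d
        retention.size: 50GB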
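To judge whether a 15-day window fits inside a 50GB cap, the Prometheus documentation offers a rule of thumb: disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample, with 1 to 2 bytes per sample being typical. The sketch below applies that estimate. The figure of 100,000 active series is an invented example, not a measurement.

```python
SECONDS_PER_DAY = 86_400


def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series contributes one sample per scrape interval."""
    return active_series / scrape_interval_s


def needed_disk_bytes(retention_days: float, ingest_rate: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rule-of-thumb estimate: retention seconds * samples/s * bytes/sample."""
    return retention_days * SECONDS_PER_DAY * ingest_rate * bytes_per_sample


if __name__ == "__main__":
    rate = samples_per_second(100_000, 15)  # 15s matches scrape_interval above
    print(f"{needed_disk_bytes(15, rate) / 1e9:.2f} GB")
```

For 100,000 series scraped every 15 seconds this comes to about 17 GB over 15 days, comfortably below the size limit. In that case the time limit would be the one that triggers deletion first.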
Cache configuration

    # grafana.ini
    [caching]
    enabled = true
    ttl = 300
Security configuration
Authentication

    # grafana.ini
    [auth]
    disable_login_form = false

    [auth.ldap]
    enabled = true
    config_file = /etc/grafana/ldap.toml

    [auth.generic_oauth]
    enabled = true
    name = Keycloak
    client_id = grafana
    client_secret = secret
    auth_url = https://keycloak/auth
    token_url = https://keycloak/token
    api_url = https://keycloak/userinfo
HTTPS

    [server]
    protocol = https
    cert_file = /etc/grafana/cert.pem
    cert_key = /etc/grafana/key.pem