
关于
创建和管理生产级 Grafana 仪表盘,实现全面的系统可观测性。
name: grafana-dashboards description: "创建和管理生产就绪的 Grafana 仪表板,实现全面的系统可观测性。" risk: unknown source: community date_added: "2026-02-27"
Grafana 仪表板
创建和管理生产就绪的 Grafana 仪表板,实现全面的系统可观测性。
不适用场景
- 任务与 Grafana 仪表板无关时
- 需要此范围之外的不同领域或工具时
使用说明
- 明确目标、约束条件和所需输入。
- 应用相关最佳实践并验证结果。
- 提供可操作的步骤和验证方法。
- 如需详细示例,请打开
resources/implementation-playbook.md。
目的
为监控应用、基础设施和业务指标设计有效的 Grafana 仪表板。
使用场景
- 可视化 Prometheus 指标
- 创建自定义仪表板
- 实现 SLO 仪表板
- 监控基础设施
- 跟踪业务 KPI
仪表板设计原则
1. 信息层次
┌─────────────────────────────────────┐
│ Critical Metrics (Big Numbers) │
├─────────────────────────────────────┤
│ Key Trends (Time Series) │
├─────────────────────────────────────┤
│ Detailed Metrics (Tables/Heatmaps) │
└─────────────────────────────────────┘
2. RED 方法(服务)
- Rate - 每秒请求数
- Errors - 错误率
- Duration - 延迟/响应时间
3. USE 方法(资源)
- Utilization - 资源繁忙时间百分比
- Saturation - 队列长度/等待时间
- Errors - 错误计数
仪表板结构
API 监控仪表板
{
"dashboard": {
"title": "API Monitoring",
"tags": ["api", "production"],
"timezone": "browser",
"refresh": "30s",
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [
{
"expr": "sum(rate(http_requests_total[5m])) by (service)",
"legendFormat": "{{service}}"
}
],
"gridPos": {"x": 0, "y": 0, "w": 12, "h": 8}
},
{
"title": "Error Rate %",
"type": "graph",
"targets": [
{
"expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
"legendFormat": "Error Rate"
}
],
"alert": {
"conditions": [
{
"evaluator": {"params": [5], "type": "gt"},
"operator": {"type": "and"},
"query": {"params": ["A", "5m", "now"]},
"type": "query"
}
]
},
"gridPos": {"x": 12, "y": 0, "w": 12, "h": 8}
},
{
"title": "P95 Latency",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
"legendFormat": "{{service}}"
}
],
"gridPos": {"x": 0, "y": 8, "w": 24, "h": 8}
}
]
}
}
参考: 见 assets/api-dashboard.json
面板类型
1. 统计面板(单值)
{
"type": "stat",
"title": "Total Requests",
"targets": [{
"expr": "sum(http_requests_total)"
}],
"options": {
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"]
},
"orientation": "auto",
"textMode": "auto",
"colorMode": "value"
},
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"value": 0, "color": "green"},
{"value": 80, "color": "yellow"},
{"value": 90, "color": "red"}
]
}
}
}
}
2. 时间序列图
{
"type": "graph",
"title": "CPU Usage",
"targets": [{
"expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
}],
"yaxes": [
{"format": "percent", "max": 100, "min": 0},
{"format": "short"}
]
}
3. 表格面板
{
"type": "table",
"title": "Service Status",
"targets": [{
"expr": "up",
"format": "table",
"instant": true
}],
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {"Time": true},
"indexByName": {},
"renameByName": {
"instance": "Instance",
"job": "Service",
"Value": "Status"
}
}
}
]
}
4. 热力图
{
"type": "heatmap",
"title": "Latency Heatmap",
"targets": [{
"expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
"format": "heatmap"
}],
"dataFormat": "tsbuckets",
"yAxis": {
"format": "s"
}
}
变量
查询变量
{
"templating": {
"list": [
{
"name": "namespace",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info, namespace)"
}
]
}
}
兼容工具
Claude CodeCursor
标签
运维部署

