
关于
Istio、Linkerd 和服务网格部署的可观测性模式完整指南
name: service-mesh-observability description: "Istio、Linkerd 和服务网格部署的可观测性模式完整指南。" risk: critical source: community date_added: "2026-02-27"
服务网格可观测性
Istio、Linkerd 和服务网格部署的可观测性模式完整指南。
不要在以下情况使用此技能
- 任务与服务网格可观测性无关
- 你需要此范围之外的不同领域或工具
说明
- 明确目标、约束和所需输入。
- 应用相关最佳实践并验证结果。
- 提供可操作的步骤和验证。
- 如果需要详细示例,请打开
resources/implementation-playbook.md。
在以下情况使用此技能
- 设置跨服务的分布式追踪
- 实现服务网格指标和仪表板
- 调试延迟和错误问题
- 为服务通信定义 SLO
- 可视化服务依赖关系
- 排查网格连接问题
核心概念
1. 可观测性三大支柱
┌─────────────────────────────────────────────────────┐
│ Observability │
├─────────────────┬─────────────────┬─────────────────┤
│ Metrics │ Traces │ Logs │
│ │ │ │
│ • Request rate │ • Span context │ • Access logs │
│ • Error rate │ • Latency │ • Error details │
│ • Latency P50 │ • Dependencies │ • Debug info │
│ • Saturation │ • Bottlenecks │ • Audit trail │
└─────────────────┴─────────────────┴─────────────────┘
2. 网格黄金信号
| 信号 | 描述 | 告警阈值 | |------|------|----------| | 延迟 | 请求持续时间 P50、P99 | P99 > 500ms | | 流量 | 每秒请求数 | 异常检测 | | 错误 | 5xx 错误率 | > 1% | | 饱和度 | 资源利用率 | > 80% |
模板
模板 1:Istio 配合 Prometheus 和 Grafana
# Install Prometheus
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
namespace: istio-system
data:
prometheus.yml: |
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'istio-mesh'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
action: keep
regex: istio-telemetry
---
# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: istio-mesh
namespace: istio-system
spec:
selector:
matchLabels:
app: istiod
endpoints:
- port: http-monitoring
interval: 15s
模板 2:关键 Istio 指标查询
# Request rate by service
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service_name)
# Error rate (5xx)
sum(rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m]))
/ sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100
# P99 latency
histogram_quantile(0.99,
sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
by (le, destination_service_name))
# TCP connections
sum(istio_tcp_connections_opened_total{reporter="destination"}) by (destination_service_name)
# Request size
histogram_quantile(0.99,
sum(rate(istio_request_bytes_bucket{reporter="destination"}[5m]))
by (le, destination_service_name))
模板 3:Jaeger 分布式追踪
# Jaeger installation for Istio
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
meshConfig:
enableTracing: true
defaultConfig:
tracing:
sampling: 100.0 # 100% in dev, lower in prod
zipkin:
address: jaeger-collector.istio-system:9411
---
# Jaeger deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: jaeger
namespace: istio-system
spec:
selector:
matchLabels:
app: jaeger
template:
metadata:
labels:
app: jaeger
spec:
containers:
- name: jaeger
image: jaegertracing/all-in-one:1.50
ports:
- containerPort: 5775 # UDP
- containerPort: 6831 # Thrift
- containerPort: 6832 # Thrift
- containerPort: 5778 # Config
- containerPort: 16686 # UI
- containerPort: 14268 # HTTP
- containerPort: 14250 # gRPC
- containerPort: 9411 # Zipkin
env:
- name: COLLECTOR_ZIPKIN_HOST_PORT
value: ":9411"
模板 4:Linkerd Viz 仪表板
# Install Linkerd viz extension
linkerd viz install | kubectl apply -f -
# Access dashboard
linkerd viz dashboard
# CLI commands for observability
# Top requests
linkerd viz top deploy/my-app
# Per-route metrics
linkerd viz routes deploy/my-app --to deploy
兼容工具
Claude CodeCursor
标签
运维部署

