Incident Runbook Templates

Low Risk

by @sickn33 · Verified Source

4.1176 installs · v1.0.0 · Updated May 25, 2026

How to Use

Run in Claude Code terminal

Step 1: Add Marketplace

/plugin marketplace add sickn33/antigravity-awesome-skills

Step 2: Install Plugin

/plugin install antigravity-awesome-skills@antigravity-awesome-skills

About

Production-ready templates for incident response runbooks covering detection, triage, mitigation, resolution, and communication.

name: incident-runbook-templates
description: "Production-ready templates for incident response runbooks covering detection, triage, mitigation, resolution, and communication."
risk: critical
source: community
date_added: "2026-02-27"

Do not use this skill when

The task is unrelated to incident runbook templates
You need a different domain or tool outside this scope

Instructions

Clarify goals, constraints, and required inputs.
Apply relevant best practices and validate outcomes.
Provide actionable steps and verification.
If detailed examples are required, open resources/implementation-playbook.md.

Use this skill when

Creating incident response procedures
Building service-specific runbooks
Establishing escalation paths
Documenting recovery procedures
Responding to active incidents
Onboarding on-call engineers

Core Concepts

1. Incident Severity Levels

| Severity | Impact | Response Time | Example |
|----------|--------|---------------|---------|
| SEV1 | Complete outage, data loss | 15 min | Production down |
| SEV2 | Major degradation | 30 min | Critical feature broken |
| SEV3 | Minor impact | 2 hours | Non-critical bug |
| SEV4 | Minimal impact | Next business day | Cosmetic issue |
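
The severity table can also be encoded as a small lookup so that paging scripts and humans read the same targets. The following is a minimal sketch, not part of the original skill; the function name `sev_response_time` is a hypothetical helper.

```bash
# Hypothetical helper: map a severity label to the response-time target
# listed in the severity table. Input is case-insensitive.
sev_response_time() {
  local sev
  sev=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$sev" in
    SEV1) echo "15 min" ;;
    SEV2) echo "30 min" ;;
    SEV3) echo "2 hours" ;;
    SEV4) echo "Next business day" ;;
    *)    echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

For example, `sev_response_time SEV2` prints `30 min`, and an unrecognized label returns a non-zero exit status so a calling script can stop.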

2. Runbook Structure

1. Overview & Impact
2. Detection & Alerts
3. Initial Triage
4. Mitigation Steps
5. Root Cause Investigation
6. Resolution Procedures
7. Verification & Rollback
8. Communication Templates
9. Escalation Matrix
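
To start a new runbook from the nine-section structure above, a skeleton can be generated rather than typed by hand. This is a sketch under assumptions: `runbook_skeleton` is a hypothetical helper name, and the `TODO` placeholder text is illustrative only.

```bash
# Hypothetical helper: print a Markdown skeleton containing the nine
# runbook sections, in order, for the given service name.
runbook_skeleton() {
  local service="${1:-[Service Name]}"
  printf '# %s Runbook\n' "$service"
  while IFS= read -r section; do
    printf '\n## %s\n\nTODO\n' "$section"
  done <<'EOF'
Overview & Impact
Detection & Alerts
Initial Triage
Mitigation Steps
Root Cause Investigation
Resolution Procedures
Verification & Rollback
Communication Templates
Escalation Matrix
EOF
}
```

Redirecting the output, for instance `runbook_skeleton "Payment Service" > payment-service-runbook.md`, gives a file with one heading per section ready to fill in.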

Runbook Templates

Template 1: Service Outage Runbook

# [Service Name] Outage Runbook

## Overview
**Service**: Payment Processing Service
**Owner**: Platform Team
**Slack**: #payments-incidents
**PagerDuty**: payments-oncall

## Impact Assessment
- [ ] Which customers are affected?
- [ ] What percentage of traffic is impacted?
- [ ] Are there financial implications?
- [ ] What's the blast radius?

## Detection
### Alerts
- `payment_error_rate > 5%` (PagerDuty)
- `payment_latency_p99 > 2s` (Slack)
- `payment_success_rate < 95%` (PagerDuty)
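
As a rough illustration of how the three thresholds above combine, the sketch below takes current readings and prints which alerts would fire. It assumes the error and success rates are given in percent and the p99 latency in seconds; `firing_alerts` is a hypothetical name, and real alert evaluation would live in the monitoring system rather than a shell function.

```bash
# Hypothetical helper. Usage:
#   firing_alerts ERROR_RATE_PCT P99_SECONDS SUCCESS_RATE_PCT
# Prints one line per alert whose threshold is crossed.
firing_alerts() {
  awk -v err="$1" -v p99="$2" -v ok="$3" 'BEGIN {
    if (err + 0 > 5)  print "payment_error_rate (PagerDuty)"
    if (p99 + 0 > 2)  print "payment_latency_p99 (Slack)"
    if (ok + 0 < 95)  print "payment_success_rate (PagerDuty)"
  }'
}
```

With readings of 6% errors, 1.5 s p99, and 96% success, only the error-rate alert is printed.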

### Dashboards
- [Payment Service Dashboard](https://grafana/d/payments)
- [Error Tracking](https://sentry.io/payments)
- [Dependency Status](https://status.stripe.com)

## Initial Triage (First 5 Minutes)

### 1. Assess Scope
```bash
# Check service health
kubectl get pods -n payments -l app=payment-service

# Check recent deployments
kubectl rollout history deployment/payment-service -n payments

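# Check error rates
# (-G with --data-urlencode keeps curl from treating {} and [] in the query as URL globs)
curl -sG "http://prometheus:9090/api/v1/query" \
  --data-urlencode 'query=sum(rate(http_requests_total{status=~"5.."}[5m]))'
```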

### 2. Quick Health Checks

- [ ] Can you reach the service? `curl -I https://api.company.com/payments/health`
- [ ] Database connectivity? Check connection pool metrics
- [ ] External dependencies? Check Stripe, bank API status
- [ ] Recent changes? Check deploy history

### 3. Initial Classification

| Symptom | Likely Cause | Go To Section |
|---------|--------------|---------------|
| All requests failing | Service down | Section 4.1 |
| High latency | Database/dependency | Section 4.2 |
| Partial failures | Code bug | Section 4.3 |
| Spike in errors | Traffic surge | Section 4.4 |

## Mitigation Procedures

### 4.1 Service Completely Down

```bash
# Step 1: Check pod status
kubectl get pods -n payments

# Step 2: If pods are crash-looping, check logs
kubectl logs -n payments -l app=payment-service --tail=100

# Step 3: Check recent deployments
kubectl rollout history deployment/payment-service -n payments

# Step 4: ROLLBACK if recent deploy is suspect
kubectl rollout undo deployment/payment-service -n payments

# Step 5: Scale up if resource constrained
kubectl scale deployment/payment-service -n payments --replicas=10

# Step 6: Verify recovery
kubectl rollout status deployment/payment-service -n payments
```
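
Step 2 depends on spotting crash-looping pods in the Step 1 output. A small filter can make that explicit; this is a sketch, with `crashlooping_pods` as a hypothetical name, and it assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE).

```bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose STATUS column is CrashLoopBackOff.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}
```

A typical invocation would be `kubectl get pods -n payments --no-headers | crashlooping_pods`; empty output means no pod is currently in that state.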

### 4.2 High Latency

```bash
# Step 1: Check database connections
kubectl exec -n payments deploy/payment-service -- \
  curl -s localhost:8080/metrics | grep db_pool

# Step 2: Check slow queries (if DB issue)
# (a SELECT alias cannot be referenced in WHERE, so the expression is repeated)
psql -h "$DB_HOST" -U "$DB_USER" -c "
  SELECT pid, now() - query_start AS duration, query
  FROM pg_stat_activity
  WHERE state = 'active' AND now() - query_start > interval '5 seconds'
  ORDER BY duration DESC;"

# Step 3: Kill long-running queries if needed
# (set TARGET_PID to a pid returned by Step 2)
psql -h "$DB_HOST" -U "$DB_USER" -c "SELECT pg_terminate_backend($TARGET_PID);"

# Step 4: Check external dependency latency
# (curl-format.txt is a local file holding the -w format string)
curl -w "@curl-format.txt" -o /dev/null -s https://api.stripe.com/v1/health

# Step 5: Enable circuit breaker if dependency is slow
kubectl set env deployment/payment-service \
  STRIPE_CIRCUIT_BREAKER_ENABLED=true -n payments
```
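
The `db_pool` lines from Step 1 are easier to act on as a single utilization figure. The sketch below assumes the service exposes two unlabeled Prometheus-style gauges named `db_pool_active_connections` and `db_pool_max_connections`; both metric names and the helper name `db_pool_utilization` are hypothetical, so adjust them to whatever the service actually exports.

```bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print connection pool utilization as a whole-number percentage.
# Assumes unlabeled gauges with the (hypothetical) names matched below.
db_pool_utilization() {
  awk '
    $1 == "db_pool_active_connections" { active = $2 }
    $1 == "db_pool_max_connections"    { max = $2 }
    END {
      if (max + 0 <= 0) { print "unknown"; exit 1 }
      print int(active * 100 / max)
    }'
}
```

Piping the Step 1 output into `db_pool_utilization` yields a number such as `90`; a value close to 100 suggests the pool is saturated and points toward the slow-query check in Step 2.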

### 4.3 Partial Failures (Specific Errors)

```bash
# Step 1: Identify error pattern
kubectl logs -n payments -l app=payment-service --tail=500 | \
  grep -i error | sort | uniq -c | sort -
```
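
The Step 1 pipeline is cut off in the source after `sort -`. One common way to rank error lines by frequency is sketched below; the `-rn` sort flags, the `head` limit, and the helper name `top_errors` are assumptions, not the original author's text.

```bash
# Hypothetical helper: read log lines on stdin, keep those mentioning
# "error" (case-insensitive), and print the most frequent lines first,
# each prefixed with its count. Optional argument: how many lines to show.
top_errors() {
  grep -i error | sort | uniq -c | sort -rn | head -n "${1:-20}"
}
```

It can be fed from the same log command, for example `kubectl logs -n payments -l app=payment-service --tail=500 | top_errors 10`. Lines that differ only by timestamp or request ID are counted separately, so stripping those fields first gives tighter grouping.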

Compatible Tools

Claude Code, Cursor
