# Alert Rule Standards
## Rule
All Grafana alert rules MUST have severity labels, for-duration, runbook links, and follow consistent naming patterns. Alerts MUST route based on severity level.
## Naming Convention
```
[Service] Symptom Description
```
Examples:
- "[API] High Error Rate"
- "[Database] Connection Pool Exhausted"
- "[Worker] Job Queue Backlog Growing"
## Required Labels
| Label | Values | Purpose |
|-------|--------|---------|
| severity | critical, warning, info | Routing priority |
| team | backend, frontend, infra | Ownership |
| service | api, worker, database | Affected service |
| environment | production, staging | Environment scope |
## Required Annotations
| Annotation | Content |
|-----------|---------|
| summary | "{{ $value }} — brief description of what is wrong" |
| description | Detailed explanation with affected service and impact |
| runbook_url | Link to remediation steps |
## For-Duration Requirements
| Severity | Minimum for-duration |
|----------|---------------------|
| critical | 5 minutes |
| warning | 10 minutes |
| info | 15 minutes |
## Routing Rules
| Severity | Channel | Repeat Interval |
|----------|---------|----------------|
| critical | PagerDuty + Slack | 30 minutes |
| warning | Slack | 4 hours |
| info | Email digest | 24 hours |
## Examples
### Good
```yaml
- title: "[API] High Error Rate"
for: 5m
labels:
severity: critical
team: backend
service: api
annotations:
summary: "Error rate is {{ $value }}% (threshold: 5%)"
runbook_url: https://wiki.example.com/runbooks/api-error-rate
```
### Bad
```yaml
- title: "Alert 1" # Non-descriptive name
for: 0s # No for-duration
# No severity label
# No runbook link
```
## Enforcement
Review alert rules in provisioning PRs.
Audit active alerts monthly for noise/relevance.
Test alert routing in staging environment.