Incident Management
All Incidents
8 incidents stored in Aurora memory
| Active | Investigating | Resolved | Critical |
|---|---|---|---|
| 1 | 1 | 5 | 4 |
| Incident | Title | Tags | Severity | Status | Service | Owner | MTTR | Created |
|---|---|---|---|---|---|---|---|---|
| INC-201 | Production API Gateway Timeout Storm | #redis #timeout #api-gateway | critical | active | API Gateway | AS | ongoing | Jun 17 |
| INC-200 | Worker Pod OOMKilled Loop | #oomkilled #memory-leak #kubernetes | high | investigating | Payment Service | PS | ongoing | Jun 17 |
| INC-199 (RCA) | Aurora DB Connection Saturation | #aurora #postgresql #connection-pool | critical | resolved | User Service | RM | 45m | Jun 16 |
| INC-198 (RCA) | CDN Cache Invalidation Failure | #cdn #cache #frontend | medium | resolved | CDN / Frontend | AR | 35m | Jun 15 |
| INC-197 (RCA) | Elasticsearch Index Corruption | #elasticsearch #index #disk | high | resolved | Search Service | VN | 90m | Jun 14 |
| INC-196 (RCA) | Kubernetes Node Group Eviction Storm | #kubernetes #spot-instances #eviction | critical | resolved | Kubernetes Platform | SR | 75m | Jun 12 |
| INC-195 (RCA) | S3 Rate Limiting on Log Ingestion | #s3 #rate-limiting #logging | medium | resolved | Log Pipeline | MK | 45m | Jun 10 |
| INC-104 (RCA) | Redis Memory Exhaustion — Payment Cache | #redis #memory #cache | critical | closed | Payment Cache | AS | 90m | May 20 |
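The summary counters can be recomputed directly from the incident rows as a consistency check. This is a minimal Python sketch, assuming the counters are plain tallies over the Status and Severity columns and that the `closed` incident (INC-104) is not counted under Resolved. The `Incident` record and the `summary_counts` helper are hypothetical names for illustration, not part of the dashboard.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # Hypothetical record type holding only the columns the counters need.
    incident_id: str
    severity: str
    status: str


# Rows transcribed from the incident table.
INCIDENTS = [
    Incident("INC-201", "critical", "active"),
    Incident("INC-200", "high", "investigating"),
    Incident("INC-199", "critical", "resolved"),
    Incident("INC-198", "medium", "resolved"),
    Incident("INC-197", "high", "resolved"),
    Incident("INC-196", "critical", "resolved"),
    Incident("INC-195", "medium", "resolved"),
    Incident("INC-104", "critical", "closed"),
]


def summary_counts(incidents):
    """Tally the dashboard counters from individual incident rows.

    Assumption: "Resolved" counts only status == "resolved", so the
    closed incident INC-104 is excluded from that counter.
    """
    by_status = Counter(i.status for i in incidents)
    return {
        "active": by_status["active"],
        "investigating": by_status["investigating"],
        "resolved": by_status["resolved"],
        "critical": sum(1 for i in incidents if i.severity == "critical"),
    }


print(summary_counts(INCIDENTS))
# {'active': 1, 'investigating': 1, 'resolved': 5, 'critical': 4}
```

Because `Counter` returns 0 for missing keys, a status with no incidents (for example, no `investigating` rows) yields a zero counter rather than a `KeyError`.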