Delayed event processing

Incident Report for Kisi

Resolved

The Event Service, Kisi’s core system for capturing and forwarding events, was degraded following a recent feature deployment. A bug that left connections unclosed gradually exhausted CPU and memory resources, which throttled database writes. Our monitoring detected elevated error rates, and a redeployment addressed the bug. Because a large backlog had built up by then, some delayed event webhooks were dropped. No events were lost.

Service Impact:

- Event visibility in the Kisi Admin Dashboard and via the API was degraded during the incident window (new events were not yet persisted).
- Event webhook deliveries and report generation were paused until full recovery.

Next Steps:

- We will adapt our system to process all event webhooks and event-related jobs regardless of backlog size.
- We will isolate our event pipeline to avoid spillover effects from new functionality.

Posted Jun 02, 2025 - 16:00 UTC