Resolved -
The Event Service, Kisi’s core system for capturing and forwarding events, experienced a degradation after a recent feature deployment. An unclosed-connection bug gradually exhausted CPU and memory resources, which throttled database writes. Our monitoring detected elevated error rates. A redeployment resolved the bug, but because of the large backlog that had built up, some delayed event webhooks were dropped. No events themselves were lost.
Service Impact:
- Event visibility in the Kisi Admin Dashboard and via the API was degraded during the incident window, because new events had not yet been persisted.
- Event Webhook deliveries and report generation were paused until full recovery.
Next Steps:
- We will adapt our system to process all event webhooks and event-related jobs regardless of backlog size.
- We will isolate our event pipeline to avoid spillover effects from new functionality.
Jun 2, 16:00 UTC