Authors: SRE, Data Engineering and Support Teams
The performance degradation was caused by an unusually high number of requests generated by a single operation. Those requests triggered automations, which caused exponential growth in our processing queue and ended up causing timeouts in the and affecting the performance of the reports.
Major - The performance degradation on the reports feature happened from Jul 25 15:26 (UTC) to Jul 25 19:11 (UTC).
The root cause of what caused the initial overload in our processing queues is under investigation and yet to be determined.
However, due to our initial findings we strongly believe that the changes that’ll be implemented soon in the application’s non-relational database will be enough to prevent similar incidents from happening again in the future.
Detection and resolution:
The issue was detected by our internal monitoring system that triggered automated alerts and informed the team.
Action plan: Preventive action items
1. Modifications in our Database configuration in order to increase overall performance. Due date: 08/31/2020