Root Cause
The issue originated in the integration feature and led to a system overload. An unoptimized job created thousands of subflows, which quickly filled memory and degraded system availability. This misuse of the integration was identified as the root cause, and the problematic flow was disabled as an initial mitigation step.
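One common guard against this kind of runaway fan-out is a per-job cap on subflow creation. The following is a minimal sketch in Python under stated assumptions, not the platform's actual implementation: `SubflowBudget`, `SubflowLimitExceeded`, `run_job`, and the limit value are hypothetical names chosen for illustration, and subflows are simulated as plain strings.

```python
class SubflowLimitExceeded(RuntimeError):
    """Raised when a job tries to create more subflows than its budget allows."""


class SubflowBudget:
    """Counts subflows created by a single job and enforces an upper bound."""

    def __init__(self, max_subflows: int) -> None:
        if max_subflows < 1:
            raise ValueError("max_subflows must be at least 1")
        self.max_subflows = max_subflows
        self.created = 0

    def acquire(self) -> None:
        """Reserve one subflow slot, or fail fast once the cap is reached."""
        if self.created >= self.max_subflows:
            raise SubflowLimitExceeded(
                f"job exceeded its limit of {self.max_subflows} subflows"
            )
        self.created += 1


def run_job(items, budget: SubflowBudget) -> list:
    """Start one simulated subflow per item, stopping when the budget runs out."""
    started = []
    for item in items:
        budget.acquire()  # raises before memory can fill with queued subflows
        started.append(f"subflow-{item}")  # hypothetical stand-in for a real subflow
    return started
```

With a cap in place, a job that tries to create thousands of subflows fails early with a clear error instead of exhausting memory. The appropriate limit depends on the workload and is left as an assumption here.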
Resolution
To resolve the issue, the team deactivated the problematic flow and partially cleared the queues. Deactivating the flow prevented new calls, and clearing the queues addressed the items that were stuck in them.
Action Plan
We will continue to monitor the system closely to confirm stability and help prevent future issues. We remain focused on improving the customer experience by optimizing our processes and strengthening communication between teams.