Root Cause
Sunday's migration moved the database from one cluster to another and, in the process, changed some data types and removed some columns. These changes affected the database statistics, and with them the execution plans the database chose, which led to poor query performance.
Resolution
The permanent fix was to significantly increase the statistics sampling range, which gave the database a more accurate picture of the data and led to better execution plans. As temporary measures, a database feature was disabled globally and indexes were pre-loaded into memory. Once the permanent fix was applied, the feature was re-enabled.
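To illustrate why a larger sampling range can improve plans, here is a minimal sketch that estimates how common a value is from random samples of different sizes. It is not the fix that was applied. The report names neither the database nor the exact setting, so the column contents, the sample sizes, and the function names below are all hypothetical.

```python
import random


def estimate_selectivity(rows, value, sample_size, rng):
    """Estimate the fraction of rows equal to `value` from a random sample."""
    sample = rng.sample(rows, sample_size)
    return sum(1 for row in sample if row == value) / sample_size


def mean_abs_error(rows, value, sample_size, trials=20, seed=0):
    """Average absolute estimation error over several independent samples."""
    rng = random.Random(seed)
    truth = sum(1 for row in rows if row == value) / len(rows)
    errors = [
        abs(estimate_selectivity(rows, value, sample_size, rng) - truth)
        for _ in range(trials)
    ]
    return sum(errors) / trials


# Hypothetical column: 100,000 rows, of which 1% hold the value "rare".
ROWS = ["rare"] * 1_000 + ["common"] * 99_000

if __name__ == "__main__":
    for size in (50, 5_000):
        error = mean_abs_error(ROWS, "rare", size)
        print(f"sample size {size}: mean absolute error {error:.4f}")
```

A planner that works from the smaller sample has a much rougher idea of how many rows a predicate matches, which is one common way a poor execution plan gets chosen.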
Action Plan
The initial temporary measures, disabling a database feature and caching indexes to speed up queries, brought the system back online. The sampling range was then increased to produce better execution plans. Future actions include improving the alerts that monitor database health and increasing load testing before migrations.
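One way a pre-migration load test or a health alert could flag this kind of regression is to compare a latency percentile before and after the change. The sketch below assumes query latencies are already collected as lists of milliseconds. The function names, the 95th percentile, and the 1.5x threshold are illustrative choices, not values taken from this incident.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an integer 1..100."""
    ordered = sorted(values)
    # Integer arithmetic avoids floating-point rounding when computing the rank.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_regressed(baseline_ms, candidate_ms, pct=95, max_ratio=1.5):
    """Return True when the candidate percentile exceeds the baseline
    percentile by more than `max_ratio`."""
    return percentile(candidate_ms, pct) > max_ratio * percentile(baseline_ms, pct)


if __name__ == "__main__":
    # Hypothetical measurements from the old cluster and the migrated one.
    old_cluster = list(range(1, 101))
    new_cluster = [2 * ms for ms in old_cluster]
    if latency_regressed(old_cluster, new_cluster):
        print("p95 latency regressed; hold the migration and inspect execution plans")
```

Running the same query mix against both clusters before cutover, and failing the check when the ratio is exceeded, would surface plan regressions before they reach production traffic.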