Service Degradation Post-Mortem
Authors: SRE, Security and Support Teams
Date: 2021-04-19
Status: Resolved
Summary:
The search mechanism (ElasticSearch) used by some of Pipefy's features (Application, Advanced Reports, Automation, Live updates, and Filters) was unstable from Apr 19 20:08 (UTC) to Apr 20 12:44 (UTC).
The problem first surfaced in Pipe and Company reports, which showed higher load times than usual. This prompted the internal teams to check the internal controls, which showed that requests were not being processed. While acting to minimize the impact on users, the infrastructure team contacted the vendor for further investigation.
Impact:
Major - Some users experienced performance degradation in the affected features listed above from Apr 19 20:08 (UTC) to Apr 20 12:44 (UTC).
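For reference, the length of the impact window can be derived from the two timestamps above. The snippet below is only an illustrative calculation; the variable names are not taken from any Pipefy tooling.

```python
from datetime import datetime, timezone

# Start and end of the degradation window, as stated in this report (UTC).
start = datetime(2021, 4, 19, 20, 8, tzinfo=timezone.utc)
end = datetime(2021, 4, 20, 12, 44, tzinfo=timezone.utc)

duration = end - start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"Degradation window: {hours} h {minutes} min")  # Degradation window: 16 h 36 min
```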
Root Cause:
The root cause of this incident was identified as the sudden unavailability of ElasticSearch.
Detection and resolution:
The issue was detected by our internal monitoring systems, which show application failures, and through reports from both internal and external clients about unexplained error messages.
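As an illustration of the kind of check that can flag this type of failure, the sketch below polls the standard Elasticsearch `_cluster/health` endpoint and maps the response to an alert level. It is a minimal example under stated assumptions, not a description of the monitoring actually in place: the `ES_URL` address, the function names `classify_health` and `probe`, and the alert levels are all hypothetical.

```python
import json
import urllib.request

# Hypothetical address; replace with the real cluster endpoint.
ES_URL = "http://localhost:9200"


def classify_health(payload):
    """Map a _cluster/health response body to an alert level (assumed levels)."""
    if not isinstance(payload, dict):
        return "unknown"
    status = payload.get("status")
    if payload.get("timed_out") or status == "red":
        return "critical"
    if status == "yellow":
        return "warning"
    if status == "green":
        return "ok"
    return "unknown"


def probe(base_url=ES_URL, timeout=5.0):
    """Query the cluster once; treat any connection or parse failure as critical."""
    try:
        with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
            return classify_health(json.load(resp))
    except (OSError, ValueError):
        # Connection refused, timeout, or malformed JSON: the cluster is not answering usefully.
        return "critical"
```

Calling `probe()` on a schedule and alerting on any result other than "ok" would surface a sudden loss of availability without waiting for user reports.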
Action plan: Preventive action items
1. Further investigation with the AWS team. Due date: 05/2021
2. Improvements on Reports and Filters. Due date: 30/2021