Pipefy is currently facing latency, causing processing delays
Incident Report for Pipefy
Postmortem

Service Degradation Post-Mortem

Authors: SRE, Security and Support Teams

Date: 2021-04-19

Status: Resolved 

Summary: 

The search mechanism (Elasticsearch) used by some of Pipefy's features (Application, Advanced Reports, Automation, Live updates, and Filters) was unstable from Apr 19 20:08 (UTC) to Apr 20 12:44 (UTC).

The problem was first noticed in Pipe and Company reports, which showed higher load times than usual. This prompted the internal teams to check the internal controls, which showed that requests were not being processed. While acting to minimize the impact on users, the infrastructure team contacted the vendor for further investigation.

Impact: 

Major - Some users experienced performance degradation in the affected features listed above from Apr 19 20:08 (UTC) to Apr 20 12:44 (UTC).

Root Cause: 

The root cause of this incident was the sudden unavailability of Elasticsearch.

Detection and resolution: 

The issue was detected by our internal monitoring systems, which showed application failures, and through reports from both internal and external clients about unexplained error messages.

Action plan: Preventive action items

1. Further investigation with the AWS team. Due date: 05/2021

2. Improvements on Reports and Filters. Due date: 30/2021

Posted May 10, 2021 - 21:08 UTC

Resolved
The system latency has been fixed and the performance of the platform has been fully restored.
As soon as the investigation process is over, we’ll share further details about the causes, the fixes implemented, and the planned preventive actions.
Posted Apr 20, 2021 - 00:45 UTC
Monitoring
Application operational. Filters on reports will be delayed for about 2 hours.
Posted Apr 19, 2021 - 23:54 UTC
Update
We are continuing to work on a fix for this issue.
Posted Apr 19, 2021 - 23:38 UTC
Update
We are continuing to work on a fix for this issue.
Posted Apr 19, 2021 - 23:10 UTC
Update
We are continuing to work on a fix for this issue.
Posted Apr 19, 2021 - 22:39 UTC
Update
We are continuing to work on a fix for this issue. Any further details about the system status, investigation and preventive actions to avoid future incidents will be shared as soon as available.
Posted Apr 19, 2021 - 22:10 UTC
Update
We are continuing to work on a fix for this issue.
Posted Apr 19, 2021 - 21:42 UTC
Update
The issue is still being investigated and we're working on a fix.
Posted Apr 19, 2021 - 21:34 UTC
Update
We are continuing to work on a fix for this issue. Any further details about the system status, investigation and preventive actions to avoid future incidents will be shared as soon as available.
Posted Apr 19, 2021 - 21:14 UTC
Identified
The technical team has identified the causes of the latency/slowness and is currently working towards restoring optimal system speed as soon as possible.
Any further details about the system status, investigation and preventive actions to avoid future incidents will be shared as soon as available.
Posted Apr 19, 2021 - 20:52 UTC
Investigating
We are currently investigating the causes of the latency and working towards restoring full platform performance as soon as possible.
Any further details about the system status, investigation and preventive actions to avoid future incidents will be shared as soon as available.
Posted Apr 19, 2021 - 20:34 UTC
This incident affected: Application, Automation and Components (Advanced Reports, Live updates, Filters).