Pipefy's web application is currently unavailable to all users
Incident Report for Pipefy
Postmortem

Authors: SRE and Support Teams

Date: 2020-05-06

Status: Resolved

Summary:

There was a platform outage that prevented users from accessing the web application from May 06 18:10 (UTC) to May 06 18:30 (UTC).

Impact:

Major - All Pipefy users were unable to access the web application from May 06 18:10 (UTC) to May 06 18:30 (UTC). The outage affected the entire user base.

Root Cause:

The root cause of this incident was a modification made to the cluster's Amazon Security Group. Because the change was not properly versioned, it was overwritten on the next sync.
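The failure mode can be illustrated with a minimal sketch. It assumes a sync step in which the versioned definition always wins over the live state. This report does not say which tool performs the sync or which rule was affected, so the `Rule` type, the `sync` function, and every rule value below are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    """Hypothetical, simplified security group rule."""
    protocol: str
    port: int
    cidr: str


def sync(versioned: FrozenSet[Rule], live: FrozenSet[Rule]) -> FrozenSet[Rule]:
    """Assumed sync behaviour: the versioned definition replaces the live state.

    The live rules are ignored on purpose, so any rule that exists only
    in the live state is dropped.
    """
    return versioned


# Rules kept under version control (illustrative values only).
versioned = frozenset({Rule("tcp", 443, "0.0.0.0/0")})

# A rule added by hand to the live security group but never versioned.
manual_rule = Rule("tcp", 8443, "10.0.0.0/16")
live = versioned | {manual_rule}

after_sync = sync(versioned, live)
lost = live - after_sync  # rules that the sync removed
```

Under this assumption, `lost` contains exactly the manually added rule: the sync has no record of it, so it silently disappears from the live state.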

Detection and resolution:

The issue was first detected by our internal monitoring systems, which triggered automated alerts informing the team that users were unable to access the application.

Messages from both internal and external customers were also received after the team had already begun investigating.

Once the root cause was identified, the issue was resolved by manually inserting (and versioning) the new rule in the Security Group, which restored access to the application.

Action plan: Preventive action items

1. Implement additional preventive alerts to identify possible inconsistencies before they affect users. Due date: 2020-05-22
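One possible shape for such an alert is a drift check that compares the versioned rules with the rules currently live. The sketch below is an assumption, not a description of the alerts actually implemented: the `Rule` type, the function names, and the sample rules are all hypothetical, and fetching the live rules from the cloud provider is left out.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical, simplified security group rule."""
    protocol: str
    port: int
    cidr: str


def find_drift(versioned: FrozenSet[Rule], live: FrozenSet[Rule]) -> Dict[str, List[Rule]]:
    """Return rules that exist on only one side of the comparison."""
    return {
        # Live but not versioned: would be removed by the next sync.
        "unversioned": sorted(live - versioned),
        # Versioned but not live: expected access is currently missing.
        "missing": sorted(versioned - live),
    }


def drift_alerts(versioned: FrozenSet[Rule], live: FrozenSet[Rule]) -> List[str]:
    """Build one human-readable alert message per drifted rule."""
    drift = find_drift(versioned, live)
    alerts = [f"unversioned rule in live state: {rule}" for rule in drift["unversioned"]]
    alerts += [f"versioned rule missing from live state: {rule}" for rule in drift["missing"]]
    return alerts


# Illustrative values only.
versioned_rules = frozenset({Rule("tcp", 443, "0.0.0.0/0")})
live_rules = frozenset({Rule("tcp", 443, "0.0.0.0/0"), Rule("tcp", 8443, "10.0.0.0/16")})

alerts = drift_alerts(versioned_rules, live_rules)
```

Run before each sync, a check like this would flag a manually added rule while it is still in place, rather than after it has been overwritten.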

Posted May 11, 2020 - 19:48 UTC

Resolved
The unavailability has been fixed and the access to Pipefy’s web application has been restored.
As soon as the investigation process is over, we’ll share further details about the causes, implemented fixes and preventive actions to be implemented.
Posted May 06, 2020 - 18:38 UTC
Monitoring
The technical team has identified the causes of the system unavailability and a fix has been applied to resolve it.
Message:
The access to Pipefy’s web application has been restored. We are currently monitoring the system to ensure all features are working as expected. As soon as the monitoring process is over we’ll share further details about the causes, investigation and preventive actions to be implemented.
Posted May 06, 2020 - 18:32 UTC
Update
Pipefy’s technical team is still investigating the causes of the platform outage. A final diagnosis is still to be established. We’re sorry about the inconvenience this is causing, and we assure you that we’re working with all our resources to restore normal access and isolate the causes of this incident.
Posted May 06, 2020 - 18:22 UTC
Investigating
We are currently investigating the system unavailability and working towards restoring full access to all users as soon as possible.
Any further details about the system status, investigation and preventive actions to avoid future incidents will be shared as soon as available.
Posted May 06, 2020 - 18:15 UTC
This incident affected: Application.