Emails are not being sent from Pipefy
Incident Report for Pipefy
Postmortem

Service Degradation Post-Mortem

Authors: SRE, Security and Support Teams

Date: 2021-04-28

Status: Resolved 

Summary: 

Some users experienced a partial outage of Pipefy's email-related features from Apr 28 15:58 (UTC) to Apr 28 16:34 (UTC).

This instability was caused by a scheduled migration, started on Apr 18, with the third-party vendor that handles email for our platform. The migration was planned to run in stages to avoid any issues that could impact users of these services. The team also had a fallback plan to quickly roll back changes in case of issues.

Due to a misinterpretation of communications, there were misalignments in the actions taken, which made it necessary to reset the account and apply important DNS changes. Because of the mechanism called TTL ("Time to Live"), which establishes how long DNS records stay in cache before being refreshed, some users were able to use the feature normally within minutes, while others with a longer TTL experienced issues for about an hour.
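
The effect of TTL described above can be illustrated with a minimal sketch. The function name, the timestamps, and the TTL values below are hypothetical and chosen only for illustration; they are not the values involved in this incident.

```python
def seconds_stale_after_change(cached_at: float, ttl: float, changed_at: float) -> float:
    """Return how long a resolver keeps serving the old DNS record
    after the record was changed, given when it cached the old record."""
    expires_at = cached_at + ttl
    # If the cache entry already expired before the change, there is no stale period.
    return max(0, expires_at - changed_at)

# Hypothetical resolvers: both cached the old record 100 seconds before the change (t = 0).
short_ttl = seconds_stale_after_change(cached_at=-100, ttl=300, changed_at=0)
long_ttl = seconds_stale_after_change(cached_at=-100, ttl=3600, changed_at=0)

print(f"short TTL resolver sees the change after {short_ttl} s")
print(f"long TTL resolver sees the change after {long_ttl} s")
```

Under these assumed values, the resolver with the shorter TTL picks up the new record within minutes, while the one with the longer TTL keeps serving the old record for most of an hour.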

Impact: 

Major - Several users experienced a partial outage of the email-related features from Apr 28 15:58 (UTC) to Apr 28 16:34 (UTC).

Root Cause: 

The root cause of this incident was identified as an operational error in the procedures agreed upon with the third-party email service.

Detection and resolution: 

The issue was detected by our internal monitoring system, which triggered an alert and informed the team. We also received reports from internal and external clients.

After the root cause was identified, our team was able to quickly connect with the vendor and apply the correct IP settings to the subaccounts in the system.

Action plan: Preventive action items

1. Review and improve internal email error monitoring systems so that any abnormality is identified immediately. Due date: 06/21

Posted May 10, 2021 - 17:41 UTC

Resolved
Pipefy email features are now stable.
Posted Apr 28, 2021 - 16:34 UTC
Monitoring
We have applied the fix and emails are now being sent. Emails that were sent during the incident may arrive with some delay. No information has been lost.
Posted Apr 28, 2021 - 16:19 UTC
Identified
We have identified what caused the issue and are now working to apply a fix.
Posted Apr 28, 2021 - 16:11 UTC
Investigating
We are currently investigating this issue.
Posted Apr 28, 2021 - 16:04 UTC
This incident affected: Apps (Email messaging, Email template).