Stack 1 - Node 2 [Part 2]

Incident Report for Zoey

Resolved

Node 2 is now fully restored. Our engineers are monitoring the servers to ensure connectivity is maintained.

The root cause of this incident appears to be the failure of one of the three control clusters (Node 1/2/3) at around 8am ET on July 5, 2016, for reasons that are unknown at this time. Zoey has multiple computing clusters, which we call "Stacks". Each Stack is controlled by three control nodes that provide critical functionality, such as network routing, to the underlying computing nodes in the cluster. A single failure of a control cluster should not have caused downtime, but it appears to have done so in this case. Our Engineering team is currently rebuilding Stack 1 to conform to our newest architecture, which will resolve this type of problem going forward. We expect the rebuild of Stack 1 to be completed by the end of August 2016, and it should have no impact on our customers. We thank you for your patience during this incident and apologize for the frustration it caused.

Posted Jul 05, 2016 - 12:00 EDT

Update

We continue to troubleshoot the stability of Node 2. No further updates at this time.

Posted Jul 05, 2016 - 11:11 EDT

Monitoring

We have identified the core routing issue for Node 2 and are now resolving it. Thank you for your patience.

Posted Jul 05, 2016 - 10:29 EDT