Log Outs Occurring due to AWS Session Timer Issue
Incident Report for foreUP
Postmortem

Our Linux platform is configured to update automatically to help prevent problems and ensure our operating systems have the latest security updates and protection. A recent update occurred on Thursday (2 Feb) morning. The Saturday following the update some minor behavioral issues began to surface related to user sessions.

We immediately began to investigate the issues and were able to identify that they arose every time a new server was added to our web cluster. The problem would disappear after the server had been online for about 15 minutes. It was difficult to identify the root cause earlier as we automatically add servers at 4:00 and 6:00 am during non peak hours in preparation for busier traffic throughout the day. This means that most people likely never had or noticed any issues previously because the servers would be fully operational by peak hours. On Saturday however, with warmer weather, we had an increased surge of traffic that triggered our cluster to add two additional servers at around 10 AM. These two servers, for roughly 15 minutes, were causing these session errors.

During the access issue on Saturday our engineering team was working hard to identify and resolve the issues. Not yet knowing the root cause of the issue they spun up 6 new servers and killed the old servers in an attempt to fix the problem. The result ended up making things worse as all the 6 new servers would be problematic for 15 minutes. The engineering team continued to investigate any possible causes and eventually the issue subsided as the servers had become operational for more than 15 minutes.

Working into the night on Saturday the engineering team identified the problem was a bug in the latest Amazon AWS platform update. AWS had disabled the automatic time sync which caused all the new servers to be out of sync with the master clock by about 1 second. After 20 minutes they'd resync the clock and the problem would fix itself. We reported the bug to Amazon AWS and received confirmation from them that it was an actual issue and their development team would begin working out a resolution. We are currently awaiting that resolution.

What has foreUP done: 1. Bug reported to AWS. 2. Rolled back all clusters to version 2.3.0 3. Turned off automatic updates and will check alternate solutions with AWS support once the issue has been rectified by Amazon.

Posted Feb 09, 2017 - 10:24 MST

Resolved
We have identified the issue being related to do with individual instances not being able to verify correctly the session id. We have resolved the issue, have been monitoring the situation, and all is functioning as intended. Please use foreupsoftware.com moving forward.

If you have further issues please contact support@foreup.com.
Posted Feb 04, 2017 - 13:04 MST
Identified
The issue has been identified and a fix is being implemented.
Posted Feb 04, 2017 - 11:56 MST
Update
You can continue operating with no issue by going to app.foreupsoftware.com temporarily.

The engineering team have added more server support to help reduce the issue and are still working to rectify the main issue on foreupsoftware.com. You will be able to return to foreupsoftware.com shortly.
Posted Feb 04, 2017 - 11:56 MST
Investigating
We are investigating reports that the software is automatically logging people out while they are trying to perform certain functions. The engineering team is aware and working to diagnose and resolve the issue as quickly as possible.
Posted Feb 04, 2017 - 11:49 MST