Calendars not loading
Incident Report for CoSchedule
Postmortem

UPDATE: As you know, CoSchedule experienced a major outage yesterday afternoon that resulted in significant data loss for our customers. At that time, I reported that a team of engineers would be working through the night to restore as much data as possible. This morning I have some good news.

Our engineering team was able to recover the majority of the most valuable data for our existing customers.

  • We believe that we were able to restore all content, WordPress posts, tasks, comments, marketing projects, social messages, and content created using the CoSchedule text editor or other editing tools like Google Docs and Evernote for all customers whose calendars were created before 2:00 am CST on November 16th.

  • If you had a new calendar that was created on November 16th between 2:00 am CST and 2:00 pm CST, your data will still be missing, as it was not automatically restored during our overnight recovery operation. That said, select elements may still be manually retrievable. If you believe that you are still missing important data, please reach out to our support team immediately. We will do everything we can to help you.

  • Our engineering team will continue working on our restore and recovery efforts over the next several days. We have also assembled a team to implement a series of infrastructure upgrades designed to prevent situations like this in the future.

We have every reason to believe that total prevention is possible.

I want to take a moment to say again that we are very sorry for any inconvenience, frustration, or undue stress that this event caused. We expected better of ourselves, and we failed. You should be able to assume that your data is 100% safe and secure on our service. Unfortunately, we did not live up to that expectation this time. We will do better.

Over the next several days, our team will continue to do all we can to remedy any remaining issues that you have resulting from this event. We also plan to update you on the infrastructure upgrades that are already in the works once they are complete. We broke your trust, but we fully intend to restore it over the next several days and weeks.

Justin Walsh, CTO and Cofounder | CoSchedule

Posted Nov 17, 2017 - 09:41 CST

Resolved
This incident has been resolved.
Posted Nov 16, 2017 - 17:04 CST
Update
As you may be aware, CoSchedule was down for an extended period of time earlier today. I want to let you know that we have identified the source of the problem, and have applied a permanent fix. CoSchedule is back online. That said, unfortunately today’s incident did result in some data loss that you need to be aware of.

----------------------------------------------------
What You Need to Know:
----------------------------------------------------

Due to the nature of the incident, we were required to restore our system from a restore point of 2:00 am CST on November 16.

While we have reason to believe that we can recover some of the data that was lost over the next 24 hours, you need to be aware that significant data from your calendar may be missing, and may never be restored.

You should assume:
* Any NEW social messages, content, projects or tasks you may have created after 2:00 am CST on November 16 will be missing from your calendar and will need to be recreated.
* Any messages or content created before 2:00 am CST on November 16 will still be on your calendar. If you are missing a WordPress post in CoSchedule, please open that post in WordPress and it will sync back to your calendar.
* Any social messages that were scheduled to publish between 2:00 pm and 4:00 pm CST on November 16 will need to be rescheduled.

We understand that this is extremely frustrating. We take your data, security, and trust in us very seriously, and while we know that companies always say that, please understand that we care deeply about our product, and the customers we serve. You rely on CoSchedule for your work and livelihood, and we do not take that for granted. We are already looking into ways to prevent this type of incident in the future.

----------------------------------------------------
What Happened:
----------------------------------------------------

I want to emphasize that today’s issue was related to a database failure, and was not the result of a security breach of any kind.

At approximately 2:00 pm CST we learned that significant amounts of data had become corrupted on our primary database. To prevent further damage, we immediately took CoSchedule offline.

For many years we have maintained a redundant (backup) database. This backup generally allows us to restore corrupted data instantly with minimal data loss. In this instance, however, we learned that data loss had happened there as well, rendering it useless.

Within a few minutes, we were able to begin the restore process from our third layer of protection. This restore was provided by our overnight backup system, which had a restore point of 2:00 am CST on November 16. This means that while we were able to restore the vast majority of data, we did experience an irreversible loss of some customer data.

We want to leave no doubt: This was our fault, and it should never have happened. I promise you, we will make sure that it doesn’t happen again.

We appreciate your help and patience while our team worked on a fix, and we apologize for the downtime and resulting data loss. We take the responsibility for the data you trust us with very seriously. We are assessing the incident and making changes to make sure something like this does not happen again.

For now, we are confident that we have mitigated the issue. We have no reason to believe that a recurrence of this issue is possible; however, our engineering team will be monitoring things closely over the next few days.

If you have further questions or concerns, please feel free to reach out to our support team at support@coschedule.com.
Posted Nov 16, 2017 - 17:04 CST
Update
The system has been restored and we are still monitoring the situation. We will publish a full report as soon as we finalize the details.
Posted Nov 16, 2017 - 16:04 CST
Update
We are continuing to implement a fix and will have another update in a few minutes.
Posted Nov 16, 2017 - 14:53 CST
Update
We are continuing to monitor the build for the fix. More details when the fix is live.
Posted Nov 16, 2017 - 14:52 CST
Monitoring
We are continuing to monitor the situation. More details to follow.
Posted Nov 16, 2017 - 14:28 CST
Identified
Some users are reporting not being able to load their calendars. We have identified the issue and are working on a fix.
Posted Nov 16, 2017 - 14:11 CST