Last weekend, we performed a datacenter fail-over exercise to validate our disaster recovery capabilities. Today (Monday, April 15th), some of our customers experienced performance issues affecting the core application, reporting, collaboration features, and workflow rules. Three of these issues were related to the datacenter fail-over. We responded to each issue with the utmost urgency, and have identified and addressed the root cause of each one. We understand the pain and frustration these issues have caused, and we are working to avoid future performance issues.
We know that platform stability and reliability are key to earning your trust and keeping it. We continue to make significant investments in Smartsheet’s reliability. Specifically, we’re prioritizing investments in monitoring, alerting, and telemetry tools that will help prevent issues like those we experienced today from recurring. We also have dedicated members of our site reliability team who are committed to building additional automated scripting and QA validation for all of our infrastructure code. Our intent is to continuously improve the repeatable, reliable infrastructure components on which we build and run our application.
For updates on our progress in resolving these performance issues, please see our Status page.