Digital workplaces help teams remain productive, connected and informed. That is, until the system goes down.
A Microsoft 365 outage in late November reminded us of our reliance on these SaaS platforms and why a contingency plan is needed to avoid work grinding to a halt. To be clear: the Microsoft outage is only one case among many — all cloud-based solutions are susceptible.
November's Microsoft 365 Outage
Microsoft 365 experienced a significant outage on Nov. 25 that disrupted key applications and services:
- Exchange Online: People couldn't access emails via Outlook on the web or desktop.
- Microsoft Teams: The outage blocked access to calendars and core functionalities.
- SharePoint Online: People reported issues with document management and sharing.
- Outlook: Many users struggled to send emails and access calendars.
The disruptions affected multiple regions, including the U.S., Europe and India. Microsoft said approximately 98% of affected environments were fixed by noon on Nov. 25, and by Nov. 26 it announced the services were largely restored. However, some users continued to experience delays in email delivery. The outage followed a similar major incident in July 2024, attributed to a DDoS attack that affected multiple Microsoft services.
In that case, communication apps were down for four hours or longer. The outage left organizations scrambling to function without these crucial tools and apps.
Have a Contingency Plan in Place
The November outage was bad, admitted MeetAmi Innovations CTO Vaclav Vincalek, but he added that organizations should already have had contingency plans in place.
Robust communication protocols, resilient IT infrastructure and leadership transparency will help organizations navigate such events, said Vincalek. In terms of communications specifically, he advised companies to have alternative tools in place to maintain team collaboration and to opt for phone calls, SMS or social media platforms like LinkedIn for external communications, depending on the urgency and confidentiality of the situation.
“Leadership plays a vital role in maintaining employee morale through transparent updates and reassurances,” he said. “Encouraging teams to focus on tasks that do not depend on the affected platform, while fostering a calm and solution-oriented atmosphere, further helps keep productivity on track.”
Repeated outages can erode employee morale, frustrate customers and result in financial losses, he continued. Long-term, organizations may face strategic decisions about continuing with platforms prone to frequent downtime.
“System outages like Microsoft 365's recent disruptions highlight the critical need for resilience in digital infrastructure, clear communication and proactive planning,” he added. “Organizations that invest in these areas can better navigate such challenges and minimize their impact on operations.”
A Two-Sided Approach to Resilience
Resilience isn’t about eliminating downtime — it’s about minimizing its impact, JoggAI founder Anbang Xu told Reworked. The most effective organizations approach resilience as a proactive process rather than a reactive one, he continued.
Organizations should design infrastructure with redundancy and adaptability in mind, he said. This involves using multiple cloud providers to ensure failover capabilities as well as training teams to shift workflows seamlessly between tools.
He shared JoggAI's policy as an example. If its project management system goes offline, the company has templates and workflows ready to deploy in an alternative platform, such as Notion or even offline spreadsheets.
It's an approach Splashtop CEO and co-founder Mark Lee also advocates, suggesting organizations proactively sync or replicate critical content to local computers to maintain business continuity and minimize downtime.
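For teams that want to see what replicating critical content to a local computer might look like in practice, the following is a minimal Python sketch, not a description of any vendor's product. It assumes the critical files are already reachable as an ordinary folder (for example, a synced drive), and the function name `replicate` and the folder names are hypothetical.

```python
import shutil
from pathlib import Path


def replicate(source: Path, mirror: Path) -> list[str]:
    """Copy new or changed files from source into mirror; return what was copied.

    Hypothetical helper for illustration. It only adds or refreshes files and
    never deletes anything from the mirror.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = mirror / rel
        # Copy when the mirror lacks the file, or holds an older or different-sized copy.
        if (
            not dst_file.exists()
            or dst_file.stat().st_mtime < src_file.stat().st_mtime
            or dst_file.stat().st_size != src_file.stat().st_size
        ):
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 also preserves the modification time
            copied.append(rel.as_posix())
    return copied
```

Run on a schedule while the primary platform is healthy, a script along these lines would leave a recent local copy available when the platform is not. A real deployment would also need to consider access controls and how sensitive the replicated files are.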
The emergence of generative AI has now made it possible to use AI-driven monitoring tools to track and prepare for outages, continued Xu. “Predictive analytics can identify potential system failures before they occur, allowing businesses to intervene early,” Xu said, but he reiterated tools were only one part of the plan. Resilience also includes training employees to adapt quickly, empowering them to make decisions independently, and ensuring every team member knows the fallback plan.
“In my opinion, resilience isn’t just technical, it’s cultural. Teams that see challenges as opportunities will weather outages far better than those who panic,” he added.
The Role of Leadership
Prolonged losses of service require leaders to balance employee morale and motivation with empathy and clear guidance, Rivermate CEO Lucas Botzen advised.
“Leaders should first acknowledge the frustrating reality that such disruptions cause and keep employees updated on the situation to reassure them the issue is being worked on,” he said. They should then switch focus to tasks that can be performed without the affected systems.
A pre-existing investment in contingency plans helps organizations weather these outages, Botzen said. He seconded many of the suggestions Xu and Vincalek shared, namely having alternative tools and platforms in place, preparing employees with regular training sessions and maintaining detailed incident response documentation. A full risk assessment should be conducted periodically to identify vulnerabilities and reinforce critical systems.
Organizations must prioritize redundancy and regular system testing to develop more resilient digital infrastructure. Implementing backup servers, secondary productivity tools and failover systems ensures essential functions continue during outages.
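The failover idea can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions rather than a production design: the function name `pick_available` is hypothetical, and each health check is assumed to be a function that returns True when its tool is reachable. The list is ordered by preference, so the first tool that passes its check is the one teams fall back to.

```python
from typing import Callable, Optional


def pick_available(services: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Return the name of the first service whose health check passes.

    Hypothetical helper for illustration; returns None if every check fails.
    """
    for name, is_healthy in services:
        try:
            if is_healthy():
                return name
        except Exception:
            # A check that raises (timeout, DNS failure) is treated as unhealthy.
            continue
    return None
```

Testing such a routine regularly, with the primary check deliberately forced to fail, is one way to confirm that the fallback order still matches the tools a team actually has access to.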
Repeated system failures can have serious long-term consequences, including diminished employee trust in leadership, lower morale and potential reputational damage with clients and stakeholders. In financial terms, prolonged outages could result in lost revenue and higher operational costs to mitigate the impact, Botzen added.