How Remote Work Leaders Should Respond to a Service Outage
When three of the world's major social media platforms are grouped under one roof, what could go wrong?
We found out on Oct. 4, 2021, when an outage took Facebook offline for several hours, along with Instagram and WhatsApp, two platforms Facebook also owns.
Fortune estimated that the company lost almost $100 million in revenue during the outage. As individuals and businesses around the world scrambled to understand what went wrong, many realized that if one of the largest platforms in the world could go down, the same could happen to their own businesses.
Other organizations may not face losses of that magnitude, but a similar outage can still hurt them, and the event offered some useful lessons.
What Happened When Facebook Was Down?
When Facebook disappeared from the internet, web infrastructure firm Cloudflare described it as if someone had pulled the cables from Facebook's data centers all at once. As a result, millions of people could not reach any of Facebook's applications, no matter what they tried.
But what did this look like for businesses? For many, it wasn’t as terrible as it may have seemed.
“The biggest effect on our business was that our Facebook ads stopped working, which made no significant impact on our growth and performance,” said Snizhana Kolomiets, marketing manager at Brighton, England-based Better Proposals.
For Better Proposals, the outage was essentially a communication problem. "For our customers, the biggest issue was not being able to access our Facebook group, but they contacted us through other channels for any questions they had," she said.
Still, while the outage was more an inconvenience than a serious problem for most businesses, it was a warning about what can happen when a business puts all of its eggs in one basket.
Remote Work Disaster Prep
The Facebook outage highlighted risks that remote teams face and served as a timely reminder to have a clear disaster recovery plan. Here are some guidelines for building one:
Create a Plan
First and foremost, remote organizations should create a crisis plan that spells out what needs to be done if an IT outage occurs. The plan should include instructions on where copies of critical information are stored and the steps to take if an issue arises.
Start With Security Protocols
According to Catalino Vega III, security analyst at Bristow, Va.-based consulting company Kuma LLC, a plan should begin with the security protocols that will be put in place ahead of time and with how any immediate risks can be reduced.
“Things like separation of duties, business impact assessments, change control, and security reviews ensure a layered approach towards security,” he said.
Security may be an afterthought for some members of an organization, but the key is to keep these measures up to date and ready in case they are needed.
Use Artificial Intelligence
Downtime equals dollars lost, said Phil Tee, CEO and co-founder at Moogsoft, a San Francisco-based platform for IT operations.
“When it comes to web applications that manifest thousands, perhaps millions, of microservices and component parts that change and swirl second to second, trying to keep track of that with rules or manual searching of logs is an impossible task,” he said.
He recommended that businesses use AIOps, which applies artificial intelligence to IT operations, to diagnose problems and find solutions faster.
Leverage Omnichannel Communication
Perhaps most critically, any disaster preparation plan should include alternatives for communication. “You need an omnichannel approach to communicating with your customers,” Kolomiets said.
Customers need to be able to reach a business even when one channel goes dark. Internally, that means having alternative channels in place in case one suddenly becomes unavailable.
How to Recover If Problems Occur
Assess Damage and Determine the Cause
If a problem does occur, the first step is to assess the damage and identify the root cause. Time is money in these situations, so it's important to get to the bottom of the problem as quickly as possible.
Bring Team Members Together and Implement the Disaster Plan
Use alternative communication platforms to bring key team members together, review the recovery plan and determine the next steps.
“When teams are working remotely, and you can’t quickly convene in a conference room, a platform that can facilitate collaboration and communication is key to improving results,” said Tee.
After that, alert internal and external stakeholders to the problem as necessary. If the downtime lasts only a few minutes, there might be no need to panic. But when an outage stretches into hours, or it's unclear when the problem will be fixed, it's critical to let staff and customers know. For internal teams, that means prioritizing the work that can still be done and reorganizing around what can't.
Disaster can strike at any time, but having a plan and alternatives in place can make all the difference.