Facebook and its associated services experienced a six-hour outage on Monday.
Facebook, Instagram, WhatsApp, and Oculus suffered a major outage on Monday, Oct. 4. Beginning just before noon ET, the outage lasted about six hours, preventing users from accessing their social media profiles, messaging apps, and Oculus-based services.
How Did This Happen?
In a blog post issued Monday night, Facebook reported that the outage was the result of a configuration change to its routers.
This “networking traffic issue” resulted in a cascading disruption to the way the company’s data centers communicate. What should have been a routine Border Gateway Protocol (BGP) update wiped out the routing information that allows other networks to find sites like Facebook and Instagram.
This was especially disruptive to Facebook’s internal processes because the company relies on its own platform. Staff members communicate using Facebook Messenger, which was unavailable while they were trying to work out a resolution, and that further slowed the recovery.
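To make the routing failure concrete, here is a minimal Python sketch of what happens when a route is withdrawn. It is a toy model, not real BGP and not Facebook’s configuration: the RoutingTable class, the "example-edge-router" label, and the 203.0.113.0/24 prefix (a range reserved for documentation) are all assumptions made for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy stand-in for the routes other networks learn from announcements."""

    def __init__(self):
        self.routes = {}  # maps a network prefix to a next-hop label

    def announce(self, prefix, next_hop):
        # Record that addresses in this prefix are reachable via next_hop.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # Remove the route; nothing happens if it was never announced.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        # Return the next hop for the most specific matching prefix, or None.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("203.0.113.0/24", "example-edge-router")  # hypothetical route
before = table.lookup("203.0.113.10")

table.withdraw("203.0.113.0/24")  # the update removes the route
after = table.lookup("203.0.113.10")

print(before)  # a next hop exists while the route is announced
print(after)   # no route is left once it has been withdrawn
```

The servers behind the prefix never went away in this sketch. Once the route is gone, other networks simply have no path to them, which is why the sites looked as if they had vanished.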
3 Lessons To Learn From The Facebook Outage
This is the worst outage Facebook has experienced since 2019, when its services were offline for more than a day. While Facebook services are nonessential, they do offer a low-cost platform for small businesses, which are undoubtedly hit the hardest during outages like these.
Here are three key takeaways:
Routine processes cannot be overlooked: If something as minor as a BGP update can take a business like Facebook offline for a quarter of a day, consider what is at stake at your business. Anytime your staff interacts with business systems and data, you need to ensure they’re doing so carefully. One wrong click could result in lost data or a period of downtime.
Prepare for the worst: No matter how big or how resource-rich a business is, it can still experience crippling downtime. Make sure to invest the right time and resources in developing your business continuity and disaster recovery processes.
Always have a plan: Facebook has learned not to rely on its own platform as an internal messaging solution. Do you rely too heavily on one tool as well? Make sure you have an alternative ready to go so that you don’t have to start from scratch when the worst occurs.
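The third takeaway can be sketched in a few lines of Python. This is only an illustration of the idea of a prearranged backup channel: the function names, the channel names, and the simulated failure are hypothetical, and a real setup would call whatever chat, SMS, or phone-tree service your business actually uses.

```python
def send_with_fallback(message, channels):
    """Try each (name, send) pair in order; return the name that worked."""
    for name, send in channels:
        try:
            send(message)
            return name
        except ConnectionError:
            continue  # this channel is down, so move on to the next one
    raise RuntimeError("all channels failed")


def primary_chat(message):
    # Simulates the main tool being caught up in the same outage.
    raise ConnectionError("primary platform unreachable")


sms_outbox = []


def backup_sms(message):
    # Stand-in for an independent channel on separate infrastructure.
    sms_outbox.append(message)


used = send_with_fallback(
    "Incident bridge is open",
    [("primary_chat", primary_chat), ("backup_sms", backup_sms)],
)
print(used)
```

The point of the ordering is that the backup is decided before the outage, not during it. It only helps if the backup does not depend on the same systems as the primary tool.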
As always, events like this prove that IT is an error-prone and complicated part of the business world. You need to plan ahead to both prevent downtime and mitigate the effect it has on your business when it can’t be avoided.