What is BGP and why does it matter?
Before diving into the outage, you need a quick mental model of BGP — the Border Gateway Protocol. BGP is the routing protocol that holds the internet together. Every major network on earth — ISPs, cloud providers, enterprise networks, Facebook — is identified by an Autonomous System Number (ASN). Facebook’s ASN is AS32934.
Each Autonomous System uses BGP to announce to its neighbors: “Hey, if you want to reach IP addresses in these prefixes, send the traffic to me.” Those announcements propagate across the entire internet in a matter of minutes. When a network withdraws its BGP announcements, every other network stops knowing how to reach it. It simply vanishes from the internet’s routing table — as if it never existed.
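The announce/withdraw mechanics can be sketched in a few lines. This is a toy model, not real BGP (which is a stateful path-vector protocol); a plain dict stands in for one router's view of the routing table, using the prefix and ASN from this article:

```python
# Toy model of BGP announce/withdraw. A dict stands in for a router's
# routing table; real BGP carries far more state per route.
routing_table = {}

def announce(prefix, asn):
    """A neighbor announces: send traffic for this prefix to me."""
    routing_table[prefix] = asn

def withdraw(prefix):
    """The neighbor withdraws the route; we forget how to reach it."""
    routing_table.pop(prefix, None)

announce("129.134.0.0/17", "AS32934")
print(routing_table.get("129.134.0.0/17"))  # AS32934

withdraw("129.134.0.0/17")
print(routing_table.get("129.134.0.0/17"))  # None — the prefix has vanished
```

The asymmetry is the point: nothing distinguishes a deliberate withdrawal from an accidental one. Once the route is gone, traffic has nowhere to go.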
This is exactly what happened to Facebook on October 4, 2021. And it happened because of a single maintenance command.
The sequence of events
11:51 AM UTC — The command goes out
Facebook’s network engineers were performing routine maintenance on the backbone routers that interconnect Facebook’s global data centers. The command they issued was intended to assess the capacity of the backbone fiber links, but it unintentionally took down all of the backbone connections at once. An audit tool was supposed to block commands with that kind of blast radius; a bug in the tool let this one through, and every backbone router effectively went offline simultaneously.
When those routers went down, Facebook’s global network fractured into isolated islands. With the backbone down, the BGP sessions that Facebook’s routers maintained with external internet providers dropped as well. And when a BGP session tears down, the routes learned over it are withdrawn with it.
Within seconds — BGP withdrawals propagate worldwide
Facebook’s IP prefixes — representing millions of IP addresses — were withdrawn from the global BGP routing table. Within about two minutes, network operators worldwide saw the changes propagate. Tools like BGPlay and RIPE’s routing data showed Facebook’s prefixes simply disappearing:
# What the global routing table looked like before:
# 129.134.0.0/17 via AS32934 (Facebook)
# 157.240.0.0/17 via AS32934 (Facebook)
# 31.13.24.0/21 via AS32934 (Facebook)
# After the BGP withdrawal — gone.
# No route to Facebook's IP space exists anywhere on the internet.
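What “gone” means mechanically is that longest-prefix match, the lookup every router performs per packet, finds no entry at all. A minimal sketch using Python’s standard `ipaddress` module, with the prefixes shown above:

```python
import ipaddress

# Routing table before the withdrawal (prefixes from the article).
routes = {
    ipaddress.ip_network("129.134.0.0/17"): "AS32934",
    ipaddress.ip_network("157.240.0.0/17"): "AS32934",
    ipaddress.ip_network("31.13.24.0/21"): "AS32934",
}

def lookup(ip, table):
    """Longest-prefix match: the most specific route containing ip wins."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    return table[max(matches, key=lambda n: n.prefixlen)] if matches else None

print(lookup("157.240.1.35", routes))   # AS32934 — traffic has somewhere to go

routes.clear()                          # the BGP withdrawal, in miniature
print(lookup("157.240.1.35", routes))   # None — no route to Facebook's IP space
```

With no matching route, packets destined for those addresses are simply dropped at the first router that handles them.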
DNS falls next
Here’s where it gets elegantly catastrophic. Facebook runs its own authoritative DNS servers for facebook.com, instagram.com, and whatsapp.com. Those DNS servers sit on IP addresses that are part of Facebook’s now-withdrawn BGP prefixes. So when your browser asked “what’s the IP address for facebook.com?”, the query reached the world’s DNS resolvers — but those resolvers couldn’t reach Facebook’s authoritative nameservers, because there was no longer a route to get there.
DNS queries timed out, and recursive resolvers around the world returned SERVFAIL. Browsers showed “This site can’t be reached.” Not an HTTP error. Not a 503. Just nothing, as if facebook.com had never been registered.
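The dependency chain is worth making explicit: DNS sits on top of IP routing, so no route to the nameserver means no answers, regardless of whether the nameserver itself is healthy. A deliberately simplified sketch (the resolver logic and the IP address are illustrative, not Facebook’s actual values):

```python
# Why DNS failed: resolution needs an IP route to the authoritative
# nameserver before a single record can be read. The address below is
# made up for illustration.
AUTHORITATIVE_NS = {"facebook.com": "129.134.30.12"}
routes_withdrawn = True  # Facebook's prefixes are gone from the global table

def resolve(name):
    ns_ip = AUTHORITATIVE_NS[name]
    if routes_withdrawn:
        # Packets to ns_ip have no route; the query times out and the
        # recursive resolver returns SERVFAIL to the client.
        return "SERVFAIL"
    return f"answer from {ns_ip}"

print(resolve("facebook.com"))  # SERVFAIL
```

Note that the nameservers were running fine the entire time. The failure was purely at the routing layer beneath them.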
The engineers couldn’t log in to fix it
This is the detail that makes the story truly remarkable. Facebook’s internal operations tools — the dashboards, the remote access systems, the configuration management platforms — were all hosted inside Facebook’s own network. With the backbone down and BGP routes withdrawn, engineers sitting at home or in offices around the world couldn’t reach any of Facebook’s internal systems.
It gets worse. The physical access control systems at Facebook’s data centers also ran on Facebook’s internal network. Engineers who drove to the Santa Clara data center found that their employee badges didn’t work. The door readers couldn’t phone home to validate credentials. Some engineers reportedly had to be physically escorted through security checkpoints by staff who could manually verify their identity.
The network engineers who needed to fix the routers couldn’t get into the building to reach the routers. The routers that were down were the same ones controlling the doors.
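That trap is a dependency cycle, and cycles like it are findable before an outage, not just during one. A small depth-first search over a hypothetical dependency graph (the node names below are my labels for the systems described above, not Facebook’s):

```python
# Hypothetical dependency graph for the recovery paths described above.
deps = {
    "fix_routers": ["physical_access"],
    "physical_access": ["badge_system"],
    "badge_system": ["internal_network"],
    "internal_network": ["backbone_routers"],
    "backbone_routers": ["fix_routers"],   # the routers are what's broken
}

def find_cycle(graph):
    """DFS cycle detection: return a path that loops back on itself, or None."""
    def visit(node, stack):
        if node in stack:
            return stack[stack.index(node):] + [node]
        for dep in graph.get(node, []):
            cycle = visit(dep, stack + [node])
            if cycle:
                return cycle
        return None
    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

print(" -> ".join(find_cycle(deps)))
```

Auditing recovery procedures as a graph like this is one way to catch “the fix depends on the thing that’s broken” while it is still a diagram rather than a locked door.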
5:58 PM UTC — BGP routes return
Once engineers gained physical access to the backbone routers and could apply configuration changes directly via console cables, the fix itself was not complicated: re-advertise the BGP routes. As soon as the routes were injected back into the global routing table, they propagated across the internet in minutes. DNS started resolving again. Services came back online. The outage had lasted just over six hours.
The real-world impact
The scale of a Facebook outage is hard to overstate. At the time, Facebook had approximately 3.5 billion monthly active users across its family of apps. The October 2021 outage affected all of them simultaneously:
- Facebook, Instagram, WhatsApp, Messenger, and Facebook Workplace were all unreachable for the full duration.
- Businesses that used WhatsApp for customer communications (especially prevalent across Asia, Europe, and Latin America) lost their primary communication channel.
- Millions of websites use “Login with Facebook” — those login flows also broke, locking users out of third-party services entirely.
- Facebook’s stock dropped approximately 5% during the outage, wiping around $7 billion from Mark Zuckerberg’s net worth in a single day.
- Lost revenue was estimated at up to $100 million, measured against Facebook’s roughly $86 billion in annual revenue at the time.
Meanwhile, Telegram reported 70 million new sign-ups during the outage period. Twitter usage spiked. The internet found a way, as it always does.
What Facebook’s post-mortem revealed
Facebook published a detailed post-mortem shortly after the outage. The key findings were:
- The maintenance command was supposed to assess backbone capacity, but it unintentionally took down all of the backbone connections.
- The audit tool that should have blocked a command affecting all backbone routers simultaneously failed to stop it because of a bug; the safeguard existed, but it did not fire.
- The loss of the backbone also broke Facebook’s internal out-of-band management network, which was supposed to survive backbone failures but had a dependency on the same infrastructure.
- Facebook’s DNS servers withdraw their own BGP announcements by design when they cannot reach the data centers, a health check intended to stop unhealthy edge nodes from serving answers. With the backbone down, every DNS node declared itself unhealthy and withdrew, pulling Facebook’s authoritative DNS off the internet.
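The health-check pattern from that last finding is worth seeing in code, because individually it is reasonable: an edge node that cannot reach the data centers should stop attracting traffic. A simplified sketch (function and parameter names are mine, not Facebook’s):

```python
# Sketch of the health-check pattern described in the post-mortem: an edge
# DNS node withdraws its own anycast BGP announcement when it can no longer
# reach the data centers. Names here are hypothetical.

def dns_edge_health_step(can_reach_datacenters, announcing):
    """One iteration of a (simplified) edge DNS health check."""
    if not can_reach_datacenters and announcing:
        return "withdraw"   # stop announcing — we might serve stale answers
    if can_reach_datacenters and not announcing:
        return "announce"   # healthy again — re-advertise the prefix
    return "no-op"

# Backbone down: every edge node sees the data centers as unreachable,
# so every node withdraws at once, globally.
print(dns_edge_health_step(can_reach_datacenters=False, announcing=True))
```

The pathology only appears at scale: when the shared backbone fails, every node fails the same check at the same moment, and a per-node safety behavior becomes a global outage.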
Every one of these is a well-understood failure mode in network engineering. The tragedy is not that these risks were unknown — it’s that the safeguards had silently eroded.
Why this matters for your certification
If you’re preparing for CCNA 200-301, CompTIA Network+ N10-009, or AWS Advanced Networking Specialty ANS-C01, the Facebook BGP outage is a textbook-perfect case study. Here’s how it maps to exam domains:
CCNA and Network+: BGP fundamentals
Both CCNA and Network+ cover BGP at a conceptual level. Expect questions about:
- What BGP does: BGP is an Exterior Gateway Protocol (EGP) that exchanges routing information between Autonomous Systems. The version in use, BGP-4 (RFC 4271), is the only EGP used on the public internet today.
- BGP route withdrawal: When a BGP session drops, the routes learned via that session are withdrawn from the routing table. This is exactly what caused Facebook’s IP prefixes to disappear.
- The relationship between BGP and DNS: DNS resolution depends on IP reachability. If the authoritative DNS servers for a domain are unreachable at the IP layer, DNS fails entirely — even if the DNS servers themselves are running perfectly.
- AS numbers: Every BGP-speaking network is identified by a 16-bit or 32-bit ASN assigned by IANA/regional registries. Facebook is AS32934.
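One detail exams like to probe on ASNs is notation: 32-bit ASNs can be written either as a plain decimal (“asplain”) or as two 16-bit halves separated by a dot (“asdot”, per RFC 5396). A quick converter makes the relationship concrete:

```python
def asdot(asn: int) -> str:
    """Render an ASN in asdot notation (RFC 5396): 32-bit ASNs as high.low."""
    if asn < 65536:                     # fits in 16 bits — plain decimal
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"

print(asdot(32934))        # 32934 — Facebook's 16-bit ASN, unchanged
print(asdot(4200000000))   # 64086.59904 — a 32-bit private-use ASN
```

Recognizing both forms matters because vendor configurations and exam questions use them interchangeably.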
AWS ANS-C01: BGP in hybrid and cloud networks
The AWS Advanced Networking Specialty exam goes deep on BGP. The Facebook outage illustrates several ANS-C01 exam scenarios:
- AWS Direct Connect uses BGP to exchange routes between your on-premises network and your AWS VPC. If BGP sessions drop — due to a misconfiguration, a physical link failure, or a maintenance event — your hybrid connectivity fails the same way Facebook’s did.
- Route propagation in Transit Gateways relies on BGP under the hood. Understanding when routes are advertised versus withdrawn is critical for designing resilient multi-VPC architectures.
- BGP local preference and AS path prepending are the primary tools for influencing inbound and outbound traffic in hybrid AWS environments — topics that appear directly on the ANS-C01 exam.
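Those two attributes interact in a fixed order: local preference is compared first, and AS-path length only breaks ties. A simplified best-path sketch (real BGP has many more tie-breakers, and the ASNs and link names below are invented for illustration):

```python
# Simplified BGP best-path selection using only the two attributes named
# above: highest local preference wins, then shortest AS path.

def best_path(candidates):
    return max(candidates, key=lambda r: (r["local_pref"], -len(r["as_path"])))

routes = [
    # Primary hybrid link: the on-prem AS appears once in the path.
    {"via": "dx-primary", "local_pref": 100, "as_path": [65001]},
    # Backup link: prepended twice more to make it less attractive.
    {"via": "dx-backup",  "local_pref": 100, "as_path": [65001, 65001, 65001]},
]
print(best_path(routes)["via"])  # dx-primary — shorter AS path wins the tie
```

This is why prepending is only a hint: a remote network that sets a higher local preference on the backup path overrides any number of prepends, which is a classic ANS-C01 scenario.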
The operational lessons every networking engineer should internalize
- Never let your out-of-band management network share fate with your production network. Facebook’s OOB management ran on the same backbone it was managing. When that backbone failed, the recovery path disappeared with it.
- Blast radius matters. A single maintenance command should never be able to affect all routers simultaneously. Rate-limiting, canary deployments, and staged rollouts apply to network configuration just as much as software deployments.
- Physical access is a recovery path. Facebook engineers had to drive to a data center because remote access was gone. Ensuring physical access procedures work independently of the network being managed is an operational requirement, not a nice-to-have.
- Test your safeguards. The audit check that should have stopped the command from reaching all backbone routers failed silently. Safety mechanisms that aren’t regularly exercised eventually stop being safety mechanisms.
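The blast-radius lesson translates directly into a pre-flight check. This is my sketch of the general pattern, not Facebook’s actual tooling: refuse any command whose target set exceeds a fixed fraction of the fleet, forcing a staged rollout instead.

```python
# Hypothetical pre-flight guard: reject maintenance commands that target
# too large a fraction of the fleet in a single step.

MAX_BLAST_RADIUS = 0.10  # never touch more than 10% of routers at once

def check_blast_radius(targets, fleet):
    fraction = len(targets) / len(fleet)
    if fraction > MAX_BLAST_RADIUS:
        raise RuntimeError(
            f"refusing: command targets {fraction:.0%} of the fleet "
            f"(limit {MAX_BLAST_RADIUS:.0%}); stage the rollout instead"
        )
    return True

fleet = [f"bb-router-{i}" for i in range(50)]
check_blast_radius(fleet[:4], fleet)   # 8% of the fleet — allowed
try:
    check_blast_radius(fleet, fleet)   # 100% of the fleet — blocked
except RuntimeError as err:
    print(err)
```

The guard itself then becomes something to test regularly, per the last bullet above: a check that silently stops firing is indistinguishable from no check at all.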
The bigger picture
The Facebook BGP outage sits alongside the 2003 Northeast blackout and the 2010 Deepwater Horizon disaster as a masterclass in how cascading failures work. Each individual component — the maintenance command, the missing safety check, the out-of-band dependency, the badge readers — represented a manageable risk in isolation. Combined, they produced a catastrophic outcome that took the world’s largest social network offline for the better part of a working day.
BGP is often described as running the internet on “trust and good intentions” — it was designed in 1989 as a protocol between networks that agreed to exchange routes voluntarily, with no cryptographic authentication of route announcements. BGPsec and RPKI (Resource Public Key Infrastructure) exist to address this, and the Facebook incident accelerated industry adoption of RPKI route origin validation. But as of 2026, the majority of internet routing still runs on the honor system.
For anyone studying networking — whether for a certification or for their career — the core lesson is this: BGP is powerful precisely because it is simple. Announce a prefix, and the world routes to you. Withdraw it, and you vanish. Respect that power accordingly.