WFH made Facebook’s 7-hour outage worse with around 75% of its 60,000 workforce still not in the office to fix it: Insiders say blackout disabled building security passes and internal comms

  • Monday’s outage caused all of Facebook’s services to go offline for several hours
  • Staff were unable to communicate with each other or access the office buildings
  • An insider said working from home made it more difficult to fix the issues 

Facebook insiders claim the tech giant’s seven-hour outage that caused all of its services to go offline was exacerbated by employees working from home as staff were locked out of remote messaging systems and company buildings.

The social media company, which has been leading the charge for post-pandemic remote working, had an ‘internal issue’ which forced Facebook, Instagram, WhatsApp and Facebook Messenger to stop updating.

Facebook explained that the problem was caused by a faulty update sent to its core servers, which effectively disconnected them from the internet.

Engineers were rushed to the company’s data centres in Santa Clara, California, to reset the servers manually, but it took until 5.45pm Eastern Time (10.45pm UK time) for them to be reconnected.

The same update also disabled all the systems Facebook needed to fix the issue, from digital engineering tools to messaging services and even key-fob door locks.

Jonathan Zittrain, director of Harvard’s Berkman Klein Centre for Internet and Society, said: ‘Facebook basically locked its keys in its car.’

Employees at the company’s Menlo Park, California, campus had trouble entering buildings because the outage had rendered their security badges useless, while other staff already inside the buildings were locked out of conference rooms, forcing them to communicate via text messages and Outlook emails. 

An insider said on Reddit: ‘There are people now trying to gain access to the peering routers to implement fixes, but the people with physical access is separate from the people with knowledge of how to actually authenticate to the systems and people who know what to actually do, so there is now a logistical challenge with getting all that knowledge unified.

‘Part of this is also due to lower staffing in data centres due to pandemic measures.’ 



A person claiming to be a Facebook employee said on Reddit that high numbers of staff working from home made the problem worse. The account was later deleted.

Employees use Facebook’s own services to communicate with each other, and its internal messaging platform, Workplace, was also down, leaving many unable to do their jobs or discuss how to fix the issue from their homes.

Facebook engineers said in a statement: ‘The underlying cause of this outage impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem.

‘Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. 

‘This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.’

Facebook has around 60,000 employees globally and announced in May that it would operate at 25 per cent capacity in its offices after reopening in July as Covid infections fell.

It planned to have 50 per cent of the workforce back in the office by September and to allow 100 per cent in October, but a surge in Delta variant infections caused bosses to push back the return date.

Now the company is not expecting to fully reopen for staff until 2022.  

The company has promised to give staff ample notice before requiring them to return in person, and employees will only be allowed back once they are fully vaccinated.

Facebook has 47 locations across North America but many are smaller data sites, while 15,000 people, around a quarter of the total workforce, are based in the Menlo Park headquarters.

Mark Zuckerberg has pledged to move to a working-from-home setup in the coming years and predicts that as much as half of the workforce will be remote within the next five to ten years.

The CEO said he would start ‘aggressively opening up remote hiring’, telling The Verge: ‘We’re going to be the most forward-leaning company on remote work at our scale.’


WHAT IS THE DOMAIN NAME SYSTEM AND HOW DOES IT WORK? 

The Domain Name System, or DNS, is the directory of the internet.

Whenever you click on a link, send an email or open a mobile app, one of the first things that often has to happen is that your device looks up the address of a domain.

There are two sides of the DNS network: the authoritative side, which holds the records for the domains where webpages and other content live, and the resolver side, which looks up those records for the devices trying to access that content.

Every domain needs an authoritative DNS provider: the servers that store its DNS records. Amazon, Cloudflare and Google are among the bigger names in authoritative DNS provision.

On the other side of the DNS system are resolvers. Every device that connects to the Internet needs a DNS resolver. 

By default, these resolvers are automatically set by whatever network you’re connecting to. 

So, for most Internet users, when they connect to an ISP, or a WiFi hot spot, or a mobile network, the network operator will dictate what DNS resolver to use.
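To make the two sides concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not how real DNS software works: a hypothetical `AuthoritativeServer` holds records, and a hypothetical `Resolver` fetches answers from it on a device's behalf and caches them for a fixed time-to-live. Real resolvers walk the hierarchy down from the root servers and handle many record types, which is omitted here. The `reachable` flag is an illustrative assumption showing that once cached answers expire, a resolver cannot answer for a domain whose authoritative servers cannot be contacted. The address comes from the 192.0.2.0/24 range reserved for documentation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AuthoritativeServer:
    """Hypothetical stand-in for an authoritative DNS server."""
    records: Dict[str, str]      # domain name -> IP address
    reachable: bool = True       # False simulates a server nobody can contact

    def answer(self, name: str) -> Optional[str]:
        if not self.reachable:
            raise ConnectionError(f"authoritative server for {name} is unreachable")
        return self.records.get(name)


@dataclass
class Resolver:
    """Hypothetical caching resolver that asks authoritative servers for a device."""
    authorities: Dict[str, AuthoritativeServer]   # zone (e.g. 'example.com') -> server
    ttl: float = 300.0                            # seconds an answer may be reused
    cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def _authority_for(self, name: str) -> AuthoritativeServer:
        # Choose the most specific zone that the name falls under.
        zones = [z for z in self.authorities if name == z or name.endswith("." + z)]
        if not zones:
            raise LookupError(f"no authoritative server known for {name}")
        return self.authorities[max(zones, key=len)]

    def resolve(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]                      # still fresh: no query needed
        address = self._authority_for(name).answer(name)
        if address is None:
            raise LookupError(f"{name} does not exist")
        self.cache[name] = (address, now + self.ttl)
        return address


if __name__ == "__main__":
    auth = AuthoritativeServer({"www.example.com": "192.0.2.10"})
    resolver = Resolver({"example.com": auth})
    print(resolver.resolve("www.example.com", now=0))     # asks the authoritative server
    auth.reachable = False
    print(resolver.resolve("www.example.com", now=100))   # served from the cache
    try:
        resolver.resolve("www.example.com", now=400)      # cache expired at t=300
    except ConnectionError as err:
        print("lookup failed:", err)
```

The cache is why some lookups can keep working briefly after the authoritative side disappears, and why they stop once the cached answers run out.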

The problem is that these DNS services are often slow and don’t respect your privacy. 

What many Internet users don’t realise is that even when a website is encrypted, as indicated by the padlock in the browser’s address bar, that doesn’t stop your DNS resolver from learning the identity of every site you visit.

That means that, by default, your ISP, every WiFi network you’ve connected to and your mobile network provider can compile a list of every site you’ve visited while using them.
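The privacy point can be seen in the lookup itself. The Python sketch below builds a minimal DNS question for an address (A) record in the standard wire format described in RFC 1035; `build_dns_query` is a hypothetical helper, and the query ID is arbitrary. It only constructs the bytes and sends nothing. In conventional, unencrypted DNS a packet like this travels to the resolver as-is, so the domain name sits in it as readable text whether or not the website itself is encrypted.

```python
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query asking for the A (IPv4) record of `name`.

    Hypothetical helper following the RFC 1035 wire format; it only builds
    the bytes and does not send anything over the network.
    """
    # Header: ID, flags (0x0100 sets 'recursion desired'), 1 question,
    # and zero answer, authority and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # QNAME: every label ('www', 'facebook', 'com') is prefixed with its
    # length, and the name ends with a zero-length root label.
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError(f"invalid label {label!r}")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"

    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


if __name__ == "__main__":
    packet = build_dns_query("www.facebook.com")
    print(packet.hex())
    # The site's name is plainly visible inside the query bytes.
    print(b"facebook" in packet)
```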

Zuckerberg added: ‘We need to do this in a way that’s thoughtful and responsible, so we’re going to do this in a measured way. But I think that it’s possible that over the next five to 10 years, maybe closer to 10 than five, but somewhere in that range, I think we could get to about half of the company working remotely permanently.’

Facebook shares plunged 5 per cent during Monday’s outage, wiping some $48billion off its value, though the slide started before the tech problems, in part because of a whistleblower accusing the company of putting profits before safety in a 60 Minutes program broadcast on Sunday night.

It marks the firm’s second-worst day on the markets ever.

In addition to the stock market slide, Facebook likely missed out on at least $67million in direct revenue and possibly as much as $102million during the outage – based on average hourly earnings across 2020 and projections of its 2021 hourly earnings from Q1 and Q2 results.   

It is also estimated the company lost as much as $545,000 in US ad revenue an hour during the outage.
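These estimates rest on simple arithmetic: lost revenue is roughly an hourly revenue rate multiplied by the hours offline. The Python sketch below works backwards from the article's figures, assuming the outage lasted exactly seven hours (the rounded figure used in the article); whoever produced the estimates may have used a different duration or rate, so the outputs, about $9.6m to $14.6m an hour and roughly $3.8m in US ad revenue overall, are illustrations rather than reported numbers.

```python
# Back-of-envelope check of the revenue figures quoted above.
OUTAGE_HOURS = 7  # assumption: the rounded 'seven-hour' duration from the article

# Figures as reported in the article, in US dollars
direct_revenue_low = 67_000_000
direct_revenue_high = 102_000_000
us_ad_revenue_per_hour = 545_000

# Hourly revenue rates implied by the low and high estimates
implied_hourly_low = direct_revenue_low / OUTAGE_HOURS
implied_hourly_high = direct_revenue_high / OUTAGE_HOURS

# Total US ad revenue lost if the per-hour figure held for the whole outage
us_ad_total = us_ad_revenue_per_hour * OUTAGE_HOURS

print(f"Implied revenue per hour: ${implied_hourly_low / 1e6:.1f}m to ${implied_hourly_high / 1e6:.1f}m")
print(f"US ad revenue over {OUTAGE_HOURS} hours: ${us_ad_total / 1e6:.1f}m")
```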

Facebook was already in the throes of a separate major crisis after whistleblower Frances Haugen, a former Facebook product manager, provided The Wall Street Journal with internal documents that exposed the company’s awareness of harms caused by its products and decisions. 

Haugen went public on CBS’s ‘60 Minutes’ program on Sunday and is scheduled to testify before a Senate subcommittee on Tuesday.

Haugen had also anonymously filed complaints with federal law enforcement alleging Facebook’s own research shows how it magnifies hate and misinformation and leads to increased polarization. It also showed that the company was aware that Instagram can harm teenage girls’ mental health.

The Journal’s stories, called ‘The Facebook Files,’ painted a picture of a company focused on growth and its own interests over the public good. Facebook has tried to play down the research. 

Former Deputy Prime Minister Nick Clegg, the company’s vice president of policy and public affairs, wrote to Facebook employees in a memo Friday that ‘social media has had a big impact on society in recent years, and Facebook is often a place where much of this debate plays out.’  
