Facebook admits its engineers made mistake that caused $100m seven-hour outage

Facebook has said there was 'no malicious activity' behind a seven-hour blackout that cost the company an estimated $100million in lost revenue, which experts and insiders say was exacerbated by remote working policies. 

The crisis came in a week of cascading disasters for Facebook, as a whistleblower testified before Congress, slamming the company's artificial intelligence content algorithms as harmful and divisive.

The company quietly updated a prior blog post on Tuesday to say that there was no malicious intent behind the historic outage, meaning that an employee error is most likely to blame. 

It is believed that a faulty update to Facebook's Border Gateway Protocol (BGP) configuration left apps and browsers unable to locate the company's services. BGP is the protocol that large networks use to tell one another which routes reach which addresses on the public internet.

The global outage - which hit Facebook, Instagram, WhatsApp and Messenger on Monday - was caused when the faulty configuration disconnected its servers from the internet, meaning engineers had to travel to its Santa Clara data center to fix the glitch in person.

But the repair was delayed, according to a purported insider, because of 'lower staffing in data centers due to pandemic measures', along with outages in physical access card systems and internal messaging services.

Kieron Harding, an IT Infrastructure Engineer at GRC International Group, told DailyMail.com: 'The nature of the problem meant Facebook would have needed network engineers to physically access their BGP routers – and due to the pandemic, some of the data centers quite possibly don't have an engineer based on site, or someone who could have immediately started to work on the problem.' 

'One of the reasons why the outage lasted for as long as it did was because the misconfiguration of the BGP also affected Facebook's physical door access systems – which shut down; meaning engineers couldn't get into the buildings, or secure rooms, to start fixing the issues straightaway,' said Harding. 

Facebook operates dozens of offices and data centers around the US. Monday's outage reportedly knocked out physical access to the company's facilities when key card systems went offline

The glitch, which has prompted calls for a break-up of big tech firms, also brought down messaging services that remote-working staff use to communicate, so those who knew how to fix the servers couldn't get that information to the teams inside the data center, the insider said.

'There are people now trying to gain access to... implement fixes, but the people with physical access is separate from the people with knowledge of how to authenticate the systems and people who know what to actually do, so there is now a logistical challenge,' the purported insider said on Reddit.

Industry sources who have worked closely with the tech giant say Facebook is suffering from two major problems: staff working from home and over-reliance on artificial intelligence.

The social media site has been beset by bugs, glitches and AI issues for months – exacerbated by staff not being on premises to deal with or correct issues.

One source said that Facebook is simply unprepared to deal with emergencies and 'is very weak on the technical side'. Another added Facebook is currently 'a shambles' and has been beset with tech problems 'for months'.

They added: 'They think they can do everything with AI – but their tech isn't up to scratch. I'm inclined to think it's because they're WFH.'

Monday's outage was partly to blame for a nose-dive in Facebook's share price that saw $47billion wiped from its market value in its second-worst day ever on the stock market. The fall was also driven by a whistleblower testifying in Congress this week about the harms the site does to teenagers.

Facebook shares rebounded on Tuesday, rising 2.3 percent in midday trading. 

In addition to the stock market slide, Facebook likely missed out on at least $67million in direct revenue during the outage, and possibly as much as $102million. The estimates are based on average hourly earnings across 2020 and projections of its 2021 hourly earnings from Q1 and Q2 results.
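The arithmetic behind an estimate of this kind can be sketched as follows. This is a rough illustration only: the article does not publish its inputs, so the $86billion annual revenue figure and the seven-hour duration below are assumptions, and the function names are hypothetical.

```python
import calendar


def hours_in_year(year: int) -> int:
    """Return the number of hours in a calendar year, allowing for leap years."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24


def estimate_lost_revenue(annual_revenue: float, year: int, outage_hours: float) -> float:
    """Spread a year's revenue evenly over every hour, then scale by the outage length."""
    hourly_revenue = annual_revenue / hours_in_year(year)
    return hourly_revenue * outage_hours


# Assumed inputs (not taken from the article): a round figure for 2020
# revenue and an outage of roughly seven hours.
ASSUMED_2020_REVENUE = 86e9
ASSUMED_OUTAGE_HOURS = 7

low_estimate = estimate_lost_revenue(ASSUMED_2020_REVENUE, 2020, ASSUMED_OUTAGE_HOURS)
print(f"Estimated lost revenue: ${low_estimate / 1e6:.1f} million")
```

Under these assumptions the result lands in the high $60millions, in the same region as the lower figure quoted above. Feeding in a larger projected annual revenue for 2021 would push the estimate higher in the same way.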

A person claiming to be a Facebook employee said on Reddit that high numbers of staff working from home made the problem worse. The account was later deleted

Users around the world reported problems with Facebook, Instagram and WhatsApp on Downdetector

Mark Zuckerberg - who lost around $7billion in stock value amidst the carnage - has previously vowed to make work from home a permanent part of Facebook, telling staff back in June that 'anyone whose role can be done remotely can request remote work.' 

The multi-billionaire said he plans to spend around half his time working remotely in 2022, and predicted that half of his staff could be permanently off-site by 2030.

Facebook's offices are currently open but only at 25 per cent capacity, after plans to open fully by October were pushed back to at least January 2022 amid the spread of the Delta Covid variant.

Of the staff who are not currently in the office, it is not clear how many will become permanent remote workers.

But a Facebook executive previously told the Wall Street Journal that the company has approved 90 per cent of WFH requests. The only caveat is that salaries may be cut to reflect the locations where people are actually working, as opposed to where the office is based.

Data centre staff are among those who cannot request permanent WFH.

Facebook's problems began around midday Eastern Time (5pm UK time) on Monday, shortly after its servers were updated, and lasted until around 5.45pm (10.45pm UK time) when the servers came back online. It took several more hours for all users to be able to access Facebook's sites and apps.

Following Monday's outage, Zuckerberg issued a personal apology to Facebook users - telling them 'sorry for the disruption' while adding: 'I know how much you rely on our services.' 

But his message was immediately attacked from all sides, with those who use Facebook for business saying he failed to take the issue seriously while casual users accused him of 'making yourself more important than you are'.

Twitter founder Jack Dorsey appeared to make light of Facebook's plight on Monday. Responding to a post which appeared to show how the facebook.com domain is for sale as a result of the outage, he jokingly asked: 'How much?'

A Facebook staff member reportedly accidentally deleted large sections of the code (pictured) which keeps the website online

The above Tweet read: 'So, someone deleted large sections of the routing....that doesn't mean Facebook is just down, from the looks of it....that means Facebook is GONE'

Facebook shares are down by more than 6 percent from last week as a result of the outage on Monday

Still others said they had enjoyed the outage, and were planning to spend more time off social media in the future. 'Life was way simpler without these services,' wrote one. 

John Graham-Cumming, the chief technology officer of web security firm Cloudflare, said Facebook made a series of updates to its border gateway protocol (BGP) which caused it to 'disappear' from the internet. 

BGP allows networks to exchange routing information across the internet, so that traffic can find a path to the websites people want to access.

Dane Knecht, a senior vice president at Cloudflare, said earlier that Facebook's Border Gateway Protocol (BGP) routes had been 'withdrawn from the internet.'

Cybersecurity expert Kevin Beaumont wrote on Twitter: 'This one looks like a pretty epic configuration error, Facebook basically don't exist on the internet right now. Even their authoritative name server ranges have been BGP withdrawn.'
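What it means for routes to be 'withdrawn' can be illustrated with a toy routing table. This is a simplified sketch, not real BGP and not Facebook's setup: the class name, the origin label and the address range (a block reserved for documentation) are all hypothetical.

```python
import ipaddress


class ToyRoutingTable:
    """A drastically simplified stand-in for the routes a network learns via BGP."""

    def __init__(self):
        # Maps an announced address range to a label for whoever announced it.
        self._routes = {}

    def announce(self, prefix: str, origin: str) -> None:
        """Record that `origin` says it can deliver traffic for `prefix`."""
        self._routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix: str) -> None:
        """Forget a previously announced range."""
        self._routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str):
        """Return the origin for the most specific matching range, or None."""
        target = ipaddress.ip_address(address)
        matches = [net for net in self._routes if target in net]
        if not matches:
            return None  # No route: the address is unreachable from here.
        most_specific = max(matches, key=lambda net: net.prefixlen)
        return self._routes[most_specific]


table = ToyRoutingTable()
table.announce("192.0.2.0/24", "example-network")  # hypothetical range and label
before = table.lookup("192.0.2.53")

table.withdraw("192.0.2.0/24")
after = table.lookup("192.0.2.53")

print("before withdrawal:", before)
print("after withdrawal:", after)
```

Once the range is withdrawn, the lookup finds nothing, even though the servers behind that address may still be running. That is the sense in which a network can 'disappear' from the internet without being switched off.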

Facebook, Instagram and WhatsApp were all brought down for almost seven hours yesterday in a massive global outage. The US tech giant said the problem was caused by a faulty update that was sent to its core servers, which effectively disconnected them from the internet

WhatsApp, Instagram and Facebook Messenger run on a shared back-end infrastructure, creating a 'single point of failure' according to experts.

It wasn't just the main Facebook apps that went down. Other services, including Facebook Workplace and the Oculus website, were also affected.

The EU's competition commissioner said it shows why large tech firms should be broken up to avoid a similar failure of multiple platforms at once.

EU competition commissioner Margrethe Vestager said the incident highlighted the negative impact of big tech firms controlling large swathes of the online world. 

'We need alternatives and choices in the tech market, and must not rely on a few big players, whoever they are,' she wrote on Twitter. 

The dominance of a handful of large social media and internet companies has come under scrutiny from competition watchdogs on a number of issues, with many campaigners in the UK, Europe and US urging governments and regulators to take steps to break up larger firms to prevent monopolies being created.

IT experts have also called on the tech industry to come up with better systems to prevent a single error from having such a wide impact.

Ms Vestager, who is also the European Commission's executive vice-president for a Europe fit for the digital age, added that the incident showed it was also sometimes good to step away from social media and talk to people 'offline'. 

Facebook's Chief Technology Officer, Mike Schroepfer, offered his 'sincere apologies' for the outage on Monday afternoon.  The scandal-hit company's shares had dipped by 5 percent on Monday amid the outage and after a whistleblower went public on Sunday night with claims that the firm prioritises 'growth over safety'. 

There have been a number of social media outages in recent months, with Instagram going down for 16 hours just last month, and all Facebook platforms going offline in June. 

The cause of the outage remains unconfirmed, and it is unclear whether all the failures are linked, but not long before Facebook's services went down, entries for Facebook and Instagram were removed from the Domain Name System (DNS) they use.

The DNS is essentially an internet directory. Whenever someone opens a link or an app, their device has to look up the name of the service in the DNS to find its address, and only then can it connect.
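The directory lookup described above, and what happens when an entry is removed, can be sketched with a toy in-memory directory. This is an illustration under stated assumptions rather than a real DNS client: the class, the domain name and the address (taken from a range reserved for documentation) are hypothetical.

```python
class ToyDirectory:
    """A toy name-to-address directory standing in for the DNS."""

    def __init__(self, records: dict):
        # Copy the records so later removals do not affect the caller's dict.
        self._records = dict(records)

    @staticmethod
    def _normalise(name: str) -> str:
        """Names are case-insensitive and may carry a trailing dot."""
        return name.lower().rstrip(".")

    def resolve(self, name: str) -> str:
        """Return the address for `name`, or raise LookupError if there is no entry."""
        try:
            return self._records[self._normalise(name)]
        except KeyError:
            raise LookupError(f"no directory entry for {name!r}") from None

    def remove(self, name: str) -> None:
        """Delete the entry for `name`, if present."""
        self._records.pop(self._normalise(name), None)


directory = ToyDirectory({"social.example": "192.0.2.10"})  # hypothetical entry
address_before = directory.resolve("Social.Example")

directory.remove("social.example")
try:
    directory.resolve("social.example")
    found_after_removal = True
except LookupError:
    found_after_removal = False

print("address before removal:", address_before)
print("found after removal:", found_after_removal)
```

With the entry gone, an app or browser has no address to connect to, which to the user looks the same as the service itself being down.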

Major DNS providers include Google, Amazon and Cloudflare. It's unclear whether all of the sites and services that went down on Monday use the same DNS provider.

A similar outage at cloud company Akamai Technologies Inc took down multiple websites in July.

Cloudflare's Mr Graham-Cumming tweeted on Monday that Facebook accidentally 'disappeared' from the internet after making a 'flurry' of updates to its BGP - Border Gateway Protocol.   

'Between 15:50 UTC and 15:52 UTC [4.50-4.52pm UK time] Facebook and related properties disappeared from the Internet in a flurry of BGP updates,' he wrote.
