Facebook, the usually reliable social networking site, left many of its 500 million users out in the cold last week when an internal glitch forced the company to take the entire site offline temporarily.
Early reports blamed the problem on a DNS failure; however, Facebook has since traced the trouble to an internal error.
“This is the worst outage we’ve had in over four years,” announced Facebook in a blog post, “and we wanted to first of all apologize for it.”
“The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.”
“To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover.”
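The toy Python simulation below shows how the feedback loop Facebook describes can lock itself in. It is a simplified illustration, not Facebook's actual system. It assumes a single cached configuration value, a fixed number of clients that each read it once per discrete "tick", and a database that can answer only a limited number of queries per tick. The function `simulate` and its parameters are hypothetical names chosen for this sketch.

```python
def simulate(clients, db_capacity, ticks, delete_on_error):
    """Toy model of the cache feedback loop; returns database queries per tick.

    Hypothetical assumptions: one cached key, every client reads it once per
    tick, and the database can serve at most `db_capacity` queries per tick.
    """
    cached = False  # the value starts out missing, e.g. just after the bad value was fixed
    history = []
    for _ in range(ticks):
        if cached:
            history.append(0)  # every client is served from the cache
            continue
        queries = clients  # every client misses the cache and queries the database
        served = min(queries, db_capacity)
        failed = queries - served
        if served > 0:
            cached = True  # a successful read repopulates the cache
        if delete_on_error and failed > 0:
            # The flaw: a failed query is mistaken for an invalid value,
            # so the client deletes the cache key again.
            cached = False
        history.append(queries)
    return history


# Overloaded database, errors treated as invalid values: the load never drops.
print(simulate(clients=1000, db_capacity=200, ticks=5, delete_on_error=True))
# Same overload, errors left alone: the cache recovers after one tick.
print(simulate(clients=1000, db_capacity=200, ticks=5, delete_on_error=False))
```

With the flawed handling, every tick sends all 1,000 queries to the database, because as long as some requests fail, the key is deleted and every client misses again. With errors left alone, the first successful read repopulates the cache and the database load falls to zero. If the database can handle every client (for example, `db_capacity` at or above `clients`), no query fails and even the flawed version recovers, which matches Facebook's point that the loop persisted only while the databases were failing to service some requests.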
“We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.”
The popular social media site is now back up and running, with no further problems reported.
Image courtesy of Jason McElweenie