Today at around 3:45 Central Time, Reddit suffered an internal error that brought parts of the site to a halt. The downtime was verified independently by several sources, including the official Reddit status page (dot io), where the reported Error Rate jumped at around a quarter till the hour.
What's interesting about this particular incident is how flawless the site's uptime had otherwise been. Up until the incident began, Reddit’s status page reported 99.98% uptime across the past 90 days of service. All CDN locations appeared to be Operational at the time the errors began.
The last time an incident of this sort occurred was on the 12th of June, 2019. At that time, the Reddit error rate spiked slightly, as did the vote backlog for posts and comments. Thumbnail and embed scraper backlogs also spiked on that day. On the 12th of June, Reddit reported that it was investigating an issue as of 9:20 PDT, with a fix and resolution just before 11 PDT.
That June 12th incident occurred 6 days after scheduled database maintenance. Today's incident occurred a little over a day after regularly scheduled Modmail Maintenance. Infrastructure seems to be passing the basic tests run by IsItDownRightNow, but the number of reports of functionality problems is elevated.
DownDetector logged a whopping 4,317 individual outage reports over the past hour, which is extremely abnormal. The good news is that the reports appear to be getting less frequent, so things have probably been resolved. Not shocking whatsoever is the Reddit “outage map” from DownDetector, which shows that the majority of downtime reports came from some of the most populated areas in the United States and Europe.