Back in 2010, we learned that the Library of Congress had struck a deal with the fine folks over at Twitter to archive every public tweet ever made. To accomplish this, the Library of Congress first had to go back to 2006 and, starting with the first tweet ever sent using the service, archive all of the tweets up to April 2010. This required that the Library of Congress not only create a way to manage a constant stream of tweets on a daily basis, but also “create a structure for organizing the entire archive by date.”
The Library of Congress says that all of those goals will be completed this month. Amazingly, the archive the Library has amassed – in less than three years, remember – now comes in at a staggering 170 billion tweets. That certainly isn’t anything to scoff at, and the rate at which new tweets are being added each day is almost as absurd.
In February 2011, the number of new tweets the Library of Congress had coming in per day was right around 140 million. In the time since, that number has grown to almost half a billion tweets per day. Roughly 500 million tweets is a lot to archive each and every day, so you can bet the Library of Congress is pouring a ton of resources into this project.
Now that the Library of Congress has figured out a way to effectively archive and organize all of these tweets, next on the list is making the archive easily accessible to researchers in a “comprehensive, useful way.” It sounds like this is expected to take some time, as there are some “significant technology challenges” that need to be overcome before this portion of the project is complete. We’ll be keeping an eye on this ongoing project, so keep it here at SlashGear for more!
[via Library of Congress]