Google Caffeine parallel indexing speeds search, boosts accuracy

Jun 9, 2010

Google reckons searchers should now see better, more relevant and fresher results, thanks to its new Caffeine search index. Rolled out this week, Caffeine replaces the staged index refreshes Google used previously with a parallel processing model. Caffeine processes hundreds of thousands of pages in parallel every second and adds new material to the index continuously. Under the old index, by contrast, the so-called "main layer" was refreshed only every couple of weeks.

As Google puts it: "With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published."

It wouldn't be a Google announcement without a few mind-blowing stats, and Caffeine certainly delivers. If the pages Caffeine processes were stacked up as sheets of paper, Google says, the pile would grow three miles taller every second. Caffeine also occupies nearly 100 million gigabytes of storage in a single database and adds new information at a rate of hundreds of thousands of gigabytes per day.

Google's graphic suggests that part of the reason for the shift to Caffeine is to accommodate the growing volume of multimedia data, including voice, video and photography, as well as ebook content.
