Google AI helps archive The New York Times' vast photo collection

Google is working with The New York Times on a photo digitization project. The work uses Google Cloud Platform to store the images, where the publication will be able to search through them. The project goes beyond mere storage, however: Google's machine learning technology also helps surface insights about the photos' content.

The New York Times stores millions of physical photographs in a series of file cabinets in an underground library it calls the "morgue." These images date back to the 19th century and contain iconic imagery from major historical events. However, due to the vast size of the collection, many of these images haven't been seen in years.

There are multiple reasons to digitize the images, including the ability to access any given image without physically sorting through stacks of photos, as well as the need to preserve the content in case of accidents. A collection this large presents its own challenge, though: organizing the images so that anyone searching the archive can reasonably find what they're looking for.

It's impractical to manually add metadata to the digitized files for each image, and that's where Google's machine learning comes in. According to the company, its Cloud Vision API enables the system to automatically read text on images, process it, and store it. This includes information found on the back of printed photographs.
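As a rough sketch of what that extraction step might look like (this is not the Times' actual code; the file path and helper names are illustrative, and calling the API requires Google Cloud credentials), the Cloud Vision client library can be used along these lines:

```python
# Hypothetical sketch: extract the text written on the back of a
# scanned photo using the Cloud Vision API. Paths, function names,
# and the cleanup step are illustrative assumptions.

def clean_extracted_text(raw: str) -> str:
    """Collapse OCR line breaks and runs of whitespace so the text
    can be stored as a single searchable field."""
    return " ".join(raw.split())

def extract_back_of_photo_text(image_path: str) -> str:
    """Send one scanned image to Cloud Vision and return its text."""
    from google.cloud import vision  # pip install google-cloud-vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    # document_text_detection is suited to dense typewritten or
    # handwritten notes, such as those on the back of a print.
    response = client.document_text_detection(image=image)
    return clean_extracted_text(response.full_text_annotation.text)

# Usage (assumes credentials are configured and the scan exists):
# text = extract_back_of_photo_text("scans/photo_back_0001.jpg")
```

The returned text could then be stored alongside the digitized image, making the whole archive searchable by whatever was typed or handwritten on the prints.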

The Cloud Vision API is also capable of finding logos, and it is joined by the Cloud Natural Language API, which analyzes the extracted text to classify content. One example Google gives is finding the name of a city and state, as well as landmarks like a train station, and using this data to determine which category and sub-category the image belongs in.
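The classification step Google describes might be sketched like this. The taxonomy, landmark table, and entity format below are invented for illustration; in practice the `(name, type)` pairs would come from the Cloud Natural Language API's entity analysis of the extracted text:

```python
# Hypothetical sketch: map entities found in a photo's caption text
# to a category and sub-category. The categories and landmark list
# here are illustrative assumptions, not the Times' real taxonomy.

KNOWN_LANDMARKS = {
    "penn station": "Transit",
    "times square": "Streets",
}

def classify(entities: list[tuple[str, str]]) -> tuple[str, str]:
    """entities: (name, type) pairs, e.g. ("New York", "LOCATION"),
    as returned by entity analysis. Returns (category, sub_category)."""
    category, sub = "Uncategorized", ""
    for name, etype in entities:
        if etype == "LOCATION":
            # A recognized place name gives a broad geographic bucket.
            category = "Geography"
            sub = sub or name
        landmark_category = KNOWN_LANDMARKS.get(name.lower())
        if landmark_category:
            # A known landmark is more specific, so it wins.
            category, sub = landmark_category, name
    return category, sub

# A caption mentioning New York and Penn Station lands in the
# more specific Transit category:
print(classify([("New York", "LOCATION"), ("Penn Station", "OTHER")]))
```

The real system presumably works with a much richer taxonomy, but the idea is the same: extracted place names and landmarks drive where an image is filed.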