Google is throwing open the computer vision system that powers Google Photos to third-party developers, helping them to filter out obscene images, spot faces and landmarks, and more. The new Cloud Vision API is what allows Google Photos to sift through all your pictures and collect all those with dogs in the frame, with well-known landmarks, or even particular facial emotions.
Figuring that other apps might want to do something similar, Google is now offering a limited preview of the API. It is currently free to use, though Google says paid tiers will launch at some point, and it can apply up to six different analyses to each image.
So, it’s possible to spot a landmark or a product logo, as well as pull out text from an image – including identifying the language it’s written in.
Label/entity detection can spot the dominant subject of the photo, whether that be an object like a house or a boat, an animal like a pet cat or rabbit, or a person. Individual apps can use their own metadata, creating custom identification sets.
Facial detection spots individual faces and classifies each one according to more than eight different attributes, such as whether it shows sorrow or joy. However, Google points out, this is facial detection, not facial recognition: the API can’t identify the person, and it doesn’t store any of the images.
Finally, there’s Safe Search detection. Google uses that to filter out explicit pictures which might be adult-rated or show violence, and now developers can use the same filters on their own apps.
In its current form, the Cloud Vision API requires each photo to be submitted to the system for analysis. However, Google says it expects that, in the future, images already saved in its cloud storage will be processable as well.
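As a rough illustration of what submitting a photo might involve, the sketch below builds a JSON request body with the image embedded as base64 text. The field names and feature type strings follow the v1 REST interface that Google later documented publicly (the `images:annotate` method); the limited preview described here may differ, so treat them as assumptions. The helper name `build_annotate_request` is hypothetical, and the code only constructs the body without sending it.

```python
import base64
import json

# Feature type names as used by the publicly documented v1 REST API.
# Assumption: these correspond to the six analyses the article describes.
FEATURE_TYPES = [
    "LABEL_DETECTION",
    "FACE_DETECTION",
    "LANDMARK_DETECTION",
    "LOGO_DETECTION",
    "TEXT_DETECTION",
    "SAFE_SEARCH_DETECTION",
]


def build_annotate_request(image_bytes, feature_types=FEATURE_TYPES, max_results=5):
    """Return a JSON-serializable request body (hypothetical helper).

    The image is embedded inline as base64 text, since the photo itself
    has to be submitted along with the list of analyses to run.
    """
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": t, "maxResults": max_results} for t in feature_types
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real image file's contents.
body = build_annotate_request(b"fake image bytes", ["LABEL_DETECTION", "TEXT_DETECTION"])
print(json.dumps(body, indent=2))
```

Requesting only the analyses an app needs keeps the response small; passing no list asks for all six.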
While this sort of processing would generally require a high-performance computer if done locally, relying on the cloud means the hardware demands are far more modest. In fact, Google has built a proof-of-concept using a Raspberry Pi and “just a few hundreds of lines of Python code”, putting it into a roving robot.
The robot is able to spot and identify objects, as well as tell whether the people it sees are smiling.
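Google's robot code is not shown here, but a smile check of this kind could plausibly be built on the face results. The sketch below is not Google's implementation: it assumes the response shape of the later publicly documented v1 REST API, where each entry in `faceAnnotations` carries a `joyLikelihood` value, and it treats a high joy likelihood as a stand-in for smiling. The function name `count_smiling_faces` and the sample response are made up for illustration.

```python
# Likelihood values ordered from least to most confident, as in the
# publicly documented v1 API (assumed to apply here).
LIKELIHOODS = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def count_smiling_faces(response, threshold="LIKELY"):
    """Count faces whose joy likelihood meets the threshold (hypothetical helper)."""
    limit = LIKELIHOODS.index(threshold)
    faces = response.get("faceAnnotations", [])
    return sum(
        1
        for face in faces
        if LIKELIHOODS.index(face.get("joyLikelihood", "UNKNOWN")) >= limit
    )


# Hand-written sample data, not real API output.
sample_response = {
    "faceAnnotations": [
        {"joyLikelihood": "VERY_LIKELY"},
        {"joyLikelihood": "UNLIKELY"},
        {"joyLikelihood": "LIKELY"},
    ]
}
print(count_smiling_faces(sample_response))  # prints 2
```

Lowering the threshold to "POSSIBLE" would catch fainter smiles at the cost of more false positives.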
The potential applications are vast. Drone manufacturer AeroSense, a collaboration between Sony and others, is already using the API to organize the hundreds and thousands of photos its drones gather.
Other possibilities include recommendation apps, social networks, and robots that can better adapt to unexpected environments.