Microsoft’s “How Old” age estimator didn’t impress everyone with its hit-and-miss guesses, but the company is already working on more impressive object recognition tech. A new system, the handiwork of the Microsoft Research team, can analyze a photo and automatically caption what it sees in it, whether that’s people, objects, or groups. The hope is that it can be used to build more responsive, independent artificial intelligences.
Microsoft Research’s approach was to teach the computer much as a human would be taught: with a huge repository of photos. “The machine has been trained to understand how a human understands the image,” team member Xiaodong He said of the system.
The promise is that combining several kinds of analysis adds up to a better overall understanding of an image. That might include inferring gender from facial hair, the company says.
Still, as the image above suggests, it’s not infallible. According to Microsoft’s captioning system, the woman in the foreground is wearing a cat; we’re pretty sure it’s actually her fur coat collar and her own hair confusing things.
After object recognition comes sentence generation: the system produces candidate sentences, which are then “re-ranked” so that the one that best describes the image comes out on top.
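To make that three-stage flow (recognize, generate, re-rank) a little more concrete, here is a toy sketch in Python. It is not Microsoft’s method: the detected words and their confidence scores are invented, the fixed sentence template stands in for real sentence generation, and the scoring rule (summing the confidences of the words a sentence uses) is an assumption chosen purely for illustration.

```python
from itertools import permutations

# Stage 1 (hypothetical output): words a recognizer might pull from a photo,
# with invented confidence scores. A real system would compute these.
detections = {"woman": 0.9, "coat": 0.7, "cat": 0.4, "street": 0.3}


def generate_candidates(words):
    """Stage 2 (toy): fill one fixed sentence template with every ordered word pair."""
    return [(f"a {a} in a {b}", (a, b)) for a, b in permutations(words, 2)]


def rerank(candidates, scores):
    """Stage 3 (toy): order candidates by the summed confidence of the words they use."""
    return sorted(
        candidates,
        key=lambda cand: sum(scores[w] for w in cand[1]),
        reverse=True,
    )


ranked = rerank(generate_candidates(detections), detections)
best_caption = ranked[0][0]
print(best_caption)
```

With these made-up scores the top caption is “a woman in a coat”, while the low-confidence “cat” reading drops down the list. Because this scoring rule ignores word order, “a coat in a woman” ties for first place; a real re-ranker would rely on learned scores that also judge how fluent a sentence is and how well it matches the image.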
The end goal might be a so-called “universal augmented intelligence system” that, Microsoft Research suggests, could follow you around 24/7, making inferences as it goes and offering suggestions when relevant. That’s probably still some way off, but this is a good place to start.