Google Clips' AI is learning to be a better photographer

Google Clips, the AI-enhanced camera that promises to capture moments you'd miss with your smartphone, is getting a brain upgrade. A new update for the hands-free camera aims to make it smarter at spotting the sorts of activities users most likely want captured, like hugs and dancing.

Launched earlier this year, Clips stands at odds with most camera experiences. Rather than demanding the user consider the scene in front of them through a preview display or viewfinder, Clips is designed to be turned on, pointed in the general direction of the action, and then left to its own devices. The camera itself decides what to capture and what to ignore.

The resulting photos and short video animations are then transferred to the user's smartphone. Initial feedback from reviewers was mixed: some praised Clips' ability to capture moments they couldn't photograph themselves, often because they were too busy being part of the scene. Others, however, complained that Clips wasn't responsive to every type of activity, and had ignored moments they'd hoped it would record.

Now, the Clips team is pushing out a new update for the camera, which should make it more capable on that front. According to Google research scientist Aseem Agarwala, the new firmware is "better at recognizing hugs, kisses, jumps and dance moves."

"You may want to get your daughter jumping up and down in excitement, or your son kissing your cat," Agarwala suggests. "It's all about the little moments and emotions that you can't stage or coordinate ahead of time."

Meanwhile, Clips is also gaining multi-user support. At launch, the camera could only be paired with a single phone; now it can be linked with multiple devices, each of which will be able to access recordings and share them.

In a post on the Google AI Blog, Agarwala goes into more depth on how the team trained the machine learning that powers the camera. While products like Google Photos get to lean on the power of the cloud to do their processing, Clips had a far more constrained remit: all of its computation had to be done on-device, both for privacy and power-consumption reasons.

The post also highlights a core decision that, perhaps inadvertently, contributed to one of the other early complaints about the camera. "We wanted to focus on capturing candid moments of people and pets, rather than the more abstract and subjective problem of capturing artistic images," Agarwala writes. "That is, we did not attempt to teach Clips to think about composition, color balance, light, etc.; instead, Clips focuses on selecting ranges of time containing people and animals doing interesting activities."

Lackluster composition was another criticism leveled at Clips when it launched, though that's partly a side-effect of the wide-angle lens Google chose to fit more of the scene in frame. "Clips is designed to work alongside a person, rather than autonomously," Agarwala concludes. "To get good results, a person still needs to be conscious of framing, and make sure the camera is pointed at interesting content."