Researchers develop a deep learning method able to animate portions of a photo

By Shane McGlaun/June 17, 2021 6:46 am EST

We've all looked at a photograph of something like the ocean, clouds, or a waterfall and, for the briefest moment, felt as if the picture were moving. Typically, that perceived motion is just a trick of our minds. Researchers at the University of Washington have developed a new deep learning method that can animate certain portions of a photograph, turning it into a video.

The UW deep learning method can animate flowing material such as waterfalls, smoke, or clouds. The technique needs only a single photo of, say, a waterfall to create the animation. The result is a short video that loops seamlessly, giving the impression of continual movement of the water.

It's almost like being able to turn any photograph into something akin to Apple's Live Photos, which capture about a second and a half of movement before and after a still image. Researchers on the project said that what's unique about their method is that it doesn't need any user input or additional information. The process requires only a picture and produces a high-resolution, seamlessly looping video that typically looks like a real video.

Developing the technology was a challenge. According to the researchers, the task effectively requires the system to predict the future. The system contains two parts. The first part predicts how things were moving when the photo was taken. The second part uses that information to create the animation.
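To make the two-part structure concrete, here is a minimal sketch in Python. It is not the researchers' code: the "photo" is a toy one-dimensional list of brightness values, and `estimate_motion` is a hypothetical stand-in for the trained neural network that simply assumes every pixel drifts one step to the right per frame.

```python
from typing import List

Image = List[float]   # toy 1D "photo": one brightness value per pixel
Motion = List[int]    # assumed per-pixel displacement, in pixels per frame


def estimate_motion(photo: Image) -> Motion:
    """Part 1 stand-in (hypothetical): the real system uses a trained
    neural network; this stub assumes a uniform drift of 1 pixel per frame."""
    return [1 for _ in photo]


def animate(photo: Image, motion: Motion, num_frames: int) -> List[Image]:
    """Part 2: move every pixel along its estimated motion, frame by frame."""
    frames = []
    for t in range(num_frames):
        frame = [0.0] * len(photo)  # 0.0 marks pixels that nothing landed on
        for i, value in enumerate(photo):
            j = i + motion[i] * t   # where pixel i ends up after t frames
            if 0 <= j < len(photo):
                frame[j] = value
        frames.append(frame)
    return frames


def photo_to_video(photo: Image, num_frames: int) -> List[Image]:
    """Chain the two parts: estimate motion once, then render the frames."""
    return animate(photo, estimate_motion(photo), num_frames)
```

Even in this toy version, pushing pixels forward leaves empty gaps where the moving material used to be, which hints at why a single forward prediction is not enough for a convincing video.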

Estimating motion required the researchers to train a neural network with thousands of videos of waterfalls, rivers, oceans, and other materials with fluid motion. The training process asked the neural network to guess the motion of the video when given only the first frame. The neural network compared its prediction with the actual video and learned to identify clues, such as ripples in a stream, to help predict what happens next. Researchers created something they call "symmetric splatting" that predicts the future and past for an image, combining them into a single animation.
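The idea of combining a predicted future with a predicted past can be illustrated with a small Python sketch. This is a simplified toy under stated assumptions, not the researchers' implementation: the image is a one-dimensional list, the motion is a single assumed integer `velocity` in pixels per frame, and the function names are made up for illustration. Each frame blends a copy of the photo pushed forward in time with a copy pulled back from the end of the loop, weighting whichever copy actually covers a given pixel.

```python
def splat_1d(signal, shift):
    """Move each sample to index i + shift; return (values, coverage mask)."""
    n = len(signal)
    values = [0.0] * n
    mask = [0.0] * n
    for i, sample in enumerate(signal):
        j = i + shift
        if 0 <= j < n:
            values[j] = sample
            mask[j] = 1.0
    return values, mask


def symmetric_blend_frame(signal, velocity, t, num_frames):
    """Blend a forward-moved copy (t steps into the future) with a
    backward-moved copy (num_frames - t steps into the past)."""
    fwd, fwd_mask = splat_1d(signal, velocity * t)
    bwd, bwd_mask = splat_1d(signal, -velocity * (num_frames - t))
    alpha = t / num_frames  # 0 at the start of the loop, 1 at the end
    frame = []
    for f, fm, b, bm in zip(fwd, fwd_mask, bwd, bwd_mask):
        w_fwd = (1.0 - alpha) * fm  # trust the forward copy early in the loop
        w_bwd = alpha * bm          # trust the backward copy late in the loop
        total = w_fwd + w_bwd
        frame.append((w_fwd * f + w_bwd * b) / total if total > 0 else 0.0)
    return frame


def make_loop(signal, velocity, num_frames):
    """Frames 0 .. num_frames - 1; frame num_frames would equal frame 0."""
    return [symmetric_blend_frame(signal, velocity, t, num_frames)
            for t in range(num_frames)]
```

Because the blend starts as the untouched photo and ends as the untouched photo, the frame after the last one matches the first, so the clip loops without a visible seam. Gaps left by one direction are filled by the other.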