This Could Make Creating Convincing Fake News Videos Too Easy

Fake news is a big and thorny subject these days, and there is no indication that it's going away any time soon. In fact, it might get worse before it gets better. Countless people are already easily duped by legit-looking text and professionally photoshopped images. But what if they could be deceived by video as well? While the machine learning algorithms developed by University of Washington researchers have loftier goals in mind, this "lip-syncing tech" could also be used to prank and mislead people in the future.

Imagine taking footage from one video, say an interview of former US president Barack Obama, and mixing it with audio from a completely different video, like a totally different interview. While technically possible, the results are usually less than convincing. But using this new machine learning technique, University of Washington researchers have been able to produce a mix that looks eerily like the real deal.

The problem with mixing video with different audio lies in the way the mouth moves. Humans have an uncanny talent for detecting out-of-sync or outright fake mouth movements; that sensitivity is part of the aversion better known as the "uncanny valley". This new research fixes all that, and in a more efficient way.

Audio-to-video conversion, as it is called, has been around for quite a while now, but almost all techniques have involved filming multiple people speaking the same sentences repeatedly while cameras and computers capture the changes in the shape of the mouth and associate them with certain sounds. Not exactly ideal or cheap. The UW researchers instead turned to machine learning and fed existing videos to a neural network, which then performed the same analysis of mouth shapes and their associated sounds. You no longer need live actors on hand; you can simply use tons of recorded videos to achieve the same effect.
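Very loosely, the training half described above amounts to learning an association between audio features and mouth shapes. The toy Python below is purely illustrative, not the researchers' code (they trained a neural network on many hours of footage); it stands in for the idea with a simple nearest-neighbor lookup, and the function name, feature vectors, and "mouth openness" values are all invented for the example:

```python
# Illustrative sketch only: associate audio features with mouth shapes
# by remembering (audio feature, mouth shape) pairs and, for a new audio
# feature, returning the mouth shape of the closest remembered feature.
import math

def nearest_mouth_shape(audio_feature, training_pairs):
    """Return the mouth shape whose paired audio feature is closest
    (Euclidean distance) to the query audio feature."""
    best_shape, best_dist = None, math.inf
    for feat, shape in training_pairs:
        dist = math.dist(feat, audio_feature)
        if dist < best_dist:
            best_shape, best_dist = shape, dist
    return best_shape

# Toy training data: (audio feature vector, mouth "openness" value)
pairs = [((0.0, 0.1), 0.2), ((0.9, 0.8), 0.9), ((0.5, 0.5), 0.5)]
print(nearest_mouth_shape((0.85, 0.9), pairs))  # → 0.9
```

A real system learns a far richer mapping than this lookup, but the principle is the same: sounds seen during training come paired with mouth shapes, and new sounds are matched against that experience.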

That's only half of the process, though. The second half involves analyzing the "fake" audio input, creating mouth shapes corresponding to the sounds in that audio, and superimposing them on the target video. The results are both impressive and worrying, and many people probably won't be able to tell that the resulting video is anything but legit.
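Continuing in the same illustrative spirit, the second half can be sketched as predicting a mouth shape for each frame of the new audio and compositing it onto the corresponding frame of the target video. The `synthesize` function and its inputs below are hypothetical stand-ins, not the researchers' actual pipeline:

```python
# Illustrative sketch only: pair each target video frame with the mouth
# shape predicted from the matching frame of the new ("fake") audio.
def synthesize(audio_frames, shape_for, target_frames):
    """For each (audio frame, video frame) pair, predict a mouth shape
    from the audio and attach it to the video frame."""
    out = []
    for audio, frame in zip(audio_frames, target_frames):
        mouth = shape_for(audio)               # audio -> mouth shape
        out.append({**frame, "mouth": mouth})  # superimpose on frame
    return out

# Toy inputs: two video frames, two audio frames, a trivial "model"
frames = [{"id": 0}, {"id": 1}]
audio = ["ah", "oo"]
shapes = {"ah": "open", "oo": "rounded"}
result = synthesize(audio, shapes.get, frames)
print(result[1]["mouth"])  # → rounded
```

In the real system the "superimpose" step is the hard part, blending a rendered mouth region seamlessly into each frame rather than tagging a dictionary.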

The researchers envision that this technology could be used for things like video conferencing, where video can "catch up" with audio despite the lag, or for education, where historical figures could be made to appear to deliver speeches that survive only as audio recordings. But it could also be used to create fake videos of prominent figures saying things they might have said long ago, or never at all.

The good news is twofold. First, the machine learning agent is currently fixated on Mr. Obama; that is, he is the only subject it knows about, thanks to the overabundance of available data, like video interviews. Second, the researchers might also have a way to "reverse" the process to tell whether a video is fake or not.

SOURCE: University of Washington