Of all the Pixel 2 surprises, top of the list has to be just how good the Google smartphone’s portrait mode photography is, despite only using a single camera. Although phones that simulate background bokeh aren’t new, the iPhone 8 Plus and others rely on two cameras side-by-side to pull off the trick. Now, Google is explaining how it’s done – and giving some tips as to how to maximize the impact.
In a new post on the company’s research blog, Marc Levoy, Principal Engineer, and Yael Pritch, Software Engineer, go in-depth on the combination of machine learning and computational photography that gives the Pixel 2 and Pixel 2 XL their portrait mode talents. It all comes down to depth of field, and how you go about measuring it.
A traditional camera like a DSLR relies on how the aperture of the lens works, both in its physical structure (which defines the shape of the bokeh) and how open it is (which controls what portions of the frame are in or out of focus). Since a smartphone has a small, fixed-size aperture, that won’t work. Instead, the phone estimates depth across the frame – for the subject and the environment around them – and uses those estimates to add blur artificially in post-processing.
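The optics being simulated can be made concrete with the standard thin-lens circle-of-confusion formula: the blur-spot diameter grows with the physical aperture (focal length divided by f-number) and with how far an object sits from the focal plane. A minimal sketch, with illustrative lens values (a 50mm DSLR lens versus a phone-scale 4.4mm lens; sensor-size differences are ignored here):

```python
def blur_spot_diameter_mm(focal_mm, f_number, focus_dist_mm, obj_dist_mm):
    """Thin-lens circle-of-confusion diameter on the sensor, in mm."""
    aperture_mm = focal_mm / f_number                  # physical aperture diameter
    magnification = focal_mm / (focus_dist_mm - focal_mm)
    return aperture_mm * magnification * abs(obj_dist_mm - focus_dist_mm) / obj_dist_mm

# Subject focused at 2m, background at 10m. The long DSLR lens throws the
# background far out of focus; the tiny phone lens barely blurs it at all,
# which is why the Pixel 2 has to fake the effect computationally.
dslr = blur_spot_diameter_mm(50.0, 1.8, 2000.0, 10000.0)
phone = blur_spot_diameter_mm(4.4, 1.8, 2000.0, 10000.0)
```

The two orders of magnitude between those numbers is the gap the Pixel 2's software has to bridge.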
Most phones, the engineers point out, use either triangulation to make those depth calculations – using two side-by-side lenses in the same way human eyes see 3D in a scene – or bluntly separate the picture into two layers, background and foreground, by attempting to figure out the outline of the subject. That’s how the Pixel 2’s front-facing selfie camera does it, in fact, but the main camera has some tricks up its sleeve.
First, the phone shoots an HDR+ burst, grabbing multiple shots in short order at a variety of exposure levels. HDR+ is commonly used in scenes where there are sharp differences in lighting, so as to brighten darker portions of the frame without blowing out the lighter sections. Then, though, the machine learning comes in.
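The core move in a burst merge – averaging aligned frames so noise drops while highlights stay protected – can be sketched in a few lines. This is a toy stand-in, assuming frame alignment has already happened; the real HDR+ pipeline aligns and merges tiles robustly:

```python
import numpy as np

rng = np.random.default_rng(0)

# A burst of 8 frames of the same static scene. Each frame is noisy on its
# own; merging them averages the noise away (roughly by sqrt(8)) without
# needing a longer exposure that would clip the highlights.
clean = np.full((64, 64), 0.2)                        # toy scene radiance
burst = [clean + rng.normal(0, 0.05, clean.shape) for _ in range(8)]

merged = np.mean(burst, axis=0)                       # the merged base frame
```

The merged frame is what the later segmentation and blur stages operate on.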
That relies on a TensorFlow neural network, which looks at the picture and tries to ascertain which pixels are in the foreground and which are in the background. Google trained it on “almost a million pictures of people (and their hats, sunglasses, and ice cream cones),” the researchers write. You can get a reasonable portrait shot from that alone, though it’ll have uniform blur and may confuse foreground objects with part of the subject itself.
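What segmentation alone buys you – and its uniform-blur limitation – can be shown with a toy two-layer composite. The mask here is hand-made rather than predicted by a network, and the box filter is a crude stand-in for real bokeh:

```python
import numpy as np

def box_blur(img, k):
    """Crude uniform blur via a k x k box filter (stand-in for bokeh)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

img = np.zeros((40, 40))
img[::4, :] = 1.0                     # stripes so blur visibly changes pixels
mask = np.zeros((40, 40), dtype=bool)
mask[10:30, 10:30] = True             # the "person" region from the network

# Two-layer composite: foreground kept sharp, everything else blurred the
# same amount, regardless of how far away it actually is.
portrait = np.where(mask, img, box_blur(img, 5))
```

Because every background pixel gets the same blur, a lamp just behind the subject looks as soft as a mountain a mile away – which is exactly the shortcoming the depth map in the next step fixes.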
Finally, the Pixel 2 uses its dual-pixel camera sensor. That has two photodiodes for every individual pixel, and by splitting the light from the scene in half, each of those photodiodes gets a slightly offset perspective. In effect, Google is creating its own version of the two frames a side-by-side lens phone like the iPhone 8 Plus would generate, but the offset is less than 1mm.
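The dual-pixel depth cue is ordinary stereo matching with a tiny baseline: find the shift that best aligns one half-pixel view onto the other. A one-dimensional sketch of that matching step (the real pipeline matches 2D tiles with subpixel refinement and noise handling):

```python
import numpy as np

def disparity_1d(left, right, max_shift=4):
    """Return the integer shift that best aligns `right` onto `left` (SSD)."""
    best_shift, best_cost = 0, np.inf
    for d in range(-max_shift, max_shift + 1):
        shifted = np.roll(right, d)
        cost = np.sum((left - shifted) ** 2)
        if cost < best_cost:
            best_shift, best_cost = d, cost
    return best_shift

rng = np.random.default_rng(1)
scene = rng.normal(size=256)          # toy 1D image row
left = scene
right = np.roll(scene, -3)            # the other photodiode's offset view

d = disparity_1d(left, right)         # disparity grows with defocus distance
```

Doing this per tile across the frame yields the depth map; the sub-millimeter baseline is why the neural network's mask is still needed to clean it up.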
It’s enough, though, in combination with the neural network, to create a workable depth map. “The last step is to combine the segmentation mask we computed in step 2 with the depth map we computed in step 3 to decide how much to blur each pixel in the HDR+ picture from step 1,” Levoy and Pritch explain. Google’s software opts for perfectly disk-shaped bokeh, with the whole portrait process running in about four seconds.
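That combining step reads naturally as code: blur radius per pixel scales with depth behind the subject plane, and the segmentation mask pins the subject's radius to zero. A hedged sketch, with a box filter standing in for Google's disk-shaped kernel and variable blur faked by picking from precomputed blur levels (all names and numbers are illustrative):

```python
import numpy as np

def portrait_blur(image, mask, depth, focal_depth, max_radius=6):
    """Blur each pixel by an amount set by its depth; keep masked pixels sharp."""
    # Precompute uniformly blurred copies at each radius (radius 0 = sharp).
    levels = [image]
    for r in range(1, max_radius + 1):
        k = 2 * r + 1
        padded = np.pad(image, r, mode="edge")
        acc = np.zeros_like(image)
        for dy in range(k):
            for dx in range(k):
                acc += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
        levels.append(acc / (k * k))

    # Blur radius grows with distance behind the focal (subject) plane.
    radius = np.clip(depth - focal_depth, 0, None)
    radius = np.minimum(radius.astype(int), max_radius)
    radius[mask] = 0                  # the segmentation mask keeps the subject sharp

    out = np.empty_like(image)
    for r in range(max_radius + 1):
        sel = radius == r
        out[sel] = levels[r][sel]
    return out

img = np.zeros((32, 32))
img[::3, :] = 1.0                     # stripes so blur is visible
depth = np.full((32, 32), 8.0)        # background far behind the subject
depth[8:24, 8:24] = 1.0               # subject region at the focal plane
mask = depth == 1.0

result = portrait_blur(img, mask, depth, focal_depth=1.0)
```

Blending discrete blur levels like this is a common trick for approximating spatially varying blur without filtering every pixel with its own kernel.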
As with any computational system, some shots will come out better than others. The engineers do have several suggestions as to how to maximize the impact of portrait mode, though. Framing subjects so that their head, or head and shoulders, fill the scene is a good start, and it’ll work better if they remove sunglasses or big scarves and hats. The greater the distance between the subject and the background, the more blur will be applied. For group shots, everybody should be at the same distance from your Pixel 2 so that they’re all sharp.