NVIDIA Instant NeRF Uses AI To Convert Flat Photos To 3D

NVIDIA has showcased an impressive tech that turns a set of flat, 2D images into 3D renders. The company is calling it Instant NeRF, an upgraded version of the NeRF (Neural Radiance Fields) technique that has been making waves in the industry lately. NVIDIA claims that its system deploys an AI model that trains in mere seconds and can turn a bunch of still photos into a 360-degree scene almost instantly. In fact, the company claims to have achieved performance gains of up to 1,000x in some scenarios.

The whole process of turning 2D photos into 3D is based on a technique called inverse rendering, which works by estimating how light interacts with objects in the real world. A NeRF system uses a neural network to predict the color and density of light radiating in every direction from any given point in space. In simpler terms, the AI mimics what the human eye would see from a certain angle, and then reconstructs the scene. The reconstruction uses multiple images captured from various angles as the source data for building a 3D scene.
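
For readers who want a more concrete picture, below is a minimal, hypothetical sketch of that idea in PyTorch: a small network that maps a 3D point and a viewing direction to a color and a density, which a renderer would then accumulate along camera rays. The layer sizes, encoding depth, and names are illustrative assumptions, not NVIDIA's actual implementation.

```python
# A minimal sketch of the core NeRF idea (not NVIDIA's code): a small MLP
# maps a 3D position and a viewing direction to a color and a volume density.
# All layer sizes and the positional-encoding depth are illustrative.
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    # Map each coordinate to sines/cosines at increasing frequencies so the
    # network can represent fine spatial detail.
    freqs = 2.0 ** torch.arange(num_freqs)
    angles = x[..., None] * freqs            # (..., dims, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)         # (..., dims * 2 * num_freqs)

class TinyNeRF(nn.Module):
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 * 2 * num_freqs           # encoded (x, y, z)
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        # Color also depends on the viewing direction, which lets the model
        # capture view-dependent effects such as specular highlights.
        self.color_head = nn.Sequential(
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, points, view_dirs):
        h = self.backbone(positional_encoding(points))
        density = torch.relu(self.density_head(h))
        color = self.color_head(torch.cat([h, view_dirs], dim=-1))
        return color, density

# Query the field at a batch of points along hypothetical camera rays.
model = TinyNeRF()
pts = torch.rand(1024, 3)
dirs = torch.nn.functional.normalize(torch.rand(1024, 3), dim=-1)
rgb, sigma = model(pts, dirs)
print(rgb.shape, sigma.shape)  # torch.Size([1024, 3]) torch.Size([1024, 1])
```

Training such a network amounts to rendering rays through it and nudging its weights until the rendered pixels match the source photos.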

Speeding up metaverse development

Instant NeRF requires only a few dozen photos as the source, plus details on the camera angle each was captured from, to render an accurate 3D version, with all of it happening in milliseconds. Current-gen NeRF models are quick at rendering, but require a lot of training time. To overcome that challenge, NVIDIA created a technique called multi-resolution hash grid encoding that is tailored to run efficiently on NVIDIA's graphics cards. The encoding allows the system to use much smaller neural networks that can be trained in a shorter window. NVIDIA says all that wizardry can happen on a single GPU, which means it's not a particularly resource-intensive tech and could power products that run on consumer-grade PCs.
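
The rough idea behind that encoding is to look up each 3D point in several grids of increasing resolution, where grid corners are hashed into small tables of trainable feature vectors. Here is a simplified, hypothetical sketch; the table size, level count, and other hyperparameters are illustrative assumptions, and NVIDIA's real implementation is a heavily optimized CUDA library, not plain PyTorch.

```python
# A simplified sketch of a multi-resolution hash grid encoding: each level
# hashes the corners of the voxel containing a point into a small table of
# trainable features, interpolates them, and concatenates across levels.
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    def __init__(self, levels=8, table_size=2**14, feat_dim=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.table_size = table_size
        self.res = [int(base_res * growth ** i) for i in range(levels)]
        # One small table of trainable feature vectors per resolution level.
        self.tables = nn.Parameter(torch.randn(levels, table_size, feat_dim) * 1e-4)

    def spatial_hash(self, c):
        # XOR-based spatial hash of integer grid coordinates.
        h = c[..., 0] ^ (c[..., 1] * 2654435761) ^ (c[..., 2] * 805459861)
        return h % self.table_size

    def forward(self, x):
        # x: (N, 3) points normalized to the unit cube [0, 1]^3.
        feats = []
        for lvl, res in enumerate(self.res):
            pos = x * res
            lo = pos.floor().long()
            w = pos - lo.float()                       # trilinear weights
            acc = 0.0
            for corner in range(8):                    # 8 corners of the voxel
                offset = torch.tensor([(corner >> i) & 1 for i in range(3)],
                                      device=x.device)
                weight = torch.prod(torch.where(offset.bool(), w, 1 - w), dim=-1)
                idx = self.spatial_hash(lo + offset)   # look up hashed corner
                acc = acc + weight[:, None] * self.tables[lvl, idx]
            feats.append(acc)
        # Concatenated features across all levels feed a much smaller MLP.
        return torch.cat(feats, dim=-1)                # (N, levels * feat_dim)

enc = HashGridEncoding()
points = torch.rand(1024, 3)
print(enc(points).shape)  # torch.Size([1024, 16])
```

Because most of the scene's detail lives in these trainable tables rather than in the network's weights, the neural network that follows can be tiny, which is the main reason training drops from hours to seconds.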

Application scenarios include training robots to improve their spatial awareness, polishing the software of self-driving cars, and quickly turning 2D assets into 3D objects for virtual worlds. The latter sounds like the perfect use case for the metaverses that every tech giant appears to be interested in right now, including NVIDIA. But there are a few limitations.

If the subject moves too much during the 2D capture, the resulting 3D scene will be blurry. Moreover, if there are a lot of moving elements, capturing all the source data should ideally happen in a short burst. A detailed technical explainer of the NeRF technique can be found here, while this article offers a better visual representation of the technology in action with a variety of real-world subjects involved.