New artificial intelligence (AI) technology from Nvidia could soon turn 2D photos into 3D scenes in just seconds, making the creation of immersive virtual spaces like the metaverse as trivial as word processing.

  • Nvidia recently showed off a technique that turns 2D photos into 3D scenes in mere seconds.
  • The method uses computing power to approximate how light behaves in the real world.
  • The metaverse is one area where 3D scenes are helpful because they can be viewed from any camera perspective.

Nvidia recently demonstrated the method, called Instant NeRF, which uses computing power to approximate how light behaves in the real world. It could transform your old photos into a video game scene, or help train robots and self-driving cars to understand the size and shape of real-world objects.

“3D imaging brings a new world of transformation,” Oren Debbi, the CEO of Visionary.ai, a computer vision company that runs its 3D algorithms on the Nvidia platform, told Lifewire in an email interview. “Using 3D, you mimic real-world depth into the scene and make the image appear more alive and realistic. Besides AR/VR and industrial cameras, where 3D is very common, we are now seeing it being used on almost every smartphone without the user even knowing.” 

Adding Dimensions

The first instant photo, taken 75 years ago with a Polaroid camera, aimed to capture the 3D world in a 2D image rapidly. Now, AI researchers are working on the opposite: turning a collection of still images into a digital 3D scene in seconds.

Known as inverse rendering, the process uses AI to approximate how light behaves in the real world, enabling researchers to reconstruct a 3D scene from a handful of 2D images taken at different angles. Nvidia claims it has developed an approach that accomplishes this task almost instantly.
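
Nvidia has not released the code behind this demonstration, but the forward model that inverse rendering inverts is well established: a pixel's color is the accumulation of color and opacity contributions from samples along the camera ray passing through it. Below is a minimal Python sketch of that accumulation step; the function name and sample values are illustrative, not taken from Nvidia's implementation.

```python
import numpy as np

def composite_ray(colors, densities, step):
    """Accumulate one pixel's color from RGB/density samples taken
    along a single camera ray (emission-absorption volume rendering).

    colors:    (N, 3) RGB samples in [0, 1]
    densities: (N,) non-negative volume densities
    step:      spacing between consecutive samples
    """
    alpha = 1.0 - np.exp(-densities * step)          # per-sample opacity
    # Transmittance: fraction of light surviving to reach each sample.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    weights = alpha * trans                          # per-sample contribution
    return (weights[:, None] * colors).sum(axis=0)   # final pixel RGB

# Example: 64 samples through a translucent reddish blob.
n = 64
colors = np.tile([1.0, 0.2, 0.2], (n, 1))
densities = 5.0 * np.exp(-np.linspace(-2, 2, n) ** 2)
pixel = composite_ray(colors, densities, step=0.05)
```

Fitting a scene so that rays rendered this way reproduce the input photos is what makes the process "inverse": the renderer runs forward, and the scene is adjusted until its outputs match the real images.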

Nvidia applied this approach to a new technology called neural radiance fields, or NeRF. The company says the result, dubbed Instant NeRF, is the fastest NeRF technique to date. The model requires just seconds to train on a few dozen still photos and can then render the resulting 3D scene within tens of milliseconds.

“If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene,” David Luebke, vice president for graphics research at Nvidia, said in a news release. “In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography—vastly increasing the speed, ease and reach of 3D capture and sharing.”

Collecting data to feed a NeRF requires capturing a few dozen images from multiple positions around the scene, along with the camera position of each of those shots.

The NeRF trains a small neural network to reconstruct the scene by predicting the color of light radiating in any direction, from any point in 3D space.
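
That "small neural network" can be sketched concretely. The toy PyTorch model below maps a 3D point plus a viewing direction to an RGB color and a volume density, which is the input/output contract of a NeRF. It omits the input encodings of the original NeRF paper and the multiresolution hash encoding and fused CUDA kernels that make Instant NeRF fast, so treat it as a shape-of-the-idea sketch rather than Nvidia's architecture.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy radiance field: (3D position, view direction) -> (RGB, density)."""
    def __init__(self, hidden=128):
        super().__init__()
        # 3 position coords + 3 direction coords = 6 inputs;
        # real NeRFs encode these inputs before the first layer.
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, position, direction):
        out = self.net(torch.cat([position, direction], dim=-1))
        rgb = torch.sigmoid(out[..., :3])   # colors constrained to [0, 1]
        density = torch.relu(out[..., 3:])  # density must be non-negative
        return rgb, density

# Query the field at one point, looking straight down the -z axis.
point = torch.tensor([[0.1, -0.3, 0.7]])
view_dir = torch.tensor([[0.0, 0.0, -1.0]])
color, sigma = TinyNeRF()(point, view_dir)
```

Training pairs a network like this with the ray compositing shown earlier: render rays through the training photos, compare the results against the real pixel colors, and adjust the network until they agree.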

The Appeal of 3D

The metaverse is one area where 3D scenes are useful because they can be viewed from any camera perspective, Brad Quinton, founder of the Perceptus Platform for augmented reality (AR), told Lifewire in an email interview. Just like we can walk through a room in real life and see its contents from many different angles, with a reconstructed 3D scene, we can virtually move through a space and view it from any perspective. 

“This can be particularly useful for creating environments for use in virtual reality,” Quinton said. 

Programs like Apple’s Object Capture use a technique called photogrammetry to create virtual 3D objects from a series of 2D images. The 3D models will be used extensively in virtual reality and AR applications, Quinton predicted. For example, some AIs, like the one in the Perceptus AR Platform, use 3D models to create an understanding of the real world, which allows for real-time AR applications.
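
Apple hasn't published Object Capture's internals, but the classic photogrammetry pipeline starts the same way everywhere: match features between overlapping photos, recover the relative camera pose, and triangulate the matched points into 3D. A toy two-view version using OpenCV, assuming the camera intrinsics `K` are known, might look like this (all names are illustrative):

```python
import cv2
import numpy as np

def two_view_points(img1, img2, K):
    """Toy photogrammetry step: match features between two photos and
    triangulate a sparse 3D point cloud (K is the 3x3 camera intrinsics)."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Relative camera pose from the matched points.
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
    # Projection matrices for the two views, then triangulate.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4 = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
    return (pts4[:3] / pts4[3]).T  # Nx3 points, reconstructed up to scale
```

Production tools repeat this across dozens of photos and refine the result, but the core idea is the same sparse geometry recovery.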

The use of 3D imaging also mimics real-world depth in a scene and makes the image appear more alive and realistic, Debbi said. Creating a bokeh effect (aka portrait mode or cinematic mode) requires 3D depth mapping, a technique now used on almost every smartphone.

“This is already the standard for professional videographers filming movies, and this is becoming the standard for every consumer,” Debbi added. 
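
Phone makers don't disclose their exact portrait-mode pipelines, but the principle Debbi describes is straightforward: use a per-pixel depth map to keep subjects near the focal plane sharp and blend everything else with a blurred copy of the frame. A toy Python/OpenCV sketch, with hypothetical inputs and arbitrary kernel sizes:

```python
import cv2
import numpy as np

def portrait_mode(image, depth, focus_depth, tolerance=0.1):
    """Toy bokeh: keep pixels near the focal plane sharp and blend
    the rest with a blurred copy of the frame.

    image: HxWx3 uint8 photo; depth: HxW float map normalized to [0, 1]
    (both hypothetical inputs for this sketch).
    """
    blurred = cv2.GaussianBlur(image, (31, 31), 0)
    # Mask of in-focus pixels: depth close to the chosen focal plane.
    in_focus = (np.abs(depth - focus_depth) < tolerance).astype(np.float32)
    mask = cv2.GaussianBlur(in_focus, (15, 15), 0)[..., None]  # soften edges
    return (mask * image + (1 - mask) * blurred).astype(np.uint8)
```

Real pipelines vary the blur radius with distance from the focal plane to better approximate true lens bokeh; a single Gaussian blur is the simplest stand-in.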
