Representing Scenes as Neural Radiance Fields for View Synthesis
View synthesis, in the context of this discussion, means synthesizing any view of a 3D scene or object(s) given only a small, sparse set of input images.
So why does this even work in the first place? Once again, we enter the domain of manifold representations. Real-world data lives on a lower-dimensional manifold: higher than 2D, higher than 3D, higher than 4D, but still much lower than the potential variability in the raw data.
View synthesis exploits this. Remember, neural nets do non-linear function interpolation. The function is the manifold the view synthesis data lives on. By moving along the surface of the manifold, you can change the view.
And with those words of wisdom, let's dive into our good friend Yannic Kilcher's discussion of the Berkeley NeRF paper.
You can check out the paper here.
The project website is here.
1: It's fun to think about how to exploit this whole approach in more generic generative image synthesis schemes.
2: Modern cameras can generate a point cloud of data (as opposed to just a 2D pixel map). Pixels in these cameras can have depth, so each pixel is effectively a colored point located in a 3D cloud.
3: Be careful about that '1 year old paper is real old' statement. Most 'new' ideas were actually invented 20 or 30 years ago, or even earlier.
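4: Note they use the transformer positional encoding trick in a different way: it is applied to the continuous input coordinates, mapping each one to sines and cosines at a range of frequencies, rather than to token positions in a sequence. It's a back-door way to build a scale space.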
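To make note 4 a bit more concrete, here is a minimal NumPy sketch of the kind of frequency encoding the paper describes: each input coordinate p is mapped to sin(2^k π p) and cos(2^k π p) for k = 0 .. L-1. The function name `positional_encoding` and the parameter name `num_freqs` are my own, and the ordering of the output features is an arbitrary choice, so treat this as an illustration rather than the authors' code.

```python
import numpy as np

def positional_encoding(p, num_freqs):
    """Map each coordinate of p to sines/cosines at doubling frequencies.

    p: array of shape (..., D), e.g. 3D positions, assumed roughly in [-1, 1].
    Returns an array of shape (..., D * num_freqs * 2).
    """
    p = np.asarray(p, dtype=np.float64)
    # Frequencies 2^0 * pi, 2^1 * pi, ..., 2^(L-1) * pi
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    # Broadcast every coordinate against every frequency: shape (..., D, L)
    angles = p[..., None] * freqs
    # Pair up sin and cos for each (coordinate, frequency): shape (..., D, L, 2)
    enc = np.stack([np.sin(angles), np.cos(angles)], axis=-1)
    # Flatten the per-point features into one vector
    return enc.reshape(*p.shape[:-1], -1)

# Example: encode a batch of five 3D points with 10 frequency bands
points = np.random.uniform(-1.0, 1.0, size=(5, 3))
encoded = positional_encoding(points, num_freqs=10)
print(encoded.shape)
```

The low-frequency terms vary slowly across the scene and the high-frequency terms vary quickly, which is one way to read the "scale space" remark: the network gets the same coordinate at several resolutions at once.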