The document discusses recent advances in novel view synthesis using neural rendering. It describes different approaches for representing 3D scenes, such as voxel grids, multi-plane images, and implicit functions. Voxel-based methods can render high-quality novel views but are memory-intensive. Implicit functions enable more compact representations, but rendering is slow. Hybrid implicit/explicit and image-based methods render faster but do not provide a compact, global scene representation. The document outlines open challenges in reducing rendering costs, improving generalization, and enabling new applications in scene understanding.
Neural Scene Representation & Rendering: Introduction to Novel View Synthesis
1. Vincent Sitzmann, SIGGRAPH 2021
Novel View Synthesis for Objects and Scenes
Neural Rerendering in the Wild, Meshry et al. 2019
Scene Representation Networks, Sitzmann et al. 2019
Neural Volumes, Lombardi et al. 2019
DeepView, Flynn et al. 2019
Goal: Render novel views given a sparse set of observations.
Observations {(image + pose & intrinsics), …} → Model → Novel views
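Each observation pairs an image with its camera pose and intrinsics, which together define one ray per pixel. The sketch below assumes a pinhole camera, a 4×4 camera-to-world matrix, and OpenCV-style axes (x right, y down, z forward); the `Observation` class and `pixel_ray` helper are hypothetical names, not part of any method cited here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Observation:
    """One observation: an image plus its camera pose and intrinsics (hypothetical format)."""
    image: List[List[float]]          # H x W grayscale values
    cam_to_world: List[List[float]]   # 4x4 camera-to-world matrix, row-major
    fx: float
    fy: float
    cx: float
    cy: float


def pixel_ray(obs: Observation, u: float, v: float) -> Tuple[Vec3, Vec3]:
    """Return (origin, unit direction) in world space for pixel (u, v).

    Assumes an OpenCV-style pinhole camera: x right, y down, z forward.
    """
    # Direction in camera coordinates from the pinhole model.
    d_cam = ((u - obs.cx) / obs.fx, (v - obs.cy) / obs.fy, 1.0)
    m = obs.cam_to_world
    # Rotate into world coordinates with the upper-left 3x3 block of the pose.
    d_world = tuple(sum(m[r][c] * d_cam[c] for c in range(3)) for r in range(3))
    norm = math.sqrt(sum(x * x for x in d_world))
    direction = tuple(x / norm for x in d_world)
    # The camera center is the translation column of the pose.
    origin = (m[0][3], m[1][3], m[2][3])
    return origin, direction
```

With an identity pose and a pixel at the principal point (cx, cy), the ray starts at the origin and points straight down the +z axis.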
Training on a dataset of images
Scene Representation + Differentiable Renderer: train on images.
Observations → reconstruction (scene representation) → differentiable renderer → re-rendered observations; an image loss compares the re-rendered and original observations.
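To illustrate the render-and-compare loop, here is a deliberately tiny sketch: the "scene representation" is a hypothetical 1D array of intensities, the "differentiable renderer" just crops a window at a known camera offset, and plain gradient descent minimizes the image loss summed over all views. Real methods replace both pieces with neural networks and automatic differentiation.

```python
from typing import List, Tuple


def render(scene: List[float], offset: int, width: int) -> List[float]:
    """Toy differentiable renderer: a camera at integer `offset` sees `width` cells of a 1D scene."""
    return scene[offset:offset + width]


def image_loss(pred: List[float], target: List[float]) -> float:
    """Mean squared error between a re-rendered and an observed image."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def fit(observations: List[Tuple[int, List[float]]], scene_size: int,
        width: int, steps: int = 500, lr: float = 0.5) -> List[float]:
    """Recover the scene by gradient descent on the image loss summed over all views."""
    scene = [0.0] * scene_size  # parameters of the (toy) scene representation
    for _ in range(steps):
        grad = [0.0] * scene_size
        for offset, image in observations:
            pred = render(scene, offset, width)
            for i, (p, t) in enumerate(zip(pred, image)):
                # The renderer copies scene[offset + i] into pixel i, so the
                # MSE gradient flows straight back to that cell.
                grad[offset + i] += 2.0 * (p - t) / width
        scene = [s - lr * g for s, g in zip(scene, grad)]
    return scene


# Hypothetical ground truth, observed by three overlapping "cameras".
truth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
views = [(k, render(truth, k, 4)) for k in range(3)]
recovered = fit(views, scene_size=6, width=4)
```

Because every cell is seen by at least one view, the recovered scene converges to the ground truth; cells seen by no view would simply keep their initial value, which is exactly the gap a learned prior is meant to fill.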
How to do few-shot reconstruction?
Scene Representation + Differentiable Renderer: train on images.
Prior-based reconstruction: if the method learns a prior, it enables few-shot reconstruction!
Single observation → prior-based reconstruction → differentiable renderer → re-rendered observations, again compared with an image loss.
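A prior lets a single partial observation constrain the whole scene. As a stand-in for a learned neural prior, the sketch below fits a linear prior (mean plus one principal direction, found by power iteration) to hypothetical training scenes, then reconstructs a new scene from two observed cells by solving for a one-dimensional latent code in closed form. This is plain PCA, not any specific method from the talk, and all names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def learn_linear_prior(scenes: List[List[float]], iters: int = 50) -> Tuple[List[float], List[float]]:
    """Learn a 1D linear prior (mean + principal direction) from training scenes."""
    n = len(scenes[0])
    mean = [sum(s[i] for s in scenes) / len(scenes) for i in range(n)]
    centered = [[s[i] - mean[i] for i in range(n)] for s in scenes]
    v = [1.0] * n
    for _ in range(iters):
        # Power iteration: v <- C v / |C v|, with C = sum_k x_k x_k^T.
        cv = [0.0] * n
        for x in centered:
            proj = dot(x, v)
            cv = [c + proj * xi for c, xi in zip(cv, x)]
        norm = dot(cv, cv) ** 0.5
        v = [c / norm for c in cv]
    return mean, v


def reconstruct(mean: List[float], direction: List[float],
                observed: Dict[int, float]) -> List[float]:
    """Fit the latent code z to a partial observation, then decode the full scene."""
    num = sum(direction[i] * (y - mean[i]) for i, y in observed.items())
    den = sum(direction[i] ** 2 for i in observed)
    z = num / den  # least-squares latent code using only the observed cells
    return [m + z * d for m, d in zip(mean, direction)]


# Hypothetical training set: every scene is base + c * shape for some scalar c.
base, shape = [1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]
train = [[b + c * s for b, s in zip(base, shape)] for c in (-1.0, 0.0, 1.0, 2.0)]
mean, direction = learn_linear_prior(train)
# A single observation that only sees the first two cells of a new scene (c = 3).
full = reconstruct(mean, direction, {0: 4.0, 1: 7.0})
```

The two unseen cells are filled in as 10 and 13 purely from the prior; without it (as in the previous sketch), they would be unconstrained.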
Overview
Scene representation → renderer:
• Multi-plane images → (alpha) compositing
• Voxelgrids → volumetric
• Implicit function → ray-based: sphere-tracing or volumetric
• Hybrid implicit/explicit → volumetric
• Image-based → rasterization
Both the scene representation and the differentiable renderer are often adapted from traditional computer graphics.
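For multi-plane images, (alpha) compositing reduces to the standard "over" operator applied per pixel. This sketch assumes the planes have already been warped into the target view (the homography step is omitted) and are ordered from far to near; `composite_back_to_front` is an illustrative name.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]


def composite_back_to_front(planes: List[RGBA]) -> Tuple[float, float, float]:
    """Composite one pixel's RGBA samples from an MPI, ordered far -> near, with 'over'."""
    r, g, b = 0.0, 0.0, 0.0
    for pr, pg, pb, a in planes:  # far plane first
        # New color = plane color weighted by its alpha, plus what shows through.
        r = pr * a + r * (1.0 - a)
        g = pg * a + g * (1.0 - a)
        b = pb * a + b * (1.0 - a)
    return r, g, b
```

A half-transparent red plane in front of an opaque blue plane yields an even mix, (0.5, 0.0, 0.5). Because each plane is a flat layer at a fixed depth, the result is only 2.5D, which matches the cons listed for MPIs.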
Voxel-based methods
DeepVoxels, Sitzmann et al., CVPR 2019
Neural Volumes, Lombardi et al., SIGGRAPH 2019
HoloGAN, Nguyen-Phuoc et al., ICCV 2019
Neural Implicit Approaches
Scene Representation Networks, Sitzmann et al., NeurIPS 2019 (generalizes across scenes)
NeRF, Mildenhall et al., ECCV 2020 (single-scene)
Implicit Differentiable Renderer, Yariv et al., NeurIPS 2020 (single-scene)
Differentiable Volumetric Rendering, Niemeyer et al., CVPR 2020 (generalizes across scenes)
Volumetric rendering (samples along each ray between near and far bounds):
• Higher quality
• Easy convergence
• Very expensive
Sphere tracing:
• Faster
• Fewer network evaluations
• Convergence more difficult
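The volumetric vs. sphere-tracing trade-off can be made concrete by counting field evaluations along a single ray. In this toy setup, an analytic unit-sphere SDF stands in for the network, the volumetric renderer uses a hypothetical density of 100 inside the sphere and 0 outside, and the compositing weights follow the usual quadrature w_i = T_i α_i with α_i = 1 − exp(−σ_i δ). Uniform midpoint samples are used for simplicity; NeRF itself uses stratified and hierarchical sampling. Function names are illustrative.

```python
import math
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def sdf_sphere(p: Vec3) -> float:
    """Signed distance to a unit sphere at the origin (stands in for a neural SDF)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - 1.0


def at(o: Vec3, d: Vec3, t: float) -> Vec3:
    """Point at depth t along the ray o + t * d."""
    return (o[0] + t * d[0], o[1] + t * d[1], o[2] + t * d[2])


def sphere_trace(f: Callable[[Vec3], float], o: Vec3, d: Vec3, near: float, far: float,
                 eps: float = 1e-4, max_steps: int = 64) -> Tuple[Optional[float], int]:
    """Step along the ray by the SDF value; return (hit depth or None, #evaluations)."""
    t, evals = near, 0
    for _ in range(max_steps):
        dist = f(at(o, d, t))
        evals += 1
        if dist < eps:
            return t, evals
        t += dist  # safe step: no surface is closer than dist
        if t > far:
            break
    return None, evals


def volume_render_depth(sigma: Callable[[Vec3], float], o: Vec3, d: Vec3, near: float,
                        far: float, n_samples: int = 128) -> Tuple[float, int]:
    """Quadrature with weights w_i = T_i * alpha_i; return (expected depth, #evaluations)."""
    delta = (far - near) / n_samples
    transmittance, depth = 1.0, 0.0
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of the i-th interval
        alpha = 1.0 - math.exp(-sigma(at(o, d, t)) * delta)
        depth += transmittance * alpha * t
        transmittance *= 1.0 - alpha  # light surviving past this sample
    return depth, n_samples


# Hypothetical density field: 100 inside the sphere, 0 outside.
def density(p: Vec3) -> float:
    return 100.0 if sdf_sphere(p) < 0.0 else 0.0


origin, direction = (0.0, 0.0, -3.0), (0.0, 0.0, 1.0)
hit, st_evals = sphere_trace(sdf_sphere, origin, direction, near=0.0, far=6.0)
depth, vr_evals = volume_render_depth(density, origin, direction, near=0.0, far=6.0)
```

Both renderers place the surface near depth 2, but sphere tracing needs 2 evaluations where the volumetric renderer spends 128. One plausible reading of the slide's "convergence more difficult" is that sphere tracing only produces gradients at the single surface point it finds, whereas volumetric rendering spreads gradients over every sample.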
Dynamic Extensions
Nerfies, Park et al., arXiv 2020
D-NeRF, Pumarola et al. 2020
Neural Radiance Flow, Du et al., arXiv 2020
Neural Scene Flow Fields, Li et al., CVPR 2021
Space-time Neural Irradiance Fields, Xian et al., arXiv 2020
Hybrid Implicit / Explicit
PIFu, Saito et al., ICCV 2019
GRF, Trevithick et al., arXiv 2020
pixelNeRF, Yu et al., CVPR 2021
MVSNeRF, Chen et al., arXiv 2021
Learn local (image patch-based) priors
Neural Sparse Voxel Fields, Liu et al., NeurIPS 2020
Unconstrained Scene Generation with Locally Conditioned Radiance Fields, DeVries et al., arXiv 2021
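Local (image patch-based) conditioning, as used by image-conditioned methods such as pixelNeRF, looks up a feature at the location where a 3D point projects into a source image. A minimal sketch under simplifying assumptions: the point is already expressed in the source camera's frame, the feature map is single-channel, out-of-bounds coordinates are clamped to the border, and points behind the camera get a feature of 0.0. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def project(p_cam: Sequence[float], fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a point in the source camera's frame (z forward)."""
    x, y, z = p_cam
    if z <= 0.0:
        return None  # behind the camera
    return fx * x / z + cx, fy * y / z + cy


def bilinear(feat: List[List[float]], u: float, v: float) -> float:
    """Bilinearly sample a single-channel H x W feature map at continuous pixel (u, v)."""
    h, w = len(feat), len(feat[0])
    u = min(max(u, 0.0), w - 1.0)  # clamp to the image border
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    a, b = u - u0, v - v0
    top = (1 - a) * feat[v0][u0] + a * feat[v0][u1]
    bottom = (1 - a) * feat[v1][u0] + a * feat[v1][u1]
    return (1 - b) * top + b * bottom


def local_feature(p_cam: Sequence[float], feat: List[List[float]],
                  fx: float, fy: float, cx: float, cy: float) -> float:
    """Feature that would condition the MLP at point p (0.0 if p is behind the camera)."""
    uv = project(p_cam, fx, fy, cx, cy)
    return 0.0 if uv is None else bilinear(feat, *uv)
```

Because the conditioning feature depends only on a local image patch, the learned prior is local as well, which is why the summary table credits hybrid methods with local but not global priors.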
Image-based methods
Stable View Synthesis, Riegler et al., CVPR 2021
IBRNet, Wang et al., CVPR 2021
Requirements
• Multi-plane images, rendered by (alpha) compositing. Pros: high quality, fast. Cons: large size, only 2.5D.
• Voxelgrids, rendered volumetrically. Pros: “true 3D”, high quality. Cons: no reconstruction priors, memory O(n³).
• Implicit function, rendered ray-based (sphere-tracing or volumetric). Pros: true 3D, high quality, compact, admits global priors. Cons: extremely expensive, slow rendering.
• Hybrid implicit/explicit, rendered volumetrically. Pros: significant speedup, admits local priors. Cons: no compact representation, no global priors, memory O(n³).
• Image-based, rendered by rasterization or volumetrically. Pros: high quality, fast. Cons: not compact, since it needs the source images.
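The O(n³) memory cost of dense voxel grids is easy to quantify. Assuming float32 storage (4 bytes per value), the helper below computes the size of an n×n×n grid; `voxel_grid_bytes` is an illustrative name.

```python
def voxel_grid_bytes(n: int, channels: int = 1, bytes_per_value: int = 4) -> int:
    """Memory of a dense n x n x n grid storing `channels` values per voxel (float32 by default)."""
    return n ** 3 * channels * bytes_per_value


for n in (128, 256, 512):
    # Prints 8.0, 64.0 and 512.0 MiB per channel respectively.
    print(n, voxel_grid_bytes(n) / 2 ** 20, "MiB per channel")
```

Each doubling of resolution multiplies memory by 8, so a 512³ grid already needs 512 MiB for a single float32 channel, before any per-voxel features are added.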
Summary: Open Challenges
Expensive Rendering
• Rendering requires hundreds of samples per ray, at both train and test time.
• How to model non-Lambertian effects? Multi-bounce light transport is barely tractable.
Generalization
• Local conditioning enables stronger generalization, but doesn’t learn object-/scene-centric representations. Can we have both?
Scene Understanding
• Many important applications outside of computer graphics are worth exploring!
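To put "hundreds of samples per ray" in perspective, the cost of a ray-based volumetric renderer scales with the number of rays times the samples per ray. The numbers below (an 800×800 image, 192 samples per ray, 4096-ray training batches) are illustrative assumptions, not figures from the talk, and the function names are hypothetical.

```python
def network_evals_per_image(height: int, width: int, samples_per_ray: int) -> int:
    """One ray per pixel, one field (network) query per sample along each ray."""
    return height * width * samples_per_ray


def network_evals_per_step(rays_per_batch: int, samples_per_ray: int) -> int:
    """Training renders a random batch of rays instead of full images."""
    return rays_per_batch * samples_per_ray


image_cost = network_evals_per_image(800, 800, 192)  # 122,880,000 queries
step_cost = network_evals_per_step(4096, 192)        # 786,432 queries
```

Rendering a single 800×800 frame this way takes over 120 million network queries, which is why reducing samples per ray (for example by sphere tracing or explicit acceleration structures) remains an open challenge.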