Project Site: http://montage4d.com
The commoditization of virtual and augmented reality devices and the availability of inexpensive consumer depth cameras have catalyzed a resurgence of interest in spatiotemporal performance capture. Recent systems like Fusion4D and Holoportation address several crucial problems in the real-time fusion of multiview depth maps into volumetric and deformable representations. Nonetheless, stitching multiview video textures onto dynamic meshes remains challenging due to imprecise geometries, occlusion seams, and critical time constraints. In this paper, we present a practical solution towards real-time seamless texture montage for dynamic multiview reconstruction. We build on the ideas of dilated depth discontinuities and majority voting from Holoportation to reduce ghosting effects when blending textures. In contrast to their approach, we determine the appropriate blend of textures per vertex using view-dependent rendering techniques, so as to avert fuzziness caused by the ubiquitous normal-weighted blending. By leveraging geodesics-guided diffusion and temporal texture fields, our algorithm mitigates spatial occlusion seams while preserving temporal consistency. Experiments demonstrate significant enhancement in rendering quality, especially in detailed regions such as faces. We envision a wide range of applications for Montage4D, including immersive telepresence for business, training, and live entertainment.
Montage4D: Interactive Seamless Fusion of Multiview Video Textures
1. Montage4D: Interactive Seamless Fusion of Multiview Video Textures
Ruofei Du†‡, Ming Chuang‡¶, Wayne Chang‡, Hugues Hoppe‡§, and Amitabh Varshney†
†Augmentarium and GVIL | UMIACS | University of Maryland, College Park
‡Microsoft Research, Redmond   ¶Apple Inc.   §Google Inc.
16. 16
About 30% of the participants do not believe the 3D reconstructed person looks real.
Escolano et al. Holoportation: Virtual 3D Teleportation in Real-time (UIST 2016)
17. 17
About 30% of the participants do not believe the 3D reconstructed person looks real.
Screen recorded with OBS. Computed in real-time from 8 videos.
24. Related Work
3D Texture Montage
Screen-space optical flow can fix many misregistration issues, but it relies heavily on RGB features and screen resolution, and it fails when the viewpoint changes rapidly.
25. Related Work
3D Texture Montage
Up to now, few systems other than Holoportation can fuse dynamic meshes from multiple cameras in real time.
This recent SIGGRAPH 2017 paper produces excellent dynamic reconstruction results, but it uses a single RGBD camera, which can result in significant occlusion.
26. Related Work
3D Texture Montage
Normal Weighted Blending (200 FPS):
$w_i = V_i \cdot \max(0, \mathbf{n} \cdot \mathbf{v}_i)^{\alpha}$
where $V_i$ is the visibility test, $\mathbf{n}$ is the vertex normal, and $\mathbf{v}_i$ is the view direction of texture camera $i$.
Majority Voting for color correction: for each vertex and for each texture, test whether the projected color agrees with more than half of the other textures; if not, set that texture's weight field to 0.
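To make the slide's formula concrete, here is a minimal NumPy sketch of the per-vertex, per-camera blend weight. The function and variable names (and the default exponent alpha = 2) are illustrative assumptions, not the Holoportation implementation.

```python
import numpy as np

def normal_weighted_blend_weight(visible, normal, cam_dir, alpha=2.0):
    """w_i = V_i * max(0, n . v_i)^alpha; alpha is an assumed exponent."""
    v_test = 1.0 if visible else 0.0              # V_i: binary visibility test
    n = normal / np.linalg.norm(normal)           # unit vertex normal n
    v_i = cam_dir / np.linalg.norm(cam_dir)       # unit view direction of texture camera i
    return v_test * max(0.0, float(np.dot(n, v_i))) ** alpha

# Example: one vertex seen by two of the texture cameras.
normal = np.array([0.0, 0.0, 1.0])
cam_dirs = [np.array([0.0, 0.0, 1.0]), np.array([0.7, 0.0, 0.7])]
weights = np.array([normal_weighted_blend_weight(True, normal, d) for d in cam_dirs])
print(weights / weights.sum())                    # normalized per-vertex blend weights
```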
39. Majority Voting (Color Seams)
Each vertex is assigned 8 colors, one from each of the 8 cameras. These colors are classified into clusters in LAB color space with a 0.1 distance threshold. The mean color of the largest cluster is called the majority-voting color.
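A minimal sketch of how the majority-voting color could be computed, assuming the per-vertex camera colors are already in a normalized LAB space and using a simple greedy clustering with the 0.1 threshold above; the clustering strategy is an illustrative assumption.

```python
import numpy as np

def majority_vote_color(lab_colors, threshold=0.1):
    """Greedily cluster the per-vertex camera colors in (normalized) LAB space and
    return the mean color of the largest cluster, i.e., the majority-voting color."""
    clusters = []                                  # each cluster is a list of colors
    for c in lab_colors:
        for cluster in clusters:
            if np.linalg.norm(c - np.mean(cluster, axis=0)) < threshold:
                cluster.append(c)
                break
        else:
            clusters.append([c])
    largest = max(clusters, key=len)
    return np.mean(largest, axis=0), len(largest)

# Example: 8 camera colors for one vertex; two outliers come from a mis-projected object.
colors = np.array([[0.62, 0.10, 0.05]] * 6 + [[0.35, 0.30, 0.25]] * 2)
mean_color, votes = majority_vote_color(colors)
print(mean_color, votes)   # a texture "agrees" if its color joins the majority cluster
```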
40. Majority Voting (Color Seams)
The triangle vertices have different majority-voting colors, which may be caused by either misregistration or self-occlusion.
43. Field of View (Background Seams)
One or two triangle vertices lie outside the camera's field of view or in the subtracted background region, while the rest do not.
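A small sketch of this field-of-view test for one triangle, assuming a pinhole projection and a binary foreground mask from background subtraction; the intrinsics, mask, and helper names are hypothetical.

```python
import numpy as np

def in_view_and_foreground(vertex, K, width, height, fg_mask):
    """Project a camera-space vertex with pinhole intrinsics K and test whether it
    lands inside the image and on the (background-subtracted) foreground mask."""
    if vertex[2] <= 0:                            # behind the camera
        return False
    p = K @ vertex
    u, v = p[0] / p[2], p[1] / p[2]
    if not (0 <= u < width and 0 <= v < height):  # outside the field of view
        return False
    return bool(fg_mask[int(v), int(u)])          # inside the foreground region?

def is_fov_seam_triangle(tri_vertices, K, width, height, fg_mask):
    """A field-of-view seam: some, but not all, vertices project inside the view."""
    inside = [in_view_and_foreground(v, K, width, height, fg_mask) for v in tri_vertices]
    return any(inside) and not all(inside)

# Example with hypothetical intrinsics and an all-foreground mask.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
mask = np.ones((480, 640), dtype=bool)
tri = [np.array([0.0, 0.0, 2.0]), np.array([0.1, 0.0, 2.0]), np.array([5.0, 0.0, 2.0])]
print(is_fov_seam_triangle(tri, K, 640, 480, mask))  # True: last vertex projects off-screen
```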
52. Geodesics
For diffusing the seams
On triangle meshes, this is challenging because of the computation of tangent directions, and because shortest paths pass through edge interiors rather than only through vertices.
53. Geodesics
For diffusing the seams
We use the algorithm by Surazhsky and Hoppe to compute approximate geodesics.
The idea is to maintain only two to three shortest paths along each edge to reduce the computational cost.
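To illustrate the idea of ramping texture weights down by distance from the seams, here is a simplified multi-source Dijkstra over mesh edges. This is only an edge-based approximation for intuition; Montage4D uses the Surazhsky-Hoppe approximate geodesic algorithm above, which tracks shortest paths through edge interiors.

```python
import heapq
import numpy as np

def seam_distance_field(vertices, edges, seam_vertices, max_dist=0.05):
    """Approximate distance of every vertex to the nearest seam vertex, measured
    along mesh edges and clamped to max_dist. Returns values in [0, 1]."""
    n = len(vertices)
    adj = [[] for _ in range(n)]
    for a, b in edges:
        w = float(np.linalg.norm(vertices[a] - vertices[b]))
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = np.full(n, max_dist)
    heap = [(0.0, s) for s in seam_vertices]
    for _, s in heap:
        dist[s] = 0.0
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist / max_dist   # 0 at the seams, ramping to 1 away from them

# Tiny example: a strip of 4 vertices with a seam at vertex 0.
verts = np.array([[0.0, 0, 0], [0.01, 0, 0], [0.02, 0, 0], [0.03, 0, 0]])
print(seam_distance_field(verts, [(0, 1), (1, 2), (2, 3)], seam_vertices=[0]))
```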
58. Desired Texture Field
For the frame at time $t$, for each camera $i$, and for each vertex $v$, we define the Desired Texture Field $D_v^i(t)$, which combines: the visibility test, the visibility percentage of view $i$, the alignment of the user camera's view direction with the texture camera's view direction, and the geodesics.
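The slide lists the factors that enter $D_v^i(t)$. The sketch below combines them multiplicatively, with a view-alignment exponent alpha; this multiplicative form and the exponent are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def desired_texture_field(visible, visible_fraction, user_dir, cam_dir,
                          geodesic_value, alpha=2.0):
    """Illustrative combination of the factors on the slide: visibility test,
    visibility% of view i, alignment of the user camera's view direction with
    texture camera i's view direction, and the geodesic field.
    The multiplicative form and alpha are assumptions."""
    v_test = 1.0 if visible else 0.0
    u = user_dir / np.linalg.norm(user_dir)
    c = cam_dir / np.linalg.norm(cam_dir)
    view_term = max(0.0, float(np.dot(u, c))) ** alpha   # view-dependent weighting
    return v_test * visible_fraction * view_term * geodesic_value

# Example: a fully visible vertex far from any seam, viewed almost along camera i.
print(desired_texture_field(True, 0.95, np.array([0.0, 0.0, 1.0]),
                            np.array([0.1, 0.0, 0.99]), geodesic_value=1.0))
```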
62. Temporal Texture Field
Temporally smooth the texture fields
For the frame at time $t$, for each camera $i$, and for each vertex $v$, we define the Temporal Texture Field (exponential smoothing):
$T_v^i(t) = (1 - \lambda)\, T_v^i(t-1) + \lambda\, D_v^i(t)$
where $T_v^i(t-1)$ is the texture field of the previous frame and $\lambda$ is the temporal smoothing factor, set to roughly $1/\mathrm{FPS}$.
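A minimal sketch of this exponential-smoothing update over all vertices and cameras; lambda is set to roughly 1/FPS (about 0.02 at 60 FPS, the value mentioned later in the talk).

```python
import numpy as np

def update_temporal_texture_field(T_prev, D_curr, lam=0.02):
    """T_v^i(t) = (1 - lambda) * T_v^i(t-1) + lambda * D_v^i(t).
    T_prev, D_curr: arrays of shape (num_vertices, num_cameras)."""
    return (1.0 - lam) * T_prev + lam * D_curr

# Example: the desired field switches a vertex from camera 0 to camera 1;
# the temporal field drifts toward it gradually instead of popping.
T = np.array([[1.0, 0.0]])          # previous frame: all weight on camera 0
D = np.array([[0.0, 1.0]])          # desired field now prefers camera 1
for _ in range(3):
    T = update_temporal_texture_field(T, D)
print(T)                            # weights move smoothly toward camera 1 over frames
```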
68. 68
Note: These results use the default parameters of Floating Textures, which may not be optimal for our datasets.
Still, for an optical-flow-based approach, it would be better if seam vertices were assigned smaller texture weights.
74. Experiment
Break-down of a typical frame
Most of the time is spent on communication between the CPU and the GPU.
75. In conclusion, Montage4D uses seam identification, geodesic fields, and temporal texture fields to provide a practical texturing solution for real-time 3D reconstruction. In the future, we envision Montage4D being useful for fusing massive multiview video data into VR applications such as remote business meetings, remote training, and live broadcasting.
78. Thank you
With a Starry Night Stylization
Ruofei Du
ruofei@cs.umd.edu
Amitabh Varshney
varshney@cs.umd.edu
Wayne Chang
wechang@microsoft.com
Ming Chuang
mingchuang82@gmail.com
Hugues Hoppe
hhoppe@gmail.com
• www.montage4d.com
• www.duruofei.com
• shadertoy.com/user/starea
• github.com/ruofeidu
• I am graduating December 2018.
Good afternoon everyone,
My name is Ruofei Du. I am a Ph.D. candidate advised by Prof. Amitabh Varshney at the University of Maryland, College Park.
Today I am going to present Montage4D: Interactive Seamless Fusion of Multiview Video Textures.
This work is a collaboration with Ming Chuang, Wayne Chang, and Hugues Hoppe at Microsoft Research, Redmond.
Recently, consumer-level virtual and augmented reality displays have exploded in popularity.
For example, the Oculus Rift, the HTC Vive, and the PlayStation VR, which has sold over 3 million units during the past few years.
Microsoft HoloLens has shipped thousands of units all over the world.
These advances in VR and AR technologies have catalyzed many potential applications across disciplines, such as remote education, immersive entertainment, and business meetings.
These VR and AR applications have huge data requirements—so the question becomes: <click>
where is this massive amount of 3D data going to come from?
One potential source of 3D data in VR and AR will be real-time multi-view reconstruction.
In 2016, Microsoft Research presented Holoportation, the first-ever high-fidelity real-time reconstruction system.
This system uses 8 pods of color and depth cameras. It computes volumetric fusion in real time and transfers the meshes, colors, and audio streams to the renderer for remote rendering.
However, fusing multiview video textures onto dynamic meshes under real-time constraints is a challenging task.
For example, here is a short demo computed and recorded in real-time
According to the user study, about 30% of the participants do not believe that the 3D reconstructed person looks real compared with a real person.
This is mostly due to imprecise reconstructed geometry, texture seams, and normal-weighted blending.
So in this paper, we present Montage4D, a practical solution to blend the video textures on dynamic meshes in real time.
Previous research has addressed the problem of fusing multiple textures for static 3D models.
But compared with Holoportation, their methods usually take about 10 minutes to process a single frame, because they try to solve an optimization problem over millions of vertices.
Or 30 seconds according to a recent SIGGRAPH paper.
Or 30 seconds per frame with a texture atlas.
In addition, Eisemann et al. introduced a novel view-dependent rendering technique named Floating Textures (FT), running at 5-24 frames per second. They found that screen-space optical flow can fix many misregistration issues, but it relies heavily on RGB features and fails when the viewpoint changes rapidly.
While FT solves a similar problem as our paper, the core ideas and rendering results between FT and Montage4D have significant differences. First, Montage4D uses Temporal Texture Fields to achieve both spatial and temporal consistency during the visual navigation of dynamic meshes. FT chooses two or three of the closest input images for sparse camera arrangements, which works well for static models and static viewpoints, but suffers from spatial artifacts when the user quickly changes the viewpoint or when the mesh changes incoherently (occlusion changes can, for example, lead to such artifacts). This is echoed in the future work section of FT, which states that "one source for artefacts remaining in rendering dynamic scenes is that of temporally incoherent geometry". Second, as an optical-flow-based approach, surface specularity and poor color calibration of cameras can cause challenges in the optical flow estimation for the FT algorithm. Additionally, a screen-based optical flow approach may also lead to visual inconsistencies (due to changes in specular lighting or occlusion) when moving the viewpoint. These are some of the reasons we chose not to use the optical-flow-based approach. Third, the diffusion processes are fundamentally different: FT diffuses the binary visibility map outwards in screen space, while Montage4D diffuses the seams inwards on the 3D geometry. Since the diffusion radius in FT is fixed in the fragment shader, its value is not consistent when the user zooms in or out. Although FT does not work well for our datasets, applying real-time optical flow over the mesh or within the screen space has the potential to address some of the misalignment errors in small patches.
So our main comparison is with the Holoportation system published last year.
First, Holoportation uses normal-weighted blending to compute a texture field for each vertex.
They first conduct a visibility test for occlusion, then use the dot product of the normal vector and the view direction of each texture camera to compute the scalar field of texture weights.
Next, they use a majority-voting algorithm for color correction.
However, as we showed before, it suffers from blurring and visible seams, especially on human faces.
So, what is our approach for real-time seamless texture fusion?
Our pipeline takes the input triangle meshes from a real-time reconstruction system called Fusion4D.
First, our system receives the input triangle meshes from Fusion4D via the network.
Then we compute the rasterized depth maps and texture maps with a background subtraction model.
Next, we carefully identify the seams caused by misregistration and occlusion.
Our first major research question is: what causes the blurring and the seams?
The seams mainly come from three sources: self-occlusion, majority voting, and field of view.
I will explain them one by one.
First, we recognize a triangle as a self-occlusion seam if and only if one or two of its vertices are occluded in the depth map while the others are not.
Here is an example: the person is holding a toy in front of his body, which results in self-occlusion seams on his T-shirt.
Next, we identify a majority-voting seam if and only if the triangle vertices have different results in the majority-voting process, which may be caused by either misregistration or self-occlusion.
Here is an example: the occlusion test cannot remove some of the brown color projected from the toy.
While the majority-voting test can eliminate these colors, it may also introduce visible seams.
The last category of seams is caused by a camera's limited field of view.
These seam triangles have one or two vertices lying outside the camera's field of view while the rest are inside.
Here is an illustration
For example, there is a toy lying on the ground, and each camera observes only a small portion of it.
Here is an example of the seam triangles identified on this person's face.
Typically, fewer than 1% of the roughly one million triangles are seam triangles.
Our next questions are: for a static frame, how can we get rid of the annoying seams at interactive frame rates?
Moreover, how can we spatially smooth the texture weight field near the seams so that no visible seams remain in the results?
In Montage4D, we use discrete geodesics to diffuse the texture fields from the seams.
Basically, a geodesic is the shortest route between two points on a surface.
On triangle meshes, computing geodesics is very challenging because of the computation of tangent directions, and because shortest paths pass through edge interiors rather than only through vertices.
We adopted the algorithm from Hugues Hoppe's group for computing approximate geodesics.
The main idea is to maintain only two to three shortest paths along each edge to reduce the computational cost.
Here are the results of estimating geodesics on the triangle meshes.
Using 20 iterations on the GPU, we can achieve a good diffusion field for the rendering.
Another research question is: what causes the blurring?
On one hand, the blurring comes from texture projection errors.
On the other hand, normal-weighted blending may accumulate errors from every camera, resulting in blurring on people's faces.
To solve this, we use view-dependent rendering to replace it.
With the geodesics, we also use view-dependent rendering techniques instead of normal-based blending.
However, using only the desired texture fields may lead to temporal inconsistency if you move the camera rapidly or suddenly add another video stream to the reconstruction pipeline.
So we use a temporal smoothing technique to update the temporal texture fields: we calculate the gradients of the changes in the texture fields, then multiply by a factor lambda of 0.02 to slowly change the texture fields.
Here I show an animation applying the temporal texture fields.
Here are the results after using view-dependent rendering and the temporal texture fields.
Rows 2 and 4 show the results after applying view-dependent rendering with the temporal texture fields.
Here are some more results comparing Holoportation and Montage4D.
In addition to Holoportation, we also compared our results with other rendering algorithms such as Floating Textures.
Without identification of the seam triangles, Floating Textures sometimes amplifies the errors alongside the seams.
Next you may wonder:
With the additional computation for seams, geodesics, and temporal texture fields, is our approach still real time?
In our cross-validation experiments on 5 datasets with an NVIDIA GTX 1080, Montage4D achieves better quality at over 90 FPS, which is fully capable of supporting VR and AR applications.
Here are some more examples.
We also evaluate the timing breakdown for a typical input mesh.
Most of the time is spent on communication between the CPU and the GPU when computing the geodesics.
In conclusion, Montage4D provides a practical texturing solution for real-time 3D reconstruction. In the future, we envision Montage4D being useful for fusing massive multiview video data into VR applications such as remote business meetings, remote training, and live broadcasting.
Finally, I would like to thank the Holoportation team, as well as the Mobile Holoportation team at Microsoft Research, for the collaboration.
In the end, thanks to my audience for listening, watching, and thinking. Thank you!
Thank you for your great question!
First, let me briefly introduce the background and motivation of this research.
Last year, Shahram's team presented the Holoportation system, which was the first to achieve high-fidelity 3D reconstruction in real time.
However, the original system requires 1.6 gigabits per second of bandwidth.
So, last summer, our Project Peabody team worked very hard on the Mobile Holoportation project, which brings the bandwidth down to 30-50 megabits per second.
Our pipeline should be compatible with the latest Motion2Fusion fusion pipeline for higher-quality reconstruction in real time.