The document summarizes an upcoming webinar on new developments in MPEG standards. It will discuss Versatile Video Coding (VVC), MPEG-H 3D Audio Baseline Profile, video-based point cloud compression (V-PCC), and MPEG Immersive Video (MIV). The webinar will provide overviews of each standard and their applications, as well as results from recent verification tests that evaluated subjective quality and performance. Speakers will include leaders from MPEG working groups and the Joint Video Experts Team.
1. What’s new in MPEG? Webinar | July 21, 2020 | 10:00 UTC and 21:00 UTC
Topics:
• MPEG Roadmap
• Versatile Video Coding
• MPEG 3D Audio
• Video-based Point Cloud Compression
• MPEG Immersive Video
• Carriage of VVC and EVC
Speakers:
• Jörn Ostermann, MPEG Convenor
• Gary Sullivan, JVET
• Jens-Rainer Ohm, JVET
• Schuyler Quackenbush, MPEG Audio
• Marius Preda, MPEG 3DG
• Bart Kroon, MPEG Video
• Young-Kwon Lim, MPEG Systems
Further Information: https://bit.ly/mpeg131
MPEG Web Site: https://mpeg-standards.com/meetings/mpeg-131/
2. Joint Video Experts Team (JVET)
of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11
Finalization of Versatile Video Coding
Webinar, 21 July 2020
Gary Sullivan and Jens-Rainer Ohm
JVET Co-chairs
3. Documents approved in recent meeting
• Versatile Video Coding (JVET-S2001)
– Twin text: ITU-T H.266 | ISO/IEC 23090-3
– Description of bitstream syntax and semantics, processes for core decoding
and high-level syntax as necessary for decoding
• Versatile SEI messages for coded video bitstreams (JVET-S2007)
– Twin text: ITU-T H.274 | ISO/IEC 23002-7
– Independent SEI messages and VUI, specification not needed for core
decoding process, could be used with VVC or other video standards
• Test Model 10 of Versatile Video Coding (VTM 10) (JVET-S2002)
– Encoder and algorithm description
– Has corresponding software implementation
• Draft 4 of VVC conformance testing (JVET-S2008)
• VVC verification test plan (v3) (JVET-S2009)
4. VTM9 compared to HEVC-HM, "common test conditions" (CTC)
Random Access is most important in storage, streaming, broadcast
• UHD average >40% (PSNR) – both luma and chroma
• Reasonable complexity tradeoff
Performance of VVC (PSNR)
Random Access, over HM-16.20
Class      Y         U         V         EncT    DecT
Class A1   −38.74%   −37.19%   −44.34%   884%    186%
Class A2   −43.13%   −39.74%   −38.35%   999%    199%
Class B    −34.74%   −46.77%   −44.61%   935%    189%
Class C    −29.90%   −30.58%   −32.56%   1212%   199%
Class E    (no values)
Overall    −35.93%   −39.13%   −40.09%   1004%   193%
Class D    −27.64%   −26.48%   −26.11%   1326%   194%
Class F    −41.55%   −44.78%   −46.09%   689%    163%
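The Overall row can be reproduced from the per-class rows. The sketch below assumes that Overall is a sequence-count-weighted arithmetic mean of the Y, U, V columns and a weighted geometric mean of the EncT and DecT columns over Classes A1, A2, B and C, with 3, 3, 5 and 4 sequences per class. The sequence counts and the averaging rule are assumptions that the slide does not state; they are used here because they match its figures.

```python
import math

# Per-class results from the table: (Y %, U %, V %, EncT %, DecT %).
ROWS = {
    "A1": (-38.74, -37.19, -44.34, 884, 186),
    "A2": (-43.13, -39.74, -38.35, 999, 199),
    "B":  (-34.74, -46.77, -44.61, 935, 189),
    "C":  (-29.90, -30.58, -32.56, 1212, 199),
}
# Assumed number of test sequences per class (not given on the slide).
SEQUENCES = {"A1": 3, "A2": 3, "B": 5, "C": 4}

def overall(rows, counts):
    total = sum(counts.values())
    # Rate columns: weighted arithmetic mean.
    rates = [sum(rows[c][i] * counts[c] for c in rows) / total for i in range(3)]
    # Runtime columns: weighted geometric mean.
    times = [math.exp(sum(math.log(rows[c][i]) * counts[c] for c in rows) / total)
             for i in (3, 4)]
    return rates + times

y, u, v, enc_t, dec_t = overall(ROWS, SEQUENCES)
print(f"Overall  Y {y:.2f}%  U {u:.2f}%  V {v:.2f}%  EncT {enc_t:.0f}%  DecT {dec_t:.0f}%")
```

Under these assumptions the printed values agree with the Overall row of the table; Classes D and F are reported separately and do not enter the average.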
5. Visual Subjective Performance of VVC
• Test with non-expert viewers, sequences not included in
CTC (from preparation of verification test)
• Notable: Visual results seem to be better for VVC than
when measured by PSNR (from JVET-S0246)
6. Versatility of VVC Video Applications
• Designed for a wide variety of types of video
• Camera captured, computer-generated, and mixed content
– Screen sharing
– Adaptive streaming
– Game streaming
– Video with scrolling text, etc.
• Standard and high dynamic range (emphasis on 10 bit
video)
• Various colour formats, including 4:4:4 and wide gamut
• 360° video with various projection map types
• Multiview video (including depth maps)
• MPEG’s video-based point cloud compression
• Lossless coding support
7. Special Features with High-Level Syntax
• Flexible access mechanisms, including localized access using
“subpictures”
• Extraction and merging at bitstream level
• Special boundary handling for gradual refresh and 360° video
• Layered coding, including low-complex scalability operation
• Nested temporal sublayering
• Predictive reference picture resampling
• Wavefront parallel processing similar to HEVC,
with less CTU row delay
• General constraints information: Mechanism to identify tool
usage at high level
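To illustrate why a shorter CTU row delay matters for wavefront parallel processing, the idealized sketch below counts the time steps needed for a picture when each CTU row may start once the row above is `delay` CTUs ahead. It assumes one thread per row and the same processing time for every CTU. The delay values (2 and 1), the picture size and the CTU size are illustrative assumptions, not figures from the slide.

```python
import math

def wavefront_steps(width, height, ctu_size, delay):
    """Idealized wavefront schedule length, in CTU time units.

    Row r may start once row r-1 is `delay` CTUs ahead, so the CTU at
    (row r, column c) starts at step r * delay + c.
    """
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    return (rows - 1) * delay + cols

# Hypothetical 3840x2160 picture with 128x128 CTUs (30 columns, 17 rows).
two_ctu_delay = wavefront_steps(3840, 2160, 128, delay=2)
one_ctu_delay = wavefront_steps(3840, 2160, 128, delay=1)
print(two_ctu_delay, one_ctu_delay)  # 62 46
```

In this model the last row starts sooner when the delay is smaller, which shortens the whole schedule.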
8. Overview of coding tools
• Partitioning: Multi-type tree (Quad/binary/ternary)
• Intra prediction using
– more directional modes (incl. wide angles), DC and planar
– sample smoothing with various adaptation methods (position dependent)
– inheritance of chroma modes and chroma sample prediction from luma
– multi-line prediction, matrix weighted prediction
• Inter prediction using advanced MV coding, affine models, sub-block and
geometric/diagonal partitioning, decoder side motion refinement (three tools
named DMVR, BDOF, PROF)
• Combined intra/inter prediction
• Switchable primary and secondary transforms
• New adaptive loop filter based on local classification, in-loop amplitude mapping
stage, additional elements in deblocking
• Quantization with log step size switching (& trellis-based dependent quantization)
• Context-adaptive arithmetic coding with various improvements
• Support for screen content (intra block copy, palette mode, transform skip) and
lossless and near-lossless coding
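The multi-type tree partitioning in the first bullet can be sketched as a function that returns the sub-blocks created by one split. This is a minimal illustration, assuming block sizes divisible by four and a 1:2:1 ratio for ternary splits; the mode names are made-up labels, not syntax element names from the specification.

```python
def split_block(x, y, w, h, mode):
    """Return the sub-blocks (x, y, w, h) produced by one split of a block."""
    if mode == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "binary_vertical":       # left | right
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "binary_horizontal":     # top / bottom
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "ternary_vertical":      # widths 1/4, 1/2, 1/4
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "ternary_horizontal":    # heights 1/4, 1/2, 1/4
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")

# Ternary split of a 32x32 block, then a binary split of its middle part.
parts = split_block(0, 0, 32, 32, "ternary_vertical")
middle = split_block(*parts[1], "binary_horizontal")
print(parts)   # [(0, 0, 8, 32), (8, 0, 16, 32), (24, 0, 8, 32)]
print(middle)  # [(8, 0, 16, 16), (8, 16, 16, 16)]
```

Applying the function recursively to the returned sub-blocks yields the tree; an encoder would choose the split at each node by rate-distortion cost, which is outside this sketch.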
9. Obtain documents and software
• Document archives (publicly accessible, >10k docs)
– http://phenix.int-evry.fr/jvet
– http://wftp3.itu.int/av-arch/jvet-site
– http://phenix.int-evry.fr/jct
– http://wftp3.itu.int/av-arch/jctvc-site
• Software for VVC-VTM, HEVC-HM, and 360° Video (publicly accessible):
– https://jvet.hhi.fraunhofer.de/
– https://hevc.hhi.fraunhofer.de/
– https://jvet.hhi.fraunhofer.de/svn/svn_360Lib/
11. ARL
audio research labs
MPEG-H 3D Audio - Introduction
• MPEG-H 3D Audio standard was finalized in 2015, specifying the Low
Complexity Profile
• The Low Complexity Profile enables delivery of:
– Channels and Objects
– Higher-Order Ambisonics (HOA).
• Audio Objects are a key component in enabling advanced personalization
options in broadcast applications
– Dialog enhancement
– Language selection
12. MPEG-H 3D Audio – New Profile
• In July 2019, industry requested a new profile dedicated to broadcast,
streaming and streaming immersive music applications.
• In July 2020, WG11 (MPEG) announces the completion of Amendment 2
on 3D Audio which specifies the new Baseline Profile addressing this
industry request.
13. MPEG-H 3D Audio Baseline Profile
• Tailored for broadcast, streaming, and high-quality immersive music delivery,
the Baseline Profile:
– Supports Channels and Objects.
– Is a subset of the Low Complexity Profile.
– Supports all advanced broadcast and streaming features.
[Figure: the Baseline Profile shown inside the Low Complexity Profile]
Baseline Profile:
• Advanced Coding: Channels and Objects
• Loudness Control and DRC
• Rendering and Downmix
• Rich Metadata Set
• Personalization and Interactivity
• Accessibility and Dialog Enhancement
• Seamless Configuration Changes
• Sample Accurate Ad-insertion and Splicing
• …
Low Complexity Profile, in addition:
• HOA
• LPD
DRC: Dynamic Range Control
LPD: Linear Prediction Domain
14. MPEG-H 3D Audio Baseline Profile
• In addition, the Baseline Profile:
– Enables the use of up to 24 audio objects in Level 3 for high quality
immersive music delivery.
– Can be signaled in a backwards compatible fashion, such that Baseline
Profile bitstreams will be decoded by all MPEG-H enabled devices that
support either one of the two profiles
15. 3D Audio Baseline Profile Verification Test Report
• Reports on the results of five subjective listening tests assessing the
performance of the 3D Audio Baseline Profile.
• Covers a wide range of bit rates and immersive audio use cases
• The tests were conducted in nine different test sites:
– Dolby, ETRI, Fraunhofer IIS, Gaudio, NHK, Nokia, Orange, Qualcomm and Sony
• With a total of:
– 341 listeners
– 1,144,592 subjective scores
Public
Document
16. 3D Audio Baseline Profile Verification Test Report
• Three Tests achieve "Excellent" quality on the MUSHRA scale:
– Test 2: 11.1 or 7.1 channels at 512 kb/s to 256 kb/s rate
– Test 3: 7.1, 5.1 and 2.0 channels at 256 kb/s to 48 kb/s rate
– Test 4: Content as Test 2, but binauralized for headphones at 384 kb/s
• Two Tests achieve "ITU-R High-Quality Emission" quality
– Test 1 "Ultra-HD Broadcast": 22.2 channels at 768 kb/s
– Test 5 "High-Quality Immersive Music Delivery": 24 audio objects coded
at 1.5 Mb/s, presented as 11.1 (7.1 + 4H) loudspeakers
17. 360 Reality Audio Music Service
• 360 Reality Audio music can be
enjoyed by consumers using:
– Tidal,
– Deezer,
– Nugs.net,
– Amazon Music HD and
– Sony Select (China).
https://www.sony.com/electronics/360-reality-audio
https://www.amazon.com/music/unlimited/why-hd?ref=dmm_LP_WHYHD
18. Point Cloud Compression
in MPEG
MPEG 131st Press Release,
ISO/IEC FDIS 23090-5 Visual Volumetric Video-based Coding and Video-based Point Cloud
Compression
July 2020
Institut Polytechnique de Paris, FRANCE
Marius PREDA
MPEG 3D Graphics Chair
19. Point Cloud
A set of 3D points
• not ordered,
• without relations between
them
Each point is defined by
• (X, Y, Z)
• (R, G, B) or (Y, U, V)
• Reflectance, transparency, …
21. Sport viewing with point clouds
360° background
3D objects
1-3 Gbps per object
22. Point Cloud
800,000 points -> 1 000 Mbps (uncompressed)
Compression is required in order to make PC useful
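The order of magnitude of the uncompressed figure can be checked with simple arithmetic. The sketch below is a rough estimate under stated assumptions: a frame rate of 30 fps and a representation of 10 bits per coordinate plus 8 bits per colour component. Neither assumption is given on the slide, so the result only needs to land in the same range as the quoted 1 000 Mbps.

```python
def raw_bitrate_mbps(points_per_frame, fps, bits_per_point):
    """Uncompressed bitrate in Mbps (1 Mbps = 10**6 bit/s)."""
    return points_per_frame * fps * bits_per_point / 1e6

def implied_bits_per_point(points_per_frame, fps, mbps):
    """Bits per point that a quoted uncompressed bitrate corresponds to."""
    return mbps * 1e6 / (points_per_frame * fps)

# Assumed representation: 10-bit X, Y, Z plus 8-bit R, G, B = 54 bits per point.
estimate = raw_bitrate_mbps(800_000, 30, 3 * 10 + 3 * 8)
# Bits per point implied by the quoted 1 000 Mbps, if the rate is 30 fps.
implied = implied_bits_per_point(800_000, 30, 1000)
print(estimate)           # 1296.0
print(round(implied, 1))  # 41.7
```

Either way, a single dynamic object needs on the order of a gigabit per second before compression.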
23. Very sparse occupancy of the 3D space
- (usually) the objects are represented by their
surface and not by volumes
- In 2D a pixel has 8 neighbors, in 3D - 26 and
many of them are transparent
Point Cloud Compression basic principles
24. Point Cloud Compression in MPEG
Timeline, 2014 to 2020:
• MPEG initiated the work on PCC
• In April 2017 MPEG issued a Call for Proposals
• 9 technology leading companies responded and MPEG evaluated them in October 2017
• First Committee Draft issued in October 2018
• V-PCC (Video-based PCC): 07/2020
• G-PCC (Geometry-based PCC): 10/2020
[Figure: V-PCC encoder block diagram. Blocks: patch generation, packing, geometry image generation, texture image generation, image padding, video compression, smoothing, occupancy map compression, auxiliary patch-info compression, multiplexer. Input: point cloud frame. Intermediate signals: patch info, occupancy map, geometry and texture images, padded geometry and texture images, reconstructed and smoothed geometry. Multiplexer inputs: compressed geometry video, compressed texture video, compressed occupancy map, compressed auxiliary patch information. Output: compressed bitstream.]
25. Video-based Point Cloud Compression
Main ideas:
(1) a point coordinate is encoded as a distance with respect to a
particular plane, inspired by displacement mapping in
Graphics
[Figure labels: Pixel intensity, Vertex Height]
26. Video-based Point Cloud Compression
Main ideas:
(2) the color (or any attribute) associated with a 3D vertex is
encoded in a 2D texture, inspired by texture mapping in
Graphics
[Figure labels: Vertex color, Pixel color]
27. Video-based Point Cloud Compression
Projecting all the points on a
single plane would result in
several 3D points having the same
2D projection -> several depth
values would have to be stored per pixel
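The collision problem can be made concrete with a toy example. The sketch below orthographically projects integer points along one axis, stores per pixel the distance of the nearest point (the "pixel intensity" of main idea 1), and counts how many points land on an already occupied pixel. The function name and the toy cloud are hypothetical, and a real encoder works on patches rather than on the whole cloud.

```python
def project_to_plane(points, axis=2):
    """Orthographically project integer 3D points along one axis.

    Returns (depth, collisions): `depth` maps each occupied pixel (u, v) to the
    smallest coordinate along `axis` seen there (the point nearest the plane),
    and `collisions` counts points that fell on an already occupied pixel.
    """
    u_axis, v_axis = [a for a in range(3) if a != axis]
    depth = {}
    collisions = 0
    for p in points:
        pixel = (p[u_axis], p[v_axis])
        if pixel in depth:
            collisions += 1
            depth[pixel] = min(depth[pixel], p[axis])
        else:
            depth[pixel] = p[axis]
    return depth, collisions

# Hypothetical toy cloud: front (z=1) and back (z=5) surface of a thin box.
cloud = [(x, y, z) for x in range(4) for y in range(3) for z in (1, 5)]
depth, collisions = project_to_plane(cloud, axis=2)
# Occupancy map on a 6x3 pixel grid: 1 where a pixel carries a point.
occupancy = [[1 if (x, y) in depth else 0 for x in range(6)] for y in range(3)]
print(len(cloud), len(depth), collisions)  # 24 12 12
print(occupancy[0])                        # [1, 1, 1, 1, 0, 0]
```

Half of the points collide here: with a single depth value per pixel, the back surface would be lost.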
28. Video-based Point Cloud Compression
Projecting per patch is
preferred:
- A set of points (patch) in
a small neighborhood is
projected on the same
plane
- The set of projection
planes is very limited
- 6 faces of the cube
- 4 additional diagonal planes
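One way to assign points to projection planes is to compare each point's surface normal with the plane directions and pick the best match. The sketch below does this for the six cube faces only; it assumes that normals have already been estimated, it omits the four diagonal planes, and it does not split the groups into connected patches. All names are illustrative and not taken from the standard or the reference software.

```python
# The six axis-aligned projection directions (faces of the bounding cube).
CUBE_FACES = {
    "+X": (1, 0, 0), "-X": (-1, 0, 0),
    "+Y": (0, 1, 0), "-Y": (0, -1, 0),
    "+Z": (0, 0, 1), "-Z": (0, 0, -1),
}

def best_face(normal):
    """Pick the cube face whose direction best matches a point normal."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(CUBE_FACES, key=lambda name: dot(CUBE_FACES[name], normal))

def group_by_face(points_with_normals):
    """Group points by chosen face; each group is a candidate set for patches."""
    groups = {}
    for point, normal in points_with_normals:
        groups.setdefault(best_face(normal), []).append(point)
    return groups

# Hypothetical points with estimated normals.
samples = [
    ((0, 0, 9), (0.1, 0.0, 0.9)),
    ((1, 0, 9), (0.0, 0.2, 0.8)),
    ((9, 4, 4), (0.95, 0.1, 0.0)),
    ((4, 0, 4), (0.0, -1.0, 0.0)),
]
groups = group_by_face(samples)
print(groups)
```

Neighbouring points with similar normals end up on the same plane, which is what keeps the projected patches free of the collisions seen with a single plane.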
29. Video-based Point Cloud Compression
Encoding the 3D point clouds as a set of 2D patches
Geometry
Color (Attributes)
30. Video-based Point Cloud Compression
Encoding the 3D point clouds as a set of 2D patches
- To enable lossless coding, the missed points are encoded separately
31. Video-based Point Cloud Compression
Encoding the 3D point clouds as a set of 2D videos: depth, color
and occupancy maps
MPEG is very good at video coding!
Problem solved
32. Video-based Point Cloud Compression
Encoding 3D point clouds as a set of 2D videos: color, depth and occupancy map
100,000 points @ 30fps 360 Mbps (uncompressed)
1 Mbps (MPEG PCC 2020)
7 Mbps 4.4 Mbps
33. Video-based Point Cloud Compression
V-PCC implementations publicly available
Integrated real-time decoder and renderer
source code is also available for Android,
Windows & Linux
www.mpeg-pcc.org
34. Video-based Point Cloud Compression
Beyond V-PCC
ISO/IEC FDIS 23090-5 Visual Volumetric Video-based Coding and Video-
based Point Cloud Compression
Visual Volumetric Video-based Coding is an MPEG framework for 3D to
2D projection based coding technologies
- used by V-PCC
- used by MIV (MPEG Immersive Video)
- to be used for future projects (Dynamic Mesh Coding)
39. Example encoder source
16 physical cameras:
• 2K × 1K
• Perspective projection
• Geometry (=depth range)
• Texture attribute (YCbCr)
210 mm
210 mm
40. MIV codec model
Multi-view video
• Geometry (G)
• Texture attribute (T)
• View parameters
MIV
encoder
MIV
decoder/
renderer
V3C bitstream Reconstruction
Viewing
space
Viewport video
• (Geometry)
• Texture attribute
Original
T
G
Atlas
Complete
(basic) view
Patches from additional views
41. Bitstream structure
V3C unit stream
V3C
parameter set
V3C
unit
V3C
unit
V3C
unit
V3C
unit
Sub bitstream
V3C unit
Access unit Access unit Access unit…
V3C unit
Access unit Access unit Access unit…
Sub bitstreams:
• Common atlas data (has view parameters)
• Multiple atlases:
• Geometry video data
• Attribute video data
• Atlas data (has patch parameters)
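The structure above can be modelled as a stream of typed units that a receiver sorts into sub bitstreams. The sketch below is a simplified data model only: the type labels are taken from the slide, while the class names, the `atlas_id` field and the payloads are hypothetical, and none of it reflects the normative V3C syntax.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class UnitType(Enum):
    # Illustrative labels taken from the slide, not normative syntax values.
    PARAMETER_SET = "V3C parameter set"
    COMMON_ATLAS_DATA = "common atlas data"
    ATLAS_DATA = "atlas data"
    GEOMETRY_VIDEO_DATA = "geometry video data"
    ATTRIBUTE_VIDEO_DATA = "attribute video data"

@dataclass
class V3CUnit:
    unit_type: UnitType
    atlas_id: Optional[int]   # None for units that are not tied to one atlas
    payload: bytes

def demultiplex(unit_stream):
    """Split a V3C unit stream into sub bitstreams keyed by (type, atlas)."""
    sub_bitstreams = defaultdict(list)
    for unit in unit_stream:
        sub_bitstreams[(unit.unit_type, unit.atlas_id)].append(unit.payload)
    return dict(sub_bitstreams)

# Hypothetical stream with one atlas.
stream = [
    V3CUnit(UnitType.PARAMETER_SET, None, b"vps"),
    V3CUnit(UnitType.COMMON_ATLAS_DATA, None, b"views"),
    V3CUnit(UnitType.ATLAS_DATA, 0, b"patches-a0"),
    V3CUnit(UnitType.GEOMETRY_VIDEO_DATA, 0, b"geo-a0-part0"),
    V3CUnit(UnitType.ATTRIBUTE_VIDEO_DATA, 0, b"tex-a0-part0"),
    V3CUnit(UnitType.GEOMETRY_VIDEO_DATA, 0, b"geo-a0-part1"),
]
subs = demultiplex(stream)
print(len(subs))                               # 5
print(subs[(UnitType.GEOMETRY_VIDEO_DATA, 0)])
```

Each video sub bitstream obtained this way can then be handed to an ordinary video decoder, while the atlas data tells the renderer where each patch belongs.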
42. Test model – Encoder
Attribute
video data
Geometry
video data
Parameters
Camera data
Format
Bitstream
(V3C sample
stream with MIV
extensions)
Source views
View parameters
Geometry video data
Attribute video data
Pack patches
Into atlases
Geometry
video data
(raw)
Attribute
video data
(raw)
Encode
video sub
bitstreams
(HEVC)
Atlas data
Parameter set
View parameters list
Bitstream
(one file)
Multiplex
Automatic parameter selection
(geometry quality, basic/additional views, atlas frame sizes)
SEI messages
Prune views
(Flag redundant
pixels)
(Simplified)
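The "Pack patches into atlases" step can be illustrated with a generic rectangle packer. The sketch below uses greedy shelf packing, which is a stand-in chosen for brevity and not the algorithm of the test model; patch names and sizes are hypothetical.

```python
def pack_patches(patches, atlas_width, atlas_height):
    """Greedy shelf packing of (name, width, height) patches into atlases.

    Returns {name: (atlas_index, x, y)}. Patches are placed tallest first,
    left to right on a shelf; a new shelf or a new atlas is opened when the
    current one is full.
    """
    placements = {}
    atlas, x, y, shelf_height = 0, 0, 0, 0
    for name, w, h in sorted(patches, key=lambda p: p[2], reverse=True):
        if w > atlas_width or h > atlas_height:
            raise ValueError(f"patch {name} does not fit in an atlas")
        if x + w > atlas_width:            # shelf full: start a new shelf
            x, y, shelf_height = 0, y + shelf_height, 0
        if y + h > atlas_height:           # atlas full: start a new atlas
            atlas, x, y, shelf_height = atlas + 1, 0, 0, 0
        placements[name] = (atlas, x, y)
        x += w
        shelf_height = max(shelf_height, h)
    return placements

# One complete (basic) view plus patches from additional views, 64x64 atlases.
patches = [("basic", 64, 32), ("p1", 32, 16), ("p2", 32, 16),
           ("p3", 16, 16), ("p4", 48, 16), ("p5", 16, 16)]
placements = pack_patches(patches, 64, 64)
print(placements)
```

With these sizes the first five rectangles fill atlas 0 exactly and `p5` opens atlas 1. The placement of each patch is what the atlas data sub bitstream has to convey so that the decoder can undo the packing.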
43. Test model – Decoder/renderer
Filter out blocks
Color code
Core processes
Filter viewport
Decoded
access unit
(all conformance points)
Patch
culling
Pruned
view
reconstruction
View
synthesis
Inpainting
Viewing
space
handling
Viewport
Viewport
parameters
Geometry
upscaling
(Simplified)
44. Discussion
• Flexible standard for multiview video with depth:
• Video codec agnostic (e.g. HEVC, VVC, …)
• MIV Main uses a subset of V3C
• Extensible with more V3C features
(multiple attributes, occlusion video data, SEI messages, etc.)
• MIV-specific extensions
(coding per group of views, auxiliary patches, object-based coding, etc.)
• Please participate:
• Test model: https://gitlab.com/mpeg-i-visual/tmiv
• Test material (14 sequences) available on request
45. Carriage of VVC and EVC in MPEG Systems
Youngkwon Lim
Chair of MPEG Systems
young.L@Samsung.com
46. What is carriage?
Video Coding Standard, carried by:
• ISO/IEC 13818-1 MPEG-2 Systems
– Delivery over MPEG-2 TS
• ISO/IEC 14496-15 NAL File Format
– Storage and Delivery over ISOBMFF
• ISO/IEC 14496-12 ISO Base Media File Format
• ISO/IEC 23008-12 Image file format
– Storage and Delivery over ISOBMFF as an image or image sequence
• ISO/IEC 23000-19 Common Media Application Format
– Brands definition for CMAF Segments with a specific video codec
• ISO/IEC 23009-1 Media Presentation Description and Segment Formats
– MPEG-DASH extension for a specific video codec
• ISO/IEC 23000-22 Multi Image Application Format
– Brands definition for Image File Format with a specific video codec
• ISO/IEC 23008-1 MPEG Media Transport
– Delivery over MMT
47. 13818-1 MPEG-2 Systems
ISO/IEC 13818-1:2019 AMD 2 Carriage of VVC in MPEG-2 TS
• Current Stage : DAM
• ETA for Final Stage : 2021/04
• Features
• VVC data alignment with PES packets
• VVC video descriptor and VVC HRD descriptor
• Constraints on transport of VVC bitstream
• T-STD extension for single layer VVC and layered temporal video subsets
ISO/IEC 13818-1:2019 AMD 3 Carriage of EVC in MPEG-2 TS and update of the MPEG-H 3D Audio descriptor
• Current Stage : CDAM
• ETA for Final Stage : 2021/07
• Features
• EVC data alignment with PES packets
• descriptors carrying metadata for EVC elementary streams
• constraints for the transport of EVC elementary streams
• the T-STD buffer model for EVC elementary streams
48. 14496-15 NAL File Format
ISO/IEC 14496-15:2019 AMD 2 Carriage of VVC and EVC in ISOBMFF
• Current Stage : DAM
• ETA for Final Stage : 2021/04
• VVC related features
– definition of sample, sub-sample, sync sample, decoder configuration record, etc.
– storage format for single-layer VVC (ISO/IEC 23090-3) video streams
– storage of multiple layers in one track or each layer/sub-layer in its own track
– storage format for VVC bitstreams with more than one layer
• EVC related features
– definition of sample, sub-sample, sync sample, decoder configuration record, etc.
– storage format for single-layer EVC video streams
49. 23008-12 Image file format
ISO/IEC 23008-12:2017 AMD 3 Support for VVC, EVC, slideshows and other improvements
• Current Stage : CDAM
• ETA for Final Stage : 2021/07
• VVC related features
– definition of image item, sub-sample item, VVC operating points information property, subpicture items, etc.
– definition of VVC image sequences
– definition of VVC-specific brands: vvic for image and image collections, and vvcc for image sequence
• EVC related features
– definition of image item, sub-sample item, etc.
– definition of EVC-specific brands
– evbi and evbs for EVC baseline profile image and image sequence, respectively
– evmi and evms for EVC main profile image and image sequence, respectively
50. 23008-1 MPEG Media Transport
ISO/IEC 23008-1 3rd edition AMD 2 Carriage of EVC in MMT
• Current Stage : CDAM
• ETA for Final Stage : 2021/07
• Features
– Use of CMAF track constraints for MPU