2. Outline
Section I
– ISO/IEC JTC 1/SC 29 Structure and MPEG
– ITU-T Structure and VCEG (Video Coding Experts Group or Visual Coding Experts Group)
– A Generic Interframe Video Encoder
– H.261 Video Coding Standard
– MPEG-1 Video Coding Standard
– MPEG-2 Video Coding Standard
Section II
– MPEG-2 Transport and Program Streams
– H.263 Video Coding Standard
– H.263+ Video Coding Standard
– H.263++ Video Coding Standard
– Bit Rate (R) and Distortion (D) in Video Coding
3. ISO/IEC JTC 1/SC 29 Structure and MPEG
[Organization chart: ISO and IEC jointly form JTC 1; under JTC 1, SC 29 comprises an Advisory Group on Management (AGM), an AG on the Registration Authority (RA), WG1 (JPEG/JBIG), WG11 (MPEG, with subgroups for Requirements, Systems, Video, Audio, SNHC, Test, Implementation Studies and Liaisons) and WG12 (MHEG, with MHEG-5 maintenance and MHEG-6 subgroups).]
− ISO: International Organization for Standardization
− IEC: International Electrotechnical Commission
− JTC 1: Joint Technical Committee 1
− SC 29: Subcommittee 29, titled "Coding of Audio, Picture, Multimedia and Hypermedia Information"
Advisory Group (AG) on Management (AGM)
• Advises SC 29 and its WGs on matters of management that affect their work.
Advisory Group (AG) on the Registration Authority (RA)
WG1: Still images, JPEG and JBIG
• Joint Photographic Experts Group and Joint Bi-level Image Group
WG11: Video, MPEG
• Moving Picture Experts Group
WG12: Multimedia, MHEG
• Multimedia and Hypermedia Experts Group
MPEG (Moving Picture Experts Group, founded 1988)
• Develops standards for the coded representation of digital audio, video, 3D graphics and other data.
4. ITU-T Structure and VCEG (Video Coding Experts Group or Visual Coding Experts Group)
[Organization chart: the World Telecommunication Standardization Assembly (WTSA) sits above the Telecommunication Standardization Advisory Group (TSAG) and the Study Groups (SGs); each SG contains Working Parties (WPs), which contain Questions (Qs) that develop Recommendations; related activities include Focus Groups, IPR (Intellectual Property Rights) matters, and workshops, seminars and symposia. Within SG16/WP3, Question 6 is VCEG.]
VCEG (ITU-T SG16/Q6)
• Study Group 16: multimedia terminals, systems and applications
• Working Party 3: media coding
• Question 6: video coding
• Rapporteurs: Mr Gary Sullivan and Mr Thomas Wiegand
5. ITU, International Telecommunication Union structure
− Founded in 1865, the ITU is the oldest specialized agency of the United Nations system.
− The ITU is an international organization where governments, industries, telecom operators, service providers and regulators work together to coordinate global telecommunication networks and services.
− Its motto: help the world communicate!
− What does the ITU actually do?
• Spectrum allocation and registration
• Coordinate national spectrum planning
• International telecoms/ICT standardization
• Collaborate in international tariff setting
• Cooperate in telecommunications development assistance
• Develop measures for ensuring safety of life
• Provide policy reviews and information exchange
• Ensure and extend universal telecom access
6. ITU, International Telecommunication Union structure
− Plenipotentiary Conference: Key event, all ITU Member States decide on the future role of the organization
(Held every four years)
− ITU Council: The role of the Council is to consider, in the interval between Plenipotentiary Conferences,
broad telecommunication policy issues to ensure that the Union's activities, policies and strategies fully
respond to today's dynamic, rapidly changing telecommunication environment (held yearly)
7. ITU, International Telecommunication Union structure
− General Secretariat: Coordinates and manages the administrative and financial aspects of the Union’s activities
(provision of conference services, information services, legal advice, finance, personnel, etc.)
− ITU-R: Coordinates radio communications, radio-frequency spectrum management and wireless services.
− ITU-D: Technical assistance and deployment of telecom networks and services in developing and least developed
countries to allow the development of telecommunication.
− ITU-T: Telecommunication standardization on a world-wide basis. Ensures the efficient and on-time production of high
quality standards covering all fields of telecommunications (technical, operating and tariff issues). (The Secretariat of ITU-T
(TSB: Telecommunication Standardization Bureau) provides services to ITU-T Participants)
8. ITU, International Telecommunication Union structure
Telecommunication Standardization Bureau (TSB) (Place des Nations, CH-1211 Geneva 20)
− The TSB provides secretarial support for ITU-T and services for participants in ITU-T work (e.g. organization of meetings, publication of Recommendations, website maintenance, etc.).
− Disseminates information on international telecommunications and establishes agreements with many international SDOs.
Mission of the ITU-T, the Standardization Sector of the ITU
− Helping people all around the world to communicate and to share equally in the advantages and opportunities of telecommunication, reducing the digital divide, by studying technical, operating and tariff matters to develop telecommunication standards (Recommendations) on a worldwide basis.
9. ITU, International Telecommunication Union structure
World Telecommunication Standardization Assembly (WTSA)
− WTSA sets the overall direction and structure for ITU-T, meets every four years and for the next four-year period:
• Defines the general policy for the Sector
• Establishes the study groups (SG)
• Approves SG work programmes
• Appoints SG chairmen and vice-chairmen
Telecommunication Standardization Advisory Group (TSAG)
− TSAG provides ITU-T with flexibility between WTSAs, and reviews priorities, programmes, operations, financial matters and strategies for the Sector (meets roughly every 9 months)
• Follows up on accomplishment of the work programme
• Restructures and establishes ITU-T study groups
• Provides guidelines to the study groups
• Advises the TSB Director
• Produces the A-series Recommendations on organization and working procedures
10. Video Coding Standardization Organizations
• ISO/IEC MPEG = "Moving Picture Experts Group"
(ISO/IEC JTC 1/SC 29/WG 11 = International Organization for Standardization and International Electrotechnical Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11)
• ITU-T VCEG = "Video Coding Experts Group"
(ITU-T SG16/Q6 = International Telecommunication Union – Telecommunication Standardization Sector (ITU-T, a United Nations organization, formerly CCITT), Study Group 16, Working Party 3, Question 6)
• JVT = "Joint Video Team"
Collaborative team of MPEG & VCEG, responsible for developing AVC (discontinued in 2009)
• JCT-VC = "Joint Collaborative Team on Video Coding"
Team of MPEG & VCEG, responsible for developing HEVC (established January 2010)
• JVET = "Joint Video Experts Team"
Exploring potential for new technology beyond HEVC (established Oct. 2015 as the Joint Video Exploration Team, renamed Apr. 2018)
13. ITU H.26x History
The H-series are low-delay codecs for telecom applications; the ITU-T has developed several Recommendations for video coding:
− H.120: the first digital video coding standard
− H.261 (1990): "Video codec for audiovisual services at p × 64 kbit/s"
− H.262 (1995): Infrastructure of audiovisual services — Coding of moving video
− H.263 (1996): the next videoconferencing solution, "Video coding for low bit rate communication"
− H.263+ (H.263 V2) (1998) and H.263++ (H.263 V3) (2000): follow-on solutions
− H.26L: the "long-term" solution for low-bit-rate video coding for communication applications (not backward compatible with H.263+)
− H.264 (2003): the H.26L project, completed in May 2003, led to H.264, known as Advanced Video Coding (AVC)
− H.265/HEVC (2013): High Efficiency Video Coding
14. MPEG History
Moving Picture Experts Group (MPEG) codecs are designed for storage/broadcast/streaming applications.
MPEG-1 (1992)
• Started in 1988 by Leonardo Chiariglione
• Compression standard for progressive frame-based video in SIF (352×240) format
• Applications: VCD
MPEG-2 (1994-5)
• Compression standard for interlaced frame-based video in CCIR-601 (720×480) and high-definition (1920×1088i) formats
• Applications: DVD, SVCD, DirecTV, GA, DVB, HDTV studio, DTV broadcast; video standards for television and telecommunications
MPEG-4 (1999)
• Multimedia standard for object-based video from natural or synthetic sources
• Applications: Internet, cable TV, virtual studio, home LAN, etc.
• Object-oriented
• Over-ambitious?
15. MPEG History
Moving Picture Experts Group (MPEG) codecs are designed for storage/broadcast/streaming applications.
MPEG-7 (2001)
• Standardized descriptions of multimedia information, formally called the "Multimedia Content Description Interface"
• Metadata for audio-video streams
• Applications: Internet, video search engines, digital libraries
MPEG-21 (2002)
• Intellectual-property rights protection
• Distribution, exchange and user access of multimedia data, and intellectual property management
AVC (2003), also known as MPEG-4 Part 10
• Conventional to HD
• Emphasis on compression performance and loss resilience
HEVC (2013): High Efficiency Video Coding
16. Joint ITU/MPEG
ITU and MPEG (ISO/IEC) have also worked together on joint codecs:
− MPEG-2 is also called H.262.
− H.26L led to a codec now called:
• H.264 in telecom
• MPEG-4 Part 10 in broadcast
• AVC (Advanced Video Coding) in broadcast
• the Joint Video Team (JVT) codec
− H.265/HEVC (2013): High Efficiency Video Coding
22. Classification of Compression Techniques
Spatial domain
− Elements are used "raw" in suitable combinations.
− The frequency of occurrence of such combinations influences the design of the coder, so that shorter codewords are used for more frequent combinations and vice versa (entropy coding).
Transform domain
− Elements are mapped onto a different domain (e.g. the frequency domain).
− The resulting coefficients are quantised and entropy coded.
Hybrid
− Combinations of the above.
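The entropy-coding idea above (shorter codewords for more frequent combinations) can be sketched with a minimal Huffman coder; this is a generic illustration, not the VLC tables of any particular standard:

```python
import heapq, itertools
from collections import Counter

def huffman_code(symbols):
    """Frequent symbols get short codewords, rare ones long codewords."""
    freq = Counter(symbols)
    uid = itertools.count()                      # tie-breaker for the heap
    heap = [(f, next(uid), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)          # two least frequent trees
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(uid), merged))
    return heap[0][2]

code = huffman_code("aaaabbc")
assert len(code["a"]) < len(code["b"])   # 'a' occurs most, so shortest code
```

The same principle, with fixed rather than adaptively built tables, underlies the standardized VLC tables used in the codecs discussed below.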
23. A Generic Interframe Video Encoder
− Used since the early days of video compression standards (e.g. MPEG-1/-2/-4, H.264/AVC, HEVC) and also in most proprietary codecs (VC-1, VP8, etc.).
[Figure: encoder block diagram, built up over the following slides — input frame, DCT, quantiser (Q) and entropy coder, with an inverse-quantiser/IDCT reconstruction loop, frame memory and motion compensation.]
24. A Generic Interframe Video Encoder
[Figure: Input Frame 1 is DCT transformed and quantised (Q).]
30. A Generic Interframe Video Encoder
[Figure: Input Frame 2 and the motion-compensated (MC) residual between Frames 1 and 2, predicted from the reconstructed, motion-compensated Frame 1, then DCT transformed and quantised (Q).]
If the motion prediction is successful, the energy in the residual is lower than in the original frame and can be represented with fewer bits.
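This claim can be checked numerically; below is a toy 1-D sketch (hypothetical sample values standing in for frames) comparing residual energy with and without motion compensation:

```python
frame1 = [10, 50, 80, 50, 10, 0, 0, 0]   # hypothetical samples
frame2 = [0, 0, 10, 50, 80, 50, 10, 0]   # frame1 shifted right by 2

def energy(residual):
    return sum(e * e for e in residual)

# Residual without motion compensation: frame2 minus frame1 in place
no_mc = [b - a for a, b in zip(frame1, frame2)]

# Residual with motion compensation: predict frame2 from frame1
# displaced by the motion vector d = +2 (zeros outside the frame)
d = 2
mc = [frame2[x] - (frame1[x - d] if x - d >= 0 else 0)
      for x in range(len(frame2))]

assert energy(mc) < energy(no_mc)   # MC leaves far less energy to code
```

Here the motion vector captures the shift exactly, so the MC residual is all zeros; in real video the residual is merely much smaller, not zero.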
33. A Generic Interframe Video Encoder
[Figure: the quantised DCT of the MC residual (Frames 1 & 2) is inverse quantised and inverse transformed, yielding the reconstructed MC residual.]
34. A Generic Interframe Video Encoder
[Figure: reconstructed MC residual (Frames 1 & 2) + reconstructed, motion-compensated Frame 1 = reconstructed Frame 2.]
36. Generic Standard Codec
− All standard codecs follow the generic interframe structure: DCT/DPCM/MC/VLC.
− Their main differences lie in the way these elements are employed:
• Block transform length and type
• Block size for motion estimation and its precision
• Methods of VLC
• Quantisation
• Coding of quantised transform coefficients
• Addressing of data
• Preventing error propagation
• Various types of coding for each frame
37. H.261 Standard
− An early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.
− Designed for videophone, video conferencing and other audiovisual services over ISDN.
− Supports bit rates of p × 64 kbit/s, where p ranges from 1 to 30 (hence also known as "p × 64").
− Requires the delay of the video encoder to be less than 150 ms, so that the video can be used for real-time bidirectional video conferencing.
− Problems:
• Error propagation
• In case of errors, the picture needs updating
[Table: video formats supported by H.261 — CIF (352×288) and QCIF (176×144).]
39. 39
H.261 Standard
− The coding parameters of the compressed video signal are multiplexed and then combined with the
audio, data and end-to-end signalling for transmission.
− The transmission buffer controls the bit rate, either by changing the quantiser step size at the encoder or, in
more severe cases, by requesting reduction in frame rate to be carried out at the preprocessor.
A block diagram of an H.261 audio-visual encoder
41. H.261 Layer Structures
− Picture layer (CIF = 352×288, QCIF = 176×144)
− Group of Blocks (GOB) layer (GOB = 176×48)
− Macroblock (MB) layer (MB = 16×16)
− Block layer (B = 8×8)
[Figure: a CIF picture contains 12 GOBs (numbered 1-12 in two columns), a QCIF picture contains GOBs 1, 3 and 5; each GOB contains 33 macroblocks numbered 1-33 in three rows of eleven; a macroblock comprises four 8×8 luminance blocks Y0-Y3 plus one 8×8 Cb and one 8×8 Cr block.]
− The macroblock (MB) is the smallest coding unit of video.
− The standard codecs only define how an MB is coded.
− How many luma/chroma blocks an MB contains depends on the picture format.
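The dimensions above fix the counts at every layer of the hierarchy; a small sketch (the helper name is ours, not from the standard):

```python
def layer_counts(width, height):
    """Picture -> GOBs -> macroblocks -> 8x8 blocks for H.261 formats."""
    GOB_W, GOB_H, MB = 176, 48, 16
    gobs = (width // GOB_W) * (height // GOB_H)
    mbs = gobs * (GOB_W // MB) * (GOB_H // MB)   # 11 x 3 = 33 MBs per GOB
    blocks = mbs * 6                             # 4 Y + Cb + Cr per MB
    return gobs, mbs, blocks

assert layer_counts(352, 288) == (12, 396, 2376)   # CIF
assert layer_counts(176, 144) == (3, 99, 594)      # QCIF
```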
42. Macroblock Addressing (MBA)
MBA   CODE              MBA   CODE
1     1                 17    0000 0101 10
2     011               18    0000 0101 01
3     010               19    0000 0101 00
4     0011              20    0000 0100 11
5     0010              21    0000 0100 10
6     0001 1            22    0000 0100 011
7     0001 0            23    0000 0100 010
8     0000 111          24    0000 0100 001
9     0000 110          25    0000 0100 000
10    0000 1011         26    0000 0011 111
11    0000 1010         27    0000 0011 110
12    0000 1001         28    0000 0011 101
13    0000 1000         29    0000 0011 100
14    0000 0111         30    0000 0011 011
15    0000 0110         31    0000 0011 010
16    0000 0101 11      32    0000 0011 001
                        33    0000 0011 000
MBA stuffing            0000 0001 111
Start code              0000 0000 0000 0001
MBA stuffing:
− An extra codeword in the table for bit stuffing, sent immediately after a GOB header or a coded macroblock.
− This codeword should be discarded by decoders.
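As shown later for relative addressing, the MBA value for the first coded MB in a GOB is its absolute position, and for subsequent coded MBs it is the difference to the previous one. A sketch using a few entries of the table above (the dictionary and function names are illustrative):

```python
# A few entries of the H.261 MBA VLC table (from the slide).
MBA_VLC = {1: "1", 2: "011", 3: "010", 4: "0011", 5: "0010"}

def mba_values(coded_positions):
    """Relative macroblock addressing within a GOB (positions 1..33):
    the first MBA is absolute, later ones are differences."""
    prev, out = 0, []
    for pos in coded_positions:
        out.append(pos - prev)   # skipped (fixed) MBs consume no codeword
        prev = pos
    return out

# MBs 1, 3 and 6 coded -> MBA values 1, 2, 3
vals = mba_values([1, 3, 6])
assert vals == [1, 2, 3]
assert [MBA_VLC[v] for v in vals] == ["1", "011", "010"]
```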
44. H.261 Standard
[Figure: H.261 encoder block diagram — the video input, minus the motion-compensated prediction from picture memory (Inter) or nothing (Intra), is DCT transformed, quantised (Q) and run-length/variable-length coded (RLC+VLC); an inverse quantiser (Q⁻¹) and IDCT reconstruct the signal into the picture memory via the loop filter; motion estimation supplies the motion vector; the coding control selects the mode and quantiser.]
When too much data accumulates in the transmission buffer, the rate controller raises the quantisation step size, lowering the quality.
46. 46
H.261 Standard
COMP: a comparator for deciding inter/intra
coding mode for an MB
Th: threshold, to extend the quantisation range
T: transform coding of blocks of 8×8 pixels
Q: quantisation of DCT coefficients
P: picture memory with motion-compensated
variable delay
F: loop filter
p: flag for inter/intra
t: flag for transmitted or not
q: quantisation index for transform coefficients
qz: quantiser indication
v: motion vector information
f: switching on/off of the loop filter
47. H.261 Standard: A Uniform Quantiser with Threshold
− For DC coefficients in Intra mode: a uniform quantiser with a step size of 8 and no dead zone.
− For all other coefficients: a uniform quantiser with a central dead zone (threshold), whose step size is set by the quantiser scale.
• scale: an integer in the range [1, 31]
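Since the slide's formulas did not survive extraction, here is a hedged sketch of the behaviour described: Intra DC with step 8 and no dead zone (consistent with the FLC slide later), other coefficients with a scale-controlled step and dead zone; the exact dead-zone width used here is an assumption, not the normative rule:

```python
def quantise(coeff, scale, intra_dc=False):
    """Sketch of H.261-style quantisation. Intra DC: uniform, step 8,
    no dead zone. Others: step 2*scale (scale in [1, 31]) with a
    central dead zone that maps small coefficients to zero."""
    if intra_dc:
        return round(coeff / 8)
    step = 2 * scale
    if abs(coeff) < step:          # dead zone ("threshold")
        return 0
    sign = 1 if coeff > 0 else -1
    return sign * (abs(coeff) // step)

assert quantise(100, scale=8) == 6    # 100 // 16
assert quantise(10, scale=8) == 0     # inside the dead zone
assert quantise(-100, scale=8) == -6
```

The dead zone is what makes most small AC coefficients vanish at coarse scales, which is exploited by the run-length coding described later.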
49. MC/NO_MC Mode Decision in H.261: Inter/Intra Switch
− In interframe coding, a channel error propagates into subsequent frames. If that part of the picture is not updated, the error can persist for a long time.
− The variance of the intraframe MB is compared with the variance of the interframe MB (motion compensated or not) in the previous frame; the mode with the smaller variance is chosen.
• For large variances, there is no preference between the two modes.
• For smaller variances, interframe is preferred.
− The reason is that, in intra mode, the DC coefficients of the blocks have to be quantised with a quantiser without a dead zone and with 8-bit resolution. This increases the bit rate compared with interframe mode, hence interframe is preferred.
[Figure: decision characteristic of intraframe AC energy vs. interframe AC energy.]
51. Motion Compensation Decision Characteristic
− Not all macroblocks are motion compensated; the mode that generates fewer bits is preferred.
− BD (Block Difference): the mean absolute difference between the current MB c and the co-located MB r in the previous frame:
BD = (1/256) Σ_MB |c[x, y] − r[x, y]|
− DBD (Displaced Block Difference): the same measure against the motion-compensated position:
DBD = (1/256) Σ_MB |c[x, y] − r[x + dx, y + dy]|
[Figure: decision characteristic in the (BD, DBD) plane; the boundary y = x/1.1 separates the MC region (DBD sufficiently below BD) from the No-MC region, with MC disabled for very small differences.]
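A sketch of the BD/DBD computation and the y = x/1.1 decision boundary read off the characteristic (our reading of the figure, on toy 1-D data):

```python
def mad(cur, pred):
    """Mean absolute difference between the current MB and a prediction
    (H.261 averages over the 256 samples of a 16x16 MB)."""
    return sum(abs(c - p) for c, p in zip(cur, pred)) / len(cur)

def choose_mc(bd, dbd):
    # From the slide's characteristic: pick MC when DBD < BD / 1.1
    return dbd < bd / 1.1

cur  = [10, 20, 30, 40]
prev = [40, 10, 20, 30]            # same data, shifted by one sample
bd  = mad(cur, prev)               # prediction without MC
dbd = mad(cur, prev[1:] + [40])    # prediction displaced by the motion vector

assert dbd < bd and choose_mc(bd, dbd)
assert not choose_mc(1.0, 1.0)     # no gain from MC -> skip it
```

The 1.1 margin means MC must clearly beat the zero-vector prediction before the cost of transmitting a motion vector is accepted.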
52. Macroblock and Motion Vector Range
Macroblock
− Motion estimation of a macroblock involves finding a 16×16-sample region in a reference frame that closely matches the current macroblock.
− Luminance: 16×16, i.e. four 8×8 blocks (Y0-Y3)
− Chrominance: two 8×8 blocks (Cb, Cr)
− Motion estimation is performed only on the luminance component.
Motion Vector Range
− [−15, 15], for a 16×16 MB
[Figure: the ±15-sample search area around the MB position in the reference frame.]
54. H.261 Motion Estimation and Compression Modes
1) Motion estimation for each macroblock (MB)
• MB: 16×16; search range (motion vector range): ±15
2) Select a compression mode
• DBD = Displaced Block Difference = f(x, y, t) − f(x + Δx, y + Δy, t − 1)
• The motion vector is differentially coded:
MVD_x = MV_x[n] − MV_x[n − 1]
MVD_y = MV_y[n] − MV_y[n − 1]
3) Process each MB to generate a header followed by a data bitstream consistent with the chosen compression mode.
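Step 1 can be sketched as an exhaustive block-matching search (scaled down to a small block and range so it runs quickly; H.261 itself uses 16×16 MBs and a ±15 range):

```python
def full_search(cur, ref, cx, cy, rng=2, bs=4):
    """Exhaustive block matching: find the displacement (dx, dy) within
    +/-rng minimising the SAD of the bs x bs block at (cx, cy)."""
    def sad(dx, dy):
        return sum(abs(cur[cy + y][cx + x] - ref[cy + dy + y][cx + dx + x])
                   for y in range(bs) for x in range(bs))
    candidates = [(dx, dy) for dy in range(-rng, rng + 1)
                           for dx in range(-rng, rng + 1)]
    return min(candidates, key=lambda v: sad(*v))

# The reference frame holds a bright square; in the current frame it has
# moved one sample to the right, so the best prediction samples the
# reference at x - 1, i.e. (dx, dy) = (-1, 0).
ref = [[100 if 4 <= x < 8 and 4 <= y < 8 else 0 for x in range(12)]
       for y in range(12)]
cur = [[100 if 5 <= x < 9 and 4 <= y < 8 else 0 for x in range(12)]
       for y in range(12)]
assert full_search(cur, ref, cx=5, cy=4) == (-1, 0)
```

The resulting vectors are then differentially coded as MVD = MV[n] − MV[n−1], as in step 2 above.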
55. H.261 Mode Selection
Selection considerations:
− Variance of the macroblock
− Block Difference (BD)
− Displaced Block Difference (DBD)
Determination rules:
(a) If the variance of DBD is smaller than that of BD:
• Inter + MC is selected (the motion vector must be transmitted);
• otherwise the motion vector is not transmitted.
(b) Small variance: Intra; large variance: Inter (Motion vector = 8)
(c) The prediction error can optionally be modified by a 2-D spatial filter for each 8×8 block (separable, with coefficients 1/4, 1/2, 1/4).
56. H.261 Mode Selection
Forced Updating
− Intraframe coded MBs increase the resilience of the H.261 codec to channel errors.
− If the inter/intra MB decision never chooses intra mode, some of the MBs in a frame are forced to be intra coded.
− The specification recommends that each MB be updated at least once every 132 frames.
− This means that for CIF pictures with 396 MBs/frame, on average 3 MBs of every frame are intraframe coded.
57. 57
Decision tree for macroblock type
Types of Macroblocks
1. Inter coded: interframe coded MBs with no motion vector or with a zero motion vector.
2. MC coded: motion-compensated MB, where the MC error is significant and needs to be
DCT coded.
3. MC not coded: motion-compensated MBs where the motion-compensated error is insignificant, so there is no need for DCT coding.
5. Skipped (not coded, fixed):
• If all the six blocks in an MB without MC have an insignificant energy, they are not
coded. These MBs are sometimes called skipped, not coded or fixed MBs.
• These types of MBs normally occur at the static parts of the image sequence. Fixed
MBs are therefore not transmitted, and at the decoder they are copied from the
previous frame.
• Since the quantiser step sizes are determined at the beginning of each GOB or
row of GOBs, they have to be transmitted to the receiver.
• Hence, the first MBs have to be identified with a new quantiser parameter.
• Therefore, we can have some new MB types:
6. Inter coded + Q
7. MC coded + Q
8. Intra + Q
58. Addressing of Blocks
Once the type of an MB is identified and variable length coded, its position inside the GOB must also be determined.
− The number of combinations of coded/noncoded blocks:
• Since an MB has six blocks, there are 2⁶ = 64 different states.
• Excluding the one with all six blocks not coded (a fixed MB), the remaining 63 are identified by 63 different patterns.
− The pattern information consists of a set of 63 Coded Block Patterns (CBPs) indicating the coded/noncoded blocks within an MB.
− With a coding order of Y0, Y1, Y2, Y3, Cb and Cr, the block pattern information (pattern number) is defined as
Pattern Number = 32·Y0 + 16·Y1 + 8·Y2 + 4·Y3 + 2·Cb + Cr
where coded and noncoded blocks are assigned 1 and 0, respectively.
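The pattern-number formula maps directly to code (the function name is illustrative):

```python
def pattern_number(y0, y1, y2, y3, cb, cr):
    """CBP pattern number: each coded block (1) contributes its weight,
    noncoded blocks (0) contribute nothing."""
    return 32 * y0 + 16 * y1 + 8 * y2 + 4 * y3 + 2 * cb + cr

assert pattern_number(1, 1, 0, 0, 1, 1) == 51   # binary 110011
assert pattern_number(1, 1, 1, 1, 1, 1) == 63   # all six blocks coded
assert pattern_number(0, 0, 0, 0, 0, 0) == 0    # fixed MB: not signalled
```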
59. Addressing of Blocks
[Figure: examples of bit patterns indicating the coded/not-coded blocks in an MB (black = coded, white = not coded), with Pattern Number = 32·Y0 + 16·Y1 + 8·Y2 + 4·Y3 + 2·Cb + Cr]
− Each pattern number is variable length coded.
− Note that if an MB is intracoded, its pattern information is not transmitted.
− This is because, in an intraframe coded MB, all blocks have significant energy and will definitely be coded.
− In other words, there will not be any noncoded blocks in an intracoded MB.
Example: CBP = 110011₂ = 51₁₀, i.e. blocks Y0, Y1, Cb and Cr are transmitted.
61. 61
Addressing of Blocks
Relative addressing of coded MB
− The overhead information for addressing of the positions of the coded MB is minimised if they are relatively
addressed to each other.
− Numbers represent the relative addressing value of the number of fixed MBs preceding a nonfixed MB.
− The GOB start code indicates the beginning of the GOB.
− These relative addressing numbers are finally variable length coded.
62. Loop Filter
− At low bit rates the quantiser step size is normally so large that it forces many DCT coefficients to zero.
− If only the DC and a few AC coefficients remain, the reconstructed picture appears blocky.
− When the positions of the blocky areas vary from one frame to another, this appears as high-frequency noise, commonly referred to as mosquito noise.
− The blockiness degradations at slanted edges of the image appear as staircase noise.
− Coarse quantisation of the coefficients, which loses high-frequency components, implies that compression can be modelled as a low-pass filtering process.
− These artefacts are to some extent reduced by using the loop filter, a low-pass filter that removes high-frequency and block-boundary distortions.
63. 63
Loop Filter
− Loop filtering is introduced after the
motion compensator to improve the
prediction.
− It should be noted that the loop filter
has a picture blurring effect.
− It should be activated only for blocks
with motion, otherwise, nonmoving
parts of the pictures are repeatedly
filtered in the following frames, blurring
the picture.
− The filtering should be applied at coding rates less than 6 × 64 kbit/s (six DCT blocks of an MB) and switched off otherwise.
[Figure: H.261 coded pictures with the loop filter at (a) 128 kbit/s and (b) 64 kbit/s]
65. 65
Bit-Stream Syntax
− The Picture layer: Picture Start Code (PSC) delineates
boundaries between pictures. TR (Temporal Reference)
provides picture time-stamp.
− The GOB layer: H.261 pictures are divided into regions
of 11×3 macroblocks, each of which is called a Group
of Blocks (GOB). (GQuant indicates the Quantizer to be
used in the GOB)
− The Macroblock layer: Each Macroblock (MB) has its
own Address indicating its position within the GOB,
Quantizer (MQuant: Quantizer for Macroblock), and six
8×8 image blocks (4 Y, 1 Cb, 1 Cr).
− The Block layer: for each 8×8 block, the bitstream starts with the DC value, followed by pairs of zero-run length (Run) and the subsequent non-zero value (Level) for the ACs, and finally the End of Block (EOB) code. The range of Run is [0, 63]. Level reflects quantized values; its range is [−127, 127] and Level ≠ 0.
67. Data Format of H.261
Format for the Picture Layer:
PSC (20 bits) | TR (5 bits) | Ptype (6 bits) | PEI (1 bit) | GOB (variable)
1. PSC: Picture Start Code
2. TR: Temporal Reference
3. Ptype: Picture Type
4. PEI: Picture Extra Insertion
5. GOB layer (variable length codes)
(VLC: variable length coding; FLC: fixed length coding)
68. Format for Picture Layer
PSC: Picture Start Code, 20 bits: 0000 0000 0000 0001 0000
(this code occurs only once per picture)
TR: Temporal Reference, 5 bits (0-31)
Formed by incrementing the value in the previously transmitted picture header by one, plus the number of non-transmitted pictures since the last transmitted picture.
(Each picture unit of time: 1/30 or 1/29.97 second)
69. Format for Picture Layer
Ptype: information about the complete picture (6 bits)
Bit 1: split screen indicator, "0" off, "1" on
Bit 2: document camera indicator, "0" off, "1" on
Bit 3: freeze picture release, "0" off, "1" on
Bit 4: source format, "0" QCIF, "1" CIF
Bits 5-6: spare
PEI: Picture Extra Insertion information (1 bit)
"0": no Pspare, proceed to GOB data (usually PEI = 0); "1": an 8-bit Pspare field follows
Pspare: picture spare information (0/8/16/... bits)
• If PEI is set to "1", then 9 bits follow, consisting of 8 bits of data (Pspare) and another PEI bit indicating whether a further 9 bits follow, and so on.
• Encoders must not insert Pspare until specified by the CCITT.
• The spare bits are reserved for future "backward"-compatible additions, which decoders must be able to skip.
70. Picture Layer Loop Structure
[Figure: PSC → TR → Ptype → PEI; if PEI = 1, an 8-bit Pspare follows and PEI is read again; if PEI = 0, the GOB layer follows, looping 3 times for QCIF or 12 times for CIF, then the next picture.]
71. Format for GOB Layer
GBSC (16 bits) | GN (4 bits) | Gquant (5 bits) | GEI (1 bit) | Gspare (0/8/16... bits) | MB Data (variable)
1. GBSC: Group of Block Start Code
2. GN: Group Number
3. Gquant: GOB quantization value
4. GEI: Group Extra Insertion
5. Gspare: GOB spare
6. MB Data: macroblock data (variable length code)
72. Format for GOB Layer
GBSC: Group of Block Start Code: 0000 0000 0000 0001
This pattern is unique and must not occur elsewhere in the bitstream; otherwise a decoder would falsely detect a start code and the picture would be corrupted.
GN: Group Number, 4 bits
0000 is reserved for the PSC (must not be used); 13, 14 and 15 are reserved for future use.
Gquant: 5 bits
A fixed length codeword indicating the quantizer to be used in the group of blocks until overridden by a subsequent Mquant.
GEI: Group Extra Insertion information (1 bit): "0" no Gspare, "1" Gspare follows.
Gspare: group spare information (0/8/16/... bits), same as Pspare.
73. GOB Layer Loop Structure
[Figure: GBSC → GN → Gquant → GEI → (Gspare) → MB layer; the MB layer can loop at most 33 times per GOB.]
74. Format for MB Layer
MBA | Mtype | Mquant (5 bits) | MVD | CBP | Block Data
1. MBA: Macroblock Address
2. Mtype: Macroblock Type
3. Mquant: Macroblock quantization level
4. MVD: Motion Vector Difference
5. CBP: Coded Block Pattern
6. Block Data
75. Format for MB Layer
MBA: Macroblock Address
A variable length codeword indicating the position of a macroblock within the group of blocks.
[Figure: the 33 macroblocks of a GOB numbered 1-11, 12-22 and 23-33 across three rows; each MB covers a 16×16 luminance (Y) area plus one 8×8 Cb and one 8×8 Cr block.]
76. Macroblock Addressing (MBA)
MBA   CODE              MBA   CODE
1     1                 17    0000 0101 10
2     011               18    0000 0101 01
3     010               19    0000 0101 00
4     0011              20    0000 0100 11
5     0010              21    0000 0100 10
6     0001 1            22    0000 0100 011
7     0001 0            23    0000 0100 010
8     0000 111          24    0000 0100 001
9     0000 110          25    0000 0100 000
10    0000 1011         26    0000 0011 111
11    0000 1010         27    0000 0011 110
12    0000 1001         28    0000 0011 101
13    0000 1000         29    0000 0011 100
14    0000 0111         30    0000 0011 011
15    0000 0110         31    0000 0011 010
16    0000 0101 11      32    0000 0011 001
                        33    0000 0011 000
MBA stuffing            0000 0001 111
Start code              0000 0000 0000 0001
MBA stuffing:
− An extra codeword in the table for bit stuffing, sent immediately after a GOB header or a coded macroblock.
− This codeword should be discarded by decoders.
77. Mquant and MVD Codes
Mtype: Macroblock Type
Mquant: 5 bits, fixed length
Mquant signifies the quantizer to be used for this and any following blocks in the GOB until overridden by another Mquant:
1. Used for coding control
2. Can be adjusted to meet the required bit rate
3. Used to control image quality
MVD: Motion Vector Data (variable length)
MVD is included for all MC macroblocks. MVD is obtained by subtracting the motion vector of the preceding macroblock from that of the current one, except when:
1. the macroblock is #1, 12 or 23 (the start of a GOB row),
2. the MBA does not represent a difference of 1, or
3. the Mtype of the previous macroblock was not MC.
79. CBP: Coded Block Pattern (variable length)
CBP is present if indicated by Mtype.
The codeword gives a pattern number signifying those blocks in the macroblock for which at least one transform coefficient is transmitted. The pattern number is
Pattern Number = 32·Y0 + 16·Y1 + 8·Y2 + 4·Y3 + 2·Cb + Cr
where coded and noncoded blocks are assigned 1 and 0, respectively.
80. MB Layer Loop Structure
[Figure: MBA (or MBA stuffing) → Mtype → optional Mquant → optional MVD → optional CBP → Block layer; Mtype determines which of Mquant, MVD and CBP are present.]
81. Block Layer
− A macroblock comprises four luminance blocks (Y0-Y3) and one each of the two colour-difference blocks (Cb, Cr).
− Data for a block consists of codewords for the transform coefficients (TCOEFF) followed by an end of block (EOB) marker.
− The blocks are transmitted in the order 1-6: the four Y blocks, then Cb, then Cr.
83. FLC (Fixed Length Coding)
− For Intra blocks the DC coefficient is linearly quantized with a step size of 8 and no dead zone.
− The DC coefficients of all Intra blocks are fixed length coded (FLC) with 8 bits.
− A nominally black block gives 0001 0000 and a nominally white one 1110 1011.
− The codes 0000 0000 and 1000 0000 are not used.
− For the Intra DC coefficient, the reconstruction levels (RECs) fed into the inverse transform are:
Code (index)      REC
0000 0001 (1)     8
0000 0010 (2)     16
0000 0011 (3)     24
...
0111 1111 (127)   1016
1111 1111 (255)   1024
1000 0001 (129)   1032
...
1111 1101 (253)   2024
1111 1110 (254)   2032
84. DCT Coefficient (except Intra DC) Inverse Quantization
− For all coefficients other than the Intra DC one, the reconstruction levels (RECs) are in the range −2048 to 2047 and are given by clipping the results of the following formulae:
QUANT odd:
REC = QUANT × (2·LEVEL + 1)  for LEVEL > 0
REC = QUANT × (2·LEVEL − 1)  for LEVEL < 0
QUANT even:
REC = QUANT × (2·LEVEL + 1) − 1  for LEVEL > 0
REC = QUANT × (2·LEVEL − 1) + 1  for LEVEL < 0
REC = 0  for LEVEL = 0
− Note: QUANT ranges from 1 to 31 and is transmitted by either Gquant or Mquant.
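The inverse-quantization formulae, including the clipping, in executable form:

```python
def rec(level, quant):
    """H.261 inverse quantisation for all coefficients except Intra DC.
    QUANT is in [1, 31]; the result is clipped to [-2048, 2047]."""
    if level == 0:
        r = 0
    elif level > 0:
        r = quant * (2 * level + 1)
        if quant % 2 == 0:
            r -= 1          # even QUANT: pull one step towards zero
    else:
        r = quant * (2 * level - 1)
        if quant % 2 == 0:
            r += 1
    return max(-2048, min(2047, r))

assert rec(3, 5) == 35      # odd QUANT: 5 * (2*3 + 1)
assert rec(3, 4) == 27      # even QUANT: 4 * (2*3 + 1) - 1
assert rec(-3, 4) == -27
assert rec(100, 31) == 2047 # clipped
```

The ±1 correction for even QUANT keeps the reconstruction levels odd, avoiding a systematic bias in the decoded coefficients.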
86. Ordering of DCT Coefficients (Transform Coefficients, TCOEFF)
− Transform coefficient data is always present for all six blocks in a macroblock when MTYPE indicates Intra.
− In other cases, MTYPE and CBP signal which blocks have coefficient data transmitted for them.
− The quantized transform coefficients are transmitted sequentially according to the zig-zag scan sequence below (entry k marks the k-th coefficient transmitted):
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64
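The scan order in the table can be generated rather than stored: positions are visited along anti-diagonals of increasing frequency, alternating direction on each diagonal.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag transmission
    order: anti-diagonals r + c = 0, 1, ..., alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

zz = zigzag_order()
# First six entries match indices 1-6 in the table above
assert zz[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
assert zz[-1] == (7, 7)   # index 64: the highest-frequency coefficient
```

Scanning low frequencies first groups the (mostly zero) high-frequency coefficients at the end of the list, which is what makes the run-length coding on the next slide effective.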
87. DCT Coefficients Coding
The most commonly occurring combinations of (RUN, LEVEL) are encoded with variable length codes.
The least commonly occurring combinations of (RUN, LEVEL) are encoded with a 20-bit word consisting of 6 bits of ESCAPE, 6 bits of RUN and 8 bits of LEVEL.
− There are two VLC code tables:
• one used for the first transmitted LEVEL in "Inter" and "Inter + MC" blocks,
• one used for all other LEVELs, except the DC in Intra blocks, which is fixed length coded with 8 bits.
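Converting a zig-zag-scanned coefficient list into the (RUN, LEVEL) pairs that the VLC tables then encode:

```python
def run_level(coeffs):
    """(RUN, LEVEL) pairs for a zig-zag scanned coefficient list:
    RUN counts the zeros preceding each non-zero LEVEL; the block
    ends with the EOB marker."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")     # trailing zeros are implied by EOB
    return pairs

assert run_level([5, 0, 0, -2, 1, 0, 0, 0]) == [(0, 5), (2, -2), (0, 1), "EOB"]
```

Each (RUN, LEVEL) pair is then looked up in the VLC tables below, or sent as a 20-bit ESCAPE word if it is not in the tables.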
88. 88
RUN LEVEL CODE
EOB 10
0 1 1s IF FIRST COEFFICIENT
0 1 11s NOT FIRST COEFFICIENT
0 2 0100 s
0 3 0010 1s
0 4 0000 1110 s
0 5 0010 0110 s
0 6 0010 0001 s
0 7 0000 0010 10 s
0 8 0000 0001 1101 s
0 9 0000 0001 1000 s
0 10 0000 0001 0011 s
0 11 0000 0001 0000 s
0 12 0000 0000 1101 0s
0 13 0000 0000 1100 1s
0 14 0000 0000 1100 0s
0 15 0000 0000 1011 1s
RUN LEVEL CODE
1 1 011s
1 2 0001 10s
1 3 0010 0101 s
1 4 0000 0011 00s
1 5 0000 0001 1011 s
1 6 0000 0000 1011 0s
1 7 0000 0000 1010 1s
2 1 0101 s
2 2 0000 100s
2 3 0000 0010 11s
2 4 0000 0001 0100 s
2 5 0000 0000 1010 0s
3 1 0011 1s
3 2 0010 0100 s
3 3 0000 0001 1100 s
3 4 0000 0000 1001 1s
4 1 0011 0s
4 2 0000 0011 11s
4 3 0000 0001 0010 s
VLC Table for TCOEFF (1)
End of Block (EOB)
− The EOB code belongs to this set.
− Because CBP already indicates the blocks with no coefficient data, EOB cannot occur as the first coefficient.
− Hence the EOB entry can be removed from the VLC table used for the first coefficient.
90. The least commonly occurring combinations of (RUN, LEVEL) are encoded with a 20 bit word
consisting of 6 bits ESCAPE, 6 bits RUN and 8 bits LEVEL.
Fixed Length Coding Table for TCOEFF
RUN is a 6-bit fixed length code; LEVEL is an 8-bit fixed length code.
RUN   CODE        LEVEL   CODE
0     0000 00     -128    FORBIDDEN
1     0000 01     -127    1000 0001
2     0000 10     ...
...               -2      1111 1110
63    1111 11     -1      1111 1111
                  0       FORBIDDEN
                  1       0000 0001
                  2       0000 0010
                  ...
                  127     0111 1111
The last bit "s" denotes the sign of the level: "0" for positive, "1" for negative.
90
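The 20-bit escape mechanism can be sketched as follows. The 6-bit ESCAPE prefix used here (000001) is H.261's, and negative LEVELs are coded in 8-bit two's complement as in the table above:

```python
ESCAPE = "000001"  # 6-bit ESCAPE prefix used by H.261 for TCOEFF

def escape_code(run, level):
    # 20-bit word: 6-bit ESCAPE + 6-bit RUN + 8-bit two's-complement LEVEL.
    # LEVEL values 0 and -128 are forbidden by the standard.
    assert 0 <= run <= 63 and -127 <= level <= 127 and level != 0
    run_bits = format(run, "06b")
    level_bits = format(level & 0xFF, "08b")  # two's complement for negatives
    return ESCAPE + run_bits + level_bits
```

For example, (RUN = 2, LEVEL = -1) yields the RUN field 000010 followed by the LEVEL field 1111 1111, matching the table entry for -1.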
92. Examples of FLC (Fixed Length Coding)
− PSC: Picture Start Code, 20 bits
− TR: Temporal Reference, 5 bits
− PTYPE: Picture Type, 6 bits
− PEI: Extra insertion information (1 bit) – set if
PSPARE to follow.
− PSPARE: Extra information (0/8/16. . .bits) – not
used, always followed by PEI.
− GBSC: GOB Start Code, 16 bits
− GN: Group Number, 4 bits, indexing 12 GOBs
− GQUANT: Group Quantization information, 5 bits
− MQUANT: MB Quantization information, 5 bits
− EOB: End-of-Block
92
Bit-Stream Syntax, FLC and VLC Loop Structures Summary
Examples of VLC (Variable Length Coding)
− MBA: MB Address, indexing MBs within a GOB,
11 bits max
− MTYPE: MB Type information
− GEI: Same function and size as PEI.
− GSPARE: Same function and size as PSPARE.
− MVD: Motion Vector Data, 11 bits max, 32 VLCs
− CBP: Coded Block Pattern, 9 bits max, 63 VLCs
− TCOEFF: Transform Coefficients
93. 93
− The Problem: H.261 is typically used to send data over a constant bit rate channel, such as ISDN (e.g.
384 kbit/s).
− The encoder output bit rate varies depending on amount of movement in the scene.
− Therefore, a rate control mechanism is required to map this varying bit rate onto the constant bit rate
channel.
Rate Control
94. 94
− The encoded bitstream is buffered and the buffer is emptied at the constant bit rate of the channel
− An increase in scene activity will result in the buffer filling up
• The quantization step size in the encoder is increased which increases the compression factor and reduces
the output bit rate
− If the buffer starts to empty, then the quantization step size is reduced which reduces compression
and increases the output bit rate.
− The compression, and the quality, can vary considerably depending on the amount of motion in
the scene
• Relatively "static" scenes lead to low compression and high quality
• “Active" scenes lead to high compression and lower quality
Encoder
Rate Ctrl
Channel
Buffer
Video
Sequence
Rate Control
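The feedback loop above can be sketched as a simple proportional rule. The thresholds (25%/75% of buffer fullness) and step change (±2) are illustrative choices, not values from the standard:

```python
def adjust_quantiser(buffer_fullness, buffer_size, q, q_min=1, q_max=31):
    # As the buffer fills, coarsen quantisation (more compression, fewer
    # bits out); as it empties, refine quantisation (better quality).
    occupancy = buffer_fullness / buffer_size
    if occupancy > 0.75:
        q = min(q + 2, q_max)   # buffer filling up: reduce bit rate
    elif occupancy < 0.25:
        q = max(q - 2, q_min)   # buffer emptying: increase bit rate
    return q
```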
95. − Even when channel coding is used, some residual (transmission) errors may reach the source decoder.
− Residual errors may be detected at the source decoder through syntactic and semantic
inconsistencies.
− For digital video, the most basic error concealment techniques include:
− Repeating the co-located data from the previous frame
− Repeating data from the previous frame after motion compensation
− Error concealment for non-detected errors may be performed through post-processing.
95
Error Concealment
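The two basic techniques can be sketched at macroblock granularity. This is a minimal illustration: real decoders conceal pixel regions, and the motion-compensated copy with integer macroblock offsets here is a simplification:

```python
def conceal(current, previous, damaged, mvs=None):
    # current, previous: 2-D lists of macroblocks; damaged: set of
    # (row, col) positions flagged by the syntax/semantics checks;
    # mvs: optional {(row, col): (dr, dc)} macroblock-offset vectors.
    out = [row[:] for row in current]
    for (r, c) in damaged:
        if mvs and (r, c) in mvs:
            dr, dc = mvs[(r, c)]
            out[r][c] = previous[r + dr][c + dc]  # motion-compensated copy
        else:
            out[r][c] = previous[r][c]            # co-located copy
    return out
```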
98. What Is MPEG?
– MPEG is an encoding and compression system for digital multimedia content defined by the
Moving Picture Experts Group (MPEG).
– MPEG reduces the amount of data needed to represent video many times over, but still
manages to retain very high picture quality.
– MPEG can compress both audio & video
– Similar to the reference model in H.261, software-based reference codecs for laboratory
testing have also been developed for MPEG-1 and MPEG-2. For these codecs, the reference
codec is called the Test Model (TM).
98
99. − Coding of moving pictures and associated audio for digital storage media (Standard ISO/IEC
11172-2 (1991))
− The MPEG-1 video coding algorithm is largely an extension of H.261, and many of the features
are common. Their bitstreams are, however, incompatible, although their encoding units are
very similar.
− MPEG-1 is the first generation of video codecs proposed by the MPEG as a standard to
provide video coding for digital storage media or DSM (other than the conventional analogue
video cassette recorders (VCRs))
− Since coding for digital storage can be regarded as a competitor to VCRs, MPEG-1 video
quality at the rate of 1–1.5 Mbit/s is expected to be comparable to VCRs.
99
MPEG-1 Standard
100. − Designed for up to 1.5 Mbit/sec (Although in most applications the MPEG-1 video bit rate is in
the range of 1–1.5 Mbit/s, the international standard does not limit the bit rate, and higher bit
rates might be used for other applications)
− A popular standard for video on the Internet, transmitted as .mpg files.
− Standard for the compression of moving pictures and audio.
− Layer 3 of MPEG-1 audio is the most popular standard for digital compression of audio, known as
MP3.
− Optimized & used for storing movies on CD ROM
− Supports progressive images, non-interlaced video (Interlaced sources have to be converted
to a non-interlaced format before coding.)
100
MPEG-1 Standard
101. Video
− Optimized for bitrates around 1.5 Mbit/s
− Originally optimized for SIF picture format, but not limited to it:
• 352x240 pixels at 30 frames/sec [ NTSC based ]
• 352x288 pixels at 25 frames/sec [ PAL based ]
− Progressive frames only - no direct provision for interlaced video applications, such as broadcast television
Audio
− Joint stereo audio coding at 192 kbit/s (layer 2)
System
− Mainly designed for error-free digital storage media
− Multiplexing of audio, video and data
Applications
− CD-I, digital multimedia, and video database (e.g. video-on-demand)
101
MPEG-1 Standard (Standard ISO/IEC 11172-2 (1991))
102. Source Input
− Typically uses SIF resolution (352 x 240 at 30 Hz or 352 x 288 at 25 Hz)
− All the three main picture types, I, P and B, have the same SIF size with 4:2:0 format.
− (In SIF-625, the luminance part of each picture has 360 pixels, 288 lines and 25 Hz, and those of each chrominance
are 180 pixels, 144 lines and 25 Hz)
− Before we describe how I-frames are encoded, we should describe our input.
− 3 planes of Y, U, V
• 8 bits per pixel.
• Y range [0,255].
• U and V range [-128,127] (U and V biased by 128 to put in range [0,255])
− Planes are all of the same size.
− Pixels colocated between frames.
MPEG-1 Standard
102
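The U/V bias mentioned above can be sketched as a one-line conversion:

```python
def to_signed_chroma(plane):
    # Stored U/V samples are biased by 128 to fit [0, 255]; subtracting
    # the bias recovers the signed range [-128, 127].
    return [[v - 128 for v in row] for row in plane]
```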
103. 103
MPEG-1 Standard
H.261                                   MPEG-1
Sequential access                       Random access
One basic frame rate                    Flexible frame rate
QCIF and CIF images only                Flexible image size
I and P frames only                     I, P and B frames
MC over 1 frame                         MC over 1 or more frames
1-pixel MV accuracy                     1/2-pixel MV accuracy
(1,2,1) filter in the loop              No filter
Variable threshold + uniform quantiz.   Quantization matrix
No GOP structure                        GOP structure
GOB structure                           Slice structure
104. − The MPEG-1 standard gives the syntax description of how audio, video and data are combined
into a single data stream, formally termed the ISO 11172 stream.
− It consists of a compression layer and a systems layer.
104
Systems Coding Outline
To support the combination of video
and audio elementary streams
Multiplexing of elementary
audio, video and data
105. − The MPEG-1 systems standard defines a packet structure for multiplexing coded audio and video
into one stream and keeping it synchronised.
− A pack consists of a pack header that gives the systems clock reference (SCR) and the bit rate of
the multiplexed stream followed by one or more packets.
− Each packet has its own header that conveys essential information about the elementary data
that it carries.
− The basic functions in systems layer are as follows:
• Synchronised presentation of decoded streams
• Construction of the multiplexed stream
• Initialisation of buffering for playback start-up
• Continuous buffer management
• Time identification
105
Systems Coding Outline
106. Multiplexing elementary streams
− The multiplexing of elementary stream (ES) of audio, video and data is performed at the packet
level.
− Each packet thus contains only one elementary data type.
− The systems layer syntax allows up to 32 audio, 16 video and 2 data streams to be multiplexed
together.
− If more than two data streams are needed, substreams may be defined.
106
Systems Coding Outline
107. 107
Systems Coding Outline
ES Packetization process into MPEG-1 PS Stream (Packs)
(Figure: each pack consists of a pack header and a pack payload; the pack payload carries packets, each with its own packet header and packet payload.)
108. 108
Systems Coding Outline
MPEG-1 PS bitstream and its time related fields
SCR: Systems Clock Reference
STD: System Target Decoder
PTS: Presentation Time Stamp
DTS: Decoding Time Stamp
109. Synchronisation
− Prototypical encoder and decoder of MPEG-1,
illustrating end-to-end synchronisation
• STC: Systems Time Clock
• SCR: Systems Clock Reference
• PTS: Presentation Time Stamp
• DSM: Digital Storage Media
109
Systems Coding Outline
110. Synchronisation
− Multiple elementary streams are synchronised by means of Presentation Time Stamps (PTS) in the ISO
11172 bit stream (by recording time stamps during capture of raw data)
− The receivers will then make use of these PTS in each associated decoded stream to schedule their
presentations.
− Playback synchronisation is pegged onto a master time base, which may be extracted from one of the
elementary streams, DSM, channel or some external source.
− The occurrences of PTS and other information such as SCR and systems headers will also be essential for
facilitating random access of the MPEG-1 bitstream.
− This set of access codes should therefore be located near the part of the elementary stream where
decoding can begin; in the case of video, this site will be near the head of an intraframe.
110
Systems Coding Outline
112. • Intraframe Compression
– Frames marked by (I) denote the frames that are strictly intraframe compressed.
– The purpose of these frames, called the "I pictures", is to serve as random access points
to the sequence.
I Frames
112
113. • P Frames use motion-compensated forward predictive compression on a block basis.
– Motion vectors and prediction errors are coded.
– Prediction blocks from the closest (most recently decoded) I and P pictures are utilised.
Forward Prediction
P Frames
113
114. • B frames use motion-compensated bi-directional predictive compression on a block basis.
– Motion vectors and prediction errors are coded.
– Prediction blocks from the closest (most recently decoded) I and P pictures are utilised.
Forward Prediction
Bi-Directional Prediction
B Frames
114
Backward Prediction
115. • The relative numbers of I, P and B pictures can be arbitrary.
• A Group of Pictures (GOP) spans the distance from one I frame to the next I frame
1 2 3 4 5 6 7 8 9 10 11 12 1
GOP = 12
Group of Pictures
115
116. (Figure: pictures 1-12 of a GOP, plus the next I picture, shown in source and display order and in the corresponding transmission order.)
116
Structure of the Coded Bit-Stream, Example
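The reordering from display order to transmission order can be sketched as follows: each anchor (I or P) is emitted before the B-pictures that precede it in display order, since the decoder needs both anchors before it can decode a B-picture:

```python
def transmission_order(display):
    # display: list of picture-type strings in display order, e.g.
    # ['I','B','B','P','B','B','P', ...]. Returns display-order indices
    # rearranged into transmission order.
    out, pending_b = [], []
    for i, ptype in enumerate(display):
        if ptype == 'B':
            pending_b.append(i)   # hold B-pictures until their anchor
        else:
            out.append(i)         # anchor (I or P) goes out first
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)
    return out
```

For the GOP I B B P B B P, transmission order is I P B B P B B, i.e. indices [0, 3, 1, 2, 6, 4, 5].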
117. I-pictures
• They are coded without reference to the previous picture.
• They provide access points to the coded sequence for decoding (intraframe coded as for JPEG)
P-pictures
• They are predictively coded with reference to the previous I- or P-coded pictures.
• They themselves are used as a reference (anchor) for coding of the future pictures.
B-pictures
• Bidirectionally coded pictures, which may use past, future or combinations of both pictures in their
predictions.
D-pictures
• As intraframe coded, where only the DC coefficients are retained.
• Hence, the picture quality is poor and normally used for applications like fast forward.
• D-pictures are not part of the GOP; hence, they are not present in a sequence containing any other
picture types. 117
Structure of the Coded Bit-Stream
118. Group of pictures and Reordering
− I and P pictures are called “anchor” pictures
− A GOP is a series of one or more pictures to assist random access into the picture sequence.
− The GOP length is normally defined as the distance between I-pictures, which is represented by
parameter N in the standard codecs.
− The distance between the anchor I/P and P-pictures is represented by M.
− The encoding or transmission order of pictures differs from the display or incoming picture order.
− This reordering introduces delays amounting to several frames at the encoder (equal to the number of B-
pictures between the anchor I- and P-pictures).
− The same amount of delay is introduced at the decoder in putting the transmission/ decoding sequence
back to its original. This format inevitably limits the application of MPEG-1 for telecommunications.
− A GOP, in coding, must start with an I picture and in display order, must start with an I or B picture and
must end with an I or P picture
118
Structure of the Coded Bit-Stream
119. 119
Structure of the Coded Bit-Stream
(Figure: bitstream hierarchy. Video Sequence → Group of Pictures → Picture → Slice → Macroblock → Block of 8 x 8 pixels)
120. Video Sequence
– Begins with a sequence header and ends with an end-of-sequence code.
– It includes one or more groups of pictures.
Group of Pictures (GOP)
– A Header and a series of one or more pictures intended to allow random access into the
sequence.
120
Structure of the Coded Bit-Stream
121. Picture
• The primary coding unit of a video sequence.
A picture consists of three rectangular matrices representing
luminance (Y) and two chrominance (Cb and Cr) values.
Slice
• Each picture is divided into groups of macroblocks, called
slices. Slices can have different sizes within a picture, and
the division can differ from picture to picture.
• The reason for defining a slice is resetting the variable length
code (VLC) to prevent channel error propagation into the
picture. Each slice is coded independently from the other
slices of the picture.
• Slices are important in the handling of errors. If the bit stream
contains an error, the decoder can skip to the next slice.
121
Structure of the Coded Bit-Stream
122. − If the coded data are corrupted, and the decoder detects it, then it can search for the new slice, and
the decoding starts from that point.
− Each slice starts with a slice start code and is followed by a code that defines its position and a code
that sets the quantisation step size.
122
Structure of the Coded Bit-Stream
123. − To optimise the slice structure, that is, to give a good immunity from channel errors and at the same time
to minimise the slice overhead, one might use short slices for macroblocks with significant energy (such
as intra MB) and long slices for less significant ones (e.g. macroblocks in B-pictures).
123
Structure of the Coded Bit-Stream
Short slices for macroblocks with significant energy
124. − The division of slices may vary from picture to picture.
− If the "restricted slice structure" is applied, the slices must cover the whole picture.
− If the "restricted slice structure" is not applied, the decoder will have to decide what to do with the
parts of the picture that are not covered by any slice.
124
Structure of the Coded Bit-Stream
General Slice Structure vs. Restricted Slice Structure
(Figure: under the general slice structure, slices A-O need not cover the whole picture; under the restricted slice structure, slices A-I cover the picture completely.)
125. Macro block
• A portion of image that consists of 16x16 pixels and
comprises 4 blocks of luminance component and
1 block each of the 2 chrominance components.
• At this layer, motion compensation and prediction
are performed.
• Since a slice has a raster scan structure,
macroblocks are addressed in a raster scan order.
• The top left macroblock in a picture has address 0,
the next one on the right has address 1 and so on.
125
Structure of the Coded Bit-Stream
126. Macro block
• To reduce the address overhead, macroblocks
are relatively addressed by transmitting the
difference between the current macroblock and
the previously coded macroblock.
• This difference is called macroblock address
increment.
• In I-pictures, since all the macroblocks are coded,
the macroblock address increment is always 1.
• The first and last macroblocks of a slice shall not
be skipped macroblocks.
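The differential addressing above can be sketched as follows; the start-of-slice handling (prev = -1) is a simplification of the actual slice syntax:

```python
def address_increments(coded, prev=-1):
    # coded: ascending raster-scan addresses of the macroblocks actually
    # coded in a slice. Each MB transmits (current - previously coded)
    # as its address increment; an increment > 1 means the macroblocks
    # in between were skipped.
    incs = []
    for a in coded:
        incs.append(a - prev)
        prev = a
    return incs
```

In an I-picture every macroblock is coded, so every increment is 1; `address_increments([0, 1, 2, 5])` yields a final increment of 3, signalling two skipped macroblocks.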
126
Structure of the Coded Bit-Stream
127. Block and Color Sampling
127
4:2:0
Block
• A matrix of 8x8 elements.
• One of the ways rate control is achieved is by increasing the quantisation step size in blocks which would otherwise
have a higher entropy.
128. 128
Recall, 4:4:4 & 4:2:2 Sampling
(Figure: sampling points at 13.5 MHz; 4:4:4 carries YUV at every sample, 4:2:2 carries YUV at every other sample with Y only in between.)
129. 129
Recall, 4:1:1 & 4:2:0 MPEG-1 Sampling
(Figure: 4:1:1 carries YUV at every fourth sample; 4:2:0, as used in JPEG/JFIF, H.261 and MPEG-1, sites the U and V samples between the luminance samples.)
130. 130
Recall, 4:1:1 & 4:2:0 MPEG-2 Sampling
(Figure: in MPEG-2, the 4:2:0 chrominance samples are co-sited with the luminance samples.)
131. 131
4:2:0 Sampling in MPEG-1 and MPEG-2
(Figure: JPEG/JFIF, H.261 and MPEG-1 site the 4:2:0 chrominance samples midway between luminance samples; MPEG-2 uses co-sited sampling.)
Downsize chrominance components.
• 4:2:0 (with chrominance samples centered)
• Requires bilinear interpolation
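The centred 4:2:0 downsampling can be sketched as 2x2 averaging, a simple stand-in for the bilinear filtering mentioned above, which places each chrominance sample at the centre of four luminance positions:

```python
def downsample_420_centered(chroma):
    # chroma: 2-D list with even height and width. Each output sample is
    # the average of a 2x2 block, i.e. sited midway between the four
    # surrounding luminance positions (MPEG-1/JPEG chroma siting).
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[r][c] + chroma[r][c + 1] +
              chroma[r + 1][c] + chroma[r + 1][c + 1]) // 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```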
132. Structure of the Coded Bit-Stream, Summary
• Sequence layer: picture dimensions, pixel
aspect ratio, picture rate, minimum buffer size,
DCT quantization matrices
• GOP layer: will have one I picture, start with I or
B picture, end with I or P picture, has closed
GOP flag, timing info, user data
• Picture layer: temporal ref number, picture
type, synchronization info, resolution, range of
motion vectors
• Slices: position of slice in picture, quantization
scale factor
• Macroblock: position, H and V motion vectors,
which blocks are coded and transmitted
(Figure: layer hierarchy. The sequence layer contains GOPs (GOP-1, GOP-2, ..., GOP-n); the GOP layer contains pictures (I B B B P B B ...); the picture layer contains slices (Slice-1 ... Slice-N); the slice layer contains macroblocks (MB-1 ... MB-n); each macroblock contains six 8x8 blocks, numbered 0-5.)
132
134. Seq. Header
• Width
• Height
• Frame Rate
• Buffer Control
GOP Header
• Time Code
Picture Header
• Temporal Ref
• Picture Type
• Motion Vector Parameters
Picture Data Seq. End Code
• All headers begin with 23 zeroes followed by 9 bits that indicate the header type.
• The encoding process never produces 23 consecutive zeroes elsewhere, so headers can be located unambiguously.
Headers in Structure of the Coded Bit-Stream
134
136. The main differences between this encoder and H.261
Frame reordering: at the input of the encoder, coding
of B-pictures is postponed to be carried out after
coding the anchor I- and P-pictures.
Quantisation: intraframe coded macroblocks are
subjectively weighted to emulate perceived coding
distortions.
Motion estimation: not only is the search range
extended but the search precision is increased to half a
pixel. B-pictures use bidirectional motion compensation.
No loop filter.
Frame store and predictors: to hold two anchor pictures
for prediction of B-pictures.
Rate regulator: here there is more than one type of
picture, each generating different bit rates.
136
MPEG-1 Encoder
137. − Within each picture, macroblocks are coded in a sequence from left to right.
− Since the 4:2:0 image format is used, the six blocks of 8×8 pixels, four luminance and one of each
chrominance component, are coded in turn.
− First, for a given macroblock, the coding mode is chosen. This depends on the picture type, the effectiveness of
motion-compensated prediction in that local region and the nature of the signal within the block.
− Second, depending on the coding mode, a motion-compensated prediction of the contents of the block based
on the past and/or future reference pictures is formed. This prediction is subtracted from the actual data in the
current macroblock to form an error signal.
− Third, this error signal is divided into 8×8 blocks and a DCT is performed on each block. The resulting DCT
coefficients are quantised and scanned in zigzag order to convert them into a one-dimensional string of quantised
DCT coefficients.
− Fourth, the side information for the macroblock, including the type, block pattern, motion vector and address
alongside the DCT coefficients are coded (The DCT coefficients are run length coded)
137
MPEG-1 Encoder
138. − The insensitivity of the human visual system to high-frequency distortions can be exploited for further
bandwidth compression.
− The DCT coefficients, prior to quantisation (-2047 to +2047), are divided by the weighting matrix.
− Weighted coefficients are then quantised by the quantisation step size; at the decoder, reconstructed
quantised coefficients are then multiplied by the weighting matrix to reconstruct the coefficients.
138
Default Intra and Inter Quantisation Weighting Matrices
DCT Coefficients Weighting Matrix
Quantisation by
Quantisation Step Size
÷
139. Intra Quantisation Weighting Matrix
− Experience has shown that for SIF pictures, a suitable distortion weighting matrix for the intra-DCT
coefficients is the one shown in Figure. This intra matrix is used as the default quantisation matrix for
intraframe coded macroblocks.
Inter (or Nonintra) Quantisation Weighting Matrix (A flat matrix)
− Such frequency-dependent weighting is not used for interframe coded macroblocks.
− This is because high-frequency interframe error does not necessarily mean high spatial frequency.
(It might be due to poor motion compensation or block boundary artefacts).
139
Default Intra and Inter Quantisation Weighting Matrices
140. The strategy for motion estimation in this codec is different from the H.261 in four main respects:
1. Motion estimation is an integral part of the codec.
• The motion estimation in H.261 was optional.
2. Motion search range is much larger (larger search area).
• H.261 is normally used for head-and-shoulders pictures, where the motion speed is normally very small.
• In contrast, MPEG-1 is used mainly for coding of films with much larger movements and activities.
3. Higher precision of motion compensation is used.
• Motion estimation with half-pixel precision
4. B-pictures can benefit from bidirectional motion compensation.
• When B-pictures are present, due to various distances between a picture and its anchor, it is expected that
the search range for motion estimation to be different for different picture types.
• For normal scenes, the maximum search range for P-pictures is usually taken as 11 pixels/3 frames, and the
forward and backward motion range for B1-pictures are 3 pixels/frame and 7 pixels/2 frames, respectively.
These values for B2-pictures become 7 and 3.
140
Motion Estimation
141. Motion estimation with half-pixel precision
− The normal block matching with integer pixel positions is carried out first.
− Then eight new positions, with a distance of half a pixel around the final integer pixel, are tested.
141
Motion Estimation
Motion-compensated prediction error (a) with and (b) without half-pixel precision
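The two-stage search can be sketched as follows. Here `sad_at` is a hypothetical callable that evaluates the matching error at a given (possibly half-pel) displacement, with half-pel sample values assumed to come from bilinear interpolation of the reference picture:

```python
def refine_half_pel(best_int, sad_at):
    # best_int: (dy, dx) winner of the integer-pel block-matching search.
    # Test the eight half-pel positions around it (plus the integer
    # position itself) and keep the best.
    dy, dx = best_int
    candidates = [(dy + hy, dx + hx)
                  for hy in (-0.5, 0.0, 0.5)
                  for hx in (-0.5, 0.0, 0.5)]
    return min(candidates, key=sad_at)
```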
142. Coding of Pictures
(Figure: macroblock coding decision tree by picture type. I picture: change MQUANT or no change to MQUANT. P picture: intraframe vs. interframe; interframe macroblocks are either motion compensated or have the motion vector set to 0, are coded or not coded, each with or without a change to MQUANT. B picture: forward, backward or interpolated motion compensation, followed by the same coded / MQUANT choices.)
142
MQUANT: MB Quantization information
143. In I-pictures, all the macroblocks are intra coded.
− There are two intra macroblock types:
intra-d: one that uses the current quantiser scale
• Variable length coded with 1
• The default value when the quantiser scale is not changed
• no quantiser scale is transmitted and the decoder uses the previously set value.
intra-q: and the other that defines a new value for the quantiser scale, intra-q
• Variable length coded with 01
• The macroblock overhead should contain an extra 5 bits to define the new quantiser scale between 1 and 31
• In I-pictures of MPEG-1, any macroblock can be coded as intra-q.
143
I-pictures Coding
144. DC indices are coded losslessly by DPCM (DC_DIFF)
− The quantiser step size is different for different coefficients and may change from MB to MB.
− The only exception is the DC coefficients, which are treated differently. This is because the eye is
sensitive to large areas of luminance and chrominance errors; then the accuracy of each DC value
should be high and fixed.
− The quantiser step size for the DC coefficient is fixed to eight. Since in the quantisation weighting matrix,
the DC weighting element is eight, then the quantiser index for the DC coefficient is always 1,
irrespective of the quantisation index used for the remaining AC coefficients.
− Because of the strong correlation between the DC values of blocks within a picture, the DC indices are
coded losslessly by DPCM (DC_DIFF).
− Such a correlation does not exist among the AC coefficients, and hence they are coded independently.
144
I-pictures Coding
145. DC indices are coded losslessly by DPCM (DC_DIFF)
− The prediction for the DC coefficients of luminance blocks follows the coding order of blocks within a
macroblock and the raster scan order.
− For example, in the macroblocks of 4:2:0 format pictures shown in Figure, the DC coefficient of block Y2
is used as a prediction for the DC coefficient of block Y3.
− The DC coefficient of block Y3 is a prediction for the DC coefficient of Y0 of the next macroblock.
− For the chrominance, we use the DC coefficient of the corresponding block in the previous
macroblock.
145
I-pictures Coding
(Figure: macroblock block layout, luminance blocks Y0, Y1 over Y2, Y3, plus chrominance blocks Cb and Cr.)
146. DC term is expressed as difference from previous DC term (DC_DIFF)
Encoded as two parts:
– SIZE: the number of bits needed for the difference (roughly log2 |DC_DIFF|)
– SIZE bits that provide the value of the difference.
SIZE is encoded as a Huffman code.
AC terms are given as (run,value) pairs.
Encoded in one of two ways:
– Huffman code for (run, abs(value)) followed by single bit for sign of value.
– Special Huffman code indicating ESCAPE, followed by 6 bits for run and either 8 or 16 bits for value.
• 6 bits for run simply encode 0 through 63
• First 8 bits of value put value at –128 to 127.
• If first 8 bits is -128, next 8 bits provide codes for –128 through –255
• If first 8 bits is 0, next 8 bits provide codes for 128 through 255.
DC and AC Terms Coding
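The DC DPCM described above can be sketched as follows; the predictor reset value of 128 is the MPEG-1 reset applied at the start of a slice, and the Huffman coding of SIZE is omitted:

```python
def encode_dc_diff(dc_values, predictor=128):
    # dc_values: quantised DC indices of successive blocks in coding
    # order. Each DC is sent as DC_DIFF = DC - previous DC, split into
    # SIZE (number of bits of |DC_DIFF|, Huffman coded in the real
    # bitstream) and SIZE raw bits giving the value.
    out = []
    for dc in dc_values:
        diff = dc - predictor
        predictor = dc
        size = abs(diff).bit_length()  # 0 when diff == 0
        out.append((size, diff))
    return out
```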
(Figure: block encoding pipeline. An 8x8 block of the macroblock is DCT transformed and quantised with step size sz; the DC term is DPCM coded, while the AC terms are zigzag scanned, run-length encoded and VLC coded, as in JPEG.)
146
147. Similar to those of H.261
− 8 types of macroblocks for P-frames:
• intra-d and intra-q: the same as used in I-frames
• pred-m: the macroblock is forward-predictive encoded (difference from
the previous frame) using a forward motion vector
• pred-c: the macroblock is encoded using a coded pattern; a 6-bit
coded block pattern is transmitted as a variable-length code and this
tells the decoder which of the 6 blocks in the macroblock are coded (1)
and which are not coded (0)
• pred-mc: the macroblock is forward-predictive encoded using a forward
motion vector and also a 6-bit coded pattern is included
• pred-cq: a pred-c macroblock with a new quantization scale
• pred-mcq: a forward-predictive macroblock encoded using a coded
pattern with a new quantization scale
• skipped: they have a zero motion vector and no code; the decoder
copies the corresponding macroblock from the previous frame into the
current frame
147
P-pictures Coding
148. − The encoder has more decisions to make than in the case of P-pictures.
− These are how to divide the picture into slices; determine the best motion vectors to use; decide
whether to use forward, backward or interpolated motion compensation or to code intra; and how to
set the quantiser scale.
− The encoder first calculates the best forward motion-compensated macroblock from the previous
anchor picture for forward motion compensation.
− It then calculates the best motion-compensated macroblock from the future anchor picture, as the
backward motion compensation.
− Finally, the average of the two motion-compensated errors is calculated to produce the interpolated
macroblock. It then selects one that had the smallest error difference with the current macroblock.
− In the event of a tie, an interpolated mode is chosen.
148
B-pictures Coding
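The mode decision described above can be sketched as follows, using the sum of absolute differences as the matching error and favouring the interpolated mode on ties:

```python
def choose_b_mode(cur, fwd_pred, bwd_pred):
    # cur, fwd_pred, bwd_pred: flat lists of pixel values for the current
    # macroblock and its forward/backward motion-compensated predictions.
    interp = [(f + b) // 2 for f, b in zip(fwd_pred, bwd_pred)]

    def sad(pred):
        return sum(abs(c - p) for c, p in zip(cur, pred))

    errors = {'forward': sad(fwd_pred),
              'backward': sad(bwd_pred),
              'interpolated': sad(interp)}
    # In the event of a tie, the interpolated mode is chosen.
    best = min(errors, key=lambda m: (errors[m], m != 'interpolated'))
    return best, errors[best]
```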
149. 149
B-pictures Coding
12 types of macroblocks for B-frames
• intra-d, intra-q: the same as used for I-frames
• pred-i: bidirectionally-predictive encoded macroblock with forward
motion vector and backward motion vector
• pred-ic: a pred-i macroblock encoded using a 6-bit coded pattern
• pred-b: backward-predictive encoded macroblock with backward
motion vector
• pred-bc: a pred-b macroblock encoded using a 6-bit coded
pattern
• pred-f: forward-predictive encoded macroblock with forward
motion vector
• pred-fc: a pred-f macroblock encoded using a 6-bit coded pattern
• pred-icq: a pred-ic macroblock with a new quantization scale
• pred-fcq: a pred-fc macroblock with a new quantization scale
• pred-bcq: a pred-bc macroblock with a new quantization scale
• skipped: the same as for P-frames.
152. 152
Layers of MPEG-1 Video Bit stream
• Video Sequence Layer Header contains: the picture size (horizontal and
vertical), pel aspect ratio, picture rate, bit rate, minimum decoder buffer size,
constraint parameters flag, control for loading 64-bit values for intra and
nonintra quantization tables and user data
• GOP layer header contains: the time interval from the start of the video
sequence, the closed GOP flag (decoder needs frames from previous GOP or
not?), broken link flag and user data
• Picture layer header contains: the temporal reference of the picture, picture
type (I,P,B,D), decoder buffer initial occupancy, forward motion vector
resolution and range for P- and B-frames, backward motion vector resolution
and range for B-frames and user data
• Slice layer header contains: vertical position where the slice starts and the
quantizer scale for this slice
• Macroblock layer header contains: optional stuffing bits, macroblock address
increment, macroblock type, quantizer scale, motion vector, coded block
pattern
• A block contains: 8x8 coded DCT coefficients
153. 153
MPEG-1 Bit Stream Organization
Seq. Header → GOP Header → Picture Header → Slice Header → MB Header → Block Data
156. − The incoming bitstream is stored in the buffer and is demultiplexed into the coding parameters such as
DCT coefficients, motion vectors, macroblock types and addresses.
− They are then variable length decoded using the locally provided tables.
− The DCT coefficients after inverse quantisation are inverse DCT transformed and added to the motion-
compensated prediction (as required) to reconstruct the pictures.
− The frame stores are updated by the decoded I- and P-pictures.
− Finally, the decoded pictures are reordered to their original scanned form.
− At the beginning of the sequence, the decoder will decode the sequence header, including the
sequence parameters.
156
MPEG-1 Decoder
157. Stepping Back a Bit in Decoder
− Picture Header, then Picture Data: a row-major scan of encoded macroblocks.
− Each macroblock: Macroblock Address Increment (1 bit), Macroblock Type (1 or 2 bits), Q Scale (5 bits),
then the luminance blocks, the U block and the V block.
− Each block: DC Size (2-7 bits), DC Bits (0-8 bits), first non-zero AC coefficient (variable bit length), ...,
last non-zero AC coefficient (variable bit length), EOB (2 bits).
157
158. Buffering
− If a fixed bit rate channel is used, then buffering is required.
− The encoder output buffer is filled at a variable rate, because the encoder output bit rate varies
(it depends on how much change is going on between frames), and is emptied at a constant rate
by the channel.
− The decoder input buffer is filled at the constant channel rate and emptied by the decoder.
• A feedback mechanism detects when the buffer is at risk of overflowing or underflowing.
• This is used to adjust the degree of quantisation, and hence the quality of the images being
transmitted.
158
159. − A coded bitstream contains different types of pictures, and each type ideally requires a different number of bits to
encode.
− In addition, the video sequence may vary in complexity with time, and it may be desirable to devote more coding bits
to one part of a sequence than to another.
− For constant bit rate coding, varying the number of bits allocated to each picture requires that the decoder has a buffer
to store the bits not needed to decode the immediate picture.
− The extent to which an encoder can vary the number of bits allocated to each picture depends on the size of this
buffer (i.e. decoder buffer).
− large buffer → greater variations → increasing the picture quality → increasing the decoding delay
− The delay is the time taken to fill the input buffer from empty to its current level
− An encoder needs to know the size of the decoder’s input buffer in order to determine to what extent it can vary the
distribution of coding bits among the pictures in the sequence.
159
Video Buffer Verifier (VBV)
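The buffer arithmetic can be sketched with the idealised model in which the channel adds bits at a constant rate per picture interval and the decoder removes each coded picture instantaneously (a sketch, not the full VBV definition):

```python
def vbv_check(picture_bits, buffer_size, bits_per_picture_interval,
              initial_fullness):
    # picture_bits: coded size of each picture in transmission order.
    # Returns the first ('overflow'|'underflow', picture_index) found,
    # or None if the stream passes this idealised check.
    fullness = initial_fullness
    for i, bits in enumerate(picture_bits):
        fullness += bits_per_picture_interval  # channel fills the buffer
        if fullness > buffer_size:
            return ('overflow', i)
        if bits > fullness:                    # picture not fully received
            return ('underflow', i)
        fullness -= bits                       # instantaneous removal
    return None
```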
160. − The decoder will display the decoded pictures at their specific rate.
− If the display clock is not locked to the channel data rate, and this is typically the case, then any
mismatch between the encoder and channel clock and the display clock will eventually cause a buffer
overflow or underflow.
Model Decoder
− The model decoder is defined to resolve three problems:
– It constrains the variability in the number of bits that may be allocated to different pictures;
– It allows a decoder to initialise its buffer when the system is started;
– It allows the decoder to maintain synchronisation while the stream is played.
160
Video Buffer Verifier (VBV)
161. The definition of the parameterised model decoder is known as the Video Buffer Verifier (VBV).
− The parameters used by a particular encoder are defined in the bitstream.
− This really defines a model decoder that is needed if encoders are to be assured that the coded
bitstream they produce will be decodable.
• A fixed rate channel is assumed to put bits at a constant rate into the buffer, at regular intervals, set by the picture rate
• The picture decoder instantaneously removes all the bits pertaining to the next picture from the input buffer (Practical
decoders may differ).
• If there are too few bits in the input buffer, that is, not all the bits for the next picture have been received, then the input buffer
underflows, and there is an underflow error.
• If, during the time between picture starts, the capacity of the input buffer is exceeded, then there is an overflow error.
161
Video Buffer Verifier (VBV)
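Under the stated assumptions (constant channel fill per picture interval, instantaneous removal of each picture's bits), the VBV check can be sketched as follows. The function and its parameters are hypothetical, not part of the standard's syntax:

```python
# Sketch of the VBV model decoder: the channel delivers bits at a constant
# rate per picture interval; at each picture start all bits of the next
# picture are removed at once. Underflow: not all of the picture's bits have
# arrived yet. Overflow: the buffer capacity is exceeded between picture
# starts. (Illustrative model only; real decoders differ, as noted above.)

def vbv_check(picture_bits, bits_per_interval, buffer_size, initial_fullness=0):
    fullness = initial_fullness
    for bits in picture_bits:
        fullness += bits_per_interval   # channel delivery over one interval
        if fullness > buffer_size:
            return "overflow"
        if bits > fullness:
            return "underflow"          # picture not fully received yet
        fullness -= bits                # instantaneous removal by the decoder
    return "ok"
```

An encoder running this model alongside encoding can see an impending violation before it happens and adjust its bit allocation, which is exactly the role the VBV plays.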
162. − Practical decoders may differ from this model in several important ways.
− They may not remove all the bits required to decode a picture from the input buffer instantaneously.
− They may not be able to control the start of decoding very precisely as required by the buffer fullness parameters in the
picture header, and they take a finite time to decode.
− They may also be able to delay decoding for a short time to reduce the chance of underflow occurring.
− But these differences depend in degree and kind on the exact method of implementation.
− To satisfy the requirements of different implementations, the MPEG video committee chose a very simple model for the
decoder.
− Practical implementations of decoders must ensure that they can decode the bitstream constrained in this model.
− In many cases, this will be achieved by using an input buffer that is larger than the minimum required and by using a
decoding delay that is larger than the value derived from the buffer fullness parameter.
− The designer must compensate for any differences between the actual design and the model in order to guarantee
that the decoder can handle any bitstream that satisfies the model.
− The encoder monitors the status of the model decoder in order to control its output, so that overflow does not occur.
− The calculated buffer fullness is transmitted at the start of each picture so that the decoder can maintain
synchronisation.
162
Video Buffer Verifier (VBV)
163. − The encoder must make sure that the input buffer of the model decoder is neither overflowed nor
underflowed by the bitstream.
− Since the model decoder removes all the bits associated with a picture from its input buffer
instantaneously, it is necessary to control the total number of bits per picture.
− The encoder could control the bit rate by simply checking its output buffer content. As the buffer fills up,
the quantiser step size is raised to reduce the generated bit rate, and vice versa.
− In MPEG-1, because of the existence of three different picture types, each generating
a different bit rate, this situation is slightly more complex.
− First, the encoder should allocate the total number of bits among the various types of picture within a
GOP, so that the perceived image quality is suitably balanced.
− The distribution will vary with the scene content and the particular distribution of I-, P- and B-pictures
within a GOP.
163
Rate Control and Adaptive Quantisation
164. − Investigations have shown that for most natural scenes, each P-picture might generate as many as two
to five times the number of bits of a B-picture, and an I-picture three times those of the P-picture.
− If there is little motion and high texture, then a greater proportion of the bits should be assigned to I-
pictures.
− Similarly, if there is strong motion, then a proportion of bits assigned to P-pictures should be increased.
− In both cases, lower quality from the B-pictures is expected to permit the anchor I- and P-pictures to be
coded at their best possible quality.
− Our investigations with variable bit rate (VBR) video, where the quantiser step size is kept constant (no
rate control), show that the ratios of generated bits are 6:3:2, for I-, P- and B-pictures, respectively.
− Of course, at these ratios, because of the fixed quantiser step size, the image quality is almost constant,
not only for each picture (in fact, slightly better for B-pictures due to better motion compensation) but
throughout the image.
− Again, if we lower the expected quality for B-pictures, we can change that ratio in favour of I- and P-
pictures (it is possible to make the encoder intelligent enough to learn the best ratio).
164
Rate Control and Adaptive Quantisation
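As a rough illustration of how the observed 6:3:2 I:P:B ratio could be turned into a per-picture bit budget within a GOP. The helper below is hypothetical; the weights default to the ratios quoted above, and the GOP structure and budget in the usage note are example values:

```python
# Back-of-envelope bit budget per picture type, weighting each picture by the
# 6:3:2 I:P:B ratio observed for VBR coding with a fixed quantiser step size.
# (Illustrative helper; a real encoder adapts these ratios to the content.)

def allocate_gop_bits(gop_bits, n_i, n_p, n_b, weights=(6, 3, 2)):
    wi, wp, wb = weights
    total_weight = n_i * wi + n_p * wp + n_b * wb
    unit = gop_bits / total_weight
    return {"I": wi * unit, "P": wp * unit, "B": wb * unit}  # bits per picture
```

For example, a GOP with 1 I-, 3 P- and 8 B-pictures and a 3 100 000-bit budget yields 600 000 bits per I-picture, 300 000 per P and 200 000 per B, preserving the 6:3:2 ratio.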
166. − Following the universal success of the H.261 and MPEG-1 video codecs, there was a growing need for
a video codec to address a wide variety of applications.
− Considering the similarity between H.261 and MPEG-1, ITU-T and ISO/IEC made a joint effort to
devise a generic video codec.
− Joining the study was a special group in ITU-T, Study Group 15 (SG15), who were interested in
coding of video for transmission over the future broadband integrated services digital networks
(BISDN) using asynchronous transfer mode (ATM) transport.
− The devised generic codec was finalised in 1995 and takes the name of MPEG-2/H.262, though it is
more commonly known as MPEG-2.
− It has error resilience for broadcasting and ATM networks.
− It delivers multiple programmes simultaneously without requiring them to have a common time base.
These requirements imply that the MPEG-2 transport packet length should be short and fixed.
166
MPEG-2 Standard
167. At the time of the development, the following applications for the generic codec were foreseen:
• BSS broadcasting satellite service (to the home)
• CATV cable TV distribution on optical networks, copper, etc.
• CDAD cable digital audio distribution
• DAB digital audio broadcasting (terrestrial and satellite)
• DTTB digital terrestrial television broadcast
• EC electronic cinema
• ENG electronic news gathering (including satellite news gathering (SNG))
• FSS fixed satellite service (e.g. to head ends)
• HTT home television theatre
• IPC interpersonal communications (videoconferencing, videophone, etc.)
• ISM interactive storage media (optical discs, etc.)
• MMM multimedia mailing
• NCA news and current affairs
• NDS networked database services (via ATM, etc.)
• RVS remote video surveillance
• SSM serial storage media (digital VTR, etc.)
167
Applications of MPEG-2 Coding
168. − Part 1, Systems: synchronisation and multiplexing of audio and video
− Part 2, Video
− Part 3, Audio (an extension of the MPEG-1 audio standard)
− Part 4, Conformance testing
− Part 5, Software simulation
− Part 6, Extensions for Digital Storage Media Command and Control (DSM-CC) (e.g. rewind, fast forward, etc.)
− Part 7, Advanced Audio Coding (AAC) (a second audio standard; there are even more parts)
− [Part 8 withdrawn due to lack of industry interest]
− Part 9, Extensions for Real Time Interfaces
− Part 10, Conformance Extensions for DSM-CC
− Part 11, Intellectual Property Management and Protection
168
MPEG-2 Parts (MPEG-2 Related Standards)
169. Video
• 2-15 or 16-80 Mbit/s bit rate ( target bit rate: 4…9 Mbit/sec )
• TV and HDTV picture formats
• Supports interlaced material
• MPEG-2 consists of profiles and levels
• Main Profile, Main Level (MP@ML) refers to 720x480 resolution video at 30 frames/sec, at bit rates up to 15 Mbit/sec for NTSC video (typical ~4 Mbit/sec)
• Main Profile, High Level (MP@HL) refers to HDTV resolution of 1920x1152 pixels at 30 frames/sec, at a bit rate up to 80 Mbit/sec (typical ~15 Mbit/sec)
Audio
• Compatible multichannel extension of MPEG-1 audio
System
• Video, audio and data multiplexing; defines two presentations:
• Program Stream for applications using near error free media
• Transport Stream for more error prone channels
Applications
• Satellite, cable, and terrestrial broadcasting, digital networks, and digital VCR
169
MPEG-2 Audio, Video, System and Application Parts
170.
Comparison Between MPEG-1 and MPEG-2 MP@ML Video
Specification    | MPEG-2 MP@ML                         | MPEG-1
Video format     | 720x480x30 (NTSC), 720x576x25 (PAL)  | 320x240x30 (NTSC), 320x288x25 (PAL)
Coded data speed | 4-6 Mbps for CCIR 601, 15 Mbps max   | 1.8 Mbps max
Coded picture    | Frame, field                         | Frame
Prediction       | Interframe, field                    | Interframe
DCT              | Frame, field                         | Frame
Resolution       | 12 bits                              | 9 bits
VLC resolution   | 8, 9, 10 bits                        | 8 bits
Quantisation     | Non-linear mapping                   | Linear mapping
Pan & scan       | Yes                                  | No
170
171.
Feature              | MPEG-1                      | MPEG-2
Video format         | SIF, progressive            | SIF; 4:2:0, 4:2:2, 4:4:4; progressive/interlaced
Picture quality      | VHS                         | Distribution/contribution
Bit rate             | Variable (up to 1.856 Mbps) | Variable, up to 100 Mbps
Low delay mode       | < 150 ms                    | < 150 ms (no B-pictures)
Accessibility        | Random access               | Random access/channel hopping
Scalability          | -                           | SNR, spatial, temporal, simulcast, data partitioning
Compatibility        | -                           | Forward, backward, upward and downward
Transmission error   | Error protection            | Error resilience
Editing bit stream   | Yes                         | Yes
DCT                  | Noninterlaced (frame)       | Frame (progressive) or field/frame (interlaced)
Motion estimation    | Noninterlaced               | Field, frame and dual-prime based; top (16×8) and bottom (16×8) blocks
Motion vectors       | For P-, B-pictures only     | Concealment motion vectors for I-pictures, besides MVs for P and B
Scanning of DCT      | Zigzag scan                 | Zigzag scan; alternate scan for interlaced video
coefficients         |                             |
171
Functional Comparison Between MPEG-1 and MPEG-2 Video
172. − Picture resolutions vary from SIF to HDTV
− Frame and Field DCT Coding in MPEG-2
− Both Linear and Nonlinear Quantisation in MPEG-2
− All Chroma Channels Subsampling in MPEG-2
− Search range can be larger (distance between P-frames is larger than B1 and B2)
− Combining the various picture formats with the interlaced/progressive option creates a new range of macroblock (MB) types in the
MPEG-2 standard.
• While each MB in a progressive mode has 6 blocks in the 4:2:0 format, the number of blocks in the 4:4:4 image format is 12.
− Macroblock size can be 16 x 8 pixels
• The dimensions of the unit of blocks used for motion estimation/compensation can change.
• In the interlaced pictures, since the number of lines per field is half the number of lines per frame, with equal horizontal and vertical
resolutions for motion estimation, it might be appropriate to choose blocks of 16 × 8, that is, 16 pixels over eight lines. These types of
sub-MBs have half the number of blocks of the progressive mode.
− Scalability
• The scalable modes of MPEG-2 are intended to offer interoperability among different services or to accommodate the varying
capabilities of different receivers and networks upon which a single service may operate.
172
Main difference between MPEG-2 and MPEG-1
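The block counts quoted above (6 blocks per MB in 4:2:0, 12 in 4:4:4) follow from four 8×8 luminance blocks plus the chroma blocks implied by the sampling format. A minimal sketch (the 4:2:2 value of 8 follows the same logic; the function name is ours):

```python
# Blocks per 16x16 macroblock: four 8x8 luminance blocks plus the chroma
# blocks implied by the sampling format (2 for 4:2:0, 4 for 4:2:2, 8 for 4:4:4).

def blocks_per_macroblock(chroma_format):
    chroma_blocks = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}[chroma_format]
    return 4 + chroma_blocks
```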
173. MPEG-1 and MPEG-2 syntax differences
− All MPEG-2 decoders that comply with currently defined profiles and levels are required to decode MPEG-1 constrained
bit streams:
− MPEG-2 syntax can be made to be very close to MPEG-1, by using particular values for the various MPEG-2 syntax
elements that do not exist in MPEG-1 syntax
− The IDCT mismatch control
− The run level values in VLC
− The constraint parameter flag mechanism in MPEG-1 is replaced by the profile and level structures in MPEG-2.
− The concept of the GOP layer is slightly different.
• GOP in MPEG-2 may indicate that certain B-pictures at the beginning of an edited sequence comprise a
broken link, which occurs if the forward reference picture needed to predict the current B-pictures is removed
from the bitstream by an editing process.
• It is an optional structure for MPEG-2 but mandatory for MPEG-1.
− The slices in MPEG-2 must always start and end on the same horizontal row of MBs.
• This is to assist the implementations in which the decoding process is split into some parallel operations along
horizontal strips within the same picture.
173
Main difference between MPEG-2 and MPEG-1
174. • IDCT Mismatch Control
• Macroblock stuffing
• Run-level escape syntax
• Chrominance samples horizontal position (co-sited with luminance in MPEG-2, halfway between luminance samples in MPEG-1)
• Slices (in MPEG-2 slices start on the same horizontal row of macroblocks; in MPEG-1 it is possible to have all macroblocks of a picture in
one slice, for example)
• D-pictures (not permitted in MPEG-2; in MPEG-1 only Intra-DC-coefficient, special end_of_macroblock code)
• Full-pel Motion Vectors (in MPEG-1 full-pel motion vectors possible, in MPEG-2 always half-pel motion vectors)
• Aspect Ratio Information (MPEG-1 specifies pel aspect ratio, MPEG-2 specifies display aspect ratio and pel aspect ratio can be
calculated from this and from frame size and display size)
• Forward_f_code and backward_f_code (differences in parameter location and contents)
• Constrained_parameter_flag and maximum horizontal_size (MPEG-2 has profile and level mechanism)
• Bit_rate and vbv_delay (fixed values are reserved for variable bit rate in MPEG-1, other values are for constant bit rate; in MPEG-2
semantics for bit_rate are changed, etc.)
• VBV (in MPEG-1 VBV is only defined for constant bit rate operation; in MPEG-2 VBV is only defined for variable bit rate and constant bit
rate is assumed to be a special case of variable bit rate)
• temporal_reference (a small difference between MPEG-1 and MPEG-2)
174
Details of MPEG-2 and MPEG-1 Differences
176.
Structure of the Coded Bit-Stream
Video Sequence
→ Group of Pictures
→ Picture
→ Slice
→ Macroblock
→ Block (8 × 8 pixels)
176
177. − All chroma subsampling formats are supported (4:4:4, 4:2:2 and 4:2:0)!
177
All Chroma Channels Subsampling in MPEG-2
178.
Co-sited 4:2:0 Sampling in MPEG-2
− In MPEG-2 4:2:0, the chrominance samples are co-sited: each U and V sample is
horizontally aligned with a luminance sample.
− In JPEG/JFIF, H.261 and MPEG-1 4:2:0, the chrominance samples are centred
halfway between the luminance samples.
• Downsizing the chrominance components to centred 4:2:0 therefore requires
bilinear interpolation.
178
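The positioning difference can be illustrated with a one-dimensional sketch: co-sited sampling keeps the chroma value at every other luma column, while centred sampling needs a value halfway between two columns, here approximated by a two-tap average as the simplest form of the bilinear interpolation mentioned above. The function is illustrative only:

```python
# Horizontal chroma downsampling of one row by a factor of two.
# Co-sited (MPEG-2 4:2:0): keep the samples at even columns.
# Centred (MPEG-1/JPEG 4:2:0): average adjacent pairs, since the chroma
# sample lies halfway between two luminance positions.

def downsample_row(chroma_row, cosited=True):
    if cosited:
        return chroma_row[::2]                       # take every other sample
    return [(chroma_row[i] + chroma_row[i + 1]) / 2  # two-tap average
            for i in range(0, len(chroma_row) - 1, 2)]
```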
179.
Frame Type DCT vs. Field Type DCT
− Luminance MB structure in frame-organised DCT coding (for slow-moving material):
each 16×16 MB is split into four 8×8 blocks containing lines from both fields.
− Luminance MB structure in field-organised DCT coding (for fast-moving material):
each 8×8 block is formed from lines of the same field.
179
181. − Interlacing! (Motion estimation is different from MPEG-1.)
− MPEG-2 can choose between the previous frame and the previous field as the reference.
− The odd and even fields can be coded together, as if they were a frame, or they can be coded independently.
• If there is no motion, we can combine the two fields into a single image called a “frame-picture”.
This is better for compression efficiency.
• If there is motion, the two fields are coded separately, as if they were two pictures, called “field-
pictures”.
181
Frame and Field DCT Coding in MPEG-2
Odd Field-Picture
Even Field-Picture
Frame Picture
182. − For interlaced pictures, since the vertical correlation in field pictures is greatly reduced, an
alternate scan may perform better than a zigzag scan when field prediction is used.
182
Frame and Field DCT Coding in MPEG-2
183.
Five motion compensation modes in MPEG-2
[Figure: the five modes for interlaced pictures; 16×16 macroblocks may be split into 16×8 halves.]
More information: M. Ghanbari, Standard Codecs, Section 8.4, “MPEG-2 nonscalable coding modes”.
183
184. − In the 16×8 motion compensation mode, a field macroblock of 16×16 pixels is split into upper-half and lower-half
16×8 pixel blocks, and a separate field prediction is carried out for each.
− Two motion vectors are transmitted for each P-picture macroblock, and two or four motion vectors for each
B-picture macroblock.
− This mode of motion compensation may be useful in field pictures that contain irregular motion.
− Here a field macroblock is split into two halves, and in the field prediction for frame pictures a frame
macroblock is split into two top and bottom field blocks.
− It should be noted that field pictures have some restrictions on I, P and B-picture coding type and motion
compensation.
− Normally, the second field picture of a frame must be of the same coding type as the first field. However, if
the first field picture of a frame is an I-picture, then the second field can be either I or P. If it is a P-picture,
the prediction macroblocks must all come from the previous I-picture, and dual prime cannot be used
184
Five motion compensation modes in MPEG-2
185. − In this case the target macroblock in a frame picture is split into top-field and bottom-field pixels (for
interlaced pictures, a target macroblock can be split into two field macroblocks).
− Field prediction is then carried out independently for each of the 16 × 8 pixel target blocks.
− For P-pictures, two motion vectors are assigned for each 16×16 pixel target macroblock.
− The 16×8 predictions may be taken from either of the two most recently decoded anchor pictures.
− Note that the 16x8 field prediction cannot come from the same frame, as was the case in field prediction
for field pictures.
− For B-pictures, due to the forward and the backward motion, there can be two or four motion vectors for
each target macroblock.
− The 16×8 predictions may be taken from either field of the two most recently decoded anchor pictures.
185
Five motion compensation modes in MPEG-2
186. − Motion vectors are differentially coded with respect to the vector for the previous macroblock (i.e. the one to the left).
• PMV – Previous Motion Vector.
• MV – Motion Vector for the Current Macroblock.
− Define Δ = [Δ𝑥, Δ𝑦] = 2 × (MV − PMV)
• Multiply by 2 because 0.5-pel quantisation is used.
• Δ𝑥 and Δ𝑦 are coded separately.
Coding of Motion Vectors
186
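The differential step above can be sketched directly; the doubling makes the difference integer-valued because the vectors themselves have half-pel precision. The function name is ours:

```python
# Differential coding of motion vectors: the difference to the previous
# macroblock's vector (PMV) is doubled because 0.5-pel precision is used,
# and the x and y components are handled separately.

def mv_difference(mv, pmv):
    """Return (dx, dy) = 2 * (MV - PMV), integer-valued thanks to the x2."""
    return (int(2 * (mv[0] - pmv[0])), int(2 * (mv[1] - pmv[1])))
```

With the numbers from the worked example a few slides on, MV = (4.5, 3) and PMV = (5, −1) give (−1, 8).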
187. Coding Δx and Δy
− The absolute value and sign of each component are coded separately.
− The absolute value is broken down as
Δ* = (𝑎 − 1) × 2^𝑏 + 𝑐 + 1
𝑎 – is called the motion_code and ranges from 0 to 16.
It is Huffman coded.
𝑏 – is called the size and effectively limits the range of the motion vector. It ranges from 0 to 8.
It is fixed length coded (FLC) (4-bit binary value).
𝑐 – is the motion_residual. It ranges from 0 to 2^𝑏 − 1.
It is fixed length coded (FLC), as a 𝑏-bit binary number.
187
188. Coding Δx and Δy
[Table: how the choice of the size 𝑏 affects the range of difference Δ* that can be coded.]
• The size is set once at the start of each Picture Layer (i.e. it is the same over the entire picture).
• It is common to choose a larger size for P-frames, because the motion is larger.
188
189. Coding Δx and Δy
The size is chosen based on the range of the motion vectors.
Example: say we limit the search width to 10.
• Then we could have a vector [10, 10] and a previous vector [−10, 10].
• The maximum |Δ𝑥| or |Δ𝑦| is 2 × (10 + 10) = 40.
• Therefore we need to choose 𝑏 = 2.
• Given MV = [4.5, 3] and PMV = [5, −1], then
Δ = 2 × ([4.5, 3] − [5, −1]) = [−1, 8]
Then, for 𝑏 = 2:
|Δ𝑥| = 1 = (1 − 1) × 2² + 0 + 1, so 𝑎 = 1, 𝑏 = 2, 𝑐 = 0
|Δ𝑦| = 8 = (2 − 1) × 2² + 3 + 1, so 𝑎 = 2, 𝑏 = 2, 𝑐 = 3
189
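The decomposition |Δ| = (a − 1)·2^b + c + 1 can be reproduced in a few lines. `decompose` is a hypothetical helper returning (motion_code, sign, motion_residual); it matches the worked example above:

```python
# Decompose |delta| into (motion_code a, size b, motion_residual c) so that
# |delta| = (a - 1) * 2**b + c + 1. b is fixed per picture; a is Huffman
# coded and c is a fixed-length code. A zero difference uses motion_code 0
# with no residual, as stated on the Huffman-code slide.

def decompose(delta, b):
    magnitude, sign = abs(delta), delta < 0
    if magnitude == 0:
        return 0, sign, 0                   # motion_code 0, no residual
    a, c = divmod(magnitude - 1, 2 ** b)    # quotient/remainder of |delta|-1
    return a + 1, sign, c
```

Running it on the slide's numbers: Δ𝑥 = −1 with b = 2 gives (a, c) = (1, 0) with a negative sign, and Δ𝑦 = 8 gives (a, c) = (2, 3), exactly as derived above.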
190. Huffman Codes for motion_code
− s is 0 if the component is positive.
− s is 1 if the component is negative.
− Each vector is specified by a (motion_code, motion_residual)
pair.
• The Size value is specified at the start of the Picture Layer.
− If Δ∗ = 0 then we set the motion_code to 0 (codeword is 1).
There is no motion_residual.
190
191. Example
− if Δ 𝑥 = −1 then the motion_code is 1, the sign bit is 1 and the
motion_residual is 0. Therefore the code
𝟎𝟏𝟏 𝟎
is inserted into the bitstream.
− if Δ 𝑦 = 8 then the motion_code is 2, the sign bit is 0 and the
motion_residual is 3. Therefore the code
𝟎𝟎𝟏𝟎 𝟏𝟏𝟏
is inserted into the bitstream.
191