Dr. Mohieddin Moradi
mohieddinmoradi@gmail.com
1
Dream
Idea
Plan
Implementation
Section I
− Video Compression History
− A Generic Interframe Video Encoder
− The Principle of Compression
− Differential Pulse-Code Modulation (DPCM)
− Transform Coding
− Quantization of DCT Coefficients
− Entropy Coding
Section II
− Still Image Coding
− Prediction in Video Coding (Temporal and Spatial Prediction)
− A Generic Video Encoder/Decoder
− Some Motion Estimation Approaches
2
Outline
3
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
Why Compression?
SD-SDI 270 Mbps
HD-SDI 1.5Gbps, 3Gbps
4K-UHD 12Gbps
8K-UHD 48Gbps
4
5
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
Codec
6
Coded
video
Coded
audio
Video format
.264, .265, VP9…
Container format
MP4, MOV, WebM, MXF…
Audio format
.aac, .ogg, .mp3…
Codec and Container Format
− A container or wrapper format is a metafile format whose specification describes how
different elements of data and metadata coexist in a computer file.
− Wrappers serve two purposes mainly:
• To gather programme material and related information
• To identify those pieces of information
Container
7
Codec and Container (Wrapper) Format
(Figure: how codecs, wrapper formats and physical media relate.)
CODEC: AVC-Intra Class 100; DNxHD, ProRes; AVC LongG
Wrapper/Format: MXF OPAtom; MXF OP1b, QuickTime, DVI
Media: P2, AVCHD; HDCAM, Mini DV, SR; LTO, HDD, Blu-ray Disc; P2 Card, SD Card
8
Ex: MXF File Structure of AVC-LongG OP-1b and OP-1a
Goal of Standards
− Ensuring Interoperability
− Enabling communication between devices made by different manufacturers
− Promoting a technology or industry
− Reducing costs
9
The Scope of Video Standardization
Decoder
Bitstream
Encoder
Goal of Standards
− Ensuring Interoperability
− Enabling communication between devices made by different manufacturers
− Promoting a technology or industry
− Reducing costs
− Not the encoder, Not the decoder
− Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement
the IDCT)
10
The Scope of Video Standardization
Decoder
Bitstream
Scope of Standardization
Encoder
(Decoding Processes)
Only Specifications of the Bitstream, Syntax, and Decoding Processes are standardized:
• Enables improved encoding & decoding strategies to be employed in a standard-compatible manner
• Provides no guarantees of quality
• Permits optimization beyond the obvious
• Permits complexity reduction for implementability
Pre-Processing
Source
Destination
Post-Processing & Error
Recovery
Scope of Standard
Encoding
Decoding
11
CODEC (enCODer/DECoder)
Standard defines this
The Scope of Video Standardization
12
− This allows future encoders of better performance to remain compatible with existing decoders.
− Also allows for commercially secret encoders to be compatible with standard decoders
Today’s Ho-Hum Encoder
Tomorrow’s Nifty Encoder
Very Secret Encoder
Today’s Decoder
Today’s decoder
still works!
The Scope of Video Standardization
• The international standard does not specify the design of the video encoders and decoders.
• It only specifies the syntax and semantics of the bitstream and signal processing at the encoder/decoder interface.
• Therefore, options are left open to the video codec manufacturers to trade-off cost, speed, picture quality and coding
efficiency.
(Figure: ISO/IEC JTC 1 / SC 29 organization.)
− Advisory Groups: AG on Management (AGM), AG on Registration Authority (RA)
− Working Groups: WG1 (JPEG, JBIG), WG11 (MPEG, with subgroups for Requirements, Systems, Video, Audio, SNHC, Test, Implementation Studies and Liaisons), WG12 (MHEG-5 maintenance, MHEG-6)
Advisory Group (AG) on Management (AGM)
• To advise SC 29 and its WGs on matters of management that
affect their work.
Advisory Group (AG) on Registration Authority (RA)
WG1: Still images, JPEG and JBIG
• Joint Photographic Experts Group and
Joint Bi-level Image Experts Group
WG11: Video, MPEG
• Moving Picture Experts Group
WG12: Multimedia, MHEG
• Multimedia Hypermedia Experts Group
International Organization for Standardization
Subcommittee 29
Title: “Coding of Audio, Picture, Multimedia and Hypermedia Information”
Joint Technical Committee
ISO/IEC JTC 1/SC 29 Structure and MPEG
MPEG (Moving Picture Experts Group, 1988 )
To develop standards for coded representation of
digital audio, video, 3D Graphics and other data
International
Electrotechnical
Committee
13
Telecommunication Standardization Advisory Group (TSAG)
WTSA: World Telecommunication Standardization Assembly
(Figure: ITU-T structure: WTSA → TSAG and Study Groups (SG) → Working Parties (WP) → Questions (Q), which develop Recommendations; plus Focus Groups, IPRs (Intellectual Property Rights), and workshops, seminars, symposia.)
VCEG (ITU-T SG16/Q6)
• Study Group 16: Multimedia terminals, systems and applications
• Working Party 3: Media coding
• Question 6: Video coding
Rapporteurs (R): Mr Gary SULLIVAN, Mr Thomas WIEGAND
ITU-T structure and VCEG (Video Coding Experts Group or Visual Coding Experts Group)
15
ITU, International Telecommunication Union structure
− Founded in 1865, it is the oldest specialized agency of the United Nations system
− ITU is an International organization where governments, industries, telecom operators, service providers
and regulators work together to coordinate global telecommunication networks and services
− Help the world communicate!
What does ITU actually do?
• Spectrum allocation and registration
• Coordinate national spectrum planning
• International telecoms/ICT standardization
• Collaborate in international tariff-setting
• Cooperate in telecommunications development assistance
• Develop measures for ensuring safety of life
• Provide policy reviews and information exchange
• Ensure and extend universal telecom access
16
ITU, International Telecommunication Union structure
− Plenipotentiary Conference: Key event, all ITU Member States decide on the future role of the organization
(Held every four years)
− ITU Council: The role of the Council is to consider, in the interval between Plenipotentiary Conferences,
broad telecommunication policy issues to ensure that the Union's activities, policies and strategies fully
respond to today's dynamic, rapidly changing telecommunication environment (held yearly)
17
ITU, International Telecommunication Union structure
− General Secretariat: Coordinates and manages the administrative and financial aspects of the Union’s activities
(provision of conference services, information services, legal advice, finance, personnel, etc.)
− ITU-R: Coordinates radio communications, radio-frequency spectrum management and wireless services.
− ITU-D: Technical assistance and deployment of telecom networks and services in developing and least developed
countries to allow the development of telecommunication.
− ITU-T: Telecommunication standardization on a world-wide basis. Ensures the efficient and on-time production of high
quality standards covering all fields of telecommunications (technical, operating and tariff issues). (The Secretariat of ITU-T
(TSB: Telecommunication Standardization Bureau) provides services to ITU-T Participants)
18
ITU, International Telecommunication Union structure
Telecommunication Standardization Bureau (TSB) (Place des Nations, CH-1211 Geneva 20)
− The TSB provides secretarial support for ITU-T and services for participants in ITU-T work (e.g. organization of meeting,
publication of Recommendations, website maintenance etc.).
− Disseminates information on international telecommunications and establishes agreements with many international SDOs.
Mission of ITU-T Standardization Sector of ITU
− Helping people all around the world to communicate and to equally share the advantages and opportunities of
telecommunication, reducing the digital divide, by studying technical, operating and tariff matters to develop
telecommunication standards (Recommendations) on a worldwide basis.
19
ITU, International Telecommunication Union structure
World Telecommunication Standardization Assembly (WTSA)
− WTSA sets the overall direction and structure for ITU-T, meets every four years and for the next four-year period:
• Defines the general policy for the Sector
• Establishes the study groups (SG)
• Approves SG work programmes
• Appoints SG chairmen and vice-chairmen
Telecommunication Standardization Advisory Group (TSAG)
− TSAG provides ITU-T with flexibility between WTSAs, and reviews priorities, programmes, operations, financial matters and
strategies for the Sector (meets approximately every 9 months)
• Follows up on accomplishment of the work programme
• Restructures and establishes ITU-T study groups
• Provides guidelines to the study groups
• Advises the TSB Director
• Produces the A-series Recommendations on organization and working procedures
• ISO/IEC MPEG = “Moving Picture Experts Group”
(ISO/IEC JTC 1/SC 29/WG 11 = International Organization for Standardization and International Electrotechnical
Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11)
• ITU-T VCEG = “Video Coding Experts Group”
(ITU-T SG16/Q6 = International Telecommunications Union – Telecommunications Standardization Sector (ITU-T,
a United Nations Organization, formerly CCITT), Study Group 16, Working Party 3, Question 6)
• JVT = “Joint Video Team”
Collaborative team of MPEG & VCEG, responsible for developing AVC (discontinued in 2009)
• JCT-VC = “Joint Collaborative Team on Video Coding”
Team of MPEG & VCEG , responsible for developing HEVC (established January 2010)
• JVET = “Joint Video Experts Team”
Exploring potential for new technology beyond HEVC (established Oct. 2015 as Joint Video Exploration Team, renamed
Apr. 2018)
20
Video Coding Standardization Organizations
21
(Timeline: video coding standards and their target formats, 1985–2020.)
− ITU-T: H.120 (1984-1988); H.261 (1990+), video telephony; H.263/+/++ (1995-2000+)
− ISO/IEC: MPEG-1 (1993), computer video; MPEG-4 Visual (1998-2001+)
− Joint ITU-T / ISO/IEC:
• H.262 / 13818-2 (MPEG-2) (1994/95-1998+), SD and HD
• H.264 / 14496-10 AVC (2003-2018+), HD and 4K UHD, developed by the Joint Video Team (JVT)
• H.265 / 23008-2 HEVC (2013-2018+), 4K UHD, developed by the Joint Collaborative Team on Video Coding (JCT-VC)
• H.26x / 23090-3 VVC (2020-...), 8K, 360°, ..., to be developed by the Joint Video Experts Team (JVET)
History of Video Coding Standardization (1985 ~ 2020)
22
(Timeline, 1988–2010.)
− ITU-T standards: H.261 (Version 1, 1990; Version 2), H.263, H.263+, H.263++
− Joint ITU-T/MPEG standards: H.262/MPEG-2, H.264/MPEG-4 AVC, H.265/HEVC
− MPEG standards: MPEG-1, MPEG-4 (Version 1), MPEG-4 (Version 2)
H.261 Video Compression Standard
23
H series are low delay codecs for telecom applications (International Telecommunication Union (ITU-T)
developed several recommendations for video coding)
− H.261 (1990): the first video codec specification, “Video Codec for Audio Visual Services at p × 64 kbps”
− H.262 (1995): Infrastructure of audiovisual services: Coding of moving video
− H.263 (1996): the next videoconferencing solution, “Video coding for low bit rate communications”
− H.263+ (H.263V2) (1998)
− H.263++ (H.263V3) (2000), follow-on solutions
− H.26L: “long-term” solution for low bit-rate video coding for communication applications (not backward
compatible with H.263+)
− H.26L was completed in May 2003 and led to H.264, known as Advanced Video Coding (AVC)
− H.265/HEVC (2013): High Efficiency Video Coding
ITU H.26x History
24
Motion Picture Experts Group (MPEG) codecs are designed for storage/broadcast/streaming applications
MPEG-1 (1992)
• Started in 1988 by Leonardo Chiariglione
• Compression standard for progressive frame-based video in SIF (352x240) format
• Applications: VCD
MPEG-2 (1994-5)
• Compression standard for interlaced frame-based video in CCIR-601 (720x480) and high definition (1920x1088i)
formats
• Applications: DVD, SVCD, DIRECTV, GA, DVB, HDTV studio, DTV broadcast, HD; video standards for
television and telecommunications
MPEG-4 (1999)
• Multimedia standard for object-based video from natural or synthetic source
• Applications: Internet, cable TV, virtual studio, home LAN, etc.
• Object-oriented
• Over-ambitious?
MPEG History
25
Motion Picture Experts Group (MPEG) codecs are designed for storage/broadcast/streaming applications
MPEG-7, 2001
• Standardized descriptions of multimedia information, formally called “Multimedia Content Description
Interface”
• Metadata for audio-video streams
• Applications: Internet, video search engine, digital library
MPEG-21, 2002
• Intellectual property rights protection purposes
• Distribution, exchange, user access of multimedia data and intellectual property management
AVC (2003), also known as MPEG-4 version 10
• Conventional to HD
• Emphasis on compression performance and loss resilience
HEVC (2013) High Efficiency Video Coding
MPEG History
26
ITU and MPEG (ISO/IEC) have also worked together for joint codecs:
− MPEG-2 is also called H.262
− H.26L led to a codec that is now called:
• H.264 in telecom
• MPEG-4 (version 10) in broadcast
• AVC (Advanced Video Coding) in broadcast
• Joint Video Team (JVT) Codec
− H.265/HEVC (2013) High Efficiency Video Coding
Joint ITU/MPEG
27
The Story of MPEG and VCEG
28
ITU and MPEG (ISO/IEC) have also worked together for joint codecs:
Joint ITU/MPEG
(Figure: bitrate savings at each generation.)
50% bitrate saving – Direct-to-home
30% bitrate saving – Contribution
VVC (2020): ≈50% bitrate saving – Direct-to-home; ≈30% bitrate saving – Contribution
29
1920×1080, 4:2:2, 10 bit:
Codec Brand | Codec Name | Bitrate (Mbps) | Wrapper Type
AVID | DNxHD 365x | 367 (50p) | MXF
AVID | DNxHD 185x | 184 (50i) | MXF
AVID | DNxHR HQX | 174 (50i) / 345 (50p), (12 bit) | MXF
APPLE | ProRes 422 Proxy | 38 (50i) / 76 (50p) | MOV
APPLE | ProRes 422 LT | 85 (50i) / 170 (50p) | MOV
APPLE | ProRes 422 | 122 (50i) / 245 (50p) | MOV
APPLE | ProRes 422 HQ | 184 (50i) / 367 (50p) | MOV
SONY | XAVC Intra Class 100 | 112 (50i) / 223 (50p) [MXF] | MXF/MP4
SONY | XAVC Intra Class 200 | 227 (50i) / 454 (50p) | MXF
SONY | XAVC Long GOP 50 | 50 (50i, 50p) [MXF], max bit rate = 80 Mb/s | MXF/MP4
SONY | XAVC Long GOP 35 | 35 (50i, 50p) [MXF], max bit rate = 80 Mb/s | MXF/MP4
SONY | XAVC Long GOP 25 | 25 (50i) [MXF], max bit rate = 80 Mb/s | MXF/MP4
PANASONIC | AVC-Intra 200 | 226 (50i) / 452 (50p) | MXF
PANASONIC | AVC-Intra 100 | 111 (50i) / 222 (50p) | MXF
PANASONIC | AVC-LongG 50 | 50 (50i) | MXF
PANASONIC | AVC-LongG 25 | 25 (50i) / 50 (50p) | MXF
Some Famous Codecs for HD
30
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
31
Spatial Domain
− Elements are used “raw” in suitable combinations.
− The frequency of occurrence of such combinations is used to influence the design of the
coder so that shorter codewords are used for more frequent combinations and vice versa
(entropy coding).
Transform Domain
− Elements are mapped onto a different domain (i.e. the frequency domain).
− The resulting coefficients are quantised and entropy-coded.
Hybrid
− Combinations of the above.
Classification of Compression Techniques
Current Stage
Used since early days of video compression
standards, e.g. MPEG-1/-2/-4, H.264/AVC, HEVC and
also in most proprietary codecs (VC1, VP8 etc.)
Input Frame 1
32
A Generic Interframe Video Encoder
Input Frame 1 → DCT
33
A Generic Interframe Video Encoder
Input Frame 1 → DCT → Quantized → 010011101001…
34
A Generic Interframe Video Encoder
Input Frame 1 → DCT → Quantized → 010011101001… → Reconstructed Frame 1
35
A Generic Interframe Video Encoder
Input Frame 2
36
Reconstructed
Frame 1
A Generic Interframe Video Encoder
010011101001…
Entropy Coded MVs
37
Reconstructed
Frame 1
Input Frame 2
A Generic Interframe Video Encoder
010011101001…
Entropy Coded MVs
38
Reconstructed Frame 1 with MC
Input Frame 2
A Generic Interframe Video Encoder
Input Frame 2 → Residual with MC (Frames 1&2)
39
Reconstructed Frame 1 with MC
A Generic Interframe Video Encoder
If the motion prediction is successful, the energy
in the residual is lower than in the original frame
and can be represented with fewer bits.
Residual with MC (Frames 1&2) → DCT
40
A Generic Interframe Video Encoder
Residual with MC (Frames 1&2) → DCT → Quantized → 010011101001…
41
A Generic Interframe Video Encoder
Residual with MC (Frames 1&2) → DCT → Quantized → Reconstructed Residual with MC (Frames 1&2)
42
A Generic Interframe Video Encoder
43
Reconstructed Frame 1 with MC + Reconstructed Residual with MC (Frames 1&2) = Reconstructed Frame 2
A Generic Interframe Video Encoder
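The effect shown in this figure sequence can be reproduced in a few lines. Below is a minimal sketch (illustrative helper names, plain full-search block matching over a toy pair of frames): as the slides above note, when motion prediction succeeds, the energy of the motion-compensated residual is far lower than that of the plain frame difference.

```python
import numpy as np

def full_search(ref, cur, by, bx, B=16, R=8):
    """Full-search block matching: the motion vector (dy, dx) minimising
    the sum of absolute differences (SAD) for the BxB block at (by, bx)."""
    H, W = ref.shape
    block = cur[by:by + B, bx:bx + B].astype(int)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + B > H or x + B > W:
                continue  # candidate block would fall outside the frame
            sad = np.abs(ref[y:y + B, x:x + B].astype(int) - block).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

def mc_residual(ref, cur, B=16, R=8):
    """Predict each block of `cur` from its best match in `ref`, subtract."""
    res = np.zeros(cur.shape, dtype=int)
    for by in range(0, cur.shape[0], B):
        for bx in range(0, cur.shape[1], B):
            dy, dx = full_search(ref, cur, by, bx, B, R)
            pred = ref[by + dy:by + dy + B, bx + dx:bx + dx + B].astype(int)
            res[by:by + B, bx:bx + B] = cur[by:by + B, bx:bx + B] - pred
    return res

rng = np.random.default_rng(0)
f1 = rng.integers(0, 256, (64, 64))   # "frame 1"
f2 = np.roll(f1, 3, axis=1)           # "frame 2": frame 1 shifted by 3 pixels
print("residual energy, no MC  :", ((f2 - f1) ** 2).sum())
print("residual energy, with MC:", (mc_residual(f1, f2) ** 2).sum())  # far lower
```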
44
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
− Spatial Redundancy Reduction (pixels inside a picture are similar)
− Temporal Redundancy Reduction (Similarity between the frames)
− Statistical Redundancy Reduction (more frequent symbols are assigned short code words and
less frequent ones longer words)
The Principle of Compression
45
46
− It arises when parts of a picture are often replicated within a single frame of video (with minor
changes).
Spatial Redundancy in Still Images
This area
is all blue
This area is half
blue and half green
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
Sky Blue
− Take advantage of similarity between successive frames
− It arises when successive frames of video display images of the same scene.
47
Temporal Redundancy in Moving Images
This picture is the
same as the previous
one except for this
area
All signals & data have some redundancy and some entropy.
– Data is compressed by keeping entropy and throwing away redundancy if possible!
– Redundancy is the useless stuff.
– Redundancy can be thrown away
– More redundancy in simple signals & data
• Black & burst, colour bars, flat scenery, talking heads, quiet music, 1kHz sine test tone, bitmap
images, database files, text files.
– Entropy is the useful stuff.
– Entropy is a term often used for ‘activity’ or ‘chaos’.
– More entropy in complex signals & data
• Multiburst and pathological test signals, football match, white noise, executables (computer files that
can be executed), DLL files.
48
The Principle of Compression
(Figure: data/bandwidth vs. signal complexity. At 2:1 compression, simple signals fit within the channel, but for complex signals some entropy is lost.)
Redundancy & Entropy
A high compression ratio can lead to loss of entropy
(Figure: at 4:1 compression, even more entropy is lost for complex signals.)
50
Redundancy & Entropy
A high compression ratio can lead to loss of entropy
Spatial Redundancy Reduction
51
Spatial Redundancy Reduction
Transform coding
Discrete Sine
Transform (DST)
Discrete Wavelet
Transform (DWT)
Hadamard Transform (HT)
Discrete Cosine
Transform (DCT)
Differential Pulse Code Modulation
(DPCM)
52
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
PCM was invented by the British engineer Alec Reeves in 1937 in France.
− Pulse code modulation (PCM) is produced by analog-to-digital conversion process.
− As in the case of other pulse modulation techniques, the rate at which samples are taken and
encoded must conform to the Nyquist sampling rate.
− The sampling rate must be greater than twice the highest frequency in the analog signal:
$f_s > 2 f_{max}$
Pulse Code Modulation (PCM)
53
Encoding in PCM
54
Allowed quantization levels:
1.52 → 1.5
1.08 → 1.1
0.92 → 0.9
0.56 → 0.6
0.28 → 0.3
0.27 → 0.3
0.11 → 0.1
Pulse Code Modulation
55
Regeneration (re-amplification, retiming, reshaping)
Regeneration
56
Advantages of PCM
• Robustness to noise and interference
• Efficient regeneration
• Efficient SNR and bandwidth trade-off
• Uniform format
• Ease add and drop
• Secure
DS0
• A basic digital signaling rate of 64 kbit/s.
• To carry a typical phone call, the audio sound is digitized at an 8 kHz sample rate using 8-bit
pulse-code modulation.
Advantages of PCM
57
− Encode information in terms of signal transitions; a transition is used to designate symbol 0.
− Symbol 0 → transition (0→1 or 1→0)
Differential Encoding
58
− Usually PCM has a sampling rate higher than the Nyquist rate, so the encoded signal contains
redundant information. DPCM can efficiently remove this redundancy.
− Prediction error of $m[n]$: $e[n] = m[n] - \hat{m}[n]$
− Quantized value of $m[n]$: $m_q[n] = e_q[n] + \hat{m}[n]$
− Quantization error of $e[n]$: $q[n] \triangleq e[n] - e_q[n]$
− We can show that:
$m[n] - m_q[n] = (\hat{m}[n] + e[n]) - (e_q[n] + \hat{m}[n]) = e[n] - e_q[n] = q[n]$
Differential Pulse-Code Modulation (DPCM)
59
$m[n] - m_q[n] = q[n]$
(DPCM block diagram: input $m[n+1]$; prediction $\hat{m}[n+1]$ formed from the quantized output $m_q[n]$; quantized prediction error $e_q[n+1]$.)
$m[n] - m_q[n] = e[n] - e_q[n] = q[n]$ means that:
− The pointwise coding error in the input sequence
is exactly $q[n]$, the quantization error in $e[n]$
− With a reasonable predictor the mean square
value of the differential signal e(n) is much
smaller than that of m(n)
− For the same mean square quantization error, e[n]
requires fewer quantization bits than m[n]
⇒ The number of bits required for transmission
has been reduced while the quantization
error is kept the same.
Differential Pulse-Code Modulation (DPCM)
60
− An important aspect of DPCM is that the prediction
is based on the output (the quantized samples)
rather than the input (the unquantized samples).
− This results in the predictor being in the “feedback
loop” around the quantizer, so that the quantizer
error at a given step is fed back to the quantizer
input at the next step.
− This has a “stabilizing effect” that prevents DC drift
and accumulation of error in the reconstructed
signal $m_q[n]$.
Differential Pulse-Code Modulation (DPCM)
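A minimal sketch of this feedback-loop arrangement (first-order prediction from the previous quantized sample and a uniform step size are illustrative assumptions): because the predictor sees the decoder's reconstruction, the coding error $m[n] - m_q[n]$ stays bounded by one quantization error and no drift accumulates.

```python
import numpy as np

def dpcm(m, step=4.0):
    """First-order DPCM: the prediction is the previous *quantized* output,
    i.e. the quantizer sits inside the prediction feedback loop."""
    m_hat = 0.0                         # initial prediction
    m_q = np.empty(len(m))
    for n, sample in enumerate(m):
        e = sample - m_hat              # prediction error e[n]
        e_q = step * round(e / step)    # quantized error e_q[n]
        m_q[n] = m_hat + e_q            # reconstruction m_q[n] = e_q + m_hat
        m_hat = m_q[n]                  # predictor uses the decoded output
    return m_q

t = np.linspace(0, 1, 200)
m = 100 * np.sin(2 * np.pi * 3 * t)
m_q = dpcm(m)
# m[n] - m_q[n] = q[n]: bounded by step/2, and no drift accumulates
# because encoder and decoder share the same (quantized) prediction.
print("max |m - m_q| =", np.abs(m - m_q).max())   # <= 2.0 for step = 4
```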
61
The output SNR of the DPCM system is
$$(\mathrm{SNR})_o = \frac{\sigma_M^2}{\sigma_Q^2}$$
where $\sigma_M^2$ and $\sigma_Q^2$ are the variances of $m[n]$ and $q[n]$ (both zero mean).
$$(\mathrm{SNR})_o = \frac{\sigma_M^2}{\sigma_E^2} \cdot \frac{\sigma_E^2}{\sigma_Q^2} = G_p\,(\mathrm{SNR})_Q$$
where $\sigma_E^2$ is the variance of the prediction errors
and $(\mathrm{SNR})_Q = \sigma_E^2 / \sigma_Q^2$ is the signal-to-quantization-noise ratio.
Processing gain: $G_p = \sigma_M^2 / \sigma_E^2$
Design a prediction filter to maximize $G_p$ (minimize $\sigma_E^2$).
Processing Gain
62
63
Predictive Coding (from previous symbol)
Predictive Coding (generalised)
− Prediction is based on combination of previous symbols
− Prediction template needs to be “causal” i.e. template should
contain only “previous” elements w.r.t the direction of scanning
(shown with arrows).
− This is important for coding applications as the decoder will need
to have decoded the template elements first to perform the
prediction of the current element.
64
Predictive Coding (from previous symbol)
Predictive Coding (previous symbol)
− Previous symbol used as a prediction of current symbol
− Prediction error coded in a memoryless fashion
− Prediction error alphabet and codebook have nearly twice the size
− i.e. symbol alphabet {1, 2, 3, 4} → prediction alphabet {-3, -2, -1, 0, 1, 2, 3}
− A good predictor will minimise the error (most occurrences will be zero)
− If the frame is processed in raster order, then pixels
A, B and C in the current and previous rows are
available in both the encoder and the decoder
since these should already have been decoded
before X.
− The decoder forms the same prediction and adds
the decoded residual to reconstruct the pixel.
65
Predictive Image Coding
Pixel X to be encoded
P(X) is a prediction of X using A,B and C
Residual R(X) = X − P(X)
R(X) is encoded and transmitted
1
•Encoder forms a prediction for X based on
some combination of previously coded pixels
2
•Then subtracts this prediction from X
3
•Then encodes the residual (the result of the
subtraction)
Example
− Encoder prediction P(X) = (2A + B + C)/4
− Residual R(X) = X − P(X) is encoded and transmitted.
− Decoder decodes R(X) and forms the same prediction: P(X) = (2A + B + C)/4
− Reconstructed pixel X = R(X) + P(X)
66
Predictive Image Coding
Spatial prediction (DPCM)
1
•Encoder forms a prediction for X based on
some combination of previously coded
pixels
2
•Then subtracts this prediction from X
3
•Then encodes the residual (the result of the
subtraction)
By Encoder
By Decoder
− If the encoding process is lossy, i.e. if the residual is quantized ($R'(X)$):
• Then the decoded pixels $A'$, $B'$ and $C'$ may not be identical to the original A, B and C due to losses
during encoding, and so the above process could lead to a cumulative mismatch or ‘drift’ between
the encoder and decoder.
− Hence the encoder uses the decoded pixels $A'$, $B'$ and $C'$ to form the prediction, i.e. $P(X) = (2A' + B' + C')/4$ in
the above example.
− The compression efficiency of this approach depends on the accuracy of the prediction P(X).
67
Predictive Image Coding
To avoid this, the encoder should itself decode the residual $R'(X)$ and reconstruct each pixel.
In this way, both encoder and decoder use the same prediction P(X) and drift is avoided.
$R(X) = X - P(X)$ → Quantizer → $R'(X)$
− If the prediction is successful, the energy in the residual is lower than in the original frame and the residual
can be represented with fewer bits (Motion compensation is an example of predictive coding).
− Spatial Prediction involves predicting an image sample or region from previously-transmitted samples in
the same image or frame and is sometimes described as ‘Differential Pulse Code Modulation’ (DPCM).
68
Predictive Image Coding
Spatial Prediction in a Frame=DPCM
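A sketch of the whole loop above (the smooth test image and step size are illustrative): the encoder predicts with $P(X) = (2A' + B' + C')/4$ from reconstructed neighbours, quantizes the residual, and reconstructs exactly as the decoder would, so there is no drift.

```python
import numpy as np

def codec(img, step=8):
    """Spatial DPCM with P(X) = (2A + B + C)/4, where A = left, B = above,
    C = above-left, all taken from *reconstructed* pixels (no drift)."""
    H, W = img.shape
    rec = np.zeros((H, W))             # reconstruction (identical at decoder)
    idx = np.zeros((H, W), dtype=int)  # transmitted quantization indices
    for y in range(H):
        for x in range(W):
            A = rec[y, x - 1] if x > 0 else 128
            B = rec[y - 1, x] if y > 0 else 128
            C = rec[y - 1, x - 1] if x > 0 and y > 0 else 128
            P = (2 * A + B + C) / 4
            R = img[y, x] - P                  # residual R(X) = X - P(X)
            idx[y, x] = round(R / step)        # quantized index (transmitted)
            rec[y, x] = P + idx[y, x] * step   # R'(X) + P(X), as at decoder
    return idx, rec

# Smooth ramp image: the predictor works well, so most indices are zero.
img = np.fromfunction(lambda y, x: 100 + 0.5 * x + 0.3 * y, (32, 32))
idx, rec = codec(img)
print("max |img - rec| :", np.abs(img - rec).max())   # <= step/2
print("fraction of zero indices:", (idx == 0).mean())
```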
69
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
Fourier Series Recall
70
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n x}{T} + b_n \sin\frac{2\pi n x}{T} \right)$$
$$a_n = \frac{2}{T} \int_{-T/2}^{T/2} f(x) \cos\left(\frac{2\pi n x}{T}\right) dx$$
$$b_n = \frac{2}{T} \int_{-T/2}^{T/2} f(x) \sin\left(\frac{2\pi n x}{T}\right) dx$$
$$a_0 = \frac{2}{T} \int_{-T/2}^{T/2} f(x)\, dx$$
71
Fourier Series Recall
72
Fourier Series Recall
73
Fourier Series Recall
74
Fourier Series Recall
75
(Figure: a square wave sw(t), amplitude ±1, t from 0 to 10, and its Fourier-series approximations with an increasing number of terms.)
Ideally need infinite terms.
Fourier Series Recall
How can transform coding lead to data compression?
− Although each pixel 𝑥1 or 𝑥2 may take any value uniformly
between 0 (black) and its maximum value 255 (white), since
there is a high correlation (similarity) between them, then it is
most likely that their joint occurrences lie mainly on a 45-degree
line.
− The joint occurrences on the new coordinates have a uniform
distribution along the 𝒚 𝟏 axis, but are highly peaked around
zero on the 𝒚 𝟐 axis.
− The 𝒚 𝟏 is called the average or DC value of 𝒙 𝟏 and 𝒙 𝟐
− The 𝒚 𝟐 represents residual differences of 𝒙 𝟏 and 𝒙 𝟐
− The normalization factor of $1/\sqrt{2}$ makes sure that the signal
energy is not changed by the transformation (Parseval’s theorem).
76
Transform Coding
Joint occurrences of a
pair of pixels in one frame
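A small numerical sketch of this two-pixel transform (the correlation model for the pixel pairs is an assumption): the 45-degree rotation with the $1/\sqrt{2}$ factor preserves total energy while packing almost all the variance into $y_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 255, 10_000)                       # pixel 1
x2 = np.clip(x1 + rng.normal(0, 10, 10_000), 0, 255)   # correlated neighbour

y1 = (x1 + x2) / np.sqrt(2)    # "DC": average along the 45-degree line
y2 = (x1 - x2) / np.sqrt(2)    # residual difference, peaked around zero

print("energy before:", (x1**2 + x2**2).sum())
print("energy after :", (y1**2 + y2**2).sum())   # equal (Parseval)
print("variances    :", x1.var(), x2.var(), "->", y1.var(), y2.var())
```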
− Transform domain coding is mainly used to remove the spatial redundancies in images by
mapping the pixels into a transform domain prior to data reduction.
− The strength of transform coding in achieving data compression is that the image energy of
most natural scenes is mainly concentrated in the low-frequency region, and hence into a few
transform coefficients.
− These coefficients can then be quantized with the aim of discarding insignificant coefficients,
without significantly affecting the reconstructed image quality.
77
Transform Coding
78
Transform Coding
Through transformation, a group of correlated pixels is
converted into a group of uncorrelated coefficients.
− Only one coefficient becomes important, and the
rest carry non-significant energy.
− The larger the number of pixels transformed together,
the better the compression efficiency.
− If pixel intensity variations match the
transformation basis vectors, then only one
coefficient (apart from DC) becomes significant
(unitary/orthonormality).
79
Transform Coding
The choice of transform depends on a number of criteria:
1. Data in the transform domain should be decorrelated, i.e. separated into components
within minimal inter-dependence, and compact, i.e. most of the energy in the transformed
data should be concentrated into a small number of values.
2. The transform should be reversible.
3. The transform should be computationally tractable, e.g. low memory requirement,
achievable using limited-precision arithmetic, low number of arithmetic operations, etc.
80
The Choice of Transform Coding
− A group of U pixels in each line are
1-D transformed.
− This is repeated for V lines.
− A group of V coefficients in the
vertical directions are transformed.
− This is repeated for U columns.
− The final output is UV 2-D transform
coefficients.
− Transform coefficients are quantized
for compression.
− Compressed coefficients are inverse
transformed to reconstruct the
image.
81
What Is a Two Dimensional Transform?
One-dimensional transformation
in the Horizontal direction
One-dimensional transformation
in the Vertical direction
U
V
Normally U=V
2D Coeff.
1D Coeff.
− No reduction in data, just replacement (Replaces the original pixel samples with coefficients).
− Coefficients describe how the samples are changing.
− Helps to separate entropy from redundancy.
− DCT always performed on a block of samples.
Discrete Cosine Transform
82
Discrete Cosine Transform
Smallest DCT block is a 2x2 block.
• Top left coefficient is the DC coefficient → Describes the average of the 4 samples.
• Top right coefficient is the horizontal coefficient → Describes how the 4 samples are changing horizontally.
• Bottom left coefficient is the vertical coefficient → Describes how the 4 samples are changing vertically.
• Bottom right coefficient is the diagonal coefficient → Describes how the 4 samples are changing diagonally.
83
Original
pixel
samples
Original
pixel
samples
DCT Inverse
DCT
DC
Horizontal
coefficient
Diagonal
coefficient
Vertical
coefficient
(Figure: example 2×2 pixel blocks, e.g. [255 255; 0 0] and [255 0; 255 0], and their DCT coefficients.)
84
Discrete Cosine Transform
DC coefficient of a 2×2 block $= \frac{1}{2}\sum_{i=1}^{4} P_i$
W to B → 127.5, B to W → -127.5
(Figure: 2×2 pixel blocks with horizontal, vertical and diagonal transitions, and their coefficient blocks with values ±127.5.)
85
Discrete Cosine Transform
– DCT always performed on a block of samples.
86
Discrete Cosine Transform
87
Detail in a Block vs. DCT Coefficients Transmitted
Discrete Cosine Transform
Most compression systems use an 8x8 DCT block.
• The top left coefficient is the DC coefficient.
• Top row are horizontal coefficients → Low frequency changes to the left, high to the right.
• Left column are vertical coefficients → Low frequency changes at the top, high at the bottom.
• The other coefficients for different angle/frequencies → Low frequency to the top left, & high to the bottom right.
Discrete Cosine Transform
88
Pixel Domain Frequency Domain
(Figure: a spatial 8×8 block of pixel values f(m,n), m, n = 0…7, containing only the values 109 and 55.)
f(m,n): spatial 8×8 pixel values
Discrete Cosine Transform
89
NINT[·], NINT = Nearest INteger Truncation
(Figure: the corresponding 8×8 block of DCT transform values F(u,v), u, v = 0…7, after NINT rounding; the DC coefficient is 602 and coefficient magnitudes fall off toward high frequencies.)
F(u,v): frequency-domain 8×8 transform values
Discrete Cosine Transform
90
91
DCT
Big number
somewhere
here
Discrete Cosine Transform
DCT
Big number
somewhere
here
92
Discrete Cosine Transform
DCT
Big number
somewhere
here
93
Discrete Cosine Transform
94
Discrete Cosine Transform
95
Discrete Cosine Transform
96
Discrete Cosine Transform
DCT
Big number
somewhere
here
DCT
Big number
somewhere
here
97
Discrete Cosine Transform
− The Forward DCT (FDCT) of an N × N sample block is given by
− The Inverse DCT (IDCT) is given by
− A is an N × N transform matrix. The elements of A are
− FDCT and IDCT may be written in summation form:
98
Discrete Cosine Transform
− Ex: The transform matrix A for a 4 × 4 DCT is:
99
Discrete Cosine Transform
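The matrix A appears as an image in the deck; the sketch below builds it from the standard DCT-II definition ($a_{ij} = c_i \cos\frac{(2j+1)i\pi}{2N}$ with $c_0 = \sqrt{1/N}$ and $c_i = \sqrt{2/N}$ otherwise) and checks the FDCT/IDCT pair $Y = AXA^T$, $X = A^TYA$.

```python
import numpy as np

def dct_matrix(N):
    """N x N DCT transform matrix A (standard DCT-II definition)."""
    A = np.zeros((N, N))
    for i in range(N):
        c = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
        for j in range(N):
            A[i, j] = c * np.cos((2 * j + 1) * i * np.pi / (2 * N))
    return A

A = dct_matrix(4)
print(np.round(A, 3))                       # the 4x4 transform matrix
print(np.allclose(A @ A.T, np.eye(4)))      # True: A is orthonormal

X = np.arange(16).reshape(4, 4).astype(float)   # a sample 4x4 block
Y = A @ X @ A.T                                 # forward DCT: Y = A X A^T
X_rec = A.T @ Y @ A                             # inverse DCT: X = A^T Y A
print(np.allclose(X, X_rec))                    # True: fully reversible
```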
− The output of a 2-dimensional FDCT is a set of N × N
coefficients representing the image block data in the DCT
domain which can be considered as ‘weights’ of a set of
standard basis patterns.
− The basis patterns for the 4 × 4 DCT are shown.
− The basis patterns are composed of combinations of
horizontal and vertical cosine functions.
− Any image block may be reconstructed by combining all
N × N basis patterns, with each basis multiplied by the
appropriate weighting factor (coefficient).
100
Discrete Cosine Transform
101
u = 0 u = 1 u = 2 u = 3
v = 0
v = 1
v = 2
v = 3
Discrete Cosine Transform (4×4 basis patterns)
102
Discrete Cosine Transform
$N = 8$ and $i, j = 0, \ldots, 7$
In real codecs: $-2048 \leq D(i,j) \leq +2047$
103
Discrete Cosine Transform
$N = 8$ and $i, j = 0, \ldots, 7$
Discrete Cosine Transform (DCT):
− Basis vectors: $\cos\frac{k\pi(2n+1)}{2N}$, with $k, n = 0, \ldots, N-1$
− For orthonormality, transform coefficients
are divided by $\sqrt{N}$
− Both transforms are orthonormal, but the DCT
has smoothly varying basis vectors that
match natural images better.
104
DCT and Hadamard 8x8 Matrices
8×8 DCT matrix (x = π/16; rows carry $\sqrt{2}$ scale factors for orthonormality):
1      1       1       1       1       1       1       1
cos x  cos 3x  sin 3x  sin x   -sin x  -sin 3x -cos 3x -cos x
cos 2x sin 2x  -sin 2x -cos 2x -cos 2x -sin 2x sin 2x  cos 2x
cos 3x -sin x  -cos x  -sin 3x sin 3x  cos x   sin x   -cos 3x
1      -1      -1      1       1       -1      -1      1
sin 3x -cos x  sin x   cos 3x  -cos 3x -sin x  cos x   -sin 3x
sin 2x -cos 2x cos 2x  -sin 2x -sin 2x cos 2x  -cos 2x sin 2x
sin x  -sin 3x cos 3x  -cos x  cos x   -cos 3x sin 3x  -sin x
1 1 1 1 1 1 1 1
1 1 1 1 -1 -1 -1 -1
1 1 -1 -1 -1 -1 1 1
1 1 -1 -1 1 1 -1 -1
1 -1 -1 1 1 -1 -1 1
1 -1 -1 1 -1 1 1 -1
1 -1 1 -1 -1 1 -1 1
1 -1 1 -1 1 -1 1 -1
$$H_n = \begin{bmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{bmatrix}, \qquad H_0 = 1$$
Hadamard Transform
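A sketch of this recursion (note the slide lists the 8×8 rows in sequency order, whereas the recursion produces the natural order):

```python
import numpy as np

def hadamard(n):
    """2^n x 2^n Hadamard matrix from H_n = [[H, H], [H, -H]], H_0 = 1."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

H8 = hadamard(3)
print(H8)
print(np.allclose(H8 @ H8.T, 8 * np.eye(8)))  # rows orthogonal, norm sqrt(8)
```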
$D = T M T'$
(Level shift first: subtract 128 from each array, because the DCT is designed to work on
pixel values ranging from -128 to 127.)
105
Discrete Cosine Transform Implementation Example
− DCT calculations are mathematically intensive.
− Easier to use simple matrix manipulation and a “look-up” matrix.
− “Look-up” matrix act like a key or look-up table.
− This “look-up” matrix is called the basis pictures.
− For a 2x2 DCT block the basis pictures are 4x4.
− For an 8x8 DCT block the basis pictures are 64x64.
Discrete Cosine Transform
106
107
8 × 8 DCT basis patterns
Discrete Cosine Transform (8×8 DCT basis patterns)
− The basis patterns for the 8 × 8 DCT are shown.
− The basis patterns are composed of
combinations of horizontal and vertical cosine
functions.
− Any image block may be reconstructed by
combining all N × N basis patterns, with each
basis multiplied by the appropriate weighting
factor (coefficient).
108
Discrete Cosine Transform (8×8 DCT basis patterns)
109
8x8 DCT Example
110
8x8 DCT Example
111
8x8 DCT Example
Note that:
– Low-low coefficients are much larger than high-high coefficients
– While pixel values change at all positions, DCT values are mainly larger at low frequency.
8x8 DCT Example
112
8x8 pixels are coded and the lowest N out of 64 coefficients are retained for inverse DCT
8x8 DCT Example
113
DCT coding with increasingly coarse quantization, block size 8x8
Typical DCT Coding Artifacts
Quantizer Stepsize For AC Coefficients: 25 Quantizer Stepsize For AC Coefficients: 100 Quantizer Stepsize For AC Coefficients: 200
114
115
Discrete Cosine Transform (4×4 DCT basis patterns)
4 × 4 DCT basis patterns
116
Image section showing 4 × 4 block
Original block DCT coefficients
4x4 DCT Example
117
Original block DCT coefficients
Block reconstructed from 1, 2, 3, 5 coefficients
4x4 DCT Example
118
4 × 4 DCT basis patterns
8 × 8 DCT basis patterns
Discrete Cosine Transform (DCT basis patterns Comparison)
Top Field and Bottom Field Pixels
119
120
Luminance MB structure in frame-organized DCT coding (for slow moving)
Luminance MB in field-organized DCT coding (for fast moving)
Blocks (8×8), MB (16×16)
Frame Type DCT vs. Field Type DCT
Blocks (8×8), MB (16×16)
121
Frame Type DCT vs. Field Type DCT
− The significant DCT coefficients of a block of
image or residual samples are typically the ‘low
frequency’ positions around the DC (0,0)
coefficient.
− Figure plots the probability of non-zero DCT
coefficients at each position in an 8 × 8 block.
− The non-zero DCT coefficients are clustered
around the top-left (DC) coefficient and the
distribution is roughly symmetrical in the
horizontal and vertical directions.
122
DCT Coefficient Distribution
8 × 8 DCT coefficient distribution (Frame)
− Histograms for 8x8 DCT coefficient amplitudes
measured for natural images (from
Mauersberger).
− DC coefficient is typically uniformly distributed.
− For the other coefficients, the distribution
resembles a Laplacian pdf.
123
Amplitude Distribution of the DCT Coefficients
− Figure plots the probability of non-zero DCT
coefficients for a residual field.
− The coefficients are clustered around the DC
position but are ‘skewed’, i.e. more non-zero
coefficients occur along the left-hand edge of
the plot.
− This is because a field picture may have a
stronger high-frequency component in the
vertical axis due to the subsampling in the
vertical direction, resulting in larger DCT
coefficients corresponding to vertical
frequencies.
124
DCT Coefficient Distribution
8 × 8 DCT coefficient distribution (Field)
− The zig-zag scan may not be ideal for a field block because of the skewed coefficient distribution, and a
modified scan order may be more effective for some field blocks, in which coefficients on the left hand
side of the block are scanned before the right hand side.
125
DCT Coefficient Scan
Zigzag scan example : frame block Zigzag scan example : field block
126
DCT Coefficient Scan, Ex.
127
Discrete Cosine Transform
Normally small numbers
Normally big numbers
Discrete Cosine Transform
128
Normally big numbers
Normally small numbers
Redundancy
Entropy
129
Discrete Cosine Transform
130
3-Dimensional DCT
− Remove spatiotemporal correlation
− Good for low motion video
− Bad for high motion video
− Frame storage → Large delay
$$F(u,v,w) = \frac{8}{N^3}\, C(u)\, C(v)\, C(w) \sum_{t=0}^{N-1} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y,t) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} \cos\frac{(2t+1)w\pi}{2N}$$
for $u = 0, \ldots, N-1$, $v = 0, \ldots, N-1$ and $w = 0, \ldots, N-1$,
where $N = 8$ and $C(k) = 1/\sqrt{2}$ for $k = 0$, 1 otherwise.
The transform should
– Minimize the correlation among resulting coefficients, so that scalar quantization can be employed
without losing too much in coding efficiency compared to vector quantization
– Compact the energy into as few coefficients as possible
Optimal transform
− Karhunan Loeve Transform (KLT)
• Signal statistics dependent
• It is an optimum transform, for complete decorrelation
Suboptimal transform
− Discrete Cosine transform (DCT): nearly as good as KLT for common image signals
− Hadamard transform with all elements of +1, -1.
131
Why DCT? What Block Size?
Properties of the DCT:
− Smoothly varying basis vector that matches natural
images better (better than Hadamard)
− Basis vectors are not sparse (better than DFT, that
has many zero valued coefficient at small block
sizes)
− Basis vectors closely match natural scenes as KLT,
but uses a fix and a fast transformation algorithm
(better than KLT).
132
Why DCT? What Block Size?
(Figure: mean-squared error (1% to 5%) vs. block size (4x4 to 64x64) for DFT, HT, and KLT & DCT, with an equal number of retained coefficients; KLT & DCT give the lowest error.)
Properties of the DCT:
− Efficiency as a function of block size NxN,
measured for 8 bit quantization in the original
domain and equivalent quantization in the
transform domain
− Block size 8x8 is a good compromise.
133
Efficiency
Why DCT? What Block Size?
− Wavelet is a non-periodic element, i.e. a mini wave.
− Uses a set of ‘mother wavelets’.
− Scale and transform actions possible.
− Better at high frequency capture.
− Less visual degradation than DCT.
− Graceful degradation at high compression.
− Good for audio compression.
Wavelet Coding
134
135
Wavelet
The ‘wavelet transform’ is based on sets of filters with coefficients that are equivalent to discrete wavelet
functions
− A pair of filters is applied to the signal to decompose it into a low frequency band (L) and a high
frequency band (H).
− Each band is subsampled by a factor of two, so that the two frequency bands each contain N/2 samples.
− With the correct choice of filters, this operation is reversible.
136
Wavelet
− This approach may be extended to apply to a 2-dimensional signal such as an intensity
image.
− Each row of a 2D image is filtered with a low-pass and a high-pass filter (Lx and Hx)
− The output of each filter is down-sampled by a factor of two to produce the intermediate
images L and H.
− L is the original image low-pass filtered and downsampled in the x-direction and H is the
original image high-pass filtered and downsampled in the x-direction.
− Each column of these new images is filtered with low- and high-pass filters (Ly and Hy)
− The output of each filter is down-sampled by a factor of two to produce four sub-images LL,
LH, HL and HH.
137
Wavelet
• ‘LL’ is the original image, low-pass filtered in
horizontal and vertical directions and subsampled
by a factor of two.
• ‘HL’ is high-pass filtered in the vertical direction and
contains residual vertical frequencies
• ‘LH’ is high-pass filtered in the horizontal direction
and contains residual horizontal frequencies
• ‘HH’ is high-pass filtered in both horizontal and
vertical directions.
− Between them, the four sub-band images contain all
of the information present in the original image but
the sparse nature of the LH, HL and HH sub-bands
makes them amenable to compression.
138
Wavelet
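A minimal sketch of one decomposition level, using the Haar filter pair as a stand-in for the unspecified wavelet filters (averages for low-pass, differences for high-pass):

```python
import numpy as np

def haar_level(img):
    """One 2D decomposition level: filter and downsample along x (rows),
    then along y (columns), producing the LL, LH, HL, HH sub-images."""
    L = (img[:, 0::2] + img[:, 1::2]) / 2.0   # horizontal low-pass
    H = (img[:, 0::2] - img[:, 1::2]) / 2.0   # horizontal high-pass
    LL = (L[0::2, :] + L[1::2, :]) / 2.0      # low-pass both directions
    HL = (L[0::2, :] - L[1::2, :]) / 2.0      # residual vertical frequencies
    LH = (H[0::2, :] + H[1::2, :]) / 2.0      # residual horizontal frequencies
    HH = (H[0::2, :] - H[1::2, :]) / 2.0      # high-pass both directions
    return LL, LH, HL, HH

img = np.tile(np.linspace(0, 255, 16), (16, 1))   # smooth horizontal ramp
for name, band in zip(("LL", "LH", "HL", "HH"), haar_level(img)):
    print(name, "energy:", round(float((band ** 2).sum()), 1))
# Nearly all energy lands in LL; LH picks up the horizontal variation;
# the sparse high-frequency bands are what makes this compressible.
```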
− In an image compression application, the 2-dimensional wavelet decomposition is applied again to the
‘LL’ image, forming four new sub-band images.
− The resulting low-pass image, always the top-left sub-band image, is iteratively filtered to create a tree of
sub-band images.
139
Wavelet
− Many of the samples (coefficients) in the higher-frequency sub-band images are close to zero, shown
here as near-black, and it is possible to achieve compression by removing these insignificant coefficients
prior to transmission.
− At the decoder, the original image is reconstructed by repeated up-sampling, filtering and addition,
reversing the order of operations.
140
Wavelet
141
Wavelet
LEVEL 3
LEVEL 2
LEVEL 1
LEVEL 0
Many coefficients in higher sub-bands, towards the bottom-right of
the figure, are near zero and may be quantized to zero without
significant loss of image quality.
− Non-zero coefficients tend to be related to structures in the
image; for example, the violin bow appears as a clear
horizontal structure in all the horizontal and diagonal sub-
bands.
− When a coefficient in a lower-frequency sub-band is non-zero,
there is a strong probability that coefficients in the
corresponding position in higher frequency sub-bands will also
be non-zero.
142
Wavelet Coefficient Scan
A typical distribution of 2D wavelet coefficients
We may consider a ‘tree’ of non-zero quantized coefficients,
starting with a ‘root’ in a low-frequency sub-band.
− A single coefficient in the LL band of layer 1 has one
corresponding coefficient in each of the other bands of layer
1, i.e. these four coefficients correspond to the same region
in the original image.
− The layer 1 coefficient position (parent coefficient) maps to
four corresponding child coefficient positions in each sub-
band at layer 2.
− Recall that the layer 2 sub-bands have twice the horizontal
and vertical resolution of the layer 1 sub-bands.
143
Wavelet Coefficient Scan
LL
child coefficient
child coefficient
child coefficient
root
Parent
coefficient
Parent
coefficient
Parent
coefficient
− Idea: Conditional coding of all descendants (incl.
children)
− significant coefficients: Coefficient magnitude > Threshold
− Four cases (the coefficients are coded by symbol POS, NEG,
ZTR, or IZ)
• ZTR (Zero Tree Root): coefficient and all descendants are not
significant
• IZ (Isolated Zero): coefficient is not significant, but some
descendants are significant
• POS: POSitive significant (greater than the given threshold)
• NEG: NEGative significant (greater than the given threshold )
144
Zero Tree Encoding (Embedded Zero-tree Wavelet Algorithm)
− It is desirable to encode the non-zero wavelet coefficients as compactly as possible prior to entropy
coding.
− An efficient way of achieving this is to encode each tree of non-zero coefficients starting from the lowest
or root level of the decomposition.
− A coefficient at the lowest layer is encoded, followed by its child coefficients at the next layer up, and so
on. The encoding process continues until the tree reaches a zero-valued coefficient.
− Further children of a zero-valued coefficient are likely to be zero themselves and so the remaining children
are represented by a single code that identifies a tree of zeros (zero tree).
− The decoder reconstructs the coefficient map starting from the root of each tree; non-zero coefficients
are decoded and reconstructed and when a zerotree code is reached, all remaining ‘children’ are set to
zero.
− This is the basis of the embedded zero tree (EZW) method of encoding wavelet coefficients.
145
Zero Tree Encoding
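A sketch of just the symbol classification on a toy three-level pyramid of one sub-band orientation (the full EZW bit-plane refinement loop is omitted):

```python
def children(level, i, j, n_levels):
    """Positions of the four child coefficients one level up the tree."""
    if level + 1 >= n_levels:
        return []
    return [(level + 1, 2*i + di, 2*j + dj) for di in (0, 1) for dj in (0, 1)]

def any_significant(pyr, level, i, j, T):
    """True if any descendant of (level, i, j) is significant (> T)."""
    for l, ci, cj in children(level, i, j, len(pyr)):
        if abs(pyr[l][ci][cj]) > T or any_significant(pyr, l, ci, cj, T):
            return True
    return False

def classify(pyr, level, i, j, T):
    c = pyr[level][i][j]
    if abs(c) > T:
        return "POS" if c > 0 else "NEG"
    return "IZ" if any_significant(pyr, level, i, j, T) else "ZTR"

# Toy 3-level pyramid of one sub-band orientation: 1x1 root, 2x2, 4x4.
pyr = [
    [[60]],
    [[30, 0],
     [0, -35]],
    [[9, 2, 0, 0],
     [1, 0, 0, 0],
     [0, -40, 0, 3],
     [0, 0, 2, 1]],
]
T = 25
print(classify(pyr, 0, 0, 0, T))  # POS: 60 > 25
print(classify(pyr, 1, 0, 1, T))  # ZTR: 0 and all descendants insignificant
print(classify(pyr, 1, 1, 0, T))  # IZ : 0 but descendant -40 is significant
```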
146
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
Transformation does not result in compression on its own:
− Due to the linearity of transformation, energy in the pixel domain = energy in the transform domain
− But transformation concentrates the energy in a few transform coefficients
− It is the quantisation of transform coefficients that leads to compression (bit rate reduction)
− Small-valued transform coefficients are set to zero
Quantisation of DCT Coefficients
− A quantizer maps a signal with a range of values X to a quantized signal with a reduced range
of values Y.
− It should be possible to represent the quantized signal with fewer bits than the original since
the range of possible values is smaller.
− A scalar quantizer maps one sample of the input signal to one quantized output value
148
Quantisation
Quantizer
(Mapping)
X Y (with reduced range)
Y is presented with fewer bits
− A more general example of a uniform quantizer is:
$$FQ = \mathrm{Round}\left(\frac{X}{QP}\right), \qquad Y = FQ \cdot QP$$
where $QP$ is a quantization ‘step size’ and $FQ$ is the forward quantizer value.
149
Scalar Quantization
− In image and video compression CODECs, the quantization operation is usually made up of two parts, a
forward quantizer FQ in the encoder and an ‘inverse quantizer’ or ‘rescaler’ (IQ) in the decoder.
− If the step size is large, the range of quantized values is small and can therefore be efficiently represented
and hence highly compressed during transmission, but the re-scaled values are a crude approximation
to the original signal.
− If the step size is small, the re-scaled values match the original signal more closely but the larger range of
quantized values reduces compression efficiency.
150
Quantization
Encoder (FQ: Forward Quantizer): $FQ = \mathrm{Round}(X / QP)$
Decoder (IQ: Inverse Quantizer): $Y = FQ \cdot QP$
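A sketch of the FQ/IQ pair, showing the step-size trade-off described above:

```python
import numpy as np

def fq(x, qp):          # forward quantizer (encoder side)
    return np.round(x / qp).astype(int)

def iq(index, qp):      # inverse quantizer / rescaler (decoder side)
    return index * qp

x = np.array([-31.4, -6.2, -0.4, 3.3, 17.9, 60.0])
for qp in (2, 8, 24):
    y = iq(fq(x, qp), qp)
    print(f"QP={qp:2d}  rescaled={y}  max error={np.abs(x - y).max():.2f}")
# Large QP: few distinct values (compresses well) but crude approximation;
# small QP: close match but a wider index range, hence less compression.
```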
151
Linear and non-Linear Scalar Quantizer
− A vector quantizer maps a set of input data such as a block of image samples to a single value
(codeword) and at the decoder, each codeword maps to an approximation to the original set of input
data, a ‘vector’.
− The set of vectors are stored at the encoder and decoder in a codebook.
152
Vector Quantization
Vector Quantizer
(Mapping)
A Set Of Input Data A Single Value
(Codeword)
1. Partition the original image into regions such as N × N pixel blocks.
2. Choose a vector from the codebook that matches the current region as closely as possible.
3. Transmit an index that identifies the chosen vector to the decoder.
4. At the decoder, reconstruct an approximate copy of the region using the selected vector.
153
A typical application of Vector Quantization
− Here, quantization is applied in the image
(spatial) domain, i.e. groups of image
samples are quantized as vectors
− But it can equally be applied to motion
compensated and/or transformed data.
Key issues: the design of the codebook and
efficient searching of the codebook to find
the optimal vector.
154
1
2
n
.
.
.
source image
codebook
1
2
n
.
.
.
codebook
i index of nearest codeword
decoded image
Vector Quantization
155
Quantization
Equal distances between adjacent decision levels and between adjacent reconstruction levels:
$$t_l - t_{l-1} = r_l - r_{l-1} = q$$
• Parameters of uniform quantization
– R: bit resolution
– L: number of levels ($L = 2^R$)
– B: dynamic range of the input, $B = f_{max} - f_{min}$
– q: quantization interval (step size)
• Quantization function
156
$$q = \frac{B}{L} = B \cdot 2^{-R}$$
Uniform Quantization
Input signal is continuous
• The output of a Charge-Coupled Device (CCD) camera
is in the range of 0.0 to 5 volt.
• 𝑳 = 𝟐𝟓𝟔
– $q = 5/256$
– The output value in the interval $(l \times q, (l+1) \times q)$ is
represented by index $l$, $l = 0, \ldots, 255$.
– The reconstruction level:
$$Q(f) = \left\lfloor \frac{f - f_{min}}{q} \right\rfloor \times q + \frac{q}{2} + f_{min} \;\rightarrow\; r_l = l \times q + \frac{q}{2}, \quad l = 0, \ldots, 255$$
157
Example 1 of Uniform Quantizer
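A sketch of this example (0 to 5 V CCD output, 256 levels):

```python
f_min, f_max, R = 0.0, 5.0, 8        # CCD output range and bit resolution
L = 2 ** R                           # 256 levels
q = (f_max - f_min) / L              # step size = 5/256

def index(f):
    """Index l of the interval (l*q, (l+1)*q) containing f."""
    return min(int((f - f_min) / q), L - 1)

def reconstruct(l):
    """Reconstruction level r_l = l*q + q/2 + f_min (mid-point of the step)."""
    return l * q + q / 2 + f_min

for f in (0.0, 1.23, 2.5, 4.999):
    l = index(f)
    print(f"f = {f:5.3f} V  ->  index {l:3d}  ->  r_l = {reconstruct(l):.4f} V")
# The reconstruction error never exceeds q/2, roughly 0.0098 V here.
```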
Input signal is discrete
• Digital Image of 256 gray levels is quantize it into 4
levels
– $q = \frac{256}{4} = 64$
– The reconstruction level:
$$Q(f) = \left\lfloor \frac{f - f_{min}}{q} \right\rfloor \times q + \frac{q}{2} + f_{min} \;\rightarrow\; Q(f) = \left\lfloor \frac{f}{64} \right\rfloor \times 64 + 32$$
158
Example 2 of Uniform Quantizer
159
Uniform Quantization on Images
160
Uniform Threshold Quantiser (UTQ)
− The class of quantiser that has
been used in all standard video
codecs is based around the so-called
Uniform Threshold Quantiser (UTQ).
− It has equal step sizes with
reconstruction values pegged to
the centroid of the steps.
− The centroid value is typically
defined midway between
quantisation intervals.
161
Uniform Threshold Quantiser (UTQ) and Bit Rate Control
− The DC coefficient has a fairly uniform
distribution.
− Although AC transform coefficients
have nonuniform characteristics, and
hence could be better quantised with
nonuniform quantiser step sizes, bit
rate control is easier if they
are quantised linearly.
− Hence, a key property of UTQ is that
the step sizes can be easily adapted
to facilitate bit rate control.
162
Uniform Threshold Quantiser (UTQ)
Uniform Threshold Quantiser (UTQ) (a) with and (b) without dead zone
UTQ-DZ UTQ
163
Uniform Threshold Quantiser (UTQ)
− Typically, UTQ is used for quantising intraframe DC, F(0, 0), coefficients, while UTQ-DZ is used for the AC
and the DC coefficients of interframe prediction error.
− This is intended primarily to cause more nonsignificant AC coefficients to become zero, thus increasing
the compression.
UTQ: for quantising intraframe DC, F(0, 0), coefficients. UTQ-DZ: for quantising AC and the DC coefficients of interframe prediction error.
164
Uniform Threshold Quantiser (UTQ)
− Both quantisers are derived from the generic quantiser, where in UTQ th is set to zero, but in UTQ-DZ it is
set to q/2; in the innermost region th is allowed to vary between q/2 and q, just to increase the
number of zero-valued outputs. → Thus, the dead zone length can be from q to 2q.
− In some implementations (e.g. H.263 or MPEG-4), the decision and/or the reconstruction levels of the
UTQ-DZ quantiser might be shifted by q/4 or q/2.
th is allowed to vary between q/2 and q
− In practice, rather than transmitting a quantised coefficient $F(u,v)$ to the decoder, its ratio
to the quantiser step size, called the Quantisation Index $I$, is transmitted.
− The reason for defining the quantisation index is that it has a much smaller entropy than the
quantised coefficient. At the decoder, the reconstructed coefficients $F_q(u,v)$, after inverse
quantisation, are given by $F_q(u,v) = I(u,v) \cdot q$
− If required, depending on the polarity of the index, an addition or subtraction of half the
quantisation step is required to deliver the centroid representation, reflecting the quantisation
characteristics in the previous slide.
165
Quantization Index
− For the standard codecs, the quantiser step size q is fixed at 8
for UTQ, but varies from 2 to 62, in even step sizes, for the UTQ-
DZ (2,4,6,8,…,60,62).
− Hence, the entire quantiser range, or the quantiser parameter
Qp, can be defined with 5 bits (1–31).
− Uniform quantisers with and without dead zone can also be
used in DPCM coding of pixels. Here, threshold is set to zero,
th=0, and the quantisers are usually identified with odd and
even number of levels, respectively.
166
Quantization Step Size
even number of levels
odd number of levels
One of the main problems of linear quantisers in DPCM is that for lower bit
rates, the number of quantisation levels is limited and hence the quantiser
step size is large.
In coding of plain areas of the picture (in plain areas the DPCM output is near
zero):
− If a quantiser with even number of levels is used, then the reconstructed
pixels oscillate between -q/2 and +q/2.
− This type of noise at these areas, in particular at low luminance levels, is
visible and is called granular noise.
− Larger quantiser step sizes with the odd number of levels (dead zone)
reduce the granular noise, but cause loss of pixel resolution at the plain
areas.
− This type of noise when the quantiser step size is relatively large is
annoying and is called the contouring noise. 167
Granular and Contouring Noises
even number of levels
odd number of levels
Banding, Contouring
Granular noise
− It can be seen that when the original analog input signal has a relatively constant amplitude, the
reconstructed signal has variations that were not present in the original signal.
168
8 bits 256 Levels 10 bits 1024 Levels
Granular and Contouring Noises
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
169
DCT Coefficients ÷ Quantisation Matrix (different step sizes Q) = Quantised DCT Coefficients
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
Quantisation
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 0 0
-1 7 -3 -2 1 0 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 1 1 2 2
1 1 1 1 1 2 2 2
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
170
DCT Coefficients ÷ Quantisation Matrix (different step sizes Q) = Quantised DCT Coefficients
Quantisation
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 0 0
-7 -12 8 -8 -1 -1 0 1
4 5 -7 1 2 -2 0 0
-5 -4 2 -1 1 0 0 0
-1 7 -1 -1 0 0 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 1 1 2 2
1 1 1 1 1 2 2 2
1 1 1 1 2 2 2 4
1 1 1 2 2 2 4 4
1 1 2 2 2 4 4 4
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
171
DCT Coefficients ÷ Quantisation Matrix (different step sizes Q) = Quantised DCT Coefficients
Zig-zag Scanning for Separating Redundancy and Entropy
Quantisation
238 -43 -12 -14 -6 0 -2 -4
39 12 -9 13 4 -1 -1 -2
-16 12 10 8 -1 3 2 0
-3 -7 1 -1 2 0 0 0
-7 -12 4 -4 0 0 0 0
4 2 -3 0 1 -1 0 0
-2 -2 1 0 0 0 0 0
0 3 0 0 0 0 0 0
1 1 1 1 1 1 2 2
1 1 1 1 1 2 2 2
1 1 1 1 2 2 2 4
1 1 1 2 2 2 4 4
1 1 2 2 2 4 4 4
1 2 2 2 4 4 4 8
2 2 2 4 4 4 8 8
2 2 4 4 4 8 8 8
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
172
DCT Coefficients ÷ Quantisation Matrix (different step sizes Q) = Quantised DCT Coefficients
Zig-zag Scanning for Separating Redundancy and Entropy
Quantisation
1 1 1 2 2 2 4 4
1 1 2 2 2 4 4 4
1 2 2 2 4 4 4 8
2 2 2 4 4 4 8 8
2 2 4 4 4 8 8 8
2 4 4 4 8 8 8 16
4 4 4 8 8 8 16 16
4 4 8 8 8 16 16 16
238 -43 -12 -7 -3 0 -1 -2
39 12 -4 6 2 0 0 -1
-16 6 5 4 0 1 1 0
-1 -3 0 0 1 0 0 0
-3 -6 2 -2 0 0 0 0
2 1 -1 0 0 0 0 0
-1 -1 0 0 0 0 0 0
0 1 0 0 0 0 0 0
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
173
DCT Coefficients ÷ Quantisation Matrix (different step sizes Q) = Quantised DCT Coefficients
Zig-zag Scanning for Separating Redundancy and Entropy
Quantisation
1 2 2 4 4 4 8 8
2 2 4 4 4 8 8 8
2 4 4 4 8 8 8 16
4 4 4 8 8 8 16 16
4 4 8 8 8 16 16 16
4 8 8 8 16 16 16 32
8 8 4 16 16 16 32 32
8 8 16 16 16 32 32 32
238 -21 -6 -3 -1 0 0 -1
19 6 -2 3 1 0 0 0
-8 3 2 2 0 0 0 0
0 -1 0 0 0 0 0 0
-1 -3 1 -1 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
238 -43 -12 -14 -6 0 -4 -8
39 12 -9 13 4 -2 -3 -4
-16 12 10 8 -3 7 5 0
-3 -7 1 -3 5 1 -1 0
-7 -12 8 -8 -1 -3 0 2
4 5 -7 1 5 -4 -1 0
-5 -4 2 -3 2 0 1 0
-1 7 -3 -2 1 0 0 0
174
Quantisation Matrix Quantised DCT CoefficientsDCT Coefficients
÷ =
DifferentStep-sizes(Q)
Zig-zag Scanning for Separating Redundancy and Entropy
Zig-zag Scanning
175
DC and low-frequency coefficients come first, and the high-frequency coefficients come last.
Zig-zag Scanning for Separating Redundancy and Entropy
176
[Figure: the scan splits the block into an Entropy region (the significant, low-frequency coefficients at the start) and a Redundancy region (the long run of zero, high-frequency coefficients at the end)]
DC and low-frequency coefficients come first, and the high-frequency coefficients come last.
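To make the scan order concrete, here is a small Python sketch (our own illustration; the function name zigzag_order is not from any standard) that generates the classic 8×8 zig-zag order and applies it to a block:

def zigzag_order(n=8):
    # Visit the block along its anti-diagonals, alternating direction,
    # so DC and low-frequency positions come first.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

block = [[0] * 8 for _ in range(8)]
block[0][0] = 238          # e.g. only the DC coefficient is non-zero
scanned = [block[i][j] for i, j in zigzag_order()]

After the scan, all the significant coefficients cluster at the start of the 1-D array and the zeros form one long run at the end, which is exactly what the run-length coder needs.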
− A uniform quantiser is used for each coefficient.
− Different coefficients are quantised with different step-sizes (Q):
− The human eye is more sensitive to low-frequency components, so
• low-frequency coefficients get a smaller Q
• high-frequency coefficients get a larger Q
− The step-sizes are specified in a normalization matrix (standard quantization matrix)
− The normalization matrix can then be scaled by a scale factor
177
Different Step-sizes (Q)
(JPEG Standard Quantization Matrix)
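As a minimal sketch of the idea (the helper names quantise/dequantise are our own), forward quantisation divides each coefficient by its step size and truncates toward zero, and inverse quantisation multiplies back; this reproduces the worked matrices shown earlier:

def quantise(coeffs, qmatrix, scale=1):
    # Each coefficient is divided by its own step size; int() truncates toward zero.
    return [[int(c / (q * scale)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]

def dequantise(indices, qmatrix, scale=1):
    # Multiplying each index back by its step size gives the reconstructed coefficients.
    return [[i * q * scale for i, q in zip(irow, qrow)]
            for irow, qrow in zip(indices, qmatrix)]

With the all-ones matrix of slide 169 the coefficients pass through unchanged; with the coarser matrices of slides 170–174, more and more high-frequency coefficients collapse to zero.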
In JPEG we have quality levels from 1 to 100.
− With quality level 50 we get high compression and excellent decompressed image quality (the standard quantization matrix is used as-is).
− For a quality level greater than 50 (less compression, higher image quality), the standard quantization matrix is multiplied by (100 − Quality Level) / 50.
− For a quality level less than 50 (more compression, lower image quality), the standard quantization matrix is multiplied by 50 / Quality Level.
− The scaled quantization matrix is then rounded and clipped to positive integer values ranging from 1 to 255.
178
Ex: Different Quality Level in JPEG by Quantization Matrix
(JPEG Standard Quantization Matrix)
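A short sketch of this scaling rule (the same formulation used by common JPEG implementations; quality is assumed to be in 1–100 and q50 is the standard quality-50 matrix):

def scaled_quant_matrix(q50, quality):
    # quality < 50: multiply by 50/quality (coarser steps, more compression);
    # quality > 50: multiply by (100 - quality)/50 (finer steps, higher quality).
    s = 50.0 / quality if quality < 50 else (100.0 - quality) / 50.0
    # Round and clip each entry to a positive integer in 1..255.
    return [[min(255, max(1, round(q * s))) for q in row] for row in q50]

At quality 100 the scale factor becomes 0, so clipping forces every step size to 1, i.e. almost lossless quantisation.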
The block is first level-shifted by subtracting 128 from each pixel value, because the DCT is designed to work on pixel values ranging from −128 to 127.
The 2-D DCT is then computed in matrix form as D = T M Tᵀ, where M is the level-shifted block and T is the DCT transform matrix.
179
Ex: Different Quality Level in JPEG by Quantization Matrix
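For illustration, a short numpy sketch of the level shift and the matrix form of the 2-D DCT (the names T, M, D follow the slide; the sample block is hypothetical):

import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II transform matrix T, so that D = T M T^T.
    T = np.full((n, n), np.sqrt(1.0 / n))
    for i in range(1, n):
        for j in range(n):
            T[i, j] = np.sqrt(2.0 / n) * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    return T

T = dct_matrix()
block = np.full((8, 8), 156.0)   # hypothetical flat 8x8 block of pixel value 156
M = block - 128.0                # level shift into the range -128..127
D = T @ M @ T.T                  # for a flat block only D[0, 0] is non-zero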
180
Ex: Different Quality Level in JPEG by Quantization Matrix
(Standard Quantization Matrix)
181
(Standard Quantization Matrix)
Ex: Quantization with Matrix Q50 in JPEG
182
Ex: Inverse Quantization in JPEG
(Standard Quantization Matrix)
183
Ex: Inverse DCT and adding 128 in JPEG
184
Ex: Comparison between Original and Decompressed Block
185
Ex: JPEG
DCT
186
Ex: JPEG
Quantized
187
Ex: JPEG
Original Quality 50, 84% Zeros
Example: Quantized Indices
Default Normalization
Matrix in JPEG
188
(Standard Quantization Matrix)
− The ratios of the coefficients to their quantizer step sizes QM(i,j) give the quantized indices.
− QM(i,j) are the elements of the normalization matrix shown previously.
Example: Quantized Indices
189
Multiplying the indices by their step sizes gives the quantized coefficient values to be used for the inverse transform.
Example: Quantized Coefficients
190
Original vs. Compressed
Compare pixel-wise
Example: Reconstructed Image
191
192
Quantization Noise and Bit Resolution
193
Quantization Noise
Zoom-in of the Staircase (the ideal transfer line has slope = 1)
− Pink dots show the analog range that maps to a single ADC value.
− Black arrows show the quantization error for two points.
PDF of the Quantization Error
− The quantization error is uniformly distributed.
− The PDF integrates to 1.
194
Quantization Noise
− For a quantiser step size Δ, the RMS value of the quantization noise is Δ/√12 (from the uniform error PDF).
− The RMS value for a full-scale sinusoidal input (peak-to-peak swing 2^B · Δ for B bits) is (2^B · Δ)/(2√2).
− Then:
195
Quantization Noise and SQNR

SQNR (dB) = 20 log [ ((2^B · Δ)/(2√2)) / (Δ/√12) ] ≈ 6B + 1.78
196
SQNR = 10 log ( RMS Signal Power / RMS Quantization Noise Power ) = 6B + 1.78

PSNR = 10 log ( Peak Signal Power / RMS Quantization Noise Power ) = ?

Peak Signal Power / RMS Quantization Noise Power
= ( Peak Signal Power / RMS Signal Power ) × ( RMS Signal Power / RMS Quantization Noise Power )

For a sine waveform with peak-to-peak swing 2A, the RMS amplitude is A/√2, so:

Peak Signal Power / RMS Signal Power = (2A)² / (A/√2)² = 8

PSNR = 10 log [ 8 × ( RMS Signal Power / RMS Quantization Noise Power ) ]
= 10 log 8 + (6B + 1.78) ≈ 6B + 11 (dB)
PSNR for a Sine Waveform (peak-to-peak signal swing 2A)
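− For example, with B = 10 bits: SQNR ≈ 6 × 10 + 1.78 ≈ 61.8 dB, and for a full-swing sine wave PSNR ≈ 6 × 10 + 11 ≈ 71 dB.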
197
Video Source
Decompress
(Decode)
Compress
(Encode)
Video Display
Coded
video
ENCODER + DECODER = CODEC
198
Elementary Information Theory
− How much information does a symbol convey?
− Intuitively, the more unpredictable or surprising it is, the more information is conveyed.
− Conversely, if we strongly expected something and it occurs, we have not learnt very much.
199
Elementary Information Theory
− If p is the probability that a symbol will occur
− The amount of information, I, conveyed is:
− The information, I, is measured in bits
− It is the optimum code length for the symbol
− The entropy, H, is the average information per symbol
− Provides a lower bound on the compression that can be achieved
I = log₂(1/p)

H = Σₛ p(s) · log₂(1/p(s))
200
Elementary Information Theory
A simple example
− Suppose we need to transmit four possible weather conditions:
1. Sunny
2. Cloudy
3. Rainy
4. Snowy
− If all conditions are equally likely, p(s)=0.25→H=2
– i.e. we need a minimum of 2 bits per symbol
201
Elementary Information Theory
A simple example
− Suppose we need to transmit four possible weather conditions:
1. Sunny 0.5 of the time
2. Cloudy 0.25 of the time
3. Rainy 0.125 of the time
4. Snowy 0.125 of the time
− Then the entropy is
H = 0.5 · log₂(1/0.5) + 0.25 · log₂(1/0.25) + 2 × 0.125 · log₂(1/0.125)
H = 0.5 × 1 + 0.25 × 2 + 2 × 0.125 × 3
H = 0.5 + 0.5 + 0.75 = 1.75
– i.e. we need a minimum of 1.75 bits per symbol
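Both results can be checked with a two-line Python sketch (entropy is our own helper):

from math import log2

def entropy(probs):
    # H = sum of p * log2(1/p) over all symbols with non-zero probability.
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol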
– It reduces the amount of data or bit rate.
– Truly lossless.
– Different types:
• Fractal Coding
• Run Length Coding (RLC) or Run-Level Encoding
• Variable Length Coding (VLC) – i.e. Huffman/Arithmetic
• Wavelet Coding
– Compression systems often do not use all of them together.
– Some systems combine different types.
Entropy Coding
202
− Resulting from studies by Benoit Mandelbrot.
− Images are self-similar.
− Self-similar shapes are called fractals.
− Scale, stretch, rotate, mirror and skew actions are possible.
− Computationally intensive.
− Requires multiple sweeps.
− Difficult to do on video in real time.
Fractal Coding
203
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 → 19[2]
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 → 24[0]
2 3 4 5 6 7 8 9 10 11 12 13 14 15 → 2:15
8 7 4 5 6 8 6 8 0 2 3 8 9 3 0 2 3 5 4 7 2 1 5 2 → 5326214
(string dictionary: 5472 = 1, 023 = 2, 6868 = 3, 152 = 4, 8745 = 5, 893 = 6)
• Replaces runs of the same number with a code …
• … or particular strings of numbers with a code.
204
Run Length Coding
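A small sketch of run-of-the-same-number coding (the rle helper is our own), matching the 19[2] and 24[0] notation above:

def rle(seq):
    # Collapse each run of identical values into a (count, value) pair.
    runs = []
    for v in seq:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1
        else:
            runs.append([1, v])
    return [(count, value) for count, value in runs]

print(rle([2] * 19))   # [(19, 2)]  i.e. 19[2]
print(rle([0] * 24))   # [(24, 0)]  i.e. 24[0]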
205
Run Length Coding
Sample Block Zigzag Scanning (MPEG-2) for doing RLC
206
Run Length Coding
Sample Block Run-length Encoding (MPEG-2)
The output of the re-ordering process of the transform coefficients is an array that typically contains
one or more clusters of non-zero coefficients near the start, followed by strings of zero coefficients.
− The large number of zero values may be encoded to represent them more compactly.
− The array of re-ordered coefficients is represented as (run, level) pairs, where
run: indicates the number of zeros preceding a non-zero coefficient.
level: indicates the magnitude of the non-zero coefficient.
207
Run-Level Encoding
Example
1. Input array: 16,0,0,−3,5,6,0,0,0,0,−7
2. Output values: (0,16),(2,−3),(0,5),(0,6),(4,−7)
3. Each of these output values (run , level) is encoded as a separate symbol by the entropy encoder.
‘Three-dimensional’ Run-level Encoding
If ‘three-dimensional’ run-level encoding is used, each symbol encodes three quantities, run, level and
last.
In the example above, if –7 is the final non-zero coefficient, the 3-D values are:
(0, 16, 0), (2, −3, 0), (0, 5, 0), (0, 6, 0), (4, −7, 1)
The 1 in the final code indicates that this is the last non-zero coefficient in the block.
208
Run-Level Encoding
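Both variants fit in a few lines of Python (the helper name run_level is ours); this sketch reproduces the example above:

def run_level(coeffs, three_d=False):
    # Encode a zig-zag-scanned array as (run, level) pairs; with three_d=True
    # a 'last' flag marks the final non-zero coefficient.
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if three_d and pairs:
        pairs = [(r, l, 0) for r, l in pairs]
        r, l, _ = pairs[-1]
        pairs[-1] = (r, l, 1)    # 1 marks the last non-zero coefficient
    return pairs

print(run_level([16, 0, 0, -3, 5, 6, 0, 0, 0, 0, -7]))
# [(0, 16), (2, -3), (0, 5), (0, 6), (4, -7)]
print(run_level([16, 0, 0, -3, 5, 6, 0, 0, 0, 0, -7], three_d=True))
# [(0, 16, 0), (2, -3, 0), (0, 5, 0), (0, 6, 0), (4, -7, 1)]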
No. Code
0 = 0
+1 = 101
-1 = 100
+2 = 1101
-2 = 1100
+3 = 11101
-3 = 11100
+4 = 111101
-4 = 111100
+5 = 1111101
-5 = 1111100
. .
. .
Code table
Original numbers: +1 -3 0 0 +4 -5 +2 -1 0 +1 +3
Codes: 101111000011110111111001101100010111101
11 × 8 bits = 88 bits → 39 bits
Variable Length Coding
209
Commonly
occurring numbers
Rare occurring
numbers
Code table
Codes: 101111000011110111111001101100010111101
Regenerated numbers: +1 -3 0 0 +4 -5 +2 -1 0 +1 +3
No. Code
0 = 0
+1 = 101
-1 = 100
+2 = 1101
-2 = 1100
+3 = 11101
-3 = 11100
+4 = 111101
-4 = 111100
+5 = 1111101
-5 = 1111100
. .
. .
Variable Length Coding
210
Commonly
occurring numbers
Rare occurring
numbers
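Because no codeword is a prefix of another, decoding can proceed greedily bit by bit; a sketch using the table above (the helper names are our own):

CODE_TABLE = {"0": 0, "101": 1, "100": -1, "1101": 2, "1100": -2,
              "11101": 3, "11100": -3, "111101": 4, "111100": -4,
              "1111101": 5, "1111100": -5}

def vlc_decode(bits):
    # Accumulate bits until they match a codeword, emit it, and start over.
    out, current = [], ""
    for b in bits:
        current += b
        if current in CODE_TABLE:
            out.append(CODE_TABLE[current])
            current = ""
    return out

print(vlc_decode("101111000011110111111001101100010111101"))
# [1, -3, 0, 0, 4, -5, 2, -1, 0, 1, 3]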
211
Variable Length Coding
Variable-length Encoding of Sample Block Coefficients (MPEG-2)
– True data reduction.
– Totally lossless.
– Replaces numbers with codes.
• Run length coding can also be called entropy coding.
– Commonly occurring numbers have a small code & rare numbers have a bigger code.
– Relies on common numbers occurring a lot.
Variable Length Coding
212
− The lengths of the codes should vary inversely with the probabilities of occurrence of the various symbols in VLC.
− The bit rate required to code a symbol is the base-2 logarithm of the inverse of its probability p, that is, log₂(1/p) bits.
− Hence, the entropy of the symbols, which is the minimum average bits required to code the symbols, can be calculated as shown below.
There are two types of VLC, Huffman and Arithmetic coding.
− It is noted that Huffman coding is a simple VLC, but its compression can never reach as low as the entropy, due to the constraint that the assigned codewords must have an integral number of bits.
− However, arithmetic coding can approach the entropy, since the symbols are not coded individually.
213
Variable Length Coding
H(x) = Σₛ p(s) · log₂(1/p(s)) = − Σᵢ Pᵢ · log₂ Pᵢ
Huffman Coding
− Huffman codes can be used to compress information
− Like WinZip – although WinZip doesn’t use the Huffman algorithm
− JPEGs do use Huffman as part of their compression process
− The basic idea is that instead of storing each character in a file as an 8-bit ASCII value, we will store the more frequently occurring characters using fewer bits and the less frequently occurring characters using more bits
− On average this should decrease the file size (often to roughly half)
214
Huffman Coding
− As an example, lets take the string:
“duke blue devils”
− First, a frequency count of the characters:
e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
− Next, use a Greedy algorithm to build up a Huffman Tree
− We start with nodes for each character
e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1
215
Huffman Coding
Pick the two nodes with the smallest frequencies and combine them to form a new node
– The selection of these nodes is the Greedy part
• The two selected nodes are removed from the set, but replaced by the combined node
• This continues until we have only 1 node left in the set
216
Starting nodes: e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1
[Slides 216–226: the tree grows frame by frame; at each step the two lowest-frequency nodes are merged:]
1. i,1 + s,1 → node (2)
2. b,1 + v,1 → node (2)
3. k,1 + node(b,v),2 → node (3)
4. l,2 + sp,2 → node (4)
5. d,2 + u,2 → node (4)
6. node(i,s),2 + node(k,b,v),3 → node (5)
7. e,3 + node(d,u),4 → node (7)
8. node(l,sp),4 + node(5) → node (9)
9. node(7) + node(9) → root (16)
226
Huffman Coding
e 00
d 010
u 011
l 100
sp 101
i 1100
s 1101
k 1110
b 11110
v 11111
− Now we assign codes to the tree by placing
– 0 on every left branch
– 1 on every right branch
− A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character
− Note that no code is the prefix of another code
227
e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
[Figure: the Huffman tree with 0 on every left branch and 1 on every right branch; tracing from the Root to a leaf gives its code, e.g. l → 100]
Huffman Coding
− These codes are then used to encode the string
− Thus, “duke blue devils” turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101
− When grouped into 8-bit bytes:
01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx
− Thus it takes 7 bytes of space (as compressed)
− Compare it to 16 characters with 1 byte/char → 16 bytes uncompressed
228
(Code table and Huffman tree as on the previous slide.)
Huffman Coding
− Uncompressing works by reading in the file bit by bit
• After getting the first bit, start from the root of the tree
• If a 0 is read, head left
• If a 1 is read, head right
• When a leaf is reached decode that character and
start over again at the root of the tree
− Thus, we need to save Huffman table information as a
header in the compressed file
• Doesn’t add a significant amount of size to the file for
large files (which are the ones you want to compress
anyway)
• Or we could use a fixed universal set of codes /
frequencies.
229
(Code table and Huffman tree as on slide 227.)
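The whole build/encode procedure is a compact sketch in Python using heapq (our own code; tie-breaking may produce different codewords than the slides, but any Huffman tree for these frequencies yields the same 52-bit total):

import heapq
from collections import Counter

def huffman_codes(text):
    # Each heap entry is (frequency, tiebreak_id, tree); a tree is either
    # a character or a (left, right) pair. Greedily merge the two smallest.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):        # internal node
            walk(tree[0], prefix + "0")    # 0 on the left branch
            walk(tree[1], prefix + "1")    # 1 on the right branch
        else:                              # leaf character
            codes[tree] = prefix or "0"
    walk(heap[0][2])
    return codes

codes = huffman_codes("duke blue devils")
bits = "".join(codes[ch] for ch in "duke blue devils")
print(len(bits))   # 52 bits -> 7 bytes, versus 16 bytes uncompressed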
− The table lists the probabilities of the most commonly-occurring motion vectors in the encoded sequence and their information content, log₂(1/p).
− To achieve optimum compression, each value should be represented with exactly 𝐥𝐨𝐠 𝟐 𝟏/𝐩
bits.
− ‘0’ is the most common value and the probability drops for larger motion vectors.
230
Example : Huffman Coding, Sequence of Motion Vectors
1. Generating the Huffman code tree
− To generate a Huffman code table for this set of
data, the following iterative procedure is carried out.
The procedure is repeated until there is a single ‘root’
node that contains all other nodes and data items
listed ‘beneath’ it.
1. Order the list of data in increasing order of probability.
2. Combine the two lowest-probability data items into a
‘node’ and assign the joint probability of the data items
to this node.
3. Re-order the remaining data items and node(s) in
increasing order of probability and repeat step 2.
231
Example : Huffman Coding, Sequence of Motion Vectors
1. Generating the Huffman code tree (Cont.)
− Original list:
The data items are shown as square boxes. Vectors (−2) and (+2) have the lowest
probability and these are the first candidates for merging to form node ‘A’.
− Stage 1:
The newly-created node ‘A’, shown as a circle, has a probability of 0.2, from the
combined probabilities of (−2) and (2).
There are now three items with probability 0.2.
Choose vectors (−1) and (1) and merge to form node ‘B’.
− Stage 2:
A now has the lowest probability (0.2) followed by B and the vector 0; choose A and B
as the next candidates for merging to form ‘C’.
− Stage 3:
Node C and vector (0) are merged to form ‘D’. Final tree: The data items have all
been incorporated into a binary ‘tree’ containing five data values and four nodes.
Each data item is a ‘leaf’ of the tree.
232
Example : Huffman Coding, Sequence of Motion Vectors
2. Encoding
− Each ‘leaf’ of the binary tree is mapped to a variable-length code. To find this code, the tree is traversed
from the root node, D in this case, to the leaf or data item.
− For every branch, a 0 or 1 is appended to the code, 0 for an upper branch, 1 for a lower branch.
− The lengths of the Huffman codes, each an integral number of bits, do not match the ideal lengths given
by log2 1/p.
− For example, the series of vectors (1, 0, −2) would be transmitted as the binary sequence 0111000.
233
Example : Huffman Coding, Sequence of Motion Vectors
3. Decoding
− The decoder must have a local copy of the Huffman code tree or look-up table (Note that once the tree
has been generated in Encoding, the codes may be stored in a look-up table).
− This may be achieved by transmitting the look-up table itself or by sending the list of data and probabilities
prior to sending the coded data.
− Each uniquely-decodeable code is converted back to the original data.
234
Example : Huffman Coding, Sequence of Motion Vectors
235
Example 2
− The Huffman coding process has two disadvantages for a practical video CODEC.
I. The encoder needs to transmit the information contained in the probability table before the decoder can decode the bit stream, and this extra overhead reduces compression efficiency, particularly for shorter video sequences.
II. The probability table for a large video sequence (needed to generate the Huffman tree) cannot be calculated until after the video data is encoded, which may introduce an unacceptable delay into the encoding process.
− For these reasons, image and video coding standards define sets of codewords based on the probability distributions of ‘generic’ video material.
− The main differences from ‘true’ Huffman coding are
I. The codewords are pre-calculated based on ‘generic’ probability distributions
II. In the case of TCOEF (Transform coefficient), only 102 commonly-occurring symbols have defined codewords and any other symbol is encoded using a fixed-length code.
236
Pre-calculated Huffman-based Coding
237
Pre-calculated Huffman-based Coding
MPEG4 TCOEF VLCs (partial) (some of the codes shown in the left table are represented in ‘tree’ form in this figure)
MPEG-4 Visual Transform Coefficient
(TCOEF) VLCs : partial, all codes < 9 bits
MPEG4 Motion Vector Difference (MVD) VLCs
− The following two examples of pre-calculated VLC tables are taken from MPEG-4 Visual (Simple Profile).
− The minimum codeword length is 1 bit, but for a highly probable symbol the ideal length can be much less, e.g.
− −log₂ 0.95 ≈ 0.07 bits
− A scheme using an integral number of bits for each data symbol, such as Huffman coding, is unlikely to come so close to the optimum number of bits
− Fractional bits can only be assigned if symbols are coded together:
− Some with more bits and some with (near-)ZERO bits
− This is possible if ZERO bits are assigned to highly probable symbols
− Arithmetic coding does this!
238
Problems with Huffman
– A form of variable length coding.
– Better than Huffman coding.
– Takes longer to do than Huffman coding.
– More delicate than Huffman coding.
– More limiting than Huffman coding.
– Subject to patents and royalty payments.
– IBM, AT&T, Mitsubishi.
Arithmetic Coding
239
The fundamental idea is to use a scale in which the coding intervals of real numbers between 0
and 1 are represented.
– This is in fact the cumulative probability density function of all the symbols which add up to 1.
– The interval needed to represent the message becomes smaller as the message becomes
longer, and the number of bits needed to specify that interval is increased.
– According to the symbol probabilities generated by the model, the size of the interval is
reduced by successive symbols of the message.
– The more likely symbols reduce the range less than the less likely ones and hence they
contribute fewer bits to the message.
Arithmetic Coding
240
– Once the symbol probability is known, each individual symbol needs to be assigned a portion of the [0, 1)
range that corresponds to its probability of appearance in the cumulative density function.
– The character range is [lower, upper).
– The most significant portion of an arithmetic coded message is the first symbol to be encoded.
– Ex: Message eaii! →
• The first symbol to be coded is e
• The symbol !, known by both decoder and encoder, is used as the end-of-message symbol; when it is decoded, the decoding process is terminated.
– After the first character is encoded, we know that the lower number and the upper number now bind our
range for the output.
– Each new symbol to be encoded will further restrict the possible range of the output number during the
rest of the encoding process.
Arithmetic Coding
241
Symbol Probability Range
a 0.2 [0.0, 0.2)
e 0.3 [0.2, 0.5)
i 0.1 [0.5, 0.6)
o 0.2 [0.6, 0.8)
u 0.1 [0.8, 0.9)
! 0.1 [0.9, 1.0)
New character Range
Initially: [0, 1)
After seeing a symbol: e [0.2, 0.5)
a [0.2, 0.26)
i [0.23, 0.236)
i [0.233, 0.2336)
! [0.23354, 0.2336)
Arithmetic Coding
242
Example 1: To code the set of symbols eaii!
– To explain how arithmetic coding works, a fixed-model arithmetic code is used in the example for easy
illustration.
– Suppose the alphabet is {a, e, i, o, u, !}, and the fixed model is used with the probabilities shown in Table.
– Ex: The final coded message has to be a number greater than or equal to 0.2 and less than 0.5 for e.
The final range, [0.23354, 0.2336), represents the message eaii!. This means that if we transmit any
number in the range of 0.23354 ≤ x < 0.2336, that number represents the whole message of eaii!.
[Figure: successive subdivision of the [0, 1) range as e, a, i, i, ! are coded]
Nothing coded: [0.0, 1.0)
After e: [0.0 + 0.2×1.0, 0.0 + 0.5×1.0) = [0.2, 0.5)
After a: [0.2 + 0.0×0.3, 0.2 + 0.2×0.3) = [0.2, 0.26)
After i: [0.2 + 0.5×0.06, 0.2 + 0.6×0.06) = [0.23, 0.236)
After i: [0.23 + 0.5×0.006, 0.23 + 0.6×0.006) = [0.233, 0.2336)
After !: [0.233 + 0.9×0.0006, 0.233 + 1.0×0.0006) = [0.23354, 0.2336)
Transmitted number: any value in [0.23354, 0.2336), e.g. 0.23355.
243
Symbol Probability Range
a 0.2 [0.0, 0.2)
e 0.3 [0.2, 0.5)
i 0.1 [0.5, 0.6)
o 0.2 [0.6, 0.8)
u 0.1 [0.8, 0.9)
! 0.1 [0.9, 1.0)
Ex 1: To code the set of symbols eaii!
[Figure: the sub-range ladder for each successive symbol; the interval width shrinks from 1.0 to 0.3, 0.06, 0.006, 0.0006 and finally 0.00006 as e, a, i, i, ! are coded]
The final range, [0.23354, 0.2336), represents the message eaii!. This means that if we transmit any
number in the range of 0.23354 ≤ x < 0.2336, that number represents the whole message of eaii!.
[Figure: successive subdivision of the [0, 1) range as a, i, i, ! are coded]
Nothing coded: [0.0, 1.0)
After a: [0.0 + 0.0×1.0, 0.0 + 0.2×1.0) = [0.0, 0.2)
After i: [0.0 + 0.5×0.2, 0.0 + 0.6×0.2) = [0.1, 0.12)
After i: [0.1 + 0.5×0.02, 0.1 + 0.6×0.02) = [0.11, 0.112)
After !: [0.11 + 0.9×0.002, 0.11 + 1.0×0.002) = [0.1118, 0.112)
Transmitted number: any value in [0.1118, 0.112), e.g. 0.1119.
244
Ex2: To code a set of symbols aii!
Symbol Probability Range
a 0.2 [0.0, 0.2)
e 0.3 [0.2, 0.5)
i 0.1 [0.5, 0.6)
o 0.2 [0.6, 0.8)
u 0.1 [0.8, 0.9)
! 0.1 [0.9, 1.0)
The final range, [0.1118, 0.112), represents the message aii!. This means that if we transmit any
number in the range of 0.1118 ≤ x < 0.112, that number represents the whole message of aii!.
Arithmetic Coding
Decoding for Ex. 1
− In general, the decoding process can be formulated as:

R(n+1) = (R(n) − L(n)) / (U(n) − L(n))

• where R(n) is a code within the range of lower value L(n) and upper value U(n) of the nth symbol.
• R(n+1) is the code for the next symbol.
Arithmetic Coding
245

Received code → Corresponding range [L(n), U(n)) → Output symbol
0.23355 → [0.2, 0.5) → e
(0.23355 − 0.2) / (0.5 − 0.2) ≈ 0.11183 → [0.0, 0.2) → a
(0.11183 − 0.0) / (0.2 − 0.0) ≈ 0.55917 → [0.5, 0.6) → i
(0.55917 − 0.5) / (0.6 − 0.5) ≈ 0.59167 → [0.5, 0.6) → i
(0.59167 − 0.5) / (0.6 − 0.5) ≈ 0.91667 → [0.9, 1.0) → !
Symbol Probability Range
a 0.2 [0.0, 0.2)
e 0.3 [0.2, 0.5)
i 0.1 [0.5, 0.6)
o 0.2 [0.6, 0.8)
u 0.1 [0.8, 0.9)
! 0.1 [0.9, 1.0)
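The whole encode/decode loop fits in a few lines of Python; this floating-point version (our own sketch; real codecs use integer arithmetic with renormalisation) reproduces the eaii! example:

def arith_encode(message, ranges):
    # Narrow [low, high) by each symbol's sub-range of the current interval.
    low, high = 0.0, 1.0
    for ch in message:
        lo, hi = ranges[ch]
        low, high = low + (high - low) * lo, low + (high - low) * hi
    return (low + high) / 2          # any number in [low, high) works

def arith_decode(code, ranges, end="!"):
    out = ""
    while not out.endswith(end):
        for ch, (lo, hi) in ranges.items():
            if lo <= code < hi:      # find the sub-range containing the code
                out += ch
                code = (code - lo) / (hi - lo)
                break
    return out

ranges = {"a": (0.0, 0.2), "e": (0.2, 0.5), "i": (0.5, 0.6),
          "o": (0.6, 0.8), "u": (0.8, 0.9), "!": (0.9, 1.0)}
x = arith_encode("eaii!", ranges)    # ~0.23357, inside [0.23354, 0.2336)
print(arith_decode(x, ranges))       # eaii!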
246
Example 3
Motion vectors, sequence 1: probabilities and sub-ranges
Vector −2: p = 0.1, sub-range [0.0, 0.1)
Vector −1: p = 0.2, sub-range [0.1, 0.3)
Vector 0: p = 0.4, sub-range [0.3, 0.7)
Vector +1: p = 0.2, sub-range [0.7, 0.9)
Vector +2: p = 0.1, sub-range [0.9, 1.0)
247
Example 3
Encoding Procedure for Vector Sequence (0, −1, 0, 2)
Start: [0.0, 1.0)
After (0): [0.0 + 0.3×1, 0.0 + 0.7×1) = [0.3, 0.7)
After (−1): [0.3 + 0.1×0.4, 0.3 + 0.3×0.4) = [0.34, 0.42)
After (0): [0.34 + 0.3×0.08, 0.34 + 0.7×0.08) = [0.364, 0.396)
After (+2): [0.364 + 0.9×0.032, 0.364 + 1.0×0.032) = [0.3928, 0.396)
Any number in [0.3928, 0.396), e.g. 0.394, encodes the sequence.
248
Example 3
Decoding Procedure

R(n+1) = (R(n) − L(n)) / (U(n) − L(n))

Received code → Corresponding range [L(n), U(n)) → Output symbol
0.394 → [0.3, 0.7) → 0
(0.394 − 0.3) / (0.7 − 0.3) = 0.235 → [0.1, 0.3) → −1
(0.235 − 0.1) / (0.3 − 0.1) = 0.675 → [0.3, 0.7) → 0
(0.675 − 0.3) / (0.7 − 0.3) = 0.9375 → [0.9, 1.0) → +2
The principal advantage of arithmetic coding
− The transmitted number, 0.394 in this case, which may be
represented as a fixed-point number with sufficient accuracy
using 9 bits, is not constrained to an integral number of bits for
each transmitted data symbol.
− To achieve optimal compression, the sequence of data symbols should be represented with:
−(log₂ P₀ + log₂ P₋₁ + log₂ P₀ + log₂ P₂) = 8.28 bits
− In this example, arithmetic coding achieves 9 bits, which is close
to optimum.
249
Example 3 (Cont.)
(0, −1, 0, 2)
Questions??
Discussion!!
Suggestions!!
Criticism!!
250

More Related Content

What's hot

An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1   An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1 Dr. Mohieddin Moradi
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4Dr. Mohieddin Moradi
 
Latest Technologies in Production & Broadcasting
Latest  Technologies in Production & BroadcastingLatest  Technologies in Production & Broadcasting
Latest Technologies in Production & BroadcastingDr. Mohieddin Moradi
 
VIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN, OPPORTUNITIES & CHALLENGES
VIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN,   OPPORTUNITIES & CHALLENGESVIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN,   OPPORTUNITIES & CHALLENGES
VIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN, OPPORTUNITIES & CHALLENGESDr. Mohieddin Moradi
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Dr. Mohieddin Moradi
 
An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2Dr. Mohieddin Moradi
 
HDR and WCG Video Broadcasting Considerations
HDR and WCG Video Broadcasting ConsiderationsHDR and WCG Video Broadcasting Considerations
HDR and WCG Video Broadcasting ConsiderationsDr. Mohieddin Moradi
 
Video Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsVideo Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsDr. Mohieddin Moradi
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Dr. Mohieddin Moradi
 
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-basedDesigning an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-basedDr. Mohieddin Moradi
 
Video Compression Standards - History & Introduction
Video Compression Standards - History & IntroductionVideo Compression Standards - History & Introduction
Video Compression Standards - History & IntroductionChamp Yen
 
An Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and JitterAn Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and JitterDr. Mohieddin Moradi
 

What's hot (20)

An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1   An Introduction to Video Principles-Part 1
An Introduction to Video Principles-Part 1
 
Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts Video Compression, Part 2-Section 2, Video Coding Concepts
Video Compression, Part 2-Section 2, Video Coding Concepts
 
HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1HDR and WCG Principles-Part 1
HDR and WCG Principles-Part 1
 
Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2Broadcast Lens Technology Part 2
Broadcast Lens Technology Part 2
 
An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4An Introduction to HDTV Principles-Part 4
An Introduction to HDTV Principles-Part 4
 
Thinking about IP migration
Thinking about IP migration Thinking about IP migration
Thinking about IP migration
 
Latest Technologies in Production & Broadcasting
Latest  Technologies in Production & BroadcastingLatest  Technologies in Production & Broadcasting
Latest Technologies in Production & Broadcasting
 
HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3HDR and WCG Principles-Part 3
HDR and WCG Principles-Part 3
 
HDR and WCG Principles-Part 4
HDR and WCG Principles-Part 4HDR and WCG Principles-Part 4
HDR and WCG Principles-Part 4
 
VIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN, OPPORTUNITIES & CHALLENGES
VIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN,   OPPORTUNITIES & CHALLENGESVIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN,   OPPORTUNITIES & CHALLENGES
VIDEO QUALITY ENHANCEMENT IN BROADCAST CHAIN, OPPORTUNITIES & CHALLENGES
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 2
 
HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6HDR and WCG Principles-Part 6
HDR and WCG Principles-Part 6
 
An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2An Introduction to Video Principles-Part 2
An Introduction to Video Principles-Part 2
 
HDR and WCG Video Broadcasting Considerations
HDR and WCG Video Broadcasting ConsiderationsHDR and WCG Video Broadcasting Considerations
HDR and WCG Video Broadcasting Considerations
 
Video Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video CodecsVideo Compression, Part 3-Section 1, Some Standard Video Codecs
Video Compression, Part 3-Section 1, Some Standard Video Codecs
 
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
Serial Digital Interface (SDI), From SD-SDI to 24G-SDI, Part 1
 
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-basedDesigning an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
Designing an 4K/UHD1 HDR OB Truck as 12G-SDI or IP-based
 
Video Compression Standards - History & Introduction
Video Compression Standards - History & IntroductionVideo Compression Standards - History & Introduction
Video Compression Standards - History & Introduction
 
An Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and JitterAn Introduction to Eye Diagram, Phase Noise and Jitter
An Introduction to Eye Diagram, Phase Noise and Jitter
 
Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1Broadcast Lens Technology Part 1
Broadcast Lens Technology Part 1
 

Similar to Video Compression, Part 2-Section 1, Video Coding Concepts

ITU-T Study Group 16 Meeting Achievements
ITU-T Study Group 16 Meeting AchievementsITU-T Study Group 16 Meeting Achievements
ITU-T Study Group 16 Meeting AchievementsITU
 
Standard standardization protocol
Standard standardization protocolStandard standardization protocol
Standard standardization protocolSutanu Kandar
 
Video Teleconferencing (VTC) Technology at the National ...
Video Teleconferencing (VTC) Technology at the National ...Video Teleconferencing (VTC) Technology at the National ...
Video Teleconferencing (VTC) Technology at the National ...Videoguy
 
QoS for Media Networks
QoS for Media NetworksQoS for Media Networks
QoS for Media NetworksAmine Choukir
 
Digital Industry Standards
Digital Industry StandardsDigital Industry Standards
Digital Industry StandardsChuck Gary
 
Standardisation In Media Formats
Standardisation In Media FormatsStandardisation In Media Formats
Standardisation In Media FormatsFITT
 
ITU-T Study Group 9 Introduction
ITU-T Study Group 9 IntroductionITU-T Study Group 9 Introduction
ITU-T Study Group 9 IntroductionITU
 
VVC tutorial at ICIP 2020 together with Benjamin Bross
VVC tutorial at ICIP 2020 together with Benjamin BrossVVC tutorial at ICIP 2020 together with Benjamin Bross
VVC tutorial at ICIP 2020 together with Benjamin BrossMathias Wien
 
FITT Toolbox: Standardisation in Media Formats
FITT Toolbox: Standardisation in Media FormatsFITT Toolbox: Standardisation in Media Formats
FITT Toolbox: Standardisation in Media FormatsFITT
 
VVC tutorial at VCIP 2020 together with Benjamin Bross
VVC tutorial at VCIP 2020 together with Benjamin BrossVVC tutorial at VCIP 2020 together with Benjamin Bross
VVC tutorial at VCIP 2020 together with Benjamin BrossMathias Wien
 
en_ETSI_302769v010101v
en_ETSI_302769v010101ven_ETSI_302769v010101v
en_ETSI_302769v010101vAniruddh Tyagi
 
en_ETSI_302769v010101v
en_ETSI_302769v010101ven_ETSI_302769v010101v
en_ETSI_302769v010101vaniruddh Tyagi
 

Similar to Video Compression, Part 2-Section 1, Video Coding Concepts (20)

ITU-T Study Group 16 Meeting Achievements
ITU-T Study Group 16 Meeting AchievementsITU-T Study Group 16 Meeting Achievements
ITU-T Study Group 16 Meeting Achievements
 
Standard standardization protocol
Standard standardization protocolStandard standardization protocol
Standard standardization protocol
 
Itu ngn-v2
Itu  ngn-v2Itu  ngn-v2
Itu ngn-v2
 
Video Teleconferencing (VTC) Technology at the National ...
Video Teleconferencing (VTC) Technology at the National ...Video Teleconferencing (VTC) Technology at the National ...
Video Teleconferencing (VTC) Technology at the National ...
 
QoS for Media Networks
QoS for Media NetworksQoS for Media Networks
QoS for Media Networks
 
PEMWN'21 - ANGELA
PEMWN'21 - ANGELAPEMWN'21 - ANGELA
PEMWN'21 - ANGELA
 
SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1SDI to IP 2110 Transition Part 1
SDI to IP 2110 Transition Part 1
 
Sohail's CV 090416
Sohail's CV 090416Sohail's CV 090416
Sohail's CV 090416
 
Digital Industry Standards
Digital Industry StandardsDigital Industry Standards
Digital Industry Standards
 
China OTT
China OTTChina OTT
China OTT
 
Standardisation In Media Formats
Standardisation In Media FormatsStandardisation In Media Formats
Standardisation In Media Formats
 
ITU-T Study Group 9 Introduction
ITU-T Study Group 9 IntroductionITU-T Study Group 9 Introduction
ITU-T Study Group 9 Introduction
 
VVC tutorial at ICIP 2020 together with Benjamin Bross
VVC tutorial at ICIP 2020 together with Benjamin BrossVVC tutorial at ICIP 2020 together with Benjamin Bross
VVC tutorial at ICIP 2020 together with Benjamin Bross
 
FITT Toolbox: Standardisation in Media Formats
FITT Toolbox: Standardisation in Media FormatsFITT Toolbox: Standardisation in Media Formats
FITT Toolbox: Standardisation in Media Formats
 
VVC tutorial at VCIP 2020 together with Benjamin Bross
VVC tutorial at VCIP 2020 together with Benjamin BrossVVC tutorial at VCIP 2020 together with Benjamin Bross
VVC tutorial at VCIP 2020 together with Benjamin Bross
 
en_302769v010101v
en_302769v010101ven_302769v010101v
en_302769v010101v
 
en_ETSI_302769v010101v
en_ETSI_302769v010101ven_ETSI_302769v010101v
en_ETSI_302769v010101v
 
en_302769v010101v
en_302769v010101ven_302769v010101v
en_302769v010101v
 
en_302769v010101v
en_302769v010101ven_302769v010101v
en_302769v010101v
 
en_ETSI_302769v010101v
en_ETSI_302769v010101ven_ETSI_302769v010101v
en_ETSI_302769v010101v
 

More from Dr. Mohieddin Moradi

An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3Dr. Mohieddin Moradi
 
An Introduction to HDTV Principles-Part 2
An Introduction to HDTV Principles-Part 2An Introduction to HDTV Principles-Part 2
An Introduction to HDTV Principles-Part 2Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3Dr. Mohieddin Moradi
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Dr. Mohieddin Moradi
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles Dr. Mohieddin Moradi
 
Video Compression, Part 4 Section 1, Video Quality Assessment
Video Compression, Part 4 Section 1,  Video Quality Assessment Video Compression, Part 4 Section 1,  Video Quality Assessment
Video Compression, Part 4 Section 1, Video Quality Assessment Dr. Mohieddin Moradi
 
Video Compression, Part 4 Section 2, Video Quality Assessment
Video Compression, Part 4 Section 2,  Video Quality Assessment Video Compression, Part 4 Section 2,  Video Quality Assessment
Video Compression, Part 4 Section 2, Video Quality Assessment Dr. Mohieddin Moradi
 

More from Dr. Mohieddin Moradi (9)

HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2HDR and WCG Principles-Part 2
HDR and WCG Principles-Part 2
 
SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2SDI to IP 2110 Transition Part 2
SDI to IP 2110 Transition Part 2
 
An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3An Introduction to HDTV Principles-Part 3
An Introduction to HDTV Principles-Part 3
 
An Introduction to HDTV Principles-Part 2
An Introduction to HDTV Principles-Part 2An Introduction to HDTV Principles-Part 2
An Introduction to HDTV Principles-Part 2
 
Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3Broadcast Camera Technology, Part 3
Broadcast Camera Technology, Part 3
 
Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1Broadcast Camera Technology, Part 1
Broadcast Camera Technology, Part 1
 
An Introduction to Audio Principles
An Introduction to Audio Principles An Introduction to Audio Principles
An Introduction to Audio Principles
 
Video Compression, Part 4 Section 1, Video Quality Assessment
Video Compression, Part 4 Section 1,  Video Quality Assessment Video Compression, Part 4 Section 1,  Video Quality Assessment
Video Compression, Part 4 Section 1, Video Quality Assessment
 
Video Compression, Part 4 Section 2, Video Quality Assessment
Video Compression, Part 4 Section 2,  Video Quality Assessment Video Compression, Part 4 Section 2,  Video Quality Assessment
Video Compression, Part 4 Section 2, Video Quality Assessment
 

Recently uploaded

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 

Recently uploaded (20)

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 

Video Compression, Part 2-Section 1, Video Coding Concepts

  • 2. Section I − Video Compression History − A Generic Interframe Video Encoder − The Principle of Compression − Differential Pulse-Code Modulation (DPCM) − Transform Coding − Quantization of DCT Coefficients − Entropy Coding Section II − Still Image Coding − Prediction in Video Coding (Temporal and Spatial Prediction) − A Generic Video Encoder/Decoder − Some Motion Estimation Approaches 2 Outline
  • 4. Why Compression? SD-SDI 270 Mbps HD-SDI 1.5Gbps, 3Gbps 4K-UHD 12Gbps 8K-UHD 48Gbps 4
  • 6. 6 Coded video Coded audio Video format .264, .265, VP9… Container format MP4, MOV, WebM, MXF… Audio format .aac, .ogg, .mp3… Codec and Container Format − A container or wrapper format is a metafile format whose specification describes how different elements of data and metadata coexist in a computer file. − Wrappers serve two purposes mainly: • To gather programme material and related information • To identify those pieces of information Container
  • 7. 7 Codec and Container (Wrapper) Format Media Format Wrapper CODEC AVC-Intra Class 100 DNXHD, ProRes AVC LongG MXF OPAtom MXF OP1B, Quicktime, DVI P2, AVCHD HDCAM, Mini DV, SR LTO, HDD, BluRay Disc P2 Card, SD Card
  • 8. 8 Ex: MXF File Structure of AVC-LongG OP-1b and OP-1a
  • 9. Goal of Standards − Ensuring Interoperability − Enabling communication between devices made by different manufacturers − Promoting a technology or industry − Reducing costs 9 The Scope of Video Standardization Decoder Bitstream Encoder
  • 10. Goal of Standards − Ensuring Interoperability − Enabling communication between devices made by different manufacturers − Promoting a technology or industry − Reducing costs − Not the encoder, Not the decoder − Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement the IDCT) 10 The Scope of Video Standardization Decoder Bitstream Scope of Standardization Encoder (Decoding Processes)
  • 11. Only Specifications of the Bitstream, Syntax, and Decoding Processes are standardized: • Enables improved encoding & decoding strategies to be employed in a standard-compatible manner • Provides no guarantees of quality • Permits optimization beyond the obvious • Permits complexity reduction for implementability Pre-Processing Source Destination Post-Processing & Error Recovery Scope of Standard Encoding Decoding 11 CODEC (enCODer/DECoder) Standard defines this The Scope of Video Standardization
  • 12. 12 − This allows future encoders of better performance to remain compatible with existing decoders. − Also allows for commercially secret encoders to be compatible with standard decoders Today’s Ho-Hum Encoder Tomorrow’s Nifty Encoder Very Secret Encoder Today’s Decoder Today’s decoder still works! The Scope of Video Standardization • The international standard does not specify the design of the video encoders and decoders. • It only specifies the syntax and semantics of the bitstream and signal processing at the encoder/decoder interface. • Therefore, options are left open to the video codec manufacturers to trade-off cost, speed, picture quality and coding efficiency.
  • 13. JTC1 IEC ISO SC 29 RAAGM AG WG12WG11WG1 WG JBIG JPEG SG MHEG-5 Main- tenance MHEG-6 SG Audio SNHC System Video Requirements Implementation Studies Test SG Liaisons Advisory Group (AG) on Management (AGM) • To advise SC 29 and its WGs on matters of management that affect their works. Advisory Group (AG) on Registration Authority (RA) WG1: Still images, JPEG and JBIG • Joint Photographic Experts Group and Joint Bi-level Image Group WG11: Video, MPEG • Motion Picture Experts Group WG12: Multimedia, MHEG • Multimedia Hypermedia Experts Group International Standardization Organization Subcommittee 29 Title: “Coding of Audio, Picture, Multimedia and Hypermedia Information” Joint Technical Committee ISO/IEC JTC 1/SC 29 Structure and MPEG MPEG (Moving Picture Experts Group, 1988 ) To develop standards for coded representation of digital audio, video, 3D Graphics and other data International Electrotechnical Committee 13
  • 14. Telecommunication Standardization Advisory Group (TSAG) WTSA World Telecommunication Standardization Assembly SG Workshops, Seminars, Symposia … IPRs (Intellectual Property Rights) WP Questions: Develop Recommendations SG WP WP Q Focus Group VCEG (ITU-T SG16/Q6) ) • Study Group 16 Multimedia terminals, systems and applications • Working Party 3 Media coding • Question 6 Video coding Rapporteurs (R): Mr Gary SULLIVAN, Mr Thomas WIEGAND SG16 WP3 14 ITU-T structure and VCEG (Video Coding Experts Group or Visual Coding Experts Group) Administrative Entities Q Q Q Q Q Q Q Q Q Q Q6 VCEG
  • 15. 15 ITU, International Telecommunication Union structure − Founded in 1865, it is the oldest specialized agency of the United Nations system − ITU is an International organization where governments, industries, telecom operators, service providers and regulators work together to coordinate global telecommunication networks and services − Help the world communicate! What does ITU actually do? • Spectrum allocation and registration • Coordinate national spectrum planning • International telecoms/ICT standardization • Collaborate in international tariff-setting • Cooperate in telecommunications development assistance • Develop measures for ensuring safety of life • Provide policy reviews and information exchange • Insure and extend universal Telecom access
  • 16. 16 ITU, International Telecommunication Union structure − Plenipotentiary Conference: Key event, all ITU Member States decide on the future role of the organization (Held every four years) − ITU Council: The role of the Council is to consider, in the interval between Plenipotentiary Conferences, broad telecommunication policy issues to ensure that the Union's activities, policies and strategies fully respond to today's dynamic, rapidly changing telecommunication environment (held yearly)
  • 17. 17 ITU, International Telecommunication Union structure − General Secretariat: Coordinates and manages the administrative and financial aspects of the Union’s activities (provision of conference services, information services, legal advice, finance, personnel, etc.) − ITU-R: Coordinates radio communications, radio-frequency spectrum management and wireless services. − ITU-D: Technical assistance and deployment of telecom networks and services in developing and least developed countries to allow the development of telecommunication. − ITU-T: Telecommunication standardization on a world-wide basis. Ensures the efficient and on-time production of high quality standards covering all fields of telecommunications (technical, operating and tariff issues). (The Secretariat of ITU-T (TSB: Telecommunication Standardization Bureau) provides services to ITU-T Participants)
  • 18. 18 ITU, International Telecommunication Union structure Telecommunication Standardization Bureau (TSB) (Place des Nations, CH-1211 Geneva 20) − The TSB provides secretarial support for ITU-T and services for participants in ITU-T work (e.g. organization of meeting, publication of Recommendations, website maintenance etc.). − Disseminates information on international telecommunications and establishes agreements with many international SDOs. Mission of ITU-T Standardization Sector of ITU − Helping people all around the world to communicate and to equally share the advantages and opportunities of telecommunication reducing the digital divide by studying technical, operating and tariff matters to develop telecommunication standards (Recommendations) on a worldwide basis.
  • 19. 19 ITU, International Telecommunication Union structure World Telecommunication Standardization Assembly (WTSA) − WTSA sets the overall direction and structure for ITU-T, meets every four years and for the next four-year period: • Defines the general policy for the Sector • Establishes the study groups (SG) • Approves SG work programmes • Appoints SG chairmen and vice-chairmen Telecommunication Standardization Advisory Group (TSAG) − TSAG provides ITU-T with flexibility between WTSAs, and reviews priorities, programmes, operations, financial matters and strategies for the Sector (meets ~~ 9 months ) • Follows up on accomplishment of the work programme • Restructures and establishes ITU-T study groups • Provides guidelines to the study groups • Advises the TSB Director • Produces the A-series Recommendations on organization and working procedures
  • 20. • ISO/IEC MPEG = “Moving Picture Experts Group” (ISO/IEC JTC 1/SC 29/WG 11 = International Standardization Organization and International Electrotechnical Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11) • ITU-T VCEG = “Video Coding Experts Group” (ITU-T SG16/Q6 = International Telecommunications Union – Telecommunications Standardization Sector (ITU-T, a United Nations Organization, formerly CCITT), Study Group 16, Working Party 3, Question 6) • JVT = “Joint Video Team” Collaborative team of MPEG & VCEG, responsible for developing AVC (discontinued in 2009) • JCT-VC = “Joint Collaborative Team on Video Coding” Team of MPEG & VCEG , responsible for developing HEVC (established January 2010) • JVET = “Joint Video Experts Team” Exploring potential for new technology beyond HEVC (established Oct. 2015 as Joint Video Exploration Team, renamed Apr. 2018) 20 Video Coding Standardization Organizations
  • 21. 21 History of Video Coding Standardization (1985 ~ 2020) [Figure: timeline 1990–2020] − H.120 (1984–1988): video telephony − H.261 (1990+): ITU-T − MPEG-1 (1993): ISO/IEC, computer applications − H.262 / 13818-2 (MPEG-2) (1994/95–1998+): SD − H.263/+/++ (1995–2000+) − MPEG-4 Visual (1998–2001+) − H.264 / 14496-10 AVC (2003–2018+): HD; developed by the Joint Video Team (JVT) − H.265 / 23008-2 HEVC (2013–2018+): 4K UHD; developed by the Joint Collaborative Team on Video Coding (JCT-VC) − H.26x / 23090-3 VVC (2020–…): 8K, 360°, …; to be developed by the Joint Video Experts Team (JVET)
  • 22. 22 [Figure: timeline 1988–2010] ITU-T Standards: H.261 (Version 1), H.261 (Version 2), H.263, H.263+, H.263++ — Joint ITU-T/MPEG Standards: H.262/MPEG-2, H.264/MPEG-4 AVC, H.265/HEVC — MPEG Standards: MPEG-1, MPEG-4 (Version 1), MPEG-4 (Version 2) — H.261 Video Compression Standard
  • 23. 23 H-series recommendations are low-delay codecs for telecom applications (the International Telecommunication Union (ITU-T) developed several recommendations for video coding) − H.261 (1990): the first video codec specification, “Video Codec for Audio Visual Services at p x 64 kbps” − H.262 (1995): Infrastructure of audiovisual services—Coding of moving video − H.263 (1996): next conferencing solution, video coding for low bit rate communications − H.263+ (H.263V2) (1998) − H.263++ (H.263V3) (2000): follow-on solutions − H.26L: “long-term” solution for low bit-rate video coding for communication applications (not backward compatible with H.263+) − H.26L was completed in May 2003 and led to H.264: known as Advanced Video Coding (AVC) − H.265/HEVC (2013): High Efficiency Video Coding ITU H.26x History
  • 24. 24 Motion Picture Experts Group (MPEG) codecs are designed for storage/broadcast/streaming applications MPEG-1 (1992) • Started in 1988 by Leonardo Chiariglione • Compression standard for progressive frame-based video in SIF (360x240) formats • Applications: VCD MPEG-2 (1994-5) • Compression standard for interlaced frame-based video in CCIR-601 (720x480) and high definition (1920x1088i) formats • Applications: DVD, SVCD, DIRECTV, GA, DVB, HDTV Studio, DTV Broadcast, DVD, HD, video standards for television and telecommunications MPEG-4 (1999) • Multimedia standard for object-based video from natural or synthetic sources • Applications: Internet, cable TV, virtual studio, home LAN etc. • Object-oriented • Over-ambitious? MPEG History [Figure: MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21]
  • 25. 25 Motion Picture Experts Group (MPEG) codecs are designed for storage/broadcast/streaming applications MPEG-7, 2001 • Standardized descriptions of multimedia information, formally called “Multimedia Content Description Interface” • Metadata for audio-video streams • Applications: Internet, video search engines, digital libraries MPEG-21, 2002 • Intellectual property rights protection purpose • Distribution, exchange, user access of multimedia data and intellectual property management AVC (2003), also known as MPEG-4 Part 10 • Conventional to HD • Emphasis on compression performance and loss resilience HEVC (2013) High Efficiency Video Coding MPEG History [Figure: MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21]
  • 26. 26 ITU and MPEG (ISO/IEC) have also worked together on joint codecs: − MPEG-2 is also called H.262 − H.26L led to a codec now called: • H.264 in telecom • MPEG-4 Part 10 in broadcast • AVC (Advanced Video Coding) in broadcast • the Joint Video Team (JVT) codec − H.265/HEVC (2013) High Efficiency Video Coding Joint ITU/MPEG
  • 27. 27 The Story of MPEG and VCEG
  • 28. 28 ITU and MPEG (ISO/IEC) have also worked together on joint codecs: Joint ITU/MPEG [Figure: each joint generation, up to VVC in 2020, brings roughly a 50% bitrate saving for direct-to-home and a 30% bitrate saving for contribution]
  • 29. 29 Some Famous Codecs for HD (bitrates in Mbps for 1920×1080, 4:2:2, 10 bit)
  Codec Brand | Codec Name | Bitrate (Mbps) | Wrapper Type
  AVID | DNxHD 365x | 367 (50p) | MXF
  AVID | DNxHD 185x | 184 (50i) | MXF
  AVID | DNxHR HQX | 174 (50i) / 345 (50p), (12 bit) | MXF
  APPLE | ProRes 422 Proxy | 38 (50i) / 76 (50p) | MOV
  APPLE | ProRes 422 LT | 85 (50i) / 170 (50p) | MOV
  APPLE | ProRes 422 | 122 (50i) / 245 (50p) | MOV
  APPLE | ProRes 422 HQ | 184 (50i) / 367 (50p) | MOV
  SONY | XAVC Intra Class 100 | 112 (50i) / 223 (50p) [MXF] | MXF/MP4
  SONY | XAVC Intra Class 200 | 227 (50i) / 454 (50p) | MXF
  SONY | XAVC Long GOP 50 | 50 (50i, 50p) [MXF], max bit rate = 80 Mb/s | MXF/MP4
  SONY | XAVC Long GOP 35 | 35 (50i, 50p) [MXF], max bit rate = 80 Mb/s | MXF/MP4
  SONY | XAVC Long GOP 25 | 25 (50i) [MXF], max bit rate = 80 Mb/s | MXF/MP4
  PANASONIC | AVC-Intra 200 | 226 (50i) / 452 (50p) | MXF
  PANASONIC | AVC-Intra 100 | 111 (50i) / 222 (50p) | MXF
  PANASONIC | AVC-LongG 50 | 50 (50i) | MXF
  PANASONIC | AVC-LongG 25 | 25 (50i) / 50 (50p) | MXF
  • 31. 31 Spatial Domain − Elements are used “raw” in suitable combinations. − The frequency of occurrence of such combinations is used to influence the design of the coder so that shorter codewords are used for more frequent combinations and vice versa (entropy coding). Transform Domain − Elements are mapped onto a different domain (i.e. the frequency domain). − The resulting coefficients are quantised and entropy-coded. Hybrid − Combinations of the above. Classification of Compression Techniques
  • 32. 32 A Generic Interframe Video Encoder — used since the early days of video compression standards, e.g. MPEG-1/-2/-4, H.264/AVC, HEVC, and also in most proprietary codecs (VC-1, VP8 etc.). [Figure: current stage — Input Frame 1 enters the DCT/quantiser (Q) path]
  • 33. 33 A Generic Interframe Video Encoder [Figure: Input Frame 1 after the DCT]
  • 34. 34 A Generic Interframe Video Encoder [Figure: quantised DCT coefficients of Frame 1, entropy coded to the bitstream 010011101001…]
  • 35. 35 A Generic Interframe Video Encoder [Figure: Reconstructed Frame 1, obtained by inverse quantisation and inverse DCT, stored as the reference]
  • 36. 36 A Generic Interframe Video Encoder [Figure: Input Frame 2 arrives; Reconstructed Frame 1 is available as the reference]
  • 37. 37 A Generic Interframe Video Encoder [Figure: motion vectors (MVs) estimated between Input Frame 2 and Reconstructed Frame 1 are entropy coded into the bitstream]
  • 38. 38 A Generic Interframe Video Encoder [Figure: Reconstructed Frame 1 with motion compensation (MC) applied]
  • 39. 39 A Generic Interframe Video Encoder — If the motion prediction is successful, the energy in the residual is lower than in the original frame and can be represented with fewer bits. [Figure: Input Frame 2 minus the motion-compensated reference gives the residual with MC (Frames 1 & 2)]
  • 40. 40 A Generic Interframe Video Encoder [Figure: DCT of the motion-compensated residual]
  • 41. 41 A Generic Interframe Video Encoder [Figure: quantised DCT of the residual, entropy coded to the bitstream 010011101001…]
  • 42. 42 A Generic Interframe Video Encoder [Figure: reconstructed residual with MC (Frames 1 & 2), obtained by inverse quantisation and inverse DCT]
  • 43. 43 A Generic Interframe Video Encoder [Figure: Reconstructed Frame 1 with MC + reconstructed residual with MC = Reconstructed Frame 2]
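  The loop sketched in slides 32–43 can be condensed into a few lines of code. The following is a minimal sketch, not any standard's specification: it assumes numpy and scipy are available, uses an illustrative step size QP = 16, and reduces motion compensation to a zero-motion prediction (the reconstructed reference is used as-is).

  ```python
  import numpy as np
  from scipy.fft import dctn, idctn   # separable 2-D DCT-II and its inverse

  QP = 16  # quantiser step size (an illustrative value, not from any standard)

  def transform_quantise(block):
      """DCT then uniform quantisation -> integer indices for entropy coding."""
      return np.round(dctn(block, norm='ortho') / QP).astype(int)

  def rescale_inverse(indices):
      """Inverse quantisation (rescaling) then inverse DCT."""
      return idctn(indices * QP, norm='ortho')

  def code_inter_frame(frame, reference):
      """Code a frame against the previously *reconstructed* reference:
      residual -> DCT -> Q, then reconstruct exactly as the decoder will.
      (Motion estimation is omitted; the reference is used as-is, i.e. a
      zero motion vector everywhere.)"""
      residual = frame - reference
      indices = transform_quantise(residual)           # sent to entropy coder
      reconstructed = reference + rescale_inverse(indices)
      return indices, reconstructed                    # recon = next reference

  frame1 = np.random.rand(8, 8) * 255
  frame2 = frame1 + np.random.randn(8, 8)              # nearly identical frame
  idx1 = transform_quantise(frame1)                    # intra-coded first frame
  recon1 = rescale_inverse(idx1)
  idx2, recon2 = code_inter_frame(frame2, recon1)      # inter-coded second frame
  print(np.count_nonzero(idx1), np.count_nonzero(idx2))  # residual: far fewer
  ```

  The count of non-zero indices for the residual is much smaller than for the intra frame, which is exactly the energy reduction slide 39 describes.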
  • 45. − Spatial Redundancy Reduction (pixels inside a picture are similar) − Temporal Redundancy Reduction (Similarity between the frames) − Statistical Redundancy Reduction (more frequent symbols are assigned short code words and less frequent ones longer words) The Principle of Compression 45
  • 46. 46 − It arises when parts of a picture are often replicated within a single frame of video (with minor changes). Spatial Redundancy in Still Images [Figure: a frame dominated by sky blue — “This area is all blue”, “This area is half blue and half green”]
  • 47. − Take advantage of similarity between successive frames − It arises when successive frames of video display images of the same scene. 47 Temporal Redundancy in Moving Images This picture is the same as the previous one except for this area
  • 48. All signals & data have some redundancy and some entropy. – Data is compressed by keeping entropy and throwing away redundancy if possible! – Redundancy is the useless stuff. – Redundancy can be thrown away. – There is more redundancy in simple signals & data • Black and burst, colour bars, flat scenery, talking heads, quiet music, a 1 kHz sine test tone, bitmap images, database files, text files. – Entropy is the useful stuff. – Entropy is a term often used for ‘activity’ or ‘chaos’. – There is more entropy in complex signals & data • Multiburst and pathological test signals, a football match, white noise, executables (computer files that can be executed), DLL files. 48 The Principle of Compression
  • 49. 49 Redundancy & Entropy — A high compression ratio can lead to loss of entropy. [Figure: data or bandwidth vs. signal complexity — from simple to complex signals, redundancy shrinks and entropy grows; at 2:1 compression some entropy of the most complex signals is already lost]
  • 50. 50 Redundancy & Entropy — A high compression ratio can lead to loss of entropy. [Figure: the same plot at 4:1 compression — considerably more entropy is lost]
  • 51. Spatial Redundancy Reduction 51 Spatial Redundancy Reduction − Transform coding: Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Discrete Wavelet Transform (DWT), Hadamard Transform (HT) − Differential Pulse Code Modulation (DPCM)
  • 53. PCM was invented by the British engineer Alec Reeves in 1937 in France. − Pulse code modulation (PCM) is produced by an analog-to-digital conversion process. − As in the case of other pulse modulation techniques, the rate at which samples are taken and encoded must conform to the Nyquist sampling rate. − The sampling rate must be greater than twice the highest frequency in the analog signal: f_s > 2 f_max Pulse Code Modulation (PCM) 53
  • 54. Encoding in PCM 54 [Figure: each sample is mapped to the nearest allowed quantization level — 1.52 → 1.5, 1.08 → 1.1, 0.92 → 0.9, 0.56 → 0.6, 0.28 → 0.3, 0.27 → 0.3, 0.11 → 0.1]
  • 56. Regeneration (re-amplification, retiming, reshaping) Regeneration 56
  • 57. Advantages of PCM • Robustness to noise and interference • Efficient regeneration • Efficient SNR and bandwidth trade-off • Uniform format • Ease of add and drop • Security — DS0: • A basic digital signaling rate of 64 kbit/s. • To carry a typical phone call, the audio is digitized at an 8 kHz sample rate using 8-bit pulse-code modulation. Advantages of PCM 57
  • 58. − Encode information in terms of signal transition; a transition is used to designate Symbol 0. − Symbol 0→ Transition (0→1, 1→0) Differential Encoding 58
  • 59. − Usually PCM has a sampling rate higher than the Nyquist rate. − The encoded signal therefore contains redundant information. − DPCM can efficiently remove this redundancy. − Prediction error of m[n]: e[n] = m[n] − m̂[n] − Quantized value of m[n]: m_q[n] = e_q[n] + m̂[n] − Quantization error of e[n] is defined as: q[n] ≜ e[n] − e_q[n] − It follows that: m[n] − m_q[n] = (m̂[n] + e[n]) − (e_q[n] + m̂[n]) = e[n] − e_q[n] = q[n] Differential Pulse-Code Modulation (DPCM) 59 [Figure: DPCM block diagram — input m[n+1], prediction m̂[n+1], quantised prediction error e_q[n+1] and reconstructed output m_q[n]]
  • 60. 𝒎 𝒏 − 𝒎 𝒒 𝒏 = 𝒆 𝒏 − 𝒆 𝒒 𝒏 = 𝒒 𝒏 means that: − The pointwise coding error in the input sequence is exactly equal to q(n) that is equal to the quantization error in e(n) − With a reasonable predictor the mean square value of the differential signal e(n) is much smaller than that of m(n) − For the same mean square quantization error, e[n] requires fewer quantization bits than m[n] ⇒ The number of bits required for transmission has been reduced while the quantization error is kept the same. Differential Pulse-Code Modulation (DPCM) 60
  • 61. − An important aspect of DPCM is that the prediction is based on the output (the quantized samples) rather than the input (the unquantized samples). − This results in the predictor being in the “feedback loop” around the quantizer, so that the quantizer error at a given step is fed back to the quantizer input at the next step. − This has a “stabilizing effect” that prevents DC drift and accumulation of error in the reconstructed signal m_q[n]. Differential Pulse-Code Modulation (DPCM) 61
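  A minimal numerical sketch of the equations above, assuming numpy; the step size and test signal are illustrative. The predictor (here simply the previous reconstructed sample) sits in the feedback loop, so the coding error m − m_q equals the quantization error q of e[n] alone and stays bounded by half a step, with no drift.

  ```python
  import numpy as np

  def dpcm(m, step):
      """DPCM with the previous *reconstructed* sample as predictor m_hat.
      Because prediction runs on the quantised output (feedback loop),
      m[n] - m_q[n] = q[n], the quantisation error of e[n] alone."""
      e_q = np.zeros(len(m))
      m_q = np.zeros(len(m))
      pred = 0.0                               # m_hat[0]
      for n in range(len(m)):
          e = m[n] - pred                      # e[n] = m[n] - m_hat[n]
          e_q[n] = step * np.round(e / step)   # uniform quantiser
          m_q[n] = pred + e_q[n]               # m_q[n] = e_q[n] + m_hat[n]
          pred = m_q[n]                        # next prediction from the output
      return e_q, m_q

  m = np.cumsum(np.random.randn(1000))         # slowly varying test signal
  e_q, m_q = dpcm(m, step=0.5)
  print(np.abs(m - m_q).max())                 # <= step/2: no error accumulation
  ```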
  • 62. The output signal-to-noise ratio of the DPCM system is (SNR)_O = σ_M² / σ_Q², where σ_M² and σ_Q² are the variances of the zero-mean signals m[n] and q[n]. This can be factored as (SNR)_O = (σ_M² / σ_E²) · (σ_E² / σ_Q²) = G_P · (SNR)_Q, where σ_E² is the variance of the prediction errors and (SNR)_Q = σ_E² / σ_Q² is the signal-to-quantization-noise ratio. Processing Gain: G_P = σ_M² / σ_E². Design a prediction filter to maximize G_P (i.e. minimize σ_E²). Processing Gain 62
  • 63. 63 Predictive Coding (from previous symbol) Predictive Coding (generalised) − Prediction is based on combination of previous symbols − Prediction template needs to be “causal” i.e. template should contain only “previous” elements w.r.t the direction of scanning (shown with arrows). − This is important for coding applications as the decoder will need to have decoded the template elements first to perform the prediction of the current element.
  • 64. 64 Predictive Coding (previous symbol) − The previous symbol is used as a prediction of the current symbol − The prediction error is coded in a memoryless fashion − The prediction error alphabet and codebook are almost twice the size − e.g. symbol alphabet {1, 2, 3, 4} → prediction error alphabet {−3, −2, −1, 0, 1, 2, 3} − A good predictor will minimise the error (most occurrences will be zero)
  • 65. − If the frame is processed in raster order, then pixels A, B and C in the current and previous rows are available in both the encoder and the decoder since these should already have been decoded before X. − The decoder forms the same prediction and adds the decoded residual to reconstruct the pixel. 65 Predictive Image Coding Pixel X to be encoded P(X) is a prediction of X using A,B and C Residual R(X) = X − P(X) R(X) is encoded and transmitted 1 •Encoder forms a prediction for X based on some combination of previously coded pixels 2 •Then subtracts this prediction from X 3 •Then encodes the residual (the result of the subtraction)
  • 66. Example − Encoder prediction P(X) = (2A + B + C)/4 − Residual R(X) = X − P(X) is encoded and transmitted. − Decoder decodes R(X) and forms the same prediction: P(X) = (2A + B + C)/4 − Reconstructed pixel X = R(X) + P(X) 66 Predictive Image Coding Spatial prediction (DPCM) 1 •Encoder forms a prediction for X based on some combination of previously coded pixels 2 •Then subtracts this prediction from X 3 •Then encodes the residual (the result of the subtraction) By Encoder By Decoder
  • 67. − If the encoding process is lossy, i.e. if the residual is quantized to R′(X) • then the decoded pixels A′, B′ and C′ may not be identical to the original A, B and C due to losses during encoding, so the above process could lead to a cumulative mismatch or ‘drift’ between the encoder and decoder. − Hence the encoder uses the decoded pixels A′, B′ and C′ to form the prediction, i.e. P(X) = (2A′ + B′ + C′) / 4 in the above example. − The compression efficiency of this approach depends on the accuracy of the prediction P(X). 67 Predictive Image Coding — To avoid this, the encoder should itself decode the residual R′(X) and reconstruct each pixel. In this way, both encoder and decoder use the same prediction P(X) and drift is avoided. [Figure: quantizer producing R′(X) from R(X) = X − P(X)]
  • 68. − If the prediction is successful, the energy in the residual is lower than in the original frame and the residual can be represented with fewer bits (Motion compensation is an example of predictive coding). − Spatial Prediction involves predicting an image sample or region from previously-transmitted samples in the same image or frame and is sometimes described as ‘Differential Pulse Code Modulation’ (DPCM). 68 Predictive Image Coding Spatial Prediction in a Frame=DPCM
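  A minimal sketch of the raster-order spatial DPCM of the last few slides, assuming numpy and an illustrative step size qp; pixels outside the frame are replaced by 128, a hypothetical boundary convention. The prediction uses reconstructed pixels A′, B′, C′, so a decoder running the same loop on the received residuals produces exactly the same rec array and no drift occurs.

  ```python
  import numpy as np

  def spatial_dpcm(img, qp):
      """Raster-order DPCM with P(X) = (2A + B + C) / 4 formed from
      *reconstructed* pixels, so encoder and decoder stay in step."""
      h, w = img.shape
      rec = np.zeros((h, w))                   # decoded picture (both sides)
      res_q = np.zeros((h, w))                 # quantised residuals R'(X)
      for y in range(h):
          for x in range(w):
              a = rec[y, x - 1] if x > 0 else 128                # left, A'
              b = rec[y - 1, x] if y > 0 else 128                # above, B'
              c = rec[y - 1, x - 1] if x > 0 and y > 0 else 128  # above-left, C'
              p = (2 * a + b + c) / 4                            # prediction P(X)
              res_q[y, x] = qp * np.round((img[y, x] - p) / qp)  # quantised R(X)
              rec[y, x] = p + res_q[y, x]                        # decoded pixel
      return res_q, rec

  img = np.clip(128 + np.cumsum(np.random.randn(16, 16), axis=1), 0, 255)
  res_q, rec = spatial_dpcm(img, qp=4)
  print(np.abs(img - rec).max())               # bounded by qp/2: no drift
  ```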
  • 71. Fourier Series Recall 71 f(x) = a₀/2 + Σ_{n=1}^{∞} [aₙ cos(nx) + bₙ sin(nx)], where aₙ = (2/T) ∫_{−T/2}^{T/2} f(x) cos(2πnx/T) dx, bₙ = (2/T) ∫_{−T/2}^{T/2} f(x) sin(2πnx/T) dx, and a₀ = (1/T) ∫_{−T/2}^{T/2} f(x) dx
  • 75. 75 Fourier Series Recall [Figure: a square wave sw(t) approximated by partial Fourier sums with an increasing number of terms; the approximation improves as terms are added] Ideally we need infinitely many terms.
  • 76. How can transform coding lead to data compression? − Although each pixel x₁ or x₂ may take any value uniformly between 0 (black) and its maximum value 255 (white), since there is a high correlation (similarity) between them, their joint occurrences lie mainly on a 45-degree line. − The joint occurrences on the new coordinates have a uniform distribution along the y₁ axis, but are highly peaked around zero on the y₂ axis. − y₁ is called the average or DC value of x₁ and x₂ − y₂ represents the residual difference of x₁ and x₂ − The normalization factor of 1/√2 makes sure that the signal energy is not changed by the transformation (Parseval theorem). 76 Transform Coding Joint occurrences of a pair of pixels in one frame
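  The two-pixel case can be written out directly; the pixel values below are illustrative. The rotated coordinates concentrate the energy into y1 while y2 stays near zero, and the 1/√2 factor keeps the total energy unchanged (Parseval).

  ```python
  import numpy as np

  x1, x2 = 200.0, 196.0                 # two similar neighbouring pixels
  y1 = (x1 + x2) / np.sqrt(2)           # average ('DC') direction: large
  y2 = (x1 - x2) / np.sqrt(2)           # difference direction: near zero
  assert np.isclose(x1**2 + x2**2, y1**2 + y2**2)   # energy preserved
  print(round(y1, 2), round(y2, 2))     # 280.01 2.83
  ```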
  • 77. − Transform domain coding is mainly used to remove the spatial redundancies in images by mapping the pixels into a transform domain prior to data reduction. − The strength of transform coding in achieving data compression is that the image energy of most natural scenes is mainly concentrated in the low-frequency region, and hence into a few transform coefficients. − These coefficients can then be quantized with the aim of discarding insignificant coefficients, without significantly affecting the reconstructed image quality. 77 Transform Coding
  • 79. Through transformation, a group of correlated pixels is converted into a group of uncorrelated coefficients. − Only one coefficient becomes important, and the rest carry non-significant energy. − The larger the number of pixels transformed together, the better the compression efficiency. − If the pixel intensity variations match the transformation basis vectors, then only one coefficient (apart from DC) becomes significant (unitarity/orthonormality). 79 Transform Coding
  • 80. The choice of transform depends on a number of criteria: 1. Data in the transform domain should be decorrelated, i.e. separated into components with minimal inter-dependence, and compact, i.e. most of the energy in the transformed data should be concentrated into a small number of values. 2. The transform should be reversible. 3. The transform should be computationally tractable, e.g. low memory requirement, achievable using limited-precision arithmetic, low number of arithmetic operations, etc. 80 The Choice of Transform Coding
  • 81. − A group of U pixels in each line are 1-D transformed. − This is repeated for V lines. − A group of V coefficients in the vertical directions are transformed. − This is repeated for U columns. − The final output is UV 2-D transform coefficients. − Transform coefficients are quantized for compression. − Compressed coefficients are inverse transformed to reconstruct the image. 81 What Is a Two Dimensional Transform? One-dimensional transformation in the Horizontal direction One-dimensional transformation in the Vertical direction U V Normally U=V 2D Coeff. 1D Coeff.
  • 82. − No reduction in data, just replacement (Replaces the original pixel samples with coefficients). − Coefficients describe how the samples are changing. − Helps to separate entropy from redundancy. − DCT always performed on a block of samples. Discrete Cosine Transform 82
  • 83. Discrete Cosine Transform Smallest DCT block is a 2x2 block. • Top left coefficient is the DC coefficient → Describes the average of the 4 samples. • Top right coefficient is the horizontal coefficient → Describes how the 4 samples are changing horizontally. • Bottom left coefficient is the vertical coefficient → Describes how the 4 samples are changing vertically. • Bottom right coefficient is the diagonal coefficient → Describes how the 4 samples are changing diagonally. 83 Original pixel samples Original pixel samples DCT Inverse DCT DC Horizontal coefficient Diagonal coefficient Vertical coefficient
  • 84. 84 Discrete Cosine Transform [Figure: worked 2×2 examples on blocks of 255/0 pixel values; the sample energy Σᵢ Pᵢ² is preserved, and a white-to-black transition gives a coefficient of 127.5 while a black-to-white transition gives −127.5]
  • 85. 85 Discrete Cosine Transform [Figure: the corresponding 2×2 coefficient blocks — combinations of ±127.5 in the horizontal, vertical and diagonal coefficient positions]
  • 86. – DCT always performed on a block of samples. 86 Discrete Cosine Transform
  • 87. 87 Detail in a Block vs. DCT Coefficients Transmitted Discrete Cosine Transform
  • 88. Most compression systems use an 8x8 DCT block. • The top left coefficient is the DC coefficient. • Top row are horizontal coefficients → Low frequency changes to the left, high to the right. • Left column are vertical coefficients → Low frequency changes at the top, high at the bottom. • The other coefficients for different angle/frequencies → Low frequency to the top left, & high to the bottom right. Discrete Cosine Transform 88 Pixel Domain Frequency Domain
  • 89. 89 Discrete Cosine Transform [Figure: f(m,n) — a spatial 8×8 block of pixel values taking only the two levels 55 and 109, m = 0…7, n = 0…7]
  • 90. 90 Discrete Cosine Transform NINT = Nearest INteger Truncation [Figure: F(u,v) — the frequency-domain 8×8 block of transform values for the previous block, u, v = 0…7; the DC value 602 dominates, and the AC values fall away toward the high frequencies]
  • 98. − The Forward DCT (FDCT) of an N × N sample block X is given by Y = A X Aᵀ − The Inverse DCT (IDCT) is given by X = Aᵀ Y A − A is an N × N transform matrix. The elements of A are A(i, j) = Cᵢ cos[(2j + 1)iπ / 2N], with Cᵢ = √(1/N) for i = 0 and Cᵢ = √(2/N) for i > 0 − FDCT and IDCT may be written in summation form: Y(x, y) = Cₓ C_y Σᵢ₌₀..N−1 Σⱼ₌₀..N−1 X(i, j) cos[(2j + 1)yπ / 2N] cos[(2i + 1)xπ / 2N], and X(i, j) = Σₓ₌₀..N−1 Σ_y₌₀..N−1 Cₓ C_y Y(x, y) cos[(2j + 1)yπ / 2N] cos[(2i + 1)xπ / 2N] 98 Discrete Cosine Transform
  • 99. − Ex: The transform matrix A for a 4 × 4 DCT is: A = [ a a a a ; b c −c −b ; a −a −a a ; c −b b −c ], where a = 1/2, b = √(1/2)·cos(π/8) ≈ 0.653 and c = √(1/2)·cos(3π/8) ≈ 0.271. 99 Discrete Cosine Transform
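  A short sketch that builds A from the formula above and applies the forward and inverse DCT as matrix products (numpy assumed; the random block is illustrative). Writing the 2-D transform as A X Aᵀ is exactly the separable row-then-column procedure described on the earlier two-dimensional-transform slide.

  ```python
  import numpy as np

  def dct_matrix(n):
      """Build the N x N DCT transform matrix A from the slide's formula:
      A(i, j) = C_i * cos((2j + 1) * i * pi / (2N))."""
      A = np.zeros((n, n))
      for i in range(n):
          ci = np.sqrt(1.0 / n) if i == 0 else np.sqrt(2.0 / n)
          for j in range(n):
              A[i, j] = ci * np.cos((2 * j + 1) * i * np.pi / (2 * n))
      return A

  A = dct_matrix(4)
  X = np.random.randint(0, 256, (4, 4)).astype(float) - 128  # level-shifted block
  Y = A @ X @ A.T        # forward DCT, Y = A X A^T (rows, then columns)
  X_rec = A.T @ Y @ A    # inverse DCT, X = A^T Y A
  assert np.allclose(X, X_rec)                 # the transform is reversible
  print(np.round(A, 3))  # rows match a = 0.5, b ~ 0.653, c ~ 0.271 above
  ```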
  • 100. − The output of a 2-dimensional FDCT is a set of N × N coefficients representing the image block data in the DCT domain which can be considered as ‘weights’ of a set of standard basis patterns. − The basis patterns for the 4 × 4 DCT are shown. − The basis patterns are composed of combinations of horizontal and vertical cosine functions. − Any image block may be reconstructed by combining all N × N basis patterns, with each basis multiplied by the appropriate weighting factor (coefficient). 100 Discrete Cosine Transform
  • 101. 101 Discrete Cosine Transform (4×4 basis patterns) [Figure: the 16 basis patterns, indexed by u = 0…3 horizontally and v = 0…3 vertically]
  • 102. 102 Discrete Cosine Transform [Figure: the 8-point forward DCT equations] N = 8 and i, j = 0, …, 7. In real codecs: −2048 ≤ D(i, j) ≤ +2047
  • 103. 103 Discrete Cosine Transform [Figure: the 8-point inverse DCT equations] N = 8 and i, j = 0, …, 7
  • 104. Discrete Cosine Transform DCT: − Basis vectors: cos[kπ(2n + 1)/2N], k, n = 0, …, N − 1 − For orthonormality, transform coefficients are divided by √N − Both transforms are orthonormal, but the DCT has smoothly varying basis vectors that match natural images better. 104 DCT and Hadamard 8×8 Matrices [Figure: the 8×8 DCT matrix, whose rows are built from cos x, cos 2x, cos 3x, sin x, sin 2x, sin 3x terms, alongside the 8×8 Hadamard matrix of ±1 entries] Hadamard Transform: H_n = [ H_{n−1} H_{n−1} ; H_{n−1} −H_{n−1} ], with H₀ = 1
  • 105. Level-shift by subtracting 128 from each array entry, because the DCT is designed to work on pixel values ranging from −128 to 127. The transform is then D = T M Tᵀ. 105 Discrete Cosine Transform Implementation Example
  • 106. − DCT calculations are mathematically intensive. − Easier to use simple matrix manipulation and a “look-up” matrix. − “Look-up” matrix act like a key or look-up table. − This “look-up” matrix is called the basis pictures. − For a 2x2 DCT block the basis pictures are 4x4. − For an 8x8 DCT block the basis pictures are 64x64. Discrete Cosine Transform 106
  • 107. 107 [Figure: the 8 × 8 DCT basis patterns] Discrete Cosine Transform (8×8 DCT basis patterns) − The basis patterns for the 8 × 8 DCT are shown. − The basis patterns are composed of combinations of horizontal and vertical cosine functions. − Any image block may be reconstructed by combining all N × N basis patterns, with each basis multiplied by the appropriate weighting factor (coefficient).
  • 108. 108 Discrete Cosine Transform (8×8 DCT basis patterns)
  • 112. Note that: – Low-low coefficients are much larger than high-high coefficients – While pixel values change at all positions, DCT values are mainly larger at low frequency. 8x8 DCT Example 112
  • 113. 8x8 pixels are coded and the lowest N out of 64 coefficients are retained for inverse DCT 8x8 DCT Example 113
  • 114. DCT coding with increasingly coarse quantization, block size 8x8 Typical DCT Coding Artifacts Quantizer Stepsize For AC Coefficients: 25 Quantizer Stepsize For AC Coefficients: 100 Quantizer Stepsize For AC Coefficients: 200 114
  • 115. 115 Discrete Cosine Transform (4×4 DCT basis patterns) 4 × 4 DCT basis patterns
  • 116. 116 Image section showing 4 × 4 block Original block DCT coefficients 4x4 DCT Example
  • 117. 117 Original block DCT coefficients Block reconstructed from 1, 2, 3, 5 coefficients 4x4 DCT Example
  • 118. 118 4 × 4 DCT basis patterns 8 × 8 DCT basis patterns Discrete Cosine Transform (DCT basis patterns Comparison)
  • 119. Top Field and Bottom Field Pixels 119
  • 120. 120 Frame Type DCT vs. Field Type DCT [Figure: luminance macroblock (16×16) structure, split into 8×8 blocks, in frame-organized DCT coding (for slow motion) and in field-organized DCT coding (for fast motion)]
  • 121. 121 Frame Type DCT vs. Field Type DCT
  • 122. − The significant DCT coefficients of a block of image or residual samples are typically the ‘low frequency’ positions around the DC (0,0) coefficient. − Figure plots the probability of non-zero DCT coefficients at each position in an 8 × 8 block. − The non-zero DCT coefficients are clustered around the top-left (DC) coefficient and the distribution is roughly symmetrical in the horizontal and vertical directions. 122 DCT Coefficient Distribution 8 × 8 DCT coefficient distribution (Frame)
  • 123. − Histograms for 8x8 DCT coefficient amplitudes measured for natural images (from Mauersberger). − DC coefficient is typically uniformly distributed. − For the other coefficients, the distribution resembles a Laplacian pdf. 123 Amplitude Distribution of the DCT Coefficients
  • 124. − Figure plots the probability of non-zero DCT coefficients for a residual field. − The coefficients are clustered around the DC position but are ‘skewed’, i.e. more non-zero coefficients occur along the left-hand edge of the plot. − This is because a field picture may have a stronger high-frequency component in the vertical axis due to the subsampling in the vertical direction, resulting in larger DCT coefficients corresponding to vertical frequencies. 124 DCT Coefficient Distribution 8 × 8 DCT coefficient distribution (Field)
  • 125. − The zig-zag scan may not be ideal for a field block because of the skewed coefficient distribution, and a modified scan order may be more effective for some field blocks, in which coefficients on the left hand side of the block are scanned before the right hand side. 125 DCT Coefficient Scan Zigzag scan example : frame block Zigzag scan example : field block
  • 128. 128 Discrete Cosine Transform [Figure: after the DCT, the low-frequency corner holds normally big numbers and the high-frequency corner normally small numbers]
  • 129. 129 Discrete Cosine Transform [Figure: the same block annotated — the few big low-frequency numbers carry the entropy, while the many small high-frequency numbers are redundancy]
  • 130. 130 3-Dimensional DCT − Removes spatiotemporal correlation − Good for low-motion video − Bad for high-motion video − Frame storage → large delay F(u, v, w) = (8/N³) C(u) C(v) C(w) Σ_{t=0}^{N−1} Σ_{y=0}^{N−1} Σ_{x=0}^{N−1} f(x, y, t) cos[(2x + 1)uπ / 2N] cos[(2y + 1)vπ / 2N] cos[(2t + 1)wπ / 2N], for u = 0, …, N−1, v = 0, …, N−1 and w = 0, …, N−1, where N = 8 and C(k) = 1/√2 for k = 0, 1 otherwise
  • 131. The transform should – Minimize the correlation among the resulting coefficients, so that scalar quantization can be employed without losing too much coding efficiency compared to vector quantization – Compact the energy into as few coefficients as possible Optimal transform − Karhunen–Loève Transform (KLT) • Signal-statistics dependent • It is an optimum transform, giving complete decorrelation Suboptimal transforms − Discrete Cosine Transform (DCT): nearly as good as the KLT for common image signals − Hadamard transform, with all elements +1 or −1. 131 Why DCT? What Block Size?
  • 132. Properties of the DCT: − Smoothly varying basis vectors that match natural images better (better than Hadamard) − Basis vectors are not sparse (better than the DFT, which has many zero-valued coefficients at small block sizes) − Basis vectors closely match natural scenes as the KLT's do, but the DCT uses a fixed and fast transformation algorithm (better than the KLT). 132 Why DCT? What Block Size? [Figure: mean-squared error (1%–5%) vs. block size (4×4 to 64×64) for an equal number of retained coefficients — DFT and HT perform worse; KLT and DCT almost coincide]
  • 133. Properties of the DCT: − Efficiency as a function of block size NxN, measured for 8 bit quantization in the original domain and equivalent quantization in the transform domain − Block size 8x8 is a good compromise. 133 Efficiency Why DCT? What Block Size?
  • 134. − Wavelet is a non-periodic element, i.e. a mini wave. − Uses a set of ‘mother wavelets’. − Scale and transform actions possible. − Better at high frequency capture. − Less visual degradation than DCT. − Graceful degradation at high compression. − Good for audio compression. Wavelet Coding 134
  • 136. The ‘wavelet transform’ is based on sets of filters with coefficients that are equivalent to discrete wavelet functions − A pair of filters is applied to the signal to decompose it into a low frequency band (L) and a high frequency band (H). − Each band is subsampled by a factor of two, so that the two frequency bands each contain N/2 samples. − With the correct choice of filters, this operation is reversible. 136 Wavelet
  • 137. − This approach may be extended to apply to a 2-dimensional signal such as an intensity image. − Each row of a 2D image is filtered with a low-pass and a high-pass filter (Lx and Hx) − The output of each filter is down-sampled by a factor of two to produce the intermediate images L and H. − L is the original image low-pass filtered and downsampled in the x-direction and H is the original image high-pass filtered and downsampled in the x-direction. − Each column of these new images is filtered with low- and high-pass filters (Ly and Hy) − The output of each filter is down-sampled by a factor of two to produce four sub-images LL, LH, HL and HH. 137 Wavelet
  • 138. • ‘LL’ is the original image, low-pass filtered in horizontal and vertical directions and subsampled by a factor of two. • ‘HL’ is high-pass filtered in the vertical direction and contains residual vertical frequencies • ‘LH’ is high-pass filtered in the horizontal direction and contains residual horizontal frequencies • ‘HH’ is high-pass filtered in both horizontal and vertical directions. − Between them, the four sub-band images contain all of the information present in the original image but the sparse nature of the LH, HL and HH sub-bands makes them amenable to compression. 138 Wavelet
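  A minimal sketch of one decomposition level as just described, assuming numpy and substituting the simplest possible filter pair, the Haar average/difference, for Lx/Hx and Ly/Hy (real codecs use longer wavelet filters):

  ```python
  import numpy as np

  def haar_analysis_2d(img):
      """One 2-D decomposition level with the Haar pair (low-pass = average,
      high-pass = difference) standing in for the filters Lx/Hx and Ly/Hy."""
      L = (img[:, 0::2] + img[:, 1::2]) / 2   # row filter + downsample in x
      H = (img[:, 0::2] - img[:, 1::2]) / 2
      LL = (L[0::2, :] + L[1::2, :]) / 2      # column filter + downsample in y
      HL = (L[0::2, :] - L[1::2, :]) / 2      # residual vertical frequencies
      LH = (H[0::2, :] + H[1::2, :]) / 2      # residual horizontal frequencies
      HH = (H[0::2, :] - H[1::2, :]) / 2      # diagonal detail
      return LL, LH, HL, HH

  img = np.tile(np.linspace(0, 255, 16), (16, 1))   # smooth horizontal ramp
  LL, LH, HL, HH = haar_analysis_2d(img)
  print(np.abs(HL).max(), np.abs(HH).max())   # ~0: no vertical/diagonal detail
  ```

  Iterating haar_analysis_2d on the LL output yields the sub-band tree of the next slide.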
  • 139. − In an image compression application, the 2-dimensional wavelet decomposition is applied again to the ‘LL’ image, forming four new sub-band images. − The resulting low-pass image, always the top-left sub-band image, is iteratively filtered to create a tree of sub-band images. 139 Wavelet
  • 140. − Many of the samples (coefficients) in the higher-frequency sub-band images are close to zero, shown here as near-black, and it is possible to achieve compression by removing these insignificant coefficients prior to transmission. − At the decoder, the original image is reconstructed by repeated up-sampling, filtering and addition, reversing the order of operations. 140 Wavelet
  • 142. Many coefficients in higher sub-bands, towards the bottom-right of the figure, are near zero and may be quantized to zero without significant loss of image quality. − Non-zero coefficients tend to be related to structures in the image; for example, the violin bow appears as a clear horizontal structure in all the horizontal and diagonal sub- bands. − When a coefficient in a lower-frequency sub-band is non-zero, there is a strong probability that coefficients in the corresponding position in higher frequency sub-bands will also be non-zero. 142 Wavelet Coefficient Scan A typical distribution of 2D wavelet coefficients
  • 143. We may consider a ‘tree’ of non-zero quantized coefficients, starting with a ‘root’ in a low-frequency sub-band. − A single coefficient in the LL band of layer 1 has one corresponding coefficient in each of the other bands of layer 1, i.e. these four coefficients correspond to the same region in the original image. − The layer 1 coefficient position (parent coefficient) maps to four corresponding child coefficient positions in each sub-band at layer 2. − Recall that the layer 2 sub-bands have twice the horizontal and vertical resolution of the layer 1 sub-bands. 143 Wavelet Coefficient Scan [Figure: root and parent coefficients in the LL-level bands, each mapping to child coefficients in the higher-resolution sub-bands]
  • 144. − Idea: Conditional coding of all descendants (incl. children) − significant coefficients: Coefficient magnitude > Threshold − Four cases (The coefficients are coded by symbol P, N, ZTR, or IZ) • ZTR (Zero Tree Root): coefficient and all descendants are not significant • IZ (Isolated Zero): coefficient is not significant, but some descendants are significant • POS: POSitive significant (greater than the given threshold) • NEG: NEGative significant (greater than the given threshold ) 144 Zero Tree Encoding (Embedded Zero-tree Wavelet Algorithm)
  • 145. − It is desirable to encode the non-zero wavelet coefficients as compactly as possible prior to entropy coding. − An efficient way of achieving this is to encode each tree of non-zero coefficients starting from the lowest or root level of the decomposition. − A coefficient at the lowest layer is encoded, followed by its child coefficients at the next layer up, and so on. The encoding process continues until the tree reaches a zero-valued coefficient. − Further children of a zero-valued coefficient are likely to be zero themselves and so the remaining children are represented by a single code that identifies a tree of zeros (zero tree). − The decoder reconstructs the coefficient map starting from the root of each tree; non-zero coefficients are decoded and reconstructed and when a zerotree code is reached, all remaining ‘children’ are set to zero. − This is the basis of the embedded zero tree (EZW) method of encoding wavelet coefficients. 145 Zero Tree Encoding
  • 147. Transformation does not result in compression by itself − Due to the linearity of the transformation, the energy in the pixel domain equals the energy in the transform domain − But the transformation concentrates the energy into a few transform coefficients − It is the quantisation of the transform coefficients that leads to compression (bit rate reduction) − Small-valued transform coefficients are set to zero 147 Quantisation of DCT Coefficients
  • 148. − A quantizer maps a signal with a range of values X to a quantized signal with a reduced range of values Y. − It should be possible to represent the quantized signal with fewer bits than the original, since the range of possible values is smaller. − A scalar quantizer maps one sample of the input signal to one quantized output value 148 Quantisation Quantizer (Mapping) X → Y (with reduced range); Y is represented with fewer bits
  • 149. − A more general example of a uniform quantizer is: FQ = Round(X / QP), Y = FQ · QP, where QP is a quantization ‘step size’ and FQ is the forward quantizer output. 149 Scalar Quantization Quantizer (Mapping)
  • 150. − In image and video compression CODECs, the quantization operation is usually made up of two parts, a forward quantizer FQ in the encoder and an ‘inverse quantizer’ or ‘rescaler’ (IQ) in the decoder. − If the step size is large, the range of quantized values is small and can therefore be efficiently represented and hence highly compressed during transmission, but the re-scaled values are a crude approximation to the original signal. − If the step size is small, the re-scaled values match the original signal more closely but the larger range of quantized values reduces compression efficiency. 150 Quantization Encoder (FQ: Forward Quantizer ) Decoder (IQ: Inverse Quantizer) 𝑌 = 𝐹𝑄. 𝑄𝑃 𝐹𝑄 = 𝑅𝑜𝑢𝑛𝑑 ( 𝑋 𝑄𝑃 ) 𝑋
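  The FQ/IQ pair above in a few lines (numpy assumed; the sample values and step sizes are illustrative), showing the trade-off just described as QP grows:

  ```python
  import numpy as np

  def forward_quantise(x, qp):
      """Encoder side: FQ = Round(X / QP)."""
      return np.round(x / qp).astype(int)

  def inverse_quantise(fq, qp):
      """Decoder side: Y = FQ * QP (rescaling)."""
      return fq * qp

  x = np.array([-60.3, -3.2, 0.4, 7.9, 41.7])
  for qp in (2, 8, 24):
      y = inverse_quantise(forward_quantise(x, qp), qp)
      # small QP: close match but a wide index range; large QP: coarse but compact
      print(qp, y, float(np.abs(x - y).max()))
  ```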
  • 151. 151 Linear and non-Linear Scalar Quantizer
  • 152. − A vector quantizer maps a set of input data such as a block of image samples to a single value (codeword) and at the decoder, each codeword maps to an approximation to the original set of input data, a ‘vector’. − The set of vectors are stored at the encoder and decoder in a codebook. 152 Vector Quantization Vector Quantizer (Mapping) A Set Of Input Data A Single Value (Codeword)
  • 153. 1. Partition the original image into regions such as N × N pixel blocks. 2. Choose a vector from the codebook that matches the current region as closely as possible. 3. Transmit an index that identifies the chosen vector to the decoder. 4. At the decoder, reconstruct an approximate copy of the region using the selected vector. 153 A typical application of Vector Quantization − Here, quantization is applied in the image (spatial) domain, i.e. groups of image samples are quantized as vectors − But it can equally be applied to motion compensated and/or transformed data. Key issues: the design of the codebook and efficient searching of the codebook to find the optimal vector.
  • 154. 154 Vector Quantization [Figure: the encoder searches a codebook of n codewords for the nearest match to each source-image block and transmits only the index i of the nearest codeword; the decoder looks the codeword up in an identical codebook to build the decoded image]
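  A minimal sketch of steps 1–4, assuming numpy, with random blocks and a random codebook standing in for a properly designed one (codebook design and efficient search are the hard parts, as noted above):

  ```python
  import numpy as np

  def vq_encode(blocks, codebook):
      """Map each flattened N*N block to the index of the nearest codeword
      (squared Euclidean distance); only the indices are transmitted."""
      d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
      return d.argmin(axis=1)

  def vq_decode(indices, codebook):
      """Decoder: reconstruct each region by codebook lookup."""
      return codebook[indices]

  codebook = np.random.rand(16, 16)     # 16 codewords for 4x4 blocks (flattened)
  blocks = np.random.rand(100, 16)      # 100 image regions as vectors
  idx = vq_encode(blocks, codebook)     # transmit only 100 small indices
  approx = vq_decode(idx, codebook)     # approximate copy of each region
  ```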
  • 156. Equal distances between adjacent decision levels and between adjacent reconstruction levels: t_l − t_{l−1} = r_l − r_{l−1} = q • Parameters of Uniform Quantization – R: bit resolution – L: levels (L = 2^R) – B: dynamic range of the input, B = f_max − f_min – q: quantization interval (step size), q = B/L = B · 2^(−R) • Quantization function 156 Uniform Quantization
  • 157. Input signal is continuous • The output of a Charge-Coupled Device (CCD) camera is in the range of 0.0 to 5 volts. • L = 256 – q = 5/256 – An output value in the interval (l × q, (l + 1) × q) is represented by index l, l = 0, …, 255. – The reconstruction level: Q(f) = ⌊(f − f_min)/q⌋ × q + q/2 + f_min → r_l = l × q + q/2, l = 0, …, 255. 157 Example 1 of Uniform Quantizer
  • 158. Input signal is discrete • A digital image of 256 gray levels is quantized into 4 levels – q = 256/4 = 64 – The reconstruction level: Q(f) = ⌊(f − f_min)/q⌋ × q + q/2 + f_min → Q(f) = ⌊f/64⌋ × 64 + 32 158 Example 2 of Uniform Quantizer
  • 160. 160 Uniform Threshold Quantiser (UTQ) − The class of quantiser that has been used in all standard video codecs is based around the so- called Uniform Threshold Quantiser (UTQ). − It has equal step sizes with reconstruction values pegged to the centroid of the steps. − The centroid value is typically defined midway between quantisation intervals. q q
  • 161. 161 Uniform Threshold Quantiser (UTQ) and Bit Rate Control − The DC coefficient has a fairly uniform distribution. − Although AC transform coefficients have nonuniform characteristics, and hence can be better quantised with nonuniform quantiser step sizes, but bit rate control would be easier if they were quantised linearly. − Hence, a key property of UTQ is that the step sizes can be easily adapted to facilitate bit rate control. q q
  • 162. 162 Uniform Threshold Quantiser (UTQ) Uniform Threshold Quantiser (UTQ) (a) with and (b) without dead zone UTQ-DZ UTQ
  • 163. 163 Uniform Threshold Quantiser (UTQ) − Typically, UTQ is used for quantising intraframe DC, F(0, 0), coefficients, while UTQ-DZ is used for the AC and the DC coefficients of the interframe prediction error. − This is intended primarily to cause more nonsignificant AC coefficients to become zero, thus increasing the compression. [Figure: UTQ, for quantising intraframe DC, F(0, 0), coefficients; UTQ-DZ, for quantising the AC and the DC coefficients of the interframe prediction error]
  • 164. 164 Uniform Threshold Quantiser (UTQ) − Both quantisers are derived from the generic quantiser: in UTQ, th is set to zero, but in UTQ-DZ it is set to q/2, and in the innermost region th is allowed to vary between q/2 and q, just to increase the number of zero-valued outputs. → Thus, the dead zone length can range from q to 2q. − In some implementations (e.g. H.263 or MPEG-4), the decision and/or the reconstruction levels of the UTQ-DZ quantiser might be shifted by q/4 or q/2.
  • 165. − In practice, rather than transmitting a quantised coefficient F(u, v) to the decoder, its ratio to the quantiser step size, called the Quantisation Index, is transmitted: I(u, v) = F(u, v) / q − The reason for defining the quantisation index is that it has a much smaller entropy than the quantised coefficient. At the decoder, the reconstructed coefficients, F_q(u, v), after inverse quantisation, are given by F_q(u, v) = I(u, v) · q − If required, depending on the polarity of the index, an addition or subtraction of half the quantisation step is needed to deliver the centroid representation, reflecting the quantisation characteristics in the previous slide. 165 Quantization Index Quantizer (Mapping)
  • 166. − For the standard codecs, the quantiser step size q is fixed at 8 for UTQ, but varies from 2 to 62, in even step sizes, for the UTQ- DZ (2,4,6,8,…,60,62). − Hence, the entire quantiser range, or the quantiser parameter Qp, can be defined with 5 bits (1–31). − Uniform quantisers with and without dead zone can also be used in DPCM coding of pixels. Here, threshold is set to zero, th=0, and the quantisers are usually identified with odd and even number of levels, respectively. 166 Quantization Step Size even number of levels odd number of levels
  • 167. One of the main problems of linear quantisers in DPCM is that at lower bit rates the number of quantisation levels is limited and hence the quantiser step size is large. In the coding of plain areas of the picture (in plain areas the DPCM output is near zero): − If a quantiser with an even number of levels is used, the reconstructed pixels oscillate between −q/2 and +q/2. − This type of noise in these areas, in particular at low luminance levels, is visible and is called granular noise. − Larger quantiser step sizes with an odd number of levels (dead zone) reduce the granular noise, but cause loss of pixel resolution in the plain areas. − This type of noise, when the quantiser step size is relatively large, is annoying and is called contouring noise. 167 Granular and Contouring Noises even number of levels odd number of levels
  • 168. Banding/contouring and granular noise − It can be seen that when the original analog input signal has a relatively constant amplitude, the reconstructed signal has variations that were not present in the original signal. 168 Granular and Contouring Noises [Figure: the same signal quantised at 8 bits (256 levels) and at 10 bits (1024 levels)]
  • 169. 169 Quantisation [Figure: an 8×8 block of DCT coefficients (DC = 238) divided element-by-element by a quantisation matrix of all 1s — the quantised coefficients are unchanged] Quantisation Matrix, Quantised DCT Coefficients, DCT Coefficients, Different Step-sizes (Q)
  • 170. 170 Quantisation [Figure: the same DCT coefficients divided by a matrix whose bottom-right (highest-frequency) step sizes grow to 2 — a few high-frequency values shrink]
  • 171. 171 Quantisation [Figure: step sizes now reach 4 in the high-frequency corner — more small coefficients become zero] Zig-zag Scanning for Separating Redundancy and Entropy
  • 172. 172 Quantisation [Figure: step sizes up to 8 — the non-zero values retreat further toward the DC corner] Zig-zag Scanning for Separating Redundancy and Entropy
  • 173. 173 Quantisation [Figure: step sizes up to 16 — only low-frequency coefficients survive] Zig-zag Scanning for Separating Redundancy and Entropy
  • 174. 174 Quantisation [Figure: step sizes up to 32 — almost all remaining energy sits in a handful of low-frequency coefficients] Zig-zag Scanning for Separating Redundancy and Entropy
  • 175. Zig-zag Scanning 175 DC and low-frequency coefficients come first and the high-frequency coefficients last.
  • 176. Zig-zag Scanning for Separating Redundancy and Entropy 176 [Figure: the scan starts in the entropy (low-frequency) region and ends in the redundancy (high-frequency) region] DC and low-frequency coefficients come first and the high-frequency coefficients last.
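  The zig-zag order itself is easy to generate; the following is a small sketch (pure Python) that sorts the block positions by anti-diagonal, alternating the traversal direction along each diagonal as in the MPEG-2 scan:

  ```python
  def zigzag_order(n=8):
      """Coordinates of an n x n block in zig-zag scan order: DC and the
      low-frequency coefficients first, high frequencies last."""
      return sorted(((i, j) for i in range(n) for j in range(n)),
                    key=lambda p: (p[0] + p[1],                 # anti-diagonal
                                   p[0] if (p[0] + p[1]) % 2    # odd: go down
                                   else p[1]))                  # even: go up

  print(zigzag_order(4)[:8])
  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2)]
  ```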
  • 177. − Use a uniform quantiser for each coefficient − Different coefficients are quantized with different step sizes (Q): − The human eye is more sensitive to low-frequency components • Low-frequency coefficients get a smaller Q • High-frequency coefficients get a larger Q − Specified in a normalization matrix (Standard Quantization Matrix) − The normalization matrix can then be scaled by a scale factor 177 Different Step-sizes (Q) (JPEG Standard Quantization Matrix)
  • 178. In JPEG we have quality levels from 1 to 100. − With a quality level of 50 we get high compression and excellent decompressed image quality (Standard Quantization Matrix). − For a quality level greater than 50 (less compression, higher image quality), the standard quantization matrix is multiplied by (100 − Quality Level)/50 − For a quality level less than 50 (more compression, lower image quality), the standard quantization matrix is multiplied by 50/Quality Level − The quantization matrix is then rounded and clipped to have positive integer values ranging from 1 to 255. 178 Ex: Different Quality Levels in JPEG by Quantization Matrix (JPEG Standard Quantization Matrix)
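  A sketch of the quality scaling rule just described, assuming numpy; the Q50 array below is the widely published JPEG luminance example matrix, assumed here to match the slides' standard quantization matrix.

  ```python
  import numpy as np

  # the widely published 8x8 JPEG luminance quantization matrix (Q50)
  Q50 = np.array([
      [16, 11, 10, 16,  24,  40,  51,  61],
      [12, 12, 14, 19,  26,  58,  60,  55],
      [14, 13, 16, 24,  40,  57,  69,  56],
      [14, 17, 22, 29,  51,  87,  80,  62],
      [18, 22, 37, 56,  68, 109, 103,  77],
      [24, 35, 55, 64,  81, 104, 113,  92],
      [49, 64, 78, 87, 103, 121, 120, 101],
      [72, 92, 95, 98, 112, 100, 103,  99]])

  def quality_matrix(quality):
      """Scale Q50 as on the slide: (100 - q)/50 above 50, 50/q below,
      then round and clip to positive integers in 1..255."""
      scale = (100 - quality) / 50 if quality > 50 else 50 / quality
      return np.clip(np.round(Q50 * scale), 1, 255).astype(int)

  print(quality_matrix(90)[0])   # finer steps, e.g. [ 3  2  2  3  5  8 10 12]
  ```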
  • 179. Level-shift by subtracting 128 from each array entry, because the DCT is designed to work on pixel values ranging from −128 to 127. The transform is then D = T M Tᵀ. 179 Ex: Different Quality Levels in JPEG by Quantization Matrix
  • 180. 180 Ex: Different Quality Level in JPEG by Quantization Matrix (Standard Quantization Matrix)
  • 181. 181 (Standard Quantization Matrix) Ex: Quantization with Matrix Q50 in JPEG
  • 182. 182 Ex: Inverse Quantitation in JPEG (Standard Quantization Matrix)
  • 183. 183 Ex: Inverse DCT and adding 128 in JPEG
  • 184. 184 Ex: Comparison between Original and Decompressed Block
  • 188. Example: Quantized Indices Default Normalization Matrix in JPEG 188 (Standard Quantization Matrix) QM(i,j)
  • 189. The ratios of the quantized coefficients to their quantizer step sizes (the elements of the previous normalization matrix) give the indices. Example: Quantized Indices 189
  • 190. Multiplying the indices by the step sizes gives the quantized coefficient values to be used for the inverse transform. Example: Quantized Coefficients 190
  • 192. 192 Quantization Noise and Bit Resolution
  • 193. 193 Quantization Noise [Figure: zoom-in of the quantiser staircase, slope = 1] − Pink dots show the analog range that maps to a single ADC value. − Black arrows show the quantization error for 2 points. [Figure: PDF of the quantization error]
  • 194. − Quantization error is uniformly distributed. − Its PDF integrates to 1. 194 Quantization Noise [Figure: transfer staircase with slope = 1 and the uniform error PDF]
  • 195. − The RMS value for a full-scale sinusoidal input spanning 2^N · Δ is (2^N · Δ)/(2√2), while the RMS quantization noise is Δ/√12. − Then SQNR (dB) = 20 log₁₀[(2^N · Δ / 2√2) / (Δ/√12)] ≈ 6.02 N + 1.76 dB 195 Quantization Noise and SQNR
  • 196. 196 PSNR for a Sine Waveform SQNR = 10 log [RMS Signal Power / RMS Quantization Noise Power] = 6B + 1.78 PSNR = 10 log [Peak Signal Power / RMS Quantization Noise Power] = ? For a sine of amplitude A (peak-to-peak 2A): Peak Signal Power / RMS Quantization Noise Power = [(2A)² / (A/√2)²] × [RMS Signal Power / RMS Quantization Noise Power] = 8 × [RMS Signal Power / RMS Quantization Noise Power] Hence PSNR = 10 log 8 + (6B + 1.78) ≈ 6B + 11 (dB)
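  Both formulas can be checked numerically; a small sketch assuming numpy, with B = 8 bits and a full-scale sine of amplitude A:

  ```python
  import numpy as np

  B, A = 8, 1.0                       # bits per sample; sine amplitude
  delta = 2 * A / 2**B                # step size over the full range 2A
  noise_rms = delta / np.sqrt(12)     # RMS of the uniform quantization error
  sine_rms = A / np.sqrt(2)           # RMS of a full-scale sinusoid
  sqnr = 20 * np.log10(sine_rms / noise_rms)
  psnr = 20 * np.log10(2 * A / noise_rms)   # peak signal = full range 2A
  print(round(sqnr, 2), round(psnr, 2))     # ~49.92 dB (~6B+1.8) and ~58.96 dB (~6B+11)
  ```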
  • 198. 198 Elementary Information Theory − How much information does a symbol convey? − Intuitively, the more unpredictable or surprising it is, the more information is conveyed. − Conversely, if we strongly expected something, and it occurs, we have not learnt very much
  • 199. 199 Elementary Information Theory − If p is the probability that a symbol will occur − The amount of information, I, conveyed is: − The information, I, is measured in bits − It is the optimum code length for the symbol − The entropy, H, is the average information per symbol − Provides a lower bound on the compression that can be achieved 𝑰 = 𝐥𝐨𝐠 𝟐 𝟏 𝒑 𝐻 = ෍ 𝑠 𝑝 𝑠 log2 1 𝑝(𝑠)
  • 200. 200 Elementary Information Theory A simple example − Suppose we need to transmit four possible weather conditions: 1. Sunny 2. Cloudy 3. Rainy 4. Snowy − If all conditions are equally likely, p(s)=0.25→H=2 – i.e. we need a minimum of 2 bits per symbol
  • 201. 201 Elementary Information Theory A simple example − Suppose we need to transmit four possible weather conditions: 1. Sunny 0.5 of the time 2. Cloudy 0.25 of the time 3. Rainy 0.125 of the time 4. Snowy 0.125 of the time − Then the entropy is H = 0.5 log₂(1/0.5) + 0.25 log₂(1/0.25) + 2 × 0.125 log₂(1/0.125) = 0.5 + 0.5 + 0.75 = 1.75 − i.e. we need a minimum of 1.75 bits per symbol
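  The entropy of both weather examples, as a small sketch in pure Python:

  ```python
  import math

  def entropy(probs):
      """H = sum over symbols of p * log2(1/p): minimum average bits/symbol."""
      return sum(p * math.log2(1 / p) for p in probs if p > 0)

  print(entropy([0.25, 0.25, 0.25, 0.25]))    # 2.0  (equally likely case)
  print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 (skewed case above)
  ```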
  • 202. – It reduces the amount of data or the bit rate. – Truly lossless. – Different types: • Fractal Coding • Run Length Coding (RLC) or Run-Level Encoding • Variable Length Coding (VLC) – [i.e. Huffman/Arithmetic] • Wavelet Coding – Compression systems often do not use all of them together. – Some systems combine different types. Entropy Coding 202
  • 203. − Resulting from studies by Benoit Mandelbrot. − Images are self-similar. − Self-similar shapes are called fractals. − Scale, stretch, rotate, mirror and skew actions are possible. − Computationally intensive. − Requires multiple sweeps. − Difficult to do on video in real time. Fractal Coding 203
  • 204. Run Length Coding • Replaces runs of the same number with a code … or … • particular strings of numbers with a code. 204 [Figure: e.g. a run of nineteen 2s is coded as 19[2] and a run of twenty-four 0s as 24[0]; alternatively, recurring strings such as 5472, 8745 and 6868 are each replaced by a short code]
  • 205. 205 Run Length Coding Sample Block Zigzag Scanning (MPEG-2) for doing RLC
  • 206. 206 Run Length Coding Sample Block Run-length Encoding (MPEG-2)
  • 207. The output of the re-ordering process of transform coefficient is an array that typically contains one or more clusters of non-zero coefficients near the start, followed by strings of zero coefficients. − The large number of zero values may be encoded to represent them more compactly. − The array of re-ordered coefficients are represented as (run,level) pairs where run: indicates the number of zeros preceding a non-zero coefficient. level: indicates the magnitude of the non-zero coefficient. 207 Run-Level Encoding
  • 208. Example 1. Input array: 16, 0, 0, −3, 5, 6, 0, 0, 0, 0, −7 2. Output values: (0,16), (2,−3), (0,5), (0,6), (4,−7) 3. Each of these output values (run, level) is encoded as a separate symbol by the entropy encoder. ‘Three-dimensional’ Run-level Encoding If ‘three-dimensional’ run-level encoding is used, each symbol encodes three quantities: run, level and last. In the example above, if −7 is the final non-zero coefficient, the 3-D values are: (0, 16, 0), (2, −3, 0), (0, 5, 0), (0, 6, 0), (4, −7, 1) The 1 in the final code indicates that this is the last non-zero coefficient in the block. 208 Run-Level Encoding
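  A minimal sketch of the run-level encoder (pure Python), reproducing the example above, plus the ‘three-dimensional’ variant with the last flag:

  ```python
  def run_level_encode(coeffs):
      """(run, level) pairs: run = number of zeros preceding each non-zero
      coefficient, level = its magnitude and sign, in scan order."""
      pairs, run = [], 0
      for c in coeffs:
          if c == 0:
              run += 1
          else:
              pairs.append((run, c))
              run = 0
      return pairs

  def run_level_last(coeffs):
      """'Three-dimensional' variant: last = 1 marks the final pair."""
      pairs = run_level_encode(coeffs)
      return [(r, l, 1 if k == len(pairs) - 1 else 0)
              for k, (r, l) in enumerate(pairs)]

  print(run_level_encode([16, 0, 0, -3, 5, 6, 0, 0, 0, 0, -7]))
  # [(0, 16), (2, -3), (0, 5), (0, 6), (4, -7)]
  print(run_level_last([16, 0, 0, -3, 5, 6, 0, 0, 0, 0, -7]))
  # [(0, 16, 0), (2, -3, 0), (0, 5, 0), (0, 6, 0), (4, -7, 1)]
  ```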
  • 209. Variable Length Coding 209 [Table: code table — 0 = 0, +1 = 101, −1 = 100, +2 = 1101, −2 = 1100, +3 = 11101, −3 = 11100, +4 = 111101, −4 = 111100, +5 = 1111101, −5 = 1111100, …] Original numbers: +1 −3 0 0 +4 −5 +2 −1 0 +1 +3 → Codes: 101111000011110111111001101100010111101 (11 numbers × 8 bits = 88 bits fixed-length vs. 39 bits with VLC) Commonly occurring numbers get short codes; rarely occurring numbers get longer codes.
  • 210. Variable Length Coding 210 [Figure: decoding the bit string 101111000011110111111001101100010111101 with the same code table regenerates the numbers one by one: +1, −3, 0, 0, +4, −5, +2, −1, 0, +1, +3] Commonly occurring numbers have short codes; rarely occurring numbers have longer codes.
  • 211. 211 Variable Length Coding Variable-length Encoding of Sample Block Coefficients (MPEG-2)
  • 212. – True data reduction. – Totally lossless. – Replaces numbers with codes. • Run length coding can also be called entropy coding. – Commonly occurring numbers have a small code & rare numbers have a bigger code. – Relies on common numbers occurring a lot. Variable Length Coding 212
• 213. Variable Length Coding − In VLC, the lengths of the codes should vary inversely with the probability of occurrence of the various symbols. − The bit rate required to code a symbol of probability p is the logarithm of the inverse of that probability, log₂(1/p) bits. − Hence the entropy of the symbols, which is the minimum average number of bits required to code them, can be calculated as H(x) = Σₛ p(s)·log₂(1/p(s)) = −Σᵢ Pᵢ·log₂ Pᵢ − There are two types of VLC: Huffman and arithmetic coding. − Huffman coding is a simple VLC, but its compression can never reach as low as the entropy, due to the constraint that each assigned codeword must have an integral number of bits. − Arithmetic coding, however, can approach the entropy, since the symbols are not coded individually.
• 214. Huffman Coding − Huffman codes can be used to compress information − Like WinZip – although WinZip doesn't use the Huffman algorithm − JPEGs do use Huffman as part of their compression process − The basic idea is that instead of storing each character in a file as an 8-bit ASCII value, we store the more frequently occurring characters using fewer bits and the less frequently occurring characters using more bits − On average this should decrease the file size (often to roughly half)
• 215. Huffman Coding − As an example, let's take the string: “duke blue devils” − First, a frequency count of the characters: e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1 − Next, use a greedy algorithm to build up a Huffman tree − We start with a node for each character: e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1
• 216. Huffman Coding — Pick the two nodes with the smallest frequencies and combine them to form a new node – the selection of these nodes is the greedy part • The two selected nodes are removed from the set and replaced by the combined node • This continues until only one node is left in the set (first merge: i,1 and s,1)
• 217.–226. Huffman Coding — these slides step through the greedy merging graphically: (i,1)+(s,1) → 2; (b,1)+(v,1) → 2; (k,1)+(b,v) → 3; (d,2)+(u,2) → 4; (l,2)+(sp,2) → 4; (i,s)+(k,b,v) → 5; (e,3)+(d,u) → 7; (l,sp)+(i,s,k,b,v) → 9; and finally 7+9 → 16, the root node covering all 16 characters.
• 227. Huffman Coding − Now we assign codes to the tree by placing – 0 on every left branch – 1 on every right branch − A traversal of the tree from root to leaf gives the Huffman code for that leaf's character − Note that no code is the prefix of another code. Resulting code table (for e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1): e 00, d 010, u 011, l 100, sp 101, i 1100, s 1101, k 1110, b 11110, v 11111
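For reference, the greedy construction is only a few lines of Python. This is my own illustrative sketch; heap tie-breaking means the exact codewords may differ from the slide, although the total encoded length comes out equally optimal:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        freq = Counter(text)
        # Heap entries: (frequency, tiebreak, tree); a tree is either a
        # leaf character or a (left, right) pair of subtrees.
        heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)   # two lowest-frequency nodes
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, count, (left, right)))
            count += 1
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")     # 0 on every left branch
                walk(node[1], prefix + "1")     # 1 on every right branch
            else:
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    codes = huffman_codes("duke blue devils")
    print(sum(len(codes[ch]) for ch in "duke blue devils"))
    # -> 52 bits, against 16 x 8 = 128 bits uncompressed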
• 228. Huffman Coding − These codes are then used to encode the string − Thus, “duke blue devils” turns into: 010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101 − When grouped into 8-bit bytes: 01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx − Thus it takes 7 bytes of space compressed (52 bits plus padding) − Compare that with 16 characters at 1 byte/char → 16 bytes uncompressed
• 229. Huffman Coding − Uncompressing works by reading the file bit by bit • Start from the root of the tree • If a 0 is read, head left • If a 1 is read, head right • When a leaf is reached, decode that character and start over again at the root of the tree − Thus, we need to save the Huffman table information as a header in the compressed file • This doesn't add a significant amount of size for large files (which are the ones you want to compress anyway) • Or we could use a fixed, universal set of codes/frequencies
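A minimal Python sketch of this bit-by-bit walk, assuming the code table has been recovered from the file header (the trie-building approach and names are my own illustration):

    def huffman_decode(bits, codes):
        # Build a trie from the code table: 0 heads left, 1 heads right.
        root = {}
        for ch, code in codes.items():
            node = root
            for b in code[:-1]:
                node = node.setdefault(b, {})
            node[code[-1]] = ch
        # Walk the trie; on reaching a leaf, emit it and restart at the root.
        out, node = [], root
        for b in bits:
            node = node[b]
            if not isinstance(node, dict):
                out.append(node)
                node = root
        return "".join(out)

    # Using the code table from the slide:
    table = {"e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
             "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111"}
    bits = "".join(table[ch] for ch in "duke blue devils")
    print(huffman_decode(bits, table))  # -> duke blue devils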
• 230. Example: Huffman Coding, Sequence of Motion Vectors − The table lists the probabilities of the most commonly occurring motion vectors in the encoded sequence and their information content, log₂(1/p). − To achieve optimum compression, each value should be represented with exactly log₂(1/p) bits. − '0' is the most common value, and the probability drops for larger motion vectors.
• 231. Example: Huffman Coding, Sequence of Motion Vectors — 1. Generating the Huffman code tree − To generate a Huffman code table for this set of data, the following iterative procedure is carried out; it is repeated until there is a single 'root' node that contains all other nodes and data items listed 'beneath' it. 1. Order the list of data in increasing order of probability. 2. Combine the two lowest-probability data items into a 'node' and assign the joint probability of the data items to this node. 3. Re-order the remaining data items and node(s) in increasing order of probability and repeat step 2.
• 232. Example: Huffman Coding, Sequence of Motion Vectors — 1. Generating the Huffman code tree (cont.) − Original list: The data items are shown as square boxes. Vectors (−2) and (+2) have the lowest probability and these are the first candidates for merging, to form node 'A'. − Stage 1: The newly created node 'A', shown as a circle, has a probability of 0.2, from the combined probabilities of (−2) and (+2). There are now three items with probability 0.2; choose vectors (−1) and (+1) and merge them to form node 'B'. − Stage 2: 'A' now has the lowest probability (0.2), followed by 'B' and the vector (0); choose 'A' and 'B' as the next candidates for merging, to form 'C'. − Stage 3: Node 'C' and vector (0) are merged to form 'D'. − Final tree: The data items have all been incorporated into a binary 'tree' containing five data values and four nodes. Each data item is a 'leaf' of the tree.
• 233. Example: Huffman Coding, Sequence of Motion Vectors — 2. Encoding − Each 'leaf' of the binary tree is mapped to a variable-length code. To find this code, the tree is traversed from the root node ('D' in this case) to the leaf or data item. − For every branch, a 0 or 1 is appended to the code: 0 for an upper branch, 1 for a lower branch. − The lengths of the Huffman codes, each an integral number of bits, do not match the ideal lengths given by log₂(1/p). − For example, the series of vectors (1, 0, −2) would be transmitted as the binary sequence 0111000.
• 234. Example: Huffman Coding, Sequence of Motion Vectors — 3. Decoding − The decoder must have a local copy of the Huffman code tree or look-up table. (Once the tree has been generated during encoding, the codes may be stored in a look-up table.) − This may be achieved by transmitting the look-up table itself, or by sending the list of data and probabilities prior to sending the coded data. − Each uniquely decodable code is converted back to the original data.
• 236. Pre-calculated Huffman-based Coding − The Huffman coding process has two disadvantages for a practical video CODEC. I. The encoder needs to transmit the information contained in the probability table before the decoder can decode the bit stream, and this extra overhead reduces compression efficiency, particularly for shorter video sequences. II. The probability table for a large video sequence (needed to generate the Huffman tree) cannot be calculated until after the video data is encoded, which may introduce an unacceptable delay into the encoding process. − For these reasons, image and video coding standards define sets of codewords based on the probability distributions of 'generic' video material. − The main differences from 'true' Huffman coding are: I. The codewords are pre-calculated based on 'generic' probability distributions. II. In the case of TCOEF (transform coefficients), only 102 commonly occurring symbols have defined codewords, and any other symbol is encoded using a fixed-length code.
• 237. Pre-calculated Huffman-based Coding − The following two examples of pre-calculated VLC tables are taken from MPEG-4 Visual (Simple Profile): MPEG-4 Visual Transform Coefficient (TCOEF) VLCs (partial — all codes < 9 bits) and MPEG-4 Motion Vector Difference (MVD) VLCs. (Some of the TCOEF codes shown in the table are also represented in 'tree' form in the accompanying figure.)
• 238. Problems with Huffman − The minimum codeword length that can be assigned is 1 bit, but the information content of a highly probable symbol can be much less (e.g. for p = 0.95, log₂(1/0.95) ≈ 0.074 bits). − A scheme using an integral number of bits for each data symbol, such as Huffman coding, is therefore unlikely to come close to the optimum number of bits. − Fractional bits can only be assigned if symbols are coded together: some with many bits and some with (close to) zero bits. − This is possible if nearly zero bits are assigned to the highly probable symbols. − Arithmetic coding does this!
• 239. Arithmetic Coding – A form of variable length coding. – Compresses better than Huffman coding. – Takes longer to do than Huffman coding. – More delicate than Huffman coding (a single bit error can corrupt the rest of the message). – More limiting than Huffman coding. – Subject to patents and royalty payments (IBM, AT&T, Mitsubishi).
• 240. Arithmetic Coding — The fundamental idea is to use a scale on which the coding intervals of real numbers between 0 and 1 are represented. – This is in fact the cumulative probability distribution of all the symbols, which adds up to 1. – The interval needed to represent the message becomes smaller as the message becomes longer, and the number of bits needed to specify that interval increases. – According to the symbol probabilities generated by the model, the size of the interval is reduced by successive symbols of the message. – The more likely symbols reduce the range less than the less likely ones, and hence they contribute fewer bits to the message.
• 241. Arithmetic Coding – Once the symbol probabilities are known, each individual symbol is assigned a portion of the [0, 1) range that corresponds to its probability of appearance in the cumulative distribution. – Each character's range is [lower, upper). – The most significant portion of an arithmetic-coded message is the first symbol to be encoded. – Ex: for the message eaii!, the first symbol to be coded is e; the symbol !, known to both encoder and decoder, marks the end of the message, and the decoding process terminates when it is decoded. – After the first character is encoded, the lower and upper numbers bound the range of the output. – Each new symbol to be encoded further restricts the possible range of the output number during the rest of the encoding process.
• 242. Arithmetic Coding — Example 1: to code the set of symbols eaii! – To explain how arithmetic coding works, a fixed-model arithmetic code is used for easy illustration. – Suppose the alphabet is {a, e, i, o, u, !} and the fixed model has the probabilities shown in the table:
Symbol | Probability | Range
a | 0.2 | [0.0, 0.2)
e | 0.3 | [0.2, 0.5)
i | 0.1 | [0.5, 0.6)
o | 0.2 | [0.6, 0.8)
u | 0.1 | [0.8, 0.9)
! | 0.1 | [0.9, 1.0)
New character → range: initially [0, 1); after e [0.2, 0.5); after a [0.2, 0.26); after i [0.23, 0.236); after i [0.233, 0.2336); after ! [0.23354, 0.2336).
– Ex: the final coded message has to be a number greater than or equal to 0.2 and less than 0.5, because of e. The final range, [0.23354, 0.2336), represents the message eaii!: if we transmit any number x with 0.23354 ≤ x < 0.2336, that number represents the whole message eaii!.
• 243. Example 1: to code the set of symbols eaii! — the interval is narrowed one symbol at a time (low′ = low + s_low × range, high′ = low + s_high × range):
e: low = 0.0 + 0.2×1.0 = 0.2, high = 0.0 + 0.5×1.0 = 0.5 → [0.2, 0.5), range 0.3
a: low = 0.2 + 0.0×0.3 = 0.2, high = 0.2 + 0.2×0.3 = 0.26 → [0.2, 0.26), range 0.06
i: low = 0.2 + 0.5×0.06 = 0.23, high = 0.2 + 0.6×0.06 = 0.236 → [0.23, 0.236), range 0.006
i: low = 0.23 + 0.5×0.006 = 0.233, high = 0.23 + 0.6×0.006 = 0.2336 → [0.233, 0.2336), range 0.0006
!: low = 0.233 + 0.9×0.0006 = 0.23354, high = 0.233 + 1.0×0.0006 = 0.2336 → [0.23354, 0.2336)
The final range, [0.23354, 0.2336), represents the message eaii!: any transmitted number in 0.23354 ≤ x < 0.2336 (e.g. 0.23355) represents the whole message.
• 244. Example 2: to code the set of symbols aii! (same symbol table):
a: low = 0.0 + 0.0×1.0 = 0.0, high = 0.0 + 0.2×1.0 = 0.2 → [0.0, 0.2), range 0.2
i: low = 0.0 + 0.5×0.2 = 0.1, high = 0.0 + 0.6×0.2 = 0.12 → [0.1, 0.12), range 0.02
i: low = 0.1 + 0.5×0.02 = 0.11, high = 0.1 + 0.6×0.02 = 0.112 → [0.11, 0.112), range 0.002
!: low = 0.11 + 0.9×0.002 = 0.1118, high = 0.11 + 1.0×0.002 = 0.112 → [0.1118, 0.112)
The final range, [0.1118, 0.112), represents the message aii!: any transmitted number in 0.1118 ≤ x < 0.112 (e.g. 0.1119) represents the whole message.
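Both narrowings above can be reproduced with a few lines of Python. This is a minimal sketch using the fixed model from the table (names are my own; real coders use integer arithmetic with renormalisation rather than floats, which would lose precision on long messages):

    # Fixed model from the table: symbol -> [lower, upper) sub-range
    RANGES = {"a": (0.0, 0.2), "e": (0.2, 0.5), "i": (0.5, 0.6),
              "o": (0.6, 0.8), "u": (0.8, 0.9), "!": (0.9, 1.0)}

    def arith_encode(message):
        # Narrow [low, high) once per symbol; any number in the final
        # interval identifies the whole message.
        low, high = 0.0, 1.0
        for ch in message:
            span = high - low
            s_low, s_high = RANGES[ch]
            low, high = low + s_low * span, low + s_high * span
        return low, high

    print(arith_encode("eaii!"))  # ~(0.23354, 0.2336), bar float rounding
    print(arith_encode("aii!"))   # ~(0.1118, 0.112)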
• 245. Arithmetic Coding — decoding for Example 1 − In general, the decoding process can be formulated as Rₙ₊₁ = (Rₙ − Lₙ) / (Uₙ − Lₙ) • where Rₙ is a code within the range [Lₙ, Uₙ) of the nth symbol, and Rₙ₊₁ is the code for the next symbol. Using the symbol ranges from the table above:
Received code = 0.23355 → in [0.2, 0.5) → output e
R = (0.23355 − 0.2)/(0.5 − 0.2) ≈ 0.11183 → in [0.0, 0.2) → output a
R = (0.11183 − 0.0)/(0.2 − 0.0) ≈ 0.55917 → in [0.5, 0.6) → output i
R = (0.55917 − 0.5)/(0.6 − 0.5) ≈ 0.59167 → in [0.5, 0.6) → output i
R = (0.59167 − 0.5)/(0.6 − 0.5) ≈ 0.91667 → in [0.9, 1.0) → output !
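A matching decoder sketch, directly implementing Rₙ₊₁ = (Rₙ − Lₙ)/(Uₙ − Lₙ). The symbol count is passed explicitly here for simplicity; in Example 1 the '!' symbol would serve as the natural stopping point:

    RANGES = {"a": (0.0, 0.2), "e": (0.2, 0.5), "i": (0.5, 0.6),
              "o": (0.6, 0.8), "u": (0.8, 0.9), "!": (0.9, 1.0)}

    def arith_decode(code, n_symbols):
        # Find the sub-range containing the code, emit its symbol,
        # then rescale the code back into [0, 1) and repeat.
        out = []
        for _ in range(n_symbols):
            for ch, (low, high) in RANGES.items():
                if low <= code < high:
                    out.append(ch)
                    code = (code - low) / (high - low)
                    break
        return "".join(out)

    print(arith_decode(0.23355, 5))  # -> eaii!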
• 246. Example 3 — Motion vectors, sequence 1: probabilities and sub-ranges (figure). The sub-ranges used in the worked example on the following slides are: vector −1 → [0.1, 0.3), 0 → [0.3, 0.7), +2 → [0.9, 1.0); the remaining gaps, [0.0, 0.1) and [0.7, 0.9), correspond to vectors −2 and +1.
• 247. Example 3 — encoding procedure for the vector sequence (0, −1, 0, 2):
0: low = 0 + 0.3×1 = 0.3, high = 0 + 0.7×1 = 0.7 → [0.3, 0.7), range 0.4
−1: low = 0.3 + 0.1×0.4 = 0.34, high = 0.3 + 0.3×0.4 = 0.42 → [0.34, 0.42), range 0.08
0: low = 0.34 + 0.3×0.08 = 0.364, high = 0.34 + 0.7×0.08 = 0.396 → [0.364, 0.396), range 0.032
+2: low = 0.364 + 0.9×0.032 = 0.3928, high = 0.364 + 1×0.032 = 0.396 → [0.3928, 0.396)
Any number in [0.3928, 0.396), e.g. 0.394, identifies the whole sequence.
• 248. Example 3 — decoding procedure, using Rₙ₊₁ = (Rₙ − Lₙ)/(Uₙ − Lₙ):
Received code = 0.394 → in [0.3, 0.7) → output 0
R = (0.394 − 0.3)/(0.7 − 0.3) = 0.235 → in [0.1, 0.3) → output −1
R = (0.235 − 0.1)/(0.3 − 0.1) = 0.675 → in [0.3, 0.7) → output 0
R = (0.675 − 0.3)/(0.7 − 0.3) = 0.9375 → in [0.9, 1.0) → output +2
(Note that nothing in the transmitted number itself marks the end of the sequence: the decoder must be told when to stop, e.g. by a known symbol count or an end-of-sequence symbol.)
• 249. Example 3 (cont.) — the principal advantage of arithmetic coding − The transmitted number (0.394 in this case), which may be represented as a fixed-point number with sufficient accuracy using 9 bits, is not constrained to an integral number of bits for each transmitted data symbol. − To achieve optimal compression, the sequence of data symbols (0, −1, 0, 2) should be represented with −(log₂P₀ + log₂P₋₁ + log₂P₀ + log₂P₂) = 8.28 bits. − In this example, arithmetic coding achieves 9 bits, which is close to optimum.
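As a check, taking the probabilities implied by the sub-range widths in this example (an inference from the figure: P₀ = 0.4, P₋₁ = 0.2, P₂ = 0.1), the optimum works out as −(log₂0.4 + log₂0.2 + log₂0.4 + log₂0.1) = 1.322 + 2.322 + 1.322 + 3.322 ≈ 8.288 bits, the 8.28 figure quoted above.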