Student I.D.: 1000-616 811
Date: April 29, 2010
Encoder block diagram of H.264
Decoder block diagram of H.264 
Variable block sizes used for motion compensation 
• Intra 4*4 prediction modes and prediction directions 
Intra 4*4 prediction supports 9 modes.
Mode 2 is the DC mode.
I through M are the previously encoded neighboring pixels used to form the prediction.
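A minimal C sketch of the three simplest 4*4 modes is shown below, assuming all neighboring pixels are available. Following the usual H.264 labelling, `above` holds pixels A to D (the row above the block) and `left` holds pixels I to L (the column to its left). The function name `intra4x4_pred` is hypothetical and is not taken from JM; the six directional modes (3 to 8), which also use pixels E to H and M, are left out.

```c
#include <stdint.h>

/* Hypothetical helper: fills pred[4][4] for 4x4 intra modes 0-2.
   above[0..3] are neighbours A-D (row above the block),
   left[0..3]  are neighbours I-L (column left of the block).
   Assumes both neighbour sets are available.
   Returns 0 on success, -1 for modes not handled in this sketch. */
int intra4x4_pred(int mode, const uint8_t above[4], const uint8_t left[4],
                  uint8_t pred[4][4])
{
    int x, y, sum = 0;

    switch (mode) {
    case 0: /* vertical: copy A-D down each column */
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = above[x];
        return 0;
    case 1: /* horizontal: copy I-L across each row */
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = left[y];
        return 0;
    case 2: /* DC: rounded mean of the 8 neighbours */
        for (x = 0; x < 4; x++)
            sum += above[x] + left[x];
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                pred[y][x] = (uint8_t)((sum + 4) >> 3);
        return 0;
    default: /* modes 3-8 (diagonal directions) are not sketched here */
        return -1;
    }
}
```

With above = {10, 20, 30, 40} and left = {50, 60, 70, 80}, the DC mode fills the block with (360 + 4) >> 3 = 45.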
Intra 16*16 prediction supports 4 modes.
Intra 16*16 prediction modes and prediction directions
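For the 16*16 DC mode (DC_PRED_16), the prediction is the rounded mean of the 16 pixels above and the 16 pixels to the left of the MB. The sketch below assumes both neighbor sets are available; the standard also defines fallbacks (using only one side, or a constant 128 for 8-bit video when neither is available) that are omitted here. The name `intra16x16_dc` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: Intra 16x16 DC prediction (mode 2).
   above[0..15] is the row above the macroblock, left[0..15] the column
   to its left; both are assumed available. Returns the single value
   that fills all 256 predicted pixels. */
uint8_t intra16x16_dc(const uint8_t above[16], const uint8_t left[16])
{
    int i, sum = 0;

    for (i = 0; i < 16; i++)
        sum += above[i] + left[i];
    return (uint8_t)((sum + 16) >> 5); /* rounded mean of 32 samples */
}
```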
• Together, evaluating every candidate mode, comparing their costs and
performing the exhaustive search inside ME consume a large share of the
encoder's running time.
• Complex, exhaustive ME evaluation is key to the good compression
performance of H.264,
• but the price is paid in encoding time.
• Data mining is used to reduce the complexity of coding mode decisions.
• A set of metrics is chosen based on observed structural
similarities in video:
• Frame residual information
• Correlation with the mode type of the collocated MB
• Mean, variance
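To see why exhaustive ME is so costly, here is a minimal sketch of integer-pel full-search block matching using the sum of absolute differences (SAD). It also shows one hypothetical way to compute the frame residual feature: the SAD against the collocated MB of the previous frame, i.e. at zero motion. All function names are illustrative; JM's actual search additionally weighs the motion-vector rate and offers faster search patterns, which this sketch does not model.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD between the 16x16 block at (bx, by) in cur and the block
   displaced by (dx, dy) in ref. Frames are 8-bit luma, row-major,
   `width` pixels wide. */
long sad16x16(const uint8_t *cur, const uint8_t *ref, int width,
              int bx, int by, int dx, int dy)
{
    long sad = 0;
    int x, y;

    for (y = 0; y < 16; y++)
        for (x = 0; x < 16; x++)
            sad += abs(cur[(by + y) * width + bx + x] -
                       ref[(by + dy + y) * width + bx + dx + x]);
    return sad;
}

/* Hypothetical frame residual feature: SAD against the collocated MB
   of the previous frame (zero motion). */
long mb_frame_residual(const uint8_t *cur, const uint8_t *prev, int width,
                       int bx, int by)
{
    return sad16x16(cur, prev, width, bx, by, 0, 0);
}

/* Exhaustive search over +/-range integer-pel displacements. Stores the
   lowest-SAD vector in *mvx, *mvy and returns the number of candidate
   positions that were evaluated. */
int full_search(const uint8_t *cur, const uint8_t *ref, int width, int height,
                int bx, int by, int range, int *mvx, int *mvy)
{
    long best = LONG_MAX;
    int dx, dy, evaluated = 0;

    *mvx = 0;
    *mvy = 0;
    for (dy = -range; dy <= range; dy++) {
        for (dx = -range; dx <= range; dx++) {
            long sad;
            /* skip candidates that fall outside the reference frame */
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + 16 > width || by + dy + 16 > height)
                continue;
            sad = sad16x16(cur, ref, width, bx, by, dx, dy);
            evaluated++;
            if (sad < best) {
                best = sad;
                *mvx = dx;
                *mvy = dy;
            }
        }
    }
    return evaluated;
}
```

With a search range of ±16, a single 16*16 partition already needs 33 × 33 = 1089 SAD evaluations, i.e. 278,784 absolute differences per reference frame, before the smaller partitions and the intra modes are considered.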
Approach for reducing H.264 encoder complexity
• Determine MB coding mode decisions using features
derived from uncompressed video.
• A machine learning algorithm derives a decision tree from
these features.
• Once trained, the decision tree replaces the exhaustive MB mode decision.
• Data mining algorithms can readily detect such correlations.
• The tree is trained on a set of MB metrics.
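Once trained, a decision tree is just a cascade of comparisons, which is why it is much cheaper than evaluating every mode. The sketch below hard-codes such a tree in C. The class labels, the choice of features and every threshold are made-up placeholders for illustration, not a tree learnt from any sequence, and `classify_mb` is a hypothetical name.

```c
/* Hypothetical coarse decision classes; not JM identifiers. */
enum mb_class {
    MB_LARGE_PARTITION, /* e.g. skip or 16x16 */
    MB_SMALL_PARTITION, /* 8x8 and below */
    MB_INTRA
};

/* Illustrative trained tree written as nested if/else. The features
   echo those listed earlier (frame residual, MB variance, mode of the
   collocated MB); the thresholds 500 and 400.0 are placeholders, not
   values learnt by C4.5. */
enum mb_class classify_mb(long residual, double variance16,
                          int collocated_is_intra)
{
    if (residual < 500)      /* little change from the previous frame */
        return MB_LARGE_PARTITION;
    if (collocated_is_intra) /* mode correlation with the collocated MB */
        return MB_INTRA;
    if (variance16 > 400.0)  /* detailed texture: split the MB */
        return MB_SMALL_PARTITION;
    return MB_LARGE_PARTITION;
}
```

Evaluating such a tree costs at most three comparisons per MB, compared with a full cost evaluation of every candidate mode.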
Applying machine learning to video coding 
• Training of the decision tree is done offline.
• The extracted attributes and the MB mode chosen by JM are
saved in an ARFF file.
• In the training stage, the Weka tool builds mode decision trees
with the C4.5 (J48) classifier.
• C4.5 (implemented in Weka as J48) is a system that constructs
classifiers in the form of decision trees.
• Having learnt from the training data, the classifier predicts the class
to which a new case belongs.
• C4.5 first grows an initial tree using divide-and-conquer.
• Each split is chosen to minimize the entropy of the resulting subsets.
Foreman image divided into multiple block sizes of 16x16, 16x8,
8x16, 8x8, 8x4, 4x8 and 4x4 for motion estimation.
• For each MB of each frame, the following metrics are calculated.
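• 16*16 MB (mean and variance of 256 pixels)
• MB mean: $\mu_{16\times16} = \frac{1}{256}\sum_{i=1}^{16}\sum_{j=1}^{16} X_{i,j}$
• MB variance: $\sigma^2_{16\times16} = \frac{1}{256}\sum_{i=1}^{16}\sum_{j=1}^{16}\left(X_{i,j} - \mu_{16\times16}\right)^2$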
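Current 4*4 block
• Mean 4*4: $\mu_{4\times4} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4} X_{i,j}$
• where i and j are the row and column indices of the pixels in the image.
• Variance 4*4: $\sigma^2_{4\times4} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4}\left(X_{i,j} - \mu_{4\times4}\right)^2$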
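The mean and variance definitions above translate directly into a small C function. This is a sketch assuming 8-bit luma stored row-major and the 1/N (population) normalization; `block_mean_var` is a hypothetical name. Calling it with n = 16 gives the MB features over 256 pixels, and with n = 4 the features of one 4*4 block over 16 pixels.

```c
#include <stdint.h>

/* Mean and population variance of the n x n block whose top-left pixel
   is at (bx, by) in an 8-bit row-major frame `width` pixels wide.
   Two passes: first the mean, then the squared deviations from it. */
void block_mean_var(const uint8_t *frame, int width, int bx, int by, int n,
                    double *mean, double *var)
{
    int x, y;
    double sum = 0.0, sq = 0.0, count = (double)n * n;

    for (y = 0; y < n; y++)
        for (x = 0; x < n; x++)
            sum += frame[(by + y) * width + bx + x];
    *mean = sum / count;

    for (y = 0; y < n; y++)
        for (x = 0; x < n; x++) {
            double d = frame[(by + y) * width + bx + x] - *mean;
            sq += d * d;
        }
    *var = sq / count;
}
```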
Intra Prediction modes
// 4x4 intra prediction modes
enum {
    VERT_PRED            = 0,
    HOR_PRED             = 1,
    DC_PRED              = 2,
    DIAG_DOWN_LEFT_PRED  = 3,
    DIAG_DOWN_RIGHT_PRED = 4,
    VERT_RIGHT_PRED      = 5,
    HOR_DOWN_PRED        = 6,
    VERT_LEFT_PRED       = 7,
    HOR_UP_PRED          = 8
};

// 16x16 intra prediction modes
enum {
    VERT_PRED_16 = 0,
    HOR_PRED_16  = 1,
    DC_PRED_16   = 2,
    PLANE_16     = 3
};
C4.5 algorithm results.
Weka tool results
Figure 3: Tree structure for the Container sequence from the Weka tool.
Decision Tree implementation in JM 16.2 encoder.
Video Quality Assessment
• The metrics used to assess the quality of the
decoded sequence are:
• Structural similarity index metric (SSIM)
• Peak signal-to-noise ratio (PSNR)
• Mean square error (MSE)
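• MSE and its derivative, PSNR, are conventional metrics for comparing
two images. MSE measures the difference between the original image
I(i,j) and the decoded image I'(i,j), both of size M×N:
$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I(i,j) - I'(i,j)\right)^2$$
PSNR is a logarithmic representation of the inverse of this measure:
$$\mathrm{PSNR} = 10\log_{10}\frac{(2^n - 1)^2}{\mathrm{MSE}}\ \mathrm{dB}$$
where n is the bit depth; for 8-bit PCM, 2^8 - 1 = 255.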
• PSNR is easy to compute and is well understood by most researchers.
• However, neither MSE nor PSNR correlates well with the subjective
quality of the reconstructed images.
• PSNR does not properly reflect subtle differences between
degradations of different intensities.
• SSIM has been shown to agree more closely with human perception
of the received video sequence than MSE or PSNR.
• SSIM is defined as the product of three local quantities: luminance
comparison (a function of the means), contrast comparison (a function of
the variances) and structure comparison (a function of the correlation
coefficient).
SSIM measurement system
Block diagram of Structural Similarity
Table 1: Comparison of intra coding mode decisions in JM and JM machine learning.
Encoding time in seconds and indicates the results for
Table 2: Encoding time and ME time results obtained using JM 16.2 and JM using machine learning.
Table 3: Speed up in encoding time and motion estimation time using machine learning compared
to JM 16.2 encoder.
Table 4: Comparison of compressed file sizes.
Table 5: Comparison of PSNR and MSE.
Table 6: Comparison of SSIM.
• The results are tabulated in Tables 2 through 6.
• From Table 3, the average speed-up in encoding time is 28.5%, and
the average speed-up in motion estimation time is 42.846%.
• From Table 4, the average increase in compressed file size
is 0.36%.
• From Table 6, the average decrease in SSIM is less than 0.00107%.
• These results compare the JM machine learning encoder with the
JM 16.2 encoder.
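The slides do not spell out how the percentage speed-up is computed. A common convention, assumed here rather than taken from the source, is the percentage reduction in time relative to JM 16.2, averaged over the test sequences. The functions below are hypothetical helpers for that convention.

```c
#include <stddef.h>

/* Assumed definition: percentage reduction in time of the
   machine-learning encoder relative to JM 16.2. */
double time_saving_pct(double t_jm, double t_ml)
{
    return 100.0 * (t_jm - t_ml) / t_jm;
}

/* Average of the per-sequence savings (one entry per test sequence). */
double mean_saving_pct(const double *t_jm, const double *t_ml, size_t n)
{
    double acc = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        acc += time_saving_pct(t_jm[i], t_ml[i]);
    return n > 0 ? acc / (double)n : 0.0;
}
```

As a hypothetical example, a sequence taking 100 s with JM 16.2 and 75 s with the machine-learning encoder would count as a 25% saving under this definition.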
Snapshots of the video sequences used.
[1] Soon-kak Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264/MPEG-4
Part 10”, J. Visual Communication and Image Representation, vol. 17,
pp. 186–216, April 2006.
[2] P. Carrillo, H. Kalva and T. Pin, “Low Complexity H.264 Video Encoding”,
Applications of Digital Image Processing XXXII, Proc. of SPIE, vol. 7443,
74430A, 2009.
[3] A. Puri, X. Chen and A. Luthra, “Video Coding using the H.264/MPEG-4 AVC
Compression Standard”, Science Direct, Signal Processing: Image
Communication, vol. 19, pp. 793–849, Oct. 2004.
[4] http://dms.irb.hr/tutorial/tut_dtrees.php for example of a dataset and its
[5] http://www.cs.waikato.ac.nz/ml/weka/ for Weka software tool and
[6] J. K. Lee, O. Yi and M. Yung, “sIDMG: Small-Size Intrusion Detection Model
Generation of Complimenting Decision Tree Classification Algorithm”,
Springer, LNCS 4298, pp. 83–99, 2007.
[7] C4.5: programs for machine learning by John Ross Quinlan from
[8] http://www.decisiontrees.net/node/21 to know about decision trees and
[9] G. J. Sullivan and T. Wiegand, “Video compression – from concepts to the H.264/AVC
Standard”, Proc. IEEE, vol. 93, no. 1, pp. 18–31, Jan. 2005.
[10] A. Luthra, G. J. Sullivan and T. Wiegand, Eds., IEEE Trans. Circuits Systems Video
Technol. (Special Issue on the H.264/AVC Video Coding Standard), vol. 13, no. 7,
[11] www.vcodex.com for H.264 reference
[12] Z. Wang and A. C. Bovik, Modern Image Quality Assessment, Synthesis
Lectures on Image, Video and Multimedia Processing, Morgan and Claypool,
[13] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, “Image quality
assessment: from error visibility to structural similarity”, IEEE Trans. Image
Processing, vol. 13, no. 4, pp. 600–612, April 2004.
[14] I. Richardson, “The H.264 Advanced Video Compression Standard”, John Wiley
& Sons Inc, June 2006.
[15] I. Richardson, “The H.264 Advanced Video Compression Standard”, John
Wiley & Sons Inc, Second edition, 2010.
[16] http://iphome.hhi.de/suehring/tml/download/, JM
[17] http://trace.eas.asu.edu/yuv/index.html, Video
• ARFF – Attribute relation file format
• MB – Macroblock
• ME – Motion Estimation
Thank You!