Low Complexity H.264 Video Encoding using Machine Learning


Thejaswini Purushotham

Student I.D.: 1000-616 811

Date: April 29, 2010


Encoder block diagram of H.264 [11]

4/30/2010 2


Decoder block diagram of H.264 [11]



Encoder complexity

Variable block sizes used for motion compensation [2]



• Intra 4*4 prediction modes and prediction directions [1]


Intra Prediction

• Intra 4*4 supports 9 modes.
• Mode 2 is the DC mode.
• I through M are the previously decoded pixels.

Intra 4*4 prediction modes and prediction directions [1]
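As an illustration of the DC mode mentioned on this slide, the following is a minimal sketch of how a 4*4 DC prediction can be formed from the previously decoded neighbours. The function name and the NULL convention for unavailable neighbours are assumptions made for this example; this is not the JM source.

```c
#include <stddef.h>
#include <stdint.h>

/* DC prediction (mode 2) for one 4x4 block.
 * above[4] / left[4] hold previously decoded neighbouring pixels.
 * Passing NULL for a side marks it as unavailable (hypothetical convention). */
void intra4x4_dc_pred(const uint8_t *above, const uint8_t *left,
                      uint8_t pred[4][4])
{
    int sum = 0;
    int dc;

    for (int n = 0; n < 4; n++) {
        if (above) sum += above[n];
        if (left)  sum += left[n];
    }

    if (above && left)
        dc = (sum + 4) >> 3;   /* rounded mean of 8 neighbours */
    else if (above || left)
        dc = (sum + 2) >> 2;   /* rounded mean of 4 neighbours */
    else
        dc = 128;              /* mid-grey when no neighbour is available */

    /* Every pixel of the block gets the same predicted value. */
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            pred[j][i] = (uint8_t)dc;
}
```

The other eight 4*4 modes differ only in how the neighbouring pixels are extrapolated into the block.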




Intra 16*16 supports 4 modes


Intra 16*16 prediction modes and prediction directions


Encoder complexity

• Evaluating every mode, comparing the costs and searching exhaustively inside ME together account for a large share of the time spent by the encoder.
• Complex and exhaustive ME evaluation is the key to the good performance achieved by H.264, but the cost is paid in encoding time.


Data Mining

• Data mining is used to reduce the complexity of coding mode decisions.
• Some metrics are fixed based on observation of the structural similarities present in video:
• Frame residual information
• Correlation of the collocated MB mode type
• Mean and variance


Approach for reducing H.264 encoder implementation complexity.
• Determine MB coding mode decisions using features derived from uncompressed video.
• A machine learning algorithm is used to deduce a decision tree based on such features.
• Once the tree is trained, the encoder's mode decision is replaced with the decision tree.
• Data mining algorithms can detect correlations easily.
• The tree is trained with a set of MB metrics.
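A trained decision tree reduces to a handful of threshold comparisons on the MB metrics. The sketch below shows the general shape such a tree could take once it is written into an encoder. The struct, the class labels and every threshold are hypothetical placeholders, not values from the trained trees in this work.

```c
/* Hypothetical mode classes, for illustration only. */
typedef enum {
    MODE_INTRA_16x16,
    MODE_INTRA_4x4
} MbModeClass;

/* Hypothetical feature set computed from the uncompressed MB. */
typedef struct {
    double mean_16x16;
    double var_16x16;
    double max_var_4x4;   /* largest variance among the sixteen 4x4 blocks */
} MbMetrics;

/* A decision tree is just nested comparisons; thresholds are made up. */
MbModeClass classify_mb(const MbMetrics *m)
{
    if (m->var_16x16 <= 50.0)
        return MODE_INTRA_16x16;      /* flat MB: one large block suffices */
    if (m->max_var_4x4 <= 200.0)
        return MODE_INTRA_16x16;      /* detail is spread evenly */
    return MODE_INTRA_4x4;            /* strong local detail */
}
```

Compared with evaluating the cost of every candidate mode, a call such as `classify_mb(&metrics)` costs only a few comparisons per MB.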


Applying machine learning to video coding [2]



Training

• Training of the decision tree is done offline.
• The attributes extracted and the MB mode chosen by JM are saved in an ARFF file.
• In the training stage that follows, the Weka tool discovers the mode decision trees through the C4.5 (J48) classifier algorithm.
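For readers unfamiliar with ARFF, the sketch below writes a small header and one data row in that format. The relation name, attribute names and class labels are assumptions chosen for this example; the attribute set actually exported from JM is not listed on the slides.

```c
#include <stdio.h>

/* Write an ARFF header: relation, attribute declarations, then @data.
 * All names and class labels here are hypothetical. */
int arff_write_header(FILE *f)
{
    return fprintf(f,
                   "@relation mb_mode\n"
                   "@attribute mean_16x16 numeric\n"
                   "@attribute var_16x16 numeric\n"
                   "@attribute mode {I16MB,I4MB}\n"
                   "@data\n");
}

/* Format one training instance: the features, then the mode the
 * reference encoder chose for that MB. Returns snprintf's result. */
int arff_format_row(char *buf, size_t size,
                    double mean, double variance, const char *mode)
{
    return snprintf(buf, size, "%.2f,%.2f,%s\n", mean, variance, mode);
}
```

Each MB contributes one row after the `@data` line, and Weka reads the resulting file directly for training.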


C4.5 Classifier

• C4.5 (known as J48 in Weka) is a system that constructs classifiers.
• From the training data, the classifier learns to predict the class to which a new case belongs.
• C4.5 first grows an initial tree using divide-and-conquer.
• The tree is grown so as to minimize entropy in the subsets.


Foreman image divided into multiple block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 for Motion Estimation.


Macroblock metrics

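• For each frame and each MB of pixels, the following metrics have to be calculated.
• 16*16 MB (mean and variance of 256 pixels)
• MB mean:
$$\mu_{16*16} = \frac{1}{256}\sum_{i=0}^{15}\sum_{j=0}^{15} X_{i,j}$$
• MB variance:
$$\sigma^2_{16*16} = \frac{1}{256}\sum_{j=0}^{15}\sum_{i=0}^{15}\left(X_{i,j} - \mu_{16*16}\right)^2$$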


Current 4*4 MB

• Mean 4*4:
$$\mu_{k,4*4} = \frac{1}{16}\sum_{i=0}^{3}\sum_{j=0}^{3} X_{i,j}$$
• Where i and j are indices of the image.
• Variance 4*4:
$$\sigma^2_{k,4*4} = \frac{1}{16}\sum_{j=0}^{3}\sum_{i=0}^{3}\left(X_{i,j} - \mu_{k,4*4}\right)^2$$
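The mean and variance metrics can be computed with one routine for both the 16*16 MB and its 4*4 blocks. The following is a minimal sketch, assuming an 8-bit luma plane stored row by row; the function name, the struct and the `stride` parameter are illustrative and not taken from JM.

```c
#include <stdint.h>

typedef struct {
    double mean;
    double variance;
} BlockStats;

/* Mean and variance of a size x size block whose top-left pixel is at
 * (x0, y0) in a luma plane with `stride` bytes per row. */
BlockStats block_stats(const uint8_t *luma, int stride,
                       int x0, int y0, int size)
{
    const double count = (double)size * size;
    BlockStats s = {0.0, 0.0};

    /* First pass: mean of all pixels in the block. */
    for (int j = 0; j < size; j++)
        for (int i = 0; i < size; i++)
            s.mean += luma[(y0 + j) * stride + (x0 + i)];
    s.mean /= count;

    /* Second pass: mean squared deviation from that mean. */
    for (int j = 0; j < size; j++) {
        for (int i = 0; i < size; i++) {
            double d = luma[(y0 + j) * stride + (x0 + i)] - s.mean;
            s.variance += d * d;
        }
    }
    s.variance /= count;
    return s;
}
```

Calling it with `size` set to 16 gives the MB metrics, and calling it sixteen times with `size` set to 4 gives the per-block metrics.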


Intra Prediction modes

// 4x4 intra prediction modes
typedef enum {
  VERT_PRED = 0,
  HOR_PRED = 1,
  DC_PRED = 2,
  DIAG_DOWN_LEFT_PRED = 3,
  DIAG_DOWN_RIGHT_PRED = 4,
  VERT_RIGHT_PRED = 5,
  HOR_DOWN_PRED = 6,
  VERT_LEFT_PRED = 7,
  HOR_UP_PRED = 8
} I4x4PredModes;

// 16x16 intra prediction modes
typedef enum {
  VERT_PRED_16 = 0,
  HOR_PRED_16 = 1,
  DC_PRED_16 = 2,
  PLANE_16 = 3
} I16x16PredModes;


C4.5 algorithm results.


Weka tool results


Figure 3: Tree structure for the Container sequence from the Weka tool.


Decision Tree implementation in JM 16.2 encoder.


Video Quality Assessment

• The metrics used to assess the video quality of the decoded sequence are:
• Structural similarity index metric (SSIM)
• Peak signal-to-noise ratio (PSNR)
• Mean square error (MSE)


• MSE
• The MSE and its derivative PSNR are conventional metrics for comparing any two images. MSE measures the difference between the original image I(i,j) and the decoded image I'(i,j), where I and I' are of size (MxN).
$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[I(i,j) - I'(i,j)\right]^2$$
• PSNR
• PSNR is a logarithmic representation of the inverse of this measure.
$$\mathrm{PSNR} = 20\log_{10}\left[\frac{255}{\sqrt{\mathrm{MSE}}}\right]$$
(2^8 - 1 = 255 for 8-bit PCM)


• PSNR is easy to compute and is well understood by most researchers.
• However, neither MSE nor PSNR correlates well with the subjective quality of the reconstructed images.
• The subtle differences between degradations of different intensities are not properly reflected by PSNR.
• SSIM has proven to be the metric closest to the human perception of the received video sequence. [12]


SSIM

• SSIM is defined as the product of three local quantities: luminance comparison (a function of the mean), contrast comparison (a function of the variance), and structure comparison (a function of the correlation coefficient and the variance).


SSIM measurement system

Block diagram of Structural Similarity[13]



Table 1: Comparison of intra coding mode decisions in JM and JM machine learning.

Encoding time in seconds and indicates the results for

machine learning.


Table 2: Encoding time and ME time results obtained using JM 16.2 and JM using machine learning.


Table 3: Speed up in encoding time and motion estimation time using machine learning compared to JM 16.2 encoder.


Table 4: Comparison of compressed file sizes.


Table 5: Comparison of PSNR and MSE.


Table 6: Comparison of SSIM.


Conclusions

• The results are tabulated in Tables 2 through 6.
• From Table 3, the average speed up in the encoding time is 28.5%, and the average speed up in the motion estimation time is 42.846%.
• From Table 4, the average percentage increase in compressed file size is 0.36%.
• From Table 6, the average decrease in SSIM is less than 0.00107%.
• These results compare the JM encoder using machine learning with the JM 16.2 encoder.


Snapshots of the video sequences used.


References

[1] Soon-kak Kwon, A. Tamhankar and K.R. Rao, "Overview of H.264 / MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.

[2] P. Carrillo, H. Kalva, and T. Pin, "Low Complexity H.264 Video Encoding", Applications of Digital Image Processing XXXII, Proc. of SPIE Vol. 7443, 74430A, 2009.

[3] A. Puri, X. Chen and A. Luthra, "Video Coding using the H.264/MPEG-4 AVC compression Standard", Science Direct, Signal Processing: Image Communication, Vol. 19, pp. 793–849, Oct. 2004.

[4] http://dms.irb.hr/tutorial/tut_dtrees.php for example of a dataset and its attributes.

[5] http://www.cs.waikato.ac.nz/ml/weka/ for Weka software tool and documentation.

[6] J.K. Lee, O. Yi, and M. Yung, "sIDMG: Small-Size Intrusion Detection Model Generation of Complimenting Decision Tree Classification Algorithm", Springer, LNCS 4298, pp. 83–99, 2007.

[7] John Ross Quinlan, "C4.5: Programs for Machine Learning", from www.google.com.

[8] http://www.decisiontrees.net/node/21 to know about decision trees and their generation.

[9] G.J. Sullivan and T. Wiegand, "Video compression – from concepts to the H.264/AVC Standard", Proc. IEEE, Vol. 93, no. 1, pp. 18-31, Jan. 2005.

[10] A. Luthra, G.J. Sullivan and T. Wiegand, Eds., IEEE Trans. Circuits Systems Video Technol. (Special issue on the H.264/AVC Video coding standard), vol. 13, no. 7, July 2003.

[11] www.vcodex.com for H.264 reference.

[12] Z. Wang and A.C. Bovik, Modern Image Quality Assessment. Synthesis Lectures on Image, Video and Multimedia Processing. Morgan and Claypool, 2006.

[13] Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.

[14] I. Richardson, "The H.264 Advanced Video Compression Standard", John Wiley & Sons Inc, June 2006.

[15] I. Richardson, "The H.264 Advanced Video Compression Standard", John Wiley & Sons Inc, Second edition, 2010.

[16] http://iphome.hhi.de/suehring/tml/download/, JM reference software.

[17] http://trace.eas.asu.edu/yuv/index.html, Video sequences.


Acronyms used

• ARFF – Attribute-Relation File Format

• MB – Macroblock

• ME – Motion Estimation



Thank You ! !
