An Integrated and Scalable Approach to Video Enhancement in ...

Fig. 6: Examples of optimizing the low lighting and high dynamic range enhancement algorithm by introducing P(x): input (left), output of the enhancement algorithm without introducing P(x) (middle), and output of the enhancement algorithm with P(x) introduced (right).

P(x)t(x) leads to a slight "dulling" of the pixel. This makes the overall visual quality more balanced and visually pleasant. For low lighting and high dynamic range videos, once J(x) is recovered, the inversion operation (1) is performed again to produce the enhanced video of the original input. This process is illustrated in Fig. 4. The improvement after introducing P(x) can be seen in Fig. 6.

Fig. 7: Differences of t(x) values between the predicted block's pixels and its reference block's pixels (histogram titled "Difference of t(x) for constant pixels"; x-axis: relative difference of t(x), from 0 to 1; y-axis: quantity, ×10⁷).

IV. OPTIMIZATIONS OF THE BASELINE SYSTEM

A. Algorithmic Optimizations

1) Motion Estimation Based Acceleration and Quality Improvement: The algorithm described in Section III is a frame-based approach, and the calculation of t(x) consumes about 60% of the total computation time. For real-time, low-complexity processing of video inputs, calculating t(x) frame by frame not only has high computational complexity, but also makes the output much more sensitive to temporal and spatial noise and destroys the temporal and spatial consistency of the processed outputs.

To remedy these problems, we observe that t(x) and the other model parameters are correlated both temporally and spatially. As a result, their calculation can be expedited using motion estimation/compensation (ME/MC) techniques.

ME/MC is a key procedure in all state-of-the-art video compression algorithms. By matching a current block against blocks of the same size that have already been encoded and then decoded (the "reference"), video compression algorithms use the reference as a prediction of the current block and encode only the difference (termed the "residual") between the reference and the current block, thereby improving coding efficiency. The process of finding the best match between a current block and a block in a reference frame is called "motion estimation", and the "best" match is usually determined by jointly considering the rate and distortion costs of the match. If a "best" match block is found, the current block is encoded in inter mode and only the residual is encoded; otherwise, the current block is encoded in intra mode. The most commonly used distortion metric in motion estimation is the Sum of Absolute Differences (SAD).
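As a concrete illustration of block matching with SAD, the following is a minimal exhaustive-search sketch (not the EPZS algorithm used later in this section, and not the paper's implementation); frames are plain lists of pixel rows:

```python
def sad(cur, ref, bx, by, rx, ry, n=8):
    """SAD between the n-by-n block of `cur` at (bx, by) and of `ref` at (rx, ry)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
    return total

def full_search(cur, ref, bx, by, n=8, radius=4):
    """Return (dx, dy, best_sad): the motion vector minimizing SAD within
    +/- radius pixels, with candidate blocks clipped to the frame borders."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, bx, by, bx, by, n))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best
```

If the current frame is the reference shifted by (2, 1), `full_search` recovers that vector with a residual SAD of zero. A production encoder would add the rate term mentioned above to this distortion cost.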

To verify the feasibility of using temporal block matching and ME to expedite the t(x) calculation, we computed the differences of t(x) values for pixels in the predicted and reference blocks. The statistics in Fig. 7 show that the differences are less than 10% in almost all cases. As a result, we can use ME/MC to bypass the calculation of t(x) for the majority of the pixels/frames and calculate t(x) only for a small number of selected frames. For the remainder of the frames, we used the corresponding t(x) values of the reference pixels. For motion estimation, we used mature fast motion estimation algorithms, e.g., Enhanced Prediction Zonal Search (EPZS) [18]. When calculating the SAD, similar to [19] and [20], we only utilized a subset of the pixels in the current and reference blocks, using the pattern shown in Fig. 8. With this pattern, our calculation "touched" a total of 60 pixels in a 16 × 16 block, or roughly 25%. These pixels are located on either the diagonals or the edges of the block, resulting in about a 75% reduction in SAD computation when implemented in software on a general-purpose processor.

Fig. 8: Subsampling pattern of the proposed fast SAD algorithm.
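The subsampled SAD can be sketched as follows. The exact Fig. 8 pattern is not reproduced here; as a stand-in, the hypothetical mask below uses the border of a 16 × 16 block, which also touches exactly 60 of the 256 pixels:

```python
def border_mask(n=16):
    """(x, y) coordinates of the border pixels of an n-by-n block.
    This is a stand-in mask, not the exact Fig. 8 pattern; for n = 16 it
    selects 4*16 - 4 = 60 of the 256 pixels, i.e. roughly 25%."""
    return [(x, y) for y in range(n) for x in range(n)
            if x in (0, n - 1) or y in (0, n - 1)]

def subsampled_sad(cur_block, ref_block, mask):
    """SAD evaluated only at the masked pixel positions."""
    return sum(abs(cur_block[y][x] - ref_block[y][x]) for x, y in mask)
```

Because the cost of software SAD is dominated by the per-pixel absolute differences, evaluating only ~25% of the positions yields roughly the 75% reduction quoted above, at the price of a slightly less reliable match.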

Specifically, when the proposed algorithm is deployed prior to video compression or after video decompression, we can first divide the input frames into GOPs. The GOPs can either contain a fixed number of frames or be delimited based on a maximum GOP size (in frames) and scene changes. Each GOP starts with an intra-coded frame (I frame), for which all t(x) values are calculated. ME is performed for the remaining frames (P
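The GOP-level flow just described might be sketched as follows; `compute_t` is a hypothetical stand-in for the full Section III estimator, and the per-block motion vectors are assumed to come from the ME step above:

```python
def compute_t(frame):
    # Hypothetical stand-in for the full per-pixel t(x) estimation of
    # Section III; in this scheme it runs only on the GOP's I frame.
    return [[1.0 - 0.95 * min(v, 255) / 255.0 for v in row] for row in frame]

def propagate_t(t_ref, motion_vectors, block=4):
    # Reuse the reference pixels' t(x) values: each block copies from the
    # location its motion vector (dx, dy) points to in the previous map.
    h, w = len(t_ref), len(t_ref[0])
    t = [[0.0] * w for _ in range(h)]
    for (by, bx), (dx, dy) in motion_vectors.items():
        for y in range(block):
            for x in range(block):
                t[by + y][bx + x] = t_ref[by + y + dy][bx + x + dx]
    return t

def gop_t_maps(frames, mvs_per_frame):
    # Full t(x) on the I frame (frame 0); motion-compensated reuse for the
    # remaining frames of the GOP.
    maps = [compute_t(frames[0])]
    for mvs in mvs_per_frame[1:]:
        maps.append(propagate_t(maps[-1], mvs))
    return maps
```

Under this scheme the expensive estimator runs once per GOP, and the per-frame cost reduces to copying t(x) values along motion vectors.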
