An Integrated and Scalable Approach to Video Enhancement in ...

Fig. 6: Examples of optimizing the low lighting and high dynamic range enhancement algorithm by introducing P(x): input (left), output of the enhancement algorithm without introducing P(x) (middle), and output of the enhancement algorithm with P(x) introduced (right).

P(x)t(x) leads to a slight "dulling" of the pixel. This makes the overall visual quality more balanced and visually pleasant. For low lighting and high dynamic range videos, once J(x) is recovered, the inversion operation (1) is performed again to produce the enhanced video of the original input. This process is illustrated in Fig. 4. The improvement after introducing P(x) can be seen in Fig. 6.

Fig. 7: Differences of t(x) values between the predicted block's pixels and its reference block's pixels (histogram titled "Difference of t(x) for constant pixels"; x-axis: relative difference of t(x), from 0 to 1; y-axis: quantity, ×10⁷).

IV. OPTIMIZATIONS OF THE BASELINE SYSTEM

A. Algorithmic Optimizations

1) Motion Estimation Based Acceleration and Quality Improvement: The algorithm described in Section III is a frame-based approach, and the calculation of t(x) consumes about 60% of the total computation time. For real-time, low-complexity processing of video inputs, calculating t(x) frame by frame not only has high computational complexity, but also makes the output much more sensitive to temporal and spatial noise and destroys the temporal and spatial consistency of the processed outputs.

To remedy these problems, we observe that t(x) and the other model parameters are correlated both temporally and spatially. As a result, their calculation can be expedited using motion estimation/compensation (ME/MC) techniques.

ME/MC is a key procedure in all state-of-the-art video compression algorithms. By matching a current block against blocks of the same size that have already been encoded and then decoded (the "reference"), video compression algorithms use the reference as a prediction of the current block and encode only the difference (termed the "residual") between the reference and the current block, thereby improving coding efficiency. The process of finding the best match between a current block and a block in a reference frame is called "motion estimation", and the "best" match is usually determined by jointly considering the rate and distortion costs of the match. If a "best" match block is found, the current block is encoded in inter mode and only the residual is encoded; otherwise, the current block is encoded in intra mode. The most commonly used distortion metric in motion estimation is the Sum of Absolute Differences (SAD).
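As a concrete illustration of block matching with SAD, the following is a minimal exhaustive-search sketch (not the EPZS algorithm used later in this section, and not the paper's implementation); frames are plain lists of pixel rows:

```python
def sad(cur, ref, bx, by, rx, ry, n=8):
    """SAD between the n-by-n block of `cur` at (bx, by) and of `ref` at (rx, ry)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
    return total

def full_search(cur, ref, bx, by, n=8, radius=4):
    """Return (dx, dy, best_sad): the motion vector minimizing SAD within
    +/- radius pixels, with candidate blocks clipped to the frame borders."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, bx, by, bx, by, n))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best
```

If the current frame is the reference shifted by (2, 1), `full_search` recovers that vector with a residual SAD of zero. A production encoder would add the rate term mentioned above to this distortion cost.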

To verify the feasibility of using temporal block matching and ME to expedite the t(x) calculation, we computed the differences of t(x) values for pixels in the predicted and reference blocks. The statistics in Fig. 7 show that the differences are less than 10% in almost all cases. As a result, we can use ME/MC to bypass the calculation of t(x) for the majority of the pixels/frames and calculate t(x) only for a small number of selected frames. For the remainder of the frames, we used the corresponding t(x) values of the reference pixels. For motion estimation, we used mature fast motion estimation algorithms, e.g., Enhanced Prediction Zonal Search (EPZS) [18]. When calculating the SAD, similar to [19] and [20], we only utilized a subset of the pixels in the current and reference blocks, using the pattern shown in Fig. 8. With this pattern, our calculation "touched" a total of 60 pixels in a 16 × 16 block, or roughly 25%. These pixels are located on either the diagonals or the edges of the block, resulting in about a 75% reduction in SAD computation when implemented in software on a general-purpose processor.

Fig. 8: Subsampling pattern of the proposed fast SAD algorithm.
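The subsampled SAD can be sketched as follows. The exact Fig. 8 pattern is not reproduced here; as a stand-in, the hypothetical mask below uses the border of a 16 × 16 block, which also touches exactly 60 of the 256 pixels:

```python
def border_mask(n=16):
    """(x, y) coordinates of the border pixels of an n-by-n block.
    This is a stand-in mask, not the exact Fig. 8 pattern; for n = 16 it
    selects 4*16 - 4 = 60 of the 256 pixels, i.e. roughly 25%."""
    return [(x, y) for y in range(n) for x in range(n)
            if x in (0, n - 1) or y in (0, n - 1)]

def subsampled_sad(cur_block, ref_block, mask):
    """SAD evaluated only at the masked pixel positions."""
    return sum(abs(cur_block[y][x] - ref_block[y][x]) for x, y in mask)
```

Because the cost of software SAD is dominated by the per-pixel absolute differences, evaluating only ~25% of the positions yields roughly the 75% reduction quoted above, at the price of a slightly less reliable match.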

Specifically, when the proposed algorithm is deployed prior to video compression or after video decompression, we can first divide the input frames into GOPs. The GOPs can either contain a fixed number of frames or be delimited based on a maximum GOP size (in frames) and scene changes. Each GOP starts with an intra-coded frame (I frame), for which all t(x) values are calculated. ME is performed for the remaining frames (P
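The GOP-level flow just described might be sketched as follows; `compute_t` is a hypothetical stand-in for the full Section III estimator, and the per-block motion vectors are assumed to come from the ME step above:

```python
def compute_t(frame):
    # Hypothetical stand-in for the full per-pixel t(x) estimation of
    # Section III; in this scheme it runs only on the GOP's I frame.
    return [[1.0 - 0.95 * min(v, 255) / 255.0 for v in row] for row in frame]

def propagate_t(t_ref, motion_vectors, block=4):
    # Reuse the reference pixels' t(x) values: each block copies from the
    # location its motion vector (dx, dy) points to in the previous map.
    h, w = len(t_ref), len(t_ref[0])
    t = [[0.0] * w for _ in range(h)]
    for (by, bx), (dx, dy) in motion_vectors.items():
        for y in range(block):
            for x in range(block):
                t[by + y][bx + x] = t_ref[by + y + dy][bx + x + dx]
    return t

def gop_t_maps(frames, mvs_per_frame):
    # Full t(x) on the I frame (frame 0); motion-compensated reuse for the
    # remaining frames of the GOP.
    maps = [compute_t(frames[0])]
    for mvs in mvs_per_frame[1:]:
        maps.append(propagate_t(maps[-1], mvs))
    return maps
```

Under this scheme the expensive estimator runs once per GOP, and the per-frame cost reduces to copying t(x) values along motion vectors.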
