An Integrated and Scalable Approach to Video Enhancement in ...

automate the process of editing and uploading video as much as possible. Video clips to be processed by popular web-based systems such as JayCut must first be compressed before they can be uploaded over the Internet, and the compression is often done with sub-optimal settings for the video encoder. Even for local editing by experts with professional software, where compression and uploading are not necessary, video processing algorithms must be able to handle video containing artifacts created by both capturing (e.g., low lighting, high dynamic range) and compression, because video clips captured by portable cameras, such as mobile phones or the Flip camera, are already compressed, usually by a low-power video encoder that introduces significant quality loss.

In this paper, we describe a novel integrated and scalable video enhancement approach applicable to a wide range of input impairments commonly encountered in mobile video applications. The core enhancement algorithm has much lower computational and memory complexities than other existing solutions of similar enhancement performance. In our system, a low-complexity automatic module first determines the predominant source of impairment in the input video. The input is then pre-processed based on that particular source of impairment, followed by processing by the core enhancement module. Finally, post-processing is applied to produce the enhanced output. In addition, spatial and temporal correlations are utilized to improve the speed of the algorithm and the visual quality, enabling it to be embedded into video encoders or decoders, sharing the temporal and spatial prediction modules in the video codec to further lower complexity.
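As a concrete illustration, the staged pipeline described above can be sketched as follows. This is a minimal sketch under stated assumptions only: the impairment labels, the threshold-based classifier, and the pluggable core are illustrative stand-ins, not the paper's actual modules.

```python
from typing import Callable
import numpy as np

# Illustrative impairment categories; the paper's low-complexity
# automatic module would assign one label per input video.
PRE_PROCESS: dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "low_light": lambda f: 255 - f,  # invert so the frame resembles haze
    "haze": lambda f: f,             # already haze-like: pass through
}

def classify_impairment(frame: np.ndarray) -> str:
    # Toy stand-in for the automatic detector: very dark frames
    # are labeled low-light, everything else haze.
    return "low_light" if frame.mean() < 60 else "haze"

def enhance(frame: np.ndarray,
            core: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    # Pre-process -> core enhancement (e.g. de-hazing) -> post-process.
    kind = classify_impairment(frame)
    enhanced = core(PRE_PROCESS[kind](frame))
    # Post-processing undoes the inversion for non-hazy inputs.
    return 255 - enhanced if kind == "low_light" else enhanced
```

With an identity core, the pre- and post-processing steps cancel, which makes the staging easy to check in isolation before a real de-hazing core is plugged in.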

Although the system in this paper is described in detail in the context of using de-hazing as the core enhancement algorithm, it should be noted that the main contribution of the work is to establish the connections between the problems of enhancing video captured with a wide range of lighting impairments. We show that it is possible to achieve reasonably good enhancement results in real time or close to real time, even with the limited resources available on a netbook or even a mobile phone. We also show that by introducing more optimization techniques, the processing quality can be further improved, thereby achieving a scalable architecture. Using the approach in this paper, one could focus on designing a suitable core enhancement algorithm (based on either de-hazing or low-lighting enhancement techniques), post-optimization algorithms, and/or temporal/spatial acceleration techniques. By integrating specific algorithms and techniques tailored for individual applications into the scalable system, optimized results could be achieved for generic applications as well as for highly specific requirements.
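For readers unfamiliar with de-hazing cores, one widely used building block is the dark channel of He et al.: a per-pixel minimum over the color channels followed by a local minimum filter. The sketch below computes only that standard quantity; it is not the paper's own low-complexity algorithm (which is described in Section III), and the function name and patch size are our choices.

```python
import numpy as np

def dark_channel(frame: np.ndarray, patch: int = 7) -> np.ndarray:
    # Standard dark channel: minimum over RGB, then a patch-wise local
    # minimum. In haze-free regions this tends to be near zero, which
    # is the prior that de-hazing methods exploit.
    per_pixel_min = frame.min(axis=2)
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode="edge")
    h, w = per_pixel_min.shape
    out = np.empty_like(per_pixel_min)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

The explicit double loop keeps the sketch readable; a real low-complexity implementation would replace it with a separable or incremental minimum filter.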

A major advantage of the approach described in this paper is its flexibility, which is reflected in several key aspects. First, the system has a low enough complexity to be embedded into portable camera systems. It can also be incorporated into post-processing software at various levels of complexity, offering different quality-complexity tradeoffs for different applications. Second, it can be adopted as a standalone module or as an integrated part of a video encoder or decoder. By integrating the system into an encoder or a decoder, one can not only share information between the codec and the enhancement system, thereby lowering the combined complexity, but also usually improve the quality after processing. Finally, the multiple steps of the algorithm can be implemented as a complete system; alternatively, the baseline features can be implemented on a portable device for basic enhancement in real-time applications, while the more sophisticated steps (e.g., further noise reduction and better tone mapping of the preliminarily enhanced video) can be done on a cloud server. It is also conceivable that, with the advance of computational photography and the development of image sensors with good high-dynamic-range or low-lighting capability, the core enhancement module could become the sensor itself, with only the pre-processing and post-processing modules required to handle the challenges in a large variety of applications.

The paper is organized as follows. In Section II, we present the evidence for, and establish the connections between, video de-hazing, low-lighting video enhancement, and high dynamic range video enhancement. We show that for a wide range of applications, especially those targeting low complexity and mobile platforms, satisfactory results can be achieved with a single core algorithm for a wide range of video enhancement problems. Then, using the low-complexity de-hazing algorithm explained in Section III as an example of a possible choice for the core algorithm, we explain in Section IV various techniques for reducing the computational and memory complexities of the algorithm, as well as various techniques that can be used in conjunction with the core algorithm to further improve its visual quality. Given that in real-world applications the video enhancement module could be deployed at multiple stages of the end-to-end system, e.g., before compression and transmission/storage, or after compression and transmission/storage but before decompression, or after decompression and before the video content is displayed on the monitor, we examine the complexity and rate-distortion (RD) tradeoffs associated with applying the proposed algorithm at these different stages, with experimental results, in Section V. Finally, we conclude the paper in Section VI.

II. AN INTEGRATED APPROACH TO VIDEO ENHANCEMENT

The motivation for our algorithm is the observation made in [8] that if one performs a pixel-wise inversion of low-lighting video, the result looks quite similar to hazy video. Through experiments, we found that the same also holds true for a significant percentage of high dynamic range video. Here, the "inversion" operation is simply

R_c(x) = 255 - I_c(x),    (1)

where I_c(x) and R_c(x) are the intensities of the corresponding color (RGB) channel c at pixel x in the input and inverted frame, respectively.
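For an 8-bit frame, Eq. (1) can be applied directly to the whole array; the NumPy sketch below is a straightforward illustration (the function name is ours).

```python
import numpy as np

def invert_frame(frame: np.ndarray) -> np.ndarray:
    # Pixel-wise inversion R_c(x) = 255 - I_c(x), applied to every
    # RGB channel of an 8-bit frame at once.
    return 255 - frame

# A dark, low-lighting frame inverts into a bright, hazy-looking one.
dark = np.full((2, 3, 3), 20, dtype=np.uint8)   # mostly-black pixels
hazy_looking = invert_frame(dark)               # every value becomes 235
```

Note that the operation is an involution: inverting twice recovers the original frame, which is what allows the post-processing stage to undo the pre-processing inversion.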

To verify the claim, we randomly selected (by Google) and captured a total of 100 images and video clips each in
