
EECE 541 Multimedia Systems<br />

Project Proposal:<br />

Logo Insertion for H.264 Compressed Video<br />

Instructor: Dr. Panos Nasiopoulos<br />

Group members: Gopichand Yalamanchili (20322061)<br />

Abdul K.Murad Agha (80089972)<br />

Teresa Zhou (74417999)<br />

Di Xu (31679079)<br />

February 26, 2008<br />



TABLE OF CONTENTS<br />

I. INTRODUCTION<br />

II. NOTATION AND TERMINOLOGY<br />

III. VIDEO CODING AND TRANSCODING<br />

A. BASIC TRANSCODING STRUCTURES<br />

B. MPEG2 VIDEO CODING<br />

C. H.264 VIDEO CODING<br />

C.1. INTRA PREDICTION<br />

C.2. INTER PREDICTION<br />

IV. EXISTING METHODS FOR MPEG2 COMPRESSED VIDEO LOGO INSERTION<br />

A. LOGO INSERTION POSITION<br />

B. LOGO INSERTION IN SPATIAL DOMAIN<br />

C. LOGO INSERTION IN TRANSFORM DOMAIN<br />

D. LOW COST AND EFFICIENT LOGO INSERTION<br />

D.1. LOGO-AFFECTED RANGE OF FRAMES IN THE TEMPORAL DOMAIN<br />

D.2. MOTION INFORMATION ADJUSTMENT IN THE LOGO AND LOGO-AFFECTED PARTS<br />

E. QUANTIZATION SCALE ADJUSTMENT<br />

E.1. CONSTANT QUALITY AT THE LOGO PART<br />

E.2. BIT REALLOCATION<br />

V. MAIN ISSUES FOR H.264 COMPRESSED VIDEO LOGO INSERTION<br />

REFERENCES<br />



I. INTRODUCTION<br />

Transcoding is the process of converting the content of a compressed video stream from<br />

one format to another. A format is determined by characteristics such as the bit rate,<br />

frame rate, spatial resolution, coding syntax, and the content. One useful and highly<br />

demanded application of transcoding is inserting a logo into a stream of encoded video.<br />

There are many commercial applications for this technology. As there are now many<br />

television networks, the inserted logo is extremely effective for the viewers to identify the<br />

station. Throughout the years, we have come to associate the “peacock” logo with NBC,<br />

or the “eye” logo with CBS. These logos can greatly improve a broadcaster’s chances of<br />

viewer recognition. Several logo-insertion methods have been proposed for MPEG2 [1]<br />

compressed video, but little work has been done on the considerably more complicated<br />

case of logo insertion for H.264 [2] compressed video. In this project, we aim to<br />

insert logos into H.264 compressed video.<br />

The remainder of the project proposal is structured as follows. Section II briefly<br />

introduces the notation and terminology used herein. Then, Section III introduces some<br />

essential background of video coding and transcoding. In Section IV, we present the<br />

existing logo-insertion methods for MPEG2 compressed video, and point out the<br />

weaknesses of these methods, especially when they are applied to H.264 compressed<br />

video. Finally, Section V states the problems that need to be solved for logo insertion in<br />

H.264 compressed video.<br />

II. NOTATION AND TERMINOLOGY<br />

Before further discussing logo insertion into the H.264 video stream, we need to define<br />

some terminology used herein. In what follows, a video frame is partitioned into logo<br />

unrelated and related parts. The logo related part further includes the “logo part” and<br />

“logo-affected part”. The region that is covered by logo is called logo part, and the region<br />

outside of the logo but motion predicted based on the logo part is called logo-affected<br />

part. In MPEG2 compressed video, the logo-affected part exists only in P and B frames,<br />

while in H.264 compressed video, the logo-affected part exists in all I, P, and B types of<br />

frames.<br />

Logos have different features. Some commonly desired logos can be classified as non-transparent<br />

(i.e., solid) logos, transparent logos, rectangular-shaped logos, and arbitrary-shaped<br />

logos.<br />

Logo insertion in the pixel domain can be performed by combining each pixel of the<br />

background image B(x,y) with the corresponding logo pixel L(x,y) to obtain the output image P(x, y). The<br />

operation is usually expressed as a linear combination of the form:<br />

P(x, y) = α × (L(x,y)) + (1 – α) × (B(x,y)) , (1)<br />

where the transparency factor α determines the transparency of the logo. The value α is<br />

in the range of 0 < α ≤ 1. In particular, when α = 1, all pixels of the background image<br />



are replaced by the logo, giving rise to an opaque overlapping of the logo over the input<br />

image.<br />
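As a minimal sketch of equation (1), the following Python fragment blends a logo into a frame sample by sample. The function names `blend_pixel` and `insert_logo`, the 8-bit sample range, and the list-of-lists frame layout are assumptions made for illustration only; they do not come from the cited papers.

```python
def blend_pixel(logo, background, alpha):
    """Equation (1): P = alpha * L + (1 - alpha) * B for one 8-bit sample."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    value = alpha * logo + (1.0 - alpha) * background
    return max(0, min(255, int(value + 0.5)))  # round, then clip to 8 bits


def insert_logo(frame, logo, top, left, alpha):
    """Return a copy of `frame` with `logo` blended in at (top, left)."""
    out = [row[:] for row in frame]
    for j, logo_row in enumerate(logo):
        for i, logo_px in enumerate(logo_row):
            out[top + j][left + i] = blend_pixel(
                logo_px, frame[top + j][left + i], alpha)
    return out


frame = [[100] * 4 for _ in range(4)]   # toy 4x4 luma frame
logo = [[200, 200], [200, 200]]         # toy 2x2 logo
half = insert_logo(frame, logo, 0, 0, 0.5)
opaque = insert_logo(frame, logo, 0, 0, 1.0)
```

With α = 0.5 the covered samples land halfway between logo and background, and with α = 1 they equal the logo samples, which is the opaque case described above.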

A logo often occupies a small portion of a frame, and is static over a frame sequence.<br />

Logos often appear in a corner of a frame (e.g., the top left or bottom right corner).<br />

In H.264 compressed video, however, they can be anywhere in a slice, since slice<br />

partitioning is rather flexible in the H.264 standard. Moreover, logos may be present only in groups<br />

of successive frames, as opposed to all frames in a video sequence.<br />

III. VIDEO CODING AND TRANSCODING<br />

Having introduced some logo-related terminology, in this section, we will present several<br />

essential concepts for video coding and video transcoding. We first introduce three<br />

commonly used transcoding structures in what follows.<br />

A. Basic Transcoding Structures<br />

One straightforward transcoding structure is the cascaded form. For the cascaded<br />

structure as shown in Figure 1, the decoder decodes the compressed video stream<br />

completely, and the encoder re-encodes the reconstructed video into the target format.<br />

The cascaded architecture achieves high video quality, but it is computationally very<br />

expensive. Therefore, the cascaded structure is not often used, especially in the real-time<br />

transcoding. It is better to re-use the information contained in the original bit stream to<br />

simplify the architecture.<br />

Figure 1. Cascaded architecture in pixel domain.<br />

Open-loop structure is another commonly used transcoding structure. In the open-loop<br />

system, the bit stream is first variable-length decoded (VLD) to reconstruct the quantized<br />

discrete cosine transform (DCT) coefficients, motion vectors, prediction modes, and<br />

other macroblock-level information. The quantized coefficients are then inverse<br />

quantized and modified according to the transcoding requirements. Finally, the modified<br />

data is re-quantized and variable length coded to achieve the new output format. Figure 2<br />

shows a requantization transcoder as an open-loop structure example.<br />



Figure 2. Open-loop architecture.<br />

Open-loop systems are relatively simple. They do not include motion estimation, motion<br />

compensation, DCT, and inverse DCT (IDCT). Therefore, open-loop structures are<br />

computationally very efficient, but they suffer from low video quality due to the drift<br />

problem. The drift problem is caused by the mismatch between the actual reference frame<br />

used for motion estimation in the encoder and the degraded reference frame used for<br />

motion compensation in the decoder. The drift problem accumulates and causes severe<br />

degradation to the video quality.<br />

To avoid the high complexity problem in the cascaded architecture and the poor video<br />

quality problem in the open-loop structure, closed-loop systems provide a good tradeoff<br />

between quality and computational complexity. In closed-loop structure, significant<br />

complexity saving can be achieved while still maintaining acceptable video quality.<br />

Closed-loop systems provide drift compensation for re-quantized data. They aim at<br />

eliminating the mismatch between predictive and residual components by approximating<br />

the cascaded transcoding architecture. In the simplified closed-loop structure as shown in<br />

Figure 3, only one reconstruction loop is required with one DCT and one IDCT. Some<br />

architecture inaccuracy is introduced due to the non-linear nature in which the reconstruction<br />

loops are combined. However, it has been found that the approximation has<br />

little effect on the video quality.<br />

Figure 3. Closed-loop architecture.<br />
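The difference between open-loop and closed-loop requantization can be illustrated with a deliberately simplified model. The sketch below assumes one scalar sample per frame, pure predictive coding from the previous frame, and a uniform quantizer; the names `quantize` and `transcode` are hypothetical. It is not a model of any real transcoder, only of how requantization error accumulates without drift compensation.

```python
def quantize(value, step):
    """Uniform quantizer: nearest multiple of `step` (integer arithmetic)."""
    return step * ((value + step // 2) // step)


def transcode(residuals, step, closed_loop):
    """Requantize input residuals with a coarser step.

    Returns, per frame, the error between the reconstruction obtained from
    the transcoded stream and the one obtained from the input stream.
    """
    original = 0     # what a decoder reconstructs from the input stream
    transcoded = 0   # what a decoder reconstructs from the output stream
    errors = []
    for r in residuals:
        original += r
        if closed_loop:
            # Drift compensation: form the new residual against the
            # transcoder's own reconstruction of the previous output frame.
            new_residual = quantize(original - transcoded, step)
        else:
            # Open loop: requantize each residual in isolation.
            new_residual = quantize(r, step)
        transcoded += new_residual
        errors.append(transcoded - original)
    return errors


residuals = [3] * 10   # toy input: every frame is 3 larger than the last
open_err = transcode(residuals, 8, closed_loop=False)
closed_err = transcode(residuals, 8, closed_loop=True)
print(open_err[-1], max(abs(e) for e in closed_err))
```

In this toy run the open-loop error grows with every predicted frame, while the closed-loop error never exceeds half the quantization step.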



B. MPEG2 Video Coding<br />

Since we are relatively familiar with MPEG2 coding process, we only present a point that<br />

is easily overlooked. In MPEG2, the frame to be compressed is divided into 16×16 pixel<br />

macroblocks. Then, for each of these macroblocks in P and B frames, the reconstructed<br />

reference frame is searched to find a macroblock that best matches the macroblock to be<br />

compressed. The offset is encoded as a motion vector. The match between the two<br />

macroblocks will often not be perfect. To correct this, the encoder computes the residual<br />

of the original pixel values and the predicted pixel values. The residual for each<br />

macroblock is appended to the motion vector, and the spatial redundancy is further<br />

reduced by the DCT transform. Sometimes, no suitable match is found. Then, the<br />

macroblock is treated like an I-frame macroblock. Therefore, both inter and intra modes<br />

can be used to compress macroblocks in MPEG2 P and B frames.<br />
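The macroblock search described above can be sketched as a full-search block matcher. The sum of absolute differences (SAD) cost, the search range, and the function names are assumptions chosen for illustration; real MPEG2 encoders are free to use other costs and faster search strategies.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def motion_search(cur, ref, cx, cy, n=16, search=7):
    """Full search around (cx, cy); returns (motion vector, residual block)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Only consider candidates that lie fully inside the reference.
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    _, dx, dy = best
    residual = [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
                 for i in range(n)] for j in range(n)]
    return (dx, dy), residual


# Toy data: the current frame is the reference shifted by (2, 1).
ref = [[16 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]
mv, residual = motion_search(cur, ref, 2, 2, n=4, search=2)
```

For this synthetic shift the search recovers the offset exactly and the residual is zero; with real video the residual is generally non-zero and is what the DCT stage then codes.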

C. H.264 Video Coding<br />

In this project, we aim to add logos into H.264 coded video streams. The H.264 standard<br />

is the latest video coding standard. A brief description of the H.264 standard is given<br />

below.<br />

C.1. Intra Prediction<br />

If a block or macroblock is encoded in the intra mode, a prediction block is formed based<br />

on previously encoded and reconstructed (but un-filtered) blocks from the same slice.<br />

This prediction block is subtracted from the current block prior to encoding. For the<br />

luminance (luma) samples, a prediction block may be formed for each 4×4 subblock or<br />

for a 16×16 macroblock. There are a total of nine optional prediction modes for each 4×4<br />

luma block; four optional modes for a 16×16 luma block; and four optional modes for<br />

each 8×8 chroma block. Note that the same mode is always applied to both chroma<br />

blocks.<br />
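To make the intra prediction step concrete, the sketch below forms three of the nine 4×4 luma predictions (vertical, horizontal, and DC) from the reconstructed neighbors above and to the left, and keeps the one with the smallest residual. The SAD-based mode choice and the function names are assumptions for illustration; the standard specifies the prediction formulas but leaves mode selection to the encoder.

```python
def predict_4x4(top, left, mode):
    """Prediction block from the 4 samples above and the 4 samples to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[j]] * 4 for j in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbors
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(mode)


def best_intra_mode(block, top, left):
    """Pick the mode with the smallest SAD; return (mode, residual)."""
    best = None
    for mode in ("vertical", "horizontal", "dc"):
        pred = predict_4x4(top, left, mode)
        residual = [[block[j][i] - pred[j][i] for i in range(4)]
                    for j in range(4)]
        cost = sum(abs(v) for row in residual for v in row)
        if best is None or cost < best[0]:
            best = (cost, mode, residual)
    return best[1], best[2]


top = [10, 20, 30, 40]
left = [50, 60, 70, 80]
block = [[10, 20, 30, 40] for _ in range(4)]   # repeats the row above
mode, residual = best_intra_mode(block, top, left)
```

Because the prediction depends on neighboring reconstructed samples, changing those neighbors (for example by inserting a logo) changes the prediction of blocks that the logo does not cover.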

C.2. Inter Prediction<br />

Inter prediction creates a prediction model from one or more previously encoded video<br />

frames. The model is formed by shifting samples in the reference frame(s) (i.e., motion<br />

compensated prediction). The H.264 codec uses block-based motion compensation. The<br />

H.264 standard supports motion compensation block sizes ranging from 16×16 to 4×4<br />

luminance samples with many options in between. The luminance component of each<br />

macroblock (16×16 samples) may be split up in four ways: 16×16, 16×8, 8×16 or 8×8.<br />

Each of the sub-divided regions is a macroblock partition. If the 8×8 mode is chosen,<br />

each of the four 8×8 macroblock partitions within the macroblock may be split in a<br />

further four ways: 8×8, 8×4, 4×8 or 4×4 (known as macroblock sub-partitions). These<br />

partitions and sub-partitions give rise to a large number of possible combinations within<br />

each macroblock.<br />
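The number of combinations can be counted directly. The following sketch enumerates the luma partition layouts of one macroblock and the number of motion vectors each layout needs, assuming one motion vector per partition and ignoring reference-frame choice; the names used are hypothetical.

```python
from itertools import product

# Partitions (and hence motion vectors) per macroblock for the large modes.
MB_MODES = {"16x16": 1, "16x8": 2, "8x16": 2}
# Partitions per 8x8 region when the 8x8 mode is chosen.
SUB_MODES = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def partition_layouts():
    """Yield (description, motion_vector_count) for every luma layout."""
    for name, count in MB_MODES.items():
        yield name, count
    # In 8x8 mode each of the four regions picks its sub-partition freely.
    for combo in product(SUB_MODES, repeat=4):
        yield "8x8:" + "/".join(combo), sum(SUB_MODES[s] for s in combo)


layouts = list(partition_layouts())
print(len(layouts), max(mv for _, mv in layouts))
```

Under these assumptions there are 3 + 4^4 = 259 layouts, and a single macroblock may carry up to 16 motion vectors.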

A separate motion vector is required for each partition or sub-partition. Each motion<br />

vector must be coded and transmitted. In addition, the choice of partition(s) must be<br />

encoded and stored in the compressed bit stream. Choosing a large partition size (e.g.<br />

16×16, 16×8, 8×16) means that a small number of bits are required to signal the choice of<br />

motion vector(s) and the type of partition; however, the motion compensated residual<br />



may contain a significant amount of energy in frame areas with high detail. Choosing a<br />

small partition size (e.g. 8×4, 4×4, etc.) may give a lower-energy residual after motion<br />

compensation, but requires a larger number of bits to signal the motion vectors and<br />

choice of partition(s). The choice of partition size, therefore, has a significant impact on<br />

compression performance. In general, a large partition size is appropriate for<br />

homogeneous areas of the frame and a small partition size may be beneficial for detailed<br />

areas.<br />

IV. EXISTING METHODS FOR MPEG2 COMPRESSED VIDEO LOGO INSERTION<br />

Most methods for logo insertion are developed for MPEG2 compressed video. In what<br />

follows, we introduce the methods presented in several different papers, which<br />

handle logo insertion from different perspectives. We will also analyze whether the<br />

methods are appropriate to be adopted in logo insertion for H.264 compressed video.<br />

A. Logo Insertion Position<br />

For logo insertion, the first step is to determine the position of the logo.<br />

In Liu’s paper [3], the logo is placed in the top left-hand corner. The logo position<br />

should also be chosen so that the effect on the coded macroblocks<br />

is minimized. In [3], the logo is aligned to the macroblock grid to minimize<br />

the number of macroblocks affected. If the logo is not aligned, nine macroblocks<br />

are affected, whereas if it is aligned to the macroblocks, only four macroblocks are<br />

affected, as shown in Figure 4.<br />

Figure 4. Diagram of logo aligned to macroblocks.<br />
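The effect of alignment can be checked with a few lines of arithmetic. The sketch below counts the macroblocks overlapped by a rectangular logo; the 32×32 logo size is an assumption chosen so that the counts match the four-versus-nine case described above, and the function names are hypothetical.

```python
MB_SIZE = 16  # macroblock width and height in luma samples


def macroblocks_covered(left, top, width, height, mb_size=MB_SIZE):
    """Number of macroblocks overlapped by a rectangle given in pixels."""
    first_col = left // mb_size
    last_col = (left + width - 1) // mb_size
    first_row = top // mb_size
    last_row = (top + height - 1) // mb_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)


def align_to_grid(coord, mb_size=MB_SIZE):
    """Snap a coordinate down to the nearest macroblock boundary."""
    return (coord // mb_size) * mb_size


unaligned = macroblocks_covered(8, 8, 32, 32)
aligned = macroblocks_covered(align_to_grid(8), align_to_grid(8), 32, 32)
print(unaligned, aligned)
```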

B. Logo Insertion in Spatial Domain<br />

As explained in [3], one of the approaches for logo insertion is to insert the logo in the<br />

spatial domain. Figure 5 gives the block diagram for spatial domain logo insertion<br />

structure.<br />



Figure 5. Logo insertion in spatial domain.<br />

The transcoding works as follows. First, the input video stream (containing the motion<br />

vectors and residuals for the P and B frames, and the I frames) goes through entropy<br />

decoding to recover the quantized data. Then, the residual components of the P and B<br />

frames are sent through an inverse quantizer for dequantization and then fed<br />

through an inverse DCT. The motion vectors go through two different paths.<br />

The first path is through motion compensation, where they are combined with the<br />

fed-back previous frame to form a prediction of the current frame without the<br />

residual. The output of the first motion compensation is then added to the residual to form<br />

the complete picture for the current frame. The output is also placed into a buffer so that<br />

the next frame can use the previous frame’s picture for motion compensation. The motion<br />

vectors also go through the second motion compensation block in the encoder part of the<br />

loop. They are used to correct the logo insertion errors, since the encoder uses the same<br />

motion vectors as the input video.<br />

After the output of the first motion compensation block has been added with the residual<br />

of the current frame, we will add the logo at this point using the formula (1). That is<br />

P(x, y) = α × (L(x,y)) + (1 – α) × (B(x,y)) .<br />

The result is then fed in for error correction. After that, it is transformed by a DCT block,<br />

quantized, and then entropy encoded for output. A rate control mechanism<br />

keeps the output bit rate of the stream consistent. This mechanism increases or decreases<br />

the quantizer parameter of Q2 as a means of controlling the bit rate and also the quality of<br />

the output frame.<br />



The method in [4] shares the architecture described above, except for the definition of the<br />

motion vectors (MV). Early transcoders re-use the motion vectors from the input bit-stream.<br />

However, such a motion vector may not point to the best match for<br />

macroblocks that are close to the insertion area, since part of their content is always static.<br />

To maintain a high coding efficiency, [4] suggests setting the motion vector to<br />

zero for logo macroblocks that are dominated by logo content, and using the original motion<br />

vector from the input bit-stream for logo macroblocks that are dominated by video<br />

content. For example,<br />

MV(x, y) = (0, 0), when α is greater than or equal to a threshold, e.g., 0.5.<br />

MV(x, y) = MV’(x, y), otherwise. That is, using MV’s decoded from input bit-stream.<br />
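The rule above translates almost directly into code. In this minimal sketch, the function name `logo_motion_vector` and the tuple representation of a motion vector are assumptions; the default threshold of 0.5 is the example value given above.

```python
def logo_motion_vector(input_mv, alpha, threshold=0.5):
    """Motion vector for a macroblock in the logo area.

    input_mv : (x, y) vector decoded from the input bit-stream
    alpha    : transparency factor of the logo, 0 < alpha <= 1
    """
    if alpha >= threshold:
        return (0, 0)    # logo content dominates: the block is static
    return input_mv      # video content dominates: keep the decoded vector


print(logo_motion_vector((3, -2), 0.8))
print(logo_motion_vector((3, -2), 0.2))
```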

The above scheme was developed for MPEG2 coded video. For H.264 coded video,<br />

however, this logo insertion architecture needs to be modified due to the multi-reference<br />

capability of H.264. The above motion vector redefinition concept can<br />

probably be used in H.264 video stream logo insertion. However, the threshold on the<br />

transparency factor α needs to be chosen carefully in order to achieve good coding<br />

efficiency for H.264 coded video.<br />

C. Logo Insertion in Transform Domain<br />

The transform domain insertion algorithm described by Liu [3] is somewhat less complicated than the spatial<br />

domain one, because fewer DCT and IDCT blocks are needed on the<br />

decoding and encoding sides. Because of the linearity of the DCT,<br />

as shown in (2), blending in the transform domain is equivalent to blending in the spatial domain.<br />

DCT(m+n) = DCT(m) + DCT(n) and<br />

DCT{α(m(x,y)) + (1 – α)(n(x,y))} = αDCT{m(x,y)} + (1 – α)DCT{n(x,y)}. (2)<br />
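The linearity in (2) can be verified numerically. The sketch below uses a plain one-dimensional orthonormal DCT-II written from its definition (the two-dimensional DCT is separable, so the same property carries over); the sample values and the name `dct_1d` are arbitrary choices for illustration.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


alpha = 0.25
logo = [255, 255, 0, 0, 255, 255, 0, 0]
background = [12, 40, 77, 90, 101, 130, 180, 200]

# Blend in the pixel domain, then transform.
blend_then_dct = dct_1d([alpha * l + (1 - alpha) * b
                         for l, b in zip(logo, background)])
# Transform each signal, then blend the coefficients.
dct_then_blend = [alpha * l + (1 - alpha) * b
                  for l, b in zip(dct_1d(logo), dct_1d(background))]

max_diff = max(abs(a - b) for a, b in zip(blend_then_dct, dct_then_blend))
```

The two results agree up to floating-point rounding, which is what allows the logo to be added directly to dequantized DCT coefficients.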

The architecture of the logo insertion in the transform domain is shown in Figure 6. The<br />

video stream is first entropy decoded and inverse quantized. This is similar to the spatial<br />

domain strategy. The logo is then inserted in the DCT domain. Next, motion vectors are<br />

fed back for DCT motion compensation, and subtraction is done for error correction.<br />

These steps are all done in the DCT domain. Then, the output is quantized before entropy<br />

encoding and rate control. The quantized version is also inverse quantized and fed back<br />

to the buffer for motion compensation of the next frame and error correction.<br />



Figure 6. Logo insertion in transform domain.<br />

D. Low Cost and Efficient Logo Insertion<br />

High accuracy and efficiency are two important criteria for logo insertion. One efficient<br />

logo-insertion method is proposed by Shu Xiao et al. in [5]. In this paper, the<br />

authors present efficient logo insertion methods for transparent and non-transparent<br />

logos in MPEG2 compressed video. They consider the refinement of prediction modes<br />

and motion vectors for different types of macroblocks. The method should be<br />

applicable in both the spatial and transform domains.<br />

In logo-insertion transcoding for MPEG2 compressed video, we need to compensate for the<br />

changes caused by the logo insertion. Such changes propagate through frames when P<br />

and B frames refer to the reference frames in the motion prediction process. For H.264<br />

compressed video, however, change propagation also happens within I frames due to the use<br />

of intra prediction, as described in Section III. Sometimes, logos are not inserted in all<br />

frames of a video sequence. We now introduce the method used in [5] for<br />

determining the range of frames affected by logo insertion in MPEG2 coded<br />

video.<br />

D.1. Logo-Affected Range of Frames in the Temporal Domain<br />

Let [l, h] denote the range of the video sequence where a logo is required to be inserted.<br />

That is, the indices l and h are the lowest and highest frame numbers of this range,<br />

respectively. We further let [L, H] represent the range of video sequence which is<br />

affected by the logo insertion due to the change propagation caused by frame reference.<br />

Clearly, we have [l, h] being a subset of [L, H]. Reference frames are frames of a<br />



compressed video that are used to define future frames. In MPEG2 video coding standard,<br />

reference frames are I and P frames. The previous reference frame lr and next reference<br />

frame hr of the range [l, h] are defined to be the reference frames outside the range [l, h]<br />

with the largest and smallest frame numbers, respectively. Then, the indices L and H can<br />

be determined using (3) and (4), as follows:<br />

L = lr + 1, (3)<br />

H = hr, if hr is not an I frame; H = hr − 1, otherwise. (4)<br />

Figure 7 gives two examples of the logo-affected range of frames. In these examples, we use<br />

the typical group of picture (GOP) structure: IBBPBBPBBP…, where I and P frames are<br />

reference frames. The frame dependencies are also drawn in the figure. In the logo range<br />

example one, [5, 16] is the range of frames [l, h] where logos are added. The previous<br />

reference frame lr of the range [l, h] is 3, and the next reference frame hr of the range [l, h]<br />

is 18. According to (3) and (4), we obtain the logo-affected range of frames [L, H] as [4, 18].<br />

Similarly, in example two, the range [L, H] is [4, 14].<br />

Figure 7. Examples of logo affected range of frames.<br />
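Equations (3) and (4) can be sketched as a small function. The sketch assumes frames numbered from 0 and a 15-frame GOP with a reference frame every third frame (IBBPBBPBBPBBPBB), which is consistent with the numbers in the first example; the handling of a range with no reference frame before or after it is also an assumption, as is the logo range [5, 13] used for the second call, which the text does not state.

```python
def frame_type(n, gop_length=15, p_distance=3):
    """Frame type in an IBBP... GOP (assumed layout, frames numbered from 0)."""
    pos = n % gop_length
    if pos == 0:
        return "I"
    return "P" if pos % p_distance == 0 else "B"


def affected_range(l, h, num_frames):
    """Logo-affected range [L, H] for a logo inserted in frames [l, h]."""
    def is_ref(n):
        return frame_type(n) in ("I", "P")

    # Previous / next reference frames outside [l, h].
    lr = next((n for n in range(l - 1, -1, -1) if is_ref(n)), None)
    hr = next((n for n in range(h + 1, num_frames) if is_ref(n)), None)

    L = lr + 1 if lr is not None else 0           # equation (3)
    if hr is None:                                # assumed edge case
        H = num_frames - 1
    elif frame_type(hr) == "I":                   # equation (4)
        H = hr - 1
    else:
        H = hr
    return L, H


print(affected_range(5, 16, 30))
print(affected_range(5, 13, 30))
```

The first call reproduces the range [4, 18] of example one. The second call yields [4, 14] because the next reference frame is an I frame, which does not depend on any frame inside the logo range.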

Once the logo affected frame range [L, H] is identified, we can focus on compensating<br />

the changes induced by logo insertion within this range of frames. The frames outside the<br />

range [L, H] are not affected, and therefore remain unchanged.<br />

The scheme described above cannot be directly applied to H.264 compressed video. This<br />

is because, in the H.264 video coding standard, all three types of frames (i.e., I, P, and B<br />

frames) can be used as reference frames. Multiple reference frames are also used, which<br />

makes determining the range [L, H] more challenging. The determination of the affected-frame range<br />

therefore needs to be adapted for H.264 coded video.<br />

D.2. Motion Information Adjustment in the Logo and Logo-Affected Parts<br />

Having introduced the logo-affected range of frames, we now state the motion<br />

information adjustment in [5]. As mentioned earlier in Section II, a frame can be<br />

partitioned into the logo part, logo affected part, and logo unrelated part. For simplicity,<br />

in [5], the authors assume that the logo is rectangular and covers an integer number of<br />

macroblocks. The coding modes and motion vectors for macroblocks at the logo<br />

unrelated part shall remain unchanged. In what follows, we discuss the motion-vector<br />

refinement of macroblocks in the logo and logo-affected parts, respectively.<br />



For I and P frames in the logo part, if the first reference frame in the range of [l, h] is a P<br />

frame, then set the macroblock mode (of those macroblocks in the logo part) to be intra-coded.<br />

If the first reference frame in the range of [l, h] is an I frame instead, the<br />

macroblock modes for all I and P frames remain the same.<br />

For B frames in the logo part, if its frame number is smaller than the first reference frame<br />

in the range of [L, H], set the macroblock mode to be backward predicted. If its frame<br />

number is larger than the last reference frame in the range of [L, H], set the macroblock<br />

mode to be forward predicted or intra coded. Then, set the prediction mode for all other B<br />

frames in [L, H] to be forward predicted.<br />

Having the prediction modes refined, we set all motion vectors for inter-coded<br />

macroblocks in the logo part to be zero. Clearly, this scheme is suitable for non-transparent<br />

logos. For transparent logos, however, especially for logos with their<br />

transparency factor α being small (e.g., α


E.1. Constant Quality at the Logo Part<br />

We begin with the non-transparent logo case. Non-transparent logos should<br />

appear static in the video sequence. Therefore, we set the quantization scales for all the intra-coded<br />

macroblocks in the logo area to the same value, so that the logo region has<br />

approximately the same quality. For inter-coded macroblocks, the zero motion vector and<br />

stationary logo content ensure the prediction accuracy. No residual information is needed.<br />

Hence, we use the skip mode for P frames, and the maximum quantization scales for B<br />

frames due to the lack of skip mode for B frames.<br />

Unlike non-transparent logos, transparent logos are superimposed on the original video<br />

frames. Therefore, the logo parts are not identical over different frames. To ensure<br />

approximately the same quality in the logo area, we still use the same quantization scales<br />

for all the intra-coded macroblocks in the logo area. For the inter-coded macroblocks, the<br />

prediction residuals are no longer zero or negligible. In [5], a slightly larger<br />

quantization scale is used for inter-coded macroblocks than for intra-coded macroblocks.<br />

Both quantization scales for the intra and inter macroblocks are inversely proportional to<br />

the total output bit rate. In doing so, we prevent the logo parts from consuming too many<br />

bits in low bit rate situations.<br />

E.2. Bit Reallocation<br />

Rate control is also an important requirement for video transcoding. It is usually<br />

accomplished by adjusting the quantization scales. The transcoded video with logo may<br />

consume more bits than the original coded video. In order to control the overall bit rate of<br />

the encoded video while achieving a constant visual quality in the logo area, we need to<br />

adjust the quantization scales for the macroblocks not covered by logo. One simple<br />

practical scheme proposed in [5] is to pre-encode the logo part, then deduct the consumed<br />

bits from the total target bits, and adjust the bit allocation of the non-logo-covered area<br />

according to the remaining bit budget.<br />
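The bookkeeping behind this scheme can be sketched as follows. The proportional scaling of the original per-macroblock bit counts is an assumption made for illustration; [5] may distribute the remaining bits differently, and in practice the allocation is realized by adjusting quantization scales rather than by assigning bit counts directly.

```python
def reallocate_bits(target_bits, logo_bits, original_bits):
    """Share the bits left after the logo part among non-logo macroblocks.

    target_bits   : total bit budget for the frame
    logo_bits     : bits consumed by the pre-encoded logo part
    original_bits : bits each non-logo macroblock used in the input stream
    """
    remaining = target_bits - logo_bits
    total = sum(original_bits)
    if remaining <= 0 or total <= 0:
        raise ValueError("no budget left for the non-logo area")
    # Assumed policy: scale every macroblock by the same factor.
    return [remaining * b / total for b in original_bits]


budget = reallocate_bits(600, 200, [100, 300, 400])
print(budget)
```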

Note that the bit reallocation method described above is not the most efficient under some<br />

circumstances. If the logo is inserted in the top left corner of a frame, the pre-encoding of<br />

the logo part is not necessary. This is because the logo part is coded before most of the<br />

other macroblocks are coded in this case, and the bits consumed by the logo-part are<br />

known before most adjustment can be done. Furthermore, if the logo transparency is high,<br />

adopting the original motion vectors might be more efficient than using zero motion<br />

vectors for the inter-coded macroblocks.<br />

V. MAIN ISSUES FOR H.264 COMPRESSED VIDEO LOGO INSERTION<br />

As indicated earlier, most existing methods for logo insertion are developed for MPEG2<br />

compressed video. The H.264 standard is much more complex and flexible, and is therefore<br />

more difficult to handle and introduces more challenges.<br />

In H.264, intra prediction is used for coding the intra macroblocks in a frame. Hence, an I<br />

frame has a logo-affected part due to the dependence within the same slice. This challenge<br />

did not exist in MPEG2. The logo-affected areas caused by intra prediction need to be<br />

identified and properly compensated. The intra prediction issue is a complex<br />

challenge. Because every intra macroblock depends on neighboring macroblock pixels<br />

(i.e., the bottom-right corner pixel of the top-left neighboring macroblock, the bottom row of<br />

pixels of the top neighboring macroblock, or the right column of pixels of the left macroblock), the<br />

border (boundary) pixels of the logo-affected area should remain the same; otherwise, a<br />

drift error will accumulate over time.<br />

H.264 uses a better half-pixel approximation (a 6-tap interpolation filter) to find a better<br />

match in the motion estimation stage, whereas MPEG2 uses bilinear interpolation. There are<br />

many proposals to reuse motion vectors from the decoder; however, directly using those<br />

motion vectors is not optimal, and an extra process is needed to compensate for the difference<br />

between the 6-tap and bilinear interpolation. In addition, H.264 supports quarter-pixel<br />

sample accuracy to further refine the motion vectors. This did not exist in MPEG2 and<br />

introduces another challenge. The half-pixel and quarter-pixel differences make many<br />

of the MPEG2 motion vector transcoding solutions not easily usable for<br />

H.264 transcoding.<br />
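The difference between the two half-pixel interpolators can be seen on a single row of samples. The sketch below uses the H.264 6-tap luma filter with taps (1, −5, 20, 20, −5, 1)/32 and the two-sample rounded average used for MPEG2 half-pixel positions; the function names and the test signals are chosen for illustration only.

```python
def clip8(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel_h264(e, f, g, h, i, j):
    """Half-sample luma value between g and h (6-tap filter, rounded)."""
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def half_pel_bilinear(g, h):
    """Half-sample value between g and h (rounded average)."""
    return (g + h + 1) >> 1


ramp = (10, 20, 30, 40, 50, 60)    # smooth gradient
pulse = (0, 0, 100, 100, 0, 0)     # narrow bright detail

print(half_pel_h264(*ramp), half_pel_bilinear(ramp[2], ramp[3]))
print(half_pel_h264(*pulse), half_pel_bilinear(pulse[2], pulse[3]))
```

On the smooth ramp both filters give the same value, but on the narrow pulse they differ noticeably. A motion vector that was the best match under one interpolator is therefore not guaranteed to remain the best match under the other.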

There are papers proposing logo insertion in the DCT domain for<br />

MPEG2, where the DCT is applied to 8×8 blocks. In H.264, the block size is different (i.e., 4×4)<br />

and the transform is an integer DCT-like transform. The issues here are: 1) the different<br />

transform sizes; 2) the MPEG2 DCT is an independent process, whereas in H.264 the transform is<br />

combined with quantization; 3) MPEG2 DCT-domain logo insertion is efficient for intra<br />

macroblocks only, because they are absolutely independent. These issues make transform<br />

domain logo insertion for H.264 extremely hard.<br />
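For reference, the 4×4 integer core transform of H.264 can be written as Y = Cf · X · Cfᵀ with a small integer matrix. The sketch below applies it without the scaling factors, which in H.264 are folded into the quantizer; the helper names are hypothetical. It also checks that this transform, like the DCT, is linear.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform(block):
    """Unscaled forward transform Y = CF * X * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[5] * 4 for _ in range(4)]
ramp = [[r + c for c in range(4)] for r in range(4)]
summed = [[flat[r][c] + ramp[r][c] for c in range(4)] for r in range(4)]

y_flat = core_transform(flat)
y_sum = core_transform(summed)
y_parts = [[y_flat[r][c] + core_transform(ramp)[r][c] for c in range(4)]
           for r in range(4)]
```

A flat block produces a single non-zero (DC) coefficient, and the transform of a sum equals the sum of the transforms. The linearity itself therefore survives in H.264; the difficulties listed above come from the block size and from the coupling with quantization.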

MPEG2 quantization has 32 quantization steps with some dead-zone assumptions, and it<br />

is applied by division operations. In H.264, quantization is merged with the transform; it<br />

uses 52 quantization steps and is applied by lookup tables and shifts. There are several<br />

proposals for MPEG2 quantization and requantization mapping for rate control<br />

purposes. Almost all of those are inapplicable because the whole process is too different<br />

in terms of step ranges, quantization error approximation, and the quality corresponding<br />

to each quantization step. A related issue is that the distortion approximation used in<br />

rate control is completely different. This makes the optimization equations developed<br />

for MPEG2 highly suboptimal for H.264.<br />
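
The 52-step H.264 scale can be sketched as follows. The sketch assumes the commonly tabulated step sizes for QP 0 to 5 and the rule that the step size doubles for every increase of 6 in QP. The `nearest_qp` helper is purely hypothetical and is not a requantization method proposed here.<br />

```python
# Illustrative sketch only; nearest_qp is a hypothetical helper for exploration.

# Quantizer step sizes for QP 0..5; the step size doubles every 6 QP values.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def h264_qstep(qp):
    """Return the H.264 quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def nearest_qp(target_step):
    """Return the QP whose step size is closest to target_step."""
    return min(range(52), key=lambda qp: abs(h264_qstep(qp) - target_step))


print(h264_qstep(0), h264_qstep(28), h264_qstep(51))
print(nearest_qp(16.0))
```

The scale is exponential in QP, running from 0.625 at QP 0 to 224 at QP 51, which differs in range and spacing from the MPEG2 quantizer scale. A direct step-to-step mapping therefore does not preserve the error characteristics assumed by MPEG2 rate control models.<br />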

The wide variety of block sizes in H.264 is an important feature that in general can<br />

improve coding efficiency. However, in the case of logo insertion, some of the modes<br />

might have no effect on quality because the transcoder is encoding previously<br />

encoded frames. There are some existing techniques for mode selection in<br />

MPEG2 logo insertion. However, the wider range of modes and macroblock partition<br />

sizes in H.264 makes it hard to apply any of the current MPEG2 methods to H.264.<br />

H.264 uses an adaptive in-loop deblocking filter that affects up to 3 pixels on each side<br />

of a block edge. Because the filter is in the loop, the deblocked frames are used as<br />

references for future frames. This creates a significant challenge: more boundary pixels<br />

must be left unchanged. In other words, after logo insertion more boundary pixels have<br />

to be identical to their values before logo insertion.<br />
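
One way to picture the enlarged boundary requirement is to count the macroblocks touched by a logo rectangle once a safety margin is added around it. The sketch below assumes 16×16 macroblocks and a margin of 3 pixels taken from the filter reach mentioned above. The function and its parameters are hypothetical and do not describe the project's actual method.<br />

```python
# Illustrative sketch only; the function and its parameters are hypothetical.

def affected_macroblocks(x, y, w, h, frame_w, frame_h, margin=3, mb=16):
    """Return the set of (mb_x, mb_y) indices overlapped by the logo rectangle.

    The rectangle (x, y, w, h) is grown by `margin` pixels on every side and
    clamped to the frame before it is mapped onto the macroblock grid.
    """
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(frame_w, x + w + margin) - 1   # last covered pixel column
    y1 = min(frame_h, y + h + margin) - 1   # last covered pixel row
    return {
        (mx, my)
        for my in range(y0 // mb, y1 // mb + 1)
        for mx in range(x0 // mb, x1 // mb + 1)
    }


# A logo aligned to exactly one macroblock of a 64x64 frame.
without_margin = affected_macroblocks(16, 16, 16, 16, 64, 64, margin=0)
with_margin = affected_macroblocks(16, 16, 16, 16, 64, 64, margin=3)
print(len(without_margin), len(with_margin))
```

With no margin the aligned logo touches a single macroblock. With a 3-pixel margin it touches nine, which shows how the in-loop filter can enlarge the area that a transcoder has to handle carefully.<br />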

Another issue with H.264 is that multiple reference frames can be used for motion<br />

prediction in the inter prediction mode, so the prediction modes and motion vectors have<br />

more combinations and possibilities. There are also multiple backward reference frames<br />

for inter modes in B frames. This issue alone will multiply the resource requirements (e.g.<br />

temporary memory) of the H.264 transcoder versus MPEG2.<br />

H.264 offers new features that did not exist before, such as I_PCM (lossless coding of<br />

raw samples) and arbitrary slice shapes and sizes. These features might make logo<br />

insertion in H.264 completely different from current MPEG2 work. These features are to<br />

be investigated to study the efficiency of using them. In this project, we aim at solving<br />

logo-insertion transcoding issues for H.264 coded video.<br />



REFERENCES<br />

[1] ISO/IEC 13818-2, Generic Coding of Moving Pictures and Associated Audio Information: Video,<br />

International Organization for Standardization, Nov. 1994, Draft International Standard (MPEG-2<br />

Video).<br />

[2] “Draft Text of Final Draft International Standard for Advanced Video Coding,” Int. Telecommun.<br />

Union-Telecommun. (ITU-T), Geneva, Switzerland, Recommendation H.264 (draft), Mar. 2003.<br />

[3] Y. Liu, G. Li, Q. Tang, and J. Guo, “DCT Domain Logo Insertion of MPEG2 Transcoding,” in Proc.<br />

IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), vol. 2, May 2003, pp.<br />

1219-1222.<br />

[4] K. Panusopone, X. Chen, and F. Ling, “Logo insertion in MPEG transcoder,” in Proc. IEEE<br />

International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, USA, vol. 2,<br />

May 2001, pp. 981-984.<br />

[5] S. Xiao, L. Lu, J. L. Kouloheris, and C. A. Gonzales, “Low-Cost and Efficient Logo Insertion Scheme<br />

in MPEG Video Transcoding,” Proc. of SPIE, Visual Communications and Image Processing, vol.<br />

4617, Jan. 2002, pp. 172-179.<br />

[6] J. Xin, C.-W. Lin, and M.-T. Sun, “Digital Video Transcoding,” in Proc. of the IEEE, vol. 93, Issue 1,<br />

Jan. 2005, pp. 84-97.<br />

[7] N. Roma and L. Sousa, “Insertion of Irregular-Shaped Logos in the Compressed DCT Domain,” 14th<br />

International Conference on Digital Signal Processing, vol. 1, 2002, pp. 125-128.<br />

[8] D. G. Morrison, M. E. Nilson, and M. Ghanbari, “Reduction of the Bit-Rate of Compressed Video<br />

While in Its Coded Form,” in Proc. 6th Int. Workshop Packet Video, 1994, pp. D17.1–D17.4.<br />

[9] G. Keesman, R. Hellinghuizen, F. Hoeksema, and G. Heideman, “Transcoding of MPEG Bitstreams,”<br />

Signal Process. Image Commun., vol. 8, no. 6, pp. 481–500, Sep. 1996.<br />

[10] P. A. A. Assuncao and M. Ghanbari, “A Frequency-Domain Video Transcoder for Dynamic Bitrate<br />

Reduction of MPEG-2 Bit streams,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 8, pp. 953–<br />

967, Dec. 1998.<br />

[11] S.-F. Chang and D. G. Messerschmitt, “Manipulation and Compositing of MC-DCT compressed<br />

video,” IEEE J. Sel. Areas Commun., vol. 13, no. 1, pp. 1–11, Jan. 1995.<br />

