EECE 541 Multimedia Systems Project Proposal: Logo ... - Courses
EECE 541 Multimedia Systems<br />
Project Proposal:<br />
Logo Insertion for H.264 Compressed Video<br />
Instructor: Dr. Panos Nasiopoulos<br />
Group members: Gopichand Yalamanchili (20322061)<br />
Abdul K.Murad Agha (80089972)<br />
Teresa Zhou (74417999)<br />
Di Xu (31679079)<br />
February 26, 2008<br />
TABLE OF CONTENTS<br />
I. INTRODUCTION<br />
II. NOTATION AND TERMINOLOGY<br />
III. VIDEO CODING AND TRANSCODING<br />
A. BASIC TRANSCODING STRUCTURES<br />
B. MPEG2 VIDEO CODING<br />
C. H.264 VIDEO CODING<br />
C.1. INTRA PREDICTION<br />
C.2. INTER PREDICTION<br />
IV. EXISTING METHODS FOR MPEG2 COMPRESSED VIDEO LOGO INSERTION<br />
A. LOGO INSERTION POSITION<br />
B. LOGO INSERTION IN SPATIAL DOMAIN<br />
C. LOGO INSERTION IN TRANSFORM DOMAIN<br />
D. LOW COST AND EFFICIENT LOGO INSERTION<br />
D.1. LOGO-AFFECTED RANGE OF FRAMES IN THE TEMPORAL DOMAIN<br />
D.2. MOTION INFORMATION ADJUSTMENT IN THE LOGO AND LOGO-AFFECTED PARTS<br />
E. QUANTIZATION SCALE ADJUSTMENT<br />
E.1. CONSTANT QUALITY AT THE LOGO PART<br />
E.2. BIT REALLOCATION<br />
V. MAIN ISSUES FOR H.264 COMPRESSED VIDEO LOGO INSERTION<br />
REFERENCES<br />
I. INTRODUCTION<br />
Transcoding is the process of converting the content of a compressed video stream from<br />
one format to another. A format is determined by characteristics such as the bit rate,<br />
frame rate, spatial resolution, coding syntax, and the content. One useful and highly<br />
demanded application of transcoding is inserting a logo into a stream of encoded video.<br />
There are many commercial applications for this technology. With so many television<br />
networks now on air, an inserted logo helps viewers identify the station quickly.<br />
Over the years, viewers have come to associate the “peacock” logo with NBC and the<br />
“eye” logo with CBS, so a logo can greatly improve a broadcaster’s chances of being<br />
recognized. Several logo-insertion methods have been proposed for MPEG2 [1]<br />
compressed video, but little work has been done on the considerably more complicated<br />
case of H.264 [2] compressed video. In this project, we aim to insert logos into<br />
H.264 compressed video.<br />
The remainder of the project proposal is structured as follows. Section II briefly<br />
introduces the notation and terminology used herein. Then, Section III introduces some<br />
essential background of video coding and transcoding. In Section IV, we present the<br />
existing logo-insertion methods for MPEG2 compressed video and point out their<br />
weaknesses, especially when they are applied to H.264 compressed video. Finally,<br />
Section V states the problems that need to be solved for logo insertion into H.264<br />
compressed video.<br />
II. NOTATION AND TERMINOLOGY<br />
Before further discussing logo insertion into the H.264 video stream, we need to define<br />
some terminology used herein. In what follows, a video frame is partitioned into a<br />
logo-unrelated part and a logo-related part. The logo-related part further comprises the<br />
“logo part” and the “logo-affected part”. The region covered by the logo is called the logo<br />
part, and the region outside the logo whose prediction is based on the logo part is called<br />
the logo-affected part. In MPEG2 compressed video, the logo-affected part exists only in<br />
P and B frames, while in H.264 compressed video it can exist in all three frame types<br />
(I, P, and B).<br />
Logos have different features. Commonly used logos can be classified as non-transparent<br />
(i.e., solid) logos, transparent logos, rectangular logos, and arbitrarily shaped<br />
logos.<br />
Logo insertion in the pixel domain can be performed by combining each pixel of the<br />
background image B(x, y) with the corresponding pixel of the logo L(x, y) to obtain the<br />
output image P(x, y). The operation is usually expressed as a linear combination of the form:<br />
P(x, y) = α × L(x, y) + (1 – α) × B(x, y), (1)<br />
where the transparency factor α determines the transparency of the logo (a smaller α<br />
gives a more transparent logo). The value of α is<br />
in the range 0 < α ≤ 1. In particular, when α = 1, all pixels of the background image<br />
are replaced by the logo wherever the logo is placed, so that the logo is overlaid opaquely<br />
on the input image.<br />
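Equation (1) can be illustrated with a minimal sketch, assuming 8-bit samples stored as<br />
nested lists and a single α for the whole logo. The function names `blend_pixel` and<br />
`insert_logo` are hypothetical and are not taken from the cited papers.<br />

```python
def blend_pixel(logo, background, alpha):
    """Equation (1) for one 8-bit sample: P = alpha * L + (1 - alpha) * B."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    value = alpha * logo + (1.0 - alpha) * background
    # Round and clip to the 8-bit range (an assumption of this sketch).
    return max(0, min(255, int(round(value))))


def insert_logo(frame, logo, top, left, alpha):
    """Return a copy of `frame` with `logo` blended in at row `top`, column `left`."""
    out = [row[:] for row in frame]
    for y, logo_row in enumerate(logo):
        for x, logo_sample in enumerate(logo_row):
            out[top + y][left + x] = blend_pixel(
                logo_sample, frame[top + y][left + x], alpha
            )
    return out


# Toy data: a flat 4x4 background and a flat 2x2 logo.
frame = [[100] * 4 for _ in range(4)]
logo = [[200, 200], [200, 200]]
opaque = insert_logo(frame, logo, 0, 0, 1.0)  # alpha = 1: logo replaces background
half = insert_logo(frame, logo, 0, 0, 0.5)    # alpha = 0.5: equal mix
```

With α = 1 the logo samples replace the background samples, while pixels outside the logo<br />
rectangle are left untouched in both cases.<br />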
A logo often occupies a small portion of a frame and is static over a frame sequence.<br />
Logos often appear in a corner of a frame (e.g., the top left or bottom right corner).<br />
In H.264 compressed video, however, they can fall anywhere within a slice, since slice<br />
partitioning is rather flexible in the H.264 standard. Moreover, logos may be present only<br />
in groups of successive frames, as opposed to all frames in a video sequence.<br />
III. VIDEO CODING AND TRANSCODING<br />
Having introduced some logo-related terminology, in this section, we will present several<br />
essential concepts for video coding and video transcoding. We first introduce three<br />
commonly used transcoding structures in what follows.<br />
A. Basic Transcoding Structures<br />
One straightforward transcoding structure is the cascaded form. For the cascaded<br />
structure as shown in Figure 1, the decoder decodes the compressed video stream<br />
completely, and the encoder re-encodes the reconstructed video into the target format.<br />
The cascaded architecture achieves high video quality, but it is computationally very<br />
expensive. Therefore, the cascaded structure is seldom used, especially in real-time<br />
transcoding. It is better to reuse the information contained in the original bit stream to<br />
simplify the architecture.<br />
Figure 1. Cascaded architecture in pixel domain.<br />
The open-loop structure is another commonly used transcoding structure. In the open-loop<br />
system, the bit stream is first variable-length decoded (VLD) to reconstruct the quantized<br />
discrete cosine transform (DCT) coefficients, motion vectors, prediction modes, and<br />
other macroblock-level information. The quantized coefficients are then inverse<br />
quantized and modified according to the transcoding requirements. Finally, the modified<br />
data is re-quantized and variable length coded to achieve the new output format. Figure 2<br />
shows a requantization transcoder as an open-loop structure example.<br />
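The core step of such a requantization transcoder can be sketched as follows. This is a<br />
simplified illustration that assumes a plain uniform quantizer with integer step sizes; it<br />
omits the quantization matrices and intra DC handling of a real MPEG2 codec, and the name<br />
`requantize` is hypothetical.<br />

```python
def requantize(levels, q_in, q_out):
    """Open-loop requantization of a list of quantized DCT levels.

    Each level is inverse quantized with the input step `q_in` and then
    re-quantized with the (usually coarser) output step `q_out`.
    """
    out = []
    for level in levels:
        coefficient = level * q_in                    # inverse quantization
        sign = -1 if coefficient < 0 else 1
        magnitude = (abs(coefficient) + q_out // 2) // q_out  # round to nearest
        out.append(sign * magnitude)                  # re-quantized level
    return out


levels_in = [12, -5, 3, 1, 0, 0]
levels_out = requantize(levels_in, q_in=4, q_out=8)
```

No motion compensation is involved, so the requantization error in a reference frame is<br />
never fed back, which is the source of the drift discussed next.<br />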
Figure 2. Open-loop architecture.<br />
Open-loop systems are relatively simple. They do not include motion estimation, motion<br />
compensation, DCT, or inverse DCT (IDCT). Therefore, open-loop structures are<br />
computationally very efficient, but they suffer from low video quality due to the drift<br />
problem. The drift problem is caused by the mismatch between the actual reference frame<br />
used for motion estimation in the encoder and the degraded reference frame used for<br />
motion compensation in the decoder. The drift problem accumulates and causes severe<br />
degradation to the video quality.<br />
Closed-loop systems avoid both the high complexity of the cascaded architecture and the<br />
poor video quality of the open-loop structure, providing a good tradeoff<br />
between quality and computational complexity. In a closed-loop structure, significant<br />
complexity savings can be achieved while still maintaining acceptable video quality.<br />
Closed-loop systems provide drift compensation for re-quantized data. They aim at<br />
eliminating the mismatch between predictive and residual components by approximating<br />
the cascaded transcoding architecture. In the simplified closed-loop structure as shown in<br />
Figure 3, only one reconstruction loop is required with one DCT and one IDCT. Some<br />
inaccuracy is introduced because the reconstruction loops are combined in a non-linear<br />
way. However, it has been found that this approximation has<br />
little effect on the video quality.<br />
Figure 3. Closed-loop architecture.<br />
B. MPEG2 Video Coding<br />
Since the MPEG2 coding process is relatively familiar, we present only one point that<br />
is easily overlooked. In MPEG2, the frame to be compressed is divided into 16×16 pixel<br />
macroblocks. Then, for each of these macroblocks in P and B frames, the reconstructed<br />
reference frame is searched to find a 16×16 region that best matches the macroblock to be<br />
compressed. The offset is encoded as a motion vector. The match between the two<br />
will often not be perfect. To correct this, the encoder computes the residual, i.e., the<br />
difference between the original pixel values and the predicted pixel values. The residual of<br />
each macroblock is coded together with the motion vector, and its spatial redundancy is<br />
reduced by the DCT. Sometimes no suitable match is found, in which case the<br />
macroblock is coded like an I-frame macroblock. Therefore, both inter and intra modes<br />
can be used to compress macroblocks in MPEG2 P and B frames.<br />
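The search for the best-matching block can be sketched as an exhaustive (full) search that<br />
minimizes the sum of absolute differences (SAD). The matching criterion, the search range,<br />
and the names `sad` and `full_search` are assumptions of this sketch; the MPEG2 standard<br />
does not prescribe how an encoder performs motion estimation.<br />

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, by, bx, size=16, search=4):
    """Return ((dy, dx), best_sad) for the block at row `by`, column `bx` of `cur`."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, by, bx, by, bx, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, by, bx, ry, rx, size)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Toy frames: the current frame is the reference shifted right by 2 pixels.
ref = [[(7 * x + 13 * y) % 256 for x in range(24)] for y in range(24)]
cur = [[ref[y][max(x - 2, 0)] for x in range(24)] for y in range(24)]
mv, cost = full_search(cur, ref, by=8, bx=8, size=8, search=3)
```

In this toy case the match is perfect, so the residual is zero. In real video the<br />
remaining residual is what gets transformed and quantized.<br />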
C. H.264 Video Coding<br />
In this project, we aim to add logos into H.264 coded video streams. The H.264 standard<br />
is the latest video coding standard. A brief description of its prediction tools is given<br />
below.<br />
C.1. Intra Prediction<br />
If a block or macroblock is encoded in the intra mode, a prediction block is formed based<br />
on previously encoded and reconstructed (but un-filtered) blocks from the same slice.<br />
This prediction block is subtracted from the current block prior to encoding. For the<br />
luminance (luma) samples, a prediction block may be formed for each 4×4 subblock or<br />
for a 16×16 macroblock. There are a total of nine optional prediction modes for each 4×4<br />
luma block; four optional modes for a 16×16 luma block; and four optional modes for<br />
each 8×8 chroma block. Note that the same mode is always applied to both chroma<br />
blocks.<br />
C.2. Inter Prediction<br />
Inter prediction creates a prediction model from one or more previously encoded video<br />
frames. The model is formed by shifting samples in the reference frame(s) (i.e., motion<br />
compensated prediction). The H.264 codec uses block-based motion compensation. The<br />
H.264 standard supports motion compensation block sizes ranging from 16×16 to 4×4<br />
luminance samples with many options in between. The luminance component of each<br />
macroblock (16×16 samples) may be split up in four ways: 16×16, 16×8, 8×16 or 8×8.<br />
Each of the sub-divided regions is a macroblock partition. If the 8×8 mode is chosen,<br />
each of the four 8×8 macroblock partitions within the macroblock may be further split in<br />
one of four ways: 8×8, 8×4, 4×8 or 4×4 (known as macroblock sub-partitions). These<br />
partitions and sub-partitions give rise to a large number of possible combinations within<br />
each macroblock.<br />
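As a worked count of these combinations, the following sketch enumerates the luma<br />
partitionings described above, assuming each of the four 8×8 partitions chooses its<br />
sub-partition independently. The names are hypothetical.<br />

```python
MACROBLOCK_PARTITIONS = ["16x16", "16x8", "8x16", "8x8"]
SUB_PARTITIONS = ["8x8", "8x4", "4x8", "4x4"]


def count_partitionings():
    """Count the distinct ways to partition one 16x16 luma macroblock."""
    total = 0
    for mode in MACROBLOCK_PARTITIONS:
        if mode == "8x8":
            # Each of the four 8x8 partitions picks one of four sub-partitions.
            total += len(SUB_PARTITIONS) ** 4
        else:
            total += 1
    return total


partitionings = count_partitionings()  # 3 + 4**4 = 259
```

The count covers block shapes only. Each partition also carries its own motion vector, so<br />
the space an encoder or transcoder has to consider is larger still.<br />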
A separate motion vector is required for each partition or sub-partition. Each motion<br />
vector must be coded and transmitted. In addition, the choice of partition(s) must be<br />
encoded and stored in the compressed bit stream. Choosing a large partition size (e.g.<br />
16×16, 16×8, 8×16) means that a small number of bits are required to signal the choice of<br />
motion vector(s) and the type of partition; however, the motion compensated residual<br />
may contain a significant amount of energy in frame areas with high detail. Choosing a<br />
small partition size (e.g. 8×4, 4×4, etc.) may give a lower-energy residual after motion<br />
compensation, but requires a larger number of bits to signal the motion vectors and<br />
choice of partition(s). The choice of partition size, therefore, has a significant impact on<br />
compression performance. In general, a large partition size is appropriate for<br />
homogeneous areas of the frame and a small partition size may be beneficial for detailed<br />
areas.<br />
IV. EXISTING METHODS FOR MPEG2 COMPRESSED VIDEO LOGO INSERTION<br />
Most methods for logo insertion have been developed for MPEG2 compressed video. In what<br />
follows, we introduce methods presented in several different papers, which<br />
handle logo insertion from different perspectives. We also analyze whether the<br />
methods are appropriate for logo insertion into H.264 compressed video.<br />
A. Logo Insertion Position<br />
The first step in logo insertion is to determine the position of the logo. Liu’s paper [3]<br />
places the logo in the top left-hand corner. A related issue is choosing the logo position<br />
so that the effect on the coded macroblocks is minimized. In [3], the logo is aligned to<br />
the macroblock grid to minimize the number of macroblocks affected. If the logo is not<br />
aligned, nine macroblocks are affected, whereas if it is aligned to the macroblock grid,<br />
only four macroblocks are affected, as shown in Figure 4.<br />
Figure 4. Diagram of logo aligned to macroblocks.<br />
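The effect of alignment can be checked with a short calculation. The sketch below assumes<br />
a 32×32 pixel logo and 16×16 macroblocks. The logo size is an assumption chosen to<br />
reproduce the four-versus-nine count, and `macroblocks_covered` is a hypothetical helper.<br />

```python
def macroblocks_covered(left, top, width, height, mb_size=16):
    """Number of macroblocks touched by a rectangle given in pixel coordinates."""
    first_col = left // mb_size
    last_col = (left + width - 1) // mb_size
    first_row = top // mb_size
    last_row = (top + height - 1) // mb_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)


aligned = macroblocks_covered(left=0, top=0, width=32, height=32)
unaligned = macroblocks_covered(left=8, top=8, width=32, height=32)
```

Here `aligned` evaluates to 4 and `unaligned` to 9, matching the counts quoted above.<br />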
B. Logo Insertion in Spatial Domain<br />
As explained in [3], one of the approaches for logo insertion is to insert the logo in the<br />
spatial domain. Figure 5 gives the block diagram for spatial domain logo insertion<br />
structure.<br />
Figure 5. Logo insertion in spatial domain.<br />
The transcoding works as follows. First, the input video stream (containing the motion<br />
vectors and residuals for the P and B frames, and the I-frame data) is entropy decoded<br />
to obtain the quantized coefficients. The residual components of the P and B<br />
frames are then dequantized by an inverse quantizer and passed<br />
through an inverse DCT. The motion vectors follow two different paths.<br />
The first path is through motion compensation, where they are combined with the<br />
buffered previous frame to form a prediction of the current frame without the<br />
residual. The output of this first motion compensation is then added to the residual to form<br />
the complete picture for the current frame. The picture is also placed into a buffer so that<br />
the next frame can use it for motion compensation. The motion<br />
vectors also go to the second motion compensation block in the encoder part of the<br />
loop, where they are used to correct the logo insertion errors, since the encoder reuses the<br />
motion vectors of the input video.<br />
After the output of the first motion compensation block has been added to the residual<br />
of the current frame, the logo is added using formula (1), that is,<br />
P(x, y) = α × L(x, y) + (1 – α) × B(x, y).<br />
The result is then fed in for error correction. After that, it is transformed by a DCT block,<br />
quantized, and entropy encoded for output. A rate control mechanism<br />
keeps the output bit rate of the stream consistent. This mechanism increases or decreases<br />
the quantizer parameter of Q2 to control the bit rate, and with it the quality, of<br />
the output frame.<br />
The scheme in [4] shares the architecture described above except for the definition of the<br />
motion vectors (MV). Early transcoders reuse the motion vectors from the input bit-stream.<br />
However, such a motion vector may not point to the best match for<br />
macroblocks close to the insertion area, since the logo part of the content is always static.<br />
To maintain high coding efficiency, [4] suggests setting the motion vector to<br />
zero for logo macroblocks that are dominated by logo content, and using the original motion<br />
vector from the input bit-stream for logo macroblocks that are dominated by video<br />
content. For example,<br />
MV(x, y) = (0, 0), when α is greater than or equal to a threshold, e.g., 0.5;<br />
MV(x, y) = MV’(x, y) otherwise, where MV’ is the motion vector decoded from the input bit-stream.<br />
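The rule above can be written as a small function. This is a sketch of the example rule<br />
only; the function name and the default threshold of 0.5 (the example value quoted above)<br />
are illustrative.<br />

```python
def select_motion_vector(alpha, decoded_mv, threshold=0.5):
    """Pick the motion vector for a logo macroblock.

    alpha      -- transparency factor of the logo (0 < alpha <= 1)
    decoded_mv -- (dx, dy) motion vector decoded from the input bit-stream
    """
    if alpha >= threshold:
        return (0, 0)      # logo content dominates: the content is static
    return decoded_mv      # video content dominates: reuse the input vector


mostly_logo = select_motion_vector(0.8, (3, -2))
mostly_video = select_motion_vector(0.2, (3, -2))
```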
The above scheme was developed for MPEG2 coded video. For H.264 coded video,<br />
however, this logo insertion architecture needs to be modified because of the<br />
multi-reference capability of H.264. The motion vector redefinition concept above can<br />
probably be used for logo insertion into H.264 video streams. However, the threshold on the<br />
transparency factor α needs to be tuned carefully in order to achieve good coding<br />
efficiency for H.264 coded video.<br />
C. Logo Insertion in Transform Domain<br />
As described by Liu [3], the transform domain insertion algorithm is somewhat less<br />
complicated than the spatial domain one: fewer DCT blocks are needed on the<br />
decoding and encoding sides. Because of the linearity of the DCT, shown in (2),<br />
blending in the transform domain is equivalent to blending in the spatial domain.<br />
DCT(m + n) = DCT(m) + DCT(n) and<br />
DCT{α m(x, y) + (1 – α) n(x, y)} = α DCT{m(x, y)} + (1 – α) DCT{n(x, y)}. (2)<br />
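The linearity property (2) can be verified numerically. The sketch below uses a<br />
one-dimensional orthonormal DCT-II on eight samples as a stand-in for the 8×8<br />
two-dimensional DCT; the sample values and the value of α are arbitrary illustrative<br />
choices.<br />

```python
import math


def dct_1d(samples):
    """Orthonormal one-dimensional DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, s in enumerate(samples)
        ))
    return out


alpha = 0.7
logo = [200, 210, 190, 205, 220, 215, 198, 202]
background = [50, 60, 80, 120, 90, 40, 30, 70]

# Blend in the pixel domain, then transform.
blend_then_dct = dct_1d(
    [alpha * l + (1 - alpha) * b for l, b in zip(logo, background)]
)
# Transform each signal, then blend the coefficients.
dct_then_blend = [
    alpha * cl + (1 - alpha) * cb
    for cl, cb in zip(dct_1d(logo), dct_1d(background))
]
max_error = max(abs(a - b) for a, b in zip(blend_then_dct, dct_then_blend))
```

The two results agree up to floating-point rounding. The identity holds for unquantized<br />
coefficients, which is why the logo is inserted after inverse quantization.<br />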
The architecture of the logo insertion in the transform domain is shown in Figure 6. The<br />
video stream is first entropy decoded and inverse quantized. This is similar to the spatial<br />
domain strategy. The logo is then inserted in the DCT domain. Next, motion vectors are<br />
fed back for DCT-domain motion compensation, and subtraction is done for error correction.<br />
These steps are all performed in the DCT domain. Then, the output is quantized before<br />
entropy encoding and rate control. The quantized output is also inverse quantized and fed back<br />
to the buffer for motion compensation of the next frame and error correction.<br />
Figure 6. Logo insertion in transform domain.<br />
D. Low Cost and Efficient Logo Insertion<br />
High accuracy and efficiency are two important criteria for logo insertion. One efficient<br />
logo-insertion method, proposed by Shu Xiao et al., is presented in [5]. In that paper, the<br />
authors present efficient logo insertion methods for transparent and non-transparent<br />
logos in MPEG2 compressed video. They consider the refinement of prediction modes<br />
and motion vectors for different types of macroblocks. The method should be<br />
applicable in both the spatial and transform domains.<br />
In logo-insertion transcoding for MPEG2 compressed video, we need to compensate for the<br />
changes caused by the logo insertion. Such changes propagate through frames when P<br />
and B frames refer to reference frames in the motion prediction process. For H.264<br />
compressed video, however, changes also propagate within I frames due to the use<br />
of intra prediction, as described in Section III. Sometimes, logos are not inserted in all<br />
frames of a video sequence. We now introduce the method used in [5] for<br />
determining the range of frames affected by logo insertion in MPEG2 coded<br />
video.<br />
D.1. Logo-Affected Range of Frames in the Temporal Domain<br />
Let [l, h] denote the range of the video sequence where a logo is required to be inserted.<br />
That is, the indices l and h are the lowest and highest frame numbers of this range,<br />
respectively. We further let [L, H] represent the range of video sequence which is<br />
affected by the logo insertion due to the change propagation caused by frame reference.<br />
Clearly, we have [l, h] being a subset of [L, H]. Reference frames are frames of a<br />
compressed video that are used to predict other frames. In the MPEG2 video coding standard,<br />
reference frames are I and P frames. The previous reference frame l_r and the next reference<br />
frame h_r of the range [l, h] are defined as the reference frame before the range with the<br />
largest frame number and the reference frame after the range with the smallest frame<br />
number, respectively. Then, the indices L and H can<br />
be determined using (3) and (4), as follows:<br />
$$L = l_r + 1, \tag{3}$$<br />
$$H = \begin{cases} h_r & \text{if } h_r \text{ is not an I frame,} \\ h_r - 1 & \text{otherwise.} \end{cases} \tag{4}$$<br />
Figure 7 gives two examples of the logo-affected range of frames. These examples use<br />
the typical group of pictures (GOP) structure IBBPBBPBBP…, where I and P frames are<br />
reference frames. The frame dependencies are also drawn in the figure. In<br />
example one, [5, 16] is the range of frames [l, h] where logos are added. The previous<br />
reference frame l_r of the range [l, h] is 3, and the next reference frame h_r of the range [l, h]<br />
is 18. According to (3) and (4), the logo-affected range of frames [L, H] is [4, 18].<br />
Similarly, in example two, the range [L, H] is [4, 14].<br />
Figure 7. Examples of logo affected range of frames.<br />
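Equations (3) and (4) can be sketched in a few lines. The sketch assumes frames numbered<br />
from 0 in display order and an IBBPBB… pattern with an I frame every 15 frames. The GOP<br />
length, the helper names, and the logo range [5, 13] used for the second call are<br />
assumptions chosen to be consistent with the ranges quoted above.<br />

```python
def frame_type(index, gop_length=15, ref_distance=3):
    """Frame type in an IBBPBB... pattern (GOP length is an assumption)."""
    if index % gop_length == 0:
        return "I"
    return "P" if index % ref_distance == 0 else "B"


def affected_range(l, h, types):
    """Apply (3) and (4): return (L, H) for the logo range [l, h].

    `types` lists the frame type ('I', 'P' or 'B') for every frame number.
    A reference frame must exist before l and after h.
    """
    l_r = max(i for i in range(l) if types[i] != "B")
    h_r = min(i for i in range(h + 1, len(types)) if types[i] != "B")
    L = l_r + 1                                   # equation (3)
    H = h_r - 1 if types[h_r] == "I" else h_r     # equation (4)
    return L, H


types = [frame_type(i) for i in range(31)]
example_one = affected_range(5, 16, types)   # next reference (18) is a P frame
example_two = affected_range(5, 13, types)   # next reference (15) is an I frame
```

The first call reproduces the range [4, 18] of example one. The second call yields<br />
[4, 14] because the next reference frame is an I frame, which does not depend on<br />
earlier frames.<br />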
Once the logo affected frame range [L, H] is identified, we can focus on compensating<br />
the changes induced by logo insertion within this range of frames. The frames outside the<br />
range [L, H] are not affected, and therefore remain unchanged.<br />
The scheme described above cannot be directly applied to H.264 compressed video. This<br />
is because, in the H.264 video coding standard, all three types of frames (i.e., I, P, and B<br />
frames) can be used as reference frames. Multiple reference frames can also be used, which<br />
makes determining the range [L, H] more challenging. The affected-frame range for<br />
H.264 coded video therefore needs to be adjusted carefully.<br />
D.2. Motion Information Adjustment in the Logo and Logo-Affected Parts<br />
Having introduced the logo-affected range of frames, we now describe the motion<br />
information adjustment in [5]. As mentioned in Section II, a frame can be<br />
partitioned into the logo part, the logo-affected part, and the logo-unrelated part. For<br />
simplicity, the authors of [5] assume that the logo is rectangular and covers an integer<br />
number of macroblocks. The coding modes and motion vectors of macroblocks in the<br />
logo-unrelated part remain unchanged. In what follows, we discuss the motion-vector<br />
refinement of macroblocks in the logo and logo-affected parts, respectively.<br />
For I and P frames in the logo part, if the first reference frame in the range [l, h] is a P<br />
frame, then the macroblock mode (of those macroblocks in the logo part) is set to intra-coded.<br />
If the first reference frame in the range [l, h] is an I frame instead, the<br />
macroblock modes for all I and P frames remain the same.<br />
For B frames in the logo part, if its frame number is smaller than the first reference frame<br />
in the range of [L, H], set the macroblock mode to be backward predicted. If its frame<br />
number is larger than the last reference frame in the range of [L, H], set the macroblock<br />
mode to be forward predicted or intra coded. Then, set the prediction mode for all other B<br />
frames in [L, H] to be forward predicted.<br />
Having the prediction modes refined, we set all motion vectors for inter-coded<br />
macroblocks in the logo part to be zero. Clearly, this scheme is suitable for non-transparent<br />
logos. For transparent logos, however, especially for logos with their<br />
transparency factor α being small (e.g., α
E.1. Constant Quality at the Logo Part<br />
We begin with the non-transparent logo case. Non-transparent logos should<br />
appear static throughout the video sequence. Therefore, we set the quantization scales of<br />
all the intra-coded macroblocks in the logo area to the same value, so that the logo region has<br />
approximately the same quality. For inter-coded macroblocks, the zero motion vector and<br />
stationary logo content ensure accurate prediction, so no residual information is needed.<br />
Hence, we use the skip mode for P frames, and the maximum quantization scale for B<br />
frames due to the lack of skip mode for B frames.<br />
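The rule for non-transparent logos can be summarized in a short sketch. The function<br />
name, the returned dictionary layout, and the maximum scale value of 31 (assumed here as<br />
the largest 5-bit MPEG2 quantizer scale code) are illustrative assumptions rather than<br />
details taken from [5].<br />

```python
MAX_QUANT_SCALE = 31  # assumed largest MPEG2 quantizer scale code


def logo_macroblock_coding(frame_type, is_intra, logo_quant_scale):
    """Coding decision for one macroblock inside a non-transparent logo."""
    if is_intra:
        # The same scale for every intra macroblock keeps the logo quality constant.
        return {"mode": "intra", "quant_scale": logo_quant_scale}
    if frame_type == "P":
        # Zero motion vector and static content: nothing needs to be coded.
        return {"mode": "skip", "quant_scale": None}
    if frame_type == "B":
        # The coarsest scale suppresses the (negligible) residual.
        return {"mode": "inter", "quant_scale": MAX_QUANT_SCALE}
    raise ValueError("inter-coded macroblocks occur only in P and B frames")


intra_choice = logo_macroblock_coding("I", True, 8)
p_choice = logo_macroblock_coding("P", False, 8)
b_choice = logo_macroblock_coding("B", False, 8)
```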
Unlike non-transparent logos, transparent logos are superimposed on the original video<br />
frames. Therefore, the logo parts are not identical over different frames. To ensure<br />
approximately the same quality in the logo area, we still use the same quantization scale<br />
for all the intra-coded macroblocks in the logo area. For the inter-coded macroblocks, the<br />
prediction residuals are no longer zero or negligible. In [5], a slightly larger<br />
quantization scale is used for inter-coded macroblocks than for intra-coded macroblocks.<br />
Both quantization scales are inversely proportional to<br />
the total output bit rate. In doing so, we prevent the logo parts from consuming too many<br />
bits at low bit rates.<br />
E.2. Bit Reallocation<br />
Rate control is also an important requirement for video transcoding. It is usually<br />
accomplished by adjusting the quantization scales. The transcoded video with logo may<br />
consume more bits than the original coded video. In order to control the overall bit rate of<br />
the encoded video while achieving a constant visual quality in the logo area, we need to<br />
adjust the quantization scales for the macroblocks not covered by the logo. One simple<br />
practical scheme proposed in [5] is to pre-encode the logo part, deduct the bits it consumes<br />
from the total target bits, and allocate the remaining bits to the area not covered by the<br />
logo.<br />
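The bookkeeping behind this scheme can be sketched as follows. The proportional<br />
distribution of the remaining bits over the non-logo macroblocks is an assumption of this<br />
sketch, since the text above only states that the remaining bits are reallocated, and<br />
`reallocate_bits` is a hypothetical name.<br />

```python
def reallocate_bits(frame_target_bits, logo_bits, original_bits):
    """Share the bits left after coding the logo among non-logo macroblocks.

    frame_target_bits -- total bit budget for the frame
    logo_bits         -- bits consumed by pre-encoding the logo part
    original_bits     -- bits each non-logo macroblock used in the input stream
    """
    remaining = frame_target_bits - logo_bits
    if remaining <= 0:
        raise ValueError("the logo part exceeds the frame budget")
    total = sum(original_bits)
    # Integer arithmetic: scale each macroblock's share by remaining / total.
    return [bits * remaining // total for bits in original_bits]


new_budget = reallocate_bits(10000, 2000, [3000, 5000, 2000])
```

A rate controller would then pick the quantization scale of each non-logo macroblock so<br />
that it meets its reduced budget.<br />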
Note that the bit reallocation method described above is not the most efficient under some<br />
circumstances. If the logo is inserted in the top left corner of a frame, the pre-encoding of<br />
the logo part is not necessary. This is because the logo part is coded before most of the<br />
other macroblocks are coded in this case, and the bits consumed by the logo-part are<br />
known before most adjustment can be done. Furthermore, if the logo transparency is high,<br />
adopting the original motion vectors might be more efficient than using zero motion<br />
vectors for the inter-coded macroblocks.<br />
V. MAIN ISSUES FOR H.264 COMPRESSED VIDEO LOGO INSERTION<br />
As indicated earlier, most existing methods for logo insertion were developed for MPEG2<br />
compressed video. The H.264 standard is much more complex and flexible, and is therefore<br />
more difficult to handle and introduces more challenges.<br />
In H.264, intra prediction is used for coding the intra macroblocks in a frame. Hence, an I<br />
frame has a logo-affected part due to the dependencies within the same slice. This challenge<br />
did not exist in MPEG2. The logo-affected areas caused by intra prediction need to be<br />
identified and properly compensated, which is a complex<br />
challenge. Because every intra macroblock depends on pixels of neighboring macroblocks<br />
(i.e., the bottom-right corner pixel of the top-left neighboring macroblock, the bottom row of<br />
pixels of the top neighboring macroblock, and the right column of pixels of the left<br />
neighboring macroblock), the boundary pixels of the logo-affected area should remain the<br />
same; otherwise, a drift error will accumulate.<br />
H.264 uses a more accurate half-pixel interpolation (a 6-tap filter) to find a better<br />
match in the motion estimation stage, whereas MPEG2 uses bilinear interpolation. There are<br />
many proposals to reuse motion vectors from the decoder; however, directly using those<br />
motion vectors is not optimal, and an extra process is needed to compensate for the difference<br />
between 6-tap and bilinear interpolation. In addition, H.264 supports quarter-pixel<br />
sample accuracy to further refine the motion vectors. This did not exist in MPEG2 and<br />
introduces another challenge. The challenges of half-pixel and quarter-pixel accuracy make<br />
many of the MPEG2 motion vector transcoding solutions not easily usable for<br />
H.264 transcoding.<br />
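The difference between the two half-pixel interpolators can be seen on a single row of<br />
samples. The sketch below applies the H.264 luma half-sample filter with taps<br />
(1, −5, 20, 20, −5, 1)/32 in one dimension and compares it with two-tap bilinear<br />
averaging. The function names and the sample rows are illustrative.<br />

```python
def clip_8bit(value):
    """Clip an integer to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel_six_tap(row, i):
    """Half-sample between row[i] and row[i + 1] using the 6-tap filter."""
    a, b, c, d, e, f = row[i - 2:i + 4]
    return clip_8bit((a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5)


def half_pel_bilinear(row, i):
    """Half-sample between row[i] and row[i + 1] by rounded averaging."""
    return (row[i] + row[i + 1] + 1) >> 1


flat = [10, 10, 10, 10, 10, 10]
peak = [0, 0, 0, 100, 0, 0]

flat_six_tap = half_pel_six_tap(flat, 2)
flat_bilinear = half_pel_bilinear(flat, 2)
peak_six_tap = half_pel_six_tap(peak, 2)
peak_bilinear = half_pel_bilinear(peak, 2)
```

On flat content the two filters agree, but around the isolated peak they produce different<br />
values (63 versus 50), so a motion vector that was optimal under one interpolator need not<br />
be optimal under the other.<br />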
Several papers discuss and propose logo insertion in the DCT domain for<br />
MPEG2, where the DCT is performed on 8×8 blocks. In H.264, the block size is different<br />
(i.e., 4×4) and the transform is an integer, DCT-like transform. The issues here are: 1) the<br />
transform sizes differ; 2) the MPEG2 DCT is an independent process, whereas in H.264 the<br />
transform is combined with quantization; 3) MPEG2 DCT-domain logo insertion is efficient for<br />
intra macroblocks only, because they are fully independent. These issues make transform<br />
domain logo insertion for H.264 extremely hard.<br />
MPEG2 quantization has 31 quantizer scale values (a 5-bit code with zero excluded), with<br />
some dead zone assumptions, and is applied by division operations. In H.264, quantization<br />
is combined with the transform. It uses 52 quantization steps and is applied using lookup<br />
tables and shifts. There are several proposals for MPEG2 quantization and requantization<br />
mapping for rate control purposes. Almost all of them are inapplicable because the whole<br />
process differs too much in terms of step ranges, quantization error approximation, and<br />
the quality corresponding to each quantization step. A related issue is that the<br />
rate-distortion approximation used for rate control is completely different. This makes the<br />
optimization equations for MPEG2 very suboptimal for H.264.<br />
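The 52 H.264 quantization steps follow a regular pattern: the step size doubles for every<br />
increase of 6 in the quantization parameter (QP). The sketch below reproduces that mapping<br />
from the six base step sizes. The function name is hypothetical, and the sketch shows only<br />
the nominal step size, not the integer lookup tables used in practice.<br />

```python
# Nominal step sizes for QP 0..5; larger QPs repeat this pattern, doubled.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def h264_qstep(qp):
    """Nominal H.264 quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


qstep_qp4 = h264_qstep(4)     # 1.0
qstep_qp10 = h264_qstep(10)   # 2.0, double the step at QP 4
qstep_qp51 = h264_qstep(51)   # 224.0, the largest step
```

This exponential scale has no direct counterpart in the MPEG2 quantizer scale, which is<br />
one reason a requantization mapping designed for MPEG2 cannot be carried over unchanged.<br />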
The wide variety of block sizes in H.264 is an important feature that in general can<br />
improve the coding efficiency. However, in the case of logo insertion, some of the modes<br />
might have no effect on quality because the transcoder is encoding previously<br />
encoded frames. There are some existing techniques for mode selection in<br />
MPEG2 logo insertion. However, the wider range of modes and block sizes in H.264<br />
makes it hard to apply any of the current MPEG2 methods to H.264.<br />
H.264 uses a variable-width in-loop deblocking filter affecting up to 2-3 border pixels of<br />
each macroblock. Because the filter is in the loop, the deblocked frames are used as references for future<br />
frames. This creates a significant challenge: more boundary pixels must<br />
remain unchanged. In other words, after logo insertion more boundary pixels have to be<br />
identical to their values before logo insertion.<br />
Another issue with H.264 is that multiple reference frames can be used for motion<br />
prediction in the inter prediction mode, so the prediction modes and motion vectors have<br />
more combinations and possibilities. There are also multiple backward reference frames<br />
for inter modes in B frames. This issue alone will multiply the resource requirements (e.g.<br />
temporary memory) of the H.264 transcoder versus MPEG2.<br />
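A back-of-the-envelope estimate shows how the reference buffer grows with the number of reference<br />
frames. The sketch assumes 8-bit 4:2:0 video, two reference frames for an MPEG2 B picture, and the<br />
H.264 maximum of 16 reference frames; the limit in practice depends on the profile and level, and<br />
the function names are assumptions made for this example.<br />

```python
def frame_bytes_420(width, height, bytes_per_sample=1):
    """Size of one 4:2:0 frame: a full-size luma plane plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


def reference_buffer_bytes(width, height, num_refs):
    """Memory needed to hold num_refs decoded reference frames."""
    return frame_bytes_420(width, height) * num_refs


# 720x480 frames: two references (MPEG2 B picture) versus sixteen (H.264 upper bound).
mpeg2_bytes = reference_buffer_bytes(720, 480, 2)
h264_bytes = reference_buffer_bytes(720, 480, 16)
```

Under these assumptions the reference buffer grows from about 1 MB to about 8.3 MB for 720×480 video.<br />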
The complexity of H.264 brings new features that did not exist before, such as I_PCM (lossless<br />
coding) and arbitrary slice shapes and sizes. These features might make logo<br />
insertion in H.264 completely different from current MPEG2 work. These features need to be<br />
investigated to study the efficiency of using them. In this project, we aim at solving logo-insertion<br />
transcoding issues for H.264 coded video.<br />
REFERENCES<br />
[1] ISO/IEC 13818-2, Generic Coding of Moving Pictures and Associated Audio Information: Video<br />
International Organization for Standardization, Nov. 1994, Draft International Standard (MPEG-2<br />
Video).<br />
[2] “Draft Text of Final Draft International Standard for Advanced Video Coding,” Int. Telecommun.<br />
Union-Telecommun. (ITU-T), Geneva, Switzerland, Recommendation H.264 (draft), Mar. 2003.<br />
[3] Y. Liu, G. Li, Q. Tang, and J. Guo, “DCT Domain Logo Insertion of MPEG2 Transcoding,” in Proc.<br />
IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), vol. 2, May 2003, pp.<br />
1219-1222.<br />
[4] K. Panusopone, X. Chen, and F. Ling, “Logo insertion in MPEG transcoder,” in Proc. IEEE<br />
International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, USA, vol. 2,<br />
May 2001, pp. 981-984.<br />
[5] S. Xiao, L. Lu, J. L. Kouloheris, and C. A. Gonzales, “Low-Cost and Efficient Logo Insertion Scheme<br />
in MPEG Video Transcoding,” Proc. of SPIE, Visual Communications and Image Processing, vol.<br />
4617, Jan. 2002, pp. 172-179.<br />
[6] J. Xin, C.-W. Lin, and M.-T. Sun, “Digital Video Transcoding,” in Proc. of the IEEE, vol. 93, Issue 1,<br />
Jan. 2005, pp. 84-97.<br />
[7] N. Roma and L. Sousa, “Insertion of Irregular-Shaped Logos in the Compressed DCT Domain,” 14th<br />
International Conference on Digital Signal Processing, vol. 1, 2002, pp. 125-128.<br />
[8] D. G. Morrison, M. E. Nilson, and M. Ghanbari, “Reduction of the Bit-Rate of Compressed Video<br />
While in Its Coded Form,” in Proc. 6th Int. Workshop Packet Video, 1994, pp. D17.1–D17.4.<br />
[9] G. Keesman, R. Hellinghuizen, F. Hoeksema, and G. Heideman, “Transcoding of MPEG Bitstreams,”<br />
Signal Process. Image Commun., vol. 8, no. 6, pp. 481–500, Sep. 1996.<br />
[10] P. A. A. Assuncao and M. Ghanbari, “A Frequency-Domain Video Transcoder for Dynamic Bitrate<br />
Reduction of MPEG-2 Bit streams,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 8, pp. 953–<br />
967, Dec. 1998.<br />
[11] S.-F. Chang and D. G. Messerschmitt, “Manipulation and Compositing of MC-DCT compressed<br />
video,” IEEE J. Sel. Areas Commun., vol. 13, no. 1, pp. 1–11, Jan. 1995.<br />