IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 2, FEBRUARY 2009

Proxy-Based Reference Picture Selection for Error Resilient Conversational Video in Mobile Networks

Wei Tu, Student Member, IEEE, and Eckehard Steinbach, Senior Member, IEEE

Abstract: We propose a frame dependency management strategy for error robust transmission of conversational video in mobile networks. We consider an end-to-end video transmission scenario that involves both a wireless uplink and a wireless downlink plus some intermediate wireline network transmission. We also investigate the special cases of an end-to-end scenario where only a wireless uplink or a wireless downlink is present. We cope with packet loss on the downlink by retransmitting lost packets from the base station to the receiver for error recovery. Retransmissions are enabled by using fixed-distance reference picture selection during encoding, with a prediction distance that corresponds to the round-trip time of the downlink, combined with accelerated decoding. We deal with transmission errors on the uplink by sending acknowledgments and predicting the next frame to encode from those slices that have been correctly received by the base station. We show that these two separate approaches for uplink and downlink efficiently complement one another and that the resulting end-to-end scheme is characterized by very low computational complexity. We compare our scheme to several state-of-the-art error resiliency approaches and report significant improvements.

Index Terms: Adaptive reference picture selection, error resilience, proxy, real-time video transmission.

I. INTRODUCTION

A. Motivation

With the further evolution of mobile networks, packet-oriented video services are expected to be among the most popular ones and may be the key factor for success. Wireless video applications without real-time constraints (e.g., multimedia messaging service) have been successfully introduced to the market.
However, conversational video communication over packet-switched wireless networks, which is characterized by very low delay requirements, remains challenging.

Digital video compression significantly decreases the data rate for video content by exploiting the spatial and temporal redundancy among video frames. However, decoding of erroneous or incomplete video bit-streams leads to severe quality degradations. Because of motion compensated prediction, these impairments also propagate in space and time and therefore typically stay visible for a significant amount of time. Hence, an error resilient transmission scheme is essential to achieve good quality in a wireless video communication system.

Manuscript received May 23, 2007; revised November 01, 2007. First published December 09, 2008; current version published January 30, 2009. This work was supported by Deutsche Forschungsgemeinschaft (DFG) under Grant STE 1093/3-1. Part of this work has been presented at ICME 2005. This paper was recommended by Associate Editor J. Cai.
W. Tu and E. Steinbach are with the Institute of Communication Networks (LKN), Media Technology Group, Technische Universität München, 80290 Munich, Germany (e-mail: firstname.lastname@example.org; email@example.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2008.2009240

B. State-of-the-Art

Numerous studies have been performed to improve the error resiliency of video transmission over lossy channels. Recent overviews, as well as descriptions of the error resiliency tools defined for H.264/AVC, are available in the literature.

Random intra-macroblock (MB) encoding improves the error robustness by locally cutting the temporal dependency between consecutive frames.
Rate-distortion (RD) optimized inter/intra-mode selection improves the coding efficiency compared with random intra-MB encoding by intra-coding those MBs which contribute most to the error robustness without increasing the rate too much. ROPE estimates the reconstruction distortion at the decoder using an analytical model for given transmission channel characteristics. This approach is computationally quite demanding and uses approximations of features available in today's video coding standards. A more accurate but also rather complex approach is multidecoder distortion estimation (MDDE). In this approach, the encoded frame is decoded by K decoders in parallel at the encoder after passing through K randomly generated virtual transmission channels whose characteristics are similar to those of the real transmission channel. The reconstruction distortion averaged across the K reconstructed pictures at the encoder is used to estimate the distortion at the receiver side. Based on this estimate, optimal coding modes in the rate-distortion sense can be selected.

The multiple state encoding scheme takes advantage of multiple description coding (MDC) and path diversity to improve the error robustness of video transmission over lossy networks. The effective packet loss probability is decreased by sending multiple descriptions over two or more different transmission paths with different characteristics, which is not always feasible for mobile users. Furthermore, when a video frame of one description is corrupted, it is approximated from temporally nearby frames in other descriptions. However, typically not all errors can be recovered and error propagation is still an issue.

RD-optimized mode decision is used widely for error robustness when no feedback channel is available.
When the receiver and the sender communicate bidirectionally, retransmission of lost information triggered by feedback from the receiver is considered to be one of the most suitable error resiliency approaches for traditional data communication applications. The big advantage of feedback-based retransmission is its inherent adaptiveness to varying loss rates. Retransmissions are only triggered if the information is actually lost. The overhead encountered is therefore a direct function of the loss rate, and the sender does not need to receive or estimate information about the expected channel condition.

1051-8215/$25.00 © 2009 IEEE
Authorized licensed use limited to: National Cheng Kung University. Downloaded on February 6, 2009 at 08:15 from IEEE Xplore. Restrictions apply.

For bidirectional conversational services like video telephony, however, the benefit of packet retransmission is limited because of the stringent one-way latency requirement, which is typically in the range of 150–250 ms. An elegant retransmission-based approach for end-to-end error recovery called RESCU has been proposed. The main idea of RESCU is to change the frame dependencies in a video sequence such that a retransmission of lost information can be used for error recovery with the help of accelerated retroactive decoding (ARD), despite the low delay requirements of real-time video communication. ARD assumes that received video data can be decoded faster than real-time. In RESCU, ARD is used to generate error-free reference frames from retransmitted packets.

Alternative ways to exploit feedback information from the receiver have also been proposed. Error tracking, for instance, uses feedback about lost packets at the sender to reconstruct the error propagation. Corrupted areas are encoded in intra-mode, which leads to error recovery without introducing additional delay. The achievable performance is mainly a function of the round-trip time (RTT) between the sender and the receiver. Another proposal that uses feedback information to stop error propagation is NEWPRED. Here, feedback about lost packets or correctly received packets is used to restrict the prediction to those image areas that have been successfully decoded. The reference picture selection (RPS) concept introduced in H.263 Annex U and adopted in H.264/AVC supports a standard-compatible implementation of NEWPRED.
Moreover, the feedback information can also be incorporated into RD-optimized mode decision.

C. Contributions

As wireless networks are typically much more error-prone and unstable than wired networks, we concentrate on real-time video transmission over wireless networks. In this work, we propose an adaptive frame dependency management strategy for video telephony applications in which mobile users participate in the conversation, as shown in Fig. 1. We assume that the base stations can send feedback about lost packets to the sender and can retransmit lost packets to the receiver. Another assumption is that the decoder has enough computational resources to decode retransmitted slices fast enough to use them to stop error propagation.

Fig. 1. Mobile video telephony scenario.

If the transmission involves only a wireless downlink (SE1 to MS2 in Fig. 1), we deal with downlink packet loss by using fixed-distance reference picture selection (FDRPS) in combination with a proxy-based retransmission of lost packets. The reference frame to be used for prediction is determined as a function of the RTT between the base station (BS2) and the receiver (MS2) on the downlink. If the transmission involves only a wireless uplink (MS1 to RE1 in Fig. 1), feedback about lost packets is sent from the base station (BS1) to the sender (MS1), so that the encoder can use this information to predict the next frame to be encoded from the video parts which are not negatively acknowledged. If both the sender and the receiver are located in mobile networks (MS1 and MS2) and hence both a wireless uplink and a wireless downlink are involved in the end-to-end communication, the two above-mentioned approaches are combined. Exchanging feedback between the mobile users and their co-located base stations (BS1 and BS2) significantly decreases the feedback delay compared to the case where feedback goes directly from the receiver (MS2) to the sender (MS1), and thus greatly improves the efficiency of error recovery.
The proposed proxy-based RPS (PRPS) scheme is compatible with the H.264/AVC standard syntax and is of very low complexity, as it requires only little processing at the base stations.

The remainder of this paper is organized as follows. Section II introduces selected state-of-the-art error robustness schemes in detail, as they will later be used for performance comparison. In Section III, we describe our proposed framework for error robust video telephony in mobile networks. Section IV presents our simulation results, which show the improvements achieved by our proposal compared to the reference schemes described in Section II. We analyze and compare the complexity of all schemes in Section V and conclude the paper in Section VI.

II. ERROR RESILIENCE SCHEMES USED FOR COMPARISON

The sensitivity of compressed video to transmission errors is mainly due to the use of motion-compensated temporal prediction. If video packets are lost or corrupted during transmission, the video quality degrades even with error concealment. Moreover, the mismatch of the reference frame(s) at encoder and decoder leads to error propagation both in time and space. If the dependency between frames is reduced or the reference frames at encoder and decoder are resynchronized, the mismatch between the encoder and decoder can be decreased or removed.

In this section, we describe selected state-of-the-art error resiliency approaches in more detail, as they are used for comparison in Section IV. They are:
• random intra-MB update (RIMU);
• multidecoder distortion estimation (MDDE);
• NEWPRED;
• RESCU.
Readers who are familiar with these schemes can skip this section and directly proceed to Section III, where our proposed error recovery scheme is presented.

A. Random Intra-MB Update (RIMU)

One way to stop temporal error propagation is the periodic insertion of INTRA-coded pictures or macroblocks (MBs).
An intra-frame normally has a much larger size than inter-frames and thus leads to a large fluctuation in bitrate, which is not suitable for real-time transmission. When the statistics of the transmission channel are approximately known to the encoder, a better approach is to encode a certain percentage of MBs in every frame in intra-mode to stop the error propagation. Random placement of the intra-MBs has been shown to be efficient. Moreover, the random intra-MB update (RIMU) approach has been integrated into almost all reference software implementations of standard codecs and is widely used, especially when there is no feedback information available. However, without accurate information about the channel statistics, the efficiency of RIMU is limited, which is particularly true if the packet loss rate changes rapidly over a wide range.

Fig. 2. MDDE with feedback.

B. Multidecoder Distortion Estimation (MDDE)

As mentioned above, intra-coded MBs are independently decodable and cut off the temporal dependency on previous frames. In RD-optimized mode decision, the selection of inter/intra-modes is determined using a Lagrangian cost function:

$$m^{*} = \arg\min_{m \in \mathcal{M}} \bigl( D(m) + \lambda R(m) \bigr) \qquad (1)$$

where $R(m)$ is the number of bits generated for the current MB when encoded with mode $m$, $D(m)$ is the corresponding distortion (mean square error) associated with this mode, and $\lambda$ is the Lagrangian multiplier. $\mathcal{M}$ is the set of available coding modes. Conventionally, $D$ considers only the encoding distortion caused by quantization, which achieves an optimal RD performance only when the transmission is error free. In the presence of packet loss, the total distortion is composed of both the encoding distortion and the transmission distortion

$$D_{total} = D_e + D_t \qquad (2)$$

where $D_e$ and $D_t$ represent the encoding and transmission distortion, respectively. The encoding distortion $D_e$ is the distortion introduced by quantization, while the transmission distortion $D_t$ refers to the distortion that is observed in the presence of transmission errors after error concealment.
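As an illustration of how such a cost function drives the inter/intra choice, the following minimal Python sketch compares two candidate modes, once with the encoding distortion only and once with an added transmission-distortion term. The class `ModeCandidate`, the function `select_mode`, and all numeric values are hypothetical and chosen only for illustration; the sketch assumes the two distortion terms are simply added and does not reproduce the reference software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeCandidate:
    """One coding mode for the current MB (all values are illustrative)."""
    name: str
    rate_bits: float   # bits spent when the MB is coded with this mode
    d_enc: float       # encoding (quantization) distortion, e.g. MSE
    d_trans: float     # expected transmission distortion after concealment


def select_mode(candidates, lam, error_robust=True):
    """Return the candidate minimizing distortion + lam * rate.

    With error_robust=False only the encoding distortion is counted,
    which corresponds to the conventional error-free mode decision.
    """
    def cost(c):
        distortion = c.d_enc + (c.d_trans if error_robust else 0.0)
        return distortion + lam * c.rate_bits

    return min(candidates, key=cost)


# Hypothetical numbers: inter is cheap but fragile, intra is costly but robust.
CANDIDATES = [
    ModeCandidate("inter", rate_bits=120.0, d_enc=20.0, d_trans=60.0),
    ModeCandidate("intra", rate_bits=400.0, d_enc=18.0, d_trans=5.0),
]

if __name__ == "__main__":
    print(select_mode(CANDIDATES, lam=0.1, error_robust=False).name)
    print(select_mode(CANDIDATES, lam=0.1, error_robust=True).name)
```

With these made-up numbers the error-free decision picks the inter mode, while including the transmission term tips the decision toward intra coding, which is the behavior an error robust mode decision is meant to produce.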
Generally speaking, the "+" operation in (2) represents a joint consideration of the two types of distortion rather than a mathematical summation. However, a good approximation of the total distortion $D_{total}$ is obtained by summing up the encoding distortion $D_e$ and the transmission distortion $D_t$. This is because the two distortions show only little correlation over a wide range of source rates and loss probabilities. By replacing $D$ in (1) by $D_{total}$ from (2), we obtain a cost function that is suitable for making an error robust mode decision for the transmission of compressed video over lossy networks.

An MDDE extension of H.264/AVC has been proposed, in which a powerful yet computationally demanding method is introduced to estimate the expected reconstruction distortion. K copies of random channels with the same statistics as assumed for the real transmission channel are employed at the encoder to simulate the expected distortion at the decoder. If the K channels are identically and independently distributed, then as $K \to \infty$, it follows by the strong law of large numbers that

$$\frac{1}{K} \sum_{k=1}^{K} D_k \;\longrightarrow\; E\{D_{total}\} \qquad (3)$$

where $D_k$ denotes the reconstruction distortion observed for the k-th simulated channel.

In the original MDDE scheme, the encoder does not receive feedback about successfully received or lost packets from the decoder. If K is not large enough, the estimate in (3) will be inaccurate and will affect the distortion estimation for later frames. However, increasing K adds considerable computational complexity at the encoder. In some application scenarios, a feedback channel is available and the feedback information can help the encoder to update the distortion estimation status. Here we extend MDDE to work with feedback information.

Fig. 2 shows the structure of the feedback-based MDDE (F-MDDE) as used in this work. When a NACK for a lost packet (one slice of a previously sent frame) is received by the encoder, the encoder first performs error concealment for this slice and replaces the affected frame in all K Decoding Picture Buffers (DPBs) with the concealed frame.
Passing this updated frame again through the K random channels yields an updated version of the next estimated frame, which is stored back in the DPBs. This update procedure continues until the most recent frame in the DPBs has also been updated. The number of update loops is determined by the RTT between the sender and the receiver, expressed in frame intervals.

C. NEWPRED

NEWPRED uses the feedback about lost packets or correctly received packets to prevent prediction from those image areas that have been corrupted. In ACK-based NEWPRED (A-NEWPRED), only those frames that have been positively acknowledged are used as reference frames. In Fig. 3(a) we illustrate A-NEWPRED for an RTT of two frame intervals. If frame i in Fig. 3(a) is corrupted and not acknowledged, frame i+2 will use the acknowledged frame earlier than i as the reference frame, which is frame i-1 in this example. The following frames i+3 and i+4 use i+1 and i+2 as reference and can be correctly decoded. The coding efficiency of A-NEWPRED is determined by the RTT between the encoder and decoder. A-NEWPRED achieves the highest coding efficiency for instantaneous feedback, which leads to a prediction from the most recent frame in the error-free case. A large RTT leads to a larger prediction distance and thus lower coding efficiency.

Fig. 3. NEWPRED for an RTT of 2 frame intervals.

NACK-based NEWPRED (N-NEWPRED) uses the most recent frame for motion-compensated prediction in the absence of negative acknowledgments, as shown in Fig. 3(b). In case NACKs are received, the prediction distance is increased. Assume again that frame i is corrupted during the transmission. It is concealed at the decoder and a NACK for frame i is sent back to the encoder, as shown in Fig. 3(b). The next frame to be encoded, i+2, switches its reference to the last successfully decoded frame i-1, because i is corrupted and the error also propagates to frame i+1. Frame i+3 uses frame i+2 as the reference frame. If frame i+2 is successfully decoded, as assumed in Fig. 3(b), the error propagation caused by the loss of frame i is terminated. In case frame i+2 is also negatively acknowledged, the error would propagate to frame i+3. Frame i+4 would then again use i-1 as the reference. The coding efficiency of N-NEWPRED is higher than that of A-NEWPRED when no error occurs, because in this case the most recent frame is always used as the reference. The coding efficiency decreases with increasing packet loss rate. Furthermore, when the RTT is large, many frames will suffer from error propagation, which degrades the video quality significantly.

A-NEWPRED and N-NEWPRED have been reported to show very similar performance in the presence of transmission errors. In our experiments in Section IV, we therefore use only A-NEWPRED as a comparison scheme.

D. RESCU

The main idea of RESCU is to change the frame dependencies in a video sequence such that a retransmission of lost information can be used for error recovery despite the low delay requirements of real-time video communication. In RESCU, frames at a fixed period (the periodic frames in Fig. 4) each reference the previous periodic frame, which lies one period earlier. Frames in between two consecutive periodic frames predict only from their immediately preceding periodic frame. If a non-periodic frame is lost, only this frame itself is affected. As shown in Fig. 4, if a periodic frame is lost, the displayed frames that depend on it are affected by error concealment and error propagation. A retransmission is triggered by the NACK sent to the sender. If the retransmission arrives before the time instant when the next periodic frame is to be decoded, the lost periodic frame is immediately decoded, which produces an error free reference for the next periodic frame. Please note that, in order to make RESCU work, the receiver has to be able to decode retransmitted information faster than real-time.

Fig. 4. RESCU with RTT = 2.

III. PROXY-BASED REFERENCE PICTURE SELECTION

In this section, we present our proposal for error resilient transmission of conversational video when either the receiver is a mobile user, or the sender is a mobile user, or both. We assume that during encoding, when no transmission error is reported, the encoder always uses only one previous frame as a reference. We also assume that the feedback packets are strongly protected and are error free.

A. Downlink Error Recovery

We first consider the scenario where the receiver is in a mobile network and the sender is located in a wired network. In this case, our main target is to improve the error robustness on the downlink.

Automatic repeat request (ARQ) is an efficient way in data communication to recover from packet loss based on feedback information from the receiver. For conversational video, if the current frame is predicted from the most recent frame as shown in Fig. 5(a), the retransmission of a lost packet will typically arrive too late at the receiver to be used for the decoding process because of the strict delay constraints for conversational video. Moreover, with the typical I-P-P-P… structure, if frame i is corrupted by packet loss, the error will propagate to the following frames, shown as the dotted boxes in Fig. 5(a).
We refer to this prediction structure as FDRPS with distance 1 in the following.

Fig. 5. Error propagation for FDRPS when frame i is corrupted.

To facilitate error recovery, we propose to adjust the prediction distance, in number of frames, between the reference frame and the frame being encoded to match the RTT on the downlink, as shown in Fig. 5(b). We assume that the RTT in Fig. 5(b) corresponds to 3 frame intervals and hence frame i uses frame i-3 as its reference. If the same loss as before happens, only one of the interleaved prediction groups is affected. In addition, the increased distance to the reference frame gives additional time for the retransmission of lost packets. Successfully retransmitted packets can then be used to stop error propagation, as explained in the following. In Fig. 5(c), when frame i is corrupted during the transmission, it is played out after applying error concealment. The following frames belong to the other prediction groups, which are assumed to have error free reference frames and can be displayed without impairments. As long as the prediction distance is larger than the downlink RTT in number of frames, the first retransmission of the lost frame i can be accomplished before decoding and displaying the next frame of the same prediction group. If the retransmission succeeds, frame i is decoded using ARD and the error free reconstructed frame is put into the decoding picture buffer (DPB), which now provides an error free reference picture for the next frame belonging to the same prediction group. If the first retransmission also fails, that next frame is also shown with error concealment at its display time and the second retransmission is triggered. In this case, when frame i is received error free, both frame i and its successor in the prediction group have to be re-decoded to make the reference picture of the subsequent frame in the group error free. At a packet loss rate of 10%, two retransmissions already reduce the residual packet loss rate to 0.1%, which is quite small and leads to limited impairments. In the extreme situation of multiple successive losses of a packet and its retransmissions, we can also send a feedback message directly to the encoder to ask for a resynchronization, which will stop error propagation. However, the end-to-end RTT then determines how fast the error recovery can happen.

Conversational video is usually encoded with a slice structure in order to improve the error resiliency. One video frame normally consists of several slices, which can be independently decoded. Therefore, we encapsulate one slice into one packet. When one or several packets from frame i are corrupted, only those affected packets have to be resent instead of the whole frame.

As can be seen from Fig. 5(c), the RTT on the downlink is a key factor influencing the efficiency of the proposed error recovery scheme. Because the encoder can be left out of this procedure, any network node able to perform the retransmission can be used.
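The following minimal Python sketch makes two points from the discussion above concrete: which frames share a prediction group under fixed-distance RPS, and how the residual loss rate of 0.1% follows from a 10% loss rate with two retransmissions. The function names are hypothetical, and the residual-loss formula assumes that the original transmission and each retransmission are lost independently with the same probability, which is an assumption of this sketch.

```python
def reference_frame(n, distance):
    """Index of the frame that frame n predicts from under fixed-distance RPS.

    Returns None for the first `distance` frames, which have no reference
    at that distance (how they are coded is outside this sketch).
    """
    return n - distance if n >= distance else None


def frames_hit_by_loss(lost, distance, last_frame):
    """Frames impaired if frame `lost` is never repaired.

    Only the prediction group of the lost frame is affected:
    lost, lost + distance, lost + 2 * distance, ...
    """
    return list(range(lost, last_frame + 1, distance))


def residual_loss_rate(p, retransmissions):
    """Probability that a packet and all of its retransmissions are lost,
    assuming independent losses with probability p per attempt."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if retransmissions < 0:
        raise ValueError("retransmissions must be non-negative")
    return p ** (retransmissions + 1)


if __name__ == "__main__":
    print(frames_hit_by_loss(4, 1, 10))   # distance 1: every later frame
    print(frames_hit_by_loss(4, 3, 10))   # distance 3: one group only
    print(f"{residual_loss_rate(0.10, 2):.4%}")
```

With distance 1 a loss in frame 4 impairs every later frame, whereas with distance 3 only frames 4, 7 and 10 are affected until a retransmission repairs the group.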
In the considered scenario, the base station BS2 in Fig. 1 is the closest point to the mobile receiver. Running the retransmission proxy on this base station, or at least as close as possible to it, leads to the minimum round-trip delay.

Fig. 6. Coding efficiency as a function of the prediction distance for two test sequences using H.264/AVC with QCIF @ 15 Hz.

Fig. 6 shows the required bitrate as a function of the prediction distance for two standard video test sequences. All points on the curves in Fig. 6 are obtained with the same quantization parameter (QP) of 28 during encoding, which means that the PSNR values on the same curve are very similar but not exactly identical. Although the coding efficiency in the error-free case is decreased because of the increased prediction distance, we will show in Section IV that even for low loss rates this effect is compensated by the greatly improved error resilience.

A possible alternative would be to use a variable-distance RPS encoding scheme while keeping the prediction distance larger than the RTT. We do not adopt this alternative in our work for the following reasons. First, as described in Section III-B, when the uplink error recovery is performed, the prediction distance already becomes adaptive. Second, variable-distance RPS will give only small performance improvements, as most of the blocks/frames are predicted only from the most recent frame, whereas it significantly increases the computational complexity. Finally, we assume that the RTT on the downlink is stable in the short term. In the long term, the prediction distance can be adapted when an indication about a change in RTT arrives from the receiver. Our proposed FDRPS ARQ approach for downlink error recovery has some similarity with RESCU as described in Section II-D.
However, in comparison to RESCU, we consider proxy-based retransmission of lost information on the downlink and therefore avoid the low coding efficiency caused by the large end-to-end delay encountered when using RESCU. Also, in contrast to RESCU, in our scheme it does not matter which frame is affected by packet loss. In RESCU, if the periodic frame is corrupted, all frames which depend on the periodic frame and will be displayed before the retransmission succeeds are affected. In our approach, we perform at most two retransmissions for every lost packet on the downlink. If the retransmission arrives in time, the error free frame is reconstructed using accelerated decoding and error propagation is stopped. In case both retransmissions get lost, in our scheme only a subsequence of the video will be affected by error propagation. This allows us to stop displaying this subsequence and to ask for a resynchronization frame. When comparing these two schemes, RESCU favors a channel with a low packet loss rate and our approach performs better at higher packet loss rates, which will also be shown in our simulation results in Section IV-F.

B. Uplink Error Recovery

The error recovery strategy described in the last section is designed for packet losses on the downlink. If the sender is in a mobile network (MS1 in Fig. 1) and the receiver (RE1) is connected to a wired network, error recovery from packet losses on the uplink is required.

As the sender here is also the encoder, we can use an approach similar to NACK-based NEWPRED (described in detail as N-NEWPRED in Section II-C) for error robustness on the uplink. As long as no NACK is received by the sender, N-NEWPRED uses the most recent frame as the reference frame. This frame, however, might be corrupted during the transmission or affected by error propagation, as explained in Section II-C. In order to always perform the prediction from an error free frame, a small change is made here compared to the original N-NEWPRED scheme. Instead of using the immediately preceding frame as the reference frame in the absence of NACKs, we predict from the frame whose distance to the current frame corresponds to the RTT on the uplink. Therefore, the RTT is also a key factor for our uplink error recovery scheme. We show in Section IV-F that this modification leads to improved performance compared with the original NEWPRED. Again, we propose to set up the proxy on the base station close to the sender, which is BS1 in Fig. 1. This proxy is responsible for checking the integrity of video packets sent over the uplink and returning corresponding NACKs to the sender. If a video packet is not successfully received by BS1, it returns a NACK to MS1. When the NACK is received by the encoder, one of the following three actions can be taken by the encoder to stop the error propagation.

Fig. 7. Adaptive RPS triggered by feedback from the base station to the sender.

1) Frame Level RPS (FLRPS): In case a packet is lost on the uplink, BS1 returns a NACK to MS1 and the sender reacts to the NACK by changing the reference frame for the next frame to be encoded. The frame with lost packet(s) is not used as a reference for any following frames. As illustrated with the dashed arrow in Fig. 7(a), the frame that would have referenced the corrupted frame is instead predicted from the most recent error-free frame earlier than the corrupted one. If that frame is also corrupted, the prediction falls back one frame further, and so on. As a result, this packet loss on the uplink will only affect one single frame at the receiver, which will later be displayed using error concealment.

2) Slice Level RPS Without Error Concealment (SLRPS): FLRPS stops error propagation by always using an error free frame as the reference frame. However, a single packet loss in a frame leads to an increased prediction distance, which degrades the coding efficiency. In SLRPS we apply the dynamic RPS on the slice level. As shown in Fig. 7(b), when one packet (one slice) of a frame is lost on the uplink, we keep the affected frame as a potential reference frame, while excluding the missing area. Additionally, the preceding error-free frame also serves as a reference frame. Because of the strong correlation between temporally consecutive frames, most parts of the frame being encoded are still predicted from the partially received frame, and only a small part takes the corresponding part of the preceding error-free frame as reference, illustrated with the dash-dotted arrow and dashed arrow in Fig. 7(b), respectively. In case more than half of the packets in both of these reference frames are lost, the prediction will also include a further earlier frame.

3) Slice Level RPS With Error Concealment (SLRPSEC): A small variation of SLRPS is to perform error concealment for the corrupted frame as soon as the encoder learns about the loss. This is possible due to the proposed modification of N-NEWPRED, which adjusts the regular prediction distance to the RTT of the uplink. The concealment is performed before encoding the next frame which predicts from the corrupted frame. If this strategy is applied, we can again use just one reference frame. The error concealment scheme should be the same as that used at the decoder. Fig. 7(c) illustrates the error recovery performed when a packet of a frame is lost.
The error caused by packet losscan be concealed temporally, spatially or using a combination ofboth. Frame is then predicted from the concealed frame ,shown with the dash-dotted arrow in Fig. 7(c). The advantagesof this approach are the lower motion estimation complexity andthe reduced memory requirement compared to SLRPS. However,additional complexity is introduced by the error concealment.C. CombinationIn the previous two sections, we have presented our proposalsfor error recovery from packet losses on the uplink and downlink,respectively. In this section, we consider the case whenboth the sender and the receiver are in mobile networks. Thetwo error recovery schemes can be employed individually on theuplink and downlink. From an end-to-end transmission point ofview, they cooperate and complement each other well. We referto this combination from now on as the proxy-based referencepicture selection (PRPS) framework, which provides an efficientand error robust solution for the end-to-end conversational videoapplication for mobile users.Let us assume that in Fig. 1, MS1 is the sender and MS2 is thereceiver. The video packets are sent uplink to base station BS1and from there to BS2. BS2 forwards the video stream downlinkto the receiver MS2. We also assume that during the connectionestablishment process, the RTT on the downlink is signaled toAuthorized licensed use limited to: National Cheng Kung University. Downloaded on February 6, 2009 at 08:15 from IEEE Xplore. Restrictions apply.
the encoder. If the RTT on the uplink is greater than or equal to the RTT on the downlink, we need not make any adjustment to the adaptive RPS scheme used on the uplink. Otherwise, we adjust the prediction distance to the RTT that is observed on the downlink. In other words, the larger RTT determines the default prediction distance used by the encoder.

Fig. 8. Error robust mobile video telephony using the proposed PRPS framework.

Fig. 8 shows an example that illustrates how both parts work together when SLRPSEC is used for the uplink error recovery. The RTT on the downlink is assumed to be larger than the RTT on the uplink and corresponds to 3 frame intervals. Hence, the default prediction distance is selected to be 3. This is illustrated in the top row of Fig. 8. When a packet of a frame gets lost on the uplink, the encoder performs error concealment for that frame and uses the concealed frame as the reference for the frame that predicts from it. The lost slice is concealed and displayed at the decoder (see bottom row of Fig. 8). Packets lost on the uplink from MS1 to BS1 do not arrive at BS2, which saves some transmission rate on the downlink that can be used for retransmission of lost packets. When a slice of a frame gets lost on the downlink, the NACK from MS2 is received by BS2 and the slice is retransmitted before a later frame is sent. The affected frame is then re-decoded without error and used as the reference for the frame that predicts from it. Because the frame encoded after the uplink loss uses the concealed frame as its reference, using the identical reference frame at the decoder resynchronizes the encoder and decoder. Please note that SLRPSEC is used here only as an example on the uplink; the other two schemes, FLRPS and SLRPS, can also be used in our proposed framework.

IV. SIMULATION RESULTS

The purpose of this section is to evaluate the performance of our proposed framework under different network conditions. We first show comprehensive results for the comparison schemes introduced in Section II and then compare them with our proposed proxy-based RPS (PRPS) framework.

We use the H.264/AVC test software version JM 11.0 as the video codec. The first 300 frames of the test sequences Foreman and Salesman at QCIF resolution are encoded at 15 fps with an I-P-P-P… structure. We select a slice to correspond to one row of MBs. For transmission, one slice is put into one packet. The default error concealment techniques defined in JM 11.0 are used at the receiver for display. In case a whole frame is lost, it is concealed by copying the previous reconstructed frame. We assume that the maximum RTT on the uplink and on the downlink is 200 ms each, which in our case corresponds to a prediction distance of 3 frames between the current frame and the reference frame at a frame rate of 15 fps. The end-to-end round-trip delay including the wireless and wireline networks is assumed to be 400 ms, which corresponds to 6 frame intervals. A random packet loss channel model and a burst packet loss channel using the two-state Gilbert-Elliott model are employed in our simulation. Unless otherwise specified, the average burst length for the burst loss model is set to 5 packets, and n% packet loss means n% packet loss on the uplink and n% packet loss on the downlink. For each simulation, 100 different channel realizations are tested and the average PSNR of the luminance component as reconstructed and displayed at the receiver is reported.

Fig. 9. RD performance of RIMU for the Foreman sequence and 1% random packet loss in both uplink and downlink.

Fig. 10. RD performance of MDDE and F-MDDE with K = 30 for the Foreman sequence and 1% random packet loss in both uplink and downlink.

A. RIMU

As described in Section II-A, RIMU is widely used to improve the error resilience of video transmission when no feedback channel is available.
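The timing and channel assumptions of the simulation setup at the start of Section IV can be made concrete with a short sketch. It converts an RTT in milliseconds into frame intervals and generates a bursty loss pattern with a prescribed average loss rate and mean burst length. The function names `rtt_to_frames`, `gilbert_params`, and `simulate_losses` are hypothetical, and the sketch assumes the simplified two-state variant in which every packet is lost in the bad state and none in the good state; the text does not state which parameterization of the Gilbert-Elliott model was used.

```python
import random


def rtt_to_frames(rtt_ms, fps):
    """Frame intervals needed to cover an RTT, rounded up (integer math)."""
    return -(-rtt_ms * fps // 1000)


def gilbert_params(loss_rate, mean_burst):
    """Transition probabilities of a two-state loss chain (assumed variant).

    In the bad state every packet is lost, in the good state none is.
    Mean burst length = 1 / p_bg; stationary loss = p_gb / (p_gb + p_bg).
    """
    p_bg = 1.0 / mean_burst                      # bad -> good
    p_gb = loss_rate * p_bg / (1.0 - loss_rate)  # good -> bad
    return p_gb, p_bg


def simulate_losses(n_packets, loss_rate, mean_burst, seed=0):
    """Return a list of booleans, True meaning the packet is lost."""
    rng = random.Random(seed)
    p_gb, p_bg = gilbert_params(loss_rate, mean_burst)
    bad = rng.random() < loss_rate  # start from the stationary distribution
    losses = []
    for _ in range(n_packets):
        losses.append(bad)
        if bad:
            bad = rng.random() >= p_bg  # stay in the bad state
        else:
            bad = rng.random() < p_gb   # enter the bad state
    return losses


if __name__ == "__main__":
    print(rtt_to_frames(200, 15), rtt_to_frames(400, 15))
    pattern = simulate_losses(100000, 0.05, 5, seed=1)
    print(sum(pattern) / len(pattern))
```

For the values quoted in the setup, 200 ms at 15 fps covers exactly 3 frame intervals and 400 ms covers 6. With a 5% loss rate and a mean burst length of 5 packets, the sketch uses a bad-to-good probability of 0.2 and a good-to-bad probability of roughly 0.0105.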
The performance strongly depends on the accurate estimation of the packet loss rate on the transmission channel. An improper intra-MB rate leads to very low performance. Fig. 9 shows the RD performance of RIMU over
a 1% random loss channel for different numbers of intra-updated MBs per frame. RIMU_MB00 shows the special case when the encoding mode of each MB is determined by the RD cost function in the codec, which considers only the quantization error. Without any protection, very poor performance is obtained even at such a low packet loss rate. RIMU_MBn shows the performance when n randomly selected MBs in each frame are encoded in intra-mode. We can see from the figure that the optimal update rate changes when the target bit rate varies. As the optimal update rate depends not only on the packet loss rate, but also on the video content and the target bit rate, it is hard to determine. Therefore, we show the performance of RIMU in Section IV-F with the highest achievable quality at each loss rate, which forms an upper bound on the performance of RIMU. We obtain this upper bound by running through all possible update rates and picking the one that leads to the best performance.

Fig. 11. RD performance of F-MDDE for a RTT of 6 frames for the Foreman and Salesman sequences.

B. F-MDDE

In this section, we compare the performance of the original MDDE (implemented in JM 11.0) to the feedback-based MDDE (F-MDDE) scheme introduced in Section II-B. According to prior work, when the number of decoders K at the encoder is equal to 30, the estimation of the distortion is already quite accurate while the computational complexity is still reasonable. Therefore, we set K = 30 for all simulations in this paper. In our F-MDDE implementation, the same RD mode decision as in MDDE is employed. The distortion is re-estimated using the updated decoded pictures in the DPB after evaluating the feedback information. Fig. 10 shows the results for MDDE and F-MDDE based RD-optimized mode decision for a channel with 1% random packet loss.
30dec_NOF stands for the case without feedback, and 30dec_AK12, 30dec_AK6, and 30dec_AK3 represent F-MDDE with an RTT of 12 frames, 6 frames, and 3 frames, respectively. The F-MDDE approach outperforms MDDE by 0.5 dB to 2 dB. The smaller the RTT, the more accurate the distortion estimation and thus the better the performance. When instantaneous feedback is available, the encoder knows exactly the reference frame at the decoder side. This is typically not possible in practice; however, the 30dec_AK0 curve in Fig. 10 gives an upper bound on the performance of F-MDDE. For the same QP used during encoding, a significantly smaller rate at a slightly reduced PSNR is observed when compared to 30dec_AK3. This is due to the fact that only some of the most severely affected MBs in the current frame are encoded in intra-mode when there are some impairments in the decoded picture, which significantly increases the coding efficiency. However, the RD-optimized mode decision also selects inter-mode for those MBs with small distortion, which leads to some error propagation.

Fig. 11 shows the performance of F-MDDE for different channel models for a fixed RTT of 6 frames. At a 1% packet loss rate, the same performance for the random loss channel (30dec_AK6_r1) and the burst loss channel (30dec_AK6_b1) can be observed for the Salesman sequence. For the Foreman sequence at a 1% loss rate, the performance for the burst loss channel is significantly better than for the random loss channel because of the stronger error propagation for random losses.

An important assumption made by MDDE is correct knowledge of the average packet loss rate on the transmission path. Sometimes this information might not be available, or its estimate might be wrong. The two dash-dotted curves in Fig. 11(a) show the reconstruction quality of the Foreman sequence when the assumed packet loss rate is set to 2% while the real packet loss rate is 5% on both the uplink and the downlink.
The underestimated packet loss rate leads to fewer intra-MBs and thus a lower bitrate, but also to a much lower PSNR. For the low-motion Salesman sequence in Fig. 11(b), the underestimation is not as critical, because the saved rate can fully compensate for the degradation of the video quality when the rate is lower than 75 kbps.

C. NEWPRED

In our simulation, only the ACK-based NEWPRED (A-NEWPRED) is examined, because A-NEWPRED and N-NEWPRED have been reported to have similar performance. Fig. 12(b) shows the RD performance curves for different RTTs of A-NEWPRED when it is employed as the error resilience approach for 1% and 5% packet loss channels, respectively. At the same packet loss rate, A-NEWPRED performs better for burst losses (e.g., NEWPRED_b1_AK6) than for random losses (e.g., NEWPRED_r1_AK6) because the implementation here is frame
based, where even a single packet loss leads to the switching of the reference frame and a larger prediction distance. Burst packet loss, with consecutive packet losses but the same total number of lost packets, results in a smaller prediction distance on average and thus better performance. With a 5% packet loss rate, the gap between random loss and burst loss is even bigger. When the same transmission channel is used, the end-to-end RTT dominates the performance of A-NEWPRED. With an end-to-end RTT of 3 frames, NEWPRED_r1_AK3 achieves an improvement of about 1.5 dB compared with NEWPRED_r1_AK9, which has a much larger RTT.

Fig. 12. RD performance of NEWPRED for the Foreman sequence for different RTTs.

Fig. 13. RD performance of RESCU for the Foreman sequence for different RTTs.

D. RESCU

As described in Section II-D, RESCU works end-to-end between the sender and receiver. When packets belonging to the periodic frame are lost, retransmissions of those lost packets are triggered. However, the frames up to the next periodic frame are affected by the error propagation from this periodic frame. Fig. 13 shows the performance of the RESCU approach for the two different channel types. RESCU_r1_AK3 and RESCU_b1_AK3 in Fig. 13(a) illustrate the performance of RESCU for 1% random and burst packet loss channels, respectively, for an RTT of 3 video frames. When the RTT becomes larger, the distance between two periodic frames also has to be increased. As all non-periodic frames are predicted from the immediately preceding periodic frame, the coding efficiency is therefore degraded. Meanwhile, the error propagation also becomes more severe when a periodic frame is corrupted. Therefore, the larger the RTT, the lower the reconstruction quality at the receiver.
At the same packet loss rate, RESCU performs better over a burst channel than over a random channel, because fewer frames are corrupted by packet loss when the loss is bursty and the reconstruction errors in non-periodic frames do not propagate to later frames. Fig. 13(b) shows the performance of RESCU for 5% packet loss. The gap between the curves for different RTTs is larger than in Fig. 13(a), which shows that the performance of RESCU degrades significantly at high packet loss rates.

E. Adaptive RPS for Uplink Error Recovery

In Section III-B, three adaptive RPS approaches for uplink error recovery are introduced. In this section, we compare the performance of these three approaches when combined with the same error recovery scheme on the downlink (FDRPS+ARQ). To keep the error concealment simple at the encoder, we copy the slice at the same spatial position in the previous frame to conceal the lost slice. At the decoder, the frame to display uses the standard error concealment scheme in JM 11.0, while
the frame put into the decoder reference picture buffer is concealed using the same concealment approach as that used at the encoder.

Fig. 14. Performance of the adaptive RPS schemes used for uplink error recovery, Foreman, RTT of 3 frames.

Fig. 14 illustrates the performance of FLRPS, SLRPS, and SLRPSEC under different channel conditions. At a low loss rate (1%), it happens very rarely that two consecutive frames are corrupted, which means the prediction distance typically needs to be increased by only one frame. As shown in Fig. 6, a one-step prediction distance increase leads to a 10%–40% rate increase. With only 1% loss on the uplink, at most 1% of the total frames need to reselect their reference pictures, which results in a rate increment of at most 0.4% (1% of the frames times a rate increase of up to 40%). Therefore, the three approaches have almost the same performance in this case. If the wireless channel has 5% random loss, the corresponding 2% rate increment can already be seen in the corresponding subfigure. If the loss is bursty at 5%, performance gaps can be clearly observed among the three schemes. SLRPSEC has the lowest performance in this case because the standard error concealment is less efficient for burst losses.

Fig. 15. RD performance of PRPS as a function of the RTT on the uplink and downlink for a 5% packet loss channel. The mean burst length is 5 packets. The test sequence is Foreman.

F. Performance of PRPS

In this section, we first investigate the performance of our proposed PRPS scheme as described in Section III-C for various RTTs on the uplink and the downlink. Then, we compare the PRPS scheme with the state-of-the-art error robustness approaches described in Section II. In the following, SLRPS is used as the error recovery scheme on the uplink in our PRPS framework. Fig. 15 shows the RD performance of PRPS as a function of the RTT on the wireless links.
PRPS_AK1 represents the case when the maximum RTT on the uplink and downlink is 1 frame interval, which means that the feedback information can be obtained almost instantaneously. As expected, this ideal condition achieves the highest RD performance, and the larger the RTT, the lower the performance. Please note that the performance degradation is not a linear function of the RTT. For increasing RTT, the additional performance degradation decreases.
Fig. 16. Performance of PRPS and the comparison schemes for the Foreman sequence.

As mentioned in Section IV-A, to get the curves for RIMU in Fig. 16, we vary the number of intra-MBs from 0 to 80 per frame and pick the simulation run with the best performance. The RIMU scheme shown here for comparison therefore works better than it would in practice, where the current loss rate is normally unknown to the sender and picking the optimum intra-refresh rate would be impossible. For the feedback-based approaches RESCU, NEWPRED, and F-MDDE in Fig. 16, the end-to-end RTT is set to 6 frames. Due to the faster feedback provided by the proxies in our approach, the RTT is set to 3 frames on both uplink and downlink. We assume that there is no additional delay in the wired network.

As shown in Fig. 16, our proposed approach (PRPS) outperforms the other schemes for all channel conditions. RESCU performs second best at low loss rates. In particular, when the loss is bursty, fewer periodic frames are corrupted and RESCU achieves the same RD curve as PRPS. For the higher loss rate (5%), the error propagation caused by the loss of periodic frames degrades the performance of RESCU significantly and NEWPRED performs better. The other two approaches, F-MDDE and RIMU, have much lower performance. F-MDDE outperforms RIMU as a result of the RD-optimized mode decision with feedback. However, the distortion still needs to be estimated through multiple decoders. This leads to a performance gap of up to 2.5 dB compared to the three approaches that have exact decoder-side information. Compared to the Foreman sequence, the Salesman sequence has much lower motion activity. At low loss rates, all approaches perform very similarly. Therefore, in Fig. 17 we only show the performance for 5% random and burst packet loss.
RESCU performs better than PRPS because Salesman is not so sensitive to packet loss. At the same reconstruction quality, PRPS has a higher bitrate because of the coding structure it uses. However, when the RTT or the packet loss rate increases, the performance of RESCU declines much faster than that of PRPS.

Fig. 17. Performance of PRPS and the comparison schemes for the Salesman sequence.

Fig. 18 shows the mean reconstruction quality in PSNR for the five schemes as a function of the loss rate for the Foreman sequence. As the wireless channel is normally bursty, in this experiment we use the Gilbert-Elliott model to generate the uplink and downlink test channels with 1%, 3%, 5%, 7%, and 10% packet loss rates and keep the average burst length at 5 packets. The available transmission rate is restricted to 150 kbps including all overheads, such as the retransmission of lost packets. The RIMU curve again connects the best points with the optimal update rate at all loss rates. Our proposed PRPS outperforms all other schemes at all loss rates, and significant improvements can be observed in Fig. 18. Compared with RIMU_MB0, the lower coding efficiency of all error resilient schemes is compensated by the much higher performance even for 1% packet loss, which supports our statement in Section III-A. Moreover, we can identify the trade-off between NEWPRED and RESCU: RESCU is more efficient for low packet loss rates.

Fig. 18. Mean reconstruction quality as a function of packet loss rate for a mean packet burst loss length of 5 for the Foreman sequence.

In Fig. 19, we illustrate the impact of the RTT on the mean reconstruction quality for the Foreman sequence for all feedback-based approaches. The transmission channel is again assumed to have a capacity of 150 kbps at a 5% packet loss rate with a mean burst length of 5 packets. We assume that the RTT on the uplink and on the downlink is fixed at 3 frames each, and the x-axis represents the RTT of the wired network between the two base stations. When it equals 0, there is no additional delay in the wired network, which leads to an upper bound for the end-to-end error resiliency schemes. In this case, the end-to-end RTT is equal to 6 frames (3 frames on the uplink, 0 frames on the wired link, and 3 frames on the downlink). The proposed PRPS scheme has a constant performance when the RTT between the two base stations increases. F-MDDE also shows a stable but much lower performance because of its inaccurate distortion estimation. The performance of NEWPRED and RESCU declines when the RTT increases, and the larger the RTT in the wired network, the bigger the gap between them and the PRPS approach.

Fig. 19. Mean reconstruction quality as a function of RTT for 5% packet loss rate for a mean burst length of 5 for the Foreman sequence.

TABLE I. ONE MOBILE USER IN A WIRELESS NETWORK

As described in Section III, we have proposed two different error recovery schemes for uplink and downlink. So far we have shown the performance when the two schemes work in concert. In Table I, we give the performance when either a wireless uplink or a wireless downlink is involved in the end-to-end transmission. All results in Table I are obtained for 5% burst packet loss either on the uplink or the downlink for the Foreman sequence at 150 kbps including all overheads.
The RTT is still assumed to be 6 frames for the other schemes and 3 frames for PRPS. PRPS_uplink performs much better than the other approaches owing to its small RTT. Similarly, PRPS_dwlink outperforms the other approaches, although its performance is very close to that of RESCU. RESCU performs well here because there is only 5% packet loss in total on the transmission path, while in the previous simulations, 5% loss on the uplink and 5% loss on the downlink result in up to 10% packet loss for the end-to-end transmission. When the loss rate is low, the error propagation in RESCU is limited and is compensated by the relatively higher coding efficiency.

V. COMPLEXITY ANALYSIS

As shown in Section IV, all error resilient video transmission strategies improve the quality of conversational video applications for mobile users. However, because of the limited computational resources and memory of mobile terminals, not
all of the above-mentioned schemes can be applied in practice. Therefore, in this section, we evaluate the computational complexity and memory requirements of these approaches.

Random intra-MB update is one of the error resiliency features that have been included in the H.264/AVC codec. Given the number of MBs that should be encoded in intra-mode, a random number is generated for each MB to determine its mode. By this early mode decision, the complexity of the mode decision for the whole frame is reduced and almost no additional complexity or storage requirement is introduced.

Most of the RD-based mode decision approaches lead to high computational complexity. All possible rate and distortion combinations are examined to achieve an optimal RD performance. In ROPE, the distortion chain for every single pixel has to be constructed. In MDDE, the encoder has to perform K decodings when encoding one single frame, which is a brute-force way to obtain the estimated distortion. Even if K is set to 30, as in this paper, this scheme still imposes a very heavy burden on mobile terminals. If feedback information with an RTT of several frames is also to be included, ROPE needs to reconstruct the distortion chain over those frames, and F-MDDE requires corresponding DPB update steps as well as additional decoding of frames. At the same time, a correspondingly sized frame memory buffer is needed at the encoder for both approaches.

NEWPRED and RESCU have almost the same complexity. As in our work, they always predict from one single reference frame and hence the computational complexity is almost the same as for conventional encoding. NEWPRED needs to store the previous frames covering the RTT in the buffer, regardless of whether A-NEWPRED or N-NEWPRED is used. In contrast, RESCU has a moderate storage requirement.
As all non-periodic frames only predict from the most recent periodic frame and are not used as reference frames at the encoder, only one periodic frame has to be stored.

In our approach, the decoder needs to report the status of the received packets, which is needed in all feedback-based approaches. In addition, accelerated decoding has to be supported by the decoder, which is also required by RESCU. As decoding has much lower complexity than encoding, it does not add too much burden to the decoder. At the encoder, with FLRPS, we have no additional complexity. With SLRPS, we have more than one reference frame for motion estimation if the default reference frame is not error free. As the motion search range is limited to 16 pixels and we put one row of MBs into one slice, only part of the frame is affected if the reference frame is corrupted. As shown in Fig. 20, if the 5th slice in the reference frame is corrupted, only slices 4, 5, and 6 of the frame being encoded can be affected. These three slices predict from two frames and all other slices just use the default reference frame, which limits the additional complexity of motion estimation. SLRPSEC always uses a single reference frame, at the additional cost of performing error concealment.

Fig. 20. Search range of the slice level RPS without error concealment.

The storage cost of our proposed approach is limited by the involvement of proxies. With SLRPS, the encoder needs to store only a limited number of frames relative to the end-to-end RTT, even if we assume in the worst case that there is no transmission delay between the two base stations. This number includes additional reference frames whose count is determined by the burst length of the channel and should be much smaller than the RTT in frames. SLRPSEC and FLRPS always use one reference frame and hence need to store fewer frames than SLRPS in our PRPS scheme.

VI. CONCLUSION

We have presented a low complexity framework for error resilient transmission of conversational video in wireless environments.
We combine fixed distance reference picture selection with retransmission of lost packets to deal with losses on the downlink. The prediction distance is adjusted to the round-trip time of the downlink, which gives us the opportunity to retransmit lost packets and to use successfully retransmitted packets to stop error propagation by accelerated decoding. This strategy is combined with adaptive RPS on the uplink triggered by feedback from the base station to the sender. Please note that our proposal is fully standard-compatible when using H.264/AVC. We make two major assumptions. First, we assume that the base stations can send feedback about lost packets to the sender and can retransmit lost packets to the receiver. Second, we assume that the decoder has enough computational resources to decode retransmitted slices fast enough to use them to stop error propagation. As a final point, we would like to mention that our approach can be combined with other error resiliency approaches, for which we expect even better performance.

REFERENCES

W. Tu and E. Steinbach, “Proxy-based reference picture selection for real-time video transmission over mobile networks,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME’05), Amsterdam, Netherlands, Jul. 2005.
Y. Wang, S. Wenger, J. T. Wen, and A. K. Katsaggelos, “Review of error resilient coding techniques for real-time video communications,” IEEE Signal Process. Mag., vol. 17, no. 4, pp. 61–82, Jul. 2000.
Y. Wang and Q. Zhu, “Error control and concealment for video communications: A review,” Proc. IEEE, vol. 86, pp. 974–997, May 1998.
S. Kumar, L. Xu, M. K. Mandal, and S. Panchanathan, “Error resiliency schemes in H.264/AVC standard,” J. Vis. Commun. Image Represent., vol. 17, no. 2, pp. 425–450, Apr. 2006.
T. Stockhammer, “Error robust macroblock mode and reference frame selection,” in Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG JVT-B102, Geneva, Switzerland, Jan. 2002.
R. Zhang, S. L. Regunathan, and K. Rose, “Video coding with optimal inter/intra-mode switching for packet loss resilience,” IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 966–976, Jun. 2000.
T. Stockhammer, M. Hannuksela, and T. Wiegand, “H.264/AVC in wireless environments,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 657–673, Jul. 2003.
J. G. Apostolopoulos, “Reliable video communication over lossy packet networks using multiple state encoding and path diversity,” in Proc. SPIE VCIP’01, San Jose, CA, Jan. 2001, pp. 392–409.
I. Rhee and S. Joshi, “Error recovery for interactive video transmission over the internet,” IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 1033–1049, Jun. 2000.
M. Ghanbari, “Post processing of late cells for packet video,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 12, pp. 669–678, Dec. 1996.
E. Steinbach, N. Färber, and B. Girod, “Standard compatible extension of H.263 for robust video transmission in mobile environments,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 12, pp. 872–881, Dec. 1997.
W. Tu and E. Steinbach, “Proxy-based error tracking for H.264 based real-time video transmission in mobile environments,” in Proc. IEEE ICME’04, Taipei, Taiwan, Jun. 2004, pp. 1367–1370.
S. Fukunaga, T. Nakai, and H. Inoue, “Error-resilient video coding by dynamic replacing of reference pictures,” in Proc. IEEE GLOBECOM’96, London, U.K., Nov. 1996, pp. 1503–1508.
“An Error Resilience Method Based on Back Channel Signalling and FEC,” ITU-T/SG15/LBC-96-033, Telenor R&D, San Jose, 1996.
T. Wiegand, N. Färber, K. Stuhlmüller, and B. Girod, “Error-resilient video transmission using long-term memory motion-compensated prediction,” IEEE J. Sel. Areas Commun., vol. 18, no. 6, pp. 1050–1062, Jun. 2000.
Y. Liang, M. Flierl, and B. Girod, in Proc. IEEE ICIP’02, Rochester, NY, Sep. 2002, pp. 181–184.
K. Stuhlmüller, N. Färber, and B. Girod, “Adaptive optimal intra-update for lossy video transmission,” in Proc. SPIE VCIP’00, Perth, Australia, Jun. 2000, pp. 286–295.
Y. Huang, B. Hsieh, T. Wang, S. Chien, C. S. S. Ma, and L. Chen, in Proc. IEEE ICME’03, Baltimore, MD, Jul. 2003, pp. 145–148.
FhG HHi Berlin, JVT H.264/MPEG-4 AVC Reference Software. [Online]. Available: http://iphome.hhi.de/suehring/tml/index.htm
Y. K. Wang, M. Hannuksela, V. Varsa, A. Hourunranta, and M. Gabbouj, “The error concealment feature in the H.26L test model,” in Proc. IEEE ICIP’02, Rochester, NY, Sep. 2002, pp. 729–732.

Wei Tu (S’04) received the B.S. degree from East China University of Science and Technology, Shanghai, China, in 1999 and the M.S.
degree in electronic engineering from Technische Universität München, Munich, Germany, in 2003, where he is currently working toward the Ph.D. degree in the Media Technology Group, Institute of Communication Networks.
His research interests include error robust video transmission, RD-optimized scheduling and cache management for video on demand services.

Eckehard Steinbach (M’96–SM’08) studied electrical engineering at the University of Karlsruhe, Karlsruhe, Germany, the University of Essex, Colchester, U.K., and ESIEE, Paris, France. He received the Engineering Doctorate from the University of Erlangen-Nuremberg, Germany, in 1999.
From 1994 to 2000, he was a Member of the Research Staff of the Image Communication Group, University of Erlangen-Nuremberg. From February 2000 to December 2001, he was a Postdoctoral Fellow with the Information Systems Laboratory, Stanford University, Stanford, CA. In February 2002, he joined the Department of Electrical Engineering and Information Technology, Technische Universität München, Munich, Germany, as a Professor for media technology. His current research interests are in the area of audio-visual-haptic information processing, image and video compression, error-resilient video communication, and networked multimedia systems.