
ENCODING MULTIMEDIA PRESENTATIONS FOR USER PREFERENCES AND LIMITED ENVIRONMENTS

Tayeb Lemlouma and Nabil Layaïda
OPERA Project, INRIA Rhône-Alpes Research Unit
Tayeb.Lemlouma@inrialpes.fr, Nabil.Layaida@inrialpes.fr

ABSTRACT

This paper discusses a new approach to generating TV-like multimedia presentations that are adapted to the target user's preferences and to limited devices. Three main points are discussed: 1) the encoding of video presentations from a SMIL specification, 2) the adaptation of the video content based on the user preferences, and 3) the delivery of adapted multimedia presentations. The architecture used includes a content server, an adaptation proxy and a set of small devices in the form of personal digital assistants (PDAs). These devices request the content through a wireless network. In order to show how the system behaves with respect to user preferences and capabilities, two negotiation dimensions are considered: the user language and the memory capability of the device. The first dimension is used to generate content that can be understood by the target user, e.g. a video with subtitles written in the preferred language. The second dimension is chosen to solve the problem of system blocking that usually happens when limited devices access rich multimedia presentations over the network.

1. INTRODUCTION

Several multimedia presentation models exist; they consider not only the media content but also other dimensions of the presentation: logical, spatial, temporal and hyperlink. The declarative specification of multimedia presentations represents an important advance in the multimedia handling field. This approach facilitates the manipulation and processing of presentations, which classical approaches treated as atomic entities.

SMIL 2.0 became a World Wide Web Consortium recommendation in 2001. It is the dominant representation in Web technology for describing the timing and synchronization of multimedia presentations. Careful attention was given, in the design of SMIL, to the modularity and extensibility of the recommendation, and three language profiles have been proposed [2]. In the context of multimedia content adaptation for user preferences, SMIL offers a set of interesting mechanisms that provide better flexibility [7]. In this paper we exploit SMIL for the encoding of the video content. The SMIL content control module [1] is used to generate a video presentation adapted to the user preferences. The paper also addresses the problem of video content adaptation for limited devices. The capability dimension considered is the memory of the target device. This dimension is chosen in order to avoid the blocking of limited devices when they access rich multimedia presentations on the server.

2. FROM SMIL TO VIDEO

The generation of video from the SMIL specification includes the SMIL parsing and the video encoding.

Figure 1: Video generation from the SMIL specification (the SMIL specification and the input video are fed to the video encoder, which produces the output video)

The video encoding entity (Figure 1) includes the decoding of the original video to an uncompressed form (RGB 24-bit format), the application of our SMIL encoder and the generation of the output video. The video decoder used depends on how the input video was initially encoded. The decoder is configured to generate an uncompressed form of each video frame, which is simply the pixel representation of the video. Each pixel in the RGB form is represented by three bytes that correspond to the red, green and blue color values of the pixel. A line l of an original frame is represented by a set of lineStride byte values, where lineStride = pixelStride × videoWidth; here pixelStride equals three (the R, G and B components). Since a frame is a set of videoHeight lines, the size of an uncompressed frame equals lineStride × videoHeight bytes.
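For example, with the 352x288 input videos used in Section 4, each uncompressed frame occupies

\[
\mathit{lineStride} = 3 \times 352 = 1056~\text{bytes}, \qquad
\mathit{frameSize} = \mathit{lineStride} \times 288 = 304{,}128~\text{bytes} \approx 297~\text{Kbytes},
\]

which accounts for the large sizes of the uncompressed output videos reported in Table 2.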
Since we discuss only TV-like multimedia presentations obtained by manipulating the original video, the SMIL encoder does not create pixels that are not covered by the video frame box. Formally, this means that a pixel p(x, y) created by the SMIL encoder always satisfies the condition 0 ≤ x ≤ videoWidth and 0 ≤ y ≤ videoHeight. It is important to take this last assertion into account while authoring the SMIL specification.
same as the original answer of the server) to the user. The role of the UCM listener is to receive the profile, and the profile changes, of the users [4]. A profile contains a set of preferences and capabilities concerning the user [6], e.g. the screen size, the user agent, etc. This kind of information (usually called negotiation dimensions) is used to perform the content adaptation task properly.
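As an illustration of what such a profile can carry, the following is a schematic CC/PP-style fragment [6]; the namespaces, component structure and attribute names (screenSize, memoryCapacity, userAgent, systemLanguage) are illustrative only and do not reproduce the exact vocabulary of the UCM module:

  <?xml version="1.0"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:prf="http://example.org/profile-schema#">
    <rdf:Description rdf:about="http://example.org/profiles/ipaq3600">
      <prf:component>
        <rdf:Description rdf:about="http://example.org/profiles/ipaq3600#Hardware">
          <prf:screenSize>240x320</prf:screenSize>
          <!-- total memory capacity in Kbytes (illustrative value) -->
          <prf:memoryCapacity>32768</prf:memoryCapacity>
        </rdf:Description>
      </prf:component>
      <prf:component>
        <rdf:Description rdf:about="http://example.org/profiles/ipaq3600#Software">
          <prf:userAgent>PocketSMIL</prf:userAgent>
          <prf:systemLanguage>fr</prf:systemLanguage>
        </rdf:Description>
      </prf:component>
    </rdf:Description>
  </rdf:RDF>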
3.1. User preferences

The negotiation dimension related to the user preferences that we consider in this paper is the system language of the user. Using the negotiation dimension communicated by the UCM module of the user, the proxy evaluates the SMIL switch element and chooses, when possible, the first acceptable element. An element (a timing structure or a media object) with no test attribute is always acceptable.
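For example, let us consider the following part of a SMIL specification, which uses the systemLanguage test attribute; the subtitle sources, the region name and the language codes here are illustrative:

  <switch>
    <textstream src="subtitles-fr.rt" systemLanguage="fr" region="subtitles"/>
    <textstream src="subtitles-en.rt" systemLanguage="en" region="subtitles"/>
    <!-- no test attribute: always acceptable, used as the default -->
    <textstream src="subtitles-en.rt" region="subtitles"/>
  </switch>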
Here, if the user prefers to receive the multimedia presentation in French, the proxy adapts the previous part to the following:
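  <textstream src="subtitles-fr.rt" region="subtitles"/>

In this sketch the proxy has resolved the switch and kept only the first element acceptable for a French system language; whether the test attribute is stripped, as shown here, is an illustrative choice.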
The final SMIL presentation thus describes a scenario adapted to the target user. From the same SMIL presentation, the proxy can therefore generate different multimedia presentations, e.g. with different TV logos or subtitles in different languages, or by filtering some of the multimedia content based on the user type. After the negotiation step, the SMIL specification is adapted to the user preferences. The proxy can send the new adapted multimedia presentation in the form of SMIL or video; the choice of the media type to send depends on the user agent of the target device (a video or a SMIL player).

3.2. User capabilities

The capability dimension that we consider is the memory capability of the target device. This dimension is chosen in order to solve the problem of system blocking that usually happens when limited devices access rich multimedia presentations over the network. Indeed, the embedded systems and user interfaces of these devices do not provide good control of memory consumption: applications remain active even though they are no longer used, the UI does not make it easy to review the open applications and documents, etc. Consequently, multimedia applications are frequently subject to blocking when the available memory is insufficient. Moreover, the user application has no knowledge of the scenario and the required resources of the multimedia presentation residing on the server side. One possible solution is to use the proxy and make it responsible for transmitting only the multimedia data that can be played by the target device. To do so, we have enriched the UCM module with an additional functionality that reports (on demand and at regular intervals) the memory capacity and the current memory state of the device. This helps the proxy: 1) to classify the device according to its total memory capacity, and 2) to follow the behavior and the current state of the user's memory with respect to the server multimedia application. The two following figures show an example of a SMIL presentation and the corresponding memory state of a Pocket PC (iPAQ 3600) that accesses the presentation using the HTTP protocol (through the NAC proxy) over an 802.11 wireless network. The player used is PocketSMIL [5].

Figure 3: The server SMIL presentation

Figure 4: The corresponding memory state of the PDA, as sent by the UCM module (used memory in %, plotted over roughly 12 s)

As we can note, the memory usage changes when PocketSMIL starts to play the SMIL presentation. Naturally, the use of memory increases when the player starts to download and decode the media objects; in this example the maximal value reached represents 57% of the device memory. At present, the memory-based adaptation of the proxy is still a prototype implementation. The adaptation considered is a simple dropping of video frames when the percentage of used memory rises above a given value α. This value depends on the device and can be estimated after several experiments; it represents the limit of memory use that should not be exceeded in order to avoid device latency and blocking. For example, if α equals 50% in the previous situation (Figure 4), the proxy drops frames until the UCM reports a memory state lower than α. When this happens, the new frame bit rate is kept until a memory state is received that allows increasing the current bit rate. This approach works well and avoids both blocking the device and transmitting useless video frames.
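This behavior can be summarized as a simple rate-control rule. Writing m_t for the used-memory percentage last reported by the UCM and r_t for the frame rate currently transmitted by the proxy, and introducing a recovery threshold β < α and a rate increment δ (hypothetical parameters, not fixed by the prototype), one way to state the rule is

\[
r_{t+1} =
\begin{cases}
0 & \text{if } m_t > \alpha \quad \text{(drop frames)}\\
r_t & \text{if } \beta \le m_t \le \alpha \quad \text{(hold the reduced rate)}\\
\min(r_t + \delta,\ r_{\mathrm{src}}) & \text{if } m_t < \beta \quad \text{(recover toward the source rate } r_{\mathrm{src}}\text{)}
\end{cases}
\]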

However, it presents two main problems: the estimation of the α value and the impossibility of decreasing the video bit rate indefinitely. The latter problem shows that the adopted approach does not always guarantee the adaptation of the video; this generally happens when the device has a very small memory capacity or when the memory is overloaded independently of the server multimedia application.

4. EXPERIMENTAL RESULTS

In the following we give some experimental results concerning the generation of the video content from two SMIL specifications (the same scenario with different subtitle instances) using two different input videos. Regarding the output video, the given measurements concern only the uncompressed generated video; the video stored on the server side is obtained by applying a compression method, but video compression is outside the scope of this paper. The following table gives some characteristics of the two videos used in the SMIL representations.

Table 1. The input videos

                         Video 1    Video 2
  Input format           MPEG-1     Indeo Video 3
  Video dimensions       352x288    352x288
  Frame rate (fps)       25.0       25.0
  Video size (Kbytes)    8501       8770
  Video duration (s)     30.99      30

The general form of the SMIL specifications used can be represented by the following temporal scenario:

Figure 5: Temporal graph of the scenario (a video with overlaid media items Image 1 and Image 2)
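A SMIL fragment realizing this kind of scenario could look like the following sketch; the media sources, region names and timing values are illustrative, the layout section is omitted, and the actual specifications also contain the language switch of Section 3.1:

  <smil>
    <body>
      <par>
        <!-- the main TV-like video track -->
        <video src="video1.mpg" region="tv"/>
        <!-- two images overlaid on the video, one after the other -->
        <img src="image1.png" region="logo" begin="0s" dur="10s"/>
        <img src="image2.png" region="logo" begin="10s" dur="10s"/>
        <!-- subtitles selected according to the user language -->
        <switch>
          <textstream src="subtitles-fr.rt" systemLanguage="fr" region="subtitles"/>
          <textstream src="subtitles-en.rt" region="subtitles"/>
        </switch>
      </par>
    </body>
  </smil>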
As we have seen in Section 2, the SMIL encoder uses the temporal scenario to generate the aimed video. Here are some screen shots from the generated video that respects the SMIL scenario.

Figure 6: The encoded video (screen shots (a)-(d))

The proxy can generate adapted videos that respect the user preferences. Here is a screen shot from the video generated for a user that understands only English:

Figure 7: User preferences consideration

Table 2 shows the characteristics of the video content generated from the two SMIL specifications.

Table 2. The output videos

                                    Video 1      Video 2
  Output format                     uncompressed (RGB 24-bit)
  Video dimensions                  352x288      352x288
  Frame rate (fps)                  25.8         25.0
  Video size (Kbytes)               197918       227940
  Video duration (s)                30.99        30
  Frame number                      801          750
  Frame duration (s)                0.0387       0.04
  Encoding and storing time (ms)    35972        39577
  Average encoding time (s/frame)   0.045        0.053

5. CONCLUSIONS

This paper combines the declarative definition of multimedia presentations with video content encoding. An approach for encoding video from a SMIL specification is presented. The adaptability of SMIL and its declarative definition make it possible to adapt the multimedia presentation automatically (using a proxy-based architecture) to the user preferences. The generated presentation takes the form of SMIL content or an encoded video. The same architecture was used to experiment with video adaptation for limited devices, in order to avoid the frequent blocking of such devices when they access remote presentations.

6. REFERENCES

[1] Dick B. and Jeffrey A. SMIL 2.0 Content Control Modules. http://www.w3.org/TR/smil20/smil-content.html
[2] Layaïda N. and Van Ossenbruggen J. SMIL 2.0 Language Profile. http://www.w3.org/TR/smil20/smil20-profile.html
[3] Lemlouma T. and Layaïda N. Adapted Content Delivery for Different Contexts. SAINT 2003 Conference, January 27-31, 2003, Orlando, Florida, USA.
[4] Lemlouma T. and Layaïda N. Universal Profiling for Content Negotiation and Adaptation in Heterogeneous Environments. W3C Workshop on Delivery Context, W3C/INRIA Sophia-Antipolis, France, March 4-5, 2002.
[5] Pocket SMIL. http://opera.inrialpes.fr/pocketsmil/. INRIA Rhône-Alpes, 2002.
[6] W3C. Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies. http://www.w3.org/TR/CCPP-struct-vocab/, W3C Working Draft, 15 March 2001.
[7] W3C. SMIL 2.0 Basic Profile and Scalability Framework. http://www.w3.org/TR/smil20/smil-basic.html
