
D3.1 Deliverable
Description of the state-of-the-art

CANTATA

Project number: ITEA05010
Document version no.: 1.0
Status: Final
Edited by: Dominique Segers, Barco, Belgium
Thursday, 26 April 2007

ITEA Roadmap domains:
Major: Services & Software creation
Minor: Cyber Enterprise

ITEA Roadmap technology categories:
Major: Content
Minor: Data and content management


History:

Version  Date        Remarks
v0.10    8/11/2006   Initial document start by Dominique Segers, Barco
v0.11    21/11/2006  First compilation by Dominique Segers, Barco
v0.12    21/11/2006  Second compilation by Dominique Segers, Barco
v0.13    22/11/2006  Edit after input from CodaSystem
v0.14    15/12/2006  New structure
v0.15    19/12/2006  New structure and sections by Juana Sánchez, Telefónica
v0.16    20/12/2006  Edit after new input from Telefónica
v0.17    12/01/2007  Edit after new input from CodaSystem
v0.18    18/01/2007  Edit after input from Solid
v0.19    06/02/2007  Edit after review from Egbert, LogicaCMG
v1.0     26/04/2007  Final approval by the PMT

Contributors:
Dominique Segers, Barco
Ismael Fuentes, I&IMS
Juana Sánchez Pérez, Telefónica
John de Vet, iLab
Jorma Palo, Solid
Johannes Peltola, VTT
Raoul Djeutane, CodaSystem
Gorka Marcos Ortego, VicomTech
Nicolas Damien, Centre Henri Tudor

This document will be treated as strictly confidential. It will only be public to those who have signed the ITEA Declaration of Non-Disclosure.


TABLE OF CONTENTS

1 Introduction
1.1 The Aim of the activity
1.2 Potential Partners contributions
2 State-of-the-art User Interfaces of applications and services
2.1 State-of-the-art UI of applications on mobile phones
2.1.1 Introduction
2.1.2 Video applications on mobile phones
2.1.2.1 VideoImpression - Mobile Edition [53]
2.1.3 Photo applications on mobile phones
2.1.3.1 PhotoBase Deluxe - Mobile Edition [54]
2.1.4 Video Surveillance over IP
2.1.4.1 IRIS [55]
2.1.4.2 The 3rdi Security System [56]
2.1.4.3 D-Link DCS-2120 Wireless Internet Camera with 3G Mobile Video Support [57]
2.1.4.4 NIOO VISIO [58]
2.1.5 Video Surveillance over IP with content analysis on server
2.1.5.1 Visio Wave [59]
2.1.5.2 3rdeye - Video Surveillance on Your Mobile [60]
2.1.6 Interactive composition and scene mixing
2.2 UI of services for IP-enabled TV and Set-Top Boxes
2.2.1 On-line services for IP-enabled TV and Set-Top Boxes
2.2.2 Flash-based content adaptation in Set-Top Boxes
3 State-of-the-art Compression Algorithms
3.1 Motion JPEG-2000 and Wireless (Part 11) JPEG-2000
3.1.1 Introduction
3.1.2 Scope and Features of Motion JPEG-2000
3.1.3 Scope and Features of Wireless JPEG-2000
3.1.4 Video Coding with Motion Compensated Prediction
3.2 Codification technologies
3.2.1 Introduction
3.2.2 MPEG-1 and MPEG-2
3.2.3 MPEG4
3.2.3.1 MPEG-4 architecture
3.2.3.2 CODECS (MPEG-4 Visual and MPEG-4 Audio)
3.2.3.3 MPEG-4 Systems (BIFS)
3.2.3.4 MPEG-4 Part 20 (LASeR and SAF [44])
3.3 Additional formats for most power devices (future)
3.3.1 VC1 [21]
3.3.2 Device-oriented screens
3.4 Analysis of state-of-the-art image compression algorithms for medical applications
3.4.1 Still image compression such as JPEG, JPEG-LS and JPEG-2000
3.4.2 Intra-frame image compression such as MJPEG-2000
3.4.3 Inter-frame image compression such as MPEG-4 AVC
4 User Interface Adaptation
4.1 Introduction
4.2 MPEG-4 Advanced Content visualization technologies
4.2.1 Software BIFS reproducers
4.2.2 GPAC: Osmo4
4.2.3 IBM: M4Play
4.2.4 Envivio TV
4.2.5 Bitmanagement: BS Contact MPEG-4
4.2.6 Octaga Professional
4.2.7 Digimax: MAXPEG Player
4.2.8 COSMOS
4.3 UI adaptation based on XML
4.3.1 UI adaptation based on XML transformation
4.3.2 Adaptation via XML publishing servers
4.3.3 Adaptation based on the definition & identification of the device
4.3.3.1 Composite Capabilities / Preference Profiles
4.3.3.2 UAPROF (OMA)
4.3.3.3 Device Description Repository
4.3.4 XML based UI adaptation
4.3.4.1 UIML User Interface Meta Language
4.3.4.2 AUIML
4.3.4.3 XIML (eXtensible Interface Markup Language)
4.3.4.4 XUL
4.3.4.5 TERESA XML
4.3.4.6 USIXML
4.3.4.7 AAIML [43]
4.3.4.8 XForms and RIML
4.3.4.9 MPEG-21
4.4 Device ontology
4.5 Agent-based user interface adaptation
5 State-of-the-art system architecture
5.1 DLNA
5.2 mTag
5.3 Content retrieval and device management
6 References


1 Introduction

1.1 The Aim of the activity

The activity preparing the production of Deliverable 3.1 encompasses all the topics addressed in WP3:

• Topic 3.1 Device-oriented UI adaptation.
• Topic 3.2 User-oriented UI adaptation.
• Topic 3.3 Content-oriented UI adaptation.
• Topic 3.4 Presentation and interaction with users.

It should also map onto the different domains that are targeted within the CANTATA project:

• Multimedia consumer.
• Medical Imagery.
• Surveillance.

Deliverable D3.1 thus aims at establishing a state-of-the-art analysis of the WP technologies that is as complete as possible.

1.2 Potential Partners' contributions:

• Barco.
• I&IMS.
• Telefonica.
• iLab.
• Solid.
• VTT.
• CodaSystem.
• VicomTech.
• Centre Henri Tudor.

Remark from VTT:
Since VTT is still an unfunded partner and WP2 management work takes all spare resources, VTT cannot promise to participate in WP3 until it has received funding. VTT may receive funding early in 2007 if all goes well.


2 State-of-the-art User Interfaces of applications and services

2.1 State-of-the-art UI of applications on mobile phones

2.1.1 Introduction

This section describes the state of the art of UIs of video applications on mobile phones. It presents some existing video applications that run on mobile phones and describes their principal functionality.

2.1.2 Video applications on mobile phones

The applications below give an overview of what is currently done concerning video applications on mobile phones.

2.1.2.1 VideoImpression - Mobile Edition [53]

VideoImpression is a solution developed by ArcSoft. This application allows users to create and share custom mini-movies featuring their own videos, photos and slide shows, with custom animated titles, credit screens, soundtracks and scene transitions.


These are the principal functionalities:

• Capture video on your mobile device.
• Play back video you record, download, or receive from friends.
• Trim video clips.
• Combine multiple clips together.
• Add transition effects between clips.
• Add titles and credits.
• Share your movies via infrared, Bluetooth, email, or MMS.
• File format support: ASF, 3GP, MP4 for video; PCM, ADPCM, MP3, and AMR for audio.
• Video codec support: H.263, MPEG-4.


2.1.3 Photo applications on mobile phones

2.1.3.1 PhotoBase Deluxe - Mobile Edition [54]

PhotoBase is another application developed by ArcSoft. These are the key features of this application:

ArcSoft Panorama Maker
Designed specifically for low-profile devices, your customers can capture multiple photos and have them automatically stitched together.

Auto Red-eye Removal
Give your customers this quick fix that instantly and automatically removes pesky red-eye.

Still Image Capture
When using a camera phone, it is important to have an intuitive application that allows users to capture stunning pictures. The Still Image Capture component offers several quality enhancement options for your images on the device. Components include:

• White Balance (hardware solution).
• Brightness and Contrast.
• Digital Zoom.
• JPEG Encoding.

Edit and Enhancement
A variety of editing and enhancement functions are provided, such as red-eye removal, crop and rotate. Users can edit their photos before they store or share them.

Media Management and Sharing
With PhotoBase Deluxe, your customers can manage their photos when they are on the go. This application provides a complete solution, allowing your customers to sort, album, display, and label their images. Instantly create a slide show with cool transition effects and sound. Users can share their images through Bluetooth, MMS, infrared and email.

Fun Features
PhotoBase Deluxe provides a variety of fun features and content. The Panorama Maker feature provides instant photo stitching capabilities to your mobile device. Add clip art, fun frames, and text to any image. Download more content for special holidays and occasions.


2.1.4 Video Surveillance over IP

2.1.4.1 IRIS [55]

IRIS cameras are able to transmit live or recorded video to your mobile phone over a standard mobile phone network. When you want to look at what's going on, just use the IRIS viewing software on your mobile phone to connect to your camera via the IRIS Control Centre.

IRIS cameras can also detect, through their sensors, when an intruder has entered your home. When an alarm is triggered on your camera, the IRIS Control Centre sends you a text message alert. You can then look at a recording of the event that set the camera off or see what's happening now.

2.1.4.2 The 3rdi Security System [56]

3rdi cameras can detect when an intruder has entered your home using infrared and motion sensors. When an alarm is triggered on your camera, the 3rdi control centre sends you a text message alert. You can then look at a recording of the event that triggered the camera or see what's happening now. Even if your phone is switched off when the alert is sent to you, video of the event is stored at the 3rdi control centre for up to 30 days, so you can look at it when it's most convenient for you.

You can also see what's happening at the camera location by simply accessing it via your mobile phone.

2.1.4.3 D-Link DCS-2120 Wireless Internet Camera with 3G Mobile Video Support [57]

The DCS-2120 is a wireless Internet security camera developed by D-Link which allows a place to be watched over and observed remotely. It can connect to your network through a Fast Ethernet port. This camera can also send alert messages (e-mails) if it detects suspicious movement.

Here are the specifications of this camera.

3G mobile video from your phone and more
The DCS-2120 offers both consumers and small businesses a flexible and convenient way to remotely monitor a home or office in real time from anywhere within a mobile phone's 3G service area. When used in conjunction with the email alert system, mobile users can now view a camera feed without a notebook PC and wireless hotspot. This live video feed can then be accessed through 3G cellular networks by compatible cell phones*.

In addition to cellular phone monitoring, the 3GPP/ISMA video format also enables streaming playback on a computer. The camera is also viewable from any Internet Streaming Media Alliance (ISMA) compatible device and offers support for RealPlayer® 10.5 and QuickTime® 6.5 viewing. The DCS-2120 supports resolutions up to 640x480 at up to 30 fps, depending on the selected compression rate.


Convenient management options
D-Link's IP surveillance camera management software is included to enhance the functionality of the DCS-2120. Manage and monitor up to sixteen compatible cameras simultaneously with this program. IP surveillance can be used to archive video straight to a hard drive or network-attached storage devices, play back video, and set up motion detection to trigger video/audio recording or send e-mail alerts. Alternatively, it is possible to access and control the DCS-2120 via the web using Internet Explorer. As you watch remote video obtained by the DCS-2120, it is possible to take snapshots directly from the web browser to a local hard drive, making it ideal for capturing any moment no matter where you are.

[Figure: diagram of this system]


2.1.4.4 NIOO VISIO [58]

NIOO VISIO is a solution developed by Neion Graphics which enables remote visualization without any constraint. This application allows the user to connect to one of many cameras, zoom, and remotely capture photographs, from a PDA or smartphone. It allows you, for example, to see what is happening at your home when you are not present.

Conclusion:
From the example applications above, we conclude that the video applications which exist today allow videos and pictures to be created, generated and managed. However, these applications cannot manage the media or take any action that depends directly on the content of the media.

2.1.5 Video Surveillance over IP with content analysis on server

Other solutions exist that are based on video content analysis and generate an action depending on what happens in front of a camera. One example is a solution developed by Visio Wave.


2.1.5.1 Visio Wave [59]

Visio Wave developed a solution for video content analysis based on the scheme below.

Many cameras are connected to a server that analyses the video. When there is a problem, an alert is generated and sent to a PDA, PC or some other device, and the end user who has the device can connect directly to the remote camera and see what is happening at that moment.


2.1.5.2 3rdeye - Video Surveillance on Your Mobile [60]

3rdeye is a video surveillance system for the mobile phone developed by the Romanian company Cratima. With the help of a mobile phone and of the 3rdeye system, you can view live images from any location watched by a video camera. This location can be your own home, office, vacation home, store or even a parking space. The quality of the images is high, thanks to the GPRS transmission mode.

3rdeye's Architecture
3rdeye consists of two applications:

• The video server (to which the monitoring video cameras are connected).
• The client application, which runs on the user's mobile phone.

The server application is itself divided into two components: the Video Grabbing Server, which receives the images straight from the video cameras and sends them to the Video Streaming Server, and the Video Streaming Server, which is responsible for properly sending the received images on to the client application on the mobile phone.

A two-way exchange of information takes place between the two basic software modules of the Video Surveillance Server. The Video Grabbing Server grabs video images from the video cameras and sends them, in digital format, to the Video Streaming Server (in order to prepare the video streams for the clients), while the Video Streaming Server sends back to the Video Grabbing Server the commands and control information received from the client application.

The client application can be configured to connect to Video Surveillance Servers that have either a fixed IP address or a dynamically allocated one (e.g. dial-up). When the server has a fixed IP address, the client application connects straight to the Video Surveillance Server.

When the server's IP address is dynamically allocated (a different IP address from one connection to another), the client application first interrogates the Fixed IP Address Server, which is permanently connected to the Internet, in order to obtain the IP address of the Video Surveillance Server to which it is about to connect.
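Purely as an illustration of this rendezvous step, the sketch below shows how a client could resolve a dynamically allocated server address through a fixed-address lookup server; all host names, ports, identifiers and the one-line lookup protocol are hypothetical, not Cratima's actual interface.

```python
# Hypothetical sketch of the rendezvous step: ask a fixed-address server for the
# current IP of the Video Surveillance Server, then connect to it directly.
import socket

FIXED_ADDRESS_SERVER = ("rendezvous.example.net", 5000)   # hypothetical host/port
CAMERA_SITE_ID = "home-01"                                # hypothetical site identifier

def resolve_surveillance_server():
    """Query the Fixed IP Address Server for the surveillance server's address."""
    with socket.create_connection(FIXED_ADDRESS_SERVER, timeout=5) as s:
        s.sendall(f"LOOKUP {CAMERA_SITE_ID}\n".encode())
        host, port = s.recv(256).decode().strip().split(":")  # e.g. "203.0.113.7:6000"
        return host, int(port)

def connect_to_surveillance_server():
    """From here on the client talks to the Video Surveillance Server exactly as it
    would if the server had a fixed IP address."""
    host, port = resolve_surveillance_server()
    return socket.create_connection((host, port), timeout=5)
```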

3rdeye's Functionality
3rdeye allows you to watch in real time, on your Java-enabled mobile phone (not necessarily 3G), the images provided by the video cameras connected to the video server, and to control the position of the video cameras (pan/tilt/zoom).


The connection to the video server is made through the Internet, using a GPRS connection (not necessarily, as already stated, a 3G connection, nor a smart phone). The received image can be presented in full-screen or normal view and has multiple display modes: full frame, 1:2, 1:1 (in the latter case, the application is designed to have a scroll and an auto-detection feature).

The moment a client application connects to the server, it immediately sends the server information about the maximum size of the phone's display, so that the server automatically adjusts the video images (width x height). Using an advanced technology developed by Cratima Software, based on a proprietary motion detection and tracking algorithm, the Video Grabbing Server records all the events that occur, along with the corresponding motion images.

All the recorded events can be viewed from the client's 3rdeye mobile phone application.

3rdeye's Applicability
3rdeye has multiple usages and, being developed from end to end by Cratima, can be customized for every client's needs:

• Managing employee conduct and duties from remote locations.
• Off-site monitoring of homes, cottages, shops, offices, factories, warehouses, cars and boats.
• Child care monitoring at home, nurseries, kindergartens and schools, or observing the well-being of senior citizens and disabled people.
• Pet/weather watch; snow or traffic conditions; construction site video surveillance.

Conclusion:
From our study, we can conclude that for the moment there is no solution for video content analysis on the mobile phone itself. The solutions which exist now allow media to be modified and managed. There are solutions which include mobile phones in their platform, but the content analysis is done elsewhere.

2.1.6 Interactive composition and scene mixing

The scene is the composition of the audiovisual elements that are shown to the user. Initially it is generated in the corresponding server. The user, interacting with his device, acts on the scene elements, updating the scene according to his preferences. The scene composition, and therefore the way of interacting with it, can be done in different ways.

Composition or mixing in the server
The scene and all the components that compose it are mixed into one single stream that is sent to the client. When the user interacts with the application to modify the scene, the server receives the corresponding orders to compose the scene again and sends it as a single stream for each user.

The bandwidth is proportional to the number of users, because every user is served with a video stream specially encoded for him.


A powerful video server is needed, able to decode the video elements, compose them and encode the result in real time. In these terms, it must simultaneously encode as many streams as there are users.

Current technologies:

Video editing tools. There are some tools able to carry out this complete process. However, these tools are designed for video post-production. Some of them allow video to be generated in real time for live broadcasts, but all of them have a graphical operator interface and lack a programmatic interface (API), so they are not suitable for providing interactivity with the user.

Decoding and mixing using a frame server. The frame mixing can be done with a frame server. Frame servers are oriented towards video post-production, but there are some developments that allow a certain degree of personalization, although the interactivity is limited.

AviSynth is a frame server composed of APIs that can be used both by the player and by the video server. In this case, it has to be installed on the video server for VoD or on the multicast transmitter for TV channels.

Composition or mixing in the client
The videos that compose the scene are sent as independent streams to the user, and the client device mixes the video streams. When the user interacts with the application/player to modify the scene, the server just receives the stream control requests for the user's streams.

The bandwidth is proportional to the number of users, multiplied by the number of streams that every user is viewing.

This solution is suitable for multicast environments, because a stream is not encoded separately for every user.

A video server is needed with the capacity to run as many video processes as there are users, multiplied by the number of videos that every user can play simultaneously.

Current technologies:

• Decoding and mixing using a frame server.
AviSynth is a frame server that does not need a graphical interface. It can be used with a player by means of a script on the client.

• VRML (Virtual Reality Modeling Language) or X3D.
VRML made it possible to visualize 3D scenes (with content) on the web. However, remote access to big and complex scenes where the bandwidth is limited is a weakness, since much data must be transferred before the user can interact with the scene elements and manage them.
BIFS, being a compressed binary format that is encapsulated and streamed, reduces this weakness, so the user can interact with the scene elements that are available, improving the user experience.

• MPEG-4: BIFS and LASeR.
BIFS is the MPEG-4 scene description protocol used to compose MPEG-4 objects, describe the interaction between them and animate them. BIFS is a binary format for 2D or 3D content.
LASeR is the protocol proposed in the MPEG-4 standard to provide capabilities similar to BIFS for devices with fewer resources, such as PDAs and mobile phones.


Composition or mixing in the client and server
This is a mixed approach, trying to take the best of each alternative. It consists of encoding several independent elements of the scene as one. This must be done with the elements that do not require separate interaction with each of them. It reduces the number of streams per user (and therefore the bandwidth) without reducing the interaction possibilities.

This encoding should be done just once and the result stored for later use. In this way, it is guaranteed that the server will not need a high processing capacity and that the interactivity will not be penalized by delays.
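The bandwidth trade-off between the three mixing approaches can be made concrete with a small back-of-the-envelope calculation. The sketch below is illustrative only: the per-stream bitrates, the number of elements per scene and the number of elements that can be pre-grouped are hypothetical values, not figures measured in the project.

```python
# Rough bandwidth comparison of the three mixing approaches (illustrative values only).

def server_mixing(users, mixed_kbps=500):
    # One personalised, fully mixed stream is encoded and sent per user.
    return users * mixed_kbps

def client_mixing(users, elements=4, elem_kbps=200):
    # Every scene element is sent as an independent stream to every user.
    return users * elements * elem_kbps

def hybrid_mixing(users, elements=4, elem_kbps=200, grouped=2):
    # Elements that need no separate interaction are pre-encoded into one stream,
    # reducing the per-user stream count without losing interactivity.
    return users * (elements - grouped + 1) * elem_kbps

for users in (10, 100, 1000):
    print(f"{users:5d} users: "
          f"server {server_mixing(users)} kbit/s, "
          f"client {client_mixing(users)} kbit/s, "
          f"hybrid {hybrid_mixing(users)} kbit/s")
```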


2.2 UI of services for IP-enabled TV and Set-Top Boxes

2.2.1 On-line services for IP-enabled TV and Set-Top Boxes

Services which can be directly rendered on IP-enabled TV screens are based on HTML/XML technology. Service providers are able to define the layout and style of the user interface of their services (infotainment, travel, shopping, etc.).

• CE-HTML is a remote UI protocol for such services, with a core based on XHTML; it is developed by CEA and has been adopted by DLNA [ref CEA-2014]. Version 1.0 has been available since June 2006. It allows existing Internet content to be easily re-purposed for a variety of CE devices (see also device-based UI adaptation). Content-based adaptations to the user interface can be communicated via this protocol.

• T-Navi is an IP-based information service (conforming to HTML 4.0) developed by Matsushita. The T-Navi services have only been available in Japan, since 2006. T-Navi-enabled sets are available from Panasonic (Viera series) and Toshiba.

• acTVila, a successor to T-Navi, will be launched in Japan in February 2007 as an IP- and HTML-based television service combining text-based information with video, with plans to provide a streaming-based video-on-demand service by the end of 2007. acTVila service providers will have the freedom to create their own UI style by changing colors and website layout; for example, video-on-demand services can have a different layout from information-based services. Brand-specific logos or designs can also be applied to the UI.

2.2.2 Flash-based content adaptation in Set-Top Boxes

NDS and Bluestreak together bring middleware for set-top boxes to the market using UPnP multimedia streaming and Macromedia Flash as the user interface engine. Flash allows the set-top box makers to add dynamic elements to the user interface (e.g. animations) adapted to the different media categories (types of content) being watched. User-oriented UI adaptation can also be supported: customization based on user preferences, for example by choosing from a list of predefined skins.


3 State-of-the-art Compression Algorithms

3.1 Motion JPEG-2000 and Wireless (Part 11) JPEG-2000

3.1.1 Introduction

The JPEG-2000 standardization effort [1] demonstrated that state-of-the-art coding performance can be obtained in still-image compression with a coding architecture that enables a rich set of features for the compressed bitstream. In particular, unlike the previous JPEG standard, JPEG-2000 provided a precise rate-control mechanism based on embedded coding of wavelet coefficients. Moreover, multiple qualities and multiple resolutions of the same picture are possible within JPEG-2000 based on selective decoding of portions of the compressed bitstream. Additionally, it should be emphasized that, for image and video transmission over error-prone channels, the embedded nature of JPEG-2000 allows for a layered content protection against channel errors [2].

In the area of motion-compensated video compression, similar functionalities have long been pursued, mainly via the use of extensions of the basic MPEG coding structure [3]. In terms of related systems with immediate industrial applicability, i.e. scalable video coding standards, this resulted in the fine-granularity scalable video coding extension of MPEG-4 video (MPEG-4 FGS) [4]. However, MPEG-4 FGS left much to be desired. In particular, the compression efficiency of FGS was not as good as that of the equivalent non-scalable (baseline) coder. In addition, the use of the conventional closed-loop video coding structure of MPEG-like coders hindered the scalability functionalities.

As a result, recent research efforts on scalable video coding were targeted at the extension of open-loop coding systems, such as JPEG-2000, to video coding. Although an extension of the basic technology of JPEG-2000 to three dimensions is a feasible task by extending its transform and coding modules to three dimensions [5], this does not guarantee the highest possible coding efficiency since motion-compensation tools are not included. Moreover, the end-to-end delay of such a coding system is substantially increased in comparison to the corresponding frame-by-frame compression. Although the delay problem manifests itself in motion-compensated video coding as well, in this case the compression efficiency is significantly increased by the use of motion-compensated prediction. This may override the high-delay detriment in applications for which achieving a low end-to-end delay is not a critical issue.

In this section, we present an overview of the fundamental tools behind scalable image and video coding that are suitable for transmission environments with losses. Our presentation is divided into two parts: sections 3.1.2 and 3.1.3 are dedicated to the description of the features of Motion JPEG-2000 and the upcoming Wireless JPEG-2000 standard, as they represent the state-of-the-art in intra-frame video coding for ideal and lossy-transmission frameworks, respectively. Inter-frame video coding architectures involving motion-compensated prediction are treated in section 3.1.4.


3.1.2 Scope and Features of Motion JPEG-2000

Motion JPEG-2000 (or MJPEG-2000) is an extension of the baseline (Part 1) JPEG-2000 standard that supports video data. Intra-frame coding is supported based on the Embedded Block Coding with Optimized Truncation (EBCOT) algorithm of JPEG-2000 (i.e. without motion-compensated prediction). Lossy and lossless compression is provided with one codec and, for every video frame, similarly to JPEG-2000, scalability in resolution and quality is available from a single compressed bitstream. The input sample depth can be up to 32 bits per color component, while the maximum frame width and height is up to 2^32 - 1 pixels. The output bitrate for each frame can be controlled based on a constant-bitrate (CBR) scheme. Alternatively, variable-bitrate (VBR) schemes can be used, which provide uniform quality across time with high efficiency. For the integration of the various bitstreams into one stream, an MPEG-4 based file format is used, which appropriately tags the various bitstreams to ensure correct synchronization of audio and video. This format provides the capability for metadata embedding, and moreover multi-component, multi-sampling formats are supported, e.g. YUV 4:2:2, RGB 4:4:4, etc.

In general, although intra-frame algorithms do not provide the highest coding efficiency for video data, MJPEG-2000 intra-frame coding provides important functionality requirements that are difficult to satisfy with inter-frame video coding based on motion-compensated prediction. For example, intra-frame coding greatly facilitates video editing, individual frame access, fast browsing with enhanced forward/backward capabilities, etc. In addition, in terms of complexity requirements and overall delay, intra-frame algorithms are always preferred over inter-frame algorithms since they have lower memory requirements (typically up to only one input frame), no motion estimation/motion compensation is performed at the encoder or decoder, and the maximum delay corresponds to the delay incurred by the end-to-end processing of one input frame.
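To make the intra-frame, per-frame rate-control idea tangible, here is a minimal sketch that encodes each frame of a short synthetic sequence as an independent JPEG-2000 code-stream at a fixed compression ratio (a crude stand-in for CBR allocation). It assumes Pillow built with OpenJPEG support; the frame size, the synthetic frame content and the 40:1 target ratio are arbitrary values chosen for illustration, and no MJ2 file or audio synchronization is produced.

```python
# Intra-frame, JPEG-2000-style coding of a short synthetic sequence (sketch only).
# Requires Pillow with OpenJPEG support; this is NOT an MJ2 file writer.
from PIL import Image
import numpy as np

FRAMES, W, H = 8, 320, 240          # hypothetical sequence parameters
TARGET_RATIO = [40]                 # ~40:1 compression for every frame (CBR-like)

for t in range(FRAMES):
    # Synthetic frame: a shifting gradient stands in for camera input.
    x = np.linspace(0, 255, W, dtype=np.uint8)
    frame = np.tile(np.roll(x, 5 * t), (H, 1))
    img = Image.fromarray(frame, mode="L")

    # Each frame is coded independently: no motion estimation, one-frame memory,
    # and any frame can be decoded on its own (easy editing and random access).
    img.save(f"frame_{t:04d}.jp2",
             quality_mode="rates",         # rate-driven allocation per frame
             quality_layers=TARGET_RATIO,  # single quality layer at ~40:1
             irreversible=True)            # lossy 9/7 wavelet path
```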

3.1.3 Scope and Features of Wireless JPEG-2000

Wireless JPEG-2000 (a.k.a. JPWL) [6] is an upcoming extension of the JPEG-2000 standard. JPWL defines a set of tools and methods to achieve the efficient transmission of JPEG-2000 bitstreams over an error-prone wireless network. Wireless networks are characterized by the frequent occurrence of transmission errors along with a low bandwidth, thus putting strong constraints on the transmission of digital images. Since JPEG-2000 provides high compression efficiency, it is a good candidate for wireless multimedia applications. Moreover, due to its high scalability, JPEG-2000 enables a wide range of quality-of-service (QoS) strategies for network operators. However, to be suitable for wireless multimedia applications, JPEG-2000 has to be robust to transmission errors.

The baseline JPEG-2000 standard defines error resilience tools to improve performance over noisy channels. However, these tools only detect where errors occur, conceal the erroneous data, and resynchronize the decoder. More specifically, they do not correct transmission errors. Furthermore, these tools do not apply to the image headers, which are the most important parts of the codestream. For these reasons, they are not sufficient in the context of wireless transmissions.

[Figure: JPWL system description]

For the purpose of efficient transmission over wireless networks, JPWL defines other mechanisms for error protection and correction. These mechanisms extend the elements of the core coding system described in baseline (Part 1) JPEG-2000. These extensions are backward compatible in the sense that decoders which implement Part 1 are able to decode the part of the data that conforms to Part 1 while skipping the extensions defined by JPWL.

The JPWL system is illustrated in the figure above [6]. Basically, JPWL provides a generic file format for robust transmission of JPEG-2000 bitstreams over error-prone networks without being linked to a specific network, error-resilient coder or transport protocol. Additionally, JPWL provides a generic format for the description of the degree of sensitivity to transmission errors of the different parts of the bitstream, and a generic format for the description of the locations of residual errors in the codestream.

Thus, basically, the JPWL standard signals the use of informative tools in order to protect the codestream against transmission errors. These tools include techniques such as error-resilient entropy coding, FEC codes, UEP and data partitioning/interleaving. It is important to point out that these informative tools are not defined in the standard. Instead, they are registered with the JPWL registration authority. Upon registration, each tool is assigned an ID, which uniquely identifies it. When encountering a JPWL codestream, the decoder can identify the tool(s) which have been used to protect this codestream by parsing the standardized JPWL markers and by querying the registration authority. The decoder can then take the appropriate steps to decode the codestream, e.g. acquire or download the appropriate error-resilience tool.
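To make the backward-compatibility mechanism concrete, the following sketch walks the marker segments of a JPEG-2000 main header and simply records the segments it does not recognize instead of rejecting them, which is essentially how a Part 1 decoder can skip JPWL extension segments. It relies only on the generic JPEG-2000 marker-segment layout (a two-byte 0xFFxx marker followed by a two-byte length field for parameterized segments); the list of known markers is deliberately abbreviated and the specific JPWL marker codes are not spelled out here.

```python
# Walk the main-header marker segments of a JPEG-2000 codestream (sketch only).
# Unknown marker segments (e.g. JPWL extensions) are skipped, not treated as errors.
import struct

KNOWN_PART1 = {0xFF51: "SIZ", 0xFF52: "COD", 0xFF5C: "QCD", 0xFF64: "COM"}  # abbreviated
SOC, SOT = 0xFF4F, 0xFF90   # start of codestream / start of tile-part

def skipped_extensions(buf: bytes):
    """Return the marker codes of main-header segments a Part 1 decoder would skip."""
    assert struct.unpack(">H", buf[0:2])[0] == SOC, "missing SOC marker"
    pos, unknown = 2, []
    while pos + 4 <= len(buf):
        marker, length = struct.unpack(">HH", buf[pos:pos + 4])
        if marker == SOT:                 # end of the main header
            break
        if marker not in KNOWN_PART1:
            unknown.append(hex(marker))   # e.g. a JPWL extension segment
        pos += 2 + length                 # length covers itself but not the marker
    return unknown
```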


3.1.4 Video Coding with Motion Compensated Prediction

In this section, we review the conventional closed-loop video coding structure as well as the recently introduced open-loop video coding schemes that perform a temporal decomposition using motion-compensated temporal filtering. Both have been used in the related literature [3] [7] to provide working video coding systems with scalability properties.

All the currently standardized video coding schemes are based on a structure in which the two-dimensional spatial transform and quantization are applied to the error frame coming from closed-loop temporal prediction. A simple structure describing such architectures is shown in the "Hybrid video compression scheme" figure (a) (see further on). The operation of temporal prediction P typically involves block-based motion-compensated prediction (MCP). The decoder receives the motion vector information and the compressed error frame C_t and performs the identical loop using this information in order to replicate MCP within the P operator. Hence, in the decoding process (seen in the dashed area of the "Hybrid video compression scheme" figure (a)), the reconstructed frame at time instant t can be written as:

\[
\mathring{A}_t = P\,\mathring{A}_{t-1} + T_S^{-1} Q_S^{-1} C_t, \qquad \mathring{A}_0 = T_S^{-1} Q_S^{-1} C_0. \tag{0.1}
\]

The recursive operation given by (0.1) creates the well-known drift effect between the encoder and decoder if different information is used between the two sides, i.e. if C_t ≠ Q_S T_S H_t at any time instant t in the decoder. This is not uncommon in practical systems, since transmission errors or loss of compressed data due to limited channel capacity can be a dominant scenario in wireless or IP-based networks, where a number of clients compete for the available network resources. In general, the capability to seamlessly adapt the compression bitrate without transcoding, i.e. SNR scalability, is a very useful feature for such network environments. Solutions for SNR scalability based on the coding structure of the "Hybrid video compression scheme" figure basically try to remove the prediction drift by artificially reducing, at the encoder side, the bitrate of the compressed information C_t to a base layer for which the network can guarantee the correct transmission [3]. An example of such a codec is MPEG-4 FGS [4].

This however reduces the prediction efficiency [3], thereby leading to degraded coding efficiency for SNR scalability. To overcome this drawback, techniques that include a certain amount of enhancement-layer information in the prediction loop have been proposed. For example, leaky prediction [8] gracefully decays the enhancement information introduced in the prediction loop in order to limit the error propagation and accumulation. Scalable coding schemes employing this technique achieve notable coding gains over the standard MPEG-4 FGS [4] and a good trade-off between low drift errors and high coding efficiency [8] [9]. Progressive Fine Granularity Scalable (PFGS) coding [10] also yields significant improvements over MPEG-4 FGS by introducing two prediction loops with different quality references. A generic PFGS coding framework employing multiple prediction loops with different quality references and careful drift control leads to considerable coding gains over MPEG-4 FGS, as reported in [11] [12].
MPEG-4 FGS, as reported in [11] [12].


To address the issues of efficient video transmission, several proposals suggested an open-loop system, depicted in the "Motion-compensated temporal filtering" figure (b) (see further on), which incorporates recursive temporal filtering. This can be perceived as a temporal wavelet transform with motion compensation [13], i.e. motion-compensated temporal filtering (MCTF). This scheme begins with a separation of the input into even and odd temporal frames (temporal split). Then the temporal predictor performs MCP to match the information of frame A_2t+1 with the information present in frame A_2t. Subsequently, the MCU operator U inverts the information of the prediction error back to frame A_2t, thereby producing, for each pair of input frames, an error frame H_t and an updated frame L_t. The MCU operator either performs motion compensation using the inverse vector set produced by the predictor [14], or generates a new vector set by backward motion estimation [15]. The process iterates on the L_t frames, which are now at half the temporal sampling rate (following the multilevel operation of conventional lifting), thereby forming a hierarchy of temporal levels for the input video. The decoder performs the mirror operation: the scheme in the "Motion-compensated temporal filtering" figure (b) operates from right to left, the signs of the P and U operators are inverted, and a temporal merge occurs at the end to join the reconstructed frames. As a result, having performed the reconstruction of the L_t, denoted by L°_t, at the decoder we have:

A°_2t = L°_t - U(T_S^-1 Q_S^-1 C_t),        A°_2t+1 = P(A°_2t) + T_S^-1 Q_S^-1 C_t        (0.2)

where A°_2t and A°_2t+1 denote the reconstructed frames at time instants 2t and 2t+1. As seen from (0.2), even if C_t ≠ Q_S T_S H_t at the decoder, the error affects the reconstructed frames A°_2t and A°_2t+1 only locally and does not propagate linearly in time over the reconstructed video. Error propagation may occur only across the temporal levels, through the reconstructed L°_t frames. However, after the generation of the temporal decomposition, embedded coding may be applied in each group of frames (GOP) by prioritizing the information of the higher temporal levels based on a dyadic-scaling framework, i.e. following the same principle of prioritization of information used in wavelet-based SNR-scalable image coding [6]. Hence, the effect of error propagation in the temporal pyramid is limited and seamless video-quality adaptation can be obtained in SNR scalability [7][16]. In fact, experimental results obtained with SNR-scalable MCTF video coders, as well as results obtained with other state-of-the-art algorithms [17][18], suggest that this coding architecture can be comparable in the rate-distortion sense to an equivalent non-scalable coder that uses the closed-loop structure. However, one significant disadvantage of this type of technique for real-time communications concerns the end-to-end codec delay. In particular, following the analysis of [19], it can be shown that for a GOP of N frames (where N is typically 16 or 32 for a frame rate of 30 or 60 frames per second, respectively), the required end-to-end delay in terms of number of decoded frames can be as high as N/2 + 1 frames.
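For illustration only, the following sketch implements one temporal level of the lifting structure described above, with the motion-compensated prediction and update operators P and U replaced by the identity (i.e. a Haar-like temporal transform); real MCTF coders use motion-compensated versions of these operators. The function names are assumptions made here, not taken from the cited papers.

    def mctf_analysis(frames):
        """One temporal level of (motion-free) lifting over frame pairs.

        P and U of the text are reduced to the identity here, which turns the
        transform into a Haar-like temporal wavelet. Returns the low-pass
        frames L_t and the high-pass (prediction error) frames H_t.
        """
        L, H = [], []
        for t in range(0, len(frames) - 1, 2):
            A_even, A_odd = frames[t], frames[t + 1]
            H_t = A_odd - A_even            # H_t = A_{2t+1} - P(A_{2t})
            L_t = A_even + 0.5 * H_t        # L_t = A_{2t}  + U(H_t)
            L.append(L_t)
            H.append(H_t)
        return L, H

    def mctf_synthesis(L, H):
        """Mirror operation at the decoder, cf. equation (0.2)."""
        frames = []
        for L_t, H_t in zip(L, H):
            A_even = L_t - 0.5 * H_t        # A_{2t}   = L_t - U(H_t)
            A_odd = A_even + H_t            # A_{2t+1} = P(A_{2t}) + H_t
            frames += [A_even, A_odd]
        return frames

    # Perfect reconstruction on toy scalar "frames":
    assert mctf_synthesis(*mctf_analysis([1.0, 2.0, 3.0, 5.0])) == [1.0, 2.0, 3.0, 5.0]

Iterating mctf_analysis on the returned L frames would build the hierarchy of temporal levels mentioned in the text.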


[Figure: (a) The hybrid video compression scheme. (b) Motion-compensated temporal filtering.]

Notations:
A_t is the input video frame at time instant t = 0, ..., 2t, 2t+1
A°_t is the reconstructed frame
H_t is the error frame, whereas L_t is the updated frame
C_t denotes the transformed and quantized error frame obtained by using the spatial operators T_S and Q_S, respectively
P denotes the temporal prediction
U denotes the temporal update.

Our description of motion-compensated state-of-the-art video coders is concluded with the presentation of two indicative coding systems that represent the current state of the art in the closed-loop and open-loop temporal prediction structures, namely the Advanced Video Coder (AVC), also called the H.264 coder, which was jointly standardized by MPEG and ITU-T [20], and the motion-compensated embedded zero-block coder (MC-EZBC) of [17]. While the AVC is a non-scalable coding scheme, optimized for a certain set of quantization parameters, the MC-EZBC has the capability of simultaneous scalability in bitrate, resolution and SNR.



3.2 Codification technologies
3.2.1 Introduction
Video coding is a necessary element for compressing video, making the most of the available network and storage capacity.
MPEG (Moving Picture Experts Group) is an ISO/IEC working group in charge of developing standards for audio and video coding. The first standard, MPEG-1, was the basis for formats such as Video CD and MP3. Later, the MPEG-2 standard became the basis of products like DVD and digital TV set-top boxes. The most recently defined standard is MPEG-4, a multimedia standard for wired and wireless networks, defined for representing real and synthetic audio-visual objects. In addition, MPEG-7 was created to describe and locate audio-visual content, and MPEG-21 defines the multimedia framework.

3.2.2 MPEG-1 and MPEG-2
MPEG-1
Used for video streaming at a bit rate of approximately 1.5 Mbit/s, and oriented towards digital storage, especially on CD-ROM.
MPEG-2
This is a more advanced video compression technique offering better bit rates and better compression. It allows the coding of progressive and interlaced video sequences up to HDTV level.
The most important audio codec defined in the MPEG-2 standard (Part 7) is AAC (Advanced Audio Coding). AAC defines a format for multi-channel audio coding and achieves similar quality to other codecs at a lower bit rate.
MPEG-2 Video is a video compression standard with bit rates between 4 and 10 Mbit/s. It defines 5 profiles, referring to the complexity of the compression algorithm, and 4 levels, referring to the resolution of the original video. Main Profile at Main Level (MP@ML) is the most used combination.
MPEG-2 Systems defines two multiplexing schemes: the "Program Stream", compatible with MPEG-1, and the "Transport Stream", which allows multiple streams with independent origins to be sent together.
MPEG-2 is the most successful standard for multimedia representation on the market; digital entertainment relies mainly on MPEG-2. The main conceptual innovation in MPEG-2 is scalable video coding.


MPEG-4 defines new functionality and new capabilities and is, probably, the future standard for multimedia applications.
MPEG-4 adds an important conceptual advance in the representation of multimedia content: the object-based representation model. This model considers that audio-visual content describes a world composed of elements called objects. The audio-visual scene is the composition of independent objects, each with its own coding, characteristics and behaviour. Since the elements are encoded individually, they can also be accessed individually. This architecture provides a complete range of interactive possibilities.
MPEG-4 retains the characteristics of MPEG-1 and MPEG-2 with better video coding, and adds new features such as advanced 3D graphics support (textures, animations, etc.) for 3D scenes, object-oriented files (audio, video, 3D objects, streaming text), and support for DRM (Digital Rights Management).

3.2.3 MPEG-4
3.2.3.1 MPEG-4 architecture
The MPEG-4 architecture is composed of the following parts:
• MPEG-4 Systems: Specifies the global architecture of the standard and defines how MPEG-4 Visual and MPEG-4 Audio are integrated. MPEG-4 Systems introduces the concept of BIFS (BInary Format for Scenes). BIFS defines the interaction between objects.
• DMIF (Delivery Multimedia Integration Framework): This part defines the streaming of advanced content, or "Rich Media".
• MPEG-4 Visual: This part defines the representation of natural and synthetic video content.
• MPEG-4 Audio: This part defines the representation of natural and synthetic audio content.

3.2.3.2 CODECS (MPEG-4 Visual and MPEG-4 Audio)
A codec (COder-DECoder) is the algorithm that defines how to encode and decode video and audio content in order to reduce its size or the bandwidth necessary for its transmission, with as little loss of quality as possible.
The audio codec, MPEG-4 AAC (Advanced Audio Coding), is an extension of MPEG-2 AAC (MPEG-2 Part 7).
The main video codecs are:
• The codecs included in Part 2 of the standard, especially those bound to the Simple Profile (SP) and Advanced Simple Profile (ASP).
• H.264/AVC (Advanced Video Coding)/MPEG-4 Part 10. MPEG-4 AVC provides much more efficient video compression than the others, while offering more flexibility for applications.


MPEG-4 Scalable Video Coding (SVC) is a forthcoming extension of the MPEG-4 AVC standard. SVC uses the same video stream (a single encoding of the content) for different devices on different networks. SVC provides scalability in three dimensions:
• Spatial scalability: selecting a suitable resolution.
• Temporal scalability: selecting the frame rate.
• Quality scalability: selecting the bit rate.
MPEG-4 SVC generates a base layer compatible with MPEG-4 AVC, plus one or more additional layers. The base layer contains the minimum quality, frame rate and resolution, and the following layers increase the quality and/or resolution and/or frame rate.
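The following minimal sketch illustrates the general idea of extracting an operating point from such a layered stream by dropping the enhancement layers that exceed the capabilities of the target device; the layer metadata, names and thresholds are invented for this example and do not reflect the SVC bitstream syntax.

    # Illustrative layer descriptions: (name, width, height, fps, kbit/s).
    LAYERS = [
        ("base",  320, 180, 15,  200),    # AVC-compatible base layer
        ("enh-1", 640, 360, 30,  600),    # adds resolution and frame rate
        ("enh-2", 1280, 720, 30, 1800),   # adds further resolution and quality
    ]

    def select_layers(max_width, max_height, max_fps, max_kbps):
        """Keep the base layer plus every enhancement layer the device can use.

        Each layer depends on the previous ones, so we stop at the first layer
        that exceeds any of the device constraints.
        """
        selected = []
        for name, w, h, fps, kbps in LAYERS:
            if w > max_width or h > max_height or fps > max_fps or kbps > max_kbps:
                break
            selected.append(name)
        return selected

    # A phone limited to 640x360 at 30 fps and 1 Mbit/s receives base + enh-1.
    print(select_layers(640, 360, 30, 1000))   # ['base', 'enh-1']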

3.2.3.3 MPEG-4 Systems (BIFS)
Exploring other possibilities for advanced devices, it has been demonstrated that ISO/IEC 14496-11 "Scene description and application engine" (also known as BIFS) is another good alternative; however, interoperability between the two formats is necessary, i.e. the ability to produce ISO/IEC 14496-20 and ISO/IEC 14496-11 content at the same time.
ISO/IEC 14496-11 specifies the coded representation of interactive audio-visual scenes and applications. It specifies the following tools:
• The coded representation of the spatio-temporal positioning of audio-visual objects, as well as their behaviour in response to interaction (scene description).
• The coded representation of synthetic two-dimensional (2D) or three-dimensional (3D) objects that can be manifested audibly and/or visually.
• The Extensible MPEG-4 Textual (XMT) format, a textual representation of the multimedia content described in ISO/IEC 14496 using the Extensible Markup Language (XML), and a system-level description of an application engine (format, delivery, lifecycle, and behaviour of downloadable Java byte-code applications).

3.2.3.4 MPEG-4 Part 20 (LASeR and SAF [44])
Because of the resource limitations of mobile phones, smartphones, PDAs, set-top boxes and older desktop or portable PCs, requirements need to be optimized so as to accommodate all devices with one compatible format that permits interoperability across the different cases. To that end, we are exploring all emerging audio, video and stream formats to find the best choice.


It seems that the best current choice would be ISO/IEC 14496 (also known as MPEG-4) and its primary parts: ISO/IEC 14496-1 "Systems" [45], ISO/IEC 14496-2 "Visual" [46], ISO/IEC 14496-3 "Audio" [47], ISO/IEC 14496-10 "Advanced Video Coding" [48] and ISO/IEC 14496-20 "Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)" [49]. Optionally, we analyse ISO/IEC 14496-11 "Scene description and application engine" (also known as BIFS) [50], but at present it cannot be adapted to the limited resources of constrained devices such as mobile phones.
The fundamental parts of the optimum formats we have found are:
• ISO/IEC 14496-20, which defines a scene description format (LASeR) and an aggregation format (SAF) suitable for representing and delivering rich-media services to resource-constrained devices such as mobile phones. A rich-media service is a dynamic, interactive collection of multimedia data such as audio, video, graphics, and text. Services range from movies enriched with vector graphic overlays and interactivity (possibly enhanced with closed captions) to complex multi-step services with fluid interaction and different media types at each step.

• LASeR aims at fulfilling all the requirements of rich-media services at the scene description level. LASeR supports:
o An optimized set of objects inherited from SVG to describe rich-media scenes.
o A small set of key compatible extensions over SVG.
o The ability to encode and transmit a LASeR stream and then reconstruct SVG content.
o Dynamic updating of the scene to achieve a reactive, smooth and continuous service.
o Simple yet efficient compression to improve delivery and parsing times, as well as storage size, one of the design goals being to allow both a direct implementation of the SDL as documented and a decoder compliant with ISO/IEC 23001-1 "Binary MPEG format for XML" to decode the LASeR bitstream.
o An efficient interface with audio and visual streams, with frame-accurate synchronization.
o Use of any font format, including the OpenType industry standard.
o Easy conversion from other popular rich-media formats in order to leverage existing content and developer communities.
Information taken from http://www.mpeg-laser.com.

Introduction
LASeR is a scene description format, where a scene is a spatial, temporal and behavioural composition of audio media, visual media, graphics elements and text. LASeR is binary, or compressed, like BIFS or Flash, as opposed to textual scene descriptions such as XMT, VRML or SVG. LASeR stands for Lightweight Application Scene Representation.
Application Scene Representation.


SAF is a streaming-ready format for packaging scenes and media together and streaming them over protocols such as HTTP/TCP. SAF services include:
• A simple multiplex for elementary streams (media, fonts or scenes).
• Synchronization and packaging signalling.
SAF stands for Simple Aggregation Format. LASeR and SAF have been designed for use in mobile, interactive applications.

Why LASeR?
The decision to create yet another standard for scene description was taken after a thorough survey of the available open or de-facto standards: BIFS, Flash and SVG Tiny (SVGT). Profiling of BIFS was attempted in order to create a subset small enough to be used on mobile phones, to no avail. Flash is proprietary and is too heavy for most mobile phones. SVGT 1.1 is getting some traction, but on the one hand SVGT 1.1 does not have AV interfaces or dynamicity, and on the other hand its successor, SVGT 1.2, is still in flux; while it will feature AV interfaces, it will still lack dynamicity, compression and streaming, and is significantly heavier than SVGT 1.1. Also, SVGT in general relies on a host of other standards such as DOM, SMIL, ECMAScript, XHTML, CSS and MIME multipart, and managing such a pile of standards is a true challenge in terms of interoperability.

Why SAF?
The decision to create yet another standard for the distribution of mobile content was taken after implementing and trying interactive services on small devices, based on RTP/RTSP or on MP4/3GP download (progressive or not) over TCP/HTTP. In most cases, the need for a simpler, lighter solution was obvious. In order to package efficiently, download progressively or stream a scene with a few media, RTP is overkill, and MP4/3GP is not well suited to the job: MP4/3GP is a file format, and it can only be used for progressive download through special cases (the moov atom placed at the front of the file, media interleaved in time order). In addition, MP4/3GP has a host of features that burden a mobile implementation for no reason. In order to reduce the design time of SAF and obtain almost immediate validation, SAF was designed around a simple configuration of a proven technology: the MPEG-4 Systems Sync Layer. As a bonus, this makes an RTP payload format for SAF available for free through RFC 3640.
In summary, SAF has the minimal/optimal set of features for the job, and can be mapped easily onto other transport mechanisms (RTP, MP4/3GP, MPEG-2 TS, ...).

Requirements of LASeR
The requirements which structure the design of LASeR are:
1 Support efficient and compact representation of scene data supporting at least the subset of the SVGT 1.1 object set functionality. (Today LASeR is aligned as much as possible with SVGT 1.2.)
2 Allow an easy conversion from other graphics formats (e.g. BIFS, SMIL/SVG, PDF, Flash, ...).
3 Provide efficient coding, to be suitable for the mobile environment.
4 Allow separate streams for 2D and 3D content.
5 Allow the representation of scalable scenes.
6 Allow the representation of adaptable scenes, for use within the MPEG-21 DIA framework.
7 Be extensible in an efficient manner.
8 Allow the definition of small profiles.
9 Allow the representation of error-resilient scenes.
10 Allow encoding modes that are easily reconfigurable and signalled in band.
11 Provide an optimal balance between compression efficiency and the complexity and memory footprint of the decoder and compositor code.
12 Allow integer-only implementations of decoding and rendering.
13 Allow saving and restoring several scene states. The saving and restoring shall be triggerable either by the server or by the user.
14 Allow low-complexity profiles implementable on the Java MIDP platform.
15 Allow the representation of differential scenes, i.e. scenes meant to build on top of another scene.
16 Allow interaction through the available input devices, such as a mobile keypad or pen, and support the input of strings.
17 Allow safe implementations of the scene decoder.
In addition, it is deemed crucial that LASeR is designed in such a way that implementations can:
• Be as small as possible.
• Be as fast as possible.
• Require as little runtime memory as possible.
• Be implementable at least partially in hardware.

Requirements for Simple Aggregation Format (SAF)
The requirements which structure the design of SAF are:
1 Provide a simple aggregation mechanism for Access Units of various media into aggregated packets (video, audio, graphics, images, text/fonts, ...).
2 Allow a synchronized presentation of the various media elements in a packet or a sequence of such aggregated packets.
3 Be as bit-efficient as possible.
4 Be byte-aligned.
5 Be easily transported over popular interactive transport protocols (e.g. HTTP).
6 Be easily mapped onto popular streaming protocols (e.g. the MPEG-4 RTP payload format, RFC 3640).
7 Be extensible in an efficient manner.
8 Allow the management of pre-loaded objects, which enables the server to anticipate the downloading of the corresponding objects to improve the user experience.


What is LASeR?
LASeR is:
• An SVGT scene tree, with an SVG rendering model.
• An update protocol, allowing actions on the scene tree such as inserting an object, deleting an object, replacing an object or changing a property: this is the key to the design of dynamic services and a fluid user experience. This update protocol can also be seen as a kind of micro-scripting language (an illustrative sketch of such tree updates follows after this list).
• OpenType text and fonts, including downloadable/streamable fonts.
• A binary encoding which, coupled with the update protocol, allows the incremental loading/streaming of scenes with excellent bandwidth usage.
• A few LASeR extensions to improve the support of input devices, the flexibility of event processing without a full scripting language, and simple axis-aligned rectangular clipping.
Because of the above, LASeR may also have:
• A micro-DOM or JSR 226 interface, since the scene tree is almost purely SVG, thus allowing the design of complete applications on top of the LASeR engine. The micro-DOM interface also makes it possible to use ECMAScript with LASeR scenes.
• Easy conversion of Flash content to LASeR, because the update protocol is similar to that of Flash.
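To make the update-protocol idea concrete, the following sketch applies insert/delete/replace commands to a toy scene tree; the command names and the dictionary-based representation are illustrative assumptions and have nothing to do with the actual LASeR binary syntax.

    # A toy scene tree: element id -> {attribute: value, "children": [...]}.
    scene = {
        "root":  {"children": ["title"]},
        "title": {"text": "Hello", "x": 10, "y": 20, "children": []},
    }

    def apply_update(scene, command):
        """Apply one update command to the scene tree (illustrative only)."""
        op = command["op"]
        if op == "insert":
            scene[command["id"]] = command["element"]
            scene[command["parent"]]["children"].append(command["id"])
        elif op == "delete":
            del scene[command["id"]]
            for node in scene.values():
                if command["id"] in node.get("children", []):
                    node["children"].remove(command["id"])
        elif op == "replace_attr":
            scene[command["id"]][command["attr"]] = command["value"]
        return scene

    # Streaming a small update instead of a whole new scene keeps bandwidth low.
    apply_update(scene, {"op": "replace_attr", "id": "title",
                         "attr": "text", "value": "Goodbye"})
    print(scene["title"]["text"])   # Goodbye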

What is SAF?
SAF is:
• A fixed configuration of the MPEG-4 Systems Sync Layer, providing an easy yet powerful way of packaging elementary streams.
• A simplified stream description mechanism.
• A simple multiplex for several media, font and scene streams.
SAF streams may be:
• Packaged in RTP/RTSP using the payload format defined in RFC 3640.
• Packaged in MP4/3GP files using a mapping defined with SAF.
• Packaged in an MPEG-2 Transport Stream using the SL mapping defined in ISO/IEC 14496-8.
Although this format appears to carry a patent fee, we do not think this is a problem, since it also appears to be our best solution. We are currently waiting for the release of the final reference software to check its viability and stability.


3.3 Additional formats for more powerful devices (future)
3.3.1 VC-1 [21]
Other aspects to introduce in the near future are higher resolutions, to cover high-definition content without resource penalties. For this, we find that the SMPTE 421M video codec, "VC-1 Compressed Video Bitstream Format and Decoding Process" (known as VC-1), is a great choice for covering these large resolutions as well.
VC-1 minimizes the complexity of decoding high-definition (HD) content through improved intermediate-stage processing and more robust transforms. As a result, VC-1 decodes HD video twice as fast as H.264, while offering two to three times better compression than MPEG-2.
Since VC-1 is optimized for decoding performance, it ensures a superior playback experience across the widest possible array of systems regardless of bit rate or resolution. These systems range from the PC (where VC-1 playback at 1080p is possible) to set-top boxes, gaming systems, and even wireless handsets.
VC-1 offers superior quality across a wide variety of content types and bit rates, which has been well documented by independent sources:
• DV Magazine found VC-1 to be superior to both MPEG-2 and MPEG-4.
• TANDBERG Television found that VC-1 produces significantly better quality than MPEG-2 and comparable quality to H.264. These results were presented at the 2003 International Broadcasting Convention (IBC).
• c't Magazine, Germany's premier audio-video magazine, compared various codec standards, including VC-1, H.264 and MPEG-4, and selected VC-1 as producing the best subjective and objective quality for HD video.
• The European Broadcasting Union (EBU) found that VC-1 had the most consistent quality in tests that compared VC-1, RealMedia V9, the Envivio MPEG-4 encoder, and the Apple MPEG-4 encoder.

3.3.2 Device-oriented screens
Analysing hundreds of devices by major manufacturers (Nokia, Sony Ericsson, Motorola, Fujitsu, BenQ-Siemens, Samsung, Alcatel, Philips, Acer, HP, BlackBerry, Qtek (HTC), Palm, ...), we find that the square pixel is the most commonly used proportion for representing pixel information on their screens (typically TFT-based). Because many sources (primarily documentaries and films) are recorded in panoramic formats, we think it is important to accommodate all of these in their original aspect ratio while reducing the bitstream size and complexity.
We think it is important to create all formats automatically and simultaneously on dedicated servers so as to provide the same information in real time. With this, we can define a standard way of transmitting all information to all devices independently of their processing power, and control the total server power needed to create each channel.


Finally, we find that the following resolutions are desirable in order to take advantage of the physical screens of the devices we analysed:

Aspect    Lowest   Low      Medium   High     Highest  Ultra    HD 1*     HD 2*
Devices   M        M        M+P      P+C      P+C      C+T      C+T       C+T
4:3       128x96   176x132  240x180  320x240  480x360  640x480  ---       ---
16:9      128x72   176x99   240x135  320x180  480x270  640x360  1280x720  1920x1080

*: Optional, for the future.
M: Mobiles & Smartphones.
P: PDAs.
C: Computers.
T: TV & Advanced Set-Top Boxes.

Considering all this, we need a minimum source resolution of 640x480 for a 4:3 aspect ratio, with six simultaneous compressed (or live) streams, to accommodate all possible devices; and 640x360 for a 16:9 aspect ratio with six streams under current requirements, or 1920x1080 with eight simultaneous streams to cover the maximum future HD. Because of server resource consumption, we prefer to limit the final resolutions generated for computers and to upsample the source video on the destination computer to a possibly large screen resolution.
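As an illustration of how these figures can be used operationally, the short sketch below encodes the resolution ladder from the table above and selects the largest stream that fits a given device screen; the function name and the selection rule are assumptions made for this example.

    # Resolution ladder from the table above, as (width, height) per aspect ratio.
    LADDER = {
        "4:3":  [(128, 96), (176, 132), (240, 180), (320, 240), (480, 360), (640, 480)],
        "16:9": [(128, 72), (176, 99), (240, 135), (320, 180), (480, 270), (640, 360),
                 (1280, 720), (1920, 1080)],
    }

    def pick_stream(aspect, screen_w, screen_h):
        """Choose the largest pre-encoded stream that fits the device screen."""
        best = LADDER[aspect][0]
        for w, h in LADDER[aspect]:
            if w <= screen_w and h <= screen_h:
                best = (w, h)
        return best

    # Each aspect ratio requires one simultaneous stream per ladder entry:
    print(len(LADDER["4:3"]), len(LADDER["16:9"]))   # 6 and 8 streams, as above
    print(pick_stream("16:9", 320, 240))             # (320, 180) for a QVGA phone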

3.4 Analysis of state-of-the-art image compression algorithms for medical applications
3.4.1 Still image compression such as JPEG, JPEG-LS and JPEG-2000
The following results were obtained using optimized software for JPEG2000, JPEG-LS and lossless JPEG compression running on a Pentium IV at 3 GHz. A set of greyscale medical images of size SXGA (1280x1024 pixels) was compressed.

Performance measurements, lossless mode

CODEC           Throughput (Mbit/s)   Throughput (fps for SXGA)   Processing time / frame   Average CR (1:x)   Coded stream BW (Mbit/s)
JPEG2000        22                    0.7                         1420 ms                   3.4                6.5
JPEG-LS         62                    2.0                         500 ms                    2.9                21.5
Lossless JPEG   230                   7.3                         137 ms                    1.7                135.3


Performance measurements, lossy mode

CODEC             Throughput (Mbit/s)   Throughput (fps for SXGA)   Processing time / frame   Average CR (1:x)   Coded stream BW (Mbit/s)
JPEG2000 @ 10:1   20                    0.6                         1667 ms                   10                 2
JPEG2000 @ 20:1   22                    0.7                         1429 ms                   20                 1.1
JPEG @ 10:1       650                   20.7                        48 ms                     10                 65
JPEG @ 20:1       800                   25.4                        39 ms                     20                 40

Discussion

There are two reasons why existing state-of-the-art still-image compression algorithms are not suitable for our (real-time) application. First of all, the throughput (frame rate) of these compression algorithms is too low. JPEG2000, for instance, only achieves an average of 0.7 frames per second, which is not acceptable. JPEG-LS and lossless JPEG perform better, but even 7 frames per second is too low to allow fluent interaction between user and application and to show medical video sequences. The second reason is that the compression ratios of these algorithms are still too low. Even the best algorithm (JPEG2000) only achieves an average compression ratio of 3.4 on medical images. Current wireless networks (802.11g) have a theoretical bandwidth of 54 Mbit per second, but the actual throughput is closer to 20 Mbit per second. If we want to transmit medical colour images of size 1600x1200 (which is rather low for medical imaging), the size per image is 1600 x 1200 x 3 = 5,760,000 bytes, or 46,080,000 bits, since there are three colour planes. This means that a compression ratio of 3.4 (about 13.6 Mbit per compressed image) would only allow roughly 20 / 13.6, i.e. about 1.5 images per second, to be sent over the wireless network. Again, this is too low to display medical video data and to allow fluent interaction between the user and the medical software application.
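The back-of-the-envelope calculation above can be restated compactly; the sketch below only re-expresses the numbers already given in the text (the function name is chosen here for illustration).

    def achievable_fps(width, height, bytes_per_pixel, compression_ratio, link_mbit_s):
        """Images per second that fit through a link of the given capacity."""
        raw_bits = width * height * bytes_per_pixel * 8
        compressed_bits = raw_bits / compression_ratio
        return (link_mbit_s * 1_000_000) / compressed_bits

    # 1600x1200 colour images, lossless JPEG2000 (CR ~3.4), 20 Mbit/s effective 802.11g:
    print(round(achievable_fps(1600, 1200, 3, 3.4, 20), 1))   # ~1.5 images per second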

The above discussion concerned lossless compression. If we were to switch to lossy compression, the problem of the low compression ratio would be solved. However, at the same time uncontrolled artifacts and distortions would be introduced in the medical images due to the lossy nature of the compression algorithms. This is unacceptable: to date, the general opinion in the medical imaging community is that lossy compression may not a priori be applied to medical images that will be used for diagnosis. Lossy compression in medical imaging is only allowed to reduce the size of archived images, or if one can prove that the lossy nature cannot influence the clinical image quality (which no one has been able to prove so far). Moreover, even with lossy compression, there is still the problem of the limited throughput (frame rate) of the existing compression algorithms.
(framerate) <strong>of</strong> <strong>the</strong> existing compression algorithms.


Note that the performance results presented above are in line with results reported by others and available on the Internet (such as for the highly optimized 'Kakadu' implementation of JPEG2000).

3.4.2 Intra-frame image compression such as MJPEG-2000
Motion JPEG2000 uses only key-frame (intra-frame) compression, allowing each frame to be accessed independently. The advantage of applying frame-by-frame compression is that computationally expensive motion estimation is avoided. The disadvantage is that the compression ratio of algorithms using only intra-frame compression will be significantly lower than that of inter-frame based algorithms. One can see intra-frame video algorithms as an extension of still-image compression algorithms.
The same drawbacks exist for this type of algorithm: the compression ratio is still insufficient when working in lossless mode, and the throughput is too low to really support medical video sequences (at typical medical image resolutions).

3.4.3 Inter-frame image compression such as MPEG-4 AVC
Inter-frame video compression provides a very high compression ratio, especially when used in lossy mode (which it is designed for). Both the closed-loop and the open-loop video codec architectures require complex hierarchical block-based motion models in order to efficiently reduce the uncertainty about the true motion and to improve the compression efficiency. Employing complex motion models, however, reduces the chances of attaining real-time video encoding. Additionally, opting for a classical video codec introduces a delay as high as N/2 + 1 frames, where N is the GOP size (typically 16 or 32 for a frame rate of 30 or 60 frames per second, respectively). For a system running at 30 frames per second with a GOP of N = 32, the delay introduced by the compression would therefore be 17 frames, or roughly 567 milliseconds. It is obvious that interaction between the user and the software application generating the image data is practically impossible with a delay of half a second. For example, such a delay would mean that the display system responds to any action of the user (such as clicking a button, rotating a medical image, performing window/level adjustment, moving a window, ...) with a delay of more than half a second. For off-line analysis of medical images (MRI, CT, etc.) this is not a problem; for vision-aided surgery it certainly is.
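The delay figures quoted above follow directly from the N/2 + 1 bound discussed earlier; the following minimal sketch simply restates that arithmetic (the function name is illustrative).

    def codec_delay(gop_size, fps):
        """End-to-end codec delay, in frames and milliseconds, for a GOP of gop_size."""
        delay_frames = gop_size // 2 + 1
        return delay_frames, delay_frames / fps * 1000.0

    print(codec_delay(32, 30))   # ~ (17 frames, 567 ms): too long for interactive use
    print(codec_delay(16, 30))   # ~ (9 frames, 300 ms)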


4 User Interface Adaptation
4.1 Introduction
User interface adaptation is an issue as old as the history of computing devices and the ways of interacting with them.
In recent years, more and more entertainment and professional services and applications have been developed that can be used (interfaced) and accessed through different devices.

Before analysing the state of the art of this type of application, some criteria should be established in order to narrow the scope of the analysis. To this end, the following classification, based on the way the adaptation is done, is proposed:
a) Customized adaptation: This category comprises user interfaces adapted manually. The main advantage of these applications is that the adaptation is perfectly suited to the final needs of the device. Each interface is redefined in a manual or semi-automatic way in order to obtain exactly the appearance it should have. The cost, the lack of use of standards and the impossibility of launching automatic processes are the main disadvantages.
b) Adaptation based on standard adaptation solutions or on generic standard tools: This kind of adaptation is based on standard or semi-standard tools that allow the automatic adaptation of the interfaces. The adaptation process can be fully standardized or can be based on generic standard transformation tools (e.g. XSLT), but both cases have a common feature: the adaptation is based on solutions that facilitate interoperability and serialization, although sometimes the price is a loss of granularity in the adaptation process.

4.2 MPEG-4 Advanced Content visualization technologies
4.2.1 Software BIFS players
On the market there are several developments that support MPEG-4 Systems (BIFS), coming both from universities acting as research institutions and from companies engaged in research and development and in commercializing services.
The standard MPEG-4 BIFS specification guarantees interoperability. In this way, MP4 content generated with any tool that follows the standard will be playable on any BIFS-compatible device.
However, most of the players do not implement 100% of the BIFS nodes, which implies that interoperability is not completely achieved.

4.2.2 GPAC: Osmo4
Osmo4 is part of the GPAC (Project on Advanced Content) framework developed by ENST, the French national telecommunications engineering school. GPAC allows the generation of 2D and 3D advanced content using the MP4Box tool and its playback using Osmo4.
GPAC is distributed under the LGPL (Lesser General Public License).
Characteristics:
• It supports several multimedia formats, from simple content (avi, mov, mpg) to 2D/3D advanced content.
• It supports playback of local files, HTTP download-and-play, and RTP/RTSP streaming over UDP (unicast or multicast) or TCP.
• Video and audio presentation is based on open-source plugins. A decoder development kit (DDK) is available to connect the player with the necessary codec.
• Playback control: play, pause and fast-forward.
• Graphic features: antialiasing, zoom, rendering-area resizing, full screen.
Osmo4 allows:
• Playback of animated cartoons (downloaded or streamed).
• Interactive and synchronized mixing of graphics, text, video and audio.
• Partial MPEG-7 and MPEG-21 support: metadata, encryption, watermarking, DRM.

4.2.3 IBM: M4Play
IBM has developed an MPEG-4 toolkit. It consists of a set of Java classes and APIs that allow the generation and playback of MPEG-4 advanced content. The toolkit is distributed under a commercial licence.
The M4Play player is part of the toolkit and its characteristics are as follows:
Characteristics:
• Based on Java: multi-platform.
• Two versions:
o Stand-alone application.
o Applet embeddable in an HTML page.
• It supports RTP/RTSP streaming and local file playback.
• It can play:
o MP4 according to the ISMA specifications.
o MP4 including MPEG-4 Systems.
o AVI: MPEG-4 Simple Profile video (.cmp, .m4v, .263).
o AAC: Low-Complexity Profile audio (.aac, .adif, .adts).
o MP3: MPEG-1 Audio Layer III (.mp3).
o MP3: MPEG-1 and third audio level MPEG. (.mp3).


4.2.4 Envivio TV
Envivio has developed and commercializes an MPEG-4 player for set-top boxes, PCs and PDAs.
Characteristics:
• It can be installed as:
o A stand-alone player.
o A plugin for well-known players (QuickTime v4.1.2 or later, RealNetworks v7.0 or later, and Windows Media Player v6.4 or later).
• Portable C/C++ code for set-top boxes and mobile telephones.
• Compliant with the 2D BIFS specification.
• Playback of local or streamed MP4 files.
• Protocols: RTP, RTCP, or RTSP over UDP or through HTTP tunnels, unicast and multicast.
The stand-alone player version can be integrated or ported to any device, including set-top boxes, PCs, PDAs and video game consoles.
Envivio has been certified by RealNetworks and is part of the automatic update programme as the MPEG-4 plugin for the RealNetworks player v8.0 and later.

4.2.5 Bitmanagement: BS Contact MPEG-4
Bitmanagement has developed an MPEG-4 player with 2D and 3D support. The implementation covers more than 80% of the MPEG-4 nodes. This player is being used in several European projects of Telefónica I+D. The MPEG consortium has requested to use the Bitmanagement key software technology as a reference implementation for the standard.
The predecessor of this player is the blaxxun Contact 3D engine, which was the first VRML viewer to introduce DirectX 7 hardware acceleration support and to incorporate some advanced 3D features (particle systems, multi-texturing, NURBS, animation, etc.) and interactivity.
The Bitmanagement player incorporates features such as 2D/3D streaming, animation streaming, compressed scenes and standardized interfaces for digital rights management and encryption.
SoNG (Portals of Next Generation) was a European Commission project with Telefónica I+D participation that used the player developed by Bitmanagement; it was the first MPEG-4 player prototype with 2D/3D support. Bitmanagement now commercializes this player.
Characteristics:
• It can be installed as:
o An ActiveX plugin for Microsoft Internet Explorer.
o A Netscape plugin for Netscape 4.x.
o An ActiveX control embedded in any language that supports COM (Visual C++, Visual Basic).
o An ActiveX control embedded in Java 2 via JNI.
Bitmanagement assures that this player has been tested with content generated by GPAC (ENST) and IBM tools.

4.2.6 Octaga Professional
Octaga commercializes a player for 3D MPEG-4 advanced content: Octaga Professional.
Characteristics:
• It can play MP4 files generated with the GPAC creation tools.
• It can be installed as:
o A stand-alone application.
o A plugin that can be inserted into an HTML page for the Internet Explorer, Firefox and Opera browsers.

4.2.7 Digimax: MAXPEG Player
Digimax commercializes a 2D/3D player compatible with MPEG-4 (BIFS).
Characteristics:
• Portable: C++ code portable to different platforms (STB and mobile).
• It can play MP4 files generated by its own tool: MAXPEG Author.

4.2.8 COSMOS
COSMOS (COllaborative System based on MPEG-4 Objects and Streams) is a framework for developing applications in collaborative virtual environments (CVE).
Completely developed in Java, it allows maintaining a 3D virtual environment in which 3D objects can be exchanged and manipulated in real time. It allows a change to a BIFS node to be sent by broadcast/multicast to all interested participants, thereby updating all the involved scenes.

4.3 UI adaptation based on XML
4.3.1 UI adaptation based on XML transformation
Most of the approaches to UI adaptation are based on XML [22] and its transformation technologies. In [23] and [24] there are two very interesting tutorials concerning these techniques.
Most of the applications rely on the following assumption: considering the user interface as a tree, this tree can be transformed (adapted) into a different tree by recombining the set of leaves it is composed of (a minimal sketch of this kind of transformation follows below).
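As an illustration of this tree-transformation idea (not the architecture of [25]), the sketch below uses Python with the lxml library, assumed to be available, to apply a device-specific XSLT stylesheet to an abstract UI tree; the element names and the stylesheet are invented for the example.

    from lxml import etree

    # An abstract UI tree (source) and a device-specific transformation (XSLT).
    ABSTRACT_UI = """<ui><button label="Play"/><button label="Stop"/></ui>"""

    TO_MOBILE_XSLT = """<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/ui">
        <menu><xsl:apply-templates select="button"/></menu>
      </xsl:template>
      <xsl:template match="button">
        <item><xsl:value-of select="@label"/></item>
      </xsl:template>
    </xsl:stylesheet>"""

    # Recombine the leaves of the abstract tree into a mobile-oriented tree.
    transform = etree.XSLT(etree.XML(TO_MOBILE_XSLT))
    mobile_ui = transform(etree.XML(ABSTRACT_UI))
    print(etree.tostring(mobile_ui, pretty_print=True).decode())
    # yields a <menu> tree whose <item> children carry the original button labels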

The following figure shows how the authors of [25] present a possible architecture for carrying out this type of user interface adaptation.
In the paper, the reader can also find information about an authoring tool for developing such transformations.
Architecture and tool components of the system described in [24].

Another example of this approach is the AUIT [26] methodology which, based on XML transformations, proposes a four-layer architecture to adapt the user interface to different devices. This methodology has been improved over the last four years and several implementations are based on it.


AUIT architecture.
Several works based on XML transformations have also been carried out in the framework of the SEESCOA (Software Engineering for Embedded Systems using a Component-Oriented Approach) initiative [27].

4.3.2 Adaptation via XML publishing servers
Based on similar technologies, there are widely used frameworks that provide mechanisms to implement user interface adaptation for applications accessed over IP technology.
These frameworks act as web servers which are able to handle different types of devices (represented by different types or versions of web clients [28]) and implement a different behaviour for each of them. In this way, a web site or a pizza-ordering service can be accessed, browsed and visualized in very different ways (on a TV, PDA, mobile phone, ...).
One of these frameworks, whose use is quite widespread and which has survived and been successfully improved over the last decade, is Cocoon. This framework, based on XML transformation technologies, systematizes the adaptation process in a very significant way.
In [29] there is an example of the use of one of these frameworks to apply such techniques.
Another application based on Cocoon is the PALIO (Personalized Access to Local Information and services for tourists) service framework. The PALIO framework is being used in the development of location-aware information systems for tourists, and is capable of delivering fully adaptive information to a wide range of devices, including mobile ones.
including mobile ones.


PALIO example.
SiteMesh [30] is a web-page layout and decoration Java framework that allows device-oriented user interface adaptation based on XML transformation. It does not act as an XML publishing engine but is integrated into the web server.

4.3.3 Adaptation based on the definition and identification of the device
4.3.3.1 Composite Capabilities / Preference Profiles
Composite Capabilities/Preference Profiles (CC/PP) [31] is a W3C recommendation which, using the Semantic Web oriented language RDF [32], makes it possible to define the profiles and capabilities of a device in order to carry out the appropriate adaptation. This working group has been closed and its work has been transferred to the Device Independence working group [33].
One of the recent results of this group is the specification "Delivery Context: Interfaces (DCI) Accessing Static and Dynamic Properties" [34]. This document defines platform- and language-neutral interfaces that give web applications access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions.


User interface adaptation: concepts involved according to the DCI group [33].
There is a well-documented implementation by Sun of the CC/PP specification. This implementation describes how to process CC/PP in Java (JSR-000188, [35]).

4.3.3.2 UAProf (OMA)
One of the outputs of CC/PP had a direct impact on the active Open Mobile Alliance (OMA) forum [36][37]. The result is UAProf, a concrete implementation of CC/PP developed by the OMA. UAProf is a framework for describing and transporting information about the capabilities of a device. This information may include hardware characteristics (e.g. screen size, type of keyboard, etc.) and software characteristics (e.g. browser manufacturer, markup languages supported, etc.). The final purpose is that origin servers, gateways and proxies use this information to customize content for the user. The current version of this specification is UAProf 2.0.

One of the applications that employs this technology can be found in [38].


Architecture defined for the Web UI adaptation in [37].
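As a hypothetical illustration of how a server could exploit capability information of the kind UAProf transports, the sketch below picks a content variant from a simplified profile; the profile fields shown and the variant thresholds are assumptions made for this example rather than the UAProf schema.

    # A simplified device profile of the kind a UAProf-aware server might receive.
    profile = {
        "ScreenSize": (240, 320),
        "ColorCapable": True,
        "CcppAccept": ["image/jpeg", "text/html"],
    }

    # Content variants ordered from richest to most basic (illustrative values).
    VARIANTS = [
        {"name": "desktop", "min_width": 800, "mime": "text/html"},
        {"name": "pda",     "min_width": 240, "mime": "text/html"},
        {"name": "basic",   "min_width": 0,   "mime": "text/html"},
    ]

    def choose_variant(profile):
        """Pick the richest content variant the device can display."""
        width, _ = profile["ScreenSize"]
        for variant in VARIANTS:
            if width >= variant["min_width"] and variant["mime"] in profile["CcppAccept"]:
                return variant["name"]
        return "basic"

    print(choose_variant(profile))   # 'pda' for a 240x320 screen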

4.3.3.3 Device Description Repository
The Device Description Repository is a concept proposed by the World Wide Web Consortium (W3C) Device Description Working Group (DDWG). The proposed repository would contain information about Web-enabled devices (particularly mobile devices) so that content could be adapted to suit them. The information would include screen dimensions, input mechanisms, supported colours, known limitations, special capabilities, etc.
The idea of implementing a Device Description Repository was recently discussed at an international workshop held by the DDWG in Madrid, Spain, in July 2006. Using such an approach in CANTATA to include mobile devices in the demonstrators could therefore be interesting.

4.3.4 XML based UI adaptation

A software application is said to be device independent when its functions work in the same way on different types of device. This generally means that it is written in a meta-language that can be read on any platform.

XML (eXtensible Markup Language) appears to be a good approach for creating device-oriented interfaces. XML is a platform-neutral language for organizing and exchanging complex information. It is lightweight, easy to use and increasingly available in today's applications. In addition, XML provides a facility to define tags and the structural relationships between them, which makes it a powerful and useful language for creating a uniform information format for complex multimedia content and documents.

XML also supports XSL style sheets, which allow customized presentations to be created for different devices and users.
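The combination of XML content with per-device XSL style sheets can be sketched as follows. The example assumes the third-party lxml package for XSLT processing; the content schema and the two stylesheets are purely illustrative.

```python
# A minimal sketch of the XML + XSL idea: the same XML content is rendered
# with a different stylesheet per device class. Assumes the third-party lxml
# package; element names and stylesheets are invented for the example.
from lxml import etree

CONTENT = etree.XML(
    "<catalog><item>News clip</item><item>Concert</item></catalog>")

FULL_PAGE = etree.XSLT(etree.XML("""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <html><body><ul>
      <xsl:for-each select="item"><li><xsl:value-of select="."/></li></xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>"""))

MOBILE_PAGE = etree.XSLT(etree.XML("""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <list><xsl:for-each select="item">
      <entry><xsl:value-of select="."/></entry>
    </xsl:for-each></list>
  </xsl:template>
</xsl:stylesheet>"""))

STYLESHEETS = {"desktop": FULL_PAGE, "mobile": MOBILE_PAGE}

def render(device_class: str) -> str:
    """Apply the stylesheet selected for the device class to the same content."""
    return str(STYLESHEETS[device_class](CONTENT))

print(render("mobile"))
```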

XML-based user interface descriptions are becoming much more visible, for example the XML User Interface Language (XUL) or TERESA XML. These approaches offer specific characteristics and different functionalities.

The approaches presented in this section have a common feature: adaptation is achieved by defining the interface without including its final presentation. Consequently, these approaches require the final device either to be compliant with them or to have a dedicated renderer developed for each device.

4.3.4.1 UIML (User Interface Markup Language)

UIML [39] is an XML-based markup language for defining interfaces. UIML allows an interface to be defined by concatenating the definitions of the different elements that compose it.

There are renderers for different technologies and platforms (J2EE, Qt, HTML, C++, VoiceXML) that transform the UIML-expressed interface into the appropriate output.
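The following Python sketch illustrates the principle behind such renderers: one abstract interface description can be turned into different concrete outputs by platform-specific renderers. The data model and the two renderers are simplifications invented for the example; they do not reproduce the actual UIML vocabulary.

```python
# Hedged illustration of the renderer idea: one abstract interface description,
# several platform-specific renderers. Names are invented for the example.

INTERFACE = {
    "part": "login",
    "children": [
        {"part": "title",  "class": "Label",  "text": "Sign in"},
        {"part": "user",   "class": "Input",  "text": "User name"},
        {"part": "submit", "class": "Button", "text": "OK"},
    ],
}

def render_html(ui: dict) -> str:
    """Render the abstract parts as simple HTML elements."""
    lines = []
    for child in ui["children"]:
        if child["class"] == "Input":
            lines.append(f'<input placeholder="{child["text"]}"/>')
        else:
            tag = {"Label": "h1", "Button": "button"}[child["class"]]
            lines.append(f"<{tag}>{child['text']}</{tag}>")
    return "\n".join(lines)

def render_voice_prompts(ui: dict) -> list:
    """Render the same parts as spoken prompts, e.g. for a voice front end."""
    return [f'{child["class"]}: {child["text"]}' for child in ui["children"]]

print(render_html(INTERFACE))
print(render_voice_prompts(INTERFACE))
```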

4.3.4.2 AUIML

AUIML is similar to UIML but more abstract. AUIML does not include UI appearance features, in order to be 100% independent of the platform and implementation technology. According to the definition by IBM (which provides a toolkit), "AUIML captures relative positioning information of user interface components and delegates their display to a platform-specific renderer. Depending on the platform or device being used, the renderer decides the best way to present the user interface to the user and receive user input."

4.3.4.3 XIML (eXtensible Interface Markup Language)

This initiative [40] has a similar philosophy to the previous ones, but it appears not to be very active.

Weather forecast application using XIML.

4.3.4.4 XUL

The XML User Interface Language (XUL) is Mozilla's XML-based language for describing window layout. XUL separates the client application definition and programmatic logic from its graphical presentation and language-specific text labels.

A user interface (UI) can be described as a set of structured interface elements (such as windows, menu bars, buttons, etc.) along with a predefined set of properties. XUL focuses on window-based graphical user interfaces, so it might not be applicable to the interfaces of small mobile devices, for example.

4.3.4.5 TERESA XML

Teresa is a project of the HCI Group of ISTI-C.N.R., supported by the European Cameleon IST project, with the aim of designing and developing concrete user interfaces adapted to specific platforms [41]. The Teresa XML language is composed of two parts: an XML description of the CTT (ConcurTaskTrees [42]) notation and a language for describing user interfaces.

This XML-based language describes the organization of the Abstract Interaction Objects (AIOs) that compose the interface. The user interface dialog is also described with this language.

A user interface (UI) is a structured set of one or more presentation elements. Each presentation element is characterized by a structure, which describes the static organization of the UI and the relationships among the various presentation elements. Teresa XML is used in the TERESA tool, which supports the generation of task models, abstract UIs and running UIs.
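The following sketch gives a rough flavor of the TERESA pipeline: a small task model in the spirit of ConcurTaskTrees is walked to derive one Abstract Interaction Object per leaf task, which a later, platform-specific step would turn into concrete widgets. Class names, the operator notation and the AIO labels are simplified stand-ins, not the actual Teresa XML or CTT notation.

```python
# A loose, illustrative sketch of deriving AIOs from a CTT-like task model.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    kind: str = "interaction"            # e.g. "interaction" or "application"
    operator: str | None = None          # temporal operator towards the next sibling, e.g. ">>"
    children: list[Task] = field(default_factory=list)

access_video = Task("AccessVideo", kind="abstraction", children=[
    Task("EnterQuery", operator=">>"),                   # enabling: the query precedes the results
    Task("ShowResults", kind="application", operator=">>"),
    Task("PlayClip"),
])

def leaf_aios(task: Task) -> list[str]:
    """Collect one Abstract Interaction Object per leaf task."""
    if not task.children:
        role = "only_output" if task.kind == "application" else "interactive_control"
        return [f"{task.name} -> {role}"]
    return [aio for child in task.children for aio in leaf_aios(child)]

for aio in leaf_aios(access_video):
    print(aio)   # e.g. "EnterQuery -> interactive_control"
```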


4.3.4.6 USIXML

UsiXML (which stands for USer Interface eXtensible Markup Language) is an XML-compliant markup language that allows the description of a user interface (UI) for multiple contexts of use, such as Character User Interfaces (CUIs), Graphical User Interfaces (GUIs), Auditory User Interfaces (AUIs) and Multimodal User Interfaces (MUIs).

UsiXML consists of a User Interface Description Language (UIDL), a declarative language that captures the essence of what a UI is or should be, independently of physical characteristics.

UsiXML supports device independence: a UI can be described in a way that remains independent of the interaction devices, such as mouse, screen, keyboard or voice recognition system. If needed, a reference to a particular device can be added to the description.

(Information taken from www.usixml.org)

4.3.4.7 AAIML [43]

The Alternate Abstract Interface Markup Language (AAIML) is an initiative of the V2 technical committee of the National Committee for Information Technology Standards (NCITS).

This standard aims to allow people with disabilities to remotely control a large set of electronic devices (for example copy machines or elevators) from their personal device (such as a personal mobile phone).

An abstract user interface is transmitted by the targeted device to the user, with particular input and output mechanisms that are appropriate for this user. The concept of a "Universal Remote Console" (URC) is introduced. The XML-based AAIML language is used to convey an abstract user interface description from the target device to the URC. On the URC, this abstract description must be mapped to a concrete description available on the platform.
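The URC concept can be sketched as follows: the target device publishes an abstract description of its controls, and the user's personal device renders each abstract element with whatever interaction modality suits the user. The element roles, labels and rendering rules below are invented for the illustration and do not follow the AAIML syntax.

```python
# A hedged sketch of the URC idea: an abstract UI published by a target device
# (here, an elevator) is mapped to the modality of the user's personal device.

ABSTRACT_UI = [
    {"id": "floor", "role": "choice",  "label": "Select floor", "options": ["1", "2", "3"]},
    {"id": "call",  "role": "trigger", "label": "Call elevator"},
]

def render_for(modality: str, elements: list) -> list:
    """Map each abstract element to a concrete interaction for the chosen modality."""
    rendered = []
    for el in elements:
        if modality == "speech":
            prompt = f'Say "{el["label"]}"'
            if el["role"] == "choice":
                prompt += f' then one of {el["options"]}'
            rendered.append(prompt)
        else:  # a simple visual remote-control layout
            widget = "picker" if el["role"] == "choice" else "button"
            rendered.append(f'{widget}: {el["label"]}')
    return rendered

print(render_for("speech", ABSTRACT_UI))
print(render_for("visual", ABSTRACT_UI))
```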


A Compaq iPAQ handheld computer (running Java/Swing on Linux) controlling a TV simulation on a PC via an 802.11b wireless connection and Jini/Java technology.

4.3.4.8 XForms and RIML

The W3C XForms specification is intended as the next generation of forms for the Web. Although its focus is on gathering input provided by the user, it also offers some information display facilities. Despite its specialized scope, XForms provides many of the features necessary for a more general abstract language. Indeed, XForms separates three aspects of a form interface:

• The data model used by the target.
• The presentation of the data model to the user.
• The processing model.

In XForms, the data model can be used by specialized interfaces. In fact, XForms allows resources such as labels to be substituted according to the delivery context.

The Renderer Independent Markup Language (RIML) is based on emerging standards. The current draft of XHTML 2.0 is used for content such as paragraphs, tables, images, hyperlinks, etc. For form-based interaction, XForms elements have been included.

RIML stresses the separation of content definition (i.e. what is to be presented) from the description of dynamic adaptations, which can be performed on the content in order to match the varying capabilities of devices.

4.3.4.9 MPEG-21

ISO/IEC is defining the MPEG-21 framework, which is intended to support the transparent use of multimedia resources across a wide range of networks and devices.

One aspect of the requirements for MPEG-21 is Digital Item Adaptation, which is based on a Usage Environment Description. It proposes the description of capabilities for at least the terminal, network, delivery, user and natural environment, and notes the desirability of remaining compatible with other recommendations such as CC/PP and UAProf (see 4.3.3.1 and 4.3.3.2).

(Information taken from www.w3.org)

4.4 Device ontology

In 2001, FIPA proposed a device ontology [51]. This ontology describes the software and hardware properties of devices as well as the services they offer. Thanks to this ontology, device profiles can be built and used by agents. Knowledge of this ontology allows agents that receive the profile of a specific device to determine whether its properties or services allow them to achieve their objectives. The FIPA device ontology could be used in a CC/PP profile (see 4.3.3.1).

For some examples see:
http://www.fipa.org/specs/fipa00091/PC00091A.html#_Toc511707116
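The following sketch suggests how an agent might exploit a device profile built from such an ontology: it compares the advertised hardware and software properties of a device against the requirements of a task before deciding whether the device can be used. The property and requirement names are illustrative and do not reproduce the FIPA vocabulary.

```python
# A hedged sketch of an agent checking a device profile against task
# requirements. All property names and values are invented for the example.

DEVICE_PROFILE = {
    "hw": {"memory_kb": 16384, "screen": (176, 208)},
    "sw": {"supported_formats": {"image/jpeg", "video/3gpp"}},
}

TASK_REQUIREMENTS = {
    "min_memory_kb": 8192,
    "needed_format": "video/3gpp",
    "min_screen_width": 160,
}

def can_achieve(profile: dict, req: dict) -> bool:
    """True if the device's advertised capabilities satisfy the task."""
    return (profile["hw"]["memory_kb"] >= req["min_memory_kb"]
            and req["needed_format"] in profile["sw"]["supported_formats"]
            and profile["hw"]["screen"][0] >= req["min_screen_width"])

print(can_achieve(DEVICE_PROFILE, TASK_REQUIREMENTS))   # True
```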

4.5 Agent-based user interface adaptation

MATE

MATE is a prototype of a human-computer interface based on a society of reactive agents and on a language for the spatial description of tasks. Implemented as a text editor, this tool aims at showing that a piece of software (of an office automation type) can be built using the advantages of the agent paradigm and the power of scripting languages in order to make its interface more personalizable, more extendable and more intuitive for non-expert users [52].


5 State-of-the-art system architecture

5.1 DLNA

The Digital Living Network Alliance (DLNA) is a cross-industry organization of leading consumer electronics, computing industry and mobile device companies that share a vision of a wired and wireless network of interoperable consumer electronics (CE), personal computers (PC) and mobile devices in the home and on the road, enabling a seamless environment for sharing and growing new digital media and content services.

DLNA is focused on delivering interoperability guidelines based on open industry standards to complete the cross-industry digital convergence. DLNA has published a common set of industry design guidelines that allow manufacturers to participate in a growing marketplace of networked devices, leading to more innovation, simplicity and value for consumers. The DLNA Networked Device Interoperability Guidelines are use-case driven and specify the interoperable building blocks that are available to build platforms and software infrastructure.

The DLNA Networked Device Interoperability Guidelines refer to standards from established, open industry standards organizations and provide CE, PC and mobile device manufacturers with the information needed to build compelling, interoperable digital home platforms, devices and applications.


This figure shows the technology ingredients covered by the DLNA Networked Device Interoperability Guidelines.

The digital home consists of a network of CE, PC and mobile devices that cooperate transparently, delivering simple, seamless interoperability that enhances and enriches user experiences. It is the communications and control backbone for the home network and is based on IP networking, UPnP and Internet Engineering Task Force (IETF) technologies.

Information taken from http://www.dlna.org/en/industry/pressroom/DLNA_white_paper.pdf

5.2 mTag

Several new approaches on the market focus on smart tags, which enable not only a new way to point at and select a desired source of information but also let the end user initiate data access and direct the desired content to a terminal.

An example of this kind of approach is the mTag architecture. Focusing on smart environments and on offering a user interface through a distributed, event-driven architecture for discovering location-specific mobile Web services, mTag describes an architecture in which service discovery is initiated by touching a fixed RFID reader with a passive mobile RFID tag attached, e.g., to a phone; as a result, information about the available services is pushed to the user's preferred device [mTag].
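The event flow described above can be sketched as follows: a touch event identifies a location, the services registered for that location are looked up, and a notification is pushed to the user's preferred device. All identifiers and the push mechanism below are stand-ins; the real mTag architecture is described in [mTag].

```python
# A hedged sketch of the touch-triggered service discovery flow. The service
# registry, user identifiers and the push channel are invented for the example.

SERVICES_BY_LOCATION = {
    "bus_stop_17": ["timetable", "next_bus_alert"],
    "lobby_screen": ["visitor_guide"],
}

USER_PREFERRED_DEVICE = {"user42": "phone://user42"}

def push(device_uri: str, message: str) -> None:
    # Stand-in for a real push channel to the user's preferred device.
    print(f"push to {device_uri}: {message}")

def on_touch(user_id: str, location_id: str) -> None:
    """Handle a tag/reader touch event: look up services and notify the user."""
    services = SERVICES_BY_LOCATION.get(location_id, [])
    push(USER_PREFERRED_DEVICE[user_id],
         f"services at {location_id}: {', '.join(services) or 'none'}")

on_touch("user42", "bus_stop_17")
```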


As stated by the mTag project: "The principal advantage of the proposed architecture is that it can be realized with today's off-the-shelf commercial products. We presented a proposal for an Internet based deployment and two case studies, where prototype implementations were empirically evaluated in the true environment of use. The case studies showed that the service was found as an easy way to access location based mobile web services.

Users were satisfied with the possibility to fully control the information pushed to their devices, in comparison to the automatic location based information delivery of the comparative Bluetooth based service in the second case study." [mTag]

[mTag]: Korhonen J, Ojala T, Klemola M & Väänänen P (2006) mTag – Architecture for discovering location specific mobile web services using RFID and its evaluation with two case studies. Proc. International Conference on Internet and Web Applications and Services, Guadeloupe.

5.3 Content retrieval and device management

The delivery system will be managed like any other network system, but the devices present special challenges. The crucial insight is that content delivery is first and foremost a data management problem at multiple levels. Content delivery systems must be built around a set of database-related requirements: queryable metadata, secure and transactional distribution of data between databases, and an unbreakable linkage between content and its metadata. Additionally, the distributed system must be able to keep its application and configuration data under control to ensure proper functionality of the system and autonomic behavior from the end user's and device's point of view, without much need for user intervention.

This chapter describes a simple content distribution technique that enables a user to easily select content from the vast libraries that are available, download it, view it, and be charged for it.


The architecture of the presented system is based on a communicating network of database servers that manage all the data of the system. The next figure illustrates the different components at a conceptual level.

The system has the following components:

• The conceptual centerpiece of the system is formed by the Rendering Devices, which accept different types of content from multiple media sources.
• The Content Libraries contain digital content and the associated metadata.
• The Preference Server contains user-specific data related to content and usage of the system. Identity, authentication and saved queries are stored in the preference server.
• The Ontology Server maintains common ontology data that is shareable across the other components of the system. This data makes the content machine-searchable.
• The Configuration Management Server manages the configuration of the system and its devices.

The user's network terminal device (typically a PC or a set-top box) interacts with all the above host components using data synchronization over a protocol like HTTP. It can download new components to upgrade itself. It can download result sets for further local analysis. And of course, it can download content. It can also use the system to back up preferences, configurations, user data, and media that no longer fit on the device.


At the core of the presented approach is the Solid BoostEngine, a small-footprint relational database manager that provides all the typical functionality of a modern data manager, including the SQL language for defining schemas and queries, transactions, multi-user capabilities, support for programmability (procedures, triggers, events) and automatic data recovery. Applications and devices communicate with the data manager using the standard ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) application programming interfaces (APIs).

New advanced databases offer new ways to manage the required content and critical information based on application and user interface requirements. Solid BoostEngine has two separate storage methods: one for typical alphanumeric data, and a second mechanism optimized for the storage and retrieval of Binary Large Objects (BLOBs). In Solid, digital content can be handled within the database as efficiently as if the data were stored in operating system files. This provides relational database functionality for media content, a solution with many benefits (a minimal sketch follows the list below):

• The same API is used for accessing and distributing both alphanumeric and content data, which simplifies application design.
• Access to content and metadata can be combined in the same query, ensuring that property-rights data always accompanies content data.
• All data can be treated transactionally, meaning that changes to content and changes to metadata can be tightly linked.
• The DBMS protects all data in the system with a unified access control mechanism.
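The sketch below illustrates the "content and metadata in one database" idea using SQLite as a stand-in for the embedded relational engine (the actual system would access Solid BoostEngine through ODBC or JDBC). Table and column names are invented for the example.

```python
# Minimal sketch: content BLOBs and their metadata kept in one relational
# store and returned by a single query. SQLite stands in for Solid BoostEngine.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE content (
        content_id   INTEGER PRIMARY KEY,
        title        TEXT,
        rights_owner TEXT,   -- metadata kept in the same row as the content
        media        BLOB    -- the clip itself, stored as a BLOB
    );
""")
db.execute("INSERT INTO content VALUES (?, ?, ?, ?)",
           (1, "World tour news clip", "ExampleLabel", sqlite3.Binary(b"\x00\x01\x02")))

# One query returns metadata and content together, so the rights information
# always accompanies the media it describes.
row = db.execute("SELECT title, rights_owner, media FROM content "
                 "WHERE content_id = 1").fetchone()
print(row[0], "/", row[1], "/", len(row[2]), "bytes")
```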

The data distribution component of the Solid Platform is the Solid SmartFlow Option. It links together a set of loosely coupled, cooperative databases that share data with one another under strict integrity and security rules. Key aspects of the architecture include the following (a simplified illustration follows the list):

• A hierarchical relationship of master and replica databases.
• A publish/subscribe mechanism for distributing data from a master database to one or more replica databases.
• A transaction propagation mechanism for forwarding local changes from a replica database to its master.
• Transactional and recoverable message queuing for data transfer between databases.
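The master/replica pattern listed above can be illustrated with a small in-memory analogy: replicas subscribe to publications, the master refreshes them, and local changes are propagated back to the master and onwards to the other subscribers. This is only an analogy for the SmartFlow mechanisms, not their actual protocol or API.

```python
# In-memory analogy of a publish/subscribe master/replica arrangement.
# Publication names and data are invented for the example.

class Master:
    def __init__(self):
        self.rows = {}           # publication name -> list of rows
        self.subscribers = {}    # publication name -> list of replicas

    def publish(self, name, rows):
        self.rows[name] = list(rows)

    def subscribe(self, name, replica):
        self.subscribers.setdefault(name, []).append(replica)
        replica.refresh(name, self.rows.get(name, []))

    def propagate(self, name, new_row):
        # A change forwarded from a replica is applied and redistributed.
        self.rows.setdefault(name, []).append(new_row)
        for replica in self.subscribers.get(name, []):
            replica.refresh(name, self.rows[name])

class Replica:
    def __init__(self, label):
        self.label, self.local = label, {}

    def refresh(self, name, rows):
        self.local[name] = list(rows)
        print(f"{self.label} now has {name}: {self.local[name]}")

master, phone = Master(), Replica("phone")
master.publish("CONTENT_OF_REPLICA", ["clip_1"])
master.subscribe("CONTENT_OF_REPLICA", phone)
master.propagate("CONTENT_OF_REPLICA", "clip_2")
```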

The content delivery network will be a very large system with numerous different components under the control of a variety of entities. Such a system must be designed for manageability from the ground up. Recent developments in Autonomic Computing show promise in this area. Autonomic systems are self-configuring, self-healing, self-optimizing and self-protecting, so that they effectively take care of themselves without much need for user intervention. The delivery system will be managed like any other network system, but the devices present special challenges.


Device management includes at least the following tasks:

• Managing user identification and authentication.
• Automatically installing and upgrading software on local devices.
• Maintaining valid software configurations without requiring user interaction.
• Backing up and/or deleting unused software and content from devices.
• Transferring user preferences from one device to another.

The configuration manager holds data relating to the system configuration. This includes applications that may be needed by terminals and rendering devices. The configuration management data can be divided into the following components:

• Version "header information".
• Application binaries (Java classes and resources) of the new version.
• SQL scripts needed to create or upgrade the database schemas.
• State information about each of the managed nodes.
• Log information for troubleshooting purposes.

All system configuration management operations are performed by preparing the required configuration as a publication in the master and then distributing it to the managed terminals and rendering devices through data synchronization. After refreshing the local copy of the management data, the managed device may run some installation procedures (e.g. executing schema-upgrade SQL scripts in the target database) to complete the task.
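A device-side view of this procedure is sketched below: the managed device compares the version header of a refreshed publication with its installed version and, when newer, executes the bundled schema-upgrade scripts. SQLite stands in for the local database, and the publication fields are assumptions made for the example.

```python
# Hedged sketch of applying a refreshed configuration publication on a device.
import sqlite3

def apply_configuration(db: sqlite3.Connection, installed_version: int, publication: dict) -> int:
    """Bring the local database in line with a refreshed configuration publication."""
    if publication["version"] <= installed_version:
        return installed_version                     # already up to date
    for script in publication["sql_scripts"]:        # e.g. schema-upgrade scripts
        db.executescript(script)
    # A real device would also install the application binaries and report its
    # new state back to the configuration manager for troubleshooting.
    return publication["version"]

device_db = sqlite3.connect(":memory:")
publication = {
    "version": 2,
    "sql_scripts": ["CREATE TABLE IF NOT EXISTS preferences (key TEXT, value TEXT);"],
}
print("installed version:", apply_configuration(device_db, 1, publication))   # 2
```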

Centralizing configuration data in this way solves the important problem of knowing the state of any managed node at any point in time. The configuration manager can move that state to a new consistent state by asking the device to subscribe to a new publication or to refresh an old one.

The rendering device: in order to provide the media service to the end user, the rendering device acquires applications and content data from the four components mentioned above. Within the database of this device, data may be organized as shown in Figure 7.


The diagram shows that the rendering device operates on data that it obtains from a number of sources. The data is organized into logical databases, each of which may be synchronized with the source (master database) of the data. Much of this data is downloaded or pushed to the device as needed.

The sequence of steps needed to query video content from a content library and deliver it to a rendering device has been outlined earlier in this document in the section on System Functionality. Figure 8 below shows how the various information resources contribute to resolving a user's query.


Queries can take advantage of any or all of the metadata associated with the media in order to focus down on the desired content. Figure 8 shows the use of two types of metadata: enumerated and free text. The user queries against both of them.

Users may retain their queries for reuse. In our use case, Amy wants to find recent video news clips that she has not yet seen about her favorite rock band's world tour. She may wish to re-execute this query every few days to find recent news. Each query is made up of a single row in the CONTENT_QUERY table, which is linked to one or more rows in the enumerated and free-text tables, each of which represents a condition that must be met with regard to the content.

The matchmaking procedure finds clips where the metadata and query items match, and produces rows in a QUERY_MATCH table. This table has a separate entry for each piece of content whose metadata matches the query criteria. In this example the criteria are: Amy's favorite band, news clips, not yet seen. In the real world, the query may also interact with Amy's preferences about which news sources she prefers and how much she is willing to pay for this kind of content.

The packaging procedure goes through the QUERY_MATCH table and creates rows in the SEGMENT_ASSIGNMENT table for all content that matches the query and that has not yet been assigned to the rendering device. This step protects Amy from inadvertently downloading the same content twice. Amy will interact with this list, either directly or through matching to her preferences, to determine what she will actually download. Rows in this table will be used to parameterize Amy's content publication so that it defines the content of current interest to her.
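The matchmaking and packaging steps can be made concrete with a small SQL sketch; SQLite stands in for the rendering device's local database, the table names follow the text above, and the column names are assumptions.

```python
# Sketch of matchmaking (query vs. metadata) and packaging (not yet assigned).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE content            (content_id INTEGER, band TEXT, kind TEXT);
    CREATE TABLE content_query      (query_id INTEGER, band TEXT, kind TEXT);
    CREATE TABLE query_match        (query_id INTEGER, content_id INTEGER);
    CREATE TABLE segment_assignment (content_id INTEGER);
    INSERT INTO content VALUES (1, 'FavouriteBand', 'news'),
                               (2, 'OtherBand',     'news'),
                               (3, 'FavouriteBand', 'concert');
    INSERT INTO content_query VALUES (7, 'FavouriteBand', 'news');
""")

# Matchmaking: content whose metadata satisfies the stored query conditions.
db.execute("""
    INSERT INTO query_match (query_id, content_id)
    SELECT q.query_id, c.content_id
    FROM content_query q JOIN content c ON c.band = q.band AND c.kind = q.kind
""")

# Packaging: matched content that has not yet been assigned to this device.
db.execute("""
    INSERT INTO segment_assignment (content_id)
    SELECT m.content_id FROM query_match m
    WHERE m.content_id NOT IN (SELECT content_id FROM segment_assignment)
""")

print(db.execute("SELECT content_id FROM segment_assignment").fetchall())   # [(1,)]
```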



At this point, the rendering device is able to obtain content by forwarding a refresh request to the content library, asking it to refresh the data of the CONTENT_OF_REPLICA (replica ID) publication. It is here that the content assigned to a replica is downloaded to the device or terminal.

Because of the vast quantity of digital content, providing users with an easy way to locate content of interest to them is key to the usability of the system. Technically, this comes down to giving users an intuitive way to create queries against content metadata stores. It must be easy for both the naïve and the skilled user to define a query over a range of media servers. Queries must provide powerful and flexible search functions, including ways to select by the content of the media. Searches must be efficient, i.e. fast to execute. Users of the system must be able to retain queries for re-execution against new media or other media servers.


6 References

[1] M. Boliek, C. Christopoulos, and E. Majani, "JPEG2000 Part I Final Draft International Standard," ISO/IEC JTC1/SC29/WG1, Report, 25 September 2000.
[2] J. Editors, "JPEG-2000 image coding system - Part 11: Wireless JPEG-2000 - Committee Draft," ISO/IEC/SC29/WG1 (JPEG), CD, 2005.
[3] H. M. Radha, M. v. d. Schaar, and Y. Chen, "The MPEG-4 Fine-grained Scalable Video Coding for Multimedia Streaming over IP," IEEE Transactions on Multimedia, vol. 3, pp. 53-68, 2001.
[4] W. Li, "Streaming Video Profile in MPEG-4," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 301-317, 2001.
[5] C. Brislawn and P. Schelkens, "JPEG 2000 Part 12: Extensions for Three-Dimensional and Floating Point Data - Scope and Requirements document, draft version 1," ISO/IEC JTC1/SC29/WG1, Sydney, Australia, Report WG1N2378, November 12-16, 2001.
[6] ISO/IEC, "JPEG 2000 image coding system – Part 11: Wireless JPEG 2000," ISO/IEC JTC1/SC29/WG11, N3386, 2004.
[7] S.-J. Choi and J. W. Woods, "Motion-compensated 3-D subband coding of video," IEEE Transactions on Image Processing, vol. 8, pp. 155-167, 1999.
[8] S. Han and B. Girod, "SNR Scalable Coding with Leaky Prediction," ITU-T Q.6/SG16, VCEG-N53, 2001.
[9] H. C. Huang, C.-N. Wang, and T. Chiang, "A Robust Fine Granularity Scalability Using Trellis Based Predictive Leak," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, pp. 372-385, 2002.
[10] F. Wu, S. Li, and Y.-Q. Zhang, "A Framework for Efficient Progressive Fine Granularity Scalable Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 332-344, 2001.
[11] Y. He, R. Yan, F. Wu, and S. Li, "H.26L-based fine granularity scalable video coding," ISO/IEC JTC1/SC29/WG1, M7788, December 2001.
[12] F. Wu, S. Li, R. Yan, X. Sun, and Y.-Q. Zhang, "Efficient and Universal Scalable Video Coding," presented at the IEEE International Conference on Image Processing (ICIP), Rochester, NY, USA, 2002.
[13] J.-R. Ohm, "Three-dimensional subband coding with motion compensation," IEEE Transactions on Image Processing, vol. 3, pp. 559-571, 1994.
[14] B. Pesquet-Popescu and V. Bottreau, "Three Dimensional Lifting Schemes for Motion Compensated Video Compression," presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City, Utah, USA, 2001.
[15] A. Secker and D. Taubman, "Motion-Compensated Highly Scalable Video Compression using Adaptive 3D Wavelet Transform Based on Lifting," presented at the IEEE International Conference on Image Processing (ICIP), Thessaloniki, Greece, 2001.
[16] A. Secker and D. Taubman, "Lifting-Based Invertible Motion Adaptive Transform (LIMAT) Framework for Highly Scalable Video Compression," IEEE Transactions on Image Processing, vol. 12, pp. 1530-1542, 2003.
[17] P. Chen and J. W. Woods, "Bidirectional MC-EZBC with Lifting Implementation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 1183-1194, 2004.



[18] J. W. Woods and J.-R. Ohm, "Special issue on subband/wavelet interframe video coding," Signal Processing: Image Communication, vol. 19, 2004.
[19] D. S. Turaga, M. v. d. Schaar, Y. Andreopoulos, A. Munteanu, and P. Schelkens, "Unconstrained Motion Compensated Temporal Filtering (UMCTF) for Efficient and Flexible Interframe Wavelet Video Coding," Signal Processing: Image Communication, to appear.
[20] T. Wiegand and G. Sullivan, "Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification," ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, 2003.
[21] SMPTE 421M: VC-1 Compressed Video Bitstream Format and Decoding Process. http://www.microsoft.com/windows/windowsmedia/forpros/events/NAB2005/VC-1.aspx
[22] http://www.w3.org/XML/
[23] Transformation with XSL. http://www.adobe.com/designcenter/indesign/articles/indcs2at_xsl/indcs2at_xsl.pdf
[24] XML Transformation Flow Processing. http://www.mulberrytech.com/Extreme/Proceedings/typesetpdf/2001/Euzenat01/EML2001Euzenat01.pdf
[25] Grundy, J. and Yang, B., "An environment for developing adaptive, multi-device user interfaces," in Proceedings of the Fourth Australasian User Interface Conference on User Interfaces 2003 - Volume 18 (Adelaide, Australia), R. Biddle and B. Thomas, Eds., ACM International Conference Proceeding Series, vol. 36, Australian Computer Society, Darlinghurst, Australia, pp. 47-56, 2003.
[26] Grundy, J. and Zou, W., "AUIT: Adaptable User Interface Technology, with Extended Java Server Pages," in Seffah, A. and Javahery, H. (eds.), Multiple User Interfaces: Cross-platform Applications and Context-aware Interfaces, pages 149-167, Wiley, 2004.
[27] SEESCOA. http://www.cs.kuleuven.ac.be/cwis/research/distrinet/projects/SEESCOA/
[28] Complete list of web browsers (including mobile browsers or micro-browsers). http://en.wikipedia.org/wiki/List_of_web_browsers
[29] TWEEP – Design and implementation of a multilingual Web server with adapted interfaces to PC and Television. http://www.vicomtech.es/ingles/html/proyectos/index_proyecto46.html
[30] Sitemesh: web-page layout and decoration framework. http://today.java.net/pub/a/today/2004/03/11/sitemesh.html
[31] CC/PP Information Page. http://www.w3.org/Mobile/CCPP/
[32] RDF (Resource Description Framework). http://www.w3.org/RDF/
[33] Device Independence activity of the W3C. http://www.w3.org/2001/di/
[34] Delivery Context. http://www.w3.org/TR/2005/WD-DPF-20051111/
[35] JSR 188. http://jcp.org/aboutJava/communityprocess/final/jsr188/index.html
[36] http://www.openmobilealliance.org/
[37] White Paper on UAProf Best Practices Guide. http://www.openmobilealliance.org/docs/OMA-WP-UAProf_Best_Practices_Guide-20060718-A.pdf
[38] Example of Web UI adaptation. http://users.tkk.fi/~majakobs/thesis/WebUIAdaptation.pdf
[39] UIML. http://www.uiml.org/
[40] XIML (eXtensible Interface Markup Language).



[41] Paternò, F. and Santoro, C., "One model, many interfaces," in Ch. Kolski and J. Vanderdonckt (Eds.), Proceedings of the 4th International Conference on Computer-Aided Design of User Interfaces CADUI'2002 (Valenciennes, 15-17 May 2002), pages 143-154, Dordrecht, Kluwer Academic Publishers, 2002.
[42] Paternò, F., Mancini, C., Meniconi, S., "ConcurTaskTrees: A Diagrammatic Notation for Specifying Task Models."
[43] Zimmermann, G., Vanderheiden, G., Gilman, A., "Prototype Implementations for a Universal Remote Console Specification," in CHI'2002, Minneapolis, MN, 2002, pp. 510-511.
[44] LASeR and SAF. http://www.mpeg-laser.org
[45] ISO/IEC 14496-1: Systems. http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=38559
[46] ISO/IEC 14496-2: Visual. http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=39259
[47] ISO/IEC 14496-3: Audio. http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=42739
[48] ISO/IEC 14496-10: Advanced Video Coding. http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=43058
[49] ISO/IEC 14496-20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF). http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=41650
[50] ISO/IEC 14496-11: Scene description and application engine. http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=38560
[51] http://www.fipa.org/specs/fipa00091/PC00091A.html
[52] Siléo, C. and Hutzler, G., "MATE: un éditeur de texte basé sur une société d'agents réactifs," RSTI/hors série, JFSMA 2003.
[53] Video Impression. http://www.arcsoft.com/products/videoimpression/
[54] PhotoBase Deluxe. http://www.arcsoft.com/products/mobiledevicesolution/photo.asp
[55] IRIS. http://www.iris.tv/indexFlash.htm
[56] 3rdi. http://www.3rdisecure.tv/domestic_products.asp
[57] DLink DCS-2120 Wireless Internet Camera with 3G Mobile Video Support. http://www.dlink.com/products/?pid=500&sec=0
[58] http://www.neiongfx.com/neion-video-surveillance-mobile.html
[59] Nioo Visio. http://visiowave.com/
[60] 3rdeye. http://www.3rdeye.ro/index.php?mod=aplic
[61] http://www.dlna.org
