D3.1 Deliverable Description of the state-of-the-art ... - Hitech Projects
D3.1 Deliverable
Description of the state-of-the-art
CANTATA
Project number: ITEA05010
Document version no.: 1.0
Status: Final
Edited by: Dominique Segers, Barco, Belgium
Thursday, 26 April 2007
ITEA Roadmap domains:
Major: Services & Software creation
Minor: Cyber Enterprise
ITEA Roadmap technology categories:
Major: Content
Minor: Data and content management
History:

Version  Date        Remarks
v0.10    8/11/2006   Initial document start by Dominique Segers, Barco
v0.11    21/11/2006  First compilation by Dominique Segers, Barco
v0.12    21/11/2006  Second compilation by Dominique Segers, Barco
v0.13    22/11/2006  Edit after input from CodaSystem
v0.14    15/12/2006  New structure
v0.15    19/12/2006  New structure and sections by Juana Sánchez, Telefónica
v0.16    20/12/2006  Edit after new input from Telefónica
v0.17    12/01/2007  Edit after new input from CodaSystem
v0.18    18/01/2007  Edit after input from Solid
v0.19    06/02/2007  Edit after review from Egbert, LogicaCMG
v1.0     26/04/2007  Final approval by the PMT
Contributors:
Dominique Segers, Barco
Ismael Fuentes, I&IMS
Juana Sánchez Pérez, Telefónica
John de Vet, iLab
Jorma Palo, Solid
Johannes Peltola, VTT
Raoul Djeutane, CodaSystem
Gorka Marcos Ortego, VicomTech
Nicolas Damien, Centre Henri Tudor
This document will be treated as strictly confidential. It will only be made available to those who have signed the ITEA Declaration of Non-Disclosure.
TABLE OF CONTENTS
1 Introduction
1.1 The Aim of the activity
1.2 Potential Partners contributions
2 State-of-the-art User Interfaces of applications and services
2.1 State-of-the-art UI of applications on mobile phones
2.1.1 Introduction
2.1.2 Video applications on mobile phones
2.1.2.1 VideoImpression - Mobile Edition [53]
2.1.3 Photo applications on mobile phones
2.1.3.1 PhotoBase Deluxe - Mobile Edition [54]
2.1.4 Video Surveillance over IP
2.1.4.1 IRIS [55]
2.1.4.2 The 3rdi Security System [56]
2.1.4.3 DLink DCS-2120 Wireless Internet Camera with 3G Mobile Video Support [57]
2.1.4.4 NIOO VISIO [58]
2.1.5 Video Surveillance over IP with content analysis on server
2.1.5.1 Visio Wave [59]
2.1.5.2 3rdeye - Video Surveillance on Your Mobile [60]
2.1.6 Interactive composition and scene mixing
2.2 UI of services for IP-enabled TV and Set-Top Boxes
2.2.1 On-line services for IP-enabled TV and Set-Top Boxes
2.2.2 Flash-based content adaptation in Set-Top Boxes
3 State-of-the-art Compression Algorithms
3.1 Motion JPEG-2000 and Wireless (Part 11) JPEG-2000
3.1.1 Introduction
3.1.2 Scope and Features of Motion JPEG-2000
3.1.3 Scope and Features of Wireless JPEG-2000
3.1.4 Video Coding with Motion Compensated Prediction
3.2 Codification technologies
3.2.1 Introduction
3.2.2 MPEG-1 and MPEG-2
3.2.3 MPEG-4
3.2.3.1 MPEG-4 architecture
3.2.3.2 CODECs (MPEG-4 Visual and MPEG-4 Audio)
3.2.3.3 MPEG-4 Systems (BIFS)
3.2.3.4 MPEG-4 Part 20 (LASeR and SAF [44])
3.3 Additional formats for more powerful devices (future)
3.3.1 VC1 [21]
3.3.2 Device-oriented screens
3.4 Analysis of state-of-the-art image compression algorithms for medical applications
3.4.1 Still image compression such as JPEG, JPEG-LS and JPEG-2000
3.4.2 Intra-frame image compression such as MJPEG-2000
3.4.3 Inter-frame image compression such as MPEG-4 AVC
4 User Interface Adaptation
4.1 Introduction
4.2 MPEG-4 Advanced Content visualization technologies
4.2.1 Software BIFS reproducers
4.2.2 GPAC: Osmo4
4.2.3 IBM: M4Play
4.2.4 Envivio TV
4.2.5 Bitmanagement: BS Contact MPEG-4
4.2.6 Octaga Professional
4.2.7 Digimax: MAXPEG Player
4.2.8 COSMOS
4.3 UI adaptation based on XML
4.3.1 UI adaptation based on XML transformation
4.3.2 Adaptation via XML publishing servers
4.3.3 Adaptation based on the definition & identification of the device
4.3.3.1 Composite Capabilities / Preference Profiles
4.3.3.2 UAProf (OMA)
4.3.3.3 Device Description Repository
4.3.4 XML-based UI adaptation
4.3.4.1 UIML (User Interface Markup Language)
4.3.4.2 AUIML
4.3.4.3 XIML (eXtensible Interface Markup Language)
4.3.4.4 XUL
4.3.4.5 TERESA XML
4.3.4.6 USIXML
4.3.4.7 AAIML [43]
4.3.4.8 XForms and RIML
4.3.4.9 MPEG-21
4.4 Device ontology
4.5 Agent-based user interface adaptation
5 State-of-the-art system architecture
5.1 DLNA
5.2 mTag
5.3 Content retrieval and device management
6 References
1 Introduction
1.1 The Aim of the activity
The activity preparing the production of Deliverable 3.1 encompasses all the topics addressed in WP 3:
• Topic 3.1 Device-oriented UI adaptation.
• Topic 3.2 User-oriented UI adaptation.
• Topic 3.3 Content-oriented UI adaptation.
• Topic 3.4 Presentation and interaction with users.
It should also map onto the different domains that are targeted within the CANTATA project:
• Multimedia consumer.
• Medical imagery.
• Surveillance.
Deliverable D3.1 thus aims at establishing an analysis of the state of the art of the WP technologies that is as complete as possible.
1.2 Potential Partners contributions:
• Barco.
• I&IMS.
• Telefonica.
• iLab.
• Solid.
• VTT.
• CodaSystem.
• VicomTech.
• Centre Henri Tudor.
Remark VTT:
Since VTT is still an unfunded partner and WP2 management work takes all spare resources, VTT cannot promise to participate in WP3 until it has received funding. VTT may receive funding in early 2007 if all goes well.
2 State-of-the-art User Interfaces of applications and services
2.1 State-of-the-art UI of applications on mobile phones
2.1.1 Introduction
This document describes the state of the art of UIs of video applications on mobile phones. This state of the art consists of presenting some existing video applications that run on mobile phones and describing their principal functionality.
2.1.2 Video applications on mobile phones
The applications below give a summary of what is currently done in video applications on mobile phones.
2.1.2.1 VideoImpression - Mobile Edition [53]
VideoImpression is a solution developed by ArcSoft.
This application allows users to create and share custom mini-movies featuring their own videos, photos and slide shows, with custom animated titles, credit screens, soundtracks and scene transitions.
These are the principal functionalities:
• Capture video on your mobile device.
• Play back video you record, download, or receive from friends.
• Trim video clips.
• Combine multiple clips together.
• Add transition effects between clips.
• Add titles and credits.
• Share your movies via infrared, Bluetooth, e-mail, or MMS.
• File format support: ASF, 3GP, MP4 for video; PCM, ADPCM, MP3, and AMR for audio.
• Video codec support: H.263, MPEG-4.
2.1.3 Photo applications on mobile phones
2.1.3.1 PhotoBase Deluxe - Mobile Edition [54]
PhotoBase is another application developed by ArcSoft.
These are the key features of this application:
ArcSoft Panorama Maker
Designed specifically for low-profile devices; your customers can capture multiple photos and have them automatically stitched together.
Auto Red-eye Removal
Give your customers this quick fix that instantly and automatically removes pesky red-eye.
Still Image Capture
When using a camera phone, it is important to have an intuitive application that allows users to capture stunning pictures. The Still Image Capture component offers several quality enhancement options for your images on the device. Components include:
• White Balance (hardware solution).
• Brightness and Contrast.
• Digital Zoom.
• JPEG Encoding.
Edit and Enhancement
A variety of editing and enhancement functions are provided, such as red-eye removal, crop and rotate. Users can edit their photos before they store or share them.
Media Management and Sharing
With PhotoBase Deluxe, your customers can manage their photos when they are on the go. This application provides a complete solution, allowing your customers to sort, album, display, and label their images. Instantly create a slide show with cool transition effects and sound. Users can share their images through Bluetooth, MMS, infrared and e-mail.
Fun Features
PhotoBase Deluxe provides a variety of fun features and content. The Panorama Maker feature provides instant photo stitching capabilities on your mobile device. Add clip art, fun frames, and text to any image. Download more content for special holidays and occasions.
2.1.4 Video Surveillance over IP
2.1.4.1 IRIS [55]
IRIS cameras are able to transmit live or recorded video to your mobile phone over a standard mobile phone network. When you want to see what is going on, just use the IRIS viewing software on your mobile phone to connect to your camera via the IRIS Control Centre.
IRIS cameras can also detect, through their sensors, when an intruder has entered your home. When an alarm is triggered on your camera, the IRIS Control Centre sends you a text message alert. You can then look at a recording of the event that set the camera off, or see what is happening now.
2.1.4.2 The 3rdi Security System [56]
3rdi cameras can detect when an intruder has entered your home using infrared and motion sensors. When an alarm is triggered on your camera, the 3rdi control centre sends you a text message alert. You can then look at a recording of the event that triggered the camera, or see what is happening now. Even if your phone is switched off when the alert is sent to you, video of the event is stored at the 3rdi control centre for up to 30 days, so you can look at it when it is most convenient for you.
You can also see what is happening at the camera location by simply accessing it via your mobile phone.
2.1.4.3 DLink DCS-2120 Wireless Internet Camera with 3G Mobile Video Support [57]
The DCS-2120 is a wireless Internet security camera developed by D-Link which allows a place to be watched and observed remotely. It can connect to your network through a Fast Ethernet port. The camera can also send alert messages (e-mails) if it detects suspicious movement.
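The motion-triggered alerting described above can be approximated with simple frame differencing. This is a generic sketch, not D-Link's actual detection algorithm; the threshold values and the e-mail callback are assumptions for illustration.

```python
def motion_detected(prev_frame, curr_frame, pixel_thresh=25, area_thresh=0.02):
    """Return True when enough pixels changed between two grayscale frames.

    Frames are lists of rows of 0-255 intensity values, a toy stand-in
    for real camera frames; both thresholds are illustrative assumptions.
    """
    changed = sum(
        1
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
        if abs(p - c) > pixel_thresh
    )
    total = len(curr_frame) * len(curr_frame[0])
    return changed / total > area_thresh


def maybe_send_alert(prev_frame, curr_frame, send_email):
    """Fire the e-mail alert callback when motion is detected."""
    if motion_detected(prev_frame, curr_frame):
        send_email("Suspicious movement detected")
        return True
    return False
```

A real camera applies such a test to every captured frame pair and only sends an alert when the changed area exceeds a configured sensitivity.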
Here are the specifications of this camera.
3G mobile video from your phone and more
The DCS-2120 offers both consumers and small businesses a flexible and convenient way to remotely monitor a home or office in real time from anywhere within a mobile phone's 3G service area. When used in conjunction with the e-mail alert system, mobile users can now view a camera feed without a notebook PC and wireless hotspot. This live video feed can be accessed through 3G cellular networks by compatible cell phones.
In addition to cellular phone monitoring, the 3GPP/ISMA video format also enables streaming playback on a computer. The camera is also viewable from any Internet Streaming Media Alliance (ISMA) compatible device and offers support for viewing with RealPlayer 10.5 and QuickTime 6.5. The DCS-2120 supports resolutions up to 640x480 at up to 30 fps, depending on the compression rate.
Convenient management options
D-Link's IP surveillance camera management software is included to enhance the functionality of the DCS-2120. Manage and monitor up to sixteen compatible cameras simultaneously with this program. The IP surveillance software can be used to archive video straight to a hard drive or network-attached storage device, play back video, and set up motion detection to trigger video/audio recording or send e-mail alerts. Alternatively, it is possible to access and control the DCS-2120 via the web using Internet Explorer. While watching remote video obtained by the DCS-2120, it is possible to take snapshots directly from the web browser to a local hard drive, making it ideal for capturing any moment no matter where you are.
This is the diagram of this system.
2.1.4.4 NIOO VISIO [58]
Nioo Visio is a solution developed by Neion Graphics which enables remote visualization without any constraint.
This application allows connecting to one or many cameras, zooming, and remotely capturing photographs, from a PDA or smartphone.
It allows, for example, seeing what is happening at your home when you are not present.
Conclusion:
From the application examples above, we conclude that the video applications that exist now allow creating, generating and managing videos and pictures. These applications do not allow managing the media, or performing any action, depending directly on the content of the media.
2.1.5 Video Surveillance over IP with content analysis on server
Other solutions exist, based on video content analysis, which generate an action depending on what happens in front of a camera. One example is a solution developed by VisioWave.
2.1.5.1 Visio Wave [59]
VisioWave developed a solution for video content analysis based on the scheme below:
Many cameras are connected to a server that analyses the video. When there is a problem, an alert is generated and sent to a PDA, PC or some other device, and the end user who has the device can connect directly to the remote camera and see what is happening at that moment.
2.1.5.2 3rdeye - Video Surveillance on Your Mobile [60]
3rdeye is a video surveillance system for mobile phones developed by the Romanian company Cratima. With the help of a mobile phone and of the 3rdeye system, you can view live images from any location watched by a video camera. This location can be your own home, office, vacation home, store or even a parking space. The quality of the images is high, thanks to the GPRS transmission mode.
3rdeye's Architecture
3rdeye consists of two applications:
• The video server (to which the monitoring video cameras are connected).
• The client application, which runs on the user's mobile phone.
The server application is itself divided into two components: the Video Grabbing Server, which receives the images straight from the video cameras and sends them to the Video Streaming Server. This last component is responsible for properly sending the received images on to the client application on the mobile phone.
A two-way exchange of information takes place between the two basic software modules of the Video Surveillance Server.
The Video Grabbing Server grabs video images from the video cameras and sends them, in digital format, to the Video Streaming Server (in order to prepare the video streams for the clients), while the Video Streaming Server sends back to the Video Grabbing Server the commands and control information received from the client application.
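The two-way exchange described above can be modelled as two cooperating components: frames flow from the grabbing side to the streaming side, while control commands flow back. The class names, message format and the zoom command here are illustrative assumptions, not Cratima's actual interfaces.

```python
from collections import deque


class VideoGrabbingServer:
    """Grabs frames from cameras and applies control commands sent back
    by the streaming side (names and messages are illustrative)."""

    def __init__(self):
        self.zoom = 1.0

    def grab(self, camera_frame):
        # Tag each grabbed frame with the current camera settings.
        return {"frame": camera_frame, "zoom": self.zoom}

    def apply_command(self, command):
        # Control info relayed from the client application.
        if command.get("type") == "zoom":
            self.zoom = command["value"]


class VideoStreamingServer:
    """Buffers prepared frames for clients and relays their commands."""

    def __init__(self, grabber):
        self.grabber = grabber
        self.buffer = deque()

    def ingest(self, camera_frame):
        self.buffer.append(self.grabber.grab(camera_frame))

    def relay_client_command(self, command):
        self.grabber.apply_command(command)
```

In this toy model, a client zoom request travels back through the streaming server and changes how all subsequent frames are grabbed, mirroring the command/control path described in the text.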
The client application can be configured to connect to Video Surveillance Servers that have either a fixed IP address or a dynamically allocated one (e.g. dial-up). When the server has a fixed IP address, the client application will connect straight to the Video Surveillance Server.
When the server's IP address is dynamically allocated (a different IP address from one connection to another), the client application will first interrogate the Fixed IP Address Server, which is permanently connected to the Internet, in order to obtain the IP address of the Video Surveillance Server to which it is about to connect.
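The connection logic above amounts to a small address-resolution step: connect directly when the address is fixed, otherwise ask the always-connected Fixed IP Address Server first. The configuration keys and the directory interface below are assumptions for illustration, not Cratima's actual protocol.

```python
def resolve_server(config, directory_lookup):
    """Return the address of the Video Surveillance Server to connect to.

    config: a dict with "dynamic" (bool), plus "address" for fixed-IP
    servers or "server_id" for dynamically allocated ones.
    directory_lookup: callable mapping a server id to its current
    address, standing in for the Fixed IP Address Server.
    """
    if not config.get("dynamic"):
        return config["address"]  # fixed IP: connect directly
    # dynamic IP: interrogate the directory for the current address
    return directory_lookup(config["server_id"])
```

For example, a client with a fixed-IP configuration skips the directory entirely, while a dial-up server is looked up by its identifier on every connection.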
3rdeye's Functionality
3rdeye allows you to watch in real time, on your Java-enabled mobile phone (not necessarily 3G), the images provided by the video cameras connected to the video server, and to control the position of the video cameras (Pan/Tilt/Zoom).
The connection to the video server is made through the Internet, using a GPRS connection (as already stated, neither a 3G connection nor a "smart phone" is required). The received image can be presented both full screen and in normal view, and has multiple display modes: full frame, 1:2, 1:1 (in this last case, the application provides scrolling and an auto-detection feature).
The moment a client application connects to the server, it immediately sends the server information about the maximum size of the phone's display, so that the server automatically adjusts the video images (width x height). Using proprietary motion detection and tracking technology developed by Cratima Software, the Video Grabbing Server records all the events that occur, along with the corresponding motion images.
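The width x height adjustment the server performs can be illustrated with a simple aspect-preserving fit computation. The actual scaling Cratima's server applies is not documented here; this is only a plausible sketch.

```python
def fit_to_display(video_w, video_h, max_w, max_h):
    """Scale video dimensions to fit the display size reported by the
    client, preserving aspect ratio (illustrative computation only)."""
    # Pick the tightest constraint; never upscale beyond the source.
    scale = min(max_w / video_w, max_h / video_h, 1.0)
    return int(video_w * scale), int(video_h * scale)
```

For instance, a 640x480 source fitted to a 176x208 phone display comes out at 176x132, so the whole frame is visible without distortion.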
All the recorded events can be viewed from the client's 3rdeye mobile phone application.
3rdeye's Applicability
3rdeye has multiple usages and, being developed end-to-end by Cratima, can be customized for every client's needs:
• Managing employee conduct and duties from remote locations.
• Off-site monitoring of homes, cottages, shops, offices, factories, warehouses, cars, boats.
• Child care monitoring at home, nurseries, kindergartens and schools, or observing the well-being of senior citizens and disabled people.
• Pet/weather watch; snow or traffic conditions; construction site video surveillance.
Conclusion:
From our study, we can conclude that for the moment there is no solution for video content analysis on the mobile phone itself. The solutions that exist now allow modifying and managing media. There are solutions that include mobile phones in their platform, but the content analysis is done elsewhere.
2.1.6 Interactive composition and scene mixing
The scene is the composition of the audiovisual elements that are shown to the user. Initially it is generated in the corresponding server. The user, interacting with his device, acts on the scene elements, updating the scene according to his preferences. The scene composition, and therefore the way of interacting with it, can be done in different ways.
Composition or mixing in the server
The scene and all the components that compose it are mixed into one single stream that is sent to the client.
When the user interacts with the application to modify the scene, the server receives the corresponding commands to compose the scene again and sends it as a single stream for every user.
The bandwidth is proportional to the number of users, because every user is served with a video stream encoded specifically for him.
This requires a powerful video server able to decode the video elements, compose them and re-encode them in real time. In these terms, it must simultaneously encode as many streams as there are users.
Current technologies:
• Video editing tools. Some tools can perform this complete process, but they are designed for video post-production. Some of them can generate video in real time for live broadcasts, yet all of them have a graphical operator interface and lack a programmatic interface (API), so they cannot be used to provide interactivity with the user.
• Decoding and mixing using a frame server. The frame mixing can be done with a frame server. Frame servers are oriented towards video post-production, but some developments allow a certain degree of personalisation, although the interactivity is limited. AviSynth is a frame server composed of several APIs that can be used both by the player and by the video server. In this case, it would have to be installed in the video server for VoD, or in the multicast transmitter for TV channels.
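At its core, the mixing a frame server performs is a per-frame blend of pixel arrays. The sketch below (plain Python with hypothetical grey-level frames, not a real frame-server API) illustrates the compositing step that happens for every output frame before the mixed result is re-encoded:

```python
def mix_frames(background, overlay, top_left, alpha=1.0):
    """Composite an overlay frame onto a background frame.

    Frames are lists of rows of grey-level pixels (0-255); `top_left`
    is the (row, col) position of the overlay inside the background.
    This mimics the per-frame mixing a frame server performs before
    the composed frame is re-encoded.
    """
    out = [row[:] for row in background]
    r0, c0 = top_left
    for r, row in enumerate(overlay):
        for c, pix in enumerate(row):
            old = out[r0 + r][c0 + c]
            out[r0 + r][c0 + c] = int((1.0 - alpha) * old + alpha * pix)
    return out

# Example: a dark background with a white logo blended at 50% opacity.
bg = [[64] * 16 for _ in range(9)]
logo = [[255] * 4 for _ in range(2)]
frame = mix_frames(bg, logo, top_left=(1, 1), alpha=0.5)
```

In a real deployment this loop runs once per output frame and per user, which is exactly why the server-side approach demands so much processing capacity.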
Composition or mixing in the client
The videos that compose the scene are sent as independent streams to the user, and the client device mixes the video flows.
When the user interacts with the application/player to modify the scene, the server just receives the flow-control requests for that user's streams.
The bandwidth is proportional to the number of users multiplied by the number of streams that every user is visualising.
This solution is suitable for multicast environments, because no stream is encoded individually for every user.
It requires a video server with the capacity to run as many video processes as there are users multiplied by the number of videos that every user can play simultaneously.
Current technologies:
• Decoding and mixing using a frame server. AviSynth is a frame server that does not need a graphical interface; it can be used with a player by running a script in the client.
• VRML (Virtual Reality Modeling Language) or X3D. VRML made it possible to visualise 3D scenes (with contents) on the web. However, remote access to big and complex scenes over limited bandwidth remains a weakness when the user wants to interact with the scene elements and manipulate them. BIFS, being a compressed binary format that is encapsulated and streamed, reduces this weakness, so the user can interact with the available scene elements, improving the user experience.
• MPEG-4: BIFS and LASeR. BIFS is the MPEG-4 scene-description protocol used to compose MPEG-4 objects, describe the interaction between them and animate them. BIFS is a binary format for 2D or 3D content. LASeR is the protocol proposed in the MPEG-4 standard to provide capabilities similar to BIFS for devices with fewer resources, such as PDAs and mobile phones.
Composition or mixing in the client and server
This is a hybrid approach that tries to take the best of each alternative.
It consists of encoding several independent elements of the scene into a single stream. This must be done with the elements that do not require separate interaction with each of them. It reduces the number of streams per user (and therefore the bandwidth) without reducing the interaction possibilities.
This encoding should be done only once and stored for later use. In this way, it is guaranteed that the server will not need a high processing capacity and that the interactivity will not be penalised by delays.
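The bandwidth behaviour of the three composition approaches can be sketched with a back-of-the-envelope model (illustrative Python; the user counts, stream counts and bitrates are made-up parameters, not measurements from the project):

```python
def server_side_bw(users, stream_kbps):
    # Composition in the server: one fully composed stream per user.
    return users * stream_kbps

def client_side_bw(users, streams_per_user, stream_kbps):
    # Composition in the client: every scene element is its own stream.
    return users * streams_per_user * stream_kbps

def mixed_bw(users, interactive_streams, grouped_streams, stream_kbps):
    # Mixed approach: non-interactive elements are pre-mixed once and shared.
    return users * (interactive_streams + grouped_streams) * stream_kbps

# 100 users, scenes of 5 elements at 500 kbit/s each; in the mixed approach
# the 4 non-interactive elements collapse into a single pre-encoded stream.
server = server_side_bw(100, 500)        # kbit/s for one stream per user
client = client_side_bw(100, 5, 500)     # kbit/s for five streams per user
mixed = mixed_bw(100, 1, 1, 500)         # kbit/s for two streams per user
```

The model makes the trade-off visible: client-side mixing multiplies bandwidth by the number of scene elements, while the mixed approach cuts that factor down without giving up per-element interaction.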
2.2 UI of services for IP-enabled TV and Set-Top Boxes
2.2.1 On-line services for IP-enabled TV and Set-Top Boxes
Services which can be directly rendered on IP-enabled TV screens are based on HTML/XML technology. Service providers are able to define the layout and style of the user interface of their services (infotainment, travel, shopping, etc.).
• CE-HTML is a remote UI protocol for such services, with a core based on XHTML; it was developed by CEA and has been adopted by DLNA [ref CEA-2014]. Version 1.0 has been available since June 2006. It allows existing Internet content to be easily re-purposed for a variety of CE devices (see also device-based UI adaptation). Content-based adaptations to the user interface can be communicated via this protocol.
• T-Navi is an IP-based information service (conforming to HTML 4.0) developed by Matsushita. The T-Navi services have been available only in Japan since 2006. T-Navi-enabled sets are available from Panasonic (Viera series) and Toshiba.
• acTVila, a successor to T-Navi, will be launched in Japan in February 2007 as an IP and HTML television service combining text-based information with video, and plans to provide a streaming-based video-on-demand service by the end of 2007. acTVila service providers will have the freedom to create their own UI style by changing colours and website layout; e.g. video-on-demand services can have a different layout than information-based services. Brand-specific logos or designs can also be applied to the UI.
2.2.2 Flash-based content adaptation in Set-Top Boxes
NDS and Bluestreak together bring middleware for set-top boxes to the market using UPnP multimedia streaming and Macromedia Flash as the user interface engine. Flash allows set-top box makers to add dynamic elements to the user interface (e.g. animations) adapted to the different media categories (types of content) being watched. User-oriented UI adaptation can also be supported: customisation based on user preferences as well as by choosing from a list of predefined skins.
3 State-of-the-art Compression Algorithms
3.1 Motion JPEG-2000 and Wireless (Part 11) JPEG-2000
3.1.1 Introduction
The JPEG-2000 standardization effort [1] demonstrated that state-of-the-art coding performance can be obtained in still-image compression with a coding architecture that enables a rich set of features for the compressed bitstream. In particular, unlike the previous JPEG standard, JPEG-2000 provided a precise rate-control mechanism based on embedded coding of wavelet coefficients. Moreover, multiple qualities and multiple resolutions of the same picture are possible within JPEG-2000 based on selective decoding of portions of the compressed bitstream. Additionally, it should be emphasized that, for image and video transmission over error-prone channels, the embedded nature of JPEG-2000 allows for a layered content protection against channel errors [2].
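The embedded-coding principle behind this rate control can be illustrated with a toy bitplane coder (a deliberately simplified sketch, not the actual EBCOT algorithm): coefficient bits are emitted most-significant plane first, so truncating the stream at any point still yields a valid, coarser reconstruction.

```python
def encode_bitplanes(coeffs, nbits=8):
    """Emit coefficient bits most-significant plane first (embedded stream)."""
    stream = []
    for b in range(nbits - 1, -1, -1):
        for c in coeffs:
            stream.append((c >> b) & 1)
    return stream

def decode_bitplanes(stream, n_coeffs, nbits=8):
    """Decode however many bits were received; missing planes read as zero."""
    coeffs = [0] * n_coeffs
    for i, bit in enumerate(stream):
        b = nbits - 1 - i // n_coeffs   # which bitplane this bit belongs to
        coeffs[i % n_coeffs] |= bit << b
    return coeffs

coeffs = [200, 37, 90, 5]
full = encode_bitplanes(coeffs)
exact = decode_bitplanes(full, 4)        # all planes received: lossless
coarse = decode_bitplanes(full[:8], 4)   # only the top two planes received
```

Truncation never invalidates the stream; it only lowers precision, which is the property that gives JPEG-2000 its fine-grained rate control and quality scalability.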
In the area of motion-compensated video compression, similar functionalities have long been pursued, mainly via the use of extensions of the basic MPEG coding structure [3]. In terms of related systems with immediate industrial applicability, i.e. scalable video coding standards, this resulted in the fine-granularity scalable video coding extension of MPEG-4 video (MPEG-4 FGS) [4]. However, MPEG-4 FGS left much to be desired. In particular, the compression efficiency of FGS was not as good as that of the equivalent non-scalable (baseline) coder. In addition, the use of the conventional closed-loop video coding structure of MPEG-alike coders hindered the scalability functionalities.
As a result, recent research efforts on scalable video coding were targeted at the extension of open-loop coding systems, such as JPEG-2000, to video coding. Although an extension of the basic technology of JPEG-2000 to three dimensions is a feasible task by extending its transform and coding modules to three dimensions [5], this does not guarantee the highest possible coding efficiency since motion-compensation tools are not included. Moreover, the end-to-end delay of such a coding system is substantially increased in comparison to the corresponding frame-by-frame compression. Although the delay problem manifests itself in motion-compensated video coding as well, in this case the compression efficiency is significantly increased by the use of motion-compensated prediction. This may override the high-delay detriment in applications for which achieving a low end-to-end delay is not a critical issue.
In this section, we present an overview of the fundamental tools behind scalable image and video coding that are suitable for transmission environments with losses. Our presentation is divided in two parts: Sections 3.1.2 and 3.1.3 are dedicated to the description of the features of Motion JPEG-2000 and the upcoming Wireless JPEG-2000 standard, as they represent the state-of-the-art in intra-frame video coding for ideal and lossy-transmission frameworks, respectively. Inter-frame video coding architectures involving motion-compensated prediction are treated in Section 3.1.4.
3.1.2 Scope and Features of Motion JPEG-2000
Motion JPEG-2000 (or MJPEG-2000) is an extension of the baseline (Part 1) JPEG-2000 standard that supports video data. Intra-frame coding is supported based on the Embedded Block Coding with Optimized Truncation (EBCOT) algorithm of JPEG-2000 (i.e. without motion-compensated prediction). Lossy and lossless compression is provided with one codec and, for every video frame, similarly to JPEG-2000, scalability in resolution and quality is available from a single compressed bitstream. The input sample depth can be up to 32 bits per color component, while the maximum frame width and height is up to 2^32 - 1 pixels. The output bitrate for each frame can be controlled based on a constant-bitrate (CBR) scheme. Alternatively, variable-bitrate (VBR) schemes can be used, which provide uniform quality across time with high efficiency. For the integration of the various bitstreams in one stream, an MPEG-4 based file format is used, which appropriately tags the various bitstreams to ensure correct synchronization of audio and video. This format provides the capability for metadata embedding; moreover, multi-component, multi-sampling formats are supported, e.g. YUV 4:2:2, RGB 4:4:4, etc.
In general, although intra-frame algorithms do not provide the highest coding efficiency for video data, MJPEG-2000 intra-frame coding provides important functionality requirements that are difficult to satisfy with inter-frame video coding based on motion-compensated prediction. For example, intra-frame coding greatly facilitates video editing, individual frame access, fast browsing with enhanced forward/backward capabilities, etc. In addition, in terms of complexity requirements and overall delay, intra-frame algorithms are always preferred over inter-frame algorithms since they have lower memory requirements (typically up to only one input frame), no motion estimation/motion compensation is performed at the encoder or decoder, and the maximum delay corresponds to the delay incurred by the end-to-end processing of one input frame.
3.1.3 Scope and Features of Wireless JPEG-2000
Wireless JPEG-2000 (a.k.a. JPWL) [6] is an upcoming extension of the JPEG-2000 standard. JPWL defines a set of tools and methods to achieve the efficient transmission of JPEG-2000 bitstreams over an error-prone wireless network. Wireless networks are characterized by the frequent occurrence of transmission errors along with a low bandwidth, thereby putting strong constraints on the transmission of digital images. Since JPEG-2000 provides high compression efficiency, it is a good candidate for wireless multimedia applications. Moreover, due to its high scalability, JPEG-2000 enables a wide range of quality-of-service (QoS) strategies for network operators. However, to be suitable for wireless multimedia applications, JPEG-2000 has to be robust to transmission errors.
The baseline JPEG-2000 standard defines error resilience tools to improve performance over noisy channels. However, these tools only detect where errors occur, conceal the erroneous data, and resynchronize the decoder. More specifically, they do not correct transmission errors. Furthermore, these tools do not apply to the image headers, which are the most important parts of the codestream. For these reasons, they are not sufficient in the context of wireless transmissions.
[Figure: JPWL system description]
For the purpose of efficient transmission over wireless networks, JPWL defines other mechanisms for error protection and correction. These mechanisms extend the elements of the core coding system described in baseline (Part 1) JPEG-2000. These extensions are backward compatible in the sense that decoders which implement Part 1 are able to decode the part of the data that conforms to Part 1 while skipping the extensions defined by JPWL.
The JPWL system is illustrated in the figure above [6]. Basically, JPWL provides a generic file format for robust transmission of JPEG-2000 bitstreams over error-prone networks without being tied to a specific network, error-resilient coder or transport protocol. Additionally, JPWL provides a generic format for describing the degree of sensitivity to transmission errors of the different parts of the bitstream, and a generic format for describing the locations of residual errors in the codestream.
Thus, basically, the JPWL standard signals the use of informative tools in order to protect the codestream against transmission errors. These tools include techniques such as error-resilient entropy coding, FEC codes, UEP and data partitioning/interleaving. It is important to point out that these informative tools are not defined in the standard. Instead, they are registered with the JPWL registration authority. Upon registration, each tool is assigned an ID, which uniquely identifies it. When encountering a JPWL codestream, the decoder can identify the tool(s) which have been used to protect this codestream by parsing the standardized JPWL markers and by querying the registration authority. The decoder can then take the appropriate steps to decode the codestream, e.g. acquire or download the appropriate error-resilience tool.
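To make the idea of unequal error protection concrete, the sketch below protects a codestream header more strongly than its body using a simple repetition code (purely illustrative; real JPWL deployments would register stronger codes such as Reed-Solomon with the registration authority, and the byte values here are arbitrary):

```python
def protect(data, repeat):
    """Repetition-code FEC: each byte is transmitted `repeat` times."""
    return [b for byte in data for b in [byte] * repeat]

def recover(received, repeat):
    """Majority vote over each group of repeated bytes."""
    out = []
    for i in range(0, len(received), repeat):
        group = received[i:i + repeat]
        out.append(max(set(group), key=group.count))
    return out

# Unequal error protection: the header (the most important part of the
# codestream) gets a 5x code, the body is sent unprotected.
header, body = [0x4A, 0x50], [0x10, 0x20, 0x30]
packet = protect(header, 5) + protect(body, 1)

# One corrupted copy of a header byte is voted out; the header survives.
packet[1] = 0xFF
recovered_header = recover(packet[:10], 5)
```

The asymmetry is the point: errors in the unprotected body degrade only the affected data, while the header, without which nothing decodes, is made far more robust.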
3.1.4 Video Coding with Motion Compensated Prediction
In this section, we review the conventional closed-loop video coding structure as well as the recently introduced open-loop video coding schemes that perform a temporal decomposition using motion-compensated temporal filtering. Both have been used in the related literature [3] [7] to provide working video coding systems with scalability properties.
All currently standardized video coding schemes are based on a structure in which the two-dimensional spatial transform and quantization are applied to the error frame coming from closed-loop temporal prediction. A simple structure describing such architectures is shown in the “Hybrid video compression scheme” figure (a) (see further on). The operation of temporal prediction P typically involves block-based motion-compensated prediction. The decoder receives the motion vector information and the compressed error frame C_t and performs the identical loop using this information in order to replicate MCP within the P operator. Hence, in the decoding process (seen in the dashed area of the “Hybrid video compression scheme” figure (a)), the reconstructed frame at time instant t can be written as:
    Â_t = P Â_{t-1} + T_S^{-1} Q_S^{-1} C_t,    Â_0 = T_S^{-1} Q_S^{-1} C_0.    (0.1)
The recursive operation given by (0.1) creates the well-known drift effect between the encoder and decoder if different information is used on the two sides, i.e. if C_t ≠ Q_S T_S H_t at any time instant t in the decoder. This is not uncommon in practical systems, since transmission errors or loss of compressed data due to limited channel capacity can be a dominant scenario in wireless or IP-based networks, where a number of clients compete for the available network resources. In general, the capability to seamlessly adapt the compression bitrate without transcoding, i.e. SNR scalability, is a very useful feature for such network environments. Solutions for SNR scalability based on the coding structure of the “Hybrid video compression scheme” figure basically try to remove the prediction drift by artificially reducing, at the encoder side, the bitrate of the compressed information C_t to a base layer for which the network can guarantee correct transmission [3]. An example of such a codec is MPEG-4 FGS [4].
This, however, reduces the prediction efficiency [3], thereby leading to degraded coding efficiency for SNR scalability. To overcome this drawback, techniques that include a certain amount of enhancement-layer information in the prediction loop have been proposed. For example, leaky prediction [8] gracefully decays the enhancement information introduced in the prediction loop in order to limit the error propagation and accumulation. Scalable coding schemes employing this technique achieve notable coding gains over the standard MPEG-4 FGS [4] and a good trade-off between low drift errors and high coding efficiency [8] [9]. Progressive Fine Granularity Scalable (PFGS) coding [10] also yields significant improvements over MPEG-4 FGS by introducing two prediction loops with different quality references. A generic PFGS coding framework employing multiple prediction loops with different quality references and careful drift control leads to considerable coding gains over MPEG-4 FGS, as reported in [11] [12].
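The drift effect described by (0.1) can be reproduced with a toy one-dimensional closed-loop codec (illustrative Python; the identity predictor and lossless residuals are simplifying assumptions, not a real video coder):

```python
def closed_loop_decode(residuals, lost=None):
    """Decode with P = identity predictor (previous reconstructed frame).

    If the residual with index `lost` is dropped (received as 0), the
    resulting error propagates into every later frame: the drift effect.
    """
    rec, prev = [], 0.0
    for t, r in enumerate(residuals):
        r_used = 0.0 if t == lost else r
        prev = prev + r_used          # A_t = P(A_{t-1}) + decoded residual
        rec.append(prev)
    return rec

frames = [10.0, 12.0, 11.0, 13.0, 14.0]
residuals, prev = [], 0.0
for a in frames:                      # encoder: residual against prediction
    residuals.append(a - prev)
    prev = a

clean = closed_loop_decode(residuals)             # matches `frames` exactly
drifted = closed_loop_decode(residuals, lost=2)   # error from t = 2 onward
errors = [c - d for c, d in zip(clean, drifted)]
```

A single lost residual at t = 2 leaves a constant error in every subsequent frame, because the decoder's prediction loop never resynchronizes with the encoder's.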
To address the issues of efficient video transmission, several proposals suggested an open-loop system, depicted in the “Motion-compensated temporal filtering” figure (b) (see further on), which incorporates recursive temporal filtering. This can be perceived as a temporal wavelet transform with motion compensation [13], i.e. motion-compensated temporal filtering. This scheme begins with a separation of the input into even and odd temporal frames (temporal split). Then the temporal predictor performs MCP to match the information of frame A_{2t+1} with the information present in frame A_{2t}. Subsequently, the MCU operator U inverts the information of the prediction error back into frame A_{2t}, thereby producing, for each pair of input frames, an error frame H_t and an updated frame L_t. The MCU operator performs either motion compensation using the inverse vector set produced by the predictor [14], or generates a new vector set by backward motion estimation [15]. The process iterates on the L_t frames, which are now at half the temporal sampling rate (following the multilevel operation of conventional lifting), thereby forming a hierarchy of temporal levels for the input video. The decoder performs the mirror operation: the scheme in the “Motion-compensated temporal filtering” figure (b) operates from right to left, the signs of the P, U operators are inverted and a temporal merging occurs at the end to join the reconstructed frames. As a result, having performed the reconstruction of the L_t, denoted by L̂_t, at the decoder we have:
    Â_{2t} = L̂_t - U T_S^{-1} Q_S^{-1} C_t,    Â_{2t+1} = P Â_{2t} + T_S^{-1} Q_S^{-1} C_t    (0.2)
where Â_{2t}, Â_{2t+1} denote the reconstructed frames at time instants 2t, 2t+1. As seen from (0.2), even if C_t ≠ Q_S T_S H_t in the decoder, the error affects the reconstructed frames Â_{2t}, Â_{2t+1} locally and does not propagate linearly in time over the reconstructed video. Error propagation may occur only across the temporal levels through the reconstructed L̂_t frames. However, after the generation of the temporal decomposition, embedded coding may be applied in each group of frames (GOP) by prioritizing the information of the higher temporal levels based on a dyadic-scaling framework, i.e. following the same principle of prioritization of information used in wavelet-based SNR-scalable image coding [6]. Hence, the effect of error propagation in the temporal pyramid is limited and seamless video-quality adaptation can be obtained in SNR scalability [7] [16]. In fact, experimental results obtained with the SNR-scalable MCTF video coders, as well as the results obtained with other state-of-the-art algorithms [17] [18], suggest that this coding architecture can be comparable in a rate-distortion sense to an equivalent non-scalable coder that uses the closed-loop structure. However, one significant disadvantage of this type of technique for real-time communications concerns the end-to-end codec delay. In particular, following the analysis of [19], it can be shown that for a GOP of N frames (where N is typically 16 or 32 for a frame rate of 30 or 60 frames per second, respectively), the required end-to-end delay in terms of number of decoded frames can be as high as N/2 + 1 frames.
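The lifting steps above can be sketched for a single temporal level using a Haar kernel and no motion compensation (a simplification: real MCTF applies P and U along motion trajectories, and iterates over several levels):

```python
def mctf_forward(frames):
    """One level of temporal lifting (Haar kernel, no motion).

    P predicts each odd frame from its even neighbour; U pushes half of
    the prediction error back, turning the even frame into a low-pass
    (updated) frame L_t, with H_t as the error frame.
    """
    n = len(frames) // 2
    H = [frames[2 * t + 1] - frames[2 * t] for t in range(n)]   # predict
    L = [frames[2 * t] + 0.5 * H[t] for t in range(n)]          # update
    return L, H

def mctf_inverse(L, H):
    """Mirror operation: invert U, then invert P, then temporal merge."""
    frames = []
    for t in range(len(L)):
        even = L[t] - 0.5 * H[t]
        odd = even + H[t]
        frames += [even, odd]
    return frames

x = [4.0, 6.0, 8.0, 2.0]
L, H = mctf_forward(x)
```

Without quantization the inverse transform reconstructs the input exactly, which is the open-loop property that keeps transmission errors from drifting through the whole sequence.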
[Figure (a): The hybrid video compression scheme.]
[Figure (b): Motion-compensated temporal filtering.]
Notations:
A_t is the input video frame at time instant t = 0, t, 2t, 2t+1
Â_t is the reconstructed frame
H_t is the error frame, whereas L_t is the updated frame
C_t denotes the transformed and quantized error frame obtained by using the spatial operators T_S and Q_S, respectively
P denotes the temporal prediction
U denotes the temporal update.
Our description of motion-compensated state-of-the-art video coders concludes with the presentation of two indicative coding systems that represent the current state-of-the-art in the closed-loop and open-loop temporal prediction structures, namely the Advanced Video Coder (AVC), also known as the H.264 coder, which was jointly standardized by MPEG and ITU-T [20], and the motion-compensated embedded zero-block coder (MC-EZBC) of [17]. While AVC is a non-scalable coding scheme, optimized for a certain set of quantization parameters, MC-EZBC has the capability of simultaneous scalability in bitrate, resolution and SNR.
3.2 Coding technologies
3.2.1 Introduction
Video coding is a necessary element for compressing the video size, making the best possible use of the available network and storage capacity.
MPEG (Moving Picture Experts Group) is an ISO/IEC working group in charge of developing standards for audio and video coding. The first standard, MPEG-1, was the basis for coding formats such as Video CD and MP3. Afterwards, the MPEG-2 standard became the basis of products like DVD and digital TV set-top boxes. The most recently defined standard is MPEG-4, a multimedia standard for wired and wireless networks, defined for representing real and virtual audio-visual objects. Moreover, MPEG-7 has been created to describe and locate audio-visual contents, and MPEG-21 defines the multimedia framework.
3.2.2 MPEG-1 and MPEG-2
MPEG-1
Used for video streaming, with a bit rate of approximately 1.5 Mbit/s. Oriented towards digital storage, especially CD-ROM.
MPEG-2
This is an advanced video compression technique that achieves better bit rates and better compression. It allows the coding of progressive and interlaced video sequences up to HDTV level.
The most important audio codec defined in the MPEG-2 (Part 7) standard is AAC (Advanced Audio Coding). AAC defines a format for multi-channel audio coding. This format achieves quality similar to other codecs at a better bit rate.
MPEG-2 Video is a video compression standard with bit rates between 4 and 10 Mbit/s. It defines 5 profiles, which refer to the complexity of the compression algorithm, and 4 levels, which refer to the resolution of the original video. The main level with the main profile (MP/ML) is the most used combination.
MPEG-2 Systems defines two multiplexing systems: the “Program Stream”, compatible with MPEG-1, and the “Transport Stream”, which allows multiple streams with independent origins to be sent.
MPEG-2 is the most successful standard for multimedia representation in the market; digital entertainment mainly uses MPEG-2. An important conceptual innovation in MPEG-2 is scalable video coding.
MPEG-4 defines new functionality and new capacities and is, probably, the future standard for multimedia applications.
MPEG-4 adds an important conceptual advance in the representation of multimedia contents: the object-based representation model. This model considers that audio-visual content describes a world composed of elements called objects. The audio-visual scene is the composition of independent objects, each one with its own coding, characteristics and behaviours. Since the elements are encoded individually, they can be accessed individually. This architecture provides a complete range of interactive possibilities.
MPEG-4 has <strong>the</strong> characteristics <strong>of</strong> MPEG-1 and MPEG-2 with a better video<br />
codification and adding new characteristics like advance 3D graphical support<br />
(textures, animations, etc.) for 3d scenes, files oriented to objects(audio, video, 3D<br />
objects, streaming text), support for DRM (Digital Rights Management).<br />
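The object-based model described above can be sketched as follows. The class and field names are illustrative inventions, not part of the MPEG-4 specification; the point is only that each object carries its own coding and can be accessed independently.

```python
# Illustrative sketch of MPEG-4's object-based scene model (names are
# hypothetical, not from the standard): each audio-visual object carries
# its own coding and can be accessed and manipulated independently.

class MediaObject:
    def __init__(self, name, kind, codec):
        self.name = name      # object identifier within the scene
        self.kind = kind      # "video", "audio", "3d", "text", ...
        self.codec = codec    # per-object codec, e.g. "AVC", "AAC"

class Scene:
    """An audio-visual scene composed of independently coded objects."""
    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.name] = obj

    def get(self, name):
        # Individual access is possible because objects are coded separately.
        return self.objects[name]

scene = Scene()
scene.add(MediaObject("background", "video", "AVC"))
scene.add(MediaObject("narration", "audio", "AAC"))
scene.add(MediaObject("logo", "3d", "BIFS"))

# A client can interact with one object without touching the others.
assert scene.get("logo").codec == "BIFS"
```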
3.2.3 MPEG-4
3.2.3.1 MPEG-4 architecture
The MPEG-4 architecture is composed of the following parts:
• MPEG-4 Systems: Specifies the global architecture of the standard and defines how to integrate MPEG-4 Visual and MPEG-4 Audio. MPEG-4 Systems introduces the concept of BIFS (Binary Format for Scenes). BIFS defines the interaction between objects.
• DMIF (Delivery Multimedia Integration Framework): This part defines the streaming of advanced content, or "Rich Media".
• MPEG-4 Visual: This part defines the representation of natural and synthetic video content.
• MPEG-4 Audio: This part defines the representation of natural and synthetic audio content.
3.2.3.2 Codecs (MPEG-4 Visual and MPEG-4 Audio)
A codec (COder – DECoder) is the algorithm that defines how to encode and decode video and audio content in order to reduce its size, or the bandwidth necessary for its transmission, with as little loss of quality as possible.
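As a toy illustration of the coder/decoder idea, here is a minimal lossless codec (run-length encoding). This is unrelated to the far more sophisticated MPEG-4 codecs; it only shows the encode/decode round trip that defines any codec.

```python
# Toy lossless codec: run-length encoding. Real MPEG-4 codecs are vastly
# more sophisticated, but the contract is the same: encode reduces size,
# decode reconstructs the content.

def encode(data):
    """COder: turn raw bytes into (value, run-length) pairs."""
    out = []
    for b in data:
        if out and out[-1][0] == b:
            out[-1][1] += 1       # extend the current run
        else:
            out.append([b, 1])    # start a new run
    return out

def decode(pairs):
    """DECoder: reconstruct the original bytes exactly (lossless)."""
    return bytes(b for b, n in pairs for _ in range(n))

raw = b"aaaabbbcc"
packed = encode(raw)
assert packed == [[97, 4], [98, 3], [99, 2]]
assert decode(packed) == raw      # no loss of quality
```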
The audio codec, MPEG-4 AAC (Advanced Audio Coding), is an extension of MPEG-2 AAC (MPEG-2 Part 7).
The main video codecs are:
• The codecs included in Part 2 of the standard, especially those bound to the Simple Profile (SP) and Advanced Simple Profile (ASP).
• H.264/AVC (Advanced Video Coding), also known as MPEG-4 Part 10. MPEG-4 AVC achieves considerably more efficient video compression than the others and provides more flexibility for applications.
MPEG-4 Scalable Video Coding (SVC) is a future extension of the MPEG-4 AVC standard. SVC uses the same video stream (a single encoding of the content) for different devices on different networks. SVC provides scalability in three dimensions:
• Spatial scalability: selecting a suitable resolution.
• Temporal scalability: selecting the frame rate.
• Quality scalability: selecting the bit rate.
MPEG-4 SVC generates a base layer compatible with MPEG-4 AVC, plus one or more additional layers. The base layer contains the minimum quality, frame rate and resolution, and the following layers increase the quality and/or resolution and/or frame rate.
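The layered extraction idea behind SVC can be sketched as follows. The layer ladder and the capability fields are hypothetical; the point is that one encoded stream serves all devices by dropping enhancement layers a device cannot use.

```python
# Sketch of SVC layered extraction (simplified and illustrative): a single
# encoded stream is a base layer plus enhancement layers; a sub-stream is
# obtained by dropping the layers that exceed a device's capabilities.

# Hypothetical layer ladder: (layer_id, width, height, fps, kbit/s).
LAYERS = [
    (0, 320, 180, 15, 250),     # base layer: minimum quality
    (1, 640, 360, 30, 800),     # spatial + temporal enhancement
    (2, 1280, 720, 30, 2500),   # further spatial/quality enhancement
]

def extract_substream(max_width, max_fps, max_kbps):
    """Keep the base layer plus every enhancement layer the device can use."""
    kept = [LAYERS[0]]          # the base layer is always required
    for layer in LAYERS[1:]:
        _, w, h, fps, kbps = layer
        if w <= max_width and fps <= max_fps and kbps <= max_kbps:
            kept.append(layer)
        else:
            break               # layers are cumulative: stop at the first unusable one
    return kept

# A phone on a slow network receives only the base layer ...
assert len(extract_substream(320, 15, 300)) == 1
# ... while a PC on broadband can decode all three layers.
assert len(extract_substream(1920, 60, 10000)) == 3
```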
3.2.3.3 MPEG-4 Systems (BIFS)
Exploring other possibilities for advanced devices, it has been shown that ISO/IEC 14496-11 "Scene description and application engine" (also known as BIFS) is another good alternative; however, interoperability between the two formats is necessary, which means producing ISO/IEC 14496-20 and ISO/IEC 14496-11 content simultaneously.
ISO/IEC 14496-11 specifies the coded representation of interactive audio-visual scenes and applications. It specifies the following tools:
• The coded representation of the spatio-temporal positioning of audio-visual objects, as well as their behaviour in response to interaction (scene description).
• The coded representation of synthetic two-dimensional (2D) or three-dimensional (3D) objects that can be manifested audibly and/or visually.
• The Extensible MPEG-4 Textual (XMT) format, a textual representation of the multimedia content described in ISO/IEC 14496 using the Extensible Markup Language (XML), and a system-level description of an application engine (format, delivery, lifecycle and behaviour of downloadable Java byte-code applications).
3.2.3.4 MPEG-4 Part 20 (LASeR and SAF [44])
Because of the resource limitations of mobile phones, smartphones, PDAs, set-top boxes and older desktop or portable PCs, requirements must be optimized so that all devices can be accommodated by one compatible format that permits interoperability across the different cases. For this reason, we are exploring all emerging audio, video and streaming formats to find the best choice.
It seems that the current best choice would be ISO/IEC 14496 (also known as MPEG-4) and its primary parts: ISO/IEC 14496-1 "Systems" [45], ISO/IEC 14496-2 "Visual" [46], ISO/IEC 14496-3 "Audio" [47], ISO/IEC 14496-10 "Advanced Video Coding" [48] and ISO/IEC 14496-20 "Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)" [49]. Optionally, we analyse ISO/IEC 14496-11 "Scene description and application engine" (also known as BIFS) [50], but at present it cannot be adapted to the limited processing resources of constrained devices such as mobile phones.
The fundamental parts of the optimal format we have found are:
• ISO/IEC 14496-20, which defines a scene description format (LASeR) and an aggregation format (SAF) suitable for representing and delivering rich-media services to resource-constrained devices such as mobile phones. A rich-media service is a dynamic, interactive collection of multimedia data such as audio, video, graphics and text. Services range from movies enriched with vector-graphic overlays and interactivity (possibly enhanced with closed captions) to complex multi-step services with fluid interaction and different media types at each step.
• LASeR, which aims at fulfilling all the requirements of rich-media services at the scene description level. LASeR supports:
o An optimized set of objects inherited from SVG to describe rich-media scenes.
o A small set of key compatible extensions over SVG.
o The ability to encode and transmit a LASeR stream and then reconstruct SVG content.
o Dynamic updating of the scene to achieve a reactive, smooth and continuous service.
o Simple yet efficient compression to improve delivery and parsing times, as well as storage size; one of the design goals is to allow both a direct implementation of the SDL as documented and a decoder compliant with ISO/IEC 23001-1 "Binary MPEG format for XML" to decode the LASeR bitstream.
o An efficient interface with audio and visual streams, with frame-accurate synchronization.
o Use of any font format, including the OpenType industry standard.
o Easy conversion from other popular rich-media formats in order to leverage existing content and developer communities.
Information taken from http://www.mpeg-laser.com
Introduction
LASeR is a scene description format, where a scene is a spatial, temporal and behavioural composition of audio media, visual media, graphics elements and text. LASeR is binary (compressed), like BIFS or Flash, as opposed to textual scene descriptions such as XMT, VRML or SVG. LASeR stands for Lightweight Application Scene Representation.
SAF is a streaming-ready format for packaging scenes and media together and streaming them over protocols such as HTTP/TCP. SAF services include:
• A simple multiplex for elementary streams (media, fonts or scenes).
• Synchronization and packaging signalling.
SAF stands for Simple Aggregation Format. LASeR and SAF have been designed for use in mobile, interactive applications.
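The aggregation idea can be sketched as follows. The stream names are invented and this is not the real SAF packet syntax; it only shows access units from several elementary streams being interleaved into one deliverable sequence with synchronization information.

```python
# Minimal sketch of SAF-style aggregation (illustrative, not the real
# bitstream syntax): access units from several elementary streams are
# interleaved into one sequence in time order, each packet carrying
# (timestamp, stream_id) so the receiver can synchronize them.

def multiplex(streams):
    """streams: {stream_id: [(timestamp_ms, payload_bytes), ...]}"""
    packets = []
    for sid, access_units in streams.items():
        for ts, payload in access_units:
            packets.append((ts, sid, payload))
    packets.sort(key=lambda p: p[0])   # interleave by timestamp
    return packets

muxed = multiplex({
    "scene": [(0, b"laser-scene"), (1000, b"scene-update")],
    "audio": [(0, b"aac-au0"), (500, b"aac-au1")],
    "font":  [(0, b"opentype")],
})

# One multiplexed sequence, ready for progressive delivery over HTTP/TCP.
assert [p[1] for p in muxed][:3] == ["scene", "audio", "font"]
```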
Why LASeR?
The decision to create yet another standard for scene description was taken after a thorough survey of the available open or de-facto standards: BIFS, Flash and SVGT. Profiling of BIFS was attempted in order to create a subset small enough to be used on mobile phones, to no avail. Flash is proprietary and too heavy for most mobiles. SVGT 1.1 is gaining some traction, but it lacks AV interfaces and dynamicity, while its successor, SVGT 1.2, is still in flux: although it will feature AV interfaces, it will still lack dynamicity, compression and streaming, and it is significantly heavier than SVGT 1.1. Moreover, SVGT in general relies on a host of other standards, such as DOM, SMIL, ECMAScript, XHTML, CSS and MIME multipart, and managing such a pile of standards is a true challenge in terms of interoperability.
Why SAF?
The decision to create yet another standard for the distribution of mobile content was taken after implementing and testing interactive services on small devices, based on RTP/RTSP or on MP4/3GP download (progressive or not) over TCP/HTTP. In most cases, the need for a simpler, lighter solution was obvious. In order to package efficiently, and to download progressively or stream, a scene with a few media, RTP is overkill, and MP4/3GP is not well suited to the job: MP4/3GP is a file format, and it can only be used for progressive download through special arrangements (the moov atom at the front of the file, media interleaved in time order). In addition, MP4/3GP has a host of features that burden a mobile implementation for no reason. In order to reduce the design time of SAF and obtain almost immediate validation, SAF was designed around a simple configuration of a proven technology: the MPEG-4 Systems Sync Layer. As a bonus, this makes an RTP payload format for SAF available for free through RFC 3640.
In summary, SAF has the minimal/optimal set of features for the job, and it can be mapped easily onto other transport mechanisms (RTP, MP4/3GP, MPEG-2 TS, etc.).
Requirements of LASeR
The requirements which structure the design of LASeR are:
1 Support an efficient and compact representation of scene data, covering at least the subset of the SVGT 1.1 object set functionality. (Today LASeR is aligned as much as possible with SVGT 1.2.)
2 Allow easy conversion from other graphics formats (e.g. BIFS, SMIL/SVG, PDF, Flash, …).
3 Provide efficient coding, to be suitable for the mobile environment.
4 Allow separate streams for 2D and 3D content.
5 Allow the representation of scalable scenes.
6 Allow the representation of adaptable scenes, for use within the MPEG-21 DIA framework.
7 Be extensible in an efficient manner.
8 Allow the definition of small profiles.
9 Allow the representation of error-resilient scenes.
10 Allow encoding modes that are easily reconfigurable and signalled in band.
11 Provide an optimal balance between compression efficiency and the complexity and memory footprint of decoder and compositor code.
12 Allow integer-only implementation of decoding and rendering.
13 Allow several scene states to be saved and restored. The saving and restoring shall be triggerable either by the server or by the user.
14 Allow low-complexity profiles implementable on the Java MIDP platform.
15 Allow the representation of differential scenes, i.e. scenes meant to build on top of another scene.
16 Allow interaction through the available input devices, such as a mobile keypad or pen, and support the input of strings.
17 Allow safe implementation of the scene decoder.
In addition, it is deemed crucial that LASeR is designed in such a way that implementations can:
• Be as small as possible.
• Be as fast as possible.
• Require as little runtime memory as possible.
• Be implementable at least partially in hardware.
Requirements for the Simple Aggregation Format (SAF)
The requirements which structure the design of SAF are:
1 Provide a simple aggregation mechanism for the Access Units of various media in aggregated packets (video, audio, graphics, images, text/fonts, …).
2 Allow a synchronized presentation of the various media elements in a packet or in a sequence of such aggregated packets.
3 Be as bit-efficient as possible.
4 Be byte-aligned.
5 Be easily transported over popular interactive transport protocols (e.g. HTTP).
6 Be easily mapped onto popular streaming protocols (e.g. the MPEG-4 RTP payload format, RFC 3640).
7 Be extensible in an efficient manner.
8 Allow the management of pre-loaded objects, enabling the server to anticipate the downloading of the corresponding objects to improve the user experience.
What is LASeR?
LASeR is:
• An SVGT scene tree, with an SVG rendering model.
• An update protocol, allowing actions on the scene tree such as inserting an object, deleting an object, replacing an object or changing a property: this is the key to the design of dynamic services and a fluid user experience. This update protocol can also be seen as a kind of micro-scripting language.
• OpenType text and fonts, including downloadable/streamable fonts.
• A binary encoding which, coupled with the update protocol, allows the incremental loading/streaming of scenes, with excellent bandwidth usage.
• A few LASeR extensions to improve the support of input devices, the flexibility of event processing without a full scripting language, and simple axis-aligned rectangular clipping.
Because of the above, LASeR may also have:
• A micro-DOM or JSR 226 interface, since the scene tree is almost purely SVG, allowing the design of complete applications on top of the LASeR engine. The micro-DOM interface also makes it possible to use ECMAScript with LASeR scenes.
• Easy conversion of Flash content to LASeR, because the update protocol is similar to that of Flash.
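The scene-tree update protocol can be sketched as follows. This is illustrative, not the real binary syntax: it only shows how small insert/delete/replace commands modify a scene in place instead of retransmitting it, which is what enables incremental streaming and dynamic services.

```python
# Sketch of a LASeR-style update protocol (illustrative, not the real
# binary syntax): the scene tree is modified in place by small commands
# rather than being retransmitted as a whole.

def apply_update(tree, command):
    """tree: {node_id: properties}; command is an (op, ...) tuple."""
    op = command[0]
    if op == "insert":            # add a new object to the scene
        _, node_id, props = command
        tree[node_id] = props
    elif op == "delete":          # remove an object
        _, node_id = command
        del tree[node_id]
    elif op == "replace":         # change one property of an object
        _, node_id, key, value = command
        tree[node_id][key] = value
    return tree

tree = {"rect1": {"fill": "red", "x": 10}}
apply_update(tree, ("insert", "text1", {"content": "Hello"}))
apply_update(tree, ("replace", "rect1", "fill", "blue"))
apply_update(tree, ("delete", "text1"))

# Three tiny commands later, the scene reflects all the changes.
assert tree == {"rect1": {"fill": "blue", "x": 10}}
```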
What is SAF?
SAF is:
• A fixed configuration of the MPEG-4 Systems Sync Layer, providing an easy yet powerful way of packaging elementary streams.
• A simplified stream description mechanism.
• A simple multiplex for several media, font and scene streams.
SAF streams may be:
• Packaged in RTP/RTSP using the payload format defined in RFC 3640.
• Packaged in MP4/3GP files using a mapping defined with SAF.
• Packaged in an MPEG-2 Transport Stream using the SL mapping defined in ISO/IEC 14496-8.
Although this format appears to carry a patent fee, we do not think this is a problem, since it also seems to be our best solution. At present we are waiting for the release of the final reference software to check its viability and stability.
3.3 Additional formats for more powerful devices (future)
3.3.1 VC-1 [21]
Another aspect to introduce in the near future is higher resolutions, to cover high-definition content without resource penalties. For this, we find that the video codec SMPTE 421M "VC-1 Compressed Video Bitstream Format and Decoding Process" (known as VC-1) is a great choice to cover these large resolutions as well.
VC-1 minimizes the complexity of decoding high-definition (HD) content through improved intermediate-stage processing and more robust transforms. As a result, VC-1 decodes HD video twice as fast as H.264, while offering two to three times better compression than MPEG-2.
Since VC-1 is optimized for decoding performance, it ensures a superior playback experience across the widest possible array of systems, regardless of bit rate or resolution. These systems range from the PC (where VC-1 playback at 1080p is possible) to set-top boxes, gaming systems and even wireless handsets.
VC-1 offers superior quality across a wide variety of content types and bit rates, which has been well documented by independent sources:
• DV Magazine found VC-1 to be superior to both MPEG-2 and MPEG-4.
• TANDBERG Television found that VC-1 produces significantly better quality than MPEG-2 and comparable quality to H.264. These results were presented at the 2003 International Broadcasting Convention (IBC).
• c't Magazine, Germany's premier audio-video magazine, compared various codec standards, including VC-1, H.264 and MPEG-4, and selected VC-1 as producing the best subjective and objective quality for HD video.
• The European Broadcasting Union (EBU) found that VC-1 had the most consistent quality in tests that compared VC-1, RealMedia V9, the Envivio MPEG-4 encoder and the Apple MPEG-4 encoder.
3.3.2 Device-oriented screens
Analysing hundreds of devices by major manufacturers (Nokia, Sony Ericsson, Motorola, Fujitsu-BenQ-Siemens, Samsung, Alcatel, Philips, Acer, HP, BlackBerry, Qtek (HTC), Palm, …), we find that the square pixel is the most widely used proportion for representing pixel information on their screens (typically TFT-based). Because many sources (primarily documentaries and films) are recorded in panoramic formats, we think it is important to accommodate all of them in their original aspect ratio while reducing the size and complexity of the bitstream.
We think it is important to create all formats automatically and simultaneously on dedicated servers, providing the same information in real time. With this, we can establish a standard way to transmit all information to all devices, independently of their power, and control the total server power needed to create each channel.
Finally, we find the following resolutions desirable to take advantage of the physical screens of the devices analysed:
Aspect   Lowest   Low      Medium   High     Highest  Ultra    HD 1*     HD 2*
Devices  M        M        M+P      P+C      P+C      C+T      C+T       C+T
4:3      128x96   176x132  240x180  320x240  480x360  640x480  ---       ---
16:9     128x72   176x99   240x135  320x180  480x270  640x360  1280x720  1920x1080

*: Optionally, for the future.
M: Mobiles & smartphones.
P: PDAs.
C: Computers.
T: TV & advanced set-top boxes.
For all this, we consider that we need a minimum source resolution of 640x480 for an aspect ratio of 4:3, with six simultaneous compressed (or live) streams, to accommodate all possible devices; and 640x360 for an aspect ratio of 16:9 with six streams under current requirements, or 1920x1080 with eight simultaneous streams to cover future HD at its maximum. Because of server resource consumption, we prefer to limit the final resolutions for computers and let the destination computer upsample the source video image to a possibly larger screen resolution.
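The resolution ladder above follows directly from a fixed set of widths and the source aspect ratio (square pixels assumed, as on the surveyed screens). A short sketch:

```python
# Reproduce the resolution ladder above: each target height is derived
# from a fixed set of widths and the aspect ratio (square pixels assumed).

def ladder(widths, aspect_w, aspect_h):
    return [(w, w * aspect_h // aspect_w) for w in widths]

widths_43 = [128, 176, 240, 320, 480, 640]          # Lowest .. Ultra
widths_169 = widths_43 + [1280, 1920]               # plus HD 1 and HD 2

assert ladder(widths_43, 4, 3) == [
    (128, 96), (176, 132), (240, 180), (320, 240), (480, 360), (640, 480)]
assert ladder(widths_169, 16, 9)[-2:] == [(1280, 720), (1920, 1080)]
```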
3.4 Analysis of state-of-the-art image compression algorithms for medical applications
3.4.1 Still image compression such as JPEG, JPEG-LS and JPEG 2000
The following results were obtained using optimized software for JPEG 2000, JPEG-LS and lossless JPEG compression running on a Pentium IV at 3 GHz. A set of grayscale medical images of size SXGA (1280x1024 pixels) was compressed.
Performance measurements, lossless mode

CODEC          Throughput (Mbit/s)  Throughput (fps, SXGA)  Processing time / frame  Average CR (1:x)  Coded stream BW (Mbit/s)
JPEG2000       22                   0.7                     1420 ms                  3.4               6.5
JPEG-LS        62                   2.0                     500 ms                   2.9               21.5
Lossless JPEG  230                  7.3                     137 ms                   1.7               135.3
Performance measurements, lossy mode

CODEC            Throughput (Mbit/s)  Throughput (fps, SXGA)  Processing time / frame  Average CR (1:x)  Coded stream BW (Mbit/s)
JPEG2000 @ 10:1  20                   0.6                     1667 ms                  10                2
JPEG2000 @ 20:1  22                   0.7                     1429 ms                  20                1.1
JPEG @ 10:1      650                  20.7                    48 ms                    10                65
JPEG @ 20:1      800                  25.4                    39 ms                    20                40

Discussion
There are two reasons why existing state-of-the-art still image compression algorithms are not suitable for our (real-time) application. First of all, the throughput (frame rate) of these compression algorithms is too low. JPEG2000, for instance, only achieves an average of 0.7 frames per second, which is not acceptable. JPEG-LS and lossless JPEG perform better, but even 7 frames per second is too low to allow fluent interaction between user and application and to show medical video sequences. The second reason is that the compression ratios of these algorithms are still too low. Even the best algorithm (JPEG2000) only achieves an average compression ratio of 3.4 on medical images. Current wireless networks (802.11g) have a theoretical bandwidth of 54 Mbit per second, but the actual throughput is closer to 20 Mbit per second. If we want to transmit medical color images of size 1600x1200 (which is rather low for medical imaging), then the size per image is 1600x1200x3 = 5,760,000 bytes, or 46,080,000 bits, since there are three color planes. This means that a compression ratio of 3.4 would only allow about 20 Mbit/s / (46,080,000 bits / 3.4) ≈ 1.5 images per second to be sent over the wireless network. Again, this is too low to display medical video data and to allow fluent interaction between the user and the medical software application.
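The bandwidth argument can be restated as a short calculation, using the numbers from the text above:

```python
# The bandwidth argument as a calculation: a 1600x1200 RGB image,
# losslessly compressed at ratio 3.4, over a wireless link with about
# 20 Mbit/s of usable throughput (realistic 802.11g figure).

bits_per_image = 1600 * 1200 * 3 * 8      # three 8-bit color planes
assert bits_per_image == 46_080_000

compressed_bits = bits_per_image / 3.4    # average lossless ratio (JPEG2000)
link_bps = 20_000_000                      # usable 802.11g throughput

images_per_second = link_bps / compressed_bits
assert round(images_per_second, 1) == 1.5  # far below any video frame rate
```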
The above discussion concerned lossless compression. If we were to switch to lossy compression, the problem of the low compression ratio would be solved. However, at the same time, uncontrolled artifacts and distortions would be introduced into the medical images due to the lossy nature of the compression algorithms. This is absolutely unacceptable: to date, the general opinion in the medical imaging community is that it is not a priori allowed to apply lossy compression to medical images that will be used for diagnosis. Lossy compression in medical imaging is only allowed to reduce the size of archived images, or if one can prove that the lossy nature cannot influence the clinical image quality (which no one has been able to prove so far). Moreover, even with lossy compression, there is still the problem of the limited throughput (frame rate) of the existing compression algorithms.
Note that the performance results presented above are in line with results from other parties and results available on the Internet (such as for the highly optimized 'Kakadu' implementation of JPEG2000).
3.4.2 Intra-frame image compression such as MJPEG-2000
Motion JPEG2000 uses only key-frame (intra-frame) compression, allowing each frame to be accessed independently. The advantage of applying frame-by-frame compression is that computationally expensive motion estimation is avoided. The disadvantage is that the compression ratio of algorithms using only intra-frame compression will be significantly lower than that of inter-frame based algorithms. One could see intra-frame algorithms as an extension of still image compression algorithms.
The same drawbacks exist for this type of algorithm: the compression ratio is still insufficient when working in lossless mode, and the throughput is too low to really support medical video sequences (at typical medical image resolutions).
3.4.3 Inter-frame image compression such as MPEG-4 AVC
Inter-frame image compression provides very high compression ratios, especially when used in lossy mode (which is what these codecs are designed for). Both closed-loop and open-loop video codec architectures require complex hierarchical block-based motion models in order to efficiently reduce the uncertainty about the true motion and to improve the compression efficiency. Employing complex motion models, however, reduces the chances of attaining real-time video encoding. Additionally, opting for a classical video codec brings a delay as high as N/2 + 1 frames, where N is the GOP length, typically 16 or 32. This means that for a system running at 30 frames per second with N = 32, the delay introduced by compression would be 17 frames, or 567 milliseconds. It is obvious that interaction between the user and the software application generating the image data is completely impossible with a delay of half a second. For example, such a delay would mean that the display system responds to any action of the user (such as clicking a button, rotating a medical image, performing window/level adjustment, moving a window, …) with a delay of more than half a second. For off-line analysis of medical images (MRI, CT, etc.) this is not a problem. For vision-aided surgery, it is.
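The delay figure above follows directly from the N/2 + 1 formula; as a quick check:

```python
# Codec latency from the N/2 + 1 rule used in the text: delay in frames
# and milliseconds for a classical inter-frame codec with GOP length N.

def codec_delay(gop_length, fps):
    delay_frames = gop_length // 2 + 1
    delay_ms = 1000.0 * delay_frames / fps
    return delay_frames, delay_ms

frames, ms = codec_delay(32, 30)   # the example from the text: N=32 at 30 fps
assert frames == 17
assert round(ms) == 567            # over half a second: unusable for
                                   # interactive medical visualization
```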
4 User Interface Adaptation
4.1 Introduction
User interface adaptation is an issue as old as the history of existing devices and ways of interacting with them.
In recent years, more and more entertainment and professional services and applications have been developed that can be used (interfaced with) and accessed through different devices.
Before analyzing the state of the art of this type of application, some criteria should be established in order to narrow the scope of the analysis. To that end, the following classification, based on the way the adaptation is done, is proposed:
a) Customized adaptation: This category comprises user interfaces adapted manually. The main advantage of these applications is that the adaptation is perfectly suited to the final needs of the device. Each interface is redefined manually or semi-automatically in order to have exactly the appearance it should have. The cost, the lack of standards and the impossibility of launching automatic processes are the main disadvantages.
b) Adaptation based on standard adaptation solutions or on generic standard tools: This kind of adaptation is based on standard or semi-standard tools that allow the automatic adaptation of the interfaces. The adaptation process can be fully standardized or can be based on generic standard transformation tools (e.g. XSLT), but both cases share a common feature: the adaptation is based on solutions that facilitate interoperability and serialization, although sometimes the price is a loss of granularity in the adaptation process.
4.2 MPEG-4 advanced content visualization technologies
4.2.1 Software BIFS players
Several developments on the market support MPEG-4 Systems (BIFS), produced both by universities acting as research institutions and by companies active in research, development and service commercialization.
The MPEG-4 BIFS standard specification guarantees interoperability: MP4 content generated with any tool that follows the standard can be played back on any BIFS-compatible device.
However, most players do not implement 100% of the BIFS nodes, which means that interoperability is not completely achieved.
4.2.2 GPAC: Osmo4
Osmo4 is part of the GPAC (Project on Advanced Content) framework developed by the École Nationale Supérieure des Télécommunications (ENST), France. GPAC allows 2D and 3D advanced content to be generated with the MP4Box tool and played back with Osmo4.
GPAC is distributed under the LGPL (Lesser General Public License).
Characteristics:
• Supports several multimedia formats, from simple content (avi, mov, mpg) to 2D/3D advanced content.
• Supports playback of local files, HTTP download and playback, and RTP/RTSP streaming over UDP (unicast or multicast) or TCP.
• Video and audio presentation based on open-source plugins. A decoder development kit (DDK) is available to connect the player with the necessary codec.
• Playback control: play, pause and seek.
• Graphic features: antialiasing, zoom, rendering-area resizing, full screen.
Osmo4 allows:
• Playback of animations (downloaded or streamed).
• Interactive and synchronized mixing of graphics, text, video and audio.
• Partial MPEG-7 and MPEG-21 support: metadata, encryption, watermarking, DRM.
4.2.3 IBM: M4Play
IBM has developed an MPEG-4 toolkit. It consists of a set of Java classes and APIs that allow MPEG-4 advanced content to be generated and played back. The toolkit is distributed under a commercial license.
The M4Play player is part of the toolkit. Its characteristics are as follows:
Characteristics:
• Java based: multiplatform.
• Two versions:
o Standalone application.
o Applet embeddable in an HTML page.
• Supports RTP/RTSP streaming and local file playback.
• Can play:
o MP4 according to the ISMA specifications.
o MP4 including MPEG-4 Systems.
o AVI: MPEG-4 Simple Profile video (.cmp, .m4v, .263).
o AAC: Low Complexity Profile audio (.aac, .adif, .adts).
o MP3: MPEG-1 Layer 3 audio (.mp3).
4.2.4 Envivio TV
Envivio has developed and commercializes an MPEG-4 player for set-top boxes, PCs and PDAs.
Characteristics:
• Can be installed as:
o A standalone player.
o A plugin for well-known players (QuickTime v4.1.2 or later, RealNetworks v7.0 or later, and Windows Media Player v6.4 or later).
• Portable C/C++ code for set-top boxes and mobile telephones.
• Conforms to the 2D BIFS specification.
• Local or streamed playback of MP4 files.
• Protocols: RTP, RTCP and RTSP over UDP or through HTTP tunnels, unicast and multicast.
The standalone player version can be integrated into or ported to any device, including set-top boxes, PCs, PDAs and video game consoles.
Envivio has been certified by RealNetworks and is part of its automatic update program, as the MPEG-4 plugin for the RealNetworks player v8.0 and later.
4.2.5 Bitmanagement: BS Contact MPEG-4
Bitmanagement has developed an MPEG-4 player with 2D and 3D support. The implementation covers more than 80% of the MPEG-4 nodes. This player is being used in several European projects of Telefónica I+D. The MPEG consortium has requested the use of Bitmanagement's key software technology as a reference implementation for the standard.
The predecessor of this player is the blaxxun Contact 3D engine, the first VRML viewer to introduce DirectX 7 hardware acceleration support and to incorporate advanced 3D features (particle systems, multi-texturing, NURBS, animation, etc.) and interactivity.
The Bitmanagement player incorporates features such as 2D/3D streaming, animation streaming, compressed scenes, and standardized interfaces for digital rights management and encryption.
SoNG (Portals of Next Generation) was a European Commission project with Telefónica I+D participation that used the player developed by Bitmanagement, the first MPEG-4 player prototype with 2D/3D support. Bitmanagement currently commercializes this player.
Characteristics:
• Can be installed as:
o An ActiveX plugin for Microsoft Internet Explorer.
o A Netscape plugin for Netscape 4.x.
o An ActiveX control embedded in any language that supports COM (Visual C++, Visual Basic).
o An ActiveX control embedded in Java 2 via JNI.
Bitmanagement states that this player has been tested with content generated by GPAC (ENST) and IBM tools.
4.2.6 Octaga Professional
Octaga commercializes a 3D MPEG-4 advanced content player: Octaga Professional.
Characteristics:
• Can play MP4 files generated with the GPAC creation tools.
• Can be installed as:
o A standalone application.
o A plugin that can be embedded in an HTML page for the Internet Explorer, Firefox and Opera browsers.
4.2.7 Digimax: MAXPEG Player
Digimax commercializes a 2D/3D player compatible with MPEG-4 (BIFS).
Characteristics:
• Portable: C++ code portable to different platforms (STB and mobile).
• Can play MP4 files generated by the company's own authoring tool, MAXPEG Author.
4.2.8 COSMOS
COSMOS (COllaborative System based on MPEG-4 Objects and Streams) is a framework for developing applications in collaborative virtual environments (CVEs).
Completely developed in Java, it maintains a 3D virtual environment in which 3D objects can be exchanged and manipulated in real time.
It allows a change to a BIFS node to be sent by broadcast/multicast to all interested participants, thereby updating all the scenes involved.
4.3 UI adaptation based on XML
4.3.1 UI adaptation based on XML transformation
Most approaches to UI adaptation are based on XML [22] and its transformation technologies. [23] and [24] are two very interesting tutorials on these techniques.
Most of these applications rest on the following assumption: considering the user interface as a tree, this tree can be transformed (adapted) into a different tree by recombining the set of leaves it is composed of.
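The tree-recombination idea can be sketched with a small example. The abstract UI below, the element names and the widget mapping are all invented for illustration; the point is only that the adapted interface is a new tree assembled from the leaves of the source tree.

```python
import xml.etree.ElementTree as ET

# Invented abstract UI description; real systems use richer vocabularies.
ABSTRACT_UI = """
<ui>
  <group title="Order">
    <input name="pizza" label="Pizza type"/>
    <input name="qty" label="Quantity"/>
  </group>
  <button label="Send"/>
</ui>
"""

def adapt(ui_xml, device):
    """Flatten the abstract UI tree into a device-specific tree,
    mapping each abstract leaf onto a concrete widget."""
    src = ET.fromstring(ui_xml)
    out = ET.Element("screen", device=device)
    for leaf in src.iter():
        if leaf.tag == "input":
            w = ET.SubElement(out, "textfield", name=leaf.get("name"))
            w.text = leaf.get("label")
        elif leaf.tag == "button":
            # On a phone the action becomes a softkey instead of a button.
            ET.SubElement(out, "softkey" if device == "phone" else "button",
                          label=leaf.get("label"))
    return out

phone = adapt(ABSTRACT_UI, "phone")
print(ET.tostring(phone, encoding="unicode"))
```

The same source tree yields a different target tree per device; a real XSLT-based pipeline performs the equivalent recombination declaratively.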
The following images show how the authors of [25] present a possible architecture for carrying out this type of user interface adaptation.
The paper also provides information about an authoring tool for developing such transformations.
Architecture and tool components of the system described in [24].
Another example of this approach is the AUIT methodology [26], which, building on XML transformations, proposes a four-layer architecture to adapt the user interface to different devices. This methodology has been improved over the last four years and several implementations are based on it.
AUIT architecture.
Several works based on XML transformations have also been carried out within the framework of the SEESCOA (Software Engineering for Embedded Systems using a Component-Oriented Approach) initiative [27].
4.3.2 Adaptation via XML publishing servers
Based on similar technologies, there are widely used frameworks that provide mechanisms to implement user interface adaptation for applications accessed over IP.
These frameworks act as web servers that can handle different types of devices (represented by different types or versions of web clients [28]) and implement a different behavior for each of them. In this way, a web site or a pizza-ordering service can be accessed, browsed and visualized in very different ways (on a TV, PDA, mobile phone, etc.).
One of these frameworks, whose use is quite widespread and which has survived and been successfully improved over the last decade, is Cocoon. This framework, building on XML transformation technologies, systematizes the adaptation process to a very significant degree.
[29] gives an example of the use of one of these frameworks to apply these techniques.
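The device-dependent behavior of such publishing servers boils down to a matching step: inspect the requesting client and pick a transformation pipeline. The sketch below is a hypothetical, simplified stand-in for that step (the User-Agent markers and stylesheet names are invented, and real frameworks such as Cocoon configure this declaratively rather than in code):

```python
# Hypothetical matching table: first User-Agent marker that matches
# decides which stylesheet the server applies to the XML content.
PIPELINES = [
    ("Nokia",      "wml.xsl"),   # WAP phone
    ("Windows CE", "pda.xsl"),   # PDA browser
    ("Mozilla",    "html.xsl"),  # desktop browser
]

def select_stylesheet(user_agent, default="html.xsl"):
    """Return the stylesheet for the first matching marker,
    falling back to a default pipeline for unknown clients."""
    for marker, stylesheet in PIPELINES:
        if marker in user_agent:
            return stylesheet
    return default

print(select_stylesheet("Nokia6600/1.0"))  # wml.xsl
```

The order of the table matters: more specific markers must come before generic ones, which is exactly the ordering discipline a Cocoon sitemap imposes on its matchers.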
Another application based on Cocoon is the PALIO (Personalized Access to Local Information and services for tourists) service framework. The PALIO framework is being used in the development of location-aware information systems for tourists, and is capable of delivering fully adaptive information to a wide range of devices, including mobile ones.
PALIO example.
Sitemesh [30] is a Java framework for device web-page layout and decoration that enables device-oriented user interface adaptation based on XML transformation. It does not act as an XML publishing engine but is integrated into the web server.
4.3.3 Adaptation based on the definition & identification of the device
4.3.3.1 Composite Capabilities / Preference Profiles
Composite Capabilities/Preference Profiles (CC/PP) [31] is a W3C recommendation that uses the Semantic Web oriented language RDF [32] to define the profiles and capabilities of a device so that the appropriate adaptation can be carried out. This working group is now closed and its work has been transferred to the Device Independence group [33].
One of the recent results of this group is the specification "Delivery Context: Interfaces (DCI) Accessing Static and Dynamic Properties" [34]. This document defines platform- and language-neutral interfaces that give web applications access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions.
User interface adaptation: concepts involved according to the DCI group [33].
There is a well-documented implementation by Sun of the CC/PP specification, which describes how to process CC/PP in Java (JSR-000188, [35]).
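To make the mechanism concrete, the sketch below parses a deliberately abbreviated CC/PP-style profile (RDF/XML using the 2001 UAProf schema namespace; real profiles carry many more components and attributes) and extracts one capability, the screen size, which a server could then use to drive adaptation:

```python
import xml.etree.ElementTree as ET

# Abbreviated CC/PP-style profile; the namespace is the UAProf 2001
# schema, but the profile content here is invented for illustration.
PROFILE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:prf="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#">
  <rdf:Description rdf:ID="HardwarePlatform">
    <prf:ScreenSize>176x208</prf:ScreenSize>
    <prf:ColorCapable>Yes</prf:ColorCapable>
  </rdf:Description>
</rdf:RDF>"""

PRF = "{http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#}"

def screen_size(profile_xml):
    """Extract the advertised screen size as a (width, height) pair."""
    root = ET.fromstring(profile_xml)
    node = root.find(".//" + PRF + "ScreenSize")
    w, h = node.text.split("x")
    return int(w), int(h)

print(screen_size(PROFILE))  # (176, 208)
```

A full CC/PP processor (such as the JSR-188 implementation mentioned above) additionally resolves profile references and merges request-time overrides into the default profile.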
4.3.3.2 UAProf (OMA)
One of the outputs of CC/PP had a direct impact on the active Open Mobile Alliance (OMA) forum [36] [37]. The result is UAProf, a concrete implementation of CC/PP developed by the OMA. UAProf is a framework for describing and transporting information about the capabilities of a device. This information may include hardware characteristics (e.g. screen size, type of keyboard) and software characteristics (e.g. browser manufacturer, supported markup languages). The final purpose is for origin servers, gateways and proxies to use this information to customize content for the user. The current version of this specification is UAProf 2.0.
One of the applications that employ this technology can be found in [38].
Architecture defined for Web UI adaptation in [37].
4.3.3.3 Device Description Repository
The Device Description Repository is a concept proposed by the World Wide Web Consortium (W3C) Device Description Working Group (DDWG). The proposed repository would contain information about Web-enabled devices (particularly mobile devices) so that content could be adapted to suit them. The information would include screen dimensions, input mechanisms, supported colors, known limitations, special capabilities, etc.
The idea of implementing a Device Description Repository was recently discussed at an international workshop held by the DDWG in Madrid, Spain, in July 2006. Using such an approach in Cantata to include mobile devices in the demonstrators could therefore be interesting.
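Functionally, such a repository is a queryable store keyed by device identity. The sketch below is a hypothetical in-memory stand-in (device ids and property values are invented) showing the kind of lookup interface a DDR client would use:

```python
# Hypothetical in-memory stand-in for a Device Description Repository;
# each record carries the properties the DDWG proposal mentions.
REPOSITORY = {
    "acme-phone-x1": {"screen": (128, 160), "input": "keypad",
                      "colors": 4096},
    "acme-pda-p2":   {"screen": (240, 320), "input": "stylus",
                      "colors": 65536},
}

def lookup(device_id, prop, default=None):
    """Return one property of a device, or a default when the
    repository has no answer (unknown device or unknown property)."""
    return REPOSITORY.get(device_id, {}).get(prop, default)

print(lookup("acme-pda-p2", "screen"))  # (240, 320)
```

The default value matters in practice: content adaptation must degrade gracefully when a device is not yet in the repository.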
4.3.4 XML-based UI adaptation
A software application is said to be device independent when its functions are universal across different types of device. This generally means that it is written in a meta-language that can be read on any platform.
XML (eXtensible Markup Language) seems to be a good approach for creating device-oriented interfaces. XML is a platform-neutral language for organizing and exchanging complex information. It is lightweight, easy to use and increasingly available in today's applications. In addition, XML makes it possible to define tags and the structural relationships between them, which makes it a very powerful and useful language for creating a uniform information format for complex multimedia content and documents. XML also supports XSL style sheets, which allow customized presentations to be created for different devices and users.
XML-based user interface description languages, such as the XML User Interface Language (XUL) or TERESA XML, are becoming much more visible. These approaches propose specific characteristics and different functionalities.
The approaches presented in this section share a common feature: the adaptation is achieved by defining the interface without including its final presentation. These approaches therefore force the final device either to be compliant with them or to provide a renderer for each device.
4.3.4.1 UIML (User Interface Markup Language)
UIML [39] is an XML-based markup language for defining interfaces. UIML allows interfaces to be defined by concatenating the definitions of the different elements that compose the interface.
Renderers exist for different technologies and platforms (J2EE, QT, HTML, C++, VoiceXML) that transform an interface expressed in UIML into the appropriate output.
4.3.4.2 AUIML
AUIML is similar to UIML but more abstract. AUIML does not include UI appearance features, in order to be 100% independent of platform and implementation technology. According to IBM (which provides a toolkit): "AUIML captures relative positioning information of user interface components and delegates their display to a platform-specific renderer. Depending on the platform or device being used, the renderer decides the best way to present the user interface to the user and receive user input."
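The renderer idea these languages share can be illustrated with a toy example: one abstract description, several platform-specific renderers. This is plain Python, not actual UIML or AUIML syntax; the part names ("label", "field", "trigger") and output formats are invented.

```python
# One abstract description of an interface, as (kind, value) parts.
ABSTRACT = [("label", "Name"), ("field", "name"), ("trigger", "OK")]

def render_html(parts):
    """A graphical renderer maps the abstract parts to HTML widgets."""
    out = []
    for kind, value in parts:
        if kind == "label":
            out.append("<label>%s</label>" % value)
        elif kind == "field":
            out.append('<input name="%s"/>' % value)
        elif kind == "trigger":
            out.append("<button>%s</button>" % value)
    return "".join(out)

def render_voice(parts):
    """A voice renderer maps the same parts to spoken prompts."""
    return "; ".join("say %s" % v if k == "label" else
                     "ask %s" % v if k == "field" else
                     "confirm %s" % v
                     for k, v in parts)

print(render_html(ABSTRACT))
print(render_voice(ABSTRACT))
```

The abstract description never mentions buttons or prompts; each renderer decides, as the AUIML quote puts it, "the best way to present the user interface" on its platform.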
4.3.4.3 XIML (eXtensible Interface Markup Language)
This initiative [40] has a philosophy similar to the previous ones, but it does not seem to be very active.
Weather forecast application using XIML.
4.3.4.4 XUL
The XML User Interface Language (XUL) is Mozilla's XML-based language for describing window layout. XUL separates the client application definition and programmatic logic from its graphical presentation and language-specific text labels.
A user interface (UI) can be described as a set of structured interface elements (such as windows, menu bars and buttons) along with a predefined set of properties.
XUL focuses on window-based graphical user interfaces, so it may not be applicable to the interfaces of small mobile devices, for example.
4.3.4.5 TERESA XML
Teresa is a project of the HCI Group of ISTI-C.N.R., supported by the European IST project Cameleon, whose aim is to design and develop concrete user interfaces adapted to specific platforms [41]. The Teresa XML language is composed of two parts: an XML description of the CTT (ConcurTaskTrees [42]) notation and a language for describing user interfaces.
This XML-based language describes the organization of the Abstract Interaction Objects (AIOs) that compose the interface. The user interface dialog is also described with this language.
A user interface (UI) is a structured set of one or more presentation elements. Each presentation element is characterized by a structure, which describes the static organization of the UI and the relationships among the various presentation elements.
The Teresa XML language is used in the TERESA tool, which supports the generation of task models, abstract UIs and running UIs.
4.3.4.6 USIXML
UsiXML (USer Interface eXtensible Markup Language) is an XML-compliant markup language that allows the description of a user interface (UI) for multiple contexts of use, such as Character User Interfaces (CUIs), Graphical User Interfaces (GUIs), Auditory User Interfaces (AUIs) and Multimodal User Interfaces (MUIs).
UsiXML consists of a User Interface Description Language (UIDL), a declarative language that captures the essence of what a UI is or should be, independently of its physical characteristics.
UsiXML supports device independence: a UI can be described in a way that remains independent of the interaction devices, such as mouse, screen, keyboard or voice recognition system. If needed, a reference to a particular device can be added to the description.
(Information taken from www.usixml.org)
4.3.4.7 AAIML [43]
The Alternate User Interface Access standard (AAIML) is an initiative of the V2 technical committee of the National Committee for Information Technology Standards (NCITS).
This standard aims to allow people with disabilities to remotely control a wide range of electronic devices (for example copy machines or elevators) from their personal device (such as a mobile phone).
An abstract user interface is transmitted by the target device to the user's device, which presents it with input and output mechanisms appropriate for that user. This introduces the concept of the "Universal Remote Control" (URC). The XML-based language is used to convey an abstract user interface description from the target device to the URC. On the URC, this abstract description must be mapped to a concrete description available on the platform.
A Compaq iPAQ handheld computer (running Java/Swing on Linux) controlling a TV simulation on a PC via an 802.11b wireless connection and Jini/Java technology.
4.3.4.8 XForms and RIML
The W3C XForms specification is a technology intended as the next generation of forms for the web. Although its focus is on gathering input provided by the user, it also provides some information display facilities. Despite its specialized scope, XForms provides many of the features needed by a more general abstract language. XForms separates three aspects of a form interface:
• The data model used by the target.
• The presentation of the data model to the user.
• The processing model.
In XForms, the data model can be used by specialized interfaces. In fact, XForms allows resources such as labels to be substituted according to the delivery context.
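The label-substitution idea can be shown in miniature. The sketch below is plain Python in the spirit of that mechanism, not XForms syntax; the contexts, field names and label texts are invented.

```python
# One data model field, several context-dependent labels: the model
# is untouched, only the presentation resource is substituted.
LABELS = {
    "desktop": {"qty": "Number of items to order"},
    "phone":   {"qty": "Qty"},  # short label for a small screen
}

def label(field, context):
    """Pick the label resource appropriate for the delivery context."""
    return LABELS[context][field]

print(label("qty", "phone"))  # Qty
```

In XForms proper this selection happens declaratively, so the same form definition serves every delivery context.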
The Renderer Independent Markup Language (RIML) is based on emerging standards. The current draft of XHTML 2.0 is used for content such as paragraphs, tables, images and hyperlinks. For form-based interaction, XForms elements have been included.
RIML stresses the separation of content definition (i.e. what is to be presented) from the description of the dynamic adaptations that can be performed on the content in order to match the varying capabilities of devices.
4.3.4.9 MPEG-21
ISO/IEC is defining the MPEG-21 framework, which is intended to support transparent use of multimedia resources across a wide range of networks and devices.
One aspect of the MPEG-21 requirements is Digital Item Adaptation, which is based on a Usage Environment Description. It proposes describing the capabilities of at least the terminal, network, delivery, user and natural environment, and notes the desirability of remaining compatible with other recommendations such as CC/PP and UAProf (see 4.3.3.1 and 4.3.3.2).
(Information taken from www.w3.org)
4.4 Device ontology
In 2001, a FIPA initiative proposed a device ontology [51]. This ontology describes the software and hardware properties of devices as well as the services they offer. Thanks to this ontology, device profiles can be built and used by agents. An agent that knows this ontology and receives the profile of a specific device can determine whether the device's properties or services allow it to achieve its objectives.
The FIPA device ontology could be used in a CC/PP profile (see 4.3.3.1).
For some examples see:
http://www.fipa.org/specs/fipa00091/PC00091A.html#_Toc511707116
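The agent-side decision just described amounts to matching a device profile against a set of requirements. The sketch below illustrates that check; the property names and values are illustrative placeholders, not the FIPA ontology vocabulary.

```python
# Illustrative device profile; property names are invented, not FIPA's.
PROFILE = {"hw.memory_kb": 16384, "sw.java": True, "ui.screen": (176, 208)}

def can_achieve(profile, requirements):
    """True if every required property is present and sufficient:
    numeric requirements are minimums, others must match exactly."""
    for prop, needed in requirements.items():
        have = profile.get(prop)
        if have is None:
            return False
        if isinstance(needed, (int, float)):
            if have < needed:
                return False
        elif have != needed:
            return False
    return True

print(can_achieve(PROFILE, {"sw.java": True, "hw.memory_kb": 8192}))
```

An agent receiving a profile would run exactly this kind of test before deciding whether to delegate a task to the device.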
4.5 Agent-based user interface adaptation
MATE
MATE is a prototype of a human-computer interface based on a society of reactive agents and on a language for the spatial description of tasks. Implemented as a text editor, this tool aims to show that software (of the office automation type) can be built using the advantages of the agent paradigm and the power of scripting languages in order to make the interface more personalizable, more extensible and more intuitive for non-expert users [52].
5 State-of-the-art system architecture
5.1 DLNA
The Digital Living Network Alliance (DLNA) is a cross-industry organization of leading consumer electronics, computing and mobile device companies that share a vision of a wired and wireless network of interoperable consumer electronics (CE), personal computers (PCs) and mobile devices in the home and on the road, enabling a seamless environment for sharing and growing new digital media and content services.
DLNA is focused on delivering interoperability guidelines based on open industry standards to complete the cross-industry digital convergence. DLNA has published a common set of industry design guidelines that allow manufacturers to participate in a growing marketplace of networked devices, leading to more innovation, simplicity and value for consumers. The DLNA Networked Device Interoperability Guidelines are use-case driven and specify the interoperable building blocks that are available to build platforms and software infrastructure.
The DLNA Networked Device Interoperability Guidelines refer to standards from established, open industry standards organizations and provide CE, PC and mobile device manufacturers with the information needed to build compelling, interoperable digital home platforms, devices and applications.
This figure shows the technology ingredients covered by the DLNA Networked Device Interoperability Guidelines.
The digital home consists of a network of CE, PC and mobile devices that cooperate transparently, delivering simple, seamless interoperability that enhances and enriches user experiences. The communications and control backbone for the home network is based on IP networking, UPnP and Internet Engineering Task Force technologies.
Information taken from
http://www.dlna.org/en/industry/pressroom/DLNA_white_paper.pdf
5.2 mTag
Several new approaches on the market focus on smart tags, which enable not only a new way to point at and select a desired source of information but also let the end user initiate data access and direct the desired content to a terminal.
An example of this kind of approach is the mTag architecture. Focusing on smart environments and on offering a user interface through a distributed, event-driven architecture for discovering location-specific mobile web services, mTag presents an architecture in which service discovery is initiated by touching a fixed RFID reader with a mobile passive RFID tag attached e.g. to a phone, which results in information about the available services being pushed to the user's preferred device [mTag].
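The touch-to-discover flow above can be sketched as an event handler: a tag touch triggers a lookup of the location's services, which are then pushed to the device associated with the tag. Everything concrete in this sketch (reader ids, tag ids, service names, the push callback) is invented; it only mirrors the event flow, not the mTag implementation.

```python
# Invented registry: which services are on offer at each fixed reader,
# and which device each tag's owner wants content pushed to.
SERVICES_BY_READER = {
    "lobby-reader": ["bus timetable", "campus map"],
    "cafe-reader":  ["menu of the day"],
}
PREFERRED_DEVICE = {"tag-42": "phone-of-tag-42"}

def touch(reader_id, tag_id, push):
    """Handle a tag touching a fixed reader: look up the location's
    services and push them to the device associated with the tag."""
    services = SERVICES_BY_READER.get(reader_id, [])
    push(PREFERRED_DEVICE[tag_id], services)
    return services

sent = []
touch("lobby-reader", "tag-42", lambda dev, svcs: sent.append((dev, svcs)))
print(sent)
```

The key property the mTag evaluation highlights is visible here: nothing is pushed until the user explicitly touches a reader, in contrast to automatic proximity-based delivery.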
As stated by the mTag project: "The principal advantage of the proposed architecture is that it can be realized with today's off-the-shelf commercial products. We presented a proposal for an Internet based deployment and two case studies, where prototype implementations were empirically evaluated in the true environment of use. The case studies showed that the service was found as an easy way to access location based mobile web services.
Users were satisfied with the possibility to fully control the information pushed to their devices, in comparison to the automatic location based information delivery of the comparative Bluetooth based service in the second case study." [mTag]
[mTag]: Korhonen J, Ojala T, Klemola M & Väänänen P (2006) mTag – Architecture for discovering location specific mobile web services using RFID and its evaluation with two case studies. Proc. International Conference on Internet and Web Applications and Services, Guadeloupe.
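The touch-to-discover flow described above can be sketched as a small event handler. All names (reader and tag registries, service lists) are hypothetical illustrations under the architecture's assumptions, not the actual mTag implementation:

```python
# Sketch of the mTag-style touch-to-discover flow: touching a fixed RFID
# reader with a passive tag pushes the location's services to the user's
# preferred device. All names here are invented for illustration.

# Fixed RFID readers are installed at known locations; each location
# advertises a set of location-specific services.
SERVICES_BY_READER = {
    "reader-lobby": ["floor map", "event schedule"],
    "reader-bus-stop": ["timetable", "ticket purchase"],
}

# A passive tag (e.g. attached to a phone) identifies the user, who has
# registered a preferred delivery device.
PREFERRED_DEVICE_BY_TAG = {
    "tag-0451": "amy-phone",
}

def on_touch(reader_id: str, tag_id: str) -> tuple:
    """Handle a tag touching a fixed reader: resolve the location's
    services and route them to the user's preferred device."""
    services = SERVICES_BY_READER[reader_id]
    device = PREFERRED_DEVICE_BY_TAG[tag_id]
    # In the real architecture this would be an event pushed over the
    # network; here we just return the routing decision.
    return device, services

device, services = on_touch("reader-lobby", "tag-0451")
print(device, services)
```

The point of the pattern is that the user, not the environment, triggers delivery, which is what distinguished it from the Bluetooth-based push service in the second case study.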
5.3 Content retrieval and device management
The delivery system will be managed like any other network system, but the devices present special challenges. The crucial insight is that content delivery is first and foremost a data management problem at multiple levels. Content delivery systems must be built around a set of database-related requirements: queryable metadata, secure and transactional distribution of data between databases, and an unbreakable linkage between content and its metadata. Additionally, the distributed system must keep its application and configuration data under control, to ensure proper functionality of the system and autonomic behavior, from the end user's and the device's point of view, without much need for user intervention.
This chapter describes a simple content distribution technique that enables a user to easily select content from the vast libraries that are available, download it, view it, and be charged for it.
The architecture for the presented system is based on a communicating network of database servers that manage all the data of the system. The next figure illustrates the different components at a conceptual level.
The system has the following components:
• The conceptual centerpiece of the system is the Rendering Devices, which accept different types of content from multiple media sources.
• The Content Libraries contain digital content and the associated metadata.
• The Preference Server contains user-specific data related to content and usage of the system. Identity, authentication, and saved queries are stored in the preference server.
• The Ontology Server maintains common ontology data that is shareable across the other components of the system. This data makes the content machine-searchable.
• The Configuration Management Server manages the configuration of the system and its devices.
The user's network terminal device (typically a PC or a set-top box) interacts with all the above host components using data synchronization over a protocol such as HTTP. It can download new components to upgrade itself. It can download result sets for further local analysis. And of course, it can download content. It can also use the system to back up preferences, configurations, user data, and media that no longer fits on the device.
At the core of the presented approach is the Solid BoostEngine, a small-footprint relational database manager that provides all the typical functionality of a modern data manager, including the SQL language for defining schemas and queries, transactions, multi-user capabilities, support for programmability (procedures, triggers, events), and automatic data recovery. Applications and devices communicate with the data manager using standard ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) application programming interfaces (APIs).
New advanced databases offer new ways to manage the required content and critical information based on application and user interface requirements. Solid BoostEngine has two separate storage methods: one for typical alphanumeric data, and a second mechanism optimized for the storage and retrieval of Binary Large Objects (BLOBs). In Solid, digital content can be handled within the database as efficiently as if the data were stored in operating system files. This provides relational database functionality for media content, a solution with many benefits:
• The same API is used for accessing and distributing both alphanumeric and content data, which simplifies application design.
• Access to content and metadata can be combined in the same query, ensuring that property-rights data always accompanies content data.
• All data can be treated transactionally, meaning that changes to content and changes to metadata can be tightly linked.
• The DBMS protects all data in the system with a unified access control mechanism.
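Solid BoostEngine is a commercial product, but the benefits listed above can be illustrated with any relational engine that stores BLOBs. The sketch below uses SQLite as a stand-in; the table and column names are invented for this example:

```python
import sqlite3

# Illustration of keeping content (BLOBs) and its metadata in one
# database, using SQLite in place of Solid BoostEngine. The table and
# column names are invented for this sketch.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    title TEXT,
    rights_holder TEXT,   -- property-rights metadata
    media BLOB            -- the digital content itself
)""")

# One transaction covers both the content and its metadata, so the two
# can never get out of sync.
with con:
    con.execute(
        "INSERT INTO content (title, rights_holder, media) VALUES (?, ?, ?)",
        ("World tour clip", "RockBand Ltd", b"\x00\x01\x02"),
    )

# A single query returns content together with its rights data.
row = con.execute(
    "SELECT title, rights_holder, length(media) FROM content"
).fetchone()
print(row)  # ('World tour clip', 'RockBand Ltd', 3)
```

The same pattern gives the unified access control benefit for free: whatever mechanism guards the table guards both the media and its metadata.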
The data distribution component of the Solid Platform is the Solid SmartFlow Option. It links together a set of loosely coupled, cooperative databases that share data with one another under strict integrity and security rules. Key aspects of the architecture include the following:
• A hierarchical relationship of master and replica databases.
• A publish/subscribe mechanism for distributing data from a master database to one or more replica databases.
• A transaction propagation mechanism for forwarding local changes from a replica database to its master.
• Transactional and recoverable message queuing for data transfer between databases.
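The four aspects above can be sketched as a toy model of master/replica distribution. This is an illustration of the pattern only, not the SmartFlow protocol; all class and publication names are invented:

```python
# Toy model of hierarchical master/replica data distribution: a replica
# subscribes to a publication and refreshes it from the master, and
# local changes are propagated back through a queue. This illustrates
# the pattern only; it is not the SmartFlow protocol.

class Master:
    def __init__(self):
        self.data = {}   # publication name -> rows
        self.queue = []  # transactions propagated from replicas

    def publish(self, name, rows):
        """Define a publication that replicas can subscribe to."""
        self.data[name] = list(rows)

    def refresh(self, name):
        """Serve a replica's refresh request for a publication."""
        return list(self.data.get(name, []))

    def receive(self, txn):
        """Queue a transaction forwarded from a replica, then apply it."""
        self.queue.append(txn)
        pub, row = txn
        self.data.setdefault(pub, []).append(row)

class Replica:
    def __init__(self, master):
        self.master = master
        self.local = {}

    def subscribe(self, name):
        self.local[name] = self.master.refresh(name)

    def propagate(self, name, row):
        """Apply a local change, then forward it to the master."""
        self.local.setdefault(name, []).append(row)
        self.master.receive((name, row))

m = Master()
m.publish("config_v2", ["schema.sql", "app.jar"])
r = Replica(m)
r.subscribe("config_v2")            # master -> replica distribution
r.propagate("device_state", "node-7: upgraded")  # replica -> master
print(r.local["config_v2"], m.data["device_state"])
```

In the real system the queue would be transactional and recoverable, so a transfer interrupted by a network failure can be resumed without losing or duplicating changes.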
The content delivery network will be a very large system with numerous different components under the control of a variety of entities. Such a system must be designed for manageability from the ground up. Recent developments in Autonomic Computing show promise in this area. Autonomic systems are self-configuring, self-healing, self-optimizing, and self-protecting, so that they effectively take care of themselves without much need for user intervention. The delivery system will be managed like any other network system, but the devices present special challenges.
Device management includes at least the following tasks:
• Managing user identification and authentication.
• Automatically installing and upgrading software on local devices.
• Maintaining valid software configurations without requiring user interaction.
• Backing up and/or deleting unused software and content from devices.
• Transferring user preferences from one device to another.
The configuration manager holds data relating to system configuration. This includes applications that may be needed by terminals and rendering devices. The configuration management data can be divided into the following components:
• Version "header information".
• Application binaries (Java classes and resources) of the new version.
• SQL scripts needed to create or upgrade the database schemas.
• State information about each of the managed nodes.
• Log information for troubleshooting purposes.
All system configuration management operations are performed by preparing the required configuration as a publication in the master and then distributing it to the managed terminals and rendering devices through data synchronization. After refreshing the local copy of the management data, the managed device may run some installation procedures (e.g. execute schema upgrade SQL scripts in the target database) to complete the task.
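The schema-upgrade step on the managed device can be sketched as follows. SQLite stands in for the device's local database, and the version numbers and scripts are invented for illustration:

```python
import sqlite3

# Sketch of a managed device completing a configuration refresh by
# running the schema-upgrade SQL scripts it has not yet applied.
# The versions and scripts are invented for illustration.
UPGRADE_SCRIPTS = {
    1: "CREATE TABLE content_query (id INTEGER PRIMARY KEY, name TEXT)",
    2: "CREATE TABLE query_match (query_id INTEGER, content_id INTEGER)",
}

def upgrade(con):
    """Apply all pending upgrade scripts, tracking the installed version."""
    con.execute("CREATE TABLE IF NOT EXISTS schema_version (v INTEGER)")
    current = con.execute("SELECT max(v) FROM schema_version").fetchone()[0] or 0
    for version in sorted(UPGRADE_SCRIPTS):
        if version > current:
            with con:  # each script and its version stamp commit together
                con.execute(UPGRADE_SCRIPTS[version])
                con.execute("INSERT INTO schema_version VALUES (?)", (version,))
            current = version
    return current

con = sqlite3.connect(":memory:")
print(upgrade(con))  # 2
print(upgrade(con))  # 2 (idempotent: nothing left to apply)
```

Recording the applied version in the same transaction as the script itself is what lets the configuration manager know the exact state of the node afterwards.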
Centralizing configuration data in this way solves the important problem of knowing the state of any managed node at any point in time. The configuration manager can alter that state into a new consistent state by asking the device to subscribe to a new publication or refresh an old publication.
The rendering device: In order to provide the media service to the end user, the rendering device acquires applications and content data from the four components mentioned above. Within the database of this device, data may be organized as shown in Figure 7.
The diagram shows that the rendering device operates on data that it obtains from a number of sources. The data has been organized into logical databases, each of which may be synchronized with the source (master database) of the data. Much of this data is downloaded or pushed to the device as needed.
The sequence of steps needed to query video content from a content library and deliver it to a rendering device has been outlined earlier in this document, in the section on System Functionality. Figure 8 below shows how the various information resources contribute to resolving a user's query.
Queries can take advantage of any or all of the metadata associated with the media in order to narrow down to the desired content. Figure 8 shows the use of two types of metadata: enumerated and free text. The user queries against both of them.
Users may retain their queries for reuse. In our use case, Amy wants to find recent video news clips, which she has not yet seen, about her favorite rock band's world tour. She may wish to re-execute this query every few days to find recent news. Each query is made up of a single row in the CONTENT_QUERY table, which is linked to one or more rows in the enumerated and free-text tables, each of which represents a condition that must be met with regard to this content.
The matchmaking procedure finds clips where the metadata and query items match, and produces rows in a QUERY_MATCH table. This table has a separate entry for each piece of content whose metadata matches the query criteria. In this example the criteria will be: Amy's favorite band, news clips, not yet seen. In the real world, the query may interact with Amy's preferences about which news sources she prefers and how much she is willing to pay for this kind of content.
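The matchmaking step can be sketched in SQL. SQLite stands in for the content library's database, and the table layouts are simplified from the description above (a single tag table plays the role of the enumerated-metadata conditions):

```python
import sqlite3

# Sketch of the matchmaking procedure: find clips whose metadata
# satisfies every condition of a stored query, and record them in
# QUERY_MATCH. Table layouts are simplified for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE content_tag (content_id INTEGER, tag TEXT);
CREATE TABLE content_query (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE query_condition (query_id INTEGER, tag TEXT);
CREATE TABLE query_match (query_id INTEGER, content_id INTEGER);

INSERT INTO content VALUES (1, 'Tour opener'), (2, 'Cooking show');
INSERT INTO content_tag VALUES (1, 'rockband'), (1, 'news'), (2, 'news');
INSERT INTO content_query VALUES (10, 'amy-tour-news');
INSERT INTO query_condition VALUES (10, 'rockband'), (10, 'news');
""")

# A clip matches when it carries every tag the query demands.
con.execute("""
INSERT INTO query_match (query_id, content_id)
SELECT qc.query_id, ct.content_id
FROM query_condition qc
JOIN content_tag ct ON ct.tag = qc.tag
GROUP BY qc.query_id, ct.content_id
HAVING count(*) = (SELECT count(*) FROM query_condition
                   WHERE query_id = qc.query_id)
""")
print(con.execute("SELECT * FROM query_match").fetchall())  # [(10, 1)]
```

Only the clip carrying both the band tag and the news tag matches; the "not yet seen" criterion is handled by the packaging step, which excludes content already assigned to the device.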
The packaging procedure goes through the QUERY_MATCH table and creates rows in the SEGMENT_ASSIGNMENT table for all content that matches the query and that has not yet been assigned to the rendering device. This step protects Amy from inadvertently downloading the same content twice. Amy will interact with this list, either directly or through matching to her preferences, to determine what she will actually download. Rows in this table will be used to parameterize Amy's content publication so that it defines the content of current interest to her.
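The packaging procedure's duplicate protection can be sketched as an INSERT ... SELECT guarded by NOT EXISTS. Again SQLite stands in and the tables are simplified:

```python
import sqlite3

# Sketch of the packaging procedure: copy query matches into
# SEGMENT_ASSIGNMENT, skipping content already assigned to this
# rendering device so nothing is downloaded twice. Simplified layout.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE query_match (query_id INTEGER, content_id INTEGER);
CREATE TABLE segment_assignment (replica_id TEXT, content_id INTEGER);

INSERT INTO query_match VALUES (10, 1), (10, 2);
INSERT INTO segment_assignment VALUES ('amy-stb', 2);  -- already on device
""")

def package(con, replica_id):
    """Assign all matched content to the device, except duplicates."""
    con.execute("""
    INSERT INTO segment_assignment (replica_id, content_id)
    SELECT ?, qm.content_id FROM query_match qm
    WHERE NOT EXISTS (SELECT 1 FROM segment_assignment sa
                      WHERE sa.replica_id = ?
                        AND sa.content_id = qm.content_id)
    """, (replica_id, replica_id))

package(con, 'amy-stb')
rows = con.execute(
    "SELECT content_id FROM segment_assignment"
    " WHERE replica_id = 'amy-stb' ORDER BY content_id"
).fetchall()
print(rows)  # [(1,), (2,)]
```

Running the procedure again assigns nothing new, which is exactly the idempotence that protects the user from a duplicate download.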
WP3.1 Deliverable
Cantata
(ITEA 05010)
Version 0.14
Page 60 of 63
At this point, the rendering device is able to obtain content by forwarding a refresh request to the content library, asking it to refresh the data of the CONTENT_OF_REPLICA (replica ID) publication. It is here that the content assigned to a replica can be downloaded to the device or terminal.
Because of the vast quantity of digital content, providing users with an easy way to locate content of interest to them is key to the usability of the system. Technically, this comes down to giving users an intuitive way to create queries against content metadata stores. It must be easy for both the naïve and the skilled user to define a query over a range of media servers. Queries must provide powerful and flexible search functions, including ways to select by the content of the media. Searches must be efficient, i.e. fast to execute. Users of the system must be able to retain queries for re-execution against new media or other media servers.