INFORMATION RETRIEVAL TECHNIQUES FOR NON-TEXTUAL MEDIA
A. Balasubramanian
World of Information explosion
• The information produced globally is enormous.
• Print, film, magnetic, and optical storage media produced about 5 exabytes (10^18 bytes) of new information in 2002 alone (University of California, Berkeley).
Media Storage
• Ninety-two percent of the new information was stored on magnetic media, mostly on hard disks.
• If digitized, the nineteen million books and other print collections in the Library of Congress would contain about ten terabytes of information.
• Hard disks store most new information.
• Film represents 7% of the total, paper 0.01%, and optical media 0.002%.
United States alone
• The United States alone produces about 40% of the world's new stored information,
• including 33% of the world's new printed information,
• 30% of the world's new film titles,
• 40% of the world's information stored on optical media, and
• about 50% of the information stored on magnetic media.
Population Reference Bureau
• The world population is 6.3 billion,
• thus almost 800 MB of recorded information is produced per person each year.
• It would take about 30 feet of books to store the equivalent of 800 MB of information on paper.
Information Flows
• Information flows through electronic channels -- telephone, radio, TV, and the Internet -- contained almost 18 exabytes,
• three and a half times more than is recorded in storage media.
• Ninety-eight percent of this total is the information sent and received in telephone calls, including both voice and data on both fixed lines and wireless.
The WWW
• contains about 170 terabytes of information on its surface;
• Instant messaging generates five billion messages a day (750 GB), or 274 terabytes a year.
• Email generates about 400,000 terabytes of new information each year worldwide.
P2P file exchange
• P2P file exchange on the Internet is growing rapidly.
• Seven percent of users provide files for sharing, while 93% of P2P users only download files.
• The largest files exchanged are video files larger than 100 MB, but the most frequently exchanged files contain music (MP3 files).
Web content
• The median size of HTM/HTML pages was 8 KB, but the mean was 605 KB.
• About 23% included images and
• 4% contained movies or animations, and about 20% contained JavaScript applications.
• There are about 2.9 million active weblogs ('blogs'), containing about 81 GB of information.
• About 62 billion emails are sent daily, on the Internet and elsewhere.
• The average email is about 59 kilobytes in size, thus the annual flow of emails worldwide is 667,585 terabytes.
Peer to Peer (P2P) File Sharing
• A significant new source of storing, creating and exchanging media and data on the Internet is through P2P file sharing networks.
• One of the most popular of these applications has recently reached over 230 million downloads worldwide, with an average of 2 million more per week (source: Download.com).
AVI & MP3 files
• In looking at file sizes, users frequently exchange files larger than 100 MB.
• The largest file types are .AVI files.
• The most common files shared by P2P users are MP3 files, music files encoded using MP3 technology.
• Images (JPG, BMP) are also popular but take up much less space.
• Sixty percent of the files on users' hard disks were MP3 files, taking up about 30% of the space.
Use of non-textual digital documents
• The use of non-textual digital documents is increasing.
• A multimedia document may be seen as a complex information object with components of different kinds, such as text, images, video, animation and audio, under static displays or streaming applications.
Growth of multimedia content
• The exponential growth of multimedia content, in on-line databases and in broadcast and streaming of media over the Internet, demands effective access and retrieval techniques for both textual and non-textual resources.
• There is a need for more sophisticated technology for modelling multimedia data in a networked environment.
• More user-oriented methods for indexing, compressing and warehousing multimedia information have been identified and used.
INFORMATION STORAGE AND RETRIEVAL
• Information storage begins with the collection of data and its digital representation.
• The data may be text or other media data.
• In order to access and process the data in ways that are applicable to the media type, it is necessary to define structures and data types for these binary strings.
Databases
• The collection of all organised data forms a database.
• Databases are stored digitally in the form of files with appropriate file extensions.
• The file extensions denote the type of digital data contained in them.
• File types and file extensions are mostly used to fetch the relevant links during a search.
Search/Retrieval/data model
• Underlying data model: hierarchic, network, relational, object-oriented,
• Primary data type: tabular, text, image, map/spatial, audio, video,
• Content: document, record, multimedia, bibliographic, statistical, geographic,
• Primary DB usage: administrative, library, geographic, museum,
• Architecture: centralized, distributed, homogeneous, heterogeneous, or
• Access type: real-time, interactive Web.
Metadata
• The word metadata is used to define 'data about data' or 'information about information'.
• In other words, metadata is data that describes information resources.
• Metadata captures the wide range of intrinsic or extrinsic information about a variety of objects.
• These intrinsic or extrinsic characteristics and features are described in individually structured data elements that facilitate object use, identification and discovery.
Metadata
• is used to speed up and enrich searching for resources.
• Search queries using metadata can save users from performing more complex filter operations manually, as illustrated in the sketch below.
• Metadata can be divided into 3 distinct categories:
• descriptive,
• administrative and
• structural.
• Search engines are used to search for the desired information in these databases using search keywords.
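As a concrete illustration, the following is a minimal sketch (with hypothetical records and field names) of how a query over descriptive metadata fields can stand in for a manual filter over full content.

```python
# Hypothetical metadata records; field names are illustrative only.
records = [
    {"title": "Temple pooja webcast", "type": "video", "creator": "CEC", "keywords": ["religion", "live"]},
    {"title": "MPEG-2 encoding notes", "type": "text", "creator": "ERNET", "keywords": ["video", "codec"]},
    {"title": "Devotional songs", "type": "audio", "creator": "AIR", "keywords": ["music", "religion"]},
]

def metadata_search(records, **criteria):
    """Return records whose metadata fields satisfy every given criterion."""
    hits = []
    for rec in records:
        ok = all(
            value in rec.get(field, []) if isinstance(rec.get(field), list)
            else rec.get(field) == value
            for field, value in criteria.items()
        )
        if ok:
            hits.append(rec)
    return hits

# Descriptive metadata answers the query without scanning any media content.
print(metadata_search(records, type="audio", keywords="religion"))
```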
Types of Search Queries
• Most searches fall into the following 3 categories:
• Informational - seeking static information about a topic,
• Transactional - shopping at, downloading from, or otherwise interacting with the result,
• Navigational - send me to a specific URL.
Notable works
• Multimedia Information Retrieval Systems
• Applications of video-content analysis and retrieval
• Intelligent Multimedia Information Retrieval
• Content-based multimedia information retrieval
• Content-based image retrieval was among the works done in these areas.
THE MULTIMEDIA ANALYSIS AND RETRIEVAL SYSTEM
• The Multimedia Analysis and Retrieval System (MARS) was developed by Sharad Mehrotra of the Department of Information and Computer Science, University of California.
• The goals of the MARS project are to design and develop an integrated multimedia information retrieval and database management infrastructure, entitled the Multimedia Analysis and Retrieval System (MARS), that supports multimedia information as first-class objects suited for storage and retrieval based on their content.
Multimedia Analysis and Retrieval System
• Four sub-areas:
• Multimedia Content Representation: extraction of multimedia content and content-based representation of multimedia objects in databases.
• Multimedia Information Retrieval: content-based multimedia retrieval techniques, including multimedia retrieval models and interactive query refinement techniques.
• Multimedia Feature Indexing: indexing that overcomes the high dimensionality and non-Euclidean nature of feature data to efficiently support retrieval based on feature similarity.
• Multimedia Database Management: techniques to effectively and efficiently incorporate content-based retrieval of multimedia information into structured database processing.
Information Retrieval Systems (IRS)
• Text-based Applications
• Text-based applications were among the first computer applications, initiated by users who needed to analyze large quantities of text documents.
• Other early applications included the establishment of information retrieval systems (IRS) to support management, analysis and retrieval of information from digitized medical journals and legal documents, public access catalogs, on-line catalogs (OPACs), digital libraries, insurance documents, and digitized literature.
Early text-based applications
• include:
• 1) literature analysis,
• 2) language translation, and
• 3) compilation of programs.
• As access to network communications has gradually increased, text-based applications have moved into the public domain, with the most familiar and commonly used services being e-mail and Web sites with tagged and hyper-linked documents.
• From a data management (storage and retrieval) point of view, text data is unstructured.
Ways of locating text documents
• There are 2 ways of locating text documents:
• 1) Directly, using a set of query terms to be matched to the terms used in the body of documents to determine relevancy to the query (see the sketch after this list), or
• 2) Indirectly, using the metadata describing the document, as done in most cataloging systems.
• In principle, there is no limit to the number of search terms that can be included in a query.
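To make the "direct" approach concrete, here is a minimal sketch that matches query terms against the terms in each document body and ranks by overlap. The documents are hypothetical, and simple term overlap stands in for a full relevance model.

```python
# Hypothetical document collection: id -> body text.
documents = {
    "doc1": "content based image retrieval uses colour and texture features",
    "doc2": "streaming media servers deliver audio and video over the network",
    "doc3": "metadata describes information resources for retrieval",
}

def rank_by_term_overlap(query, documents):
    """Score each document by the number of query terms it contains, highest first."""
    query_terms = set(query.lower().split())
    scores = {}
    for doc_id, body in documents.items():
        doc_terms = set(body.lower().split())
        scores[doc_id] = len(query_terms & doc_terms)   # shared terms = crude relevance
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank_by_term_overlap("image retrieval by colour", documents))
```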
Architecture of a Search Engine
• A search engine contains data managers, parsers, spiders (crawlers), refresh managers and graphical applications.
• These are the channels of communication for information retrieval.
MIMD architectures
• Document/data/term partitioning - logical, physical, inverted indexing, etc.
• Models - probabilistic, Bayesian networks, ranking, fuzzy logic, semantic indexing, extended Boolean, vector model
• Machine learning - information extraction, sliding windows and boundary detection, supervised learning, Hidden Markov Models
Situation Today
• Managing an IRS is an expensive affair, so we need to go for alternative architectures and algorithms.
• Parallel and Distributed Information Retrieval
• Parallel computing is the simultaneous application of multiple processors to solve a single problem.
• Flynn's Taxonomy:
• SISD - single instruction, single data
• SIMD - single instruction, multiple data
• MISD - multiple instruction, single data
• MIMD - multiple instruction, multiple data
Multimedia Mining Hierarchy
• Multimedia data - image, video, audio, animation
• Data segmentation - object-oriented representation, feature extraction, additional information
• Pattern extraction - case (event) definition, knowledge representation, information modeling
• MMDB
Retrieval of Images
• Images were treated as document-like objects, considering the set of image-specific elements.
• Images, movies, speeches and music were characterized as document-like objects (the size, codec, presenter, other options, security).
Image-based Applications
• Due to computer capacity limitations, the development of image-based computer applications followed the path of text-based applications.
• Perhaps the first image-based computer applications were developed for weather forecasting, though other early applications included:
• Map generation and manipulation,
• Visual analysis of experimental data - in chemistry, physics and statistics,
• Visualization of mathematical functions, and
• CAD/CAM (computer aided design and manufacture) for ship building, quickly followed by architecture, landscaping, interior decorating and fashion design.
Raster and vector images
• CAD/CAM or geographic maps, business graphics which are generated from database values, and interactive applications which might have different content for each user are considered to be non-document-like objects.
• These sources do not contain images but they generate images.
Images
• Images offer a number of technological and descriptive challenges peculiar to themselves.
• Textual materials can be indexed and classified even through automated methods.
• But the encoding schemes are critical for using images.
• Rendering images requires web graphic display facilities with wide differences in display properties.
Information necessary for rendering
• These include (see the sketch after this list):
• Type (bit-mapped, vector, video),
• Format (TIFF, GIF, JFIF, PICT, PCD, Photoshop, CGM, TGA),
• Compression schemes & ratios (JPEG, LZW, QuickTime ...),
• Dimensions, dynamic range, and
• Colour look-up tables and related matrices (CMYK, RGB ...).
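As a small illustration, the sketch below reads some of this rendering information from an image file. It assumes the Pillow library is available; the file name is hypothetical, and the fields shown are only the subset Pillow exposes directly.

```python
# A minimal sketch: extracting format, dimensions and colour mode with Pillow.
from PIL import Image

with Image.open("sample_image.tif") as img:            # hypothetical path
    print("Format:", img.format)                       # e.g. TIFF, GIF, JPEG
    print("Dimensions:", img.size)                     # (width, height) in pixels
    print("Colour mode:", img.mode)                    # e.g. RGB, CMYK, P (palette)
    # Compression details are format-dependent; fall back gracefully if absent.
    print("Compression:", img.info.get("compression", "not reported"))
```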
Characteristics of original image capture
• The characteristics of the original capture (scanning) include:
• light source (full spectrum or infrared),
• resolution,
• dynamic range,
• type of scanner,
• date of scan,
• journal/audit trails, and
• digital signatures for authentication.
Image retrieval elements
• All these are critical elements of metadata for a particular image or collection.
• Several element names were used to make the image description less text-centric.
The elements used for image representation are:
• Subject (keywords, controlled vocabulary terms & formal classification designators),
• Description (descriptive prose or content description), and
• Rights management field (Null, No restrictions on reuse, URL or other pointers).
Image Retrieval
• Popular wisdom claims that an image is worth 1000 words. Unfortunately, these 1000 words may differ from one individual to another depending on their perspective and/or knowledge of the image context.
• The problem is fundamentally one of communication between an information/image seeker/user and the image retrieval system.
• Since users may have differing needs and knowledge about the image collection, an image retrieval system must support various forms of query formulation.
Image retrieval queries
• can be classified as:
• Attribute-based queries, which use context and/or structural metadata values to retrieve images.
• Textual queries, which use a term-based specification of the desired images that can be matched to textual image descriptors.
• Visual queries, which give visual characteristics (color, texture) or an image example that can be compared to visual descriptors.
Query types utilize different image descriptors
• Image descriptors can be classified into:
• 1) Metadata descriptors, those that describe the image, as recommended in the numerous metadata standards. These can again be classified as:
• a) Attribute-based context and structural metadata, such as creator, dates, genre, (source) image type, size, file name, ..., or
• b) Text-based semantic metadata, such as title/caption, subject/keyword lists, free-text descriptions and/or the text surrounding embedded images, for example as used in an HTML document.
• 2) Visual descriptors that can be extracted from the image during the storage process by an image retrieval system.
Text-based image retrieval
• Often, the image requester is able to give a verbal description/specification of the content of the required images.
• These text-based queries can be formulated as free text or a term list that can be compared to such text descriptors as description, subjects, title and/or the text surrounding an embedded image, using text retrieval techniques.
Query by Image Retrieval (Content-Based IR)
• The term 'image content' refers to the implementation content of the image, i.e. to its pixel content.
• The objective of CBIR research and development activities is to develop automated routines that can analyze a digital image or image stream and identify the objects, actions and events that it portrays.
• Currently, CBIR routines are relatively adept at identifying color and texture distributions, as well as primitive shapes.
• This information is the basis for specification of one or more image signatures that can act as a surrogate for the image and that can be indexed to provide rapid access to elements in the image collection.
Using content descriptors: color and texture
• An image is basically a long string of pixels, for which each pixel is identified by its place in the image matrix, its color and its intensity.
• An analysis of the pixel set can give information about the distribution of dominant colors, the image texture, and the shapes formed by marked changes in neighbouring colors.
Basic color profile
• Though there are many techniques used for color and texture extraction, they are variations and/or refinements of a basic color profile.
• An image query is given as an example or seed image or sketch, for which a feature vector is calculated in the same way as that used for the DB image set.
• The query signature is then compared to the DB image signatures, using a distance measure, as sketched below.
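The following is a minimal sketch of such a basic colour-profile signature: a normalised colour histogram per image, compared with a Euclidean distance. It assumes Pillow and NumPy; file names, bin count and the distance measure are illustrative choices, not the only options.

```python
import numpy as np
from PIL import Image

def colour_signature(path, bins=8):
    """Quantise each RGB channel into `bins` levels and return a normalised histogram."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3).astype(int)
    quantised = pixels // (256 // bins)                       # each channel now 0 .. bins-1
    codes = quantised[:, 0] * bins * bins + quantised[:, 1] * bins + quantised[:, 2]
    hist = np.bincount(codes, minlength=bins ** 3).astype(float)
    return hist / hist.sum()                                  # normalise so sizes don't matter

def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (smaller = more similar)."""
    return float(np.linalg.norm(sig_a - sig_b))

# Hypothetical query and database images, ranked by similarity to the query.
query_sig = colour_signature("query.jpg")
db_sigs = {name: colour_signature(name) for name in ["img1.jpg", "img2.jpg"]}
ranked = sorted(db_sigs, key=lambda name: distance(query_sig, db_sigs[name]))
print(ranked)
```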
Image analysis
• Identifying shapes - image objects:
• Shape identification and recognition is the most difficult challenge in image analysis, since it relies on:
• Isolation of the different objects/shapes within the image, which may not necessarily be whole or standardized, but are likely 'hidden' in the perspectives of the image, and then
• Normalizing the object's size and rotation,
• Identification and possible connection of object parts, and
• Semantic identification of the image components/objects.
Automatic object recognition
• To date, automatic object recognition has only been accomplished for well-defined domains where objects of interest are well known and well defined within the image, such as:
• Police images of faces and fingerprints,
• Medical images from X-rays, MRI and scans, and
• Industrial surveillance of building structures, such as bridges, tunnels or pipelines.
Visual query requires a query language
• An object-based visual query requires a query language that can accept a visual object/image as an example.
• A shape thesaurus can be developed for image collections from specific domains, and used in a way similar to that done for the terms in text document collections.
Automatic Linguistic Indexing of Pictures
• Categorized images are used to train a dictionary of hundreds of statistical models, each representing a concept.
• Images of any given concept are regarded as instances of a stochastic process that characterizes the concept.
• To measure the extent of association between an image and the textual description of a concept, the likelihood of the occurrence of the image under the characterizing stochastic process is computed.
• A high likelihood indicates a strong association.
• The models used are two-dimensional multi-resolution hidden Markov models (2D MHMMs); a simplified illustration of the likelihood-ranking idea follows below.
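The sketch below illustrates only the likelihood-ranking idea. ALIP itself uses 2D multi-resolution HMMs; here each concept is deliberately replaced by a plain Gaussian over feature vectors, and all concepts, features and training data are hypothetical.

```python
# Simplified stand-in for ALIP-style concept scoring: one statistical model per
# concept, rank concepts by the likelihood they assign to an image's features.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# "Train" one model per concept from categorised example feature vectors (3-D here).
concept_examples = {
    "beach":    rng.normal(loc=[0.8, 0.7, 0.2], scale=0.1, size=(50, 3)),
    "mountain": rng.normal(loc=[0.3, 0.4, 0.6], scale=0.1, size=(50, 3)),
}
concept_models = {
    name: multivariate_normal(mean=feats.mean(axis=0), cov=np.cov(feats, rowvar=False))
    for name, feats in concept_examples.items()
}

# Score a new image's feature vector against every concept model.
image_features = np.array([0.75, 0.68, 0.25])
scores = {name: model.logpdf(image_features) for name, model in concept_models.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))   # concepts ranked by log-likelihood
print("Strongest association:", max(scores, key=scores.get))
```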
Video Information Retrieval (VIR)
• Recognition technologies
- Image
- Voice
- Text transcripts
• Document retrieval technologies
- Topic segmentation
- Topic matching
- Text summarization
• Presentation technologies
- Combine recognition and retrieval technologies
• The result is an integrated application
Streamed media applications: Source data + playback software
• Streaming media is multimedia that is constantly received by, and normally presented to, an end-user while it is being delivered by a streaming provider (the term "presented" is used to denote audio or video playback).
• The name refers to the delivery method of the medium rather than to the medium itself.
• Media delivery systems are either
• inherently streaming (e.g. radio, television) or
• inherently non-streaming (e.g. books, video cassettes, audio CDs).
Streamed media/dynamic media
• is composed of a series of media objects that have a time relationship which is necessary for proper communication.
• Audio, film (a series of images) and video (a combination of audio and film) data belong to this class.
Streamed-data applications include
• Speech analysis, speech command of automated artifacts: robots, equipment for the handicapped, answering/ordering systems,
• Music archives, search and presentation, surveillance and film analysis,
• Animation for stories, games, simulations,
• Film archives, search and presentation, and Video on Demand.
• Webcasting, podcasting and on-line radio facilities also come under this kind of information retrieval.
Streamed data representation
• Streamed data, like image data, has metadata attributes:
• Context descriptors: artist name(s), publisher, publication date, ...
• Content descriptors: title, genre, keywords, description and, for music, mood, musical theme, melody, tempo, rhythm and instrumentation, and
• Structural description, such as presentation length, storage size, storage/presentation format.
Search requests for streamed data
• from a media collection can be expressed in 3 ways (the first two are illustrated in the sketch below):
• An exact-match query giving data values to be compared to the regular metadata attributes (Retrieve all films where artist_name = 'xxx ccccc'),
• A text-based query using the content-descriptive metadata (Retrieve Hindi devotional songs), and
• An example-based query using a video or audio clip or a whistled tune, such as offered at Musipedia.
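A minimal sketch of the first two query forms against a hypothetical media-metadata collection follows; all records, names and field values are illustrative.

```python
# Hypothetical metadata records for a small media collection.
collection = [
    {"title": "Song A", "artist_name": "Artist One", "genre": "devotional", "language": "Hindi"},
    {"title": "Film B", "artist_name": "Artist Two", "genre": "drama", "language": "Tamil"},
]

# 1) Exact-match query on a regular metadata attribute.
exact_match = [item for item in collection if item["artist_name"] == "Artist One"]

# 2) Text-based query on content-descriptive metadata (e.g. Hindi devotional songs).
text_based = [item for item in collection
              if item["genre"] == "devotional" and item["language"] == "Hindi"]

print(exact_match)
print(text_based)
```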
Stand-alone Internet radio devices
• are offering listeners a "no-computer" option for listening to audio streams.
• In general, multimedia content is large, so media storage and transmission costs are still significant; to offset this somewhat, media is generally compressed for both storage and streaming.
• A media stream can be on demand or live.
• On-demand streams are stored on a server for a long period of time and are available to be transmitted at a user's request.
• Live streams are available at the time of the event, as in a video stream of a live sporting event or temple pooja.
Web Server vs. Streaming Server: CEC Model + ISRO Edusat Model
• There are two major methods of delivering streaming audio and video content over the Web.
• The first method uses a standard Web server (E-content + LOR) to deliver the audio and video data to a media player. The second method uses a separate streaming media server (Edusat) specialized for the audio/video streaming task.
Programmes are deployable through a campus LAN
Watch Live programmes
Streaming with a Web Server
• Deploying streaming media content with the Web server approach is actually only a small evolutionary step away from the download-and-play model.
• Uncompressed audio and video is first compressed into a single "media file" for delivery over a specific network bandwidth, such as a 28.8 kilobits per second (Kbps) modem.
• This media file is then placed on a standard Web server.
• Next, a Web page containing the media file's URL is created and placed on the same Web server.
• This Web page, when activated, launches the client-side player and downloads the media file.
• So far, the actions are identical to those in the download-and-play case.
• The difference lies in how the client functions.
Streaming with a Streaming Media Server
• In the streaming media server approach, the initial steps are similar to the Web server approach, except that the compressed media file is produced and copied to a specialized streaming media server (such as Microsoft Windows Media Services) instead of a Web server.
• Then a Web page with a reference to the media file is placed on a Web server.
• Windows Media Services and the Web server may run on the same computer.
Data Delivery
• The rest of the streaming media server delivery process differs significantly from the Web server approach.
• In contrast to the passive burst methodology employed in Web server streaming, the data is actively and intelligently sent to the client, meaning that the server delivers the content at the exact data rate associated with the compressed audio and video streams.
• The server and the client stay in close touch during the delivery process, and the streaming media server can respond to any feedback from the client.
Streaming bandwidth and storage
• Streaming media storage size (in the common file system measurements megabytes, gigabytes, terabytes, and so on) is calculated from the streaming bandwidth and the length of the media with the following formula (for a single user and file):
• storage size (in megabytes) = length (in seconds) * bit rate (in kbit/s) / (8 * 1024)
• For example, one hour of video encoded at 300 kbit/s will be: (3,600 s * 300,000 bit/s) / (8 * 1024 * 1024), giving around 128 MB of storage.
• If the file is stored on a server for on-demand streaming and this stream is viewed by 1,000 people at the same time using a unicast protocol, you would need: 300 kbit/s * 1,000 = 300,000 kbit/s = 300 Mbit/s of bandwidth.
• This is equivalent to around 125 GiB per hour.
• Of course, using a multicast protocol the server sends out only a single stream that is common to all users.
• Hence, such a stream would only use 300 kbit/s of serving bandwidth (see the sketch below).
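The short sketch below simply reproduces the arithmetic above (storage size, unicast serving bandwidth, and hourly transfer volume), taking kbit/s as 1,000 bit/s as in the worked example.

```python
def storage_mb(length_seconds, bitrate_kbps):
    """Megabytes needed to store one stream (kbit/s taken as 1,000 bit/s)."""
    return length_seconds * bitrate_kbps * 1000 / (8 * 1024 * 1024)

def unicast_bandwidth_kbps(bitrate_kbps, concurrent_viewers):
    """Serving bandwidth when every viewer receives a separate unicast copy."""
    return bitrate_kbps * concurrent_viewers

print(storage_mb(3600, 300))                      # ~128.7 MB for one hour at 300 kbit/s
print(unicast_bandwidth_kbps(300, 1000) / 1000)   # 300.0 Mbit/s for 1,000 unicast viewers
print(300e6 * 3600 / 8 / 1024**3)                 # ~125.7 GiB transferred per hour to them
# With multicast, the server sends one shared stream: 300 kbit/s regardless of viewers.
```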
Protocol issues
• There are five major types of protocols used for multimedia on the Internet.
• Datagram protocols, such as the User Datagram Protocol (UDP), send the media stream as a series of small packets (Edusat streaming through VLC).
• This is simple and efficient (a minimal sketch follows below).
• It is up to the receiving application to detect loss or corruption and recover data using error correction techniques. If data is lost, the stream may suffer a dropout.
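Here is a minimal sketch of datagram-style delivery: the sender chops a media file into small UDP packets; the receiver just takes whatever arrives, with no retransmission, so lost packets show up as dropouts. Host, port, file name and the `handle` hook are hypothetical, and this is an illustration rather than a production streamer.

```python
import socket

PACKET_SIZE = 1316                   # a common payload size for MPEG-TS over UDP
ADDRESS = ("127.0.0.1", 5004)        # hypothetical destination

def send_stream(path):
    """Read a media file and fire it out as UDP datagrams, with no delivery guarantee."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open(path, "rb") as media:
        while chunk := media.read(PACKET_SIZE):
            sock.sendto(chunk, ADDRESS)          # fire-and-forget

def receive_stream():
    """Accept whatever datagrams arrive; lost packets are simply never seen."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(ADDRESS)
    while True:
        chunk, _ = sock.recvfrom(PACKET_SIZE)
        handle(chunk)                            # hypothetical decoder/player hook
```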
• The Real-time Streaming Protocol (RTSP), Real-time Transport Protocol (RTP) and the Real-time Transport Control Protocol (RTCP) were all specifically designed to stream media over networks.
• The latter two are built on top of UDP.
• Reliable protocols, such as the Transmission Control Protocol (TCP), guarantee correct delivery of each bit in the media stream.
Unicast protocols
• send a separate copy of the media stream from the server to each recipient.
• Multicasting broadcasts the same copy of the multimedia over the entire network to all clients.
• Multicast protocols were developed to reduce the data replication (and consequent server/network loads) that occurs when many recipients receive unicast content streams independently.
• These protocols send a single stream from the source to a group of recipients.
Continuous streaming of radio or television
• Continuous streaming of radio or television material usually precludes the recipient's ability to control playback.
• IP multicast provides a means to send a single media stream to a group of recipients on a computer network.
• Peer-to-peer (P2P) protocols arrange for pre-recorded streams to be sent between computers.
Benefits of protocols
• UDP provides the most efficient network throughput and can have a very positive impact on the user (player) experience.
• The only downside to UDP is that many network administrators close their firewalls to UDP traffic, limiting the potential audience of UDP-based streams.
• TCP provides an adequate, though not necessarily efficient, protocol for delivering streaming media content from a server to a client.
VLC – for open source streaming
• DirectDraw
• DirectShow
• Four protocols
• Multiple formats, devices, methods
• Single stream output - suitable for webcasting (see the sketch below)
• Videoconferencing, live stream recording
VLC media player options
VLC
• The network on which you set up the VideoLAN solution can be as small as one Ethernet 10/100 Mb switch or hub, and as big as the whole Internet! The bandwidth needed is:
• 0.5 to 4 Mbit/s for an MPEG-4 stream,
• 3 to 4 Mbit/s for an MPEG-2 stream read from a satellite card, a digital terrestrial television card or an MPEG-2 encoding card,
• 6 to 9 Mbit/s for a DVD.
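The following hedged sketch drives VLC from Python to serve a file as an HTTP/MPEG-TS stream, roughly matching the webcasting use mentioned above. It assumes a VLC build with the HTTP stream-output module; the exact --sout chain and port are illustrative and should be checked against the VLC streaming documentation, and the source file name is hypothetical.

```python
import subprocess

# Stream-output chain: HTTP access, MPEG-TS mux, listen on port 8080 (illustrative).
sout_chain = "#standard{access=http,mux=ts,dst=:8080/}"

subprocess.run([
    "vlc", "--intf", "dummy",    # run VLC without a GUI
    "lecture.mpg",               # hypothetical source file
    "--sout", sout_chain,        # hand the stream-output chain to VLC
])
# Clients would then open something like http://<server>:8080/ in another VLC instance.
```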
VLC features (summary of the table on the VLC features page)
• Stream outputs, with support varying by input type: UDP unicast/multicast, RTP unicast/multicast (unicast only for some inputs), file output, HTTP, MMSH, and transcoding.
• Miscellaneous features such as additional interfaces, sending DVD subtitles and sending SAP announces have partial or untested support depending on the input.
• See the VLC features page for the full input/output compatibility table.
The VideoLAN streaming solution includes:
• VLC media player (initially VideoLAN Client), which can be used as a server to stream MPEG-1, MPEG-2 and MPEG-4 / DivX files, DVDs and live videos on the network in unicast or multicast, or used as a client to receive, decode and display MPEG streams under multiple operating systems.
• VLS (VideoLAN Server), which can stream MPEG-1, MPEG-2 and MPEG-4 files, DVDs, digital satellite channels, digital terrestrial television channels and live videos on the network in unicast or multicast. Most of the VLS functionality can now be found in the much better VLC program; usage of VLC instead of VLS is advised.
Web 2.0 Ogg bitstream format
• A free, open standard container format.
• Designed to provide efficient streaming and manipulation of high-quality digital multimedia.
• It can multiplex a number of separate, independent streams encoded with free and open codecs for audio, video, text, etc.
• Theora, Vorbis, Speex, FLAC, OggPCM.
Streaming Media Services
• Numerous subscription-based streaming media services began to appear on the web.
• These services provided a low-cost, low-barrier-to-entry model that allowed small businesses and internet marketers to place streaming audio and streaming video on their websites.
• One more addition was podcasting.
Internet television: (ERNET Model)
• allows viewers to choose the show they want to watch from a library of shows.
• The primary models for Internet television are streaming Internet TV or selectable video at an Internet location, typically a website.
• The video can also be broadcast with a peer-to-peer network (P2PTV), which doesn't rely on a single website's streaming.
• It differs from IPTV in that IPTV offerings are typically offered on discrete service provider networks, requiring a special IPTV set-top box.
• Internet TV is a quick-to-market and relatively low-investment service.
• Internet TV rides on existing infrastructure including broadband, ADSL, Wi-Fi, cable and satellite, which makes it a valuable tool for a wide variety of service providers and content owners looking for new revenue streams.
• The CEC-ERNET collaboration model shows Gyan-Vyas on the internet.
• (Shirdi Sai Baba Sansthan and other religious websites)
P2PTV
• The term P2PTV refers to peer-to-peer (P2P) software applications designed to redistribute video streams in real time on a P2P network;
• the distributed video streams are typically TV channels from all over the world but may also come from other sources.
• The draw to these applications is significant because they have the potential to make any TV channel globally available.
Peercast
• Peercast can be used to multicast streaming audio (MP3, WMA) and/or video (Nullsoft Streaming Video, or WMV), or any other stream of data, over the internet.
• Peercast uses a distributed bandwidth technique to lighten the load on the broadcaster's upstream bandwidth, where each listener/viewer relays the stream they download to one or more additional listeners.
• The benefit of using PeerCast is that it allows any multicasters, particularly small or independent ones, to distribute their streams without the need for much bandwidth, saving them costs.
• It also allows, theoretically, an infinite number of listeners as long as there are enough relays.
Unreal Media Server
• A proprietary streaming server for Windows platforms, designed to provide multimedia delivery over LAN and Internet.
• It supports media files and live media streams.
• Supported file formats: AVI (DivX, XviD, VP6, any other codecs), MPEG-1/2/4, WMV/WMA, MP3, ASF, QuickTime.
• Playlist functionality allows automatically playing all the files of the server's virtual folder in a loop mode.
• Supported live media sources include: digital cameras, microphones, TV tuners, and analog video sources connected to a video card or to a video capture card that supports the DirectShow interface.
• Live audio/video is encoded with WMA/MP3/GSM 6.10 and WMV/MPEG-4 codecs in real time.
• Hardware encoder appliances are supported for streaming hardware-compressed content without software transcoding.
• TCP, HTTP and MMS unicast, and RTP multicast transport protocols are supported.
• For HTTP delivery, an IIS web server is required on the server computer.
Unreal Media Server includes 3 major components
• Streaming video recording: streaming video recorders can record not only video streams but also many audio streams, so they are sometimes also called streaming media recorders. There are different approaches used by the software to make the recording, depending on which stage of the process one taps into. In order, they are:
• URL snooping: the simplest case is when the stream is served by simply requesting it, just as web pages are, as in an HTTP GET request; this directly copies the encoded, streamed file. In this case, one simply needs to determine the URL and then download it, either by pasting it into one's web browser (location box or "Open location...") or via a specialized download manager.
• Encoded capture: some streaming is not via a simple request to a URL. In this case, capturing the stream requires some understanding and implementation of the particular streaming protocol (the encoded media stream is encapsulated within a network stream), either passively/offline, by capturing the actual traffic and extracting it (via deep packet capture, using a packet sniffer), or actively/online, by implementing enough of the streaming protocol to request the encoded data.
Decoded capture
• An approach used to record the decoded information at the end level, from the video and sound card of the computer.
• This is essentially capturing what you are watching or listening to directly from the screen, and could be likened to recording off the air with a microphone.
• This solution makes it possible to record anything that you are able to view or listen to, regardless of the original format or protection, though it suffers from a loss in quality (digital generation loss) due to re-encoding.
On-going Research
• Multi-modal information retrieval from broadcast video using OCR and speech recognition:
• multi-modal information retrieval from broadcast video, where text can be read on the screen through OCR and speech recognition can be performed on the audio track.
• OCR and speech recognition are compared on the 2001 TREC Video Retrieval evaluation corpus.
• Results show that OCR is more important than speech recognition for video retrieval.
• OCR retrieval can be further improved through dictionary-based post-processing.
• The International Institute of Information Technology, Hyderabad, India has worked on "A System for Information Retrieval Applications on Broadcast News Videos".
• This system is specifically designed for information retrieval applications on broadcast news videos.
• The system is directly useful to an end user for easy access to the news stories of interest.
• It also acts as a platform for convenient deployment and experimentation of various video analysis and indexing techniques on real data, and on a large scale.
AUDIO INFORMATION RETRIEVAL
• The problem of audio information retrieval is familiar to anyone who has returned from vacation to an answering machine full of messages.
• Recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity have been explained.
Music Information Retrieval
• The task is to determine the similarity between two given melodies;
• there are several melodic similarity measures.
• Audio segment retrieval using a synthesized HMM has been attempted.
• The approach allows a user to query audio data of any length by one or more example audio segments and find similar segments.
• Audio identification using sinusoidal modeling and its application to jingle detection.
Signal + Context = Better Classification
• Typical signal-based approaches that extract musical descriptions from audio alone have limited precision.
• An architecture trains a large set of binary classifiers simultaneously, for many different kinds of musical metadata (genre, instrument, mood, etc.), in such a way that correlation between metadata is used to reinforce each individual classifier.
• The system is iterative.
Content-based music retrieval using query integration
• Intended for users whose preferences range over songs with a wide variety of features.
• The MIR method dynamically generates an optimal set of query vectors from the sample set of songs submitted by the user to express their preferences, based on the similarity of the songs in the sample set.
• Experiments conducted on a music collection with subjective user ratings showed the method to be very effective in improving the accuracy of content-based MIR.
Issues to be considered: Flash
Flash video + text + command buttons
Total e-content in a package format
E-content deployment
• Full content coverage of subjects, deployment architecture, servers (at least 5: database, webservice, message, backup, intrusion detection, media server), tools, and customization at three levels (content, localhost, webserver level)
• Distributed network architecture (with EMMRCs as hubs) - replication of database servers
Other issues
• Gateway
• Bandwidth
• Load-balancing
• End-user facilitation
• Application environment
• Integrated system
LOR: Front page- Blocked content
Allowing the blocked <strong>media</strong>
[Diagram: benefits of the asymmetric Internet, combining radio broadcast, TV broadcast, videoconferencing, night-time loading, on-line education, telephone and voice chat on the Internet, and webcam talkback.]
Technological Convergence
• Computer professionals
• Information scientists
• Media professionals
• User limitations
• Deliverer's preferences
• Operational issues at local and global levels
• Co-ordinated approach
Thank you