
INFORMATION RETRIEVAL TECHNIQUES FOR NON-TEXTUAL MEDIA
A. Balasubramanian


World of Information explosion
• Information produced globally is enormous.
• Print, film, magnetic, and optical storage media produced about 5 exabytes (10^18 bytes) of new information in 2002 alone (University of California, Berkeley).

Media Storage
• Ninety-two percent of the new information was stored on magnetic media, mostly on hard disks.
• If digitized, the nineteen million books and other print collections in the Library of Congress would contain about ten terabytes of information.
• Hard disks store most new information.
• Film represents 7% of the total, paper 0.01%, and optical media 0.002%.

United States alone
• The United States produces about 40% of the world's new stored information,
• including 33% of the world's new printed information,
• 30% of the world's new film titles,
• 40% of the world's information stored on optical media, and
• about 50% of the information stored on magnetic media.

Population Reference Bureau
• The world population is 6.3 billion;
• thus almost 800 MB of recorded information is produced per person each year.
• It would take about 30 feet of books to store the equivalent of 800 MB of information on paper.

Information Flows
• Information flows through electronic channels -- telephone, radio, TV, and the Internet -- contained almost 18 exabytes,
• three and a half times more than is recorded in storage media.
• Ninety-eight percent of this total is the information sent and received in telephone calls, including both voice and data on both fixed lines and wireless.


The WWW
• contains about 170 terabytes of information on its surface;
• Instant messaging generates five billion messages a day (750 GB), or 274 terabytes a year.
• Email generates about 400,000 terabytes of new information each year worldwide.

P2P file exchange
• P2P file exchange on the Internet is growing rapidly.
• Seven percent of users provide files for sharing, while 93% of P2P users only download files.
• The largest files exchanged are video files larger than 100 MB, but the most frequently exchanged files contain music (MP3 files).

Web content
• The median size of HTM/HTML pages was 8 KB, but the mean was 605 KB.
• About 23% included images, and
• 4% contained movies or animations, and about 20% contained JavaScript applications.
• There are about 2.9 million active weblogs ('blogs'), containing about 81 GB of information.
• About 62 billion emails are sent daily, on the Internet and elsewhere.
• The average email is about 59 kilobytes in size; thus the annual flow of emails worldwide is 667,585 terabytes.

Peer to Peer (P2P) File Sharing
• A significant new source of storing, creating and exchanging media and data on the Internet is P2P file sharing networks.
• One of the most popular of these applications has recently reached over 230 million downloads worldwide, with an average of 2 million more per week (source: Download.com).

AVI & MP3 files
• In looking at file sizes, users frequently exchange files larger than 100 MB.
• The largest file types are .AVI files.
• The most common files shared by P2P users are MP3 files, music files encoded using MP3 technology.
• Images (JPG, BMP) are also popular but take up much less space.
• Sixty percent of the files on users' hard disks were MP3 files, taking up about 30% of the space.


Use of non-textual digital documents
• The use of non-textual digital documents is increasing.
• A multimedia document may be seen as a complex information object with components of different kinds, such as text, images, video, animation and audio, under static displays or streaming applications.

Growth of multimedia content
• The exponential growth of multimedia content, in on-line databases and in broadcast and streaming of media over the Internet, demands effective access and retrieval techniques for both textual and non-textual resources.
• There is a need for more sophisticated technology for modeling multimedia data in a networked environment.
• More user-oriented methods for indexing, compressing and warehousing multimedia information have been identified and used.

INFORMATION STORAGE AND RETRIEVAL
• Information storage begins with the collection of data and its digital representation.
• The data may be text or other media data.
• In order to access and process the data in ways that are applicable to the media type, it is necessary to define structures and data types for these binary strings.

Databases
• The collection of all organised data forms a database.
• Databases are stored digitally in the form of files with appropriate file extensions.
• The file extensions denote the type of digital data contained in them.
• File types and file extensions are mostly used by the user to fetch the links during a search.


Search/Retrieval/data model
• Underlying data model: hierarchic, network, relational, object-oriented,
• Primary data type: tabular, text, image, map/spatial, audio, video,
• Content: document, record, multimedia, bibliographic, statistical, geographic,
• Primary DB usage: administrative, library, geographic, museum,
• Architecture: centralized, distributed, homogeneous, heterogeneous, or
• Access type: real-time, interactive Web.


Metadata
• The word metadata is used to define 'data about data' or 'information about information'.
• In other words, metadata is data that describes information resources.
• Metadata captures a wide range of intrinsic or extrinsic information about a variety of objects.
• These intrinsic or extrinsic characteristics and features are described in individually structured data elements that facilitate object use, identification and discovery.

Metadata
• is used to speed up and enrich searching for resources.
• Search queries using metadata can save users from performing more complex filter operations manually.
• Metadata can be divided into 3 distinct categories (a small illustrative record follows this list):
• descriptive,
• administrative and
• structural.
• Search engines are used to search for the desired information from these databases using search keywords.
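
A minimal sketch of what one such metadata record might look like for a single image, with fields grouped under the three categories above; the field names and values are illustrative only and are not taken from any particular standard.

    # Hypothetical metadata record for one image; field names are illustrative.
    record = {
        "descriptive": {                      # what the resource is about
            "title": "Temple tower at dawn",
            "subject": ["architecture", "temple"],
            "description": "South-facing view, monsoon season",
        },
        "administrative": {                   # how the resource is managed
            "creator": "A. Photographer",
            "rights": "No restrictions on reuse",
            "date_scanned": "2002-06-14",
        },
        "structural": {                       # how the resource is put together
            "format": "TIFF",
            "size_bytes": 10_485_760,
            "resolution_dpi": 600,
        },
    }

    # A search engine can filter on such fields instead of scanning the media itself.
    if "temple" in record["descriptive"]["subject"]:
        print(record["descriptive"]["title"])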


Types of Search Queries
• Most of the searches fall into the following 3 categories:
• Informational - seeking static information about a topic
• Transactional - shopping at, downloading from, or otherwise interacting with the result
• Navigational - send me to a specific URL.


Notable works
• Multimedia Information Retrieval Systems
• Applications of video-content analysis and retrieval
• Intelligent Multimedia Information Retrieval
• Content-based multimedia information retrieval
• Content-based image retrieval
• These were some of the works done in these areas.


THE MULTIMEDIA ANALYSIS AND RETRIEVAL SYSTEM
• The Multimedia Analysis and Retrieval System (MARS) was developed by Sharad Mehrotra of the Department of Information and Computer Science, University of California.
• The goal of the MARS project is to design and develop an integrated multimedia information retrieval and database management infrastructure, entitled the Multimedia Analysis and Retrieval System (MARS), that supports multimedia information as first-class objects suited for storage and retrieval based on their content.

Multimedia Analysis and Retrieval System
• Four sub-areas:
• Multimedia Content Representation: extraction of multimedia content and content-based representation of multimedia objects in databases.
• Multimedia Information Retrieval: content-based multimedia retrieval techniques, including multimedia retrieval models and interactive query refinement techniques.
• Multimedia Feature Indexing: indexing that overcomes the high dimensionality and non-Euclidean nature of feature data to efficiently support retrieval based on feature similarity.
• Multimedia Database Management: techniques to effectively and efficiently incorporate content-based retrieval of multimedia information into structured database processing.


Information Retrieval Systems (IRS)
• Text-based Applications
• Text-based applications were among the first computer applications, initiated by users to analyze large quantities of text documents.
• Other early applications included the establishment of information retrieval systems (IRS) to support the management, analysis and retrieval of information from digitized medical journals and legal documents, public access catalogs, on-line catalogs (OPACs), digital libraries, insurance documents, and digitized literature.

Early text-based applications
• include:
• 1) literature analysis,
• 2) language translation, and
• 3) compilation of programs.
• As access to network communications has gradually increased, text-based applications have moved into the public domain, with the most familiar and commonly used services being e-mail and Web sites with tagged and hyper-linked documents. From a data management (storage and retrieval) point of view, text data is unstructured.

Ways of locating text documents
• There are 2 ways of locating text documents:
• 1) Directly, using a set of query terms to be matched against the terms used in the body of documents to determine relevancy to the query (see the sketch after this list), or
• 2) Indirectly, using the metadata describing the document, as done in most cataloging systems.
• In principle, there is no limit to the number of search terms that can be included in a query.
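
A minimal sketch of the first (direct) approach, assuming a small in-memory collection; real systems add stemming, stop-word removal and ranking models on top of this.

    from collections import defaultdict

    # Hypothetical document collection: id -> body text
    documents = {
        "d1": "content based image retrieval uses color and texture",
        "d2": "metadata describes information resources",
        "d3": "streaming media is delivered over the network",
    }

    # Build an inverted index: term -> set of documents containing it
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    def direct_query(query):
        """Rank documents by how many query terms they contain."""
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id in index.get(term, ()):
                scores[doc_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])

    print(direct_query("image retrieval"))   # d1 matches both terms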


Architecture of a Search Engine
• A search engine contains data managers, parsers, spiders, refresh managers and graphic applications.
• These are the channels of communication for information retrieval.

MIMD architectures
• Document/data/term partitioning - logical, physical, inverted indexing, etc.
• Models - probabilistic, Bayesian networks, ranking, fuzzy logic, semantic indexing, extended Boolean, vector model
• Machine learning - information extraction, sliding windows and boundary detection, supervised learning, Hidden Markov Models


Situation Today
• Managing IRS is an expensive affair; we need to go for alternative architectures and algorithms.
• Parallel and Distributed Information Retrieval
• Parallel computing is the simultaneous application of multiple processors to solve a single problem.
• Flynn's Taxonomy:
• SISD - single instruction, single data
• SIMD - single instruction, multiple data
• MISD - multiple instruction, single data
• MIMD - multiple instruction, multiple data

Multimedia Mining Hierarchy
• Multimedia data - image, video, audio, animation
• Data segmentation - object-oriented representation, feature extraction, additional information
• Pattern extraction - case (event) definition, knowledge representation, information modeling
• MMDB


Retrieval of Images
• Images were treated as document-like objects, considering the set of image-specific elements.
• Images, movies, speeches and music were characterized as document-like objects (size, codec, presenter, other options, security).

Image-based Applications
• Due to computer capacity limitations, the development of image-based computer applications followed the path of text-based applications.
• Perhaps the first image-based computer applications were developed for weather forecasting, though other early applications included
• map generation and manipulation,
• visual analysis of experimental data - in chemistry, physics and statistics,
• visualization of mathematical functions, and
• CAD/CAM (computer-aided design and manufacture) for ship building, quickly followed by architecture, landscaping, interior decorating and fashion designing.

Raster and vector images
• CAD/CAM or geographic maps, business graphics which are generated from database values, and interactive applications which might have different content for each user are considered to be non-document-like objects.
• These sources do not contain images but they generate images.

Images
• Images offer a number of technological and descriptive challenges peculiar to themselves.
• Textual materials can be indexed and classified even through automated methods.
• But the encoding schemes are critical for using images.
• Rendering images requires web graphic display facilities with wide differences in display properties.


Information necessary for rendering
• Type (bit-mapped, vector, video),
• Format (TIFF, GIF, JFIF, PICT, PCD, Photoshop, CGM, TGA),
• Compression schemes & ratios (JPEG, LZW, QuickTime ...),
• Dimensions, dynamic range, and
• Color look-up tables and related matrices (CMYK, RGB ...).

Characteristics of original image capture
• The capture (scanning) characteristics include:
• light source (full spectrum or infrared),
• resolution,
• dynamic range,
• type of scanner,
• date of scan,
• journal/audit trails and
• digital signatures for authentication.


Image retrieval elements
• All of these are critical elements of metadata for a particular image or collection.
• Several element names were used to make the image description less text-centric.

The elements used for image representation are:
• Subject (keywords, controlled vocabulary terms & formal classification designators),
• Description (descriptive prose or content description) and
• Rights management field (Null, No restrictions on reuse, URL or other pointers).

Image Retrieval
• Popular knowledge claims that an image is worth 1000 words. Unfortunately, these 1000 words may differ from one individual to another depending on their perspective and/or knowledge of the image context.
• The problem is fundamentally one of communication between an information/image seeker/user and the image retrieval system.
• Since users may have differing needs and knowledge about the image collection, an image retrieval system must support various forms of query formulation.


Image retrieval queries
• can be classified as:
• Attribute-based queries, which use context and/or structural metadata values to retrieve images.
• Textual queries, which use a term-based specification of the desired images that can be matched to textual image descriptors.
• Visual queries, which give visual characteristics (color, texture) or an image example that can be compared to visual descriptors.


Query types utilize different image descriptors
• Image descriptors can be classified into:
• 1) Metadata descriptors, those that describe the image, as recommended in the numerous metadata standards. These metadata can again be classified as:
• a) attribute-based context and structural metadata, such as creator, dates, genre, (source) image type, size, file name, ..., or
• b) text-based semantic metadata, such as title/caption, subject/keyword lists, free-text descriptions and/or the text surrounding embedded images, for example as used in an HTML document.
• 2) Visual descriptors that can be extracted from the image during the storage process by an image retrieval system.


Text-based image retrieval:
• Often, the image requester is able to give a verbal description/specification of the content of the required images.
• These text-based queries can be formulated as free text or a term list that can be compared to such text descriptors as description, subjects, title and/or the text surrounding an embedded image, using text retrieval techniques.

Query by Image Retrieval (Content-Based IR)
• The term 'image content' refers to the implementation content of the image, i.e. to its pixel content.
• The objective of CBIR research and development activities is to develop automated routines that can analyze a digital image or image stream and identify the objects, actions and events that it portrays.


• Currently, CBIR routines are relatively adept at identifying color and texture distributions, as well as primitive shapes.
• This information is the basis for specification of one or more image signatures that can act as a surrogate for the image and that can be indexed to provide rapid access to elements in the image collection.

Using content descriptors:
• Color and texture - an image is basically a long string of pixels for which each pixel is identified by its place in the image matrix, its color and its intensity.
• An analysis of the pixel set can give information about the distribution of dominant colors, the image texture, and the shapes formed by marked change in neighbouring colors.


Basic color profile
• Though there are many techniques used for color and texture extraction, they are variations and/or refinements of a basic color profile.
• An image query is given as an example or seed image or sketch, for which a feature vector is calculated in the same way as that used for the DB image set.
• The query signature is then compared to the DB image signatures, using a distance measure (see the sketch below).
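
A minimal sketch of such a basic color profile, assuming images arrive as NumPy RGB arrays: each image is reduced to a normalized color histogram (its signature), and query and database signatures are compared with a simple L1 distance. The function names and parameters are illustrative.

    import numpy as np

    def color_signature(image, bins=8):
        """Quantize each RGB channel into `bins` levels and return a
        normalized joint color histogram as the image's signature."""
        levels = image // (256 // bins)                  # H x W x 3 -> values 0..bins-1
        hist, _ = np.histogramdd(levels.reshape(-1, 3),
                                 bins=(bins, bins, bins),
                                 range=((0, bins),) * 3)
        return hist.ravel() / hist.sum()

    def distance(sig_a, sig_b):
        """L1 distance between two signatures (0 means identical color profiles)."""
        return float(np.abs(sig_a - sig_b).sum())

    # Hypothetical usage: rank a tiny DB of random images against a query image.
    rng = np.random.default_rng(0)
    query = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
    db = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]
    sig_q = color_signature(query)
    ranking = sorted(range(len(db)), key=lambda i: distance(sig_q, color_signature(db[i])))
    print("closest DB image:", ranking[0])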


Image analysis
• Identifying shapes - image objects:
• Shape identification and recognition is the most difficult challenge in image analysis since it relies on:
• isolation of the different objects/shapes within the image, which may not necessarily be whole or standardized, but are likely 'hidden' in the perspectives of the image, and then
• normalizing the object's size and rotation,
• identification and possible connection of object parts, and
• semantic identification of the image components/objects.

Automatic object recognition
• To date, automatic object recognition has only been accomplished for well-defined domains where objects of interest are well known and well defined within the image, such as:
• police images of faces and fingerprints,
• medical images from X-rays, MRI and scans, and
• industrial surveillance of building structures, such as bridges, tunnels or pipelines.

Visual query requires a query language
• An object-based visual query requires a query language that can accept a visual object/image as an example.
• A shape thesaurus can be developed for image collections from specific domains, which can be used in a way similar to that done for terms in text document collections.


Automatic Linguistic Indexing of Pictures
• Categorized images are used to train a dictionary of hundreds of statistical models, each representing a concept.
• Images of any given concept are regarded as instances of a stochastic process that characterizes the concept.
• To measure the extent of association between an image and the textual description of a concept, the likelihood of the occurrence of the image under the characterizing stochastic process is computed (a simplified sketch follows).
• A high likelihood indicates a strong association.
• The models used are two-dimensional multi-resolution hidden Markov models (2-D MHMMs).
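
A minimal sketch of the likelihood idea only: here each concept is stood in for by a diagonal Gaussian over a single feature vector, rather than the 2-D MHMM the actual system trains; the concept whose model gives the image the highest likelihood is reported as the strongest association. All names and numbers below are illustrative.

    import numpy as np

    # Stand-in concept models (mean, variance of a 3-D feature vector);
    # the real system trains a 2-D multi-resolution HMM per concept.
    models = {
        "beach":    (np.array([0.8, 0.2, 0.6]), np.array([0.05, 0.05, 0.10])),
        "mountain": (np.array([0.3, 0.7, 0.4]), np.array([0.10, 0.05, 0.10])),
    }

    def log_likelihood(x, mean, var):
        """Log-likelihood of feature vector x under a diagonal Gaussian."""
        return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

    features = np.array([0.75, 0.25, 0.55])          # features of the unlabeled image
    scores = {c: log_likelihood(features, m, v) for c, (m, v) in models.items()}
    print(max(scores, key=scores.get))               # highest likelihood -> annotation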


Video Information Retrieval (VIR)
• Recognition technologies
- Image
- Voice
- Text transcripts
• Document retrieval technologies
- Topic segmentation
- Topic matching
- Text summarization
• Presentation technologies
- Combine recognition and retrieval technologies
• Result is an integrated application


Streamed media applications: Source data + playback software
• Streaming media is multimedia that is constantly received by, and normally presented to, an end-user while it is being delivered by a streaming provider (the term "presented" is used to denote audio or video playback).
• The name refers to the delivery method of the medium rather than to the medium itself.
• Media delivery systems are either
• inherently streaming (e.g. radio, television) or
• inherently non-streaming (e.g. books, video cassettes, audio CDs).

Streamed media/dynamic media
• is composed of a series of media objects that have a time relationship that is necessary for proper communication.
• Audio, film (a series of images) and video (a combination of audio and film) data belong to this class.

Streamed-data applications include
• Speech analysis; speech command of automated artifacts: robots, equipment for the handicapped, answering/ordering systems,
• Music archives, search and presentation; surveillance and film analysis,
• Animation for stories, games, simulations,
• Film archives, search and presentation, and Video on Demand.
• Webcasting, podcasting and on-line radio facilities also come under this kind of information retrieval.

Streamed data representation
• As with image data, streamed data has some metadata attributes:
• Context descriptors: artist name(s), publisher, publication date, ...
• Content descriptors: title, genre, keywords, description and, for music, mood, musical theme, melody, tempo, rhythm and instrumentation, and
• Structural description, such as presentation length, storage size, storage/presentation format.


Search requests for streamed data
• from a media collection can be expressed in 3 ways:
• An exact-match query giving data values to be compared to the regular metadata attributes (Retrieve all films where artist_name = 'xxx ccccc'); a minimal sketch follows this list,
• A text-based query using the content-descriptive metadata (Retrieve Hindi devotional songs), and
• An example-based query using a video or audio clip or a whistled tune, such as offered at Musipedia.
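
A minimal sketch of the first (exact-match) form over a hypothetical in-memory film collection; the field names are illustrative only.

    films = [
        {"title": "Film A", "artist_name": "xxx ccccc", "genre": "devotional", "language": "Hindi"},
        {"title": "Film B", "artist_name": "yyy ddddd", "genre": "drama", "language": "Tamil"},
    ]

    def exact_match(collection, **criteria):
        """Return records whose metadata attributes equal the given values."""
        return [record for record in collection
                if all(record.get(field) == value for field, value in criteria.items())]

    print(exact_match(films, artist_name="xxx ccccc"))              # the exact-match query
    print(exact_match(films, genre="devotional", language="Hindi"))  # filtering on other attributes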


Stand-alone Internet radio devices
• are offering listeners a "no-computer" option for listening to audio streams.
• In general, multimedia content is large, so media storage and transmission costs are still significant; to offset this somewhat, media is generally compressed for both storage and streaming.
• A media stream can be on demand or live.
• On-demand streams are stored on a server for a long period of time, and are available to be transmitted at a user's request.
• Live streams are available only at the time of the event, as in a video stream of a live sporting event or temple pooja.

Web Server vs. Streaming Server: CEC Model + ISRO Edusat Model
• There are two major methods of delivering streaming audio and video content over the Web.
• The first method uses a standard Web server (e-content + LOR) to deliver the audio and video data to a media player. The second method uses a separate streaming media server (Edusat) specialized for the audio/video streaming task.


Programmes are deployable through a campus LAN

Watch Live programmes


Streaming with a Web Server
• Deploying streaming media content with the Web server approach is actually only a small evolutionary step away from the download-and-play model.
• Uncompressed audio and video are first compressed into a single "media file" for delivery over a specific network bandwidth, such as a 28.8 kilobits per second (Kbps) modem.

• This media file is then placed on a standard Web server.
• Next, a Web page containing the media file's URL is created and placed on the same Web server.
• This Web page, when activated, launches the client-side player and downloads the media file.
• So far, the actions are identical to those in the download-and-play case.
• The difference lies in how the client functions (a minimal server sketch follows).
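
A minimal sketch of the Web-server half of this approach using Python's standard library: any file in the current directory (including a compressed media file) is served over plain HTTP, and the player simply downloads it like any other document. The port is an assumption for local testing.

    import http.server
    import socketserver

    PORT = 8000                                            # assumed local test port
    handler = http.server.SimpleHTTPRequestHandler         # serves files from the current directory

    with socketserver.TCPServer(("", PORT), handler) as httpd:
        print(f"Media files here are reachable at http://localhost:{PORT}/")
        httpd.serve_forever()                              # plain HTTP download, no rate control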


Streaming with a Streaming Media Server
• In the streaming media server approach, the initial steps are similar to the Web server approach, except that the compressed media file is produced and copied to a specialized streaming media server (such as Microsoft Windows Media Services) instead of a Web server.
• Then a Web page with a reference to the media file is placed on a Web server.
• Windows Media Services and the Web server may run on the same computer.


Data Delivery
• The rest of the streaming media server delivery process differs significantly from the Web server approach.
• In contrast to the passive burst methodology employed in Web server streaming, the data is actively and intelligently sent to the client, meaning that the server delivers the content at the exact data rate associated with the compressed audio and video streams (see the sketch below).
• The server and the client stay in close touch during the delivery process, and the streaming media server can respond to any feedback from the client.
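
A minimal sketch of what "delivering at the stream's data rate" means in practice: instead of bursting the whole file, each chunk is held back so the average send rate matches the encoded bit rate. The send callable and the numbers are stand-ins, not part of any real server.

    import time

    BITRATE_BPS = 300_000                 # encoded stream rate: 300 kbit/s
    CHUNK = 4096                          # bytes sent per step

    def paced_send(data, send):
        """Send `data` so the average rate matches BITRATE_BPS."""
        seconds_per_chunk = (CHUNK * 8) / BITRATE_BPS
        for offset in range(0, len(data), CHUNK):
            send(data[offset:offset + CHUNK])
            time.sleep(seconds_per_chunk)         # pace delivery instead of bursting

    paced_send(b"\x00" * 50_000, lambda chunk: None)   # stand-in for a socket send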


Streaming bandwidth and storage
• Streaming media storage size (in the common file system measurements megabytes, gigabytes, terabytes, and so on) is calculated from the streaming bandwidth and the length of the media with the following formula (for a single user and file):
• storage size (in megabytes) = length (in seconds) * bit rate (in kbit/s) / (8 * 1024)


• For example,
• one hour of video encoded at 300 kbit/s will be (3,600 s * 300,000 bit/s) / (8 * 1024 * 1024), which gives around 128 MB of storage.
• If the file is stored on a server for on-demand streaming and this stream is viewed by 1,000 people at the same time using a unicast protocol, you would need: 300 kbit/s * 1,000 = 300,000 kbit/s = 300 Mbit/s of bandwidth.
• This is equivalent to around 125 GiB per hour.

• Of course, using a multicast protocol the server sends out only a single stream that is common to all users.
• Hence, such a stream would only use 300 kbit/s of serving bandwidth. (A small check of this arithmetic follows.)
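
The arithmetic above can be checked with a few lines; the helper names are ours, and megabytes follow the same binary convention as the worked example.

    def storage_mb(length_s, bitrate_bps):
        """Storage for one file: seconds * bits-per-second / 8 bytes, expressed in MB (1024*1024 bytes)."""
        return length_s * bitrate_bps / (8 * 1024 * 1024)

    def unicast_bandwidth_kbps(bitrate_kbps, viewers):
        """Unicast sends one copy per viewer; multicast would need just bitrate_kbps."""
        return bitrate_kbps * viewers

    print(round(storage_mb(3600, 300_000), 1))        # ~128.7 MB for one hour at 300 kbit/s
    print(unicast_bandwidth_kbps(300, 1000))          # 300000 kbit/s = 300 Mbit/s for 1000 viewers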


Protocol issues
• There are five major types of protocols used for multimedia on the Internet.
• Datagram protocols, such as the User Datagram Protocol (UDP), send the media stream as a series of small packets (Edusat streaming through VLC); see the sketch after this list.
• This is simple and efficient.
• It is up to the receiving application to detect loss or corruption and recover data using error correction techniques. If data is lost, the stream may suffer a dropout.
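
A minimal sketch of datagram-style delivery, assuming a local test address: the "media" buffer is chopped into small UDP packets, each carrying a sequence number so a receiver could notice loss; nothing here retransmits, which is exactly the trade-off described above.

    import socket
    import struct

    MEDIA = bytes(range(256)) * 40            # stand-in for an encoded media stream
    PACKET_SIZE = 1024
    DEST = ("127.0.0.1", 5004)                # assumed local test destination

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq, offset in enumerate(range(0, len(MEDIA), PACKET_SIZE)):
        payload = MEDIA[offset:offset + PACKET_SIZE]
        # 4-byte sequence number + payload; fire-and-forget, no delivery guarantee
        sock.sendto(struct.pack("!I", seq) + payload, DEST)
    sock.close()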


• The Real-time Streaming Protocol (RTSP), the Real-time Transport Protocol (RTP) and the Real-time Transport Control Protocol (RTCP) were all specifically designed to stream media over networks.
• The latter two are built on top of UDP.
• Reliable protocols, such as the Transmission Control Protocol (TCP), guarantee correct delivery of each bit in the media stream.

Unicast protocols
• send a separate copy of the media stream from the server to each recipient.
• Multicasting broadcasts the same copy of the multimedia over the entire network to all clients.
• Multicast protocols were developed to reduce the data replication (and consequent server/network loads) that occurs when many recipients receive unicast content streams independently.
• These protocols send a single stream from the source to a group of recipients.


Continuous streaming of radio or television
• Continuous streaming of radio or television material usually precludes the recipient's ability to control playback.
• IP Multicast provides a means to send a single media stream to a group of recipients on a computer network.
• Peer-to-peer (P2P) protocols arrange for pre-recorded streams to be sent between computers.

Benefits of protocols
• UDP - provides the most efficient network throughput and can have a very positive impact on the user (player) experience.
• The only downside to UDP is that many network administrators close their firewalls to UDP traffic, limiting the potential audience of UDP-based streams.
• TCP - provides an adequate, though not necessarily efficient, protocol for delivering streaming media content from a server to a client.


VLC - for open source streaming
• DirectDraw
• DirectShow
• Four protocols
• Multiple formats, devices, methods
• Single stream output - suitable for webcasting
• Videoconferencing, live stream recording

VLC media player options


VLC
• The network on which you set up the VideoLAN solution can be as small as one Ethernet 10/100 Mb switch or hub, and as big as the whole Internet! The bandwidth needed is:
• 0.5 to 4 Mbit/s for an MPEG-4 stream,
• 3 to 4 Mbit/s for an MPEG-2 stream read from a satellite card, a digital terrestrial television card or an MPEG-2 encoding card,
• 6 to 9 Mbit/s for a DVD.


VLC features (summary of the inputs/outputs table)
• Output methods: UDP unicast/multicast, RTP unicast/multicast, file, HTTP, MMSH and transcoding are available for most input types (some inputs support unicast only).
• Sending DVD subtitles is partially supported; sending SAP announces is supported for most inputs (untested for some).
• See the VLC features page for the full input/output matrix.


The VideoLAN streaming solution includes:
• VLC media player (initially VideoLAN Client), which can be used as a server to stream MPEG-1, MPEG-2 and MPEG-4/DivX files, DVDs and live videos on the network in unicast or multicast, or used as a client to receive, decode and display MPEG streams under multiple operating systems,
• VLS (VideoLAN Server), which can stream MPEG-1, MPEG-2 and MPEG-4 files, DVDs, digital satellite channels, digital terrestrial television channels and live videos on the network in unicast or multicast. Most of the VLS functionality can now be found in the much better VLC program; usage of VLC instead of VLS is advised.


Web 2.0: Ogg bitstream format
• Free, open standard container format
• Designed to provide efficient streaming and manipulation of high-quality digital multimedia
• It can multiplex a number of separate, independent free and open source codecs for audio, video, text, etc.
• Theora, Vorbis, Speex, FLAC, OggPCM, ...


Streaming Media Services
• Numerous subscription-based streaming media services began to appear on the web.
• These services provided a low-cost, low-barrier-to-entry model that allowed small businesses and internet marketers to place streaming audio and streaming video on their websites.
• One more addition was podcasting.


Internet television: (ERNET Model)
• allows viewers to choose the show they want to watch from a library of shows.
• The primary models for Internet television are streaming Internet TV or selectable video on an Internet location, typically a website.
• The video can also be broadcast with a peer-to-peer network (P2PTV), which doesn't rely on a single website's streaming.
• It differs from IPTV in that IPTV offerings are typically offered on discrete service provider networks, requiring a special IPTV set-top box.

• Internet TV is a quick-to-market and relatively low-investment service.
• Internet TV rides on existing infrastructure including broadband, ADSL, Wi-Fi, cable and satellite, which makes it a valuable tool for a wide variety of service providers and content owners looking for new revenue streams.
• The CEC-ERNET collaboration model shows Gyan-Vyas on the internet.
• (Shirdi Sai Baba Sansthan and other religious websites)


P2PTV
• The term P2PTV refers to peer-to-peer (P2P) software applications designed to redistribute video streams in real time on a P2P network;
• the distributed video streams are typically TV channels from all over the world but may also come from other sources.
• The draw to these applications is significant because they have the potential to make any TV channel globally available.

Peercast
• PeerCast can be used to multicast streaming audio (MP3, WMA) and/or video (Nullsoft Streaming Video, or WMV), or any other stream of data, over the internet.
• PeerCast uses a distributed bandwidth technique to lighten the load on the broadcaster's upstream bandwidth, where each listener/viewer relays the stream they download to one or more additional listeners.

• The benefit of using PeerCast is that it allows any multicasters, particularly small or independent ones, to distribute their streams without the need for much bandwidth, saving them costs.
• It also allows, theoretically, an infinite number of listeners as long as there are enough relays.


Unreal Media Server
• It is a proprietary streaming server for Windows platforms, designed to provide multimedia delivery over LAN and Internet.
• It supports media files and live media streams.
• Supported file formats: AVI (DivX, XviD, VP6, any other codecs), MPEG-1/2/4, WMV/WMA, MP3, ASF, QuickTime.
• Playlist functionality allows automatically playing all the files of the server's virtual folder in loop mode.

• Supported live media sources include: digital cameras, microphones, TV tuners, and analog video sources connected to a video card or to a video capture card that supports the DirectShow interface.
• Live audio/video is encoded with WMA/MP3/GSM 6.10 and WMV/MPEG-4 codecs in real time.
• Hardware encoder appliances are supported for streaming hardware-compressed content without software transcoding.
• TCP, HTTP and MMS unicast, and RTP multicast transport protocols are supported.
• For HTTP delivery, the IIS web server is required on the server computer.


Unreal Media Server includes 3 major components
• Streaming video recording: streaming video recorders can record not only video streams but also many audio streams, so sometimes they are also called streaming media recorders. There are different approaches used by the software to make the recording, depending on which stage of the process one taps into. In order, they are:
• URL snooping: simplest is when the stream is served by simply requesting it, just as web pages are, as in an HTTP GET request; this directly copies the encoded, streamed file. In this case, one simply needs to determine the URL and then download it, either by pasting it into one's web browser (location box or "Open location...") or via a specialized download manager (a minimal download sketch follows this list).
• Encoded capture: some streaming is not via a simple request to a URL; in this case, capturing the stream requires some understanding and implementation of the particular streaming protocol (the encoded media stream is encapsulated within a network stream), either passively/offline - capturing the actual traffic and extracting it (via deep packet capture, using a packet sniffer) - or actively/online - implementing the streaming protocol/program enough to request the encoded data.
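
A minimal sketch of the URL-snooping case, assuming the stream's direct URL has already been determined (the URL below is only a placeholder): an ordinary HTTP GET copies the encoded file chunk by chunk.

    import urllib.request

    STREAM_URL = "http://example.com/media/lecture.mp3"   # placeholder: the snooped URL

    with urllib.request.urlopen(STREAM_URL) as response, open("capture.mp3", "wb") as out:
        while True:
            chunk = response.read(64 * 1024)               # copy the encoded stream in 64 KB pieces
            if not chunk:
                break
            out.write(chunk)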


Decoded capture
• An approach used to record the decoded information at the end level - from the video and sound card of the computer.
• This is essentially capturing what you are watching or listening to directly from the screen, and could be likened to recording off the air with a microphone.
• This solution makes it possible to record anything that you are able to view or listen to, regardless of original format or protection, though it suffers from a loss in quality (digital generation loss) due to re-encoding.


On-going Research
• Multi-modal information retrieval from broadcast video using OCR and speech recognition.
• In multi-modal information retrieval from broadcast video, text can be read on the screen through OCR and speech recognition can be performed on the audio track.
• OCR and speech recognition are compared on the 2001 TREC Video Retrieval evaluation corpus.
• Results show that OCR is more important than speech recognition for video retrieval.
• OCR retrieval can be further improved through dictionary-based post-processing.

• The International Institute of Information Technology, Hyderabad, India has worked on "A System for Information Retrieval Applications on Broadcast News Videos".
• This system is specifically designed for information retrieval applications on broadcast news videos.
• The system is directly useful to an end user for easy access to the news stories of interest.
• It also acts as a platform for convenient deployment and experimentation of various video analysis and indexing techniques on real data, and on a large scale.


AUDIO INFORMATION RETRIEVAL
• The problem of audio information retrieval is familiar to anyone who has returned from vacation to an answering machine full of messages.
• Recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity have been explained.


Music Information Retrieval
• aims to determine the similarity between two given melodies;
• there are several melodic similarity measures (one is sketched after this list).
• Audio segment retrieval using a synthesized HMM has been attempted.
• The approach allows a user to query audio data of any length by one or more example audio segments and find similar segments.
• Audio identification using sinusoidal modeling, with application to jingle detection.
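
One common melodic similarity measure, sketched here under simplifying assumptions: melodies are reduced to sequences of pitch intervals (so transposition does not matter) and compared with edit distance; a distance of 0 means the same melodic contour.

    def intervals(pitches):
        """Pitch intervals in semitones between successive notes."""
        return [b - a for a, b in zip(pitches, pitches[1:])]

    def edit_distance(a, b):
        """Classic dynamic-programming edit distance between two sequences."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dp[i][0] = i
        for j in range(len(b) + 1):
            dp[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                               dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
        return dp[len(a)][len(b)]

    tune       = [60, 62, 64, 65, 67]      # C D E F G as MIDI note numbers
    transposed = [62, 64, 66, 67, 69]      # the same tune two semitones higher
    print(edit_distance(intervals(tune), intervals(transposed)))   # 0: identical melodic contour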


Signal + Context = Better Classification
• Typical signal-based approaches to extracting musical descriptions from audio have only limited precision.
• An architecture trains a large set of binary classifiers simultaneously, for many different kinds of musical metadata (genre, instrument, mood, etc.), in such a way that correlation between metadata is used to reinforce each individual classifier.
• The system is iterative.


Content-based music retrieval using query integration
• For users whose preferences range over songs with a wide variety of features.
• The MIR method dynamically generates an optimal set of query vectors from the sample set of songs submitted by the user to express their preferences, based on the similarity of the songs in the sample set (a simplified sketch follows this list).
• Experiments conducted on a music collection with subjective user ratings were very effective in improving the accuracy of content-based MIR.
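
A simplified sketch of the query-integration idea (not the published algorithm itself): the user's sample songs are grouped by feature similarity, and one centroid per group becomes a query vector, so a sample set with mixed tastes yields several query vectors instead of a single averaged-out one. Features, threshold and names are illustrative.

    import numpy as np

    def query_vectors(sample_features, threshold=0.5):
        """Group similar sample songs and return one centroid query vector per group."""
        groups = []
        for vec in sample_features:
            for group in groups:
                if np.linalg.norm(vec - np.mean(group, axis=0)) < threshold:
                    group.append(vec)
                    break
            else:
                groups.append([vec])
        return [np.mean(g, axis=0) for g in groups]

    # Hypothetical 2-D song features: two upbeat samples and one slow one.
    samples = [np.array([0.90, 0.10]), np.array([0.85, 0.15]), np.array([0.10, 0.90])]
    for q in query_vectors(samples):
        print(q)        # two query vectors, one per cluster of similar songs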


Issues to be considered: Flash


Flash video + text + command buttons

Total e-content in a package format


E-content deployment
• Full content coverage of subjects, deployment architecture, servers (at least 5: database, webservice, message, backup, intrusion detection, media server), tools and customization at three levels (content, localhost, webserver level)
• Distributed network architecture (with EMMRCs as hubs) - replication of database servers


Other issues
• Gateway
• Bandwidth
• Load balancing
• End user facilitation
• Application environment
• Integrated system


LOR: Front page- Blocked content


Allowing the blocked <strong>media</strong>


[Diagram labels: Radio Broadcast, TV Broadcast, Videoconference, Night Time Loading, Asymmetric Internet, Benefits, On-line Education, Telephone Voice Chat on Internet, WebCAM, Talkback, Internet]


Technological Convergence
• Computer professionals
• Information scientists
• Media professionals
• User limitations
• Deliverer's preferences
• Operational issues at local and global levels
• Co-ordinated approach


Thank you
