Multimedia content indexing and retrieval

Yiannis Kompatsiaris

ikom@iti.gr

http://egnatia.ee.auth.gr/~ikom

Informatics and Telematics Institute

Centre for Research and Technology Hellas

GREECE

Director Prof. Michael Strintzis


Overview

• Introduction

• Multimedia content indexing and retrieval

• Ontologies - context

• Still image segmentation

• Video segmentation

• Relevance feedback

• Indexing

• Content-based search engines

• Conclusions



Introduction

• 1-2 exabytes (millions of terabytes) of new information are produced world-wide annually

• 80 billion digital images are captured each year

• Over 1 billion images related to commercial transactions are available through the Internet

• This number is estimated to increase tenfold in the next two years

• 4 000 new films are produced each year

• 300 000 films are available world-wide

• 33 000 television stations and 43 000 radio stations

• 100 billion hours of audiovisual content

Content types: Web, Mobile, Personal Content, Sport, News, Movies


Introduction

“The value of information depends on how easily it can be found, retrieved, accessed, filtered or managed”

• Multimedia (content-based) indexing and retrieval is needed that understands the semantics of the content and associates it with the actual meaning of the user’s query



Applications

• Content production and consumption

• Organize and share personal content

• (Multimedia) Semantic web (multimedia search engines, directories, e-commerce)

• Multimedia documents

• Medicine

• Content filtering

• Face detection

• Transcoding

• Augmented reality

• The only “content-based” customer of FAST Norway is dealing with “inappropriate” content



Approaches

• Manual text and caption-based annotation

• + Straightforward

• + High-level

• + Efficient during content creation

• Most commonly used

• - Time consuming

• - Operator- and application-dependent

• - Captions must exist



Approaches

• Low-level features (color, texture, shape, edges, motion, etc.)

• + Automatic

• + Computation

• Suitable for many applications

• - Low-level

• - Irrelevant results

• - “Visual” input is needed

• Representation

• Features

• Color, texture space

• Invariance

• Compactness

• Indexing (MPEG-7)

• Database

• Matching – distance

• Global – local features (segmentation)



Approaches

• Semantic annotation of content

• + High-level

• + Allows natural queries

• A priori knowledge is usually needed

• - Domain specific

• - Computation

• - (Semi-)automatic

Example query: “I want video clips of the Greek national football team containing goals”



Semantic annotation

• Low-level (LL) features are analyzed to recognize objects and events

• Specific domains (e.g. sports, news)

• A multimodal approach is usually followed (e.g. audio-assisted video analysis: goal detection)

• Object/event/relationship models, or knowledge about such models, are needed a priori

• Techniques for the formal representation of knowledge → knowledge base

• Matching techniques



Semantic annotation

[Figure: analysis tasks arranged by level of abstraction (low to high) and granularity of data processing (coarse to fine): video parsing and segmentation; iconic-based grouping and browsing; spatiotemporal object detection; motion and trajectory analysis; knowledge-based event modeling.]



Semantic annotation example

[Figure: the PICTION system. Panels: original image; image analysis (edges) to locate faces; constrained graph from NLP; annotated image.]



Ontologies - Context

• Ontologies are primarily used for text applications

• Knowledge structure for specific domains (use of context to define the domain)

• Knowledge-assisted analysis

• Enrich ontologies with multimedia features

[Figure: a Sports ontology (Football, Basketball, …) selected through context analysis; the Player concept is associated with colors (LL features); content analysis and object recognition label the regions Player A, Player B, Ball and Field.]



Semantic annotation

[Figure: levels of semantic annotation. High-level features (keywords), from text and the user; intermediate-level features; low-level features (MPEG-7), from content analysis algorithms. Links between the levels: ontologies, statistical analysis, relevance feedback.]



Semantic annotation

High-level descriptors (keywords)

Object ontology: intermediate-level descriptors and descriptor values

• dominant color: {most dominant, second most dominant, third most dominant} & {black, white, red, green, yellow, blue, brown, pink, orange, purple, gray}

• shape: {slightly oblong, moderately oblong, very oblong}

• foreground/background

• foreground motion:

  • x direction: {right-->left, left-->right}

  • x speed: {low, medium, high}

  • y direction: {low-->high, high-->low}

  • y speed: {low, medium, high}

• background position:

  • vertical axis: {high, middle, low}

  • horizontal axis: {left, middle, right}

Low-level MPEG-7 descriptors

• dominant color set in Luv color space, $DCOL = \{(L_i, u_i, v_i, p_i)\}$

• eccentricity value of contour-based shape descriptor

• motion trajectory $MT_I = \{(x_i, y_i, t_i)\}$ using "Integrated" coordinates

• motion trajectory $MT_L = \{(x_i, y_i, t_i)\}$ using "Local" coordinates
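The mapping from low-level descriptor values to the qualitative values of the object ontology can be illustrated with a minimal sketch. The slides give no numeric boundaries, so the thresholds in `SHAPE_BINS`, the function names, and the convention that x grows to the right are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical eccentricity thresholds (upper bound, label); the slides do
# not state where one shape value ends and the next begins.
SHAPE_BINS: List[Tuple[float, str]] = [
    (0.5, "slightly oblong"),
    (0.8, "moderately oblong"),
    (1.0, "very oblong"),
]


def shape_label(eccentricity: float) -> str:
    """Map an eccentricity value in [0, 1] to a qualitative shape value."""
    if not 0.0 <= eccentricity <= 1.0:
        raise ValueError("eccentricity must lie in [0, 1]")
    for upper, label in SHAPE_BINS:
        if eccentricity <= upper:
            return label
    return SHAPE_BINS[-1][1]


def x_direction(trajectory: List[Tuple[float, float, float]]) -> str:
    """Qualitative x direction of a motion trajectory {(x_i, y_i, t_i)}.

    Assumes x grows to the right and the samples are ordered by time.
    """
    dx = trajectory[-1][0] - trajectory[0][0]
    return "left-->right" if dx >= 0 else "right-->left"
```

For example, `shape_label(0.9)` falls in the last bin, and a trajectory whose x coordinate decreases over time maps to `right-->left`.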



Still image segmentation

• Object-based approach for content indexing and retrieval

• Descriptors are estimated for each object

• Unsupervised segmentation

• An internal representation is available

• The user can search for a specific object contained in images



Still image segmentation

• Stage 1. Extraction of the colour and texture feature vectors corresponding to each pixel

• Stage 2. Estimation of the initial number of regions and their spatial, intensity and texture centers

• Stage 3. Conditional filtering using a moving average filter

• Stage 4. Final classification of the pixels, using the KMCC algorithm



Still image segmentation

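KMCC distance between pixel $p$ and region $s_k$:

$$D(p, s_k) = \| J(p) - J^s(s_k) \| + \lambda_1 \| T(p) - T^s(s_k) \| + \lambda_2 \frac{\bar{M}}{M_k} \| p - S(s_k) \|$$

• $J$: result of conditional filtering of the intensity $I$

• Region intensity center: $I^s(s_k) = [\, I^s_L(s_k) \;\; I^s_a(s_k) \;\; I^s_b(s_k) \,]$

• $T$: texture (Discrete Wavelet Frames)

• $\| p - S(s_k) \|$: spatial distance

• $M_k$: region size

• $\bar{M}$: average size

• $\bar{M} / M_k$: normalization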
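As a rough illustration, the distance on this slide can be computed as a sum of three Euclidean terms: filtered intensity, texture, and a size-normalized spatial term. The sketch below is plain Python; the function and parameter names are hypothetical, and the default weights of 1.0 are placeholders, since the slides give no values for λ1 and λ2.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean norm of the difference of two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmcc_distance(
    intensity_p: Sequence[float],       # filtered intensity J(p)
    texture_p: Sequence[float],         # texture vector T(p)
    position_p: Sequence[float],        # pixel coordinates p
    intensity_center: Sequence[float],  # intensity center of region s_k
    texture_center: Sequence[float],    # texture center of region s_k
    spatial_center: Sequence[float],    # spatial center S(s_k)
    region_size: float,                 # size of region s_k
    average_size: float,                # average region size
    lambda1: float = 1.0,               # assumed texture weight
    lambda2: float = 1.0,               # assumed spatial weight
) -> float:
    """Distance between a pixel and a region: intensity + texture + spatial."""
    intensity_term = euclidean(intensity_p, intensity_center)
    texture_term = euclidean(texture_p, texture_center)
    # The size ratio normalizes the spatial term, so that large regions are
    # not penalized merely for having pixels far from their center.
    spatial_term = (average_size / region_size) * euclidean(position_p, spatial_center)
    return intensity_term + lambda1 * texture_term + lambda2 * spatial_term
```

In a K-Means-style loop, each pixel would be assigned to the region with the smallest such distance, after which the region centers and sizes are re-estimated.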



Experimental Results

• 1: Original image

• 2: Blobworld

• 3: Simple KM

• 4: KMCC with texture



Large-Format Image Segmentation Framework

• Partial pixel reclassification using a Bayes classifier, to improve the perceptual quality

• Layer 0: Original image ($y_{max} \times x_{max}$)

• Layer 1: Reduced image on which the segmentation algorithm is applied ($M' \times N'$)

• Layer 2: Image used for initial clustering ($M \times N$)
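One way to picture the layered framework is that the mask computed on the reduced image is projected back to the original resolution, and only a subset of pixels is then reclassified. The sketch below assumes an integer reduction factor and assumes that the pixels to reclassify are those next to a region boundary; neither detail is stated on the slide, and the Bayes classifier itself is not shown.

```python
from typing import List, Tuple

Mask = List[List[int]]


def upsample_labels(mask: Mask, factor: int) -> Mask:
    """Project a label mask to a finer layer by nearest-neighbour upsampling."""
    height, width = len(mask) * factor, len(mask[0]) * factor
    return [[mask[y // factor][x // factor] for x in range(width)]
            for y in range(height)]


def boundary_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Return (row, column) of pixels with a 4-neighbour of another label.

    Under the assumption made here, these are the candidates for
    reclassification at full resolution.
    """
    height, width = len(mask), len(mask[0])
    candidates = []
    for y in range(height):
        for x in range(width):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] != mask[y][x]:
                    candidates.append((y, x))
                    break
    return candidates
```

Restricting the expensive per-pixel step to boundary candidates is consistent with the reported running times, where the proposed framework stays much closer to the reduced-image time than to direct application.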



Experimental Results

• 1: Original image: 730x490

• 2: Direct application (2494.28 s)

• 3: Reduced image (18.92 s)

• 4: Proposed framework (47.55 s)



Image sequence segmentation

• Real-time compressed-domain segmentation

• Moving object segmentation and tracking

  • Motion vectors from P-frames, average motion for I-frames

• Background segmentation

  • DC coefficients of the macroblocks, corresponding to the Y, Cb and Cr components

• Features estimated for each spatiotemporal object

• Pixel-domain boundary refinement



Image sequence segmentation

[Figure: block diagram of the processing chain. Compressed-domain sequence → shot detection → video shot → information extraction → moving object segmentation and tracking (iterative rejection, macroblock-level tracking, object formation) → background segmentation → final segmentation mask → feature extraction (MPEG-7 low-level descriptors) → feature mapping with the object ontology (intermediate-level descriptors). Intermediate masks are labelled $R_t^{IR}$, $R_t^{TR}$, $R_t^{O}$ and $R_t^{F}$; the diagram also shows $R_{t-1}^{TR}$ and $R_{t-i}^{F}$, $i = T, \ldots, 1$.]



Moving object segmentation and tracking

• Iterative macroblock rejection, to detect macroblocks possibly belonging to foreground objects

• Macroblock-level tracking; temporal consistency of the output of iterative rejection

• Clustering of foreground macroblocks to connected regions and assignment to foreground spatiotemporal objects
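The idea behind iterative macroblock rejection can be sketched as follows: estimate the global (camera) motion from the macroblocks still considered background, reject those whose motion vector disagrees with it, and repeat until the rejected set stops changing. This sketch makes a strong simplifying assumption, namely that global motion is a pure translation estimated as the mean motion vector; the slides do not specify the motion model or the threshold, and the function name is hypothetical.

```python
import math
from typing import List, Set, Tuple

Vector = Tuple[float, float]


def iterative_rejection(motion_vectors: List[Vector],
                        threshold: float = 2.0,
                        max_iter: int = 20) -> Set[int]:
    """Return indices of macroblocks that possibly belong to foreground objects."""
    rejected: Set[int] = set()
    for _ in range(max_iter):
        kept = [v for i, v in enumerate(motion_vectors) if i not in rejected]
        if not kept:
            break
        # Global motion estimate from the macroblocks not yet rejected.
        gx = sum(v[0] for v in kept) / len(kept)
        gy = sum(v[1] for v in kept) / len(kept)
        # Reject every macroblock whose vector deviates from the estimate.
        new_rejected = {
            i for i, v in enumerate(motion_vectors)
            if math.hypot(v[0] - gx, v[1] - gy) > threshold
        }
        if new_rejected == rejected:
            break
        rejected = new_rejected
    return rejected
```

The rejected macroblocks are only candidates: the tracking and clustering steps listed above would still be needed to enforce temporal consistency and group them into objects.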



Background segmentation

• Number of background objects is determined using the maximin algorithm and DC coefficients

• In I-frames, macroblock clustering using the K-Means algorithm and DC coefficients

• In P-frames, tracking of background objects using macroblock motion vectors
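A minimal sketch of how a maximin procedure can choose the number of clusters is given below. It follows one common variant: keep adding the point farthest from all current centers until that distance drops below a fraction `gamma` of the distance between the first two centers. The stopping rule, the value of `gamma`, and the use of (Y, Cb, Cr) DC triplets as points are assumptions; the slides do not give these details.

```python
import math
from typing import List, Sequence, Tuple


def maximin_centers(points: List[Sequence[float]],
                    gamma: float = 0.4) -> List[Tuple[float, ...]]:
    """Pick initial cluster centers; their count is the estimated number of objects."""
    centers = [tuple(points[0])]
    reference = None  # distance between the first two centers
    while len(centers) < len(points):
        # Candidate: the point whose nearest center is farthest away.
        candidate = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        distance = min(math.dist(candidate, c) for c in centers)
        if reference is None:
            if distance == 0.0:
                break  # all points coincide with the first center
            reference = distance
        elif distance <= gamma * reference:
            break  # remaining points are close to an existing center
        centers.append(tuple(candidate))
    return centers
```

The returned centers could then seed the K-Means clustering of macroblocks in I-frames mentioned above.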



Pixel-domain boundary refinement

• Creation of pixel-accuracy segmentation masks using a Bayes classifier

• Full decompression of the frame is needed



Experimental Results

[Four slides of result figures.]


Indexing Information Extraction

• MPEG-7 descriptors:

• Motion Activity

• Dominant Color

• GoF/GoP Color

• Contour Shape

• Motion Trajectory using “Local” Coordinates

• Motion Trajectory using “Integrated” Coordinates



MPEG-7 XM

• A description database is built from a media database

• MPEG-7 descriptors extracted for each spatiotemporal object

• Non-normative



MPEG-7 XM

• Compute

distances

between

descriptions

• indexing

and

retrieval

• transcoding

Non normative



Video Indexing Scheme

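[Figure: video indexing scheme. Video from video storage passes through segmentation and feature extraction; a low-level to intermediate-level descriptor mapping yields a qualitative object description, stored in the object database. The system supervisor/user supplies keywords representing semantic objects; through the object ontology each keyword receives a qualitative keyword description, stored in the keyword database.]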



Video Retrieval Process

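[Figure: video retrieval process. A keyword query is looked up in the keyword database; if the keyword is not in the database, its intermediate-level descriptor values are supplied. Intermediate-level descriptor values and spatiotemporal relationships are matched against the object database to give the initial query output (visual presentation, drawing on video storage). User feedback and low-level descriptor values are passed to support vector machines, which produce the final query output.]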



Relevance feedback

[Figure: relevance feedback loop. The user selects images from the result; the low-level (LL) features of the selected images are passed to support vector machines, which produce a new result.]
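The relevance feedback loop can be sketched as training a classifier on the low-level feature vectors of the images the user marked as relevant or irrelevant, then re-ranking the candidates by classifier score. The code below is a plain-Python linear SVM trained by subgradient descent on the hinge loss; it stands in for whatever SVM implementation and kernel the system actually used, which the slides do not specify. All names and hyperparameter values are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(samples: List[Sequence[float]], labels: List[int],
                     epochs: int = 200, lr: float = 0.1,
                     reg: float = 0.01) -> Tuple[List[float], float]:
    """Fit weights w and bias b; labels are +1 (relevant) or -1 (irrelevant)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # The sample violates the margin: take a hinge-loss step.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def rank_by_relevance(candidates: List[Sequence[float]],
                      w: List[float], b: float) -> List[int]:
    """Return candidate indices ordered from most to least relevant."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

Each feedback round would retrain on the accumulated selections and present the top-ranked candidates as the new result.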


Object-based Video Retrieval using Ontologies

• User can define high-level concepts using the object ontology

• Results can be improved with relevance feedback

• Suitable for knowledge sharing

[Figure: automatic segmentation of the example image; the user can select a region.]



A World Wide Web Region-Based Search Engine

[Figure: system architecture. Components: World Wide Web, crawler (spider), segmentation and indexing, database, JDBC (Java Database Connectivity), server, PHP, user.]



A World Wide Web Region-Based Search Engine

• Region-based image search

• Each image is segmented into regions using a novel extended K-Means algorithm: the K-Means-with-connectivity-constraint (KMCC) algorithm

• A new color distance is defined using the L*a*b* color space

• Characteristic features of each region are estimated using color, texture and shape information

[Figure: automatic segmentation of the example image; the user can select a region.]

http://uranus.ee.auth.gr/Istorama



Experimental Results

The user can select a category to browse



Experimental Results

The user can select an image to start a query



Segmented representation

The user must select a region

The user can adjust the weight of each feature
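User-adjustable feature weights suggest a region distance formed as a weighted sum of per-feature distances. The sketch below assumes Euclidean distance for every feature and the feature names `color`, `texture` and `shape`; the engine's actual distance functions (including its L*a*b*-based color distance) are not reproduced here.

```python
import math
from typing import Dict, List, Sequence

Region = Dict[str, Sequence[float]]


def region_distance(query: Region, candidate: Region,
                    weights: Dict[str, float]) -> float:
    """Weighted sum of per-feature Euclidean distances between two regions."""
    return sum(weight * math.dist(query[name], candidate[name])
               for name, weight in weights.items())


def rank_regions(query: Region, candidates: List[Region],
                 weights: Dict[str, float]) -> List[int]:
    """Return candidate indices ordered from most to least similar."""
    return sorted(range(len(candidates)),
                  key=lambda i: region_distance(query, candidates[i], weights))
```

Raising the weight of one feature makes differences in that feature count more heavily, which can change the order of the results for the same query region.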



SCHEMA NoE Aim

• Bring together a critical mass of industrial partners, end users, universities and research centres

• Improve the systematic exchange of information by forging links between the partners

• Create a synergy and a multiplier effect, enriching the capabilities of the group of partners by bringing together complementary skills and know-how

• Continuous expansion of the network

• Develop common European R&D agendas

• Strengthen European expertise and research excellence

• Disseminate the results worldwide



SCHEMA Research Topics

• Content-based multimedia analysis

• Access to the information using query structures that come naturally to human beings

• Semantic web technologies

• Copyright issues of multimedia

• New methods for multimedia access and delivery

• MPEG-7 and MPEG-21 standards

• User interfaces and human factors



SCHEMA Reference System Design

• Design of a general architecture for content-based analysis, representation, content protection (watermarking), indexing and retrieval systems

• Module-based, distributed and expandable architecture

• Definition of interfaces between different modules

• Each partner will be able to use its own module



SCHEMA Reference System

The user can select from four different algorithms



More Information

Planned Activities

• Special session on “Content-Based Semantic Scene Analysis”, Third International Workshop on Content-Based Multimedia Indexing (CBMI 2003), September 22-24, 2003, IRISA, Rennes, France

• IEEE International Conference on Image Processing 2003, September 14-17, 2003, Barcelona, Spain: exhibitors booth

• SCHEMA web page: http://www.schema-ist.org/

• Become an Affiliated Member: http://www.schema-ist.org/SCHEMA/project/become_a_member.html



Conclusions – Open issues

• Image understanding: automatic mapping of low-level (LL) to high-level (HL) features

• Use of context to restrict the domain

• Audio scene analysis

• Fusion of multiple modalities

• Active learning

• Unsupervised search for “interesting” AV patterns (“define” the keywords)

• Huge image/video databases

• User in the loop (e.g. relevance feedback)

• Performance evaluation (reference data sets)

• Killer applications (medicine, surveillance, personal content, ?)
