23.01.2014 Views

READIT-2007 - Indira Gandhi Centre for Atomic Research


7. A private or public network.

2. DIGITIZATION AND DATA MINING

Digitization refers to the conversion of an item, whether printed text, manuscript, image, sound, film or video recording, from one format (usually print or analogue) into digital form. The process essentially involves making an 'electronic photograph' of a physical object: an image of the object is captured with a scanner or digital camera and converted to a digital format that can be stored electronically and accessed via a computer [2].

Data and information are available in different formats, including text, images, video, audio, pictures and maps. Text information needs to be scanned from print and provided with links for access. For multimedia formats such as images, audio, pictures, maps and video, however, conversion and systematic presentation are not easy. Automatic search is also needed for easy accessibility. Easy search and effective, systematic presentation of the data are essential for multimedia information. For this purpose, libraries need to adopt data mining techniques, which are drawn mainly from logic, multimedia and artificial intelligence.
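
The automatic search mentioned above can be illustrated, for scanned text at least, with a small inverted index. This is only a minimal sketch under stated assumptions: the document ids and texts are invented, the text is assumed to have already been extracted from the scans, and the function names `tokenize`, `build_index` and `search` are hypothetical rather than taken from any particular library system.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical extracted text, keyed by made-up document ids.
documents = {
    "report-001": "Annual report on reactor safety research",
    "thesis-014": "Thesis on data mining in digital libraries",
    "map-203": "Survey map notes: coastal research station",
}
index = build_index(documents)
```

A call such as `search(index, "data mining")` returns the ids of the documents containing both words. Multimedia items have no words to index directly, which is why they need the annotation techniques discussed later in this section.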

Data mining is the automatic extraction of patterns of information from historical data, enabling companies to focus on the most important aspects of their business, telling them what they did not know and had not even thought of asking [3]. One definition is that data mining "is the process of automating information discovery" [4], which improves decision making and gives a company advantages in the market. Another is that it "is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules" [5]. Data mining is an applied discipline that grew out of statistical pattern recognition, machine learning and artificial intelligence, coupled with business decision making in order to optimize and enhance it. Initially, data mining techniques were applied to structured data from databases.
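
As a rough illustration of discovering patterns in structured historical data, the sketch below counts which pairs of subjects are borrowed together often. It is a deliberately simple co-occurrence count, not the method of any of the definitions cited above; the loan records, the `min_support` threshold and the function name `frequent_pairs` are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sort the unique items so each pair has one canonical ordering.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical borrowing history: each list is one reader's loans.
loans = [
    ["physics", "statistics", "reactors"],
    ["physics", "reactors"],
    ["statistics", "machine learning"],
    ["physics", "reactors", "machine learning"],
]
patterns = frequent_pairs(loans, min_support=3)
```

With this made-up data, only the pair of "physics" and "reactors" reaches the threshold, the kind of rule a library might use when shelving or recommending related material.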

Recently, two branches of data mining, text data mining and Web data mining, have emerged [6, 7]. They have their own research agendas, communities of researchers, and supporting companies that develop technologies and tools. Unfortunately, multimedia data mining is still at an early stage, and further development is needed to present multimedia information effectively.

There are four types of multimedia data: audio data, which includes sound, speech and music; image data (black-and-white and colour images); video data, which includes time-aligned sequences of images; and electronic or digital ink, which is sequences of time-aligned 2D or 3D coordinates of a stylus, a light pen, data glove sensors, or a similar device. All of this data is generated by specific kinds of sensors.

The concept of mining in multimedia is also referred to as automatic annotation or annotation mining. Three main pattern discovery approaches appear to have been used for automatic annotation in multimedia data mining. They differ primarily in how external knowledge is provided to mine concepts. The first approach assigns keywords to the data or classifies it. The second works through clustering: multimedia documents are clustered first, and an annotator then assigns keywords to the resulting clusters. The third does not rely on a manual annotator; it tries to mine concepts from contextual information.
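
The second approach, clustering first and annotating afterwards, might be sketched as follows. This is an assumption-laden illustration rather than a method described in the source: it uses a plain k-means loop with fixed starting centroids, the two-number feature vectors stand in for whatever features a real system would extract from images, and the keywords "daylight" and "night" are invented labels a human annotator might supply.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def kmeans(points, initial_centroids, iterations=10):
    """Cluster points with k-means and return one cluster label per point."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    return [nearest(p, centroids) for p in points]


# Hypothetical image features, e.g. (brightness, edge density).
items = {
    "img-01": (0.9, 0.1),
    "img-02": (0.8, 0.2),
    "img-03": (0.1, 0.9),
    "img-04": (0.2, 0.8),
}
points = list(items.values())
labels = kmeans(points, initial_centroids=[points[0], points[2]])

# Step 2: an annotator inspects each cluster and names it once.
cluster_keywords = {0: "daylight", 1: "night"}
annotations = {name: cluster_keywords[label]
               for name, label in zip(items, labels)}
```

The point of the design is that the annotator labels two clusters instead of four images; with a large collection, the saving in manual effort grows with the number of items per cluster.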
