READIT-2007 - Indira Gandhi Centre for Atomic Research
7. A private or public network.<br />
2. DIGITIZATION AND DATA MINING<br />
Digitization refers to the conversion of an item, be it printed text, manuscript, image,<br />
sound, film or video recording, from one format (usually print or analogue) into digital form.<br />
The process essentially involves taking a physical object and making an ‘electronic<br />
photograph’ of it. An image of the physical object is captured using a scanner or digital<br />
camera and converted to a digital format that can be stored electronically and accessed via a<br />
computer 2 .<br />
Data and information are available in different formats. These formats<br />
include text, images, video, audio, pictures, maps, etc. Text<br />
information requires scanning the printed text with scanners and providing<br />
links to access it. For multimedia formats such as images, audio, pictures, maps and<br />
video, however, conversion and systematic presentation are not easy. Automatic search<br />
is also needed to make the material easily accessible. Easy search and effective, systematic<br />
presentation of the data are essential for multimedia information. For this purpose, libraries<br />
need to adopt data mining techniques. These techniques draw mainly<br />
on logic, multimedia and artificial intelligence.<br />
Data mining is the automatic extraction of patterns of information from historical<br />
data, enabling companies to focus on the most important aspects of their business by telling<br />
them what they did not know and had not even thought of asking 3 . One definition is that it “is<br />
the process of automating information discovery” 4 , which improves decision making and<br />
gives a company advantages in the market. Another definition is that it “is the exploration<br />
and analysis, by automatic or semiautomatic means, of large quantities of data in order to<br />
discover meaningful patterns and rules” 5 . Data mining is an applied discipline that grew<br />
out of statistical pattern recognition, machine learning and artificial intelligence, and is<br />
coupled with business decision making in order to optimize and enhance it. Initially, data mining<br />
techniques were applied to structured data from databases.<br />
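As a rough illustration of pattern extraction from structured historical data, the sketch below counts which pairs of subjects occur together in library loan records and keeps only the pairs that reach a minimum support. The loan records, the function name `frequent_pairs` and the threshold are hypothetical and chosen only for illustration; they are not taken from any of the cited works.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical loan records: subjects borrowed together by one reader.
loans = [
    ["physics", "mathematics", "chemistry"],
    ["physics", "mathematics"],
    ["history", "mathematics"],
    ["physics", "mathematics", "history"],
]

patterns = frequent_pairs(loans, min_support=3)
print(patterns)
```

With these records only the pair (mathematics, physics) reaches the threshold, which is the kind of pattern a library might use when arranging or recommending material.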
Recently, two branches of data mining, text data mining and Web data mining, have<br />
emerged 6&7 . They have their own research agendas, communities of researchers, and<br />
supporting companies that develop technologies and tools. Unfortunately, multimedia<br />
data mining is still at an early stage, and further development is needed for the effective<br />
presentation of multimedia information.<br />
There are four types of multimedia data: audio data, which includes sound, speech<br />
and music; image data (black-and-white and colour images); video data, which includes time-aligned<br />
sequences of images; and electronic or digital ink, which consists of sequences of time-aligned 2D<br />
or 3D coordinates of a stylus, a light pen, data glove sensors, or a similar device. All of this data<br />
is generated by specific kinds of sensors.<br />
The concept of mining in multimedia is also referred to as automatic annotation or<br />
annotation mining. There appear to be three main pattern discovery approaches that have<br />
been used for automatic annotation in multimedia data mining. These approaches differ primarily<br />
in how external knowledge is provided to mine concepts. The first approach<br />
involves assigning keywords to the data or classifying it. The second approach to automatic<br />
annotation is through clustering: multimedia documents are clustered first, and then<br />
the resulting clusters are assigned keywords by an annotator. The third approach does not rely<br />
on a manual annotator; it tries to mine concepts from contextual information.<br />
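The second approach (cluster first, annotate afterwards) can be sketched as follows. This is a minimal illustration under stated assumptions, not a method from the cited literature: each image is assumed to be already reduced to a two-number feature vector, the clustering is a tiny hand-written k-means, and the feature values, the starting centroids and the keywords in `cluster_keywords` are all hypothetical.

```python
from math import dist


def kmeans(points, initial_centroids, iterations=10):
    """Tiny k-means: return a cluster index for every point."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Hypothetical feature vectors, e.g. (brightness, colour saturation) per image.
features = [
    (0.10, 0.20), (0.15, 0.25), (0.12, 0.18),
    (0.80, 0.90), (0.85, 0.80), (0.90, 0.95),
]

# Step 1: cluster the documents without any keywords.
labels = kmeans(features, initial_centroids=[features[0], features[3]])

# Step 2: a human annotator inspects each cluster and names it once.
cluster_keywords = {0: "manuscript scan", 1: "colour photograph"}
annotations = [cluster_keywords[label] for label in labels]
print(annotations)
```

The annotator labels two clusters rather than six individual images, which is the saving this approach aims at for large collections.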