29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 1. Framework of the thesis

Figure 1.1: Evolution of the total number of websites across all Internet domains, from November 1995 to February 2009 (extracted from http://news.netcraft.com/archives/2009/02/index.html).

Returning to the WWW example, we have all witnessed over the last decade how web pages have evolved from static plain text to dynamic multimedia content. That is, the information available on the Web is, to a large extent, no longer restricted to a single modality (e.g. news in text format). On the contrary, data is increasingly becoming multimodal, i.e. a combination of several modalities (e.g. text news accompanied by photos, graphics, audio or video).

This shift towards data multimodality can be regarded as a change of paradigm which is also found in many other domains (Klosgen, Zytkow, and Zyt, 2002). For instance, meteorological information often combines satellite and radar imagery with meteorological data in numerical form (e.g. temperature, humidity, wind speed, rainfall, etc.). In medical contexts, repositories often contain data obtained from several diagnostic tests (e.g. blood analysis, radiography, electrocardiography, electroencephalography, functional magnetic resonance) whose results are represented under distinct modalities (nominal and numerical data, images, etc.).

To sum up, despite providing enormous quantities of information on a silver platter, the development and expansion of ICT pose a serious challenge to human analytic and understanding capabilities, not only because of the large volumes of data available, but also because of their growing complexity. Therefore, it seems logical to highlight the importance of developing automatic tools that allow knowledge extraction from large multimodal data repositories, regardless of their domain (Witten and Frank, 2005). The techniques supporting these tools belong to the fields of knowledge discovery and data mining (Klosgen, Zytkow, and Zyt, 2002), which constitute, in a broad sense, the frame of reference of this thesis.

When it comes to extracting knowledge from a given data collection, one of the primary tasks one thinks of is organization: clearly, arranging the contents of a data repository according to some meaningful structure helps to gain some perspective on it –in fact, orga-
