WWW/Internet - Portal do Software Público Brasileiro


IADIS International Conference WWW/Internet 2010

ORDINARY WEB PAGES AS A SOURCE FOR METADATA ACQUISITION FOR OPEN CORPUS USER MODELING

Michal Barla and Mária Bieliková
Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology
Ilkovičova 3, 842 16 Bratislava, Slovakia

ABSTRACT
Personalization and adaptivity of the Web as we know it today are often “closed” within a particular web-based system. As a result, there are only a few “personalized islands” within the whole Web. Spreading personalization to the whole Web, either via an enhanced proxy server or using an agent residing on the client side, raises the challenge of how to determine metadata within an open corpus Web domain that would allow for efficient creation of an overlay user model. In this paper we present our approach to metadata acquisition for open corpus user modeling applicable to the “wild” Web, where we decided to take into account metadata in the form of keywords representing the visited web pages. We present the user modeling process (which is thus keyword-based) built on top of an enhanced proxy server capable of personalizing user browsing sessions via pluggable modules. The paper focuses on a comparison of algorithms and third-party services that allow for extraction of the required keywords from ordinary web pages, which is a crucial step of our user modeling approach.

KEYWORDS
Keyword extraction, user modeling, proxy.

1. INTRODUCTION
The field of adaptive web-based systems is nowadays well established and fairly mature (Brusilovsky, et al., 2007). However, most of the proposed approaches concentrate on personalization of one particular system, either by incorporating user modeling and adaptive features into a web application or by creating an adaptive layer on top of an existing application. As a result, there are only a few “personalized islands” within the whole Web, where the majority of content and information services are provided using a failing “one-size-fits-all” paradigm.

The reason why the majority of adaptive approaches are built on top of a closed corpus domain is that every adaptive system must track the user's attitudes (such as knowledge or interest) towards domain elements, often realized in the form of an overlay user model. A closed corpus domain can provide a detailed, often manually prepared and non-changing conceptualization, which is easily used for user modeling purposes. In the case of an open corpus or a vast and dynamic domain, we cannot track the user's relations to all documents or other pieces of information that exist within the domain. The solution is to incorporate and use external models that exist beyond the hyperspace of interlinked documents and provide a meaningful abstraction of the domain (Brusilovsky & Henze, 2007), i.e., to provide a metadata model and a mapping between domain items and this metadata model, which also serves as a bottom layer for an overlay user model. An example of such an approach, which maps the user's interests to a set of metadata, can be seen in (Barla, et al., 2009).

When considering the whole Web as our domain of interest, it seems that we can leverage the Semantic Web as a source of both metadata and the mapping inherently present within it. However, when considering findings from (Sabou, et al., 2008; Fernandez, et al., 2008) that
• existing semantic systems are restricted to a limited set of domains: they use a set of a priori defined ontologies covering one specific domain without proper linking to other ontologies,
• the overall Semantic Web does not adequately cover specific terminology and
