MEANING: a Roadmap to Knowledge Technologies ... - CiteSeerX


present in the web as the others. Using this approach it is possible to balance both gaps. Although the technology to provide compatibility across wordnets exists (Daudé et al., 1999, 2000, 2001), new research is needed for porting and uploading the various types of knowledge across languages, and new ways are needed to test the validity of the ported knowledge in the target languages.

3. The MEANING Roadmap

The improvements mentioned above have been explored separately with relative success. In fact, no research group in isolation has tried to combine all the aforementioned factors. We designed the MEANING project [3] convinced that only a combination of all relevant knowledge and resources will be able to produce significant advances in this crucial research area.

MEANING will treat the web as a (huge) corpus to learn information from, since even the largest conventional corpora available (e.g. the Reuters corpus, the British National Corpus) are not large enough to acquire reliable information in sufficient detail about language behaviour. Moreover, most languages do not have large or diverse enough corpora available.

MEANING proposes an innovative bootstrapping process to deal with the inter-dependency between WSD and knowledge acquisition:

1. Train accurate WSD systems and apply them to very large corpora by coupling knowledge-based techniques on the existing EuroWordNet (e.g. to populate it with domain labels, to automatically induce training examples) with ML techniques that combine very large amounts of labeled and unlabeled data. When ready, use also the knowledge acquired in 2.

2. Use the obtained accurate WSD data in conjunction with shallow parsing techniques and domain tagging to extract new linguistic knowledge to incorporate into EuroWordNet.

This method will be able to break this interdependency in a series of cycles thanks to the fact that the WSD system will be based on all domain information, sophisticated linguistic knowledge, large numbers of automatically tagged examples from the web, and a combination of annotated and unannotated data. The first WSD system will have weaker linguistic knowledge, but the combination of the remaining factors alone will produce significant performance gains. Besides, some of the required linguistic knowledge can be acquired from unannotated data, and can therefore be acquired without using any WSD system. Once acceptable WSD is available, the acquired knowledge will be of a higher quality, and will allow for better WSD performance.

Multilingualism will also be helpful for MEANING. The idiosyncratic way meaning is realised in a particular language will be captured and ported to the rest of the languages involved in the project [4] using EuroWordNet as a Multilingual Central Repository in three consecutive phases (see figure 1).

[3] Started in March 2002, MEANING IST-2001-34460 "Developing Multilingual Web-scale Language Technologies" is a three-year research project funded by the EC.

[4] MEANING will work with three major European languages (English, Spanish and Italian) and two minority languages (Catalan and Basque).
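The two-step bootstrapping cycle of Section 3 can be illustrated with a deliberately small sketch. It is not the MEANING system: the overlap-based tagger, the frequency threshold, the sense labels and the toy corpus are all assumptions made for illustration. Step 1 (WSD) tags only the contexts it can decide with the current knowledge; step 2 (acquisition) harvests new cue words from those tagged contexts, so that the next cycle can tag more.

```python
from collections import Counter, defaultdict


def disambiguate(context, knowledge):
    """Step 1 (WSD): return the sense whose cue words overlap most with the
    context, or None when there is no overlap or the best score is tied."""
    words = set(context)
    scores = {sense: len(cues & words) for sense, cues in knowledge.items()}
    best = max(scores.values(), default=0)
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def acquire(tagged, min_count=2):
    """Step 2 (acquisition): for each sense, keep the context words that occur
    in at least `min_count` contexts tagged with that sense."""
    counts = defaultdict(Counter)
    for context, sense in tagged:
        if sense is not None:
            counts[sense].update(set(context))
    return {sense: {w for w, n in c.items() if n >= min_count}
            for sense, c in counts.items()}


def bootstrap(corpus, seed_knowledge, cycles=3):
    """Alternate WSD and acquisition for a fixed number of cycles."""
    knowledge = {sense: set(cues) for sense, cues in seed_knowledge.items()}
    tags = [None] * len(corpus)
    for _ in range(cycles):
        tags = [disambiguate(context, knowledge) for context in corpus]
        for sense, cues in acquire(zip(corpus, tags)).items():
            knowledge[sense] |= cues
    return knowledge, tags


# Hypothetical toy data: contexts of the word "bank" (target word removed)
# and one seed cue per sense.
CORPUS = [
    ["money", "loan"],
    ["money", "loan", "interest"],
    ["water", "shore"],
    ["water", "shore", "fishing"],
    ["loan", "interest"],
    ["shore", "fishing"],
    ["interest", "rate"],
]
SEED = {"finance": {"money"}, "river": {"water"}}

if __name__ == "__main__":
    knowledge, tags = bootstrap(CORPUS, SEED, cycles=3)
    for context, sense in zip(CORPUS, tags):
        print(sense, context)
```

With only the seed cues, the first cycle leaves the last three contexts untagged; each later cycle reaches further because the previous acquisition step has enlarged the knowledge, which is the interdependency the text describes being broken "in a series of cycles".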

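The idea of uploading knowledge acquired in one language to a Multilingual Central Repository and porting it to the other languages can be sketched as follows. This is an illustrative assumption, not the project's data model: the `CentralRepository` class, the concept identifiers (`c1`, `c2`, `c3`), the `selects_object` relation and the miniature wordnets are all hypothetical. The only idea taken from the text is that local wordnets are connected through a shared, language-neutral layer.

```python
from collections import defaultdict


class CentralRepository:
    """Language-neutral store of relations between shared concept ids."""

    def __init__(self):
        # concept id -> set of (relation, concept id)
        self.facts = defaultdict(set)

    def upload(self, wordnet, local_facts):
        """Store facts acquired in one language under shared concept ids.

        `wordnet` maps a local synset label to a concept id; facts that
        mention an unmapped synset are skipped."""
        for synset, relation, other in local_facts:
            if synset in wordnet and other in wordnet:
                self.facts[wordnet[synset]].add((relation, wordnet[other]))

    def port(self, wordnet):
        """Express the stored facts with the synsets of a target wordnet,
        skipping facts whose concepts the target wordnet lacks."""
        by_concept = {concept: synset for synset, concept in wordnet.items()}
        ported = set()
        for concept, relations in self.facts.items():
            for relation, other in relations:
                if concept in by_concept and other in by_concept:
                    ported.add((by_concept[concept], relation, by_concept[other]))
        return ported


# Hypothetical miniature wordnets: local synset label -> shared concept id.
ENGLISH = {"drink.v": "c1", "liquid.n": "c2", "bank.n": "c3"}
SPANISH = {"beber": "c1", "líquido": "c2"}
CATALAN = {"beure": "c1"}  # has no synset for concept c2

if __name__ == "__main__":
    repo = CentralRepository()
    # A selectional preference acquired from English data (assumed example).
    repo.upload(ENGLISH, [("drink.v", "selects_object", "liquid.n")])
    print(repo.port(SPANISH))
    print(repo.port(CATALAN))
```

The sketch ports a fact only when both concepts exist in the target wordnet and performs no further checking; validating ported knowledge in the target language is left open here, as it is in the text.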

[Figure 1: MEANING data flow. Diagram labels: English, Italian, Spanish, Basque and Catalan Web Corpus; English, Italian, Spanish, Basque and Catalan EWN; Multilingual Central Repository; processes WSD, ACQ, UPLOAD and PORT.]

4. Conclusions

While the acquisition of knowledge from large-scale document collections will be one of the major challenges for the next generation of text processing applications, MEANING emphasises multilingual content-based access to web content. Moreover, it can provide a keystone enabling technology for the semantic web. In particular, the Multilingual Central Repository produced by MEANING is going to constitute the natural knowledge resource for a number of semantic processes that need large amounts of linguistic data to be effective tools (e.g. web ontologies). NLP tools and software of the next generation will benefit from the MEANING outcomes.

Current web access applications are based on words; MEANING will open the way for access to the multilingual web based on concepts, providing applications with capabilities that significantly exceed those currently available. MEANING will facilitate development of concept-based open domain Internet applications (such as Question/Answering, Cross Lingual Information Retrieval, Summarisation, Text Categorisation, Event Tracking, Information Extraction, Machine Translation, etc.). Furthermore, MEANING will supply a common conceptual structure to Internet documents, thus facilitating knowledge management of web content. This common conceptual structure is a decisive enabling technology for allowing the semantic web.

References

Agirre E. and Martínez D. Exploring automatic word sense disambiguation with decision lists and the Web. Proceedings of the Workshop "Semantic Annotation And Intelligent Annotation" organized by COLING 2000. Luxembourg. 2000.

Agirre E. and Martínez D. Learning class-to-class selectional preferences. Proceedings of the Workshop "Computational Natural Language Learning" (CoNLL-2001). In conjunction with ACL'2001/EACL'2001. Toulouse. 2001.

Agirre E., Ansa O., Martínez D. and Hovy E. Enriching WordNet concepts with topic signatures. Proceedings of the NAACL workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations. Pittsburgh. 2001.

Agirre E. and Martínez D. Integrating selectional preferences in WordNet. Proceedings of the first International WordNet Conference. Mysore, India. 2002.

Blum A. and Mitchell T. Combining labelled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory. 1998.

Carroll J. and McCarthy D. Word sense disambiguation using automatically acquired verbal preferences. Computers and the Humanities. Senseval Special Issue, Vol. 34, No 1-2. 2000.

Daudé J., Padró L. and Rigau G. Mapping Multilingual Hierarchies using Relaxation Labelling. Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC'99). Maryland. 1999.

Daudé J., Padró L. and Rigau G. Mapping WordNets Using Structural Information. 38th Annual Meeting of the ACL. Hong Kong. 2000.

Daudé J., Padró L. and Rigau G. A Complete WN1.5 to WN1.6 Mapping. Proceedings of NAACL Workshop "WordNet and Other Lexical Resources: Applications, Extensions and Customizations". Pittsburgh, PA. 2001.

Escudero G., Màrquez L. and Rigau G. Boosting Applied to Word Sense Disambiguation. Proceedings of the 11th European Conference on Machine Learning. Barcelona. 2000.

Escudero G., Màrquez L. and Rigau G. Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited. Proceedings of the 14th European Conference on Artificial Intelligence. Berlin. 2000.

Escudero G., Màrquez L. and Rigau G. A Comparison between Supervised Learning Algorithms for Word Sense Disambiguation. Proceedings of Fourth Computational Natural Language Learning Workshop. Lisbon. 2000.

Escudero G., Màrquez L. and Rigau G. An Empirical Study of the Domain Dependence of Supervised Word Sense Disambiguation Systems. Proceedings of Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Hong Kong. 2000.

Escudero G., Màrquez L. and Rigau G. Using LazyBoosting for Word Sense Disambiguation. Proceedings of 2nd International Workshop "Evaluating Word Sense Disambiguation Systems", SENSEVAL-2. Toulouse. 2001.

Fellbaum C. editor. WordNet: An Electronic Lexical Database. The MIT Press. 1998.

Ide N. and Véronis J. Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 24(1). 1998.

Korhonen A., Gorrell G. and McCarthy D. Statistical Filtering and Subcategorization Frame Acquisition. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Hong Kong. 2000.

Leacock C., Chodorow M. and Miller G.A. Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics, 24(1). 1998.

Magnini B. and Cavaglià G. Integrating subject field codes into WordNet. In Proceedings of the 2nd International Conference on Language Resources and Evaluation. Athens. 2000.

Martínez D. and Agirre E. One Sense per Collocation and Genre/Topic Variations. Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Hong Kong. 2000.

McCarthy D. and Korhonen A. Detecting verbal participation in diathesis alternations. Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics COLING-ACL'98. Montreal. 1998.

McCarthy D. Lexical Acquisition at the Syntax-Semantics Interface: Diathesis Alternations, Subcategorization Frames and Selectional Preferences. Ph.D. thesis, University of Sussex. 2001.

McCarthy D., Carroll J. and Preiss J. Disambiguating noun and verb senses using automatically acquired selectional preferences. Proceedings of the SENSEVAL-2 Workshop at ACL/EACL'01. Toulouse. 2001.

Mihalcea R. and Moldovan D. An automatic method for generating sense tagged corpora. In Proceedings of American Association for Artificial Intelligence. 1999.

Miller G. Five papers on WordNet. Special Issue of International Journal of Lexicography 3(4). 1990.

Ng H. T. Getting Serious about Word Sense Disambiguation. In Proceedings of Workshop "Tagging Text with Lexical Semantics: Why, what and how?". Washington. 1997.

Vossen P. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht. 1998.

Yarowsky D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. 1995.
