A Relation Extraction Method between Related Concepts using Web Search - Information Processing Society of Japan ...

DEIM Forum 2010 C1-2

A Relation Extraction Method between Related Concepts using Web Search

Masumi SHIRAKAWA†, Kotaro NAKAYAMA††, Eiji ARAMAKI††, Takahiro HARA†, and Shojiro NISHIO†

† Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University
1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
†† The Center for Knowledge Structuring, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
E-mail: †{shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp, ††nakayama@cks.u-tokyo.ac.jp, †††eiji.aramaki@gmail.com

Abstract  Constructing a large-scale ontology that covers many named entities, domain-specific terms, and the relations among these concepts is one of the essential technologies for the next-generation, semantics-based Web. In this work, we aim at automated ontology construction with wide coverage of concepts and their relations by combining information on the Web with Wikipedia. In this paper, we propose a relation extraction method that obtains pairs of related concepts from an association thesaurus extracted from Wikipedia and extracts the relations between them by using Web search.

Key words  relation extraction, natural language processing (NLP), thesaurus, Wikipedia

1. [...]

[...] Web [...] [1], [2] [...] OpenCyc (footnote 1) [...] WordNet [3] [...] EDR (footnote 2) [...] is-a [...] a-part-of [...] Web [...] Wikipedia [...]

• [...]
• [...]

[...] Wikipedia [4], [5] [...] Web [...]

2. [...]

[...] Wikipedia [...] Web [...] Wiki [...] 300 [...] 60 [...] 2009 [...] 12 [...] Wikipedia [...] URL [...] [6] [...] DBpedia [7] [...] YAGO [8], [9] [...] DBpedia [...] Wikipedia [...] RDF (Resource Description Framework) [...] YAGO [...] WordNet [...] Wikipedia [...] [10], [11] [...] Wikipedia [...] is-a [...] SVM (Support Vector Machine) [...] Web [...] is-a [...] a-part-of [...] [12], [13] [...]

Fig. 1 Ontology reconstruction from Case Frames

Footnote 1: http://www.opencyc.org/
Footnote 2: http://www2.nict.go.jp/r/r312/EDR/J_index.html

[...] [14] [...] Web [...] [15] [...]

3. [...] Web [...]

3. 1 [...]

[...] Web [...] Wikipedia [4], [5] [...] iPhone and iPod touch: 0.016; iPhone and BlackBerry: 0.0053 [...] Wikipedia [...] Web [...]

Fig. 2 An example of Wikipedia Thesaurus

3. 2 [...] Web [...]

3. 2. 1 [...]

[...] Wikipedia [16] [...] c [...] L_c [...] l_i ∈ L_c [...] M_c(l_i) [...] l_i [...] R_c(l_i) [...]

Table 1 A list of link texts of “Softbank” and reliability R

Link text    M_c(l_i)    R_c(l_i)
[...]        371         1
[...]        17          0.479
SoftBank     4           0.234
[...]        4           0.234
[...]        3           0.186
SOFTBANK     3           0.186
Softbank     3           0.186
[...]        2           0.117
[...]        2           0.117
[...]        2           0.117
[...]        1           0
[...]        1           0
[...]        1           0
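The R values listed in Table 1 are consistent with a log ratio of each count to the largest count, R = ln(M) / ln(max M), which is the form of Equation (1). The following Python sketch checks this against the numbers in the table. The function name `reliability` and the choice to reject the degenerate case (a largest count of 1, where the denominator is zero) are assumptions made for illustration; the paper's own treatment of that case is not recoverable from this text.

```python
import math


def reliability(counts):
    """Return ln(M) / ln(max M) for each count M (log-ratio form of Eq. (1)).

    Assumption: every count is >= 1 and the largest count is > 1.
    """
    max_count = max(counts)
    if max_count <= 1:
        # ln(1) = 0 would make the ratio undefined; how the paper handles
        # this case is unknown, so this sketch simply refuses it.
        raise ValueError("largest count must be greater than 1")
    return [math.log(m) / math.log(max_count) for m in counts]


# Numeric columns of Table 1 (counts), in table order.
TABLE1_COUNTS = [371, 17, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1]

scores = reliability(TABLE1_COUNTS)
for count, score in zip(TABLE1_COUNTS, scores):
    print(f"{count:>4}  {score:.3f}")
```

Rounded to three significant digits, the computed values match the R column of Table 1 (1, 0.479, 0.234, 0.186, 0.117, 0).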

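Fig. 3 An example of query generation with case particles

Fig. 4 An example of relation extraction by analyzing responses

$$R_c(l_i) = \frac{\ln(M_c(l_i))}{\ln(\max_k M_c(l_k))}, \quad l_k \in L_c \tag{1}$$

[...] R_c(l_i) [...] l_i [...] c [...] 1 [...] M_c(l_i) [...] 1 [...] l_i [...] Wikipedia [...] URL [...] R_c [...] 3. 2. 3 [...] I [...] R_c [...]

3. 2. 2 [...] Web [...]

[...] Web [...] [14] [...] Web [...] AND [...] Web [...] AND [...] N [...] Web [...] H [...] H [...] 3. 2. 3 [...] I [...]

3. 2. 3 [...]

[...] N [...] Web [...] 3. 2. 1 [...] SoftBank [...] CaboCha [17] [...] MeCab [18] [...] v [...] c_1 [...] c_2 [...] l_i [...] l_j [...] p_1 [...] p_2 [...] F(p_1, p_2, v) [...]

$$F(p_1, p_2, v) = \frac{R_{c_1}(l_i) + R_{c_2}(l_j)}{2} \tag{2}$$

[...] F [...] l_i [...] l_j [...] 1 [...] c_1 [...] c_2 [...] 1 [...] 0.5 [...] I(p_1, p_2, v) [...]

$$I(p_1, p_2, v) = \max\left(\frac{H(p_1, p_2)}{N},\, 1\right) \cdot \sum F(p_1, p_2, v) \tag{3}$$

[...] H(p_1, p_2) [...] p_1 [...] p_2 [...] N [...] Web [...] (3) [...] Web [...] I [...]

4. [...] Web [...]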
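As a recap of the scoring in Section 3.2.3, the following Python sketch assumes that Equation (2) averages the reliabilities of the two link texts, F = (R_c1(l_i) + R_c2(l_j)) / 2, and that Equation (3) scales the sum of F by max(H / N, 1), where H is a hit count and N a number of analyzed Web pages. The function names, the reading of H and N, and the range of the summation (here, every matching occurrence passed in by the caller) are assumptions, since the surrounding explanation is not recoverable from this text.

```python
def pattern_score(r_c1, r_c2):
    """F in Eq. (2): mean of the two link-text reliabilities."""
    return (r_c1 + r_c2) / 2


def relation_importance(hits, n_pages, scores):
    """I in Eq. (3): max(H / N, 1) times the sum of F values.

    hits    -- H, assumed to be the search hit count for the query
    n_pages -- N, assumed to be the number of pages actually analyzed
    scores  -- F values of the occurrences found in those pages
    """
    if n_pages <= 0:
        raise ValueError("n_pages must be positive")
    return max(hits / n_pages, 1) * sum(scores)


# Hypothetical example: two occurrences, using reliabilities from Table 1.
f_values = [pattern_score(1.0, 0.234), pattern_score(1.0, 1.0)]
print(relation_importance(hits=200, n_pages=50, scores=f_values))
```

Under this reading, a query with more hits than analyzed pages has its summed score scaled up by H / N, while a query with fewer hits is left unscaled.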

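4. 1 [...]

[...] Wikipedia [...] 306 [...] 20 [...] Web [...] 5 [...] Web [...] 50 [...] 250 [...] 5 [...] (H) [...] (A) [...] (B) [...] Web [...] [14] [...]

1 [...] (CFR) [...]
2 Mean Reciprocal Rank (MRR) [...] 0 [...]
3 Discounted Cumulative Gain (DCG) [...] Järvelin [...] [19] [...]

$$\mathrm{dcg}(i) = \begin{cases} g(1) & \text{if } i = 1 \\ \mathrm{dcg}(i-1) + \dfrac{g(i)}{\log_c(i)} & \text{otherwise} \end{cases} \tag{4}$$

$$g(i) = \begin{cases} h & \text{if } r(i) \in H \\ a & \text{if } r(i) \in A \\ b & \text{if } r(i) \in B \end{cases} \tag{5}$$

Here r(i) is the result at rank i, g(i) is the gain of r(i), and dcg(i) is the DCG value at rank i. Following [14], [20], (h, a, b) = (3, 2, 0) and c = 2. [...] 0 [...] CFR [...] MRR [...] DCG [...] 5 [...] 3 [...]

4. 2 [...]

[...] Table 2 [...] Table 3 [...] Table 4 [...]

Table 2 Evaluation (all concept pairs)

Metric        [...]     [...]
CFR [...]     0.141     0.258
CFR [...]     0.0817    0.160
MRR [...]     0.145     0.272
MRR [...]     0.0866    0.181
DCG5          0.547     1.27
DCG3          0.519     1.14

Table 3 Evaluation (concept pairs between which relations were extracted)

Metric        [...] (49)    [...] (101)
CFR [...]     0.878         0.782
CFR [...]     0.510         0.485
MRR [...]     0.908         0.825
MRR [...]     0.541         0.549
DCG5          3.42          3.86
DCG3          3.24          3.44

Table 4 Analysis data in the experiment

Item          [...]     [...]
Web [...]     72473     37019
[...]         1650      11197
[...]         140       953
[...]         0.0848    0.0851
[...]         49        101
[...]         93        263
[...]         0.304     0.859
[...]         1.90      2.60

[...] CFR [...] 306 [...] 2 [...] 1 [...] 2 [...] 2 [...] MRR [...] CFR [...] 2 [...] CFR [...] MRR [...] MRR [...]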
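The MRR and DCG metrics of Section 4.1 can be sketched in Python as follows, using the grades H, A, B with gains (h, a, b) = (3, 2, 0) and log base c = 2 as in Equations (4) and (5). Which grades count as correct for MRR is not recoverable from this text, so the `correct` parameter (defaulting to H only) is an assumption, as are all function names. A ranking with no correct result contributes 0 to MRR.

```python
import math

# Gains for the three grades, (h, a, b) = (3, 2, 0).
GAIN = {"H": 3, "A": 2, "B": 0}


def dcg(grades, base=2):
    """DCG per Eq. (4): no discount at rank 1, g(i) / log_c(i) afterwards."""
    total = 0.0
    for rank, grade in enumerate(grades, start=1):
        gain = GAIN[grade]
        total += gain if rank == 1 else gain / math.log(rank, base)
    return total


def reciprocal_rank(grades, correct=frozenset({"H"})):
    """1 / rank of the first correct result, or 0 if there is none."""
    for rank, grade in enumerate(grades, start=1):
        if grade in correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, correct=frozenset({"H"})):
    """Average reciprocal rank over all rankings (one per concept pair)."""
    return sum(reciprocal_rank(r, correct) for r in rankings) / len(rankings)


# Hypothetical graded rankings for three concept pairs.
rankings = [["H", "B", "A"], ["B", "H"], ["B", "B"]]
print(dcg(rankings[0][:3]))            # DCG3 of the first ranking
print(mean_reciprocal_rank(rankings))  # MRR with H as the only correct grade
```

Truncating a ranking before calling `dcg` (for example `grades[:5]` or `grades[:3]`) gives the DCG5 and DCG3 variants reported in the tables.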

Table 5 An example of extracted relations

Table 6 An example of extracted wrong relations

[Table rows: relation patterns of the form "A [...] B [...]"; surviving concept names include Berryz [...], NEWS, ZARD, and SMAP.]

[...] 0 [...] 1.90 [...] 2.60 [...] DCG [...] 1 [...] 5 [...] 3 [...] DCG [...] Web [...] 50 [...] Web [...]

4. 3 [...]

[...] 5 [...] A [...] B [...] MBS [...] F [...] 2010 [...] 1 [...] SMAP [...] 2010 [...] 1 [...] NEWS [...] 2010 [...] 1 [...] 6 [...] MBS [...] 306 [...] 205 [...] is-a [...] a-part-of [...]

[...] is-a [...] a-part-of [...] 8 [...]

5. [...]

[...] Wikipedia [...] Web [...] Wikipedia [4], [5] [...] Web [...]

[...] C (20500093) [...] B (21300032) [...] CORE [...]

[1] [...], “[...],” [...], vol.20, no.6, pp.707–714, Nov. 2005.
[2] [...], “[...],” [...], vol.19, no.2, pp.172–186, Mar. 2004.
[3] G.A. Miller, “WordNet: A Lexical Database for English,” Communications of the ACM (CACM), vol.38, no.11, pp.39–41, Nov. 1995.
[4] [...], “Wikipedia [...],” [...], vol.47, no.10, pp.2917–2928, Oct. 2006.
[5] K. Nakayama, T. Hara, and S. Nishio, “Wikipedia Mining for An Association Web Thesaurus Construction,” Proceedings of International Conference on Web Information Systems Engineering (WISE), pp.322–334, Dec. 2007.
[6] [...], “[...]: Wikipedia [...],” [...], vol.22, no.5, pp.693–701, Sept. 2007.
[7] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z.G. Ives, “DBpedia: A Nucleus for a Web of Open Data,” Proceedings of International Semantic Web Conference, Asian Semantic Web Conference (ISWC/ASWC), pp.722–735, Nov. 2007.
[8] F.M. Suchanek, G. Kasneci, and G. Weikum, “YAGO: A Core of Semantic Knowledge,” Proceedings of International Conference on World Wide Web (WWW), pp.697–706, May 2007.
[9] F.M. Suchanek, G. Kasneci, and G. Weikum, “YAGO: A Large Ontology from Wikipedia and WordNet,” Journal of Web Semantics, vol.6, no.3, pp.203–217, Sept. 2008.
[10] [...], “Wikipedia [...],” [...] 14 [...], pp.769–772, Mar. 2008.
[11] [...], “[...] Wikipedia [...],” [...] 18 [...] Web [...], SIG-SWO-A801-06, July 2008.
[12] [...], “[...],” [...], vol.12, no.2, pp.109–131, Mar. 2005.
[13] D. Kawahara, and S. Kurohashi, “Case Frame Compilation from the Web using High-Performance Computing,” Proceedings of International Conference on Language Resources and Evaluation (LREC), May 2006.
[14] [...], “[...],” [...] D, vol.J91-D, no.3, pp.619–627, Mar. 2008.
[15] [...], “Web [...],” [...], vol.47, no.SIG19(TOD32), pp.98–112, Dec. 2006.
[16] K. Nakayama, T. Hara, and S. Nishio, “A Thesaurus Construction Method from Large Scale Web Dictionaries,” Proceedings of IEEE International Conference on Advanced Information Networking and Applications (AINA), pp.932–939, May 2007.
[17] T. Kudo, and Y. Matsumoto, “Fast Methods for Kernel-Based Text Analysis,” Proceedings of Meeting on Association for Computational Linguistics (ACL), pp.24–31, July 2003.
[18] T. Kudo, K. Yamamoto, and Y. Matsumoto, “Applying Conditional Random Fields to Japanese Morphological Analysis,” Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.230–237, July 2004.
[19] K. Järvelin, and J. Kekäläinen, “IR Evaluation Methods for Retrieving Highly Relevant Documents,” Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp.41–48, July 2000.
[20] K. Eguchi, “Overview of the Topical Classification Task at NTCIR-4 WEB,” Working Notes of the 4th NTCIR Meeting, Supplement, vol.1, pp.48–55, June 2004.
