複数語句から構成されるコンテキストを考慮した連想関係の抽出 ...

DEIM Forum 2011 F3-1 † †† † †† 565-0871 1-5†† 113-8656 7-3-1E-mail: †{shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp, ††nakayama@cks.u-tokyo.ac.jpWikipedia Wikipedia , , Extraction of Association Relations from Contexts Consisting ofMulti-wordsMasumi SHIRAKAWA † , Kotaro NAKAYAMA †† , Takahiro HARA † , and Shojiro NISHIO †† Department of Multimedia Engineering, Graduate School of Information Science and Technology,Osaka University1-5 Yamadaoka, Suita, Osaka 565-0871, Japan†† The Center for Knowledge Structuring, The University of Tokyo7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, JapanE-mail: †{shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp, ††nakayama@cks.u-tokyo.ac.jp1. Web 1 Web 610 2007 8 comScore 1 qSerach 2.0Web Web Web [1] [2] [3]Web [4] [5], [6] Wikipedia 2 [7], [8]Wikipedia Wikipedia [3]1http://www.comscore.com/2http://dev.sigwp.org/WikipediaThesaurusV3/

Wikipedia Wikipedia (Wikipedia Sets) (1) (2) (3) Wikipedia2. [5], [6] [5], [6], [9], [10] Web [11] [5], [6] [10] [9] Web Web Chen [11] Web Wikipedia [7], [8](semantic relatedness) (semantic similarity) [12]Google Sets 3 Bayesian Sets [13]SEAL [14] 4 [6] [6] Preferred Infrastructure reflexa 5 ESA [15] Wikipedia Wikipedia 6 TermCloud [16] Web Web 3. 3. 1 Wikipedia Wikipedia Wikipedia [7], [8] WikipediaWiki Web Web 350 70 2011 1 WikipediaURL [17] Wikipedia Wikipedia Wikipedia Web 2011 1 Wikipedia 3 8000 1 [5] Web [11] [7]Wikipedia 1 WikipediaWikipedia 3http://labs.google.com/sets4http://www.boowa.com/5http://labs.preferred.jp/reflexa/6

1Table 1Input words and their contexts for evaluation - - - - - - - - - UNIX - - iPodiPhoneMac OS XGoogle Sets reflexa Google Sets reflexa ESA [15] Wikipedia Wikipedia 3 2 1 10 1 5 1 10 30 [19] 0 4 5 5 5 nDCG p p > 1nDCG p [20]nDCG p = DCG pIDCG p(2)DCG p = R 1 +IDCG p = 4 +p∑i=2p∑i=2R ilog 2 i4log 2 iIDCG p p (5 4) DCG p p = 10, 20, 30 (3)(4) nDCG p 5. 2 nDCG 2 Wikipedia Sets With BSWithout BS (p) 0 nDCG p 2 Wikipedia Sets reflexa reflexa Wikipedia SEAL Google Sets 2 p = 30

2 (nDCG p )Table 2 Evaluation results (nDCG p )TOP 10 (p = 10)Wikipedia SetsreflexaWith BS Without BS (ESA [15] based)SEAL [14] Google Sets - 0.860 0.795 0.745 0.574 0.769 - 0.580 0.651 0.607 (0.543) 0.529 - 0.638 0.453 0.322 0.574 0.262 - 0.713 0.701 0.547 0.254 0.611 - - 0.284 0.292 0.170 0.262 (0.197) - 0.654 0.666 0.574 0.455 0.590 - 0.456 0.500 0.477 0.626 0.656 - UNIX 0.605 0.625 0.516 - 0.604 - 0.748 0.766 0.581 0.783 0.725 - 0.646 0.599 0.617 0.410 0.679 0.618 0.605 0.516 0.492 0.603TOP 20 (p = 20)Wikipedia SetsreflexaWith BS Without BS (ESA [15] based)SEAL [14] Google Sets - 0.776 0.756 0.648 0.593 0.666 - 0.586 0.617 0.590 - 0.494 - 0.587 0.431 0.317 0.514 0.254 - 0.652 0.629 0.461 0.221 (0.485) - - 0.249 0.243 0.152 0.293 - - 0.641 0.651 0.547 0.410 0.508 - 0.421 0.452 0.457 0.535 0.573 - UNIX 0.587 0.585 0.476 - 0.560 - 0.728 0.742 0.576 0.686 0.647 - 0.621 0.561 0.571 0.434 0.678 0.585 0.567 0.479 0.461 0.548TOP 30 (p = 30)Wikipedia SetsreflexaWith BS Without BS (ESA [15] based)SEAL [14] Google Sets - 0.730 0.656 (0.532) 0.487 0.657 - 0.571 0.576 0.537 - 0.482 - 0.529 0.380 0.273 0.471 0.225 - 0.626 0.604 0.429 0.208 - - - 0.232 0.237 0.125 0.296 - - 0.598 0.574 0.501 0.388 0.493 - 0.416 0.410 0.442 0.463 0.558 - UNIX 0.550 0.557 0.475 - 0.507 - 0.692 0.703 0.561 0.635 0.608 - 0.569 0.547 0.569 0.415 0.689 0.551 0.524 0.435 0.420 0.527 [21]5. 3 15 3 3 WikipediaSets 1

3 Table 3 Comparison of outputs: - Wikipedia SetsreflexaWith BS Without BS (ESA [15] based)SEAL [14]Google Sets : - Wikipedia SetsreflexaWith BS Without BS (ESA [15] based)SEAL [14]Google Sets navi - - - - - lead - SEAL Google Setsreflexa 6. SEAL Google Sets

Wikipedia Wikipedia 160 Wikipedia Web C(20500093) B(21300032)[1] P.A. Chirita, C.S. Firan, and W. Nejdl, “Personalized QueryExpansion for the Web,” Proceedings of International ACMSIGIR Conference on Research and Development in InformationRetrieval (SIGIR), pp.7–14, July 2007.[2] R. Mandala, T. Tokunaga, and H. Tanaka, “Query Expansionusing Heterogeneous Thesauri,” International Journalof Information Processing and Management, vol.36, no.3,pp.361–378, May 2000.[3] R. Kraft, F. Maghoul, and C.C. Chang, “Y!Q: ContextualSearch at the Point of Inspiration,” Proceedings of InternationalConference on Information and Knowledge Management(CIKM), pp.816–823, Oct./Nov. 2005.[4] D. Shen, Z. Chen, Q. Yang, H.J. Zeng, B. Zhang, Y. Lu,and W.Y. Ma, “Web-page Classification through Summarization,”Proceedings of International ACM SIGIR Conferenceon Research and Development in Information Retrieval(SIGIR), pp.242–249, July 2004.[5] H. Schütze, and J.O. Pedersen, “A Cooccurrence-basedThesaurus and Two Applications to Information Retrieval,”International Journal of Information Processing and Management,vol.33, no.3, pp.307–318, May 1997.[6] Y. Jing, and W.B. Croft, “An Association Thesaurusfor Information Retrieval,” Proceedings of Recherched’Information Assistée par Ordinateur Conference (RIAO),pp.146–160, Oct. 1994.[7] “Wikipedia ” vol.47no.10pp.2917–2928Oct. 2006.[8] K. Nakayama, T. Hara, and S. Nishio, “Wikipedia Miningfor An Association Web Thesaurus Construction,” Proceedingsof International Conference on Web Information SystemsEngineering (WISE), pp.322–334, Dec. 2007.[9] H. Chen, T. Yim, D. Fye, and B. Schatz, “Automatic ThesaurusGeneration for an Electronic Community System,”Journal of the American Society for Information Science,vol.46, no.3, pp.175–193, Apr. 1995.[10] C.J. Crouch, “A Cluster-based Approach to Thesaurus Construction,”Proceedings of International ACM SIGIR Conferenceon Research and Development in Information Retrieval(SIGIR), pp.309–320, June 1988.[11] Z. Chen, S. Liu, L. Wenyin, G. Pu, and W.Y. Ma, “Buildinga Web Thesaurus from Web Link Structure,” Proceedingsof International ACM SIGIR Conference on Research andDevelopment in Information Retrieval (SIGIR), pp.48–55,July/Aug. 2003.[12] A. Budanitsky, and G. Hirst, “Semantic Distance in Word-Net: An Experimental, Application-oriented Evaluationof Five Measures,” Proceedings of Workshop on WordNetand Other Lexical Resources, Second meeting of the NorthAmerican Chapter of the Association for ComputationalLinguistics (NAACL), June 2001.[13] Z. Ghahramani, and K.A. Heller, “Bayesian Sets,” Proceedingsof Advances in Neural Information Processing Systems(NIPS), Dec. 2005.[14] R.C. Wang, and W.W. Cohen, “Language-Independent SetExpansion of Named Entities using the Web,” Proceedingsof International Conference on Data Mining (ICDM),pp.342–350, Oct. 2007.[15] E. Gabrilovich, and S. Markovitch, “Computing SemanticRelatedness using Wikipedia-based Explicit Semantic Analysis,”Proceedings of International Joint Conference on ArtificialIntelligence (IJCAI), pp.1606–1611, Jan. 2007.[16] T. Yamamoto, S. Nakamura, and K. Tanaka, “Term-Cloud for Enhancing Web Search,” Proceedings of InternationalConference on Web Information Systems Engineering(WISE), pp.159–166, Oct. 2009.[17] “: Wikipedia” vol.22no.5pp.693–701Sept. 2007.[18] K. Nakayama, T. Hara, and S. Nishio, “A Thesaurus ConstructionMethod from Large Scale Web Dictionaries,” Proceedingsof IEEE International Conference on Advanced InformationNetworking and Applications (AINA), pp.932–939, May 2007.[19] J. Gracia, and E. Mena, “Web-based Measure of SemanticRelatedness,” Proceedings of International Conferenceon Web Information Systems Engineering (WISE), pp.136–150, Sept. 2008.[20] K. Järvelin, and J. Kekäläinen, “Cumulated Gain-basedEvaluation of IR Techniques,” ACM Transactions on InformationSystems, vol.20, no.4, pp.422–446, Oct. 2002.[21] “Espresso :” vol.25no.2pp.233–242Jan. 2010.

複数語句から構成されるコンテキストを考慮した連想関係の抽出 ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?