
Extraction of Association Relations Considering Contexts Consisting of Multiple Words ...


DEIM Forum 2011 F3-1

Extraction of Association Relations from Contexts Consisting of Multi-words

Masumi SHIRAKAWA†, Kotaro NAKAYAMA††, Takahiro HARA†, and Shojiro NISHIO†

† Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University
1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
†† The Center for Knowledge Structuring, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
E-mail: †{shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp, ††nakayama@cks.u-tokyo.ac.jp

1.
Web 1 Web 610 2007 8 comScore (footnote 1) qSearch 2.0 Web Web Web [1] [2] [3] Web [4] [5], [6] Wikipedia (footnote 2) [7], [8] Wikipedia Wikipedia [3]

(footnote 1) http://www.comscore.com/
(footnote 2) http://dev.sigwp.org/WikipediaThesaurusV3/


Wikipedia Wikipedia (Wikipedia Sets) (1) (2) (3) Wikipedia

2.
[5], [6] [5], [6], [9], [10] Web [11] [5], [6] [10] [9] Web Web Chen [11] Web Wikipedia [7], [8] (semantic relatedness) (semantic similarity) [12] Google Sets (footnote 3) Bayesian Sets [13] SEAL [14] (footnote 4) [6] [6] Preferred Infrastructure reflexa (footnote 5) ESA [15] Wikipedia Wikipedia (footnote 6) TermCloud [16] Web Web

3.
3. 1
Wikipedia Wikipedia Wikipedia [7], [8] Wikipedia Wiki Web Web 350 70 2011 1 Wikipedia URL [17] Wikipedia Wikipedia Wikipedia Web 2011 1 Wikipedia 3 8000 1 [5] Web [11] [7] Wikipedia 1 Wikipedia Wikipedia

(footnote 3) http://labs.google.com/sets
(footnote 4) http://www.boowa.com/
(footnote 5) http://labs.preferred.jp/reflexa/
(footnote 6)


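Table 1 Input words and their contexts for evaluation
[Table entries not preserved, except: UNIX; iPod, iPhone, Mac OS X]

Google Sets reflexa Google Sets reflexa ESA [15] Wikipedia Wikipedia 3 2 1 10 1 5 1 10 30

Outputs are rated on a five-level scale from 0 to 4 [19]. For p > 1, nDCG_p [20] is defined as:

\begin{align}
\mathrm{nDCG}_p &= \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p} \tag{2} \\
\mathrm{DCG}_p &= R_1 + \sum_{i=2}^{p} \frac{R_i}{\log_2 i} \tag{3} \\
\mathrm{IDCG}_p &= 4 + \sum_{i=2}^{p} \frac{4}{\log_2 i} \tag{4}
\end{align}

Here R_i is the rating of the i-th output, and IDCG_p is the value of DCG_p when all of the top p outputs receive the highest rating, 4. The evaluation uses p = 10, 20, 30.

5. 2 nDCG
2 Wikipedia Sets With BS Without BS (p) 0 nDCG_p 2 Wikipedia Sets reflexa reflexa Wikipedia SEAL Google Sets 2 p = 30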
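The nDCG_p definition in Equations (2) to (4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation script: the function names, the `MAX_RATING` constant, and the choice to pad a ranking shorter than p with zero ratings are all assumptions made here.

```python
import math

MAX_RATING = 4  # highest grade on the 0-4 rating scale (Equation (4))


def dcg(ratings, p):
    """DCG_p = R_1 + sum_{i=2}^{p} R_i / log2(i)  (Equation (3))."""
    if p < 1:
        raise ValueError("p must be at least 1")
    top = list(ratings[:p])
    # Assumption: a system that returns fewer than p outputs is padded
    # with rating 0 for the missing positions.
    top += [0] * (p - len(top))
    # top[0] is R_1; the remaining ratings are discounted by log2 of their rank.
    return top[0] + sum(r / math.log2(i) for i, r in enumerate(top[1:], start=2))


def idcg(p):
    """IDCG_p: DCG_p when every one of the top p outputs is rated MAX_RATING."""
    return dcg([MAX_RATING] * p, p)


def ndcg(ratings, p):
    """nDCG_p = DCG_p / IDCG_p  (Equation (2))."""
    return dcg(ratings, p) / idcg(p)


if __name__ == "__main__":
    # Hypothetical ratings for one system's top-10 outputs.
    example = [4, 3, 4, 2, 0, 1, 3, 0, 2, 1]
    print(round(ndcg(example, 10), 3))
```

Because IDCG_p assumes the top rating at every rank rather than the best reordering of the observed ratings, a system scores 1.0 only when all p outputs are rated 4.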


Table 2 Evaluation results (nDCG_p)
(Row labels are the input words; "[...]" marks text not preserved.)

TOP 10 (p = 10)
Input                 Wikipedia Sets         reflexa            SEAL [14]   Google Sets
                      With BS   Without BS   (ESA [15] based)
[...] - [...]         0.860     0.795        0.745              0.574       0.769
[...] - [...]         0.580     0.651        0.607              (0.543)     0.529
[...] - [...]         0.638     0.453        0.322              0.574       0.262
[...] - [...]         0.713     0.701        0.547              0.254       0.611
[...] - [...] - [...] 0.284     0.292        0.170              0.262       (0.197)
[...] - [...]         0.654     0.666        0.574              0.455       0.590
[...] - [...]         0.456     0.500        0.477              0.626       0.656
[...] - UNIX          0.605     0.625        0.516              -           0.604
[...] - [...]         0.748     0.766        0.581              0.783       0.725
[...] - [...]         0.646     0.599        0.617              0.410       0.679
Average               0.618     0.605        0.516              0.492       0.603

TOP 20 (p = 20)
Input                 Wikipedia Sets         reflexa            SEAL [14]   Google Sets
                      With BS   Without BS   (ESA [15] based)
[...] - [...]         0.776     0.756        0.648              0.593       0.666
[...] - [...]         0.586     0.617        0.590              -           0.494
[...] - [...]         0.587     0.431        0.317              0.514       0.254
[...] - [...]         0.652     0.629        0.461              0.221       (0.485)
[...] - [...] - [...] 0.249     0.243        0.152              0.293       -
[...] - [...]         0.641     0.651        0.547              0.410       0.508
[...] - [...]         0.421     0.452        0.457              0.535       0.573
[...] - UNIX          0.587     0.585        0.476              -           0.560
[...] - [...]         0.728     0.742        0.576              0.686       0.647
[...] - [...]         0.621     0.561        0.571              0.434       0.678
Average               0.585     0.567        0.479              0.461       0.548

TOP 30 (p = 30)
Input                 Wikipedia Sets         reflexa            SEAL [14]   Google Sets
                      With BS   Without BS   (ESA [15] based)
[...] - [...]         0.730     0.656        (0.532)            0.487       0.657
[...] - [...]         0.571     0.576        0.537              -           0.482
[...] - [...]         0.529     0.380        0.273              0.471       0.225
[...] - [...]         0.626     0.604        0.429              0.208       -
[...] - [...] - [...] 0.232     0.237        0.125              0.296       -
[...] - [...]         0.598     0.574        0.501              0.388       0.493
[...] - [...]         0.416     0.410        0.442              0.463       0.558
[...] - UNIX          0.550     0.557        0.475              -           0.507
[...] - [...]         0.692     0.703        0.561              0.635       0.608
[...] - [...]         0.569     0.547        0.569              0.415       0.689
Average               0.551     0.524        0.435              0.420       0.527

[21]

5. 3
15 3 3 Wikipedia Sets 1


Table 3 Comparison of outputs
[Two sub-tables, each with columns Wikipedia Sets (With BS), Wikipedia Sets (Without BS), reflexa (ESA [15] based), SEAL [14], Google Sets. Entries not preserved, except: navi, lead]

SEAL Google Sets reflexa

6.
SEAL Google Sets


Wikipedia Wikipedia 160 Wikipedia Web

(C)(20500093) (B)(21300032)

[1] P.A. Chirita, C.S. Firan, and W. Nejdl, “Personalized Query Expansion for the Web,” Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp.7–14, July 2007.
[2] R. Mandala, T. Tokunaga, and H. Tanaka, “Query Expansion using Heterogeneous Thesauri,” International Journal of Information Processing and Management, vol.36, no.3, pp.361–378, May 2000.
[3] R. Kraft, F. Maghoul, and C.C. Chang, “Y!Q: Contextual Search at the Point of Inspiration,” Proceedings of International Conference on Information and Knowledge Management (CIKM), pp.816–823, Oct./Nov. 2005.
[4] D. Shen, Z. Chen, Q. Yang, H.J. Zeng, B. Zhang, Y. Lu, and W.Y. Ma, “Web-page Classification through Summarization,” Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp.242–249, July 2004.
[5] H. Schütze, and J.O. Pedersen, “A Cooccurrence-based Thesaurus and Two Applications to Information Retrieval,” International Journal of Information Processing and Management, vol.33, no.3, pp.307–318, May 1997.
[6] Y. Jing, and W.B. Croft, “An Association Thesaurus for Information Retrieval,” Proceedings of Recherche d’Information Assistée par Ordinateur Conference (RIAO), pp.146–160, Oct. 1994.
[7] [...] “Wikipedia [...],” [...] vol.47, no.10, pp.2917–2928, Oct. 2006.
[8] K. Nakayama, T. Hara, and S. Nishio, “Wikipedia Mining for An Association Web Thesaurus Construction,” Proceedings of International Conference on Web Information Systems Engineering (WISE), pp.322–334, Dec. 2007.
[9] H. Chen, T. Yim, D. Fye, and B. Schatz, “Automatic Thesaurus Generation for an Electronic Community System,” Journal of the American Society for Information Science, vol.46, no.3, pp.175–193, Apr. 1995.
[10] C.J. Crouch, “A Cluster-based Approach to Thesaurus Construction,” Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp.309–320, June 1988.
[11] Z. Chen, S. Liu, L. Wenyin, G. Pu, and W.Y. Ma, “Building a Web Thesaurus from Web Link Structure,” Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp.48–55, July/Aug. 2003.
[12] A. Budanitsky, and G. Hirst, “Semantic Distance in WordNet: An Experimental, Application-oriented Evaluation of Five Measures,” Proceedings of Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), June 2001.
[13] Z. Ghahramani, and K.A. Heller, “Bayesian Sets,” Proceedings of Advances in Neural Information Processing Systems (NIPS), Dec. 2005.
[14] R.C. Wang, and W.W. Cohen, “Language-Independent Set Expansion of Named Entities using the Web,” Proceedings of International Conference on Data Mining (ICDM), pp.342–350, Oct. 2007.
[15] E. Gabrilovich, and S. Markovitch, “Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis,” Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp.1606–1611, Jan. 2007.
[16] T. Yamamoto, S. Nakamura, and K. Tanaka, “TermCloud for Enhancing Web Search,” Proceedings of International Conference on Web Information Systems Engineering (WISE), pp.159–166, Oct. 2009.
[17] [...] “[...]: Wikipedia [...],” [...] vol.22, no.5, pp.693–701, Sept. 2007.
[18] K. Nakayama, T. Hara, and S. Nishio, “A Thesaurus Construction Method from Large Scale Web Dictionaries,” Proceedings of IEEE International Conference on Advanced Information Networking and Applications (AINA), pp.932–939, May 2007.
[19] J. Gracia, and E. Mena, “Web-based Measure of Semantic Relatedness,” Proceedings of International Conference on Web Information Systems Engineering (WISE), pp.136–150, Sept. 2008.
[20] K. Järvelin, and J. Kekäläinen, “Cumulated Gain-based Evaluation of IR Techniques,” ACM Transactions on Information Systems, vol.20, no.4, pp.422–446, Oct. 2002.
[21] [...] “Espresso [...]: [...],” [...] vol.25, no.2, pp.233–242, Jan. 2010.
