Vector Space Semantic Parsing: A Framework for Compositional ...

More documents

Recommendations

Info

ing LSA are depicted. Discussion As described above, we observed different values of correlations for different expression types. This motivates us to think about other classes of expressions different from types; Measures could be e.g. varyingly successful with regards to different occurrence frequency classes of expressions (Evert, 2005). However, with such small datasets, as shown e.g. by the fact that the majority of our results are statistically indistinguishable, we cannot carry out any deeper investigations. A large dataset would provide a more reliable comparison. Ideally, this would consist of all the candidate expressions occurring in some smaller corpus. Also, we would prefer the annotated dataset not to be biased towards non-compositional expressions and to be provided with an inner-annotator agreement (Pecina, 2008); which is unfortunately not the case of the DISCO dataset. 6 Conclusion Our study suggests that different WSMs combined with different Measures perform reasonably well in the task of determining the semantic compositionality of word expressions of different types. Especially, LSA and COALS perform well in our experiments since their results are better than those of their basic variants (VSM and HAL, respectively) and, although not statistically significantly, they outperform the best results of the previously proposed approaches (Table 4). Importantly, our results demonstrate (Section 5) that the datasets used for the experiments are small for: first, a statistical learning of optimal parameters of both WSM algorithms and Measures; second, a thorough (different types) and reliable (statistically significant) comparison of our and the previously proposed approaches. Therefore, we plan to build a larger manuallyannotated dataset. Finally, we plan to extract a list of semantically non-compositional expressions from a given corpus and experiment with using it in NLP applications. Acknowledgments We thank to Vít Suchomel for providing the ukWaC corpus and the anonymous reviewers for their helpful comments and suggestions. This work was supported by the European Regional Development Fund (ERDF), project NTIS – New Technologies for the Information Society, European Centre of Excellence, CZ.1.05/1.1.00/02.0090; by Advanced Computing and Information Systems (grant no. SGS- 2013-029); and by the Czech Science Foundation (grant no. P103/12/G084). Also, the access to the CERIT-SC computing facilities provided under the programme Center CERIT Scientific Cloud, part of the Operational Program Research and Development for Innovations, reg. no. CZ.1.05/3.2.00/08.0144 is highly appreciated. References Otavio Costa Acosta, Aline Villavicencio, and Viviane P. Moreira. 2011. Identification and treatment of multiword expressions applied to information retrieval. In Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, MWE ’11, pages 101–109, Stroudsburg, PA, USA. Timothy Baldwin, Colin Bannard, Takaaki Tanaka, and Dominic Widdows. 2003. An empirical model of multiword expression decomposability. Proceedings of the ACL 2003 workshop on Multiword expressions analysis acquisition and treatment, pages 89–96. Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Journal of Language Resources And Evaluation, 43(3):209–226. Chris Biemann and Eugenie Giesbrecht. 2011. Distributional semantics and compositionality 2011: shared task description and results. In Proceedings of the Workshop on Distributional Semantics and Compositionality, DiSCo ’11, pages 21–28. Marine Carpuat and Mona Diab. 2010. Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 242– 245, Stroudsburg, PA, USA. Tanmoy Chakraborty, Santanu Pal, Tapabrata Mondal, Tanik Saikh, and Sivaju Bandyopadhyay. 2011. Shared task system description: Measuring the compositionality of bigrams using statistical methodologies. In Proceedings of the Workshop on Distributional Semantics and Compositionality, pages 38– 42, Portland, Oregon, USA. Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 70
WSM Measure wAvg(of ρ) ρAN-VO-SV ρAN ρVO ρSV ρNN VSM 1 SU 1 0.31 0.11 -0.03 0.36 0.31 0.75 VSM 2 EN 1 0.36 0.32 0.41 0.30 0.10 0.61 VSM 3 CO 1 0.40 0.34 0.40 0.26 0.39 0.64 VSM 1 NE 1 0.34 0.26 0.20 0.48 0.07 0.60 LSA 1 SU 2 0.34 0.19 -0.05 0.46 0.42 0.71 LSA 2 EN 2 0.56 0.53 0.54 0.51 0.59 0.65 LSA 3 CO 1 0.55 0.53 0.49 0.56 0.63 0.58 LSA 2 NE 2 0.50 0.45 0.46 0.37 0.64 0.62 HAL 1 SU 3 0.45 0.36 0.28 0.50 0.40 0.67 HAL 2 EN 3 0.36 0.35 0.47 0.28 0.27 0.38 HAL 3 CO 1 0.23 0.15 0.28 0.12 -0.01 0.54 HAL 4 NE 3 0.27 0.25 0.31 0.21 0.17 0.39 COALS 1 SU 4 0.48 0.41 0.28 0.56 0.49 0.68 COALS 2 EN 2 0.58 0.54 0.6 0.63 0.37 0.68 COALS 2 CO 1 0.59 0.54 0.6 0.64 0.37 0.70 COALS 2 NE 4 0.58 0.56 0.61 0.58 0.46 0.67 RI 1 SU 5 0.52 0.44 0.45 0.51 0.52 0.68 RI 2 EN 3 0.45 0.44 0.41 0.57 0.33 0.45 RI 3 CO 1 0.21 0.13 0.13 0.16 0.11 0.54 RI 2 NE 5 0.43 0.43 0.43 0.53 0.21 0.49 Table 3: The Spearman correlations ρ of the best performing (wAvg) combinations of particular WSMs and Measures from all the tested Setups applied to TrValD. The highest correlation values in the particular columns and the correlation values which are not statistically different from them (p < 0.05) are in bold (yet we do not know how to calculate the stat. significance for the wAvg(of ρ) column). The parameters of WSMs and Measures corresponding to the indexes are depicted in Tables 5 and 6, respectively. 1990. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391–407. Stefan Evert. 2005. The statistics of word cooccurrences : word pairs and collocations. Ph.D. thesis, Universität Stuttgart, Holzgartenstr. 16, 70174 Stuttgart. Zellig Harris. 1954. Distributional structure. Word, 10(23):146–162. Anders Johannsen, Hector Martinez Alonso, Christian Rishøj, and Anders Søgaard. 2011. Shared task system description: frustratingly hard compositionality prediction. In Proceedings of the Workshop on Distributional Semantics and Compositionality, DiSCo ’11, pages 29–32, Stroudsburg, PA, USA. David Jurgens and Keith Stevens. 2010. The s- space package: an open source package for word space models. In Proceedings of the ACL 2010 System Demonstrations, ACLDemos ’10, pages 30–35, Stroudsburg, PA, USA. Lubomír Krčmář, Karel Ježek, and Massimo Poesio. 2012. Detection of semantic compositionality using semantic spaces. Lecture Notes in Computer Science, 7499 LNAI:353–361. Lubomír Krčmář, Karel Ježek, and Pavel Pecina. 2013. Determining compositionality of word expressions using word space models. In Proceedings of the 9th Workshop on Multiword Expressions, pages 42–50, Atlanta, Georgia, USA. Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240. Dekang Lin. 1999. Automatic identification of non-compositional phrases. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, ACL ’99, pages 317–324, Stroudsburg, PA, USA. Kevin Lund and Curt Burgess. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, 28(2):203–208. Diana McCarthy, Bill Keller, and John Carroll. 2003. Detecting a continuum of compositionality in phrasal verbs. In Proceedings of the ACL 2003 workshop on Multiword expressions analysis acquisition and treatment, volume 18 of MWE ’03, pages 73–80. 71
Page 1 and 2:
ACL 2013 51st Annual Meeting of the
Page 3:
Introduction In recent years, there
Page 7:
Table of Contents Vector Space Sema
Page 10 and 11:
Poster session (continued) 12:30 Lu
Page 12 and 13:
Input: Log. Form: “red ball”
Page 14 and 15:
the NP/N λx.x red N/N λx.A red x
Page 16 and 17:
dicted and target distributions, an
Page 18 and 19:
typed objects lie in the same vecto
Page 20 and 21:
Peter D. Turney. 2006. Similarity o
Page 22 and 23:
(1999), who suggested that the real
Page 24 and 25:
mation. In addition, using a new se
Page 26 and 27:
Figure 6: Positive rate, negative r
Page 28 and 29:
References Rens Bod, Remko Scha, an
Page 30 and 31: A Structured Distributional Semanti
Page 32 and 33: Figure 1: Sample sentences & triple
Page 34 and 35: el as its three axes. We explore th
Page 36 and 37: as well as our framing of word comp
Page 38 and 39: References Marco Baroni and Roberto
Page 40 and 41: Letter N-Gram-based Input Encoding
Page 42 and 43: Hidden my house my house Visibl
Page 44 and 45: … m my y h se us Hidden Visible m
Page 46 and 47: line of the systems presented in Ta
Page 48 and 49: References Yoshua Bengio, Réjean D
Page 50 and 51: Transducing Sentences to Syntactic
Page 52 and 53: t s “We booked the flight” →
Page 54 and 55: D 3(s) = ❀ PRP + ❀ V + DT ❀ +
Page 56 and 57: Output Model λ = 0 λ = 0.2 λ = 0
Page 58 and 59: References James A. Anderson. 1973.
Page 60 and 61: General estimation and evaluation o
Page 62 and 63: Guevara (2010) and Zanzotto et al.
Page 64 and 65: ⃗p = A u ⃗v where A u is a matr
Page 66 and 67: stituents. For this reason, we must
Page 68 and 69: References Hervé Abdi and Lynne Wi
Page 70 and 71: GOOD VERTICAL safe out raise level
Page 72 and 73: HLBL HLBL SENNA 4lang original scal
Page 74 and 75: Determining Compositionality of Wor
Page 76 and 77: ability of a word in the investigat
Page 78 and 79: terminer) wheel” and “reinvent
Page 82 and 83: WSM Measure wAvg(of ρ) ρAN-VO-SV
Page 84 and 85: “Not not bad” is not “bad”:
Page 86 and 87: activation functions in recursive n
Page 88 and 89: logic experiment in Socher et al. (
Page 90 and 91: tinction without the need to move t
Page 92 and 93: T. K. Landauer and S. T. Dumais. 19
Page 94 and 95: This differs from the classical Ran
Page 96 and 97: Hungarian Algorithm First cosine si
Page 98 and 99: Features: MSRpar MSRvid SMTeuroparl
Page 100 and 101: Joseph Reisinger and Raymond J. Moo
Page 102 and 103: phrase meaning in vector space. •
Page 104 and 105: By Bayes’ rule, p(x|a, n; Θ) ∝
Page 106 and 107: 4.2 Learning from distributional re
Page 108 and 109: W = (w 1 , w 2 , · · · , w n ) a
Page 110 and 111: Aggregating Continuous Word Embeddi
Page 112 and 113: trieval, Sivic and Zisserman propos
Page 114 and 115: Another advantage is that the propo
Page 116 and 117: e 50 100 200 300 400 500 CLEF 4.0 6
Page 118 and 119: Y. Bengio, J. Louradour, R. Collobe
Page 120 and 121: Answer Extraction by Recursive Pars
Page 122 and 123: Algorithm 1: Auto-encoders co-train
Page 124 and 125: NP NP ) , ADJP , NP ( NP - NP Engli
Page 126 and 127: System Short Answer Short Answer Sh
Page 128 and 129: References Y. Bengio and R. Ducharm
Page 130 and 131:
problem of paragraphs, of tence, mo
Page 132 and 133:
m1 k 4 only on the length of s and
Page 134 and 135:
Dialogue Act Label Example Train (%
Page 136 and 137:
[Calhoun et al.2010] Sasha Calhoun,
show all

Vector Space Semantic Parsing: A Framework for Compositional ...

Create successful ePaper yourself

Delete template?

Save as template?