Full proceedings volume - Australasian Language Technology ...

More documents

Recommendations

Info

Correlation coefficient with human assessors 0.0 0.2 0.4 0.6 0.8 TE PPMI DocCos RI500 CER PARA RI300 RI150 LM Lsa50 Lsa150 Lsa300 Lsa500 RI50 LDA HAL Figure 2: Comparison of all corpus-based measures (from Koopman et al., 2012), including TE and PARA; correlations averaged across datasets and corpora. Pair # Concept 1 Doc. Freq. Concept 2 Doc. Freq. 11 Arrhythmia 2,298 Cardiomyopathy, Alcoholic 13 16 Angina Pectoris 1,725 Cardiomyopathy, Alcoholic 13 21 Abdominal pain 690 Respiratory System Abnormalities 1 34 Cardiomyopathy, Alcoholic 13 Respiratory System Abnormalities 1 36 Heart Diseases 1,872 Respiratory System Abnormalities 1 37 Heart Failure, Congestive 1,192 Respiratory System Abnormalities 1 38 Heartburn 104 Respiratory System Abnormalities 1 Table 3: Example concept pairs for which the PARA measure diverges from the human judgements on the OHSUMED corpus. Document frequencies showing the prevalence of the concepts in the corpus are reported. We conclude that the PARA measure is unable to estimate accurate semantic similarity when insufficient occurrence statistics are available for either concept. human judges, then this indicates a strong correlation. To better understand why the paradigmatic based measure differs from human assessors in Figure 4, the document frequencies of concept pairs 11, 16, 21, 34, 36, 37 and 38 from the Cav data set are reported in Table 3. This table shows that for these concept pairs at least one concept occurs in a very small number of documents. This provides little evidence for the accurate estimation of paradigmatic associations between the concept pairs. We therefore conclude that the PARA measure requires that concepts occur in a sufficient number of documents for an effective semantic similarity estimation. Similar observations are valid across datasets and corpora. For example, consider the correlation of PARA with human judgements for the Pedersen et al. (Ped) data set and the MedTrack corpus, as shown in Figure 5. The document frequencies for a number of concept pairs that show divergence from the Ped data set are shown in Table 4. For these concept pairs where PARA is inconsistent with human judges, the PPMI measure effectively estimates semantic similarity. Thus the TE model, which mixes the two form of associations, is still effective even when the PARA measure is unreliable. This further supports the inclusion of both paradigmatic and syntagmatic associations for assessing semantic similarity between medical concepts. Figure 4 also illustrates a large number of discontinuities in the PPMI graph. A discontinuity, i.e. the absence of the data-point within the plot, is due to a PPMI score of zero for the concept pair 2 . In practice, these discontinuities represent instances where the concept pair never co-occurs within any document. The same situation applies across other datasets and corpora, for example the 2 As the graph is in log scale, log(0) = −∞ cannot be plotted. 20
Pair # Concept 1 Doc. Freq. Concept 2 Doc. Freq. 9 Diarrhea 6,184 Stomach cramps 14 23 Rectal polyp 26 Aorta 3,555 Table 4: Example concept pairs for which the PARA measure diverges from the human judgements on the MedTrack corpus. Document frequencies showing the prevalence of the concepts in the corpus are reported. We conclude that the PARA measure is unable to estimate accurate semantic similarity when insufficient occurrence statistics are available for either concept. 0.1000 1.0000 0.0500 0.1000 Similarity Score 0.0100 0.0050 0.0010 Similarity Score 0.0100 0.0005 0.0001 human assessed PPMI PARA TE 0.0010 0.0001 human assessed PPMI PARA TE 0 10 20 30 40 concept pair 0 5 10 15 20 25 concept pair Figure 4: Normalised similarity scores (on log scale) of human assessors, PPMI, PARA and TE on Caviedes and Cimino dataset (Cav) when using the OHSUMED corpus for priming. Ped data set on MedTrack corpus shown in Figure 5. While PPMI discontinuities for concept pairs judged as unrelated by human assessors are correct estimates (as PPMI= 0 implies unrelatedness), discontinuities for concept pairs judged similar (e.g. pairs 11, 16, etc. in Figure 4) indicate a failure of the PPMI measure. These situations may provide the reason why the performance of PPMI, and indeed of many existing corpus-based approaches (Koopman et al., 2012), are sensitive to the choice of priming corpus. This may indicate that the ability to successfully model syntagmatic associations is sensitive to the corpus used for priming. However, because of the results obtained by the TE model we can conclude that appropriately mixing both syntagmatic and paradigmatic associations overcomes corpus sensitivity issues. In summary, the performance (both in terms of robustness and effectiveness) of the TE model is achieved by including both syntagmatic and Figure 5: Normalised similarity scores (on log scale) of human assessors, PPMI, PARA and TE on Pedersen et al dataset (Ped) when using the Medtrack corpus for priming. paradigmatic associations between medical concepts; this is due to the diversification of the type of information used to underpin the semantic similarity estimation process. 6 Conclusion This paper has presented a novel variant of a robust and effective corpus-based approach that estimates similarity between medical concepts that strongly correlates with human judges. This approach is based on the tensor encoding (TE) model. By explicitly modelling syntagmatic and paradigmatic associations the TE model is able to outperform state of the art corpus-based approaches. Furthermore, the TE model is robust across corpora and datasets, in particular overcoming corpus sensitivity issues experienced by previous approaches. A significant contribution of this paper is to highlight the important role of paradigmatic associations. Our results suggest that paradigmatic 21
Page 1 and 2: Australasian Language Technology As
Page 3 and 4: ALTA 2012 Workshop Committees Works
Page 5 and 6: ALTA 2012 Programme The proceedings
Page 7 and 8: Contents Invited talks 1 Using a la
Page 9 and 10: Invited talks 1
Page 11 and 12: Diverse Words, Shared Meanings: Sta
Page 13 and 14: A Citation Centric Annotation Schem
Page 15 and 16: The structure of this paper is as f
Page 17 and 18: understanding of sentences surround
Page 19 and 20: henceforth referred to as Annotator
Page 21 and 22: References Angrosh, M A, Cranefield
Page 23 and 24: Semantic Judgement of Medical Conce
Page 25 and 26: 2012). The TE model provides a form
Page 27: Correlation coefficient with human
Page 31 and 32: Active Learning and the Irish Treeb
Page 33 and 34: 2.2 Sources of annotator disagreeme
Page 35 and 36: complement (csubj) 2 . See Figure 4
Page 37 and 38: top Y trees from this ordered set a
Page 39 and 40: References Ron Artstein and Massimo
Page 41 and 42: Unsupervised Estimation of Word Usa
Page 43 and 44: not lend itself to determining unsu
Page 45 and 46: T 0: think, want, thing, look, tell
Page 47 and 48: correlation is much stronger than t
Page 49 and 50: References Eneko Agirre and Philip
Page 51 and 52: eaders with sentences with a negati
Page 53 and 54: Question Which word is closest in m
Page 55 and 56: Acceptability and negativity: conce
Page 57 and 58: MORE NEGATIVE / LESS NEGATIVE MR Δ
Page 59 and 60: Julie Elizabeth Weeds. 2003. Measur
Page 61 and 62: cation produced by a supervised mac
Page 63 and 64: understood to carry a lower weight
Page 65 and 66: Figure 1: Average classification ac
Page 67 and 68: 0.8 0.75 0.7 AuthorA4 AuthorB4 Auth
Page 69 and 70: Segmentation and Translation of Jap
Page 71 and 72: tomatosōsu “tomato sauce”, rev
Page 73 and 74: “markup” and “Mach”, so the
Page 75 and 76: MWE Segmentation Possible Translati
Page 77 and 78: English Term Pairs from Search Engi
Page 79 and 80:
techniques used are, of necessity,
Page 81 and 82:
ation metrics robust, if they are v
Page 83 and 84:
Figure 2: Methodologies of human as
Page 85 and 86:
an optimal method? Machine translat
Page 87 and 88:
Towards Two-step Multi-document Sum
Page 89 and 90:
archical and non-hierarchical relat
Page 91 and 92:
We use the MetaMap 7 tool to automa
Page 93 and 94:
T T & C CC z -1.5 -1.27 -1.33 p-val
Page 95 and 96:
References Alan R. Aronson. 2001. E
Page 97 and 98:
techniques and results are shown. F
Page 99 and 100:
Rock Hip-Hop Electronic Punk Metal
Page 101 and 102:
Rank Frequency Frequency (filtered)
Page 103 and 104:
Ex. 1 Ex. 2 Ex. 3 Score Song Title
Page 105 and 106:
Free-text input vs menu selection:
Page 107 and 108:
consistent with the much earlier ed
Page 109 and 110:
3.2 Dialogue system architecture Th
Page 111 and 112:
Control Free-text Menu-based n=119
Page 113 and 114:
tions in analyzing algebra example
Page 115 and 116:
langid.py for better language model
Page 117 and 118:
documents of MIME type text/html wi
Page 119 and 120:
Acknowledgments NICTA is funded by
Page 121 and 122:
LaBB-CAT: an Annotation Store Rober
Page 123 and 124:
elationship between the coincidence
Page 125 and 126:
Communication 33 (special issue on
Page 127 and 128:
matically extracting location infor
Page 129 and 130:
Classifier DBp:1R Geo:1R D+G:1R DBp
Page 131 and 132:
ALTA Shared Task papers 123
Page 133 and 134:
All Struct. Unstruct. Total - Abstr
Page 135 and 136:
System Population Intervention Back
Page 137 and 138:
tence positions (absolute and relat
Page 139 and 140:
sitional attributes, and sequential
Page 141 and 142:
ordering labels, All includes BOW,
Page 143 and 144:
3 Software Used All experimentation
Page 145 and 146:
Combination Output Public Private C
Page 147 and 148:
Experiments with Clustering-based F
Page 149 and 150:
3 Results For the initial experimen
show all

Full proceedings volume - Australasian Language Technology ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?