Full proceedings volume - Australasian Language Technology ...

More documents

Recommendations

Info

O. Bojar, M. Ercegovčevic, M. Popel, and O. Zaidan. 2011. A grain of salt for the WMT manual evaluation. In Proc. 6th Wkshp. Statistical Machine Translation, pages 1–11, Edinburgh, Scotland. Assoc. Computational Linguistics. F. Bond, K. Ogura, and S. Ikehara. 1995. Possessive pronouns as determiners in Japanese-to-English machine translation. In Proc. 2nd Conf. Pacific Assoc. Computational Linguistics, pages 32–38, Brisbane, Australia. C. Callison-Burch, M. Osborne, and P. Koehn. 2006. Reevaluating the role of BLEU in machine translation research. In Proc. 11th Conf. European Chapter of the Assoc. Computational Linguistics, pages 249–256, Trento, Italy, April. Assoc. for Computational Linguistics. C. Callison-Burch, C. Fordyce, P. Koehn, C. Monz, and J. Schroeder. 2007. (Meta-) evaluation of machine translation. In Proc. 2nd Wkshp. Statistical Machine Translation, pages 136–158, Prague, Czech Republic. Assoc. Computational Linguistics. C. Callison-Burch, C. Fordyce, P. Koehn, C. Monz, and J. Schroeder. 2008. Further meta-evaluation of machine translation. In Proc. 3rd Wkshp. Statistical Machine Translation, pages 70–106, Columbus, Ohio. Assoc. Computational Linguistics. C. Callison-Burch, P. Koehn, C. Monz, and J. Schroeder. 2009. Findings of the 2009 Workshop on Statistical Machine Translation. In Proc. 4th Wkshp. Statistical Machine Translation, pages 1–28, Athens, Greece. Assoc. Computational Linguistics. C. Callison-Burch, P. Koehn, C. Monz, K. Peterson, M. Przybocki, and O. Zaidan. 2010. Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation. In Proc. 5th Wkshp. Statistical Machine Translation, Uppsala, Sweden. Assoc. Computational Linguistics. C. Callison-Burch, P. Koehn, C. Monz, and O. Zaidan. 2011. Findings of the 2011 Workshop on Statistical Machine Translation. In Proc. 6th Wkshp. Statistical Machine Translation, pages 22–64, Edinburgh, Scotland. Assoc. Computational Linguistics. C. Callison-Burch, P. Koehn, C. Monz, M. Post, R. Soricut, and L. Specia. 2012. Findings of the 2012 Workshop on Statistical Machine Translation. In Proc. 7th Wkshp. Statistical Machine Translation, pages 10–51, Montreal, Canada, June. Assoc. Computational Linguistics. D. T. Campbell and D. W. Fiske. 1959. Convergent and discriminant validation by the muultitrait-multimethod matrix. 56:81–105. J. Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46. M. Denkowski and A. Lavie. 2010. Choosing the right evaluation for machine translation: an examination of annotator and automatic metric performance on human judgement tasks. In Proc. 9th Conf. Assoc. Machine Translation in the Americas (AMTA). S. Ikehara, S. Shirai, and K. Ogura. 1994. Criteria for evaluating the linguistic quality of Japanese to English machine translations. J. Japanese Soc. Artificial Intelligence, 9. (in Japanese). P. Koehn and C. Monz. 2006. Manual and automatic evaluation of machine translation between european languages. In Proc. Wkshp. Statistical Machine Translation, pages 102–121. LDC. 2005. Linguistic data annotation specification: Assessment of fluency and adequacy in translations. Technical report, Linguistic Data Consortium. Revision 1.5. A. Lopez. 2012. Putting human machine translation systems in order. In Proc. 7th Wkshp. Statistical Machine Translation, pages 1–9, Montreal, Canada, June. Assoc. Computational Linguistics. K. MacCorquodale and P.E. Meehl. 1948. On a distinction between hypothetical constructs and intervening variables. Psychological Review, 55(2):307–321. K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proc. 40th Ann. Meet. Assoc. Computational Linguistics, pages 311–318, Philadelphia, PA, July. Assoc. Computational Linguistics. M. Przybocki, K. Peterson, S. Bronsart, and G. Sanders. 2009. The NIST 2008 metrics for machine translation challenge – overview, methodology, metrics and results. Machine Translation, 23(2-3):71–103. L. Schwartz, T. Aikawa, and C. Quirk. 2003. Disambiguation of English PP attachment using multilingual aligned data. In Proc. 9th Machine Translation Summit (MT Summit IX), New Orleans, LA. William M.K. Trochim. 1999. The Research Methods Knowledge Base. Cornell University Custom Publishing, Ithaca, New York. E. M. Voorhees and D. K. Harman. 2005. TREC: Experiment and evaluation in information retrieval. MIT Press, Cambridge, MA. J. S. White, T. O’Connell, and F. O’Mara. 1994. The ARPA MT evaluation methodologies: Evolution, lessons, and future approaches. In Proc. 1st Conf. Assoc. for Machine Translation in the Americas (AMTA’94), pages 193–205. 78
Towards Two-step Multi-document Summarisation for Evidence Based Medicine: A Quantitative Analysis Abeed Sarker , Diego Mollá-Aliod Cécile Paris Centre for Language Technology CSIRO – ICT Centre Macquarie University Sydney, NSW 2122 Sydney, NSW 2109 cecile.paris@csiro.au {abeed.sarker, diego.molla-aliod}@mq.edu.au Abstract We perform a quantitative analysis of data in a corpus that specialises on summarisation for Evidence Based Medicine (EBM). The intent of the analysis is to discover possible directions for performing automatic evidence-based summarisation. Our analysis attempts to ascertain the extent to which good, evidence-based, multidocument summaries can be obtained from individual single-document summaries of the source texts. We define a set of scores, which we call coverage scores, to estimate the degree of information overlap between the multi-document summaries and source texts of various granularities. Based on our analysis, using several variants of the coverage scores, and the results of a simple task oriented evaluation, we argue that approaches for the automatic generation of evidence-based, bottom-line, multi-document summaries may benefit by utilising a two-step approach: in the first step, content-rich, singledocument, query-focused summaries are generated; followed by a step to synthesise the information from the individual summaries. 1 Introduction Automatic summarisation is the process of presenting the important information contained in a source text in a compressed format. Such approaches have important applications in domains where lexical resources are abundant and users face the problem of information overload. One such domain is the medical domain, with the largest online database (PubMed 1 ) containing over 21 million published medical articles. Thus, a standard clinical query on this database returns numerous results, which are extremely time-consuming to read and analyse manually. This is a major obstacle to the practice of Evidence Based Medicine (EBM), which requires practitioners to refer to relevant published medical research when attempting to answer clinical 1 http://www.ncbi.nlm.nih.gov/pubmed queries. Research has shown that practitioners require botom-line evidence-based answers at point of care, but often fail to obtain them because of time constraints (Ely et al., 1999). 1.1 Motivation Due to the problems associated with the practise of EBM, there is a strong motivation for automatic summarisation/question-answering (QA) systems that can aid practitioners. While automatic text summarisation research in other domains (e.g., news) has made significant advances, research in the medical domain is still at an early stage. This can be attributed to various factors: (i) the process of answer generation for EBM requires practitioners to combine their own expertise with medical evidence, and automatic systems are only capable of summarising content present in the source texts; (ii) the medical domain is very complex with a large number of domain specific terminologies and relationships between the terms that systems need to take into account when performing summarisation; and (iii) while there is an abundance of medical documents available electronically, specialised corpora for performing summarisation research in this domain are scarce. 1.2 Contribution We study a corpus that specialises on the task of summarisation for EBM and quantitatively analyse the contents of human generated evidence-based summaries. We compare bottom-line evidence-based summaries to source texts and human-generated, query-focused, single-document summaries of the source texts. This enables us to determine if good singledocument summaries contain sufficient content, from source texts, to be used for the generation of multi-document, bottom-line summaries. We also study single-document extractive summaries Abeed Sarker, Diego Mollá-Aliod and Cecile Paris. 2012. Towards Two-step Multi-document Summarisation for Evidence Based Medicine: A Quantitative Analysis. In Proceedings of Australasian Language Technology Association Workshop, pages 79−87.
Page 1 and 2:
Australasian Language Technology As
Page 3 and 4:
ALTA 2012 Workshop Committees Works
Page 5 and 6:
ALTA 2012 Programme The proceedings
Page 7 and 8:
Contents Invited talks 1 Using a la
Page 9 and 10:
Invited talks 1
Page 11 and 12:
Diverse Words, Shared Meanings: Sta
Page 13 and 14:
A Citation Centric Annotation Schem
Page 15 and 16:
The structure of this paper is as f
Page 17 and 18:
understanding of sentences surround
Page 19 and 20:
henceforth referred to as Annotator
Page 21 and 22:
References Angrosh, M A, Cranefield
Page 23 and 24:
Semantic Judgement of Medical Conce
Page 25 and 26:
2012). The TE model provides a form
Page 27 and 28:
Correlation coefficient with human
Page 29 and 30:
Pair # Concept 1 Doc. Freq. Concept
Page 31 and 32:
Active Learning and the Irish Treeb
Page 33 and 34:
2.2 Sources of annotator disagreeme
Page 35 and 36: complement (csubj) 2 . See Figure 4
Page 37 and 38: top Y trees from this ordered set a
Page 39 and 40: References Ron Artstein and Massimo
Page 41 and 42: Unsupervised Estimation of Word Usa
Page 43 and 44: not lend itself to determining unsu
Page 45 and 46: T 0: think, want, thing, look, tell
Page 47 and 48: correlation is much stronger than t
Page 49 and 50: References Eneko Agirre and Philip
Page 51 and 52: eaders with sentences with a negati
Page 53 and 54: Question Which word is closest in m
Page 55 and 56: Acceptability and negativity: conce
Page 57 and 58: MORE NEGATIVE / LESS NEGATIVE MR Δ
Page 59 and 60: Julie Elizabeth Weeds. 2003. Measur
Page 61 and 62: cation produced by a supervised mac
Page 63 and 64: understood to carry a lower weight
Page 65 and 66: Figure 1: Average classification ac
Page 67 and 68: 0.8 0.75 0.7 AuthorA4 AuthorB4 Auth
Page 69 and 70: Segmentation and Translation of Jap
Page 71 and 72: tomatosōsu “tomato sauce”, rev
Page 73 and 74: “markup” and “Mach”, so the
Page 75 and 76: MWE Segmentation Possible Translati
Page 77 and 78: English Term Pairs from Search Engi
Page 79 and 80: techniques used are, of necessity,
Page 81 and 82: ation metrics robust, if they are v
Page 83 and 84: Figure 2: Methodologies of human as
Page 85: an optimal method? Machine translat
Page 89 and 90: archical and non-hierarchical relat
Page 91 and 92: We use the MetaMap 7 tool to automa
Page 93 and 94: T T & C CC z -1.5 -1.27 -1.33 p-val
Page 95 and 96: References Alan R. Aronson. 2001. E
Page 97 and 98: techniques and results are shown. F
Page 99 and 100: Rock Hip-Hop Electronic Punk Metal
Page 101 and 102: Rank Frequency Frequency (filtered)
Page 103 and 104: Ex. 1 Ex. 2 Ex. 3 Score Song Title
Page 105 and 106: Free-text input vs menu selection:
Page 107 and 108: consistent with the much earlier ed
Page 109 and 110: 3.2 Dialogue system architecture Th
Page 111 and 112: Control Free-text Menu-based n=119
Page 113 and 114: tions in analyzing algebra example
Page 115 and 116: langid.py for better language model
Page 117 and 118: documents of MIME type text/html wi
Page 119 and 120: Acknowledgments NICTA is funded by
Page 121 and 122: LaBB-CAT: an Annotation Store Rober
Page 123 and 124: elationship between the coincidence
Page 125 and 126: Communication 33 (special issue on
Page 127 and 128: matically extracting location infor
Page 129 and 130: Classifier DBp:1R Geo:1R D+G:1R DBp
Page 131 and 132: ALTA Shared Task papers 123
Page 133 and 134: All Struct. Unstruct. Total - Abstr
Page 135 and 136: System Population Intervention Back
Page 137 and 138:
tence positions (absolute and relat
Page 139 and 140:
sitional attributes, and sequential
Page 141 and 142:
ordering labels, All includes BOW,
Page 143 and 144:
3 Software Used All experimentation
Page 145 and 146:
Combination Output Public Private C
Page 147 and 148:
Experiments with Clustering-based F
Page 149 and 150:
3 Results For the initial experimen
show all

Full proceedings volume - Australasian Language Technology ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?