
Corpus Linguistics - IU Computational Linguistics Program - Indiana ...


Corpus Linguistics
Why Corpus Linguistics?

Advantages
Responses to Chomsky

There are good points about the limitations of corpus-based research, but corpora should not be dismissed.
1. Existence in a corpus ≠ grammatical.
   ◮ Response: Intuition is necessary, but existence in corpora can point out new assumptions & reduce some biases (see next slide)
2. A finite corpus cannot capture all possible sentences.
   ◮ Response: A corpus can supplement the sentences your brain can generate (& show appropriate context).
3. Grammaticality is not statistical.
   ◮ Response: This point is arguable (see later slide), and grammaticality is not everything (cf. language use)
4. Corpora are observational, not experimental.
   ◮ Response: Both are worth investigating: controlled studies and real-world use.

Advantages
Corpus-based & Intuition-based approaches

Being empirical (i.e., using corpora [& experiments]) has advantages over intuition on its own:
◮ Intuition can be influenced by idiolect or dialect
   ◮ a corpus-based approach is free of overt judgments
◮ Intuition is based on a conscious monitoring of one's production
   ◮ generated sentences may not be typical language use
◮ Intuition-based examples are difficult to verify
Additionally, corpus-based approaches can show differences that intuition cannot provide.
NB: Not every research question needs corpus data, so use judgment.

Is language probabilistic?
Degrees of grammaticality

Are sentences completely grammatical or completely ungrammatical?
(1) a. John I believe Sally said Bill believed Sue saw.
    b. What did Sally whisper that she had secretly read?
    c. Who did Jo think said John saw him?
    d. The boys read Mary's stories about each other.
    e. I considered John as a good candidate.
⇒ Probabilistic modeling gives a degree of grammaticality

Is language probabilistic?
Language Acquisition, Change, and Variation

◮ Language Acquisition: child uses grammatical constructions with varying frequency
◮ Rule possibilities are tried with different frequencies
◮ Language Change & Variation: gradual changes
   ◮ Some proportion of the population uses new constructions
   ◮ Dialect continua can make judgments difficult
See Manning and Schütze (1999), Foundations of Statistical Natural Language Processing and Abney (1996), Statistical Methods and Linguistics

More on objections to corpora

Spectrum of viewpoints on the usefulness of corpora
◮ Strong view: "without a corpus (or corpora) there is no meaningful work to be done" (Murison-Bowie 1996)
◮ Weak view: corpora provide viewpoints previously unavailable for language study & applications
It is rare for people to think that corpora are completely unusable.

Widdowson 2000

Benefits of corpora:
◮ Quantitative analysis from corpora is not accessible by intuition
   ◮ regular patterns of collocational co-occurrence
◮ 3rd person observed data different from 1st person intuition or 2nd person elicited data
Limitations of corpora:
◮ "it cannot represent the reality of first person awareness"
◮ Corpora do not reveal what people know, nor what they think they know, about language
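
The benefit Widdowson grants, that quantitative patterns such as collocational co-occurrence are not accessible by intuition, can be made concrete with a small sketch. The code below counts adjacent word pairs and scores them with pointwise mutual information (PMI), one common association measure. The toy corpus, the function name `bigram_pmi`, and the choice of PMI are assumptions made for illustration; none of them come from the slides, and a real study would use a much larger corpus and proper tokenization.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only (not from the slides).
TOY_CORPUS = [
    "strong tea is better than weak tea",
    "she drinks strong tea every morning",
    "he made a strong argument",
    "a powerful computer is expensive",
    "they bought a powerful computer",
]


def bigram_pmi(sentences):
    """Count adjacent word pairs and score each pair with PMI.

    Returns (bigram_counts, pmi_scores). Tokenization is naive
    whitespace splitting, which is an assumption of this sketch.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return bigrams, scores


bigram_counts, pmi_scores = bigram_pmi(TOY_CORPUS)

# Show the most frequent adjacent pairs with their PMI scores.
for pair, count in bigram_counts.most_common(3):
    print(pair, count, round(pmi_scores[pair], 2))
```

In this toy data "strong tea" and "powerful computer" recur while "powerful tea" never occurs. A speaker may sense that contrast, but only counting gives it a number. On a corpus this small the scores themselves mean little.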

Other points from Widdowson

◮ Corpora are textual products which do not reveal the process underlying them
◮ Decontextualized language which has to be re-contextualized for use in, e.g., language teaching
◮ Corpora are texts, not discourse
◮ The "real" nature of corpus data may not fit the purpose at hand
◮ It must be justified that, e.g., language learners should be exposed to real, native language

Stubbs 2001

Data and methods of corpus linguistics
◮ Corpora reveal what frequently and typically occurs
◮ This is only a small proportion of what is possible
◮ Corpora do not capture all possibilities: that's why bigger corpora are always needed
◮ Corpora reveal how divergent intuition & usage can be
◮ Corpora can reveal both syntagmatic (co-occurrence) and paradigmatic (recurrence) relations

Process and product

Stubbs agrees that we can only view the product of language in a corpus
◮ But this is generally true in empirical disciplines (e.g., geology)
◮ Corpora can still provide a particular level of objectivity previously unavailable
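
Two claims from these slides fit together: that probabilistic modeling gives a degree of grammaticality, and that a corpus shows only a small proportion of what is possible. A minimal sketch of the idea is a bigram model with add-one smoothing, which gives unseen word sequences a graded score rather than rejecting them outright. Everything in the code is an assumption for illustration: the toy training sentences, the function names, and the use of average log probability as a stand-in score. A bigram model measures local sequence likelihood only, so it is a crude proxy for grammaticality and not the method of the cited works.

```python
import math
from collections import Counter

# Toy training corpus, invented for illustration only (not from the slides).
TRAIN = [
    "the boys read the stories",
    "the girls read the book",
    "mary read the stories",
    "the boys saw mary",
]


def train_bigram_model(sentences):
    """Collect the counts needed for an add-one smoothed bigram model."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        # Every token except the final </s> serves as a left context.
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, len(vocab)


def avg_log_prob(sentence, model):
    """Average log2 probability per transition; higher means more expected."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen pairs at a small nonzero probability.
        prob = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        total += math.log2(prob)
    return total / (len(tokens) - 1)


model = train_bigram_model(TRAIN)
attested = avg_log_prob("the boys read the stories", model)   # in the corpus
novel = avg_log_prob("the girls saw the book", model)         # unseen, well formed
scrambled = avg_log_prob("stories the read boys the", model)  # unseen, scrambled

print(attested, novel, scrambled)
```

With this toy data the attested sentence scores highest, the novel but well-formed sentence scores in between, and the scrambled one scores lowest. The result is a ranking, not a yes/no judgment.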
