11.07.2015 Views

Université de Montréal - Thèse sous forme numérique


period in which the texts were written, degree of technicality are similar. Maia (2003: 27) mentions similarity in relation to form, content, structure, function, register, tenor, field, mode and dialect of texts. For McEnery and Xiao a comparable corpus should contain “the same proportions of the same texts of the same genres in the same domains in a range of different languages in the same sampling period” (2007: 3, authors' italics).

While the authors seem to agree that aspects such as text genres and dates of texts should be taken into account when designing a comparable corpus, the criterion of choosing texts based on their content or topic is important for Bowker and Pearson (2002) and for Maia (2003), but less important for McEnery and Xiao (2007). For reasons that will be mentioned throughout this subchapter, we will adopt the point of view of McEnery and Xiao (2007).

4.1.1. Corpus features

The comparable corpus is formed of two subcorpora¹³: a European Portuguese subcorpus of judgments and a Canadian English subcorpus of judgments. The comparable corpus totals approximately 5,000,000 words¹⁴. Table 11 provides further information on the corpus, such as the number of words and the number of texts per subcorpus, the average number of words per text in each subcorpus, and the dates of the texts.

The Portuguese and English subcorpora share one feature but differ in three others. The total number of words per subcorpus is similar, i.e. approximately 2,500,000 words. The subcorpora differ in the number of texts, in the average number of words per text and in the dates of the texts. The Portuguese subcorpus contains approximately 400 texts while the Canadian subcorpus contains approximately 200 texts, i.e. the Portuguese subcorpus is composed of twice as many texts as the Canadian subcorpus. On average, the

13 The term subcorpora is used here in the sense of “a subset of a corpus, either a static component of a complex corpus or a dynamic selection from a corpus during online analysis” (Atkins et al. 1992: 1).

14 Words correspond to the forms identified by the word counting function of MS Word.
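
The rounded figures quoted in this section are enough to check the relationship between subcorpus size, number of texts and average text length. The following is a minimal sketch in Python under stated assumptions: it uses only the approximate totals given in the text (about 2,500,000 words per subcorpus, about 400 and 200 texts), not the exact counts reported in Table 11, and the names `SUBCORPORA` and `average_words_per_text` are hypothetical and introduced only for illustration.

```python
# Approximate figures quoted in the text (assumption: rounded values,
# not the exact counts reported in Table 11).
SUBCORPORA = {
    "European Portuguese": {"words": 2_500_000, "texts": 400},
    "Canadian English": {"words": 2_500_000, "texts": 200},
}


def average_words_per_text(words: int, texts: int) -> float:
    """Return the mean number of words per text in a subcorpus."""
    if texts <= 0:
        raise ValueError("a subcorpus must contain at least one text")
    return words / texts


# Average text length for each subcorpus.
averages = {
    name: average_words_per_text(figures["words"], figures["texts"])
    for name, figures in SUBCORPORA.items()
}

# Ratio between the numbers of texts in the two subcorpora.
text_ratio = (
    SUBCORPORA["European Portuguese"]["texts"]
    / SUBCORPORA["Canadian English"]["texts"]
)

for name, avg in averages.items():
    print(f"{name}: about {avg:,.0f} words per text")
print(f"Portuguese/Canadian ratio of texts: {text_ratio:.1f}")
```

With these rounded inputs, equal subcorpus sizes and a two-to-one ratio of texts imply that the Canadian judgments are, on average, about twice as long as the Portuguese ones. The exact averages depend on the counts in Table 11.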
