25.12.2013 Views

Evaluative Meanings and Disciplinary Values - eTheses Repository ...


As Figure 5.1 illustrates, line feed codes are automatically inserted at the end of each line, regardless of whether the sentence ends there. Because data in PDF files are inherently stored in this format, a corpus concordancer may treat each line feed code as the end of a sentence (that is, as a period). In Figure 5.1 above, significant, written, about and broad are therefore interpreted as the last words of their sentences, even though each clearly occurs in the middle of a sentence (see footnote 24). This may be a trivial issue of data preparation if the aim of the study is to extract and count individual words in order to, for example, create a (key)word list. However, it may well cause more serious problems with some software, including the one used in the current study, when the goal is to extract units of two or more words, such as collocations, phrases and patterns. For example, it is impossible to extract the word units significant parts, written academic prose, about one million, and broad categories from the context above, largely because the computer interprets the line feed inside each phrase as an unseen period. (N.B. This is a particular problem with the software used in the current study, which cannot cope with the data as they are.) Therefore, after extracting data from academic journals, the data were adjusted by replacing each line feed with a single space, thereby allowing valid and meaningful collocates, phrases, and patterns to be extracted.
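The adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function names `join_wrapped_lines` and `bigrams` are hypothetical, the sample text is invented around the phrases mentioned above rather than copied from Figure 5.1, and the `line_feed_is_boundary` flag merely imitates a concordancer that treats each line feed as a sentence boundary.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Replace each line feed (plus any adjacent whitespace) with one space."""
    return re.sub(r"\s*\n\s*", " ", text).strip()


def bigrams(text: str, line_feed_is_boundary: bool) -> list[tuple[str, str]]:
    """Return adjacent word pairs.

    If line_feed_is_boundary is True, pairs are never formed across a line
    feed, imitating software that reads each line feed as a sentence end.
    """
    segments = text.split("\n") if line_feed_is_boundary else [text]
    pairs = []
    for segment in segments:
        words = re.findall(r"[A-Za-z]+", segment.lower())
        pairs.extend(zip(words, words[1:]))
    return pairs


# Invented sample: line feeds fall inside the multi-word units.
raw = (
    "the corpus contains significant\n"
    "parts of written\n"
    "academic prose, about\n"
    "one million words in broad\n"
    "categories"
)

before = bigrams(raw, line_feed_is_boundary=True)
after = bigrams(join_wrapped_lines(raw), line_feed_is_boundary=False)

print(("significant", "parts") in before)  # pair is split by the line feed
print(("significant", "parts") in after)   # pair is recoverable after joining
```

In this sketch the single-word counts are identical before and after the adjustment, which is why the issue is trivial for a (key)word list; only the units of two or more words are affected.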

Having described the procedures and guidelines that informed the corpus building process, I now turn to the final composition of the two corpora themselves. My corpus of journal articles in applied linguistics (hereafter, ALC) comprises 289 published papers from 8 leading journals in each sub-discipline. Each sub-disciplinary sub-corpus includes approximately 330,000 word tokens, giving a total of approximately 2,667,000 running words for the whole corpus. Meanwhile, my business studies corpus (hereafter, BC) comprises 436 published papers from 8 leading journals in each sub-discipline. Each of these sub-corpora also contains approximately 330,000 word tokens, yielding a total of 2,668,679 running words for the corpus as a whole. In short, the two corpora are almost equal in size in terms of word tokens, thereby enabling a direct comparison using only raw frequency figures. (N.B. The two corpora are not equal in terms of the number of texts that they contain, since the token size of each article is different in the two disciplines.) Table 5.1 below summarizes the basic figures for ALC and BC. The table provides: 1) the journal name; 2)

24 I confirmed this phenomenon using the corpus concordancer employed in the current study.
