11.12.2012 Views

(Person) Percentage - Sabanci University Research Database



The Asian Media & Mass Communication Conference 2010 Osaka, Japan

files were converted and hyperlinks to footnotes were deleted. Also removed were the footnotes themselves and the 1891 introduction to the edition. As WordSmith ignores anything in square brackets when compiling wordlists and concordances, I decided to bracket the annotations containing number, date and author at the top of each essay, to bracket each initial Latin or Greek motto, and to do the same with the initials (‘L.’, ‘T.’ etc.) that appeared at the bottom of each number of the periodical.
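The effect of this bracketing on a wordlist can be illustrated with a minimal Python sketch. It is not WordSmith's own procedure: the regular expressions, the `wordlist` function and the sample text are all assumptions made for illustration, and nested brackets are not handled.

```python
import re
from collections import Counter

# Assumption: annotations are enclosed in simple, non-nested square brackets.
BRACKETED = re.compile(r"\[[^\[\]]*\]")
# Assumption: a token is a run of letters, optionally with an internal apostrophe.
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def wordlist(text):
    """Count lower-cased tokens, skipping anything inside square brackets."""
    visible = BRACKETED.sub(" ", text)
    return Counter(tok.lower() for tok in TOKEN.findall(visible))

# Invented sample text, not a quotation from The Spectator.
sample = (
    "[No. 0. Hypothetical date. Author.]\n"
    "[Motto here]\n"
    "The reader reads the essay.\n"
    "[L.]"
)

counts = wordlist(sample)
for word, freq in counts.most_common():
    print(word, freq)
```

Only the words of the essay itself are counted; the bracketed header, motto and signature initial contribute nothing to the wordlist.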

More drastically, I also decided to remove all numbers of the periodical that contain ‘letters to the editor’. It is not always clear whether these are genuinely from readers of the periodical or fabricated by the contributors themselves. As the focus of this research was to examine stylistically the contributions of Addison and Steele to The Spectator, and as retaining the letters might have skewed the data, all numbers that incorporated correspondence, whether by Addison or Steele, were removed. Since Steele in particular favored the incorporation of letters in his essays, this decision led to a considerable reduction in the amount of data to be examined. Of the 555 editions of The Spectator that appeared in 1711-12, only 296, or 53.3% of the total, were analyzed in this study. The composition of the three Spectator subcorpora, Addison.txt, Steele.txt and Others.txt, can be seen in Table 1:

Corpus       Content                                                                 Size
Addison.txt  192 contributions                                                       About 268,000 tokens
Steele.txt   81 contributions                                                        About 107,000 tokens
Others.txt   23 contributions from others (Budgell 15, Hughes 6, Pope 1, Parnell 1)  About 32,000 tokens

Table 1: Composition of the three Spectator subcorpora

3. Keywords

The notion of keyword is now a familiar one to most researchers in corpus linguistics. A keyword is a word that appears in a particular corpus a statistically significant number of times more often than in another (usually larger) ‘reference corpus’. Keywords, therefore, are lexical items that are prominent or foregrounded in Text A when contrasted with their use or non-use in Text B. The semantic content of keywords is seen as a good indicator of the foregrounded content of a text, in particular reflecting

http://www.gutenberg.org/wiki/Main_Page.
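The definition of a keyword given in this section can be made concrete with a short Python sketch. It assumes the log-likelihood statistic, which is widely used for keyness; the text above does not say which statistic was applied here, so this choice, the function names and the word frequencies in the example are illustrative assumptions. Only the corpus sizes are taken from Table 1.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) for one word in a study corpus and a reference corpus."""
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def is_positive_keyword(freq_study, size_study, freq_ref, size_ref,
                        threshold=3.84):
    """True if the word is relatively more frequent in the study corpus and the
    difference exceeds the threshold (3.84 corresponds to p < 0.05, 1 d.f.)."""
    overused = freq_study / size_study > freq_ref / size_ref
    return overused and log_likelihood(
        freq_study, size_study, freq_ref, size_ref) > threshold

# Corpus sizes from Table 1; the word frequencies are hypothetical.
STEELE_TOKENS = 107_000
ADDISON_TOKENS = 268_000

print(is_positive_keyword(300, STEELE_TOKENS, 268, ADDISON_TOKENS))
print(is_positive_keyword(107, STEELE_TOKENS, 268, ADDISON_TOKENS))
```

In the second call the word has the same relative frequency in both corpora (1 per 1,000 tokens), so it is not a keyword whatever its raw counts.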

