03.09.2013 Views

Lexical variation in aggregate perspective


Tom Ruette, Dirk Speelman, and Dirk Geeraerts

Tab. 1: Fictional absolute frequencies for the variants of two concepts in two language varieties

| Concept | Variant | Am. Eng. | Br. Eng. |
|---|---|---|---|
| SUBTERRANEAN PUBLIC TRANSPORT | subway | 70 | 20 |
| | underground | 10 | 50 |
| SMALL INSTRUMENT PLAYED WITH A BOW | violin | 50 | 30 |
| | fiddle | 40 | 35 |

Subsequently, we introduce the relative frequency $R$:

$$R_{V_j,L}(x_i) = \frac{F_{V_j,L}(x_i)}{\sum_{k=1}^{n} F_{V_j,L}(x_k)} \tag{2}$$

Applying Equation 2 to the absolute frequencies in Table 1 yields the relative frequencies in Table 2. For subway in American English, for instance, 70 / (70 + 10) = 0.875.

Tab. 2: Fictional relative frequencies for the variants of two concepts in two language varieties, based on Table 1

| Concept | Variant | Am. Eng. | Br. Eng. |
|---|---|---|---|
| SUBTERRANEAN PUBLIC TRANSPORT | subway | 0.875 | 0.286 |
| | underground | 0.125 | 0.714 |
| SMALL INSTRUMENT PLAYED WITH A BOW | violin | 0.556 | 0.462 |
| | fiddle | 0.444 | 0.538 |
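The step from absolute to relative frequencies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads $F_{V_j,L}(x_i)$ as the absolute frequency of variant $x_i$ for concept $L$ in variety $V_j$, as the tables suggest, and the names `ABSOLUTE`, `RELATIVE`, and `relative_frequencies` are hypothetical, not taken from the authors' own tooling.

```python
# Fictional absolute frequencies from Table 1, keyed as
# concept -> variety -> variant -> count.
ABSOLUTE = {
    "SUBTERRANEAN PUBLIC TRANSPORT": {
        "Am. Eng.": {"subway": 70, "underground": 10},
        "Br. Eng.": {"subway": 20, "underground": 50},
    },
    "SMALL INSTRUMENT PLAYED WITH A BOW": {
        "Am. Eng.": {"violin": 50, "fiddle": 40},
        "Br. Eng.": {"violin": 30, "fiddle": 35},
    },
}


def relative_frequencies(counts):
    """Divide each variant's count by the profile total (Equation 2)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no observations")
    return {variant: count / total for variant, count in counts.items()}


# Apply Equation 2 to every profile (one concept in one variety).
RELATIVE = {
    concept: {
        variety: relative_frequencies(counts)
        for variety, counts in varieties.items()
    }
    for concept, varieties in ABSOLUTE.items()
}

for concept, varieties in RELATIVE.items():
    for variety, profile in varieties.items():
        rounded = {variant: round(value, 3) for variant, value in profile.items()}
        print(concept, variety, rounded)
```

Rounded to three decimals, the values match those in Table 2; within each profile the relative frequencies sum to 1.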

Now we can define the (City-Block) distance $D_{CB}$ between $V_1$ and $V_2$ on the basis of the profile for $L$ as follows (the division by two is for normalization, mapping the results to the interval [0,1]):

$$D_{CB,L}(V_1, V_2) = \frac{1}{2} \sum_{i=1}^{n} \left| R_{V_1,L}(x_i) - R_{V_2,L}(x_i) \right| \tag{3}$$
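As a worked illustration, the City-Block distance can be computed for the two fictional concepts of Table 1. The sketch below is a minimal Python rendering, not the authors' implementation: the function names are hypothetical, and treating a variant that is absent from one profile as having relative frequency 0 is an assumption made here for robustness.

```python
def relative_frequencies(counts):
    """Relative frequency of each variant within one profile (Equation 2)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no observations")
    return {variant: count / total for variant, count in counts.items()}


def city_block_distance(counts_1, counts_2):
    """Half the summed absolute differences of relative frequencies (Equation 3)."""
    rel_1 = relative_frequencies(counts_1)
    rel_2 = relative_frequencies(counts_2)
    # Assumption: a variant missing from one profile counts as frequency 0 there.
    variants = set(rel_1) | set(rel_2)
    return 0.5 * sum(abs(rel_1.get(x, 0.0) - rel_2.get(x, 0.0)) for x in variants)


# Profiles from Table 1: American English versus British English.
transport = city_block_distance(
    {"subway": 70, "underground": 10},
    {"subway": 20, "underground": 50},
)
instrument = city_block_distance(
    {"violin": 50, "fiddle": 40},
    {"violin": 30, "fiddle": 35},
)

print(f"{transport:.3f}")   # 0.589
print(f"{instrument:.3f}")  # 0.094
```

On these fictional data the two varieties differ far more for SUBTERRANEAN PUBLIC TRANSPORT than for SMALL INSTRUMENT PLAYED WITH A BOW. Identical profiles give 0, and profiles with no shared variants give 1, which is what the division by two guarantees.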

The City-Block distance is a straightforward descriptive dissimilarity measure that assumes the absolute frequencies in the sample-based profile to be large enough for the relative frequencies to be good estimates of the relative frequencies in the underlying population-based profiles. If, however, the samples are rather small, the relative frequencies become unreliable, and a supplementary control is needed. For this we use a measure based on the confidence that there is an actual difference between two profiles: the Fisher Exact test. This time, unlike with $D_{CB}$, we look at the absolute frequencies in the profiles we compare. When we compare a profile in one subcorpus to the profile for the same concept in a second subcorpus, we use a Fisher Exact test to check the hypothesis that both samples are drawn from the same pop-
