18.07.2013

Automatic detection of new domain-specific words, using document ...

4.3. To compute a score for a certain domain...

1 count the text tokens that are members of the domain, $t \in D \cap W$

2 weight the count by the size of the domain-specific vocabulary, $v = \frac{1}{\sqrt{|D|}}$

3 weight the score by the number of domains the text token is a member of, $w = \frac{1}{d}$, where $d = \sum_i |t \cap D_i|$

4 consider the number of ‘unknown’ text tokens $u$ (same as $n - k$)

5 consider the number of ‘known’ text tokens $k$ (same as $n - u$)

6 consider the text length $n$ (same as $u + k$), to make the score relative

$$s_D = \frac{1}{n} \cdot \frac{k}{u} \cdot v$$
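The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. `domain_score` is a hypothetical helper. `tokens` stands for the text tokens $W$, and `domains` maps hypothetical domain names to vocabulary sets $D_i$. A token is treated as ‘known’ when it appears in any domain vocabulary, which is an assumption because the slide does not define ‘known’. The score follows the formula $s_D = \frac{1}{n} \cdot \frac{k}{u} \cdot v$. That formula does not show how the domain-token count (step 1) and the per-token weights $w$ (step 3) enter the score, so the sketch only reports them alongside $s_D$.

```python
from math import sqrt


def domain_score(tokens, domains, target):
    """Hypothetical helper computing the quantities from steps 1-6.

    tokens  -- list of text tokens (W)
    domains -- dict mapping a domain name to its vocabulary (a set of words)
    target  -- name of the domain D to score
    """
    vocab = domains[target]
    n = len(tokens)  # step 6: text length

    # Step 1: text tokens that belong to the target domain (t in D ∩ W)
    in_domain = [t for t in tokens if t in vocab]

    # Step 2: vocabulary-size weight v = 1 / sqrt(|D|)
    v = 1 / sqrt(len(vocab))

    # Step 3: per-token weight w = 1 / d, where d = number of domains containing t
    # (d >= 1 here because every token in in_domain is in the target domain)
    w = {t: 1 / sum(t in D for D in domains.values()) for t in set(in_domain)}

    # Steps 4-5 (assumption): a token is 'known' if it occurs in any domain vocabulary
    k = sum(any(t in D for D in domains.values()) for t in tokens)
    u = n - k

    # Formula s_D = (1/n) * (k/u) * v; undefined (NaN) when n or u is 0
    s = (1 / n) * (k / u) * v if n and u else float("nan")
    return {"count": len(in_domain), "v": v, "w": w,
            "k": k, "u": u, "n": n, "s_D": s}
```

Take the text "the patient took one tablet", a ‘med’ vocabulary {dose, tablet, patient} and a ‘law’ vocabulary {patient, court}. Scoring ‘med’ gives a count of 2. The weight $w$ is 0.5 for ‘patient’, which is in two domains, and 1.0 for ‘tablet’. The remaining values are $k = 2$, $u = 3$, $n = 5$ and $s_D = \frac{2}{15\sqrt{3}} \approx 0.077$.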
