Automatic detection of new domain-specific words, using document ...
4.3. To compute a score for a certain domain...
1. count the text tokens that are members of the domain: $t \in D \cap W$
2. weight the count by the size of the domain-specific vocabulary: $v = \frac{1}{\sqrt{|D|}}$
3. weight the score by the number of domains the text token is a member of: $w = \frac{1}{d}$, where $d = \sum_i |t \cap D_i|$
4. consider the number of 'unknown' text tokens $u$ (same as $n - k$)
5. consider the number of 'known' text tokens $k$ (same as $n - u$)
6. consider the text length $n$ (to make the score relative; same as $u + k$)
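The six steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function name `domain_score_terms`, the representation of the vocabularies $D_i$ as a dict of sets, and the rule that a token counts as 'known' when it occurs in at least one domain vocabulary are all assumptions.

```python
import math


def domain_score_terms(tokens, domains, target):
    """Compute the per-step quantities for one target domain (hypothetical helper).

    tokens  : list of text tokens (the text W, repeats kept)
    domains : dict mapping domain name -> set of words (the vocabularies D_i)
    target  : name of the domain D being scored

    ASSUMPTION: a token is 'known' if it belongs to at least one domain
    vocabulary, and 'unknown' otherwise.
    """
    D = domains[target]
    vocabularies = list(domains.values())

    # Step 1: text tokens that are members of the domain (t in D ∩ W)
    matched = [t for t in tokens if t in D]

    # Step 2: vocabulary-size weight v = 1 / sqrt(|D|)
    v = 1 / math.sqrt(len(D))

    # Step 3: per-token weight w = 1 / d, where d counts the domains containing t
    w = {t: 1 / sum(t in Di for Di in vocabularies) for t in set(matched)}

    # Steps 4-6: known (k), unknown (u) and total (n) token counts
    n = len(tokens)
    k = sum(any(t in Di for Di in vocabularies) for t in tokens)
    u = n - k
    return matched, v, w, k, u, n


# Usage with two hypothetical vocabularies
domains = {"med": {"dose", "patient", "scan"}, "it": {"scan", "server"}}
tokens = ["the", "patient", "needs", "a", "scan"]
matched, v, w, k, u, n = domain_score_terms(tokens, domains, "med")
```

Here the matched tokens are `patient` and `scan`, $v = 1/\sqrt{3}$, $w$ is 1 for `patient` and 0.5 for `scan` (which occurs in both vocabularies), and $k = 2$, $u = 3$, $n = 5$.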
$s_D = \frac{1}{k} \cdot \frac{n}{u}$
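Only the factor $\frac{1}{k} \cdot \frac{n}{u}$ of $s_D$ is legible in this extraction. The sketch below combines it with the weighted count from steps 1 to 3 by multiplying it by $\sum_{t \in D \cap W} v \cdot w$. That combination, the hypothetical name `domain_score`, the definition of 'known' tokens (present in at least one domain vocabulary), and returning 0.0 when $k$ or $u$ is zero are assumptions, not the slide's formula.

```python
import math


def domain_score(tokens, domains, target):
    """Hypothetical sketch of s_D for one target domain.

    ASSUMPTIONS: s_D = (1/k) * (n/u) * sum over t in D ∩ W of v * w,
    and a token is 'known' if it occurs in at least one domain vocabulary.
    """
    D = domains[target]
    vocabularies = list(domains.values())

    n = len(tokens)                                               # text length
    k = sum(any(t in Di for Di in vocabularies) for t in tokens)  # known tokens
    u = n - k                                                     # unknown tokens
    if k == 0 or u == 0:
        # (1/k) * (n/u) is undefined here; returning 0.0 is a convention chosen for this sketch
        return 0.0

    v = 1 / math.sqrt(len(D))  # vocabulary-size weight
    # Each matched token contributes v * w, with w = 1 / (number of domains containing it)
    weighted_count = sum(
        v / sum(t in Di for Di in vocabularies) for t in tokens if t in D
    )
    return (1 / k) * (n / u) * weighted_count


domains = {"med": {"dose", "patient", "scan"}, "it": {"scan", "server"}}
score = domain_score(["the", "patient", "needs", "a", "scan"], domains, "med")
```

With these hypothetical vocabularies, the text scores $\frac{1}{2} \cdot \frac{5}{3} \cdot \frac{1.5}{\sqrt{3}} \approx 0.722$ for `med`.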