Automatic detection of new domain-specific words, using document ...
2. Data: the DDO Corpus
The Corpus of the Danish Dictionary (DDOC):
• 43,000 text samples totalling 40 million words
• Compiled by DSL, 1991-93
• Broad coverage of the Danish language from 1983 to 1992
• 88.6% of the text samples are assigned to one of 66 domains
From this material, we will