Automatic detection of new domain-specific words, using document ...
• Classification procedure
– Reflect properties of the material to be processed
∗ Token overlap between a text and a domain vocabulary
∗ Size of the domain-specific vocabulary
∗ Uniqueness of a certain type for a particular domain
∗ Ratio between recognised and unrecognised tokens
−→ Other properties, e.g. salience rank?
−→ Consequences of the properties being based on intuition?
−→ Is the quantification appropriate?
Yes, it seems to yield acceptable results
No, it neither explains nor reflects the nature of language
−→ More appropriate classification approaches?
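
The four properties listed above could be quantified roughly as in the following sketch. The slide does not give the actual formulas, so every definition here is an assumption made for illustration: the function name `domain_features`, the choice of type-level overlap, and the reading of "uniqueness" as the share of recognised types found in no other domain vocabulary are all hypothetical.

```python
def domain_features(tokens, domain_vocab, other_vocabs):
    """Illustrative (assumed) measures for one text against one domain vocabulary.

    tokens       -- list of word tokens from the text
    domain_vocab -- set of types known for the domain under test
    other_vocabs -- list of sets, the vocabularies of competing domains
    """
    types = set(tokens)
    recognised_types = types & domain_vocab

    # Token overlap (assumed: share of the text's types found in the vocabulary).
    overlap = len(recognised_types) / len(types) if types else 0.0

    # Size of the domain-specific vocabulary.
    vocab_size = len(domain_vocab)

    # Uniqueness (assumed: share of recognised types absent from all other domains).
    others = set().union(*other_vocabs)
    unique_types = recognised_types - others
    uniqueness = len(unique_types) / len(recognised_types) if recognised_types else 0.0

    # Ratio between recognised and unrecognised tokens (infinite if all are recognised).
    recognised = sum(1 for t in tokens if t in domain_vocab)
    unrecognised = len(tokens) - recognised
    ratio = recognised / unrecognised if unrecognised else float("inf")

    return {
        "token_overlap": overlap,
        "vocab_size": vocab_size,
        "uniqueness": uniqueness,
        "recognised_ratio": ratio,
    }
```

A classifier in this spirit would compute these values for each candidate domain and pick the best-scoring one. How the values are weighted and combined is exactly the open question the slide raises about intuition-based quantification.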