20.08.2013 Views

Bernadette Ewen - units.sla.org - Special Libraries Association


Figure 2 shows a schematic diagram of a typical text mining system, as described above.

[Figure: schematic diagram. Inputs, labeled "Unstructured Content": web sites/HTML, news feeds, internal documents, other “raw” data. Core: ClearTags Intelligent Suite (Intelligent Auto-Tagging), comprising semantic tagging, statistical tagging, and structural tagging. Connected through Rich XML/API: corporate databases, file systems, workflow systems, business intelligence suites, external systems integration.]

Figure 2 - Architecture of Text Mining Systems

Figure 3 shows the intelligent tagging component in more detail. Each tagger uses a separate training module that is based on annotated examples; the training modules are discussed in more detail in the following sections. The training module for structural tagging produces document signatures, which are saved and then matched against new documents. The training module for statistical tagging produces a classifier for each category, and the training module for semantic tagging produces information extraction rules based on annotated documents.
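To make the three kinds of training output more concrete, the following is a minimal sketch in Python. It is not the ClearTags implementation, which the text does not describe at this level. All function names, the line-kind signature, the word-frequency profiles, and the single-trigger regex rules are assumptions chosen only to illustrate the idea that each trainer turns annotated examples into a different artifact (signatures, per-category classifiers, extraction rules).

```python
import re
from collections import Counter, defaultdict

# --- Structural: annotated documents -> document signatures (assumed form) ---
def signature(text):
    """Toy signature: the sequence of line kinds in a document."""
    kinds = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped[0].isdigit():
            kinds.append("NUMBERED")
        elif stripped.isupper():
            kinds.append("HEADING")
        else:
            kinds.append("TEXT")
    return tuple(kinds)

def train_structural(annotated):
    """annotated: list of (text, label). Saves one signature per label."""
    return {label: signature(text) for text, label in annotated}

def match_structural(signatures, text):
    """Match a new document against the saved signatures."""
    sig = signature(text)
    return [label for label, known in signatures.items() if known == sig]

# --- Statistical: annotated documents -> one classifier per category ---
def train_statistical(annotated):
    """annotated: list of (text, category). Builds a word profile per category."""
    profiles = defaultdict(Counter)
    for text, category in annotated:
        profiles[category].update(text.lower().split())
    return dict(profiles)

def classify(profiles, text):
    """Pick the category whose profile overlaps most with the text."""
    words = text.lower().split()
    scores = {cat: sum(p[w] for w in words) for cat, p in profiles.items()}
    return max(scores, key=scores.get)

# --- Semantic: annotated documents -> information extraction rules ---
def train_semantic(annotated):
    """annotated: list of (text, annotated_span, label).
    Toy rule: the word before the span triggers capture of capitalized words."""
    rules = []
    for text, span, label in annotated:
        left = text[: text.index(span)].split()
        if not left:
            continue  # no left context to learn a trigger from
        pattern = re.compile(re.escape(left[-1]) + r"\s+((?:[A-Z][a-z]+\s?)+)")
        rules.append((label, pattern))
    return rules

def extract(rules, text):
    """Apply every learned rule to a new document."""
    return [(label, m.group(1).strip())
            for label, pattern in rules
            for m in pattern.finditer(text)]
```

For example, training the semantic sketch on "CEO Jane Smith resigned" with "Jane Smith" annotated as PERSON yields a rule that also extracts "John Doe" from "Yesterday CEO John Doe spoke". A production system would generalize over many annotated examples rather than learn one rule per annotation.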

[Figure: component diagram of the ClearTags Intelligent Suite (Intelligent Auto-Tagging). Labels: Tagging Controller; Rich XML/API; Semantic tagger, Statistical tagger, Structural tagger; Fetcher; Unstructured Content; Rulebooks; Classifiers & Term Extraction; Structural Templates; Semantic Trainer; Clear Lab; Statistical Trainer; Structural Trainer.]

Figure 3 - Detailed Description of the Intelligent Tagging Component
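The labels in Figure 3 suggest a controller that obtains unstructured content through a fetcher, passes it to the three taggers, and exposes the result as XML. The sketch below illustrates that reading in Python under stated assumptions: the function name `run_controller`, the `fetch` callable, the tagger interface (a callable returning `(label, value)` pairs), and the XML element names are all hypothetical and are not taken from the product.

```python
import xml.etree.ElementTree as ET

def run_controller(doc_id, fetch, taggers):
    """Fetch one document, run every tagger on it, and return the tags as XML.

    fetch:   hypothetical callable, doc_id -> text
    taggers: hypothetical mapping, tagger name -> callable(text) -> [(label, value)]
    """
    text = fetch(doc_id)
    root = ET.Element("document", id=doc_id)
    for name, tagger in taggers.items():
        section = ET.SubElement(root, name)  # one element per tagger
        for label, value in tagger(text):
            tag = ET.SubElement(section, "tag", label=label)
            tag.text = value
    return ET.tostring(root, encoding="unicode")

# Usage with stand-in taggers (placeholders for trained components):
store = {"d1": "Yesterday CEO John Doe spoke about falling stocks"}
taggers = {
    "semantic": lambda text: [("PERSON", "John Doe")],
    "statistical": lambda text: [("category", "finance")],
    "structural": lambda text: [],
}
xml_output = run_controller("d1", store.get, taggers)
```

Keeping the taggers behind a common callable interface lets the controller stay unaware of how each tagger was trained, which matches the separation between taggers and trainers shown in the figure.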
