27.03.2014 Views

SEKE 2012 Proceedings - Knowledge Systems Institute

IV. SCATTER

To enhance pattern capitalization and reuse, a first step was to build a unified conceptualization of process patterns through the proposed ontology MetaProPOS, as stated earlier. The next step is to show how this goal can be fulfilled through the proposed approach, SCATTER, which, as its name implies, aims to improve the dissemination of process pattern knowledge by means of a formal and semantic technique for pattern capitalization and reuse. As illustrated in Fig. 5, SCATTER comprises two major processes:

A. Terminological, Semantic and Architectural Unification Process

Given different collections of process patterns forming a pattern corpus, this main process consists of a triple unification effort.

1) Terminological unification: It aims to map the terms used as labels in a given pattern to the corresponding meta labels in the proposed meta model. This is ensured by means of MetaProPOS on the one hand and a text mining tool on the other. The purpose of this phase is to recognize the terminology employed by a given pattern through key term extraction. The result of the unification is an annotated pattern in XML format. To reach this target, we adopted a text mining approach. Several tools are already well recognized in Natural Language Processing (NLP), namely OpenNLP [25], UIMA (Unstructured Information Management Architecture) [26] and GATE (General Architecture for Text Engineering) [27]. Thus, we do not need to reinvent the wheel by building one from scratch. Among these tools, we chose to reuse GATE since it is open source, very well documented, and used in both research and industry. The terminological unification phase is performed using GATE's open source information extraction component, ANNIE (A Nearly-New Information Extraction system) [28].
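The mapping from source labels to meta labels, and the XML-annotated result, can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the `META_LABELS` table, the `pattern` and `facet` element names, and the sample pattern are hypothetical, and do not reproduce the MetaProPOS vocabulary or the annotation format produced by GATE.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table: source label (lower-cased) -> meta label.
# The real meta labels come from MetaProPOS and are not listed here.
META_LABELS = {
    "intent": "problem",
    "problem": "problem",
    "initial context": "context",
    "context": "context",
    "solution": "solution",
    "process": "solution",
    "related patterns": "relationship",
}


def unify_labels(pattern_name, sections):
    """Build an XML element whose children carry the unified meta labels."""
    root = ET.Element("pattern", name=pattern_name)
    for label, text in sections.items():
        # Labels missing from the table are kept but flagged as unknown.
        meta = META_LABELS.get(label.strip().lower(), "unknown")
        facet = ET.SubElement(root, "facet", metaLabel=meta, sourceLabel=label)
        facet.text = text
    return root


# Hypothetical pattern description with two labelled sections.
annotated = unify_labels(
    "Technical Review",
    {"Intent": "Validate a deliverable.", "Related Patterns": "Inspection"},
)
xml_text = ET.tostring(annotated, encoding="unicode")
print(xml_text)
```

A lookup table is the simplest possible stand-in for the ontology: it makes the idea of "one meta label for many source labels" visible, while the actual approach relies on MetaProPOS and text mining rather than a fixed dictionary.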

In addition to configuring GATE, we considerably extended ANNIE by adding Gazetteer lists such as “problem.lst”, “context.lst” and “relationship.lst”, among others, to help ANNIE recognize the key terms and concepts used in pattern descriptions. These lists are necessary but not sufficient to detect pattern segments. For this reason, we added JAPE rules such as “identification.jape”, which captures the identification facet in a pattern description, and “guidance.jape”, which identifies the pattern's guidelines, among others.
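The interplay between term lists and rules can be sketched as follows. This is not GATE, ANNIE or JAPE code: it is a plain Python approximation in which the list contents, the `lookup` and `segment_headers` helpers, and the "term at line start followed by a colon" rule are all assumptions made for illustration.

```python
import re

# Hypothetical contents for lists named after those mentioned in the text.
GAZETTEER = {
    "problem": ["problem", "intent", "motivation"],
    "context": ["context", "applicability", "initial context"],
    "relationship": ["related patterns", "uses", "refines"],
}


def lookup(text):
    """Return (start, end, list_name, matched_text) tuples sorted by start offset."""
    found = []
    for list_name, entries in GAZETTEER.items():
        for entry in entries:
            pattern = r"\b" + re.escape(entry) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((m.start(), m.end(), list_name, m.group(0)))
    return sorted(found)


def segment_headers(text):
    """Rule-like step: a listed term at the start of a line, followed by ':', opens a segment."""
    segments = []
    for line in text.splitlines():
        hits = [h for h in lookup(line) if h[0] == 0 and line[h[1]:h[1] + 1] == ":"]
        if hits:
            # When several entries match at the line start, keep the longest one.
            _, end, list_name, _ = max(hits, key=lambda h: h[1])
            segments.append((list_name, line[end + 1:].strip()))
    return segments


sample = "Initial context: a deliverable is ready.\nProblem: defects must be found early."
segments = segment_headers(sample)
print(segments)
```

The sketch shows why lists alone fall short: `lookup` finds every occurrence of a listed term, including those in running text, and only the additional rule decides which occurrences actually delimit a pattern segment.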

As illustrated in Fig. 6, we first used the Sentence Splitter and the Tokeniser to perform a morphological analysis of unstructured patterns, allowing the extraction of sentences and basic entities. Next, the POS Tagger is run to associate a grammatical category with each token, thus allowing the recognition of various entities. The last step consists of tagging patterns by extracting pattern concepts.

Figure 5. Functional architecture of SCATTER. [Figure labels: patterns' corpus (Ambler, PPDL, P-Sigma, PROPEL, Störrle, UML-PP, Gnatz, RHODES, PROMENADE and PLMLx patterns); MetaProPOS ontology; WordNet; WolF; terminological, semantic and architectural unifications; converted patterns in unified form (format: .owl; unified architecture + unified terminology + semantic annotations); Patterns Warehouse (patterns + annotations); Patterns Mining; similar patterns, mega patterns and anti-patterns; implementation notes: Java code + JENA + ANNIE + JAPE rules, and Java code + mining tool.]

This step builds on the previously generated results and uses the NE Transducer to extract the named entities of a pattern (problem, context, solution, relationship, etc.) through the new JAPE rules and Gazetteer lists. The components involved in these steps are described as follows:

a) Tokeniser: This component identifies the various tokens in text documents (punctuation, numbers, symbols and words of different types). It applies basic rules to the input text to identify textual objects.
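The idea of applying basic rules to split text into typed tokens can be sketched with a small regular-expression tokeniser. This is an illustrative stand-in, not the GATE Tokeniser: the four token kinds, the `TOKEN_RULES` table and the `tokenise` helper are assumptions chosen for the example.

```python
import re

# Ordered rules: the first alternative that matches at a position wins.
TOKEN_RULES = [
    ("number", r"\d+(?:\.\d+)?"),
    ("word", r"[^\W\d_]+"),
    ("punctuation", r"[.,;:!?()\[\]-]"),
    ("symbol", r"[^\w\s]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in TOKEN_RULES))


def tokenise(text):
    """Return (kind, text, start, end) tuples; whitespace is skipped."""
    return [
        (m.lastgroup, m.group(0), m.start(), m.end())
        for m in TOKEN_RE.finditer(text)
    ]


tokens = tokenise("Review 3 artifacts, cost < 10%.")
for kind, value, start, end in tokens:
    print(f"{kind:12} {value!r} [{start}:{end}]")
```

Keeping the character offsets with each token mirrors the way later components, such as a POS tagger or an entity extractor, refer back to positions in the original text rather than to copies of it.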
