
SEKE 2012 Proceedings - Knowledge Systems Institute


Step 1: To start the process, the user (requirements engineer, requirements analyst, project manager, developer, etc.) enters basic information about the new problem. Figure 1 shows the UML class diagram of the representation of a problem.

Step 2: In the “CBR Component”, the data of the new problem are compared with the Case Base. The attribute-value pairs of the problem object are compared with the cases existing in the Case Base to make an initial selection of similar cases. The similarity between two cases $c_1$ and $c_2$ is calculated by the formula below, based on Wangenheim [7, p. 112], which defines the similarity as the weighted average of the similarities between the values of each index of $c_1$ and $c_2$:

$$\mathrm{Sim}_A(c_1, c_2) = \frac{\sum_{i=1}^{n} (Vs_i \times P_i)}{\sum_{i=1}^{n} P_i} \qquad (1)$$

Where:

• $n$: number of discriminatory attributes of a case (indexes).
• $Vs_i$: similarity between the values of index $i$ in $c_1$ and $c_2$, set by parameter.
• $P_i$: weight of index $i$, set by parameter.

For each index $i$, a weight $P_i$ must be set, which defines the importance of the index in the calculation of similarity between cases, along with a similarity value $Vs_i$ for each pair of possible values of the index. In this work, to define the similarity values between the possible values of each index, requirements engineering specialists from a company of the Brazilian federal government analyzed the level of relationship or dependence between each pair of values. For example, Table 1 presents the similarity values (Vs) defined by the specialists for the index Artifact: the artifact “Use Case” is related, from the highest to the lowest level, to the artifacts “Use Case” (100%), “Business Rule” (80%), “Functional Requirements” (70%), “Non-Functional Requirements” (30%), “Traceability Matrix” (30%) and “Technology” (0%). These values were used in the evaluation of the proposed approach, but they may be customized for each company or situation of use.

TABLE 1. SIMILARITY BETWEEN THE POSSIBLE VALUES OF THE INDEX ARTIFACT.

CU = Use Case | RN = Business Rule | RF = Functional Requirements | RNF = Non-Functional Requirements | DV = Vision Document | MR = Traceability Matrix
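As a rough illustration, the similarity values for one index can be held in a nested lookup table. The sketch below encodes only the “Use Case” row stated in the text, as fractions of 1. The names `ARTIFACT_SIMILARITY` and `value_similarity`, and the default of 0.0 for pairs that are not listed, are assumptions made for this example and are not part of the original approach.

```python
# Similarity values (Vs) for the index Artifact, expressed as fractions of 1.
# Only the "Use Case" row is given in the text; the remaining rows would be
# filled in from Table 1 or customized per company.
ARTIFACT_SIMILARITY = {
    "Use Case": {
        "Use Case": 1.0,
        "Business Rule": 0.8,
        "Functional Requirements": 0.7,
        "Non-Functional Requirements": 0.3,
        "Traceability Matrix": 0.3,
        "Technology": 0.0,
    },
}


def value_similarity(table, value_a, value_b, default=0.0):
    """Return Vs for the pair (value_a, value_b).

    Falls back to `default` when the pair is not listed (an assumption of
    this sketch, not a rule stated in the paper).
    """
    return table.get(value_a, {}).get(value_b, default)


print(value_similarity(ARTIFACT_SIMILARITY, "Use Case", "Business Rule"))
```

Keeping the values in a plain dictionary makes it straightforward to swap in a different table for each company or situation of use.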

In this work, the indexes are artifact, causer and cause. These were chosen because they are the most discriminatory attributes of a case. Finally, a cutoff value must be applied to select similar cases. To illustrate the calculation of similarity, suppose the following values for the indexes of a new problem (case $c_1$): Causer = “Organization” (OR); Artifact = “Business Rule” (RN); Cause = “Wrong Business Rule Specified” (RNEE). Consider the following values for a case $c_2$ existing in the Case Base: Causer = “Process” (PR); Artifact = “Traceability Matrix” (MR); Cause = “Wrong Traceability Matrix Specified” (MREE). Applying these values in formula (1), with weights of 1, 3 and 5 for causer, artifact and cause respectively, we have the following:

$$\mathrm{Sim}_A(c_1, c_2) = \frac{(Vs_{OR,PR} \times 1) + (Vs_{RN,MR} \times 3) + (Vs_{RNEE,MREE} \times 5)}{9}$$

$$\mathrm{Sim}_A(c_1, c_2) = \frac{(0.5 \times 1) + (0.3 \times 3) + (0 \times 5)}{9} = \frac{1.4}{9} \approx 0.156$$
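The calculation above can be checked with a few lines of Python. This is a minimal sketch of formula (1), assuming the per-index similarities and weights are passed as parallel lists; the function name `weighted_similarity` is hypothetical.

```python
def weighted_similarity(value_sims, weights):
    """Formula (1): sum(Vs_i * P_i) / sum(P_i) over the n indexes."""
    if len(value_sims) != len(weights):
        raise ValueError("value_sims and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    weighted_sum = sum(vs * p for vs, p in zip(value_sims, weights))
    return weighted_sum / total_weight


# Worked example from the text, in the order causer, artifact, cause.
value_sims = [0.5, 0.3, 0.0]  # Vs(OR, PR), Vs(RN, MR), Vs(RNEE, MREE)
weights = [1, 3, 5]           # P_i for causer, artifact, cause

similarity = weighted_similarity(value_sims, weights)
print(round(similarity, 3))  # 0.156
```

Because the result is a weighted average of values in [0, 1], it also stays in [0, 1], which makes it directly comparable against a cutoff.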

Step 3: If no similar case is returned in step 2, the input case is a new case and will be stored in the Case Base.
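Steps 2 and 3 together amount to a filter followed by a fallback. The sketch below is one possible reading, not the authors' implementation: the cutoff of 0.5, the use of "greater than or equal to" for the comparison, the case identifiers and the list-based Case Base are all assumptions made for illustration.

```python
def select_similar(scored_cases, cutoff):
    """Keep (case_id, similarity) pairs at or above the cutoff, best first."""
    kept = [(case_id, sim) for case_id, sim in scored_cases if sim >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def retrieve_or_retain(new_case_id, scored_cases, cutoff, case_base):
    """Return the initially similar cases; store the input case if none exist."""
    similar = select_similar(scored_cases, cutoff)
    if not similar:
        # Step 3: nothing similar was found, so the input is a new case.
        case_base.append(new_case_id)
    return similar


# Hypothetical similarities of a new problem against three stored cases.
case_base = ["case-1", "case-2", "case-3"]
scores = [("case-1", 0.16), ("case-2", 0.72), ("case-3", 0.55)]

similar = retrieve_or_retain("case-4", scores, cutoff=0.5, case_base=case_base)
print(similar)  # [('case-2', 0.72), ('case-3', 0.55)]
```

With a stricter cutoff such as 0.9, no case would pass the filter and `case-4` would be appended to the Case Base instead.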

Step 4: In this step, the initially similar cases are refined using NLP techniques. The “NLP Component” selects the cases for reuse based on the number of words in common between the textual description of the input problem and the descriptions of the cases pre-selected in step 2. For this, the Vector Space Model technique and morphosyntactic analysis are applied to the texts in question. In detail, the frequency of each word of the “Noun” and “Verb” word classes contained in the texts is calculated, and the cases with m (defined by parameter) words in common with the input case are selected for reuse. The word classes “Noun” and “Verb” were chosen because they express things and actions, respectively, and are therefore the words that matter most to the semantic value of the text [8]. For example, Figure 3 shows the morphosyntactic analysis [9] of the descriptive text of one of the cases from the Case Base (pre-selected in step 2). Considering the description of the input problem “Search functionality defined with wrong fields.” [Funcionalidade de pesquisa definida com campos errados], the words (nouns and verbs in Portuguese) of the input case are {funcionalidade, pesquisa, campo}, all with a frequency of 1. In the pre-selected case (Figure 3), the words are {campo, pesquisa}, all with a frequency of 1. Two words were therefore identified as being in common between the cases.

FIGURE 3. MORPHOSYNTACTIC ANALYSIS OF THE DESCRIPTION OF A PRE-SELECTED CASE.
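The word comparison in step 4 can be sketched as follows. The sketch assumes the morphosyntactic analysis has already produced (lemma, word class) pairs; the pairs below are hand-written to reproduce the word sets reported in the text and are not the output of a real tagger. The tag names, the function names and the reading of m as a minimum (here m = 2) are likewise assumptions.

```python
from collections import Counter


def content_word_counts(tagged_lemmas, classes=("NOUN", "VERB")):
    """Count the frequency of each lemma whose word class is in `classes`."""
    return Counter(lemma for lemma, tag in tagged_lemmas if tag in classes)


def words_in_common(counts_a, counts_b):
    """Return the set of words that occur in both frequency tables."""
    return set(counts_a) & set(counts_b)


# Hand-tagged lemmas mirroring the example in the text (hypothetical tags).
input_case = [
    ("funcionalidade", "NOUN"),
    ("pesquisa", "NOUN"),
    ("campo", "NOUN"),
    ("errado", "ADJ"),  # filtered out: neither noun nor verb
]
preselected_case = [("campo", "NOUN"), ("pesquisa", "NOUN")]

input_counts = content_word_counts(input_case)
case_counts = content_word_counts(preselected_case)
common = words_in_common(input_counts, case_counts)

m = 2  # hypothetical threshold, read here as "at least m words in common"
print(sorted(common), len(common) >= m)  # ['campo', 'pesquisa'] True
```

The frequency tables are kept as `Counter` objects so that they could later serve as term-frequency vectors in a Vector Space Model comparison, although this sketch only uses their keys.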
