
SEKE 2012 Proceedings - Knowledge Systems Institute


V. APPLICATION OF GENETIC ALGORITHM

As previously mentioned, in practice the class diagrams in the repository usually have a different number of classes than the query class diagram. While discussing the Name Similarity and Structure Similarity measures in the last section, we assumed that the class diagrams have an equal number of classes. In this section we explain how GA can be used to select an equal number of classes from both diagrams. Wang et al. [14] used GA to match two graphs having the same number of nodes. In contrast, our method can match graphs having different numbers of nodes.

A. Chromosome Design

Let A and B be two class diagrams having na and nb classes respectively, such that na ≤ nb. The task of choosing how all the na classes of A will be mapped to na classes in B is a combinatorial optimization problem. GA has been used to solve combinatorial optimization problems such as timetabling, scheduling, the Travelling Salesman Problem, the Eight Queens problem and so on. Fig. 2 shows a suitable encoding of a chromosome to determine the mapping of all the na classes from A to na classes in B.

Each gene in the chromosome represents a class number in B. For example, from Fig. 2 we observe that the 1st class in A is mapped to the 5th class in B, the 2nd class in A is mapped to the nb-th class in B, the 3rd class in A is mapped to the na-th class in B and so on. In this way, we map the classes in A to a subset of the classes in B. The selected classes in B maintain their class relationships only if both classes involved in the relationship were chosen as part of the chromosome.
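The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: classes are numbered from 0 rather than 1, the genes of a chromosome are taken to be distinct (so that the classes of A map to a subset of B), and the names `random_chromosome` and `induced_relations` are hypothetical helpers, not part of the proposed method.

```python
import random


def random_chromosome(na, nb, rng=random):
    """Return na distinct class indices of B (0-based), one gene per class of A.

    Gene i holds the class of B that class i of A is mapped to.
    Distinct genes are an assumption of this sketch.
    """
    if na > nb:
        raise ValueError("expected na <= nb")
    return rng.sample(range(nb), na)


def induced_relations(chromosome, relations_b):
    """Keep only the relationships of B whose two classes were both selected.

    relations_b maps (source, target) class indices of B to a relationship
    label. The result is re-indexed by gene position, i.e. by class of A.
    """
    position = {b_class: i for i, b_class in enumerate(chromosome)}
    return {
        (position[s], position[t]): label
        for (s, t), label in relations_b.items()
        if s in position and t in position
    }


# Example: B has 5 classes, A has 3. The relationship (1, 3) is dropped
# because class 3 of B is not part of the chromosome.
chromosome = [4, 1, 2]
kept = induced_relations(chromosome, {(4, 1): "AS", (1, 3): "GE", (2, 4): "DE"})
```

Re-indexing the surviving relationships by gene position makes them directly comparable with the relationships of the query diagram, which is what a fitness evaluation needs.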

B. Fitness Function

We use the similarity measure S given in equation 3 as our fitness function. Since the value of S always ranges from 0 to 1 (where 0 implies the highest level of similarity), successive generations of GA should produce lower values of S than previous generations. This results in GA selecting (near) optimal mappings from classes in the query class diagram to those in repository class diagrams.
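To make the "lower is better" convention concrete, the following Python sketch evaluates one chromosome against a query diagram. It is not equation 3, which is defined in an earlier section. It is a hypothetical stand-in that averages a relationship difference over all ordered pairs of classes. The function name, the dictionary representation of relationships, and the averaging scheme are all assumptions of this sketch.

```python
from itertools import permutations


def structure_difference(query_rel, repo_rel, chromosome, diff, no_relation="NO"):
    """Hypothetical stand-in for the fitness S: 0 means an exact structural match.

    query_rel / repo_rel map ordered (source, target) class-index pairs to a
    relationship label; pairs that are absent count as `no_relation`.
    chromosome[i] is the class of B that class i of A is mapped to.
    diff(a, b) must return a value in [0, 1].
    """
    pairs = list(permutations(range(len(chromosome)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        q = query_rel.get((i, j), no_relation)
        r = repo_rel.get((chromosome[i], chromosome[j]), no_relation)
        total += diff(q, r)
    return total / len(pairs)


def toy_diff(a, b):
    """Toy difference used only for this example: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


query = {(0, 1): "GE"}
repo = {(2, 0): "GE", (0, 1): "AS"}
exact = structure_difference(query, repo, [2, 0], toy_diff)
worse = structure_difference(query, repo, [0, 1], toy_diff)
```

Here the mapping `[2, 0]` reproduces the query's generalization exactly and scores 0, while `[0, 1]` pairs it with an association and scores higher, so a minimizing GA would prefer the first chromosome.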

VI. EXPERIMENTS

In this section, we present the results of experiments carried out using our proposed method. Our experiments focused mainly on Structure Similarity (SS), hence the weight in equation 3 was set to 1. Our objective was to determine the efficiency of the proposed method in retrieving matching class diagrams from the repository. Fig. 3 shows two class diagrams: a query class diagram Q and a class diagram R from the repository. As shown in the figure, Q is isomorphic to a subgraph of R.

We used the proposed method to determine how many times the classes in Q were correctly mapped to the classes in R. Maximum similarity is obtained when the value of the fitness function is zero, as described in Section V. Fig. 4 shows the mean and standard deviation of the fitness value over 500 successive generations. The experiment was repeated 100 times. After a few generations, the mean and standard deviation of the fitness function stabilize, indicating that there is no additional benefit in running GA further.

TABLE II. DIFF MATRIX

      AS    AG    CO    DE    GE    RE    IR    NO
AS    0     0.11  0.11  0.45  0.45  0.66  0.77  1
AG    0.11  0     0.11  0.45  0.45  0.66  0.77  1
CO    0.11  0.11  0     0.45  0.45  0.66  0.77  1
DE    0.49  0.49  0.49  0     0.28  0.21  0.32  1
GE    0.49  0.49  0.49  0.28  0     0.49  0.6   1
RE    0.83  0.83  0.83  0.34  0.62  0     0.11  1
IR    1     1     1     0.51  0.79  0.17  0     1
NO    1     1     1     1     1     1     1     0

AS = ASSOCIATION, AG = AGGREGATION, CO = COMPOSITION, DE = DEPENDENCY, GE = GENERALIZATION, RE = REALIZATION, IR = INTERFACE REALIZATION, NO = NO RELATION
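For readers who want to experiment with the Diff values, Table II can be transcribed into a lookup structure as in the Python sketch below. The values are copied from the table. The names `KINDS`, `ROWS`, `DIFF` and `diff` are hypothetical, and which of the two diagrams the row and the column refer to is not stated in this section, so the sketch simply speaks of a row kind and a column kind.

```python
# Relationship kinds in the row/column order of Table II.
KINDS = ["AS", "AG", "CO", "DE", "GE", "RE", "IR", "NO"]

# Rows of Table II, copied as printed.
ROWS = [
    [0,    0.11, 0.11, 0.45, 0.45, 0.66, 0.77, 1],  # AS
    [0.11, 0,    0.11, 0.45, 0.45, 0.66, 0.77, 1],  # AG
    [0.11, 0.11, 0,    0.45, 0.45, 0.66, 0.77, 1],  # CO
    [0.49, 0.49, 0.49, 0,    0.28, 0.21, 0.32, 1],  # DE
    [0.49, 0.49, 0.49, 0.28, 0,    0.49, 0.6,  1],  # GE
    [0.83, 0.83, 0.83, 0.34, 0.62, 0,    0.11, 1],  # RE
    [1,    1,    1,    0.51, 0.79, 0.17, 0,    1],  # IR
    [1,    1,    1,    1,    1,    1,    1,    0],  # NO
]

# Flatten into a dictionary keyed by (row kind, column kind).
DIFF = {
    (row_kind, col_kind): value
    for row_kind, row in zip(KINDS, ROWS)
    for col_kind, value in zip(KINDS, row)
}


def diff(row_kind, col_kind):
    """Look up the Table II entry for a (row, column) pair of relationship kinds."""
    return DIFF[(row_kind, col_kind)]
```

Note that the matrix is not symmetric: for example, `diff("AS", "DE")` is 0.45 while `diff("DE", "AS")` is 0.49, so the order of the two arguments matters.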

In addition, the mean value of the fitness function is often sufficiently close to zero because the proposed algorithm finds exact matches most of the time. However, the standard deviation of the fitness value is higher than the mean in most cases. This is because the algorithm usually finds optimal solutions, but obtains near-optimal solutions at other times.
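A small numerical illustration shows why a standard deviation above the mean is expected when most runs reach zero. The figures below are made up for illustration only and are not experimental results: 90 of 100 runs are assumed to end with fitness 0 and the remaining 10 with fitness 0.1. The use of the population standard deviation is also an assumption.

```python
from statistics import fmean, pstdev

# Hypothetical final fitness values of 100 runs (illustrative, not measured):
# 90 exact matches and 10 near-optimal results.
fitness_values = [0.0] * 90 + [0.1] * 10

mean_fitness = fmean(fitness_values)   # approximately 0.01
std_fitness = pstdev(fitness_values)   # approximately 0.03
```

With these assumed values the mean is about 0.01 while the standard deviation is about 0.03, since the variance is 0.1 × 0.1² − 0.01² = 0.0009. A few non-zero results in a mostly zero sample are enough to push the standard deviation above the mean.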

In another set of experiments, we studied the impact of using GA in our algorithm. We replaced the GA component of our algorithm with a random matching of query class diagrams to repository class diagrams. The experiment was repeated 1,000 times. For the GA-based experiments, there were 100 individuals in each generation, and the maximum number of generations was 50. Thus, the maximum number of iterations was 5,000 across all generations. In the case of the experiments based on random matching of classes, the matchings were based on randomly generated permutations. The random permutations were generated and tested up to 5,000 times. As in the case of GA, the search was abandoned as soon as an optimal matching was found. The results shown in Fig. 5 indicate that our GA-based algorithm usually finds optimal class matchings in fewer iterations than random matching of classes. The figure shows that in 700 out of the 1,000 cases, our GA-based algorithm finds optimal matchings after a few generations (1,500 iterations or alternatively 15 generations). However, 30% of the time, GA executes all the 5,000 iterations (50 generations) and determines near-optimal class matchings.
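The random-matching baseline described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the function name `random_matching`, the 0-based class indices, and the toy fitness function in the usage example are assumptions. The budget of 5,000 evaluations corresponds to the GA budget of 100 individuals × 50 generations.

```python
import random


def random_matching(na, nb, fitness, budget=5000, rng=random):
    """Test random mappings of na classes of A onto distinct classes of B.

    Stops as soon as an exact match (fitness 0) is found or the budget of
    evaluations is spent. Returns (best_mapping, best_fitness, evaluations).
    """
    best, best_fit = None, float("inf")
    for used in range(1, budget + 1):
        candidate = rng.sample(range(nb), na)  # random partial permutation
        fit = fitness(candidate)
        if fit < best_fit:
            best, best_fit = candidate, fit
        if best_fit == 0:
            return best, best_fit, used
    return best, best_fit, budget


# Usage with a toy fitness: the fraction of genes that differ from a
# known target mapping (hypothetical, for demonstration only).
target = [2, 0, 3]


def toy_fitness(candidate):
    return sum(c != t for c, t in zip(candidate, target)) / len(target)


best, best_fit, used = random_matching(3, 4, toy_fitness, rng=random.Random(1))
```

Counting evaluations rather than generations puts both approaches on the same scale, so the 1,500 iterations reported for GA (15 generations of 100 individuals) can be compared directly with the number of random permutations tested.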

VII. CONCLUSION AND FUTURE WORK

We have described a method for retrieving UML class diagrams from a software repository by using GA. Only a few experiments were carried out to measure the effectiveness of the proposed algorithm in detecting structural similarity. No experiment was carried out regarding name similarity. Thus, it is necessary to perform many more experiments to evaluate the performance of the proposed method in terms of precision, recall, execution time and so on.

We have considered only class names and class topology. In the future, we hope to include class attributes and operations in the similarity measure. In addition, the technique will be extended to determine similarity measures of other UML diagrams such as sequence diagrams and state chart diagrams. The development of a tool to integrate our proposed technique
