SEKE 2012 Proceedings - Knowledge Systems Institute

three different learning techniques: K-Nearest Neighbor (K-NN), Naive Bayes, and Support Vector Machine (SVM). In our experiment, we adopt the RapidMiner tool (http://rapidi.com/content/view/181/190) to perform document classification.
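The paper performs this classification in RapidMiner; as a minimal self-contained sketch of one of the three techniques, the toy K-NN document classifier below uses bag-of-words vectors and cosine similarity (all function names and training documents are illustrative, not from the paper's data set):

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, labeled_docs, k=3):
    """Label the query with the majority label of its k nearest documents."""
    q = bow(query)
    ranked = sorted(labeled_docs, key=lambda d: cosine(q, bow(d[0])), reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical positive/negative training documents (not from the data set).
train = [
    ("programming languages compilers algorithms", "pos"),
    ("data structures software computer science", "pos"),
    ("circuits signals electrical power systems", "neg"),
    ("electromagnetics power electronics circuits", "neg"),
    ("operating systems programming computer", "pos"),
]

print(knn_classify("computer science programming", train))  # -> pos
```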

V. DATA SET

Our data set consists of structured hierarchy ontologies of course syllabi of three universities: Cornell University, the University of Michigan, and the University of Washington. In order to test our approach, we use three MASs as teacher agents: Ag_C, Ag_M, and Ag_W. Each MAS controls a repository that contains one of the three ontologies for the course syllabi of the three universities. We also set up a learner agent Ag_L to learn some new concepts from the three teacher agents at the same time. We choose to learn the concept “computer science” because it has different representations among the three ontologies. The University of Michigan organizes “computer science” as an engineering discipline and as a joint program with Electrical Engineering. The University of Washington also organizes “computer science” as an engineering discipline, but independent from Electrical Engineering and as a joint program with Computer Engineering. Cornell University considers “computer science” an engineering discipline independent from both Electrical and Computer Engineering.

VI. EXPERIMENT

In this experiment, we want to show how using social networks can improve the accuracy of the learning process. We follow the same strategy proposed in [17] for choosing positive and negative examples. This strategy depends on the value of sim(q_spec, C_best), where sim(q_spec, C_best) is the ratio between the number of examples that match the entered query and the total number of examples of the chosen concept C_best. If sim(q_spec, C_best) is greater than a specific threshold value (we set the threshold to 0.6), this concept completely represents the concept to be learned. In this case, we can use any of its examples as positive examples, and the negative examples can be chosen from its siblings (external negative examples). The siblings’ examples are a good source of negative examples because the two concepts have the same parent, which means they share some common features; at the same time, they are two different concepts, each with its own set of examples, so the siblings’ examples are discriminating examples. If sim(q_spec, C_best) is smaller than the chosen threshold value (i.e., 0.6), the concept C_new only overlaps with the concept C_best, meaning that some of the examples of C_best reflect the meaning of C_goal but the others do not. In this case, only the returned documents are used as positive examples of C_new, and the negative examples can be chosen from the rest of C_best’s examples (internal negative examples).
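The example-selection strategy above can be sketched as follows (a simplified illustration; the function and argument names are hypothetical, and real example sets are document collections rather than short strings):

```python
def choose_examples(query_matches, concept_docs, sibling_docs, threshold=0.6):
    """Pick positive/negative training examples for a target concept,
    following the external/internal strategy of [17] as described above."""
    # sim(q_spec, C_best): fraction of C_best's docs that match the query.
    sim = len(query_matches) / len(concept_docs)
    if sim > threshold:
        # C_best fully represents the target: any of its docs are positive,
        # and sibling concepts supply discriminating (external) negatives.
        positives = list(concept_docs)
        negatives = list(sibling_docs)
    else:
        # C_best only overlaps the target: only matching docs are positive,
        # and the remaining docs of C_best serve as internal negatives.
        positives = list(query_matches)
        negatives = [d for d in concept_docs if d not in query_matches]
    return sim, positives, negatives
```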

The scenario of our experiment is as follows:

1. No “computer science” concept is defined in the learner agent Ag_L. The learner agent Ag_L therefore uses only keywords to search for the best matching concept C_best in the teacher agents’ ontologies. The keywords used are “(computer science) or (program language)”. At the beginning, the strengths of ties between Ag_L and all teacher agents (Ag_C, Ag_M, Ag_W) are the same.

2. Each teacher agent searches its ontology for the best matching concept C_best, i.e., the concept with the highest value of sim(q_spec, C_best). Depending on this value, each teacher agent picks sets of positive and negative examples.

3. Each teacher agent sends its own sets of positive and negative examples to the learner agent Ag_L.

4. Ag_L develops a new concept “computer science” in its ontology based on these examples using machine learning methods, and then calculates the accuracy of the learning process in each case.

5. Calculate the feature vector of the newly learned concept “computer science” in Ag_L.

6. Calculate the closeness between the feature vector created for the newly learned concept and the feature vector of each C_best chosen by each teacher agent.

7. Depending on the closeness values calculated in step 6, set the initial tie strengths between the learner agent Ag_L and the teacher agents Ag_C, Ag_M, and Ag_W.

8. Repeat the learning process (steps 2 to 5) using the same keywords as before as well as the feature vector created in step 5, based on the tie strengths calculated in step 7.
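Steps 6 and 7 can be sketched as below, assuming cosine similarity as the closeness measure and sum-to-one normalization of the tie strengths (both details are our assumptions; the paper does not fix them here):

```python
import math

def closeness(u, v):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def initial_tie_strengths(learned_vec, teacher_vecs):
    """Steps 6-7: each teacher's tie strength reflects how close its C_best
    feature vector is to the newly learned concept's feature vector
    (normalized so the strengths sum to 1; the normalization is assumed)."""
    raw = {name: closeness(learned_vec, vec) for name, vec in teacher_vecs.items()}
    total = sum(raw.values()) or 1.0
    return {name: c / total for name, c in raw.items()}
```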

In order to measure the accuracy of the learned concept, we use the confusion matrix to measure the proportion of true results (i.e., true positives and true negatives, as opposed to false positives and false negatives). The overall accuracy is calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
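Equation (1) translates directly to code (TP, TN, FP, FN are the standard confusion-matrix counts):

```python
def overall_accuracy(tp, tn, fp, fn):
    """Equation (1): proportion of true results among all classifications."""
    return (tp + tn) / (tp + tn + fp + fn)

print(overall_accuracy(40, 45, 10, 5))  # -> 0.85
```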

VII. RESULTS

The first step of our experiment in learning the new concept “computer science” is to find the best known concept C_best in the teacher agents’ ontologies. At the beginning, the learner agent Ag_L does not have the concept “computer science” in its ontology, so we use only keywords to learn this concept. Our keywords that describe the “computer science” concept are (“computer science” or “program language”). From the ratio between the number of documents returned by the search and the total number of documents describing each concept, we can calculate sim(q_spec, C_best) for all concepts in each teacher agent’s ontology. The concept with the highest value of sim(q_spec, C_best) is chosen as the best matching concept C_best for each teacher. Table I shows the value of sim(q_spec, C_best) of the chosen concepts C_best from each university.
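The per-teacher choice of C_best can be sketched as follows (names and counts are hypothetical; each teacher agent runs this over its own ontology):

```python
def best_matching_concept(query_hits, concept_sizes):
    """Pick C_best: the concept with the highest sim(q_spec, C) ratio.
    query_hits[c]    -> number of documents of concept c returned by the query
    concept_sizes[c] -> total number of documents describing concept c"""
    sims = {c: query_hits[c] / concept_sizes[c] for c in concept_sizes}
    best = max(sims, key=sims.get)
    return best, sims[best]
```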

We choose the following concepts:
“Computer Science E” from Cornell University,
“Electrical Engineering and Computer Science” from the University of Michigan, and
“Computer Science and Engineering” from the University of Washington.

For all universities, sim(q_spec, C_best) is smaller than the chosen threshold of 0.6, so we select the negative examples internally from the documents of the chosen concepts. In this case, the strength of ties between Ag_L and all teacher agents Ag_C, Ag_M, and Ag_W
