
Web-based Learning Solutions for Communities of Practice


Mining Unnoticed Knowledge in Collaboration Support Systems

into a larger one, and those that split a larger cluster into smaller ones. As reported in (Ding & He, 2002), the MinMax linkage (Ding, He, Zha, Gu & Simon, 2001) performs best among the linkage methods for agglomerative clustering, and the average similarity performs best as the splitting criterion for divisive clustering.

1. In MinMax, given n data objects and their pairwise similarity matrix $W=(w_{ij})$, where $w_{ij}$ is the similarity between objects $i$ and $j$, we want to partition the data into two clusters $C_1, C_2$ using the min-max principle: minimize the similarity between clusters and maximize the similarity within each cluster. The similarity between $C_1$ and $C_2$ is $s(C_1,C_2)=\sum_{i\in C_1}\sum_{j\in C_2} w_{ij}$. The linkage $l$ is a closeness (similarity) measure between two clusters; normalizing by the self-similarities makes the MinMax linkage large only when the similarity between the clusters is high relative to their internal similarity. It is defined as follows:

$$l_{MinMax}(C_1,C_2) = \frac{s(C_1,C_2)}{s(C_1,C_1)\,s(C_2,C_2)}$$

2. The average similarity criterion is based on the min-max clustering principle. Its purpose is to choose the loosest cluster to split, in other words the cluster with the smallest average similarity. The self-similarity of cluster $C_k$ is $s(C_k,C_k)=s_{kk}$, which has to be maximized during clustering. The average self-similarity is then computed as $\bar{s}_{kk} = s_{kk}/n_k^2$, where $n_k = |C_k|$. The larger the average self-similarity of a cluster, the more homogeneous its objects.
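The two criteria above can be sketched in Python as follows. This is a minimal illustration, not the implementation of Ding & He: it assumes W is a symmetric NumPy array with a positive diagonal (for example $w_{ii}=1$, as with cosine similarity) so that singleton clusters have non-zero self-similarity, and all helper names (`similarity`, `minmax_linkage`, `average_self_similarity`, `best_merge`, `loosest_cluster`) are hypothetical.

```python
import numpy as np


def similarity(W, A, B):
    """s(A, B): sum of w_ij over objects i in A and j in B."""
    W = np.asarray(W, dtype=float)
    return W[np.ix_(A, B)].sum()


def minmax_linkage(W, A, B):
    """l_MinMax(A, B) = s(A, B) / (s(A, A) * s(B, B))."""
    return similarity(W, A, B) / (similarity(W, A, A) * similarity(W, B, B))


def average_self_similarity(W, C):
    """Average self-similarity s_kk / n_k**2 of cluster C."""
    return similarity(W, C, C) / len(C) ** 2


def best_merge(W, clusters):
    """Agglomerative step: indices of the pair with the largest MinMax linkage."""
    pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
    return max(pairs, key=lambda pq: minmax_linkage(W, clusters[pq[0]], clusters[pq[1]]))


def loosest_cluster(W, clusters):
    """Divisive step: index of the cluster with the smallest average self-similarity."""
    return min(range(len(clusters)), key=lambda k: average_self_similarity(W, clusters[k]))


# Four users forming two tight pairs: {0, 1} and {2, 3}.
W = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
print(best_merge(W, [[0], [1], [2], [3]]))   # (0, 1)
print(loosest_cluster(W, [[0, 1], [2, 3]]))  # 1
```

In a full agglomerative run one would repeatedly merge the returned pair; in a divisive run one would split the returned cluster with some bipartition method, which this sketch leaves unspecified.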

Directed arrays<br />

If $X_{ij}$ and $X_{ji}$ have different meanings, then the array X is not symmetric, a symmetric measure such as the Euclidean distance between users is not meaningful, and we have to work with a directed graph of users. More precisely, we need a clustering technique for directed graphs in order to optimize the separation of users into clusters. An array X may correspond to a community of spaces or to a single space. The commonly used approach is to derive a symmetric array from X and then apply clustering to that symmetric array. Instead, we propose the framework of clustering by weighted cuts directly in X (Meila & Pentney, 2007). This framework unifies many criteria used successfully on undirected graphs (such as the normalized cut and the average cut) and on directed graphs. It formulates clustering as an optimization problem whose objective is to minimize the weighted cut in the directed graph, a problem that can be reduced to an equivalent one on a symmetric matrix.
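To make the notion of a weighted cut concrete, here is a simplified Python sketch. It is not Meila & Pentney's full formulation: it assumes the partition is given as a label vector, counts crossing links in both directions (which equals the cut of the symmetric matrix X + X^T), and divides each cluster's crossing weight by the total of a node-weight vector T. With all weights equal to 1 this reduces to the average (ratio) cut; with T set to the node degrees of X + X^T it reduces to the normalized cut. The function names are hypothetical.

```python
import numpy as np


def directed_cut(X, A, B):
    """Total weight of links from users in A to users in B (direction matters)."""
    X = np.asarray(X, dtype=float)
    return X[np.ix_(A, B)].sum()


def weighted_cut(X, labels, T):
    """Simplified weighted cut of a partition of a directed graph.

    For each cluster, links leaving it and links entering it are both counted,
    then divided by the cluster's total node weight taken from T.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    T = np.asarray(T, dtype=float)
    total = 0.0
    for k in np.unique(labels):
        inside = np.flatnonzero(labels == k)
        outside = np.flatnonzero(labels != k)
        crossing = directed_cut(X, inside, outside) + directed_cut(X, outside, inside)
        total += crossing / T[inside].sum()
    return total


# X[i, j] != X[j, i]: links between users are directed.
X = np.array([
    [0, 3, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 2],
    [0, 0, 2, 0],
])
degrees = (X + X.T).sum(axis=1)  # [4, 5, 4, 5]
print(weighted_cut(X, [0, 0, 1, 1], np.ones(4)))  # 1.0 (average cut)
print(weighted_cut(X, [0, 0, 1, 1], degrees))     # about 0.222, i.e. 2/9 (normalized cut)
print(weighted_cut(X, [0, 1, 0, 1], np.ones(4)))  # 8.0, a much worse split
```

Minimizing this quantity over label vectors is the hard part; Meila & Pentney address it through a spectral relaxation, which this sketch does not attempt.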

A different approach, which could lead to better results in a semantic sense, would be to use the clustering algorithm presented in (Chakrabarti, Dom, Gibson, Kumar, Raghavan, Rajagopalan & Tomkins, 1998); to do that, we have to consider X as the adjacency matrix of an underlying graph whose nodes are the users. This algorithm, inspired by the web search algorithm of (Kleinberg, 1999), can produce clusters in which each user is characterized by two different scores. The first score (authority score) depicts the quality of the incoming links of a user, while the second score (hub score) depicts the quality of the user's outgoing links. Hence, the first score indicates how strong or weak an author a user is, and the second indicates how good or bad a reviewer that user is. Users with high authority scores are expected to have relevant ideas, whereas users with high hub scores are expected to have worked in spaces on relevant subjects.
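The two scores come from Kleinberg's hub/authority iteration. The Python sketch below computes plain HITS scores by power iteration on the adjacency matrix X; it does not reproduce the clustering step of Chakrabarti et al., and the function name `hits` and its parameters are hypothetical. It assumes that X[i, j] > 0 means user i links to (for example, reviews or comments on) user j.

```python
import numpy as np


def _normalize(v):
    """Scale v to unit Euclidean length (leave an all-zero vector unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def hits(X, iterations=100, tol=1e-10):
    """Return (authority, hub) scores for the directed graph with adjacency X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    auth = np.ones(n) / np.sqrt(n)
    hub = np.ones(n) / np.sqrt(n)
    for _ in range(iterations):
        new_auth = _normalize(X.T @ hub)    # good authorities are linked to by good hubs
        new_hub = _normalize(X @ new_auth)  # good hubs link to good authorities
        done = np.allclose(new_auth, auth, atol=tol) and np.allclose(new_hub, hub, atol=tol)
        auth, hub = new_auth, new_hub
        if done:
            break
    return auth, hub


# Users 0, 1 and 2 all link to user 3; user 3 links to nobody.
X = np.zeros((4, 4))
X[[0, 1, 2], 3] = 1
auth, hub = hits(X)
print(np.round(auth, 3))  # user 3 gets the only non-zero authority score
print(np.round(hub, 3))   # users 0, 1 and 2 share the hub score; user 3 gets 0
```

Ranking users by `auth` would surface the candidate "authors" discussed above, and ranking by `hub` the candidate "reviewers".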

Social Network Analysis<br />

The final scope of this analysis involves mapping and measuring the normally unnoticed relationships between users and, more specifically, between clusters of users. The presented metrics are widely regarded as a means of evaluating properties of particular interest. Even though it is not the goal of this paper to thoroughly study the different algorithmic and implementation
