02.12.2012 Views

Web-based Learning Solutions for Communities of Practice


Artificial Intelligence. More precisely, numerical techniques emphasize the determination of homogeneous clusters according to some similarity measures, but provide low-level descriptions of clusters (Anderberg, 1973). More recent work on clustering focuses on numerical data whose inherent geometric properties can be exploited to naturally define distance functions between data points, such as DBSCAN (Ester, Kriegel, Sander & Xu, 1996), BIRCH (Zhang, Ramakrishnan & Livny, 1996), C2P (Nanopoulos, Theodoridis & Manolopoulos, 2001), CURE (Guha, Rastogi & Shim, 1998), CHAMELEON (Karypis, Han & Kumar, 1999) and WaveCluster (Sheikholeslami, Chatterjee & Zhang, 1998). However, data mining applications frequently involve datasets that also contain categorical attributes, on which distance functions are not naturally defined.

Clustering

In our case, we have numerical data that characterize users. In order to cluster them, we have to apply a clustering algorithm to these input data. As we have different users and different communities of users, it is desirable to find clusters of users by referring to one specific community at a time, so as to reach valuable conclusions. Referring to a specific community, we regard a cluster as the collection of users that have something in common because they work on the same workspace. For example, let SP1, SP2, ..., SPk be the k spaces (from now on, the terms workspace and space are treated as equivalent) used by a community A. We build an array X of size n × n (where n is the number of users), in which the cell Xij denotes the correlation between user i and user j.

First of all, we have to consider both symmetric undirected and directed arrays of data, depending on the analysis we want to make. In addition, we can use the affinity metric introduced in section 3.1 to build the array X.

So we have an array for each space in the community A. After the construction of these arrays, we build a unified array for the whole community from the arrays of each space. More precisely, we may use the average of each cell across the spaces, which is the most common way. This final array holds the relationships between users of the same community. Taking this array as input and applying an appropriate clustering algorithm to it, we can find hidden relationships and correlations between users of a community by observing the resulting clusters of users.
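The cell-wise averaging step can be sketched as follows. This is only a minimal illustration in plain Python: the function name `unify_spaces` and the correlation values for the two spaces are hypothetical, and the per-space arrays are assumed to be already built (for example, with the affinity metric of section 3.1).

```python
from typing import List

Matrix = List[List[float]]


def unify_spaces(space_arrays: List[Matrix]) -> Matrix:
    """Average k per-space n x n arrays cell by cell into one community array."""
    if not space_arrays:
        raise ValueError("at least one per-space array is required")
    n = len(space_arrays[0])
    for arr in space_arrays:
        if len(arr) != n or any(len(row) != n for row in arr):
            raise ValueError("every per-space array must be n x n")
    k = len(space_arrays)
    return [
        [sum(arr[i][j] for arr in space_arrays) / k for j in range(n)]
        for i in range(n)
    ]


# Hypothetical correlation values for 3 users in two spaces, SP1 and SP2.
sp1 = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.2],
       [0.0, 0.2, 1.0]]
sp2 = [[1.0, 0.4, 0.0],
       [0.4, 1.0, 0.6],
       [0.0, 0.6, 1.0]]

# Unified array for the whole community (assumed to consist of SP1 and SP2 only).
community = unify_spaces([sp1, sp2])
```

The resulting `community` array has the same n × n shape as each per-space array and can be passed to whichever clustering algorithm is chosen.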

An interesting alternative is to use the data arrays of all spaces rather than the unified data array of a community. In this way, we can obtain more detailed views of users, by applying a clustering procedure to each space separately. By clustering user data for a specific space, we obtain micro-clusters of users, that is, lower-level clusters. After that, we can perform a macro-clustering procedure. This procedure can exploit user properties from the micro-clusters and find higher-level clusters over a history or time horizon. Based on this method, we can have both detailed and general views of users at the same time.
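One possible reading of this two-level procedure is sketched below. Everything in it is an assumption made for illustration, not the method of the chapter: micro-clusters are taken as connected components of the graph linking users whose correlation reaches a threshold (equivalent to single-linkage clustering cut at that threshold), and the macro step clusters users by the fraction of spaces in which they share a micro-cluster. The function names, the threshold of 0.5 and the data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def threshold_clusters(sim: Matrix, threshold: float) -> List[List[int]]:
    """Connected components of the graph linking i and j when sim[i][j] >= threshold."""
    n = len(sim)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if not seen[j] and sim[i][j] >= threshold:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


def co_membership(micro: List[List[List[int]]], n: int) -> Matrix:
    """Fraction of spaces in which users i and j share a micro-cluster."""
    counts = [[0] * n for _ in range(n)]
    for clusters in micro:
        for members in clusters:
            for i in members:
                for j in members:
                    counts[i][j] += 1
    return [[c / len(micro) for c in row] for row in counts]


# Hypothetical symmetric correlation arrays for 4 users in 3 spaces.
sp1 = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.7], [0.1, 0.1, 0.7, 1.0]]
sp2 = [[1.0, 0.8, 0.1, 0.1], [0.8, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.1], [0.1, 0.1, 0.1, 1.0]]
sp3 = [[1.0, 0.6, 0.1, 0.1], [0.6, 1.0, 0.6, 0.1],
       [0.1, 0.6, 1.0, 0.1], [0.1, 0.1, 0.1, 1.0]]

# Micro-clustering: one clustering per space (detailed view).
micro = [threshold_clusters(sp, 0.5) for sp in (sp1, sp2, sp3)]
# Macro-clustering: cluster users on how often they share a micro-cluster (general view).
macro = threshold_clusters(co_membership(micro, 4), 0.5)
```

In this toy data, users 0 and 1 share a micro-cluster in every space, so they are the only pair grouped together at the macro level; the other pairings appear in a single space only and fall below the threshold.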

Depending on the kind of analysis we want to make about the users of the system, we have to follow one of two approaches regarding the nature of the arrays that we use: one concerning symmetric undirected arrays, and one concerning directed arrays. Both of them are described below.
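The distinction between the two kinds of array can be illustrated with a small check. This is a sketch under assumptions: the helper `is_symmetric`, its tolerance and the example values are hypothetical, and the array is assumed to be square.

```python
from typing import List

Matrix = List[List[float]]


def is_symmetric(x: Matrix, tol: float = 1e-9) -> bool:
    """Return True when x[i][j] equals x[j][i] (within tol) for every pair of users."""
    n = len(x)
    return all(
        abs(x[i][j] - x[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical values: a mutual correlation yields a symmetric undirected array,
undirected = [[1.0, 0.5],
              [0.5, 1.0]]
# while a one-way relation (user 0 towards user 1) yields a directed array.
directed = [[1.0, 0.9],
            [0.2, 1.0]]
```

Here `is_symmetric(undirected)` holds while `is_symmetric(directed)` does not, which is the property that decides which of the two approaches applies to a given array.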

Symmetric Undirected Arrays

In this case, we can use any known hierarchical clustering algorithm that works with relevant distances. Hierarchical clustering partitions objects into optimally homogeneous groups. It is based on empirical measures of similarity among the objects, which have received increasing attention in several different fields (Johnson, 1967). There are two different categories of hierarchical algorithms: those that repeatedly merge two smaller clusters
