Web-based Learning Solutions for Communities of Practice
Artificial Intelligence. More precisely, numerical techniques emphasize the determination of homogeneous clusters according to some similarity measure, but provide only low-level descriptions of clusters (Anderberg, 1973). More recent work on clustering focuses on numerical data whose inherent geometric properties can be exploited to naturally define distance functions between data points; examples include DBSCAN (Ester, Kriegel, Sander & Xu, 1996), BIRCH (Zhang, Ramakrishnan & Livny, 1996), C2P (Nanopoulos, Theodoridis & Manolopoulos, 2001), CURE (Guha, Rastogi & Shim, 1998), CHAMELEON (Karypis, Han & Kumar, 1999) and WaveCluster (Sheikholeslami, Chatterjee & Zhang, 1998). However, data mining applications frequently involve datasets that also contain categorical attributes, on which distance functions are not naturally defined.
Clustering<br />
In our case, we have numerical data that characterize users. In order to cluster them, we have to apply a clustering algorithm to these input data. As we have different users and different communities of users, it is desirable to find clusters of users with reference to one specific community at a time, so as to reach useful conclusions. Within a specific community, we regard a cluster as the collection of users that have something in common through working on the same workspace. For example, let SP1, SP2, ..., SPk be the k spaces (from now on, the terms workspace and space are used interchangeably) used by a community A. For each space we build an array X of size n x n (n is the number of users), where the cell Xij denotes the correlation between user i and user j.
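As a minimal sketch of how such an array might be populated for a single space, the following assumes a hypothetical log of (sender, receiver) interactions and uses raw interaction counts as a stand-in for the correlation. The chapter's own affinity metric is not reproduced here, and the names `build_space_array` and `interactions` are illustrative only.

```python
def build_space_array(n_users, interactions):
    """Return an n x n array X where X[i][j] counts interactions from user i to user j.

    `interactions` is a hypothetical list of (i, j) index pairs recorded in one
    space; counting them is only a placeholder for a real correlation measure.
    """
    X = [[0.0] * n_users for _ in range(n_users)]
    for i, j in interactions:
        if i != j:  # ignore self-interactions
            X[i][j] += 1.0
    return X


# Three users in one space: user 0 addressed user 1 twice,
# and user 2 addressed user 0 once.
X_sp1 = build_space_array(3, [(0, 1), (0, 1), (2, 0)])
```

Because the counts record who addressed whom, the resulting array is directed: Xij and Xji generally differ.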
First of all, we have to consider both symmetric undirected and directed arrays of data, depending on the analysis we want to make. In addition, we can use the affinity metric introduced in section 3.1 in order to build the array X.
So we have an array for each space in community A. After the construction of these arrays, we build a unified array for the whole community from the arrays of the individual spaces. More precisely, we may take the average of each cell across the spaces, which is the most common way. This final array holds the relationships between users of the same community. Using this array as input to an appropriate clustering algorithm, we can find hidden relationships and correlations between users of a community by observing the resulting clusters of users.
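The cell-wise averaging step can be sketched as follows. This is an illustration under the assumption that every space array has the same n x n shape and the same user ordering; the function name `unify_space_arrays` is hypothetical.

```python
def unify_space_arrays(space_arrays):
    """Average k per-space n x n arrays cell by cell into one community array."""
    k = len(space_arrays)
    if k == 0:
        raise ValueError("at least one space array is required")
    n = len(space_arrays[0])
    return [
        [sum(X[i][j] for X in space_arrays) / k for j in range(n)]
        for i in range(n)
    ]


# Two users observed in two spaces of the same community.
X_sp1 = [[0.0, 1.0], [1.0, 0.0]]
X_sp2 = [[0.0, 0.5], [0.5, 0.0]]
X_community = unify_space_arrays([X_sp1, X_sp2])  # off-diagonal cells become 0.75
```

A plain mean weights every space equally; a weighted mean would be a natural variation if some spaces are considered more informative than others.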
An interesting alternative is to use the data arrays of all spaces rather than the unified data array of a community. In this way, we can obtain more detailed views of users, by applying a clustering procedure to each individual space. Clustering the user data of a specific space yields micro-clusters of users, that is, lower-level clusters. After that, we can perform a macro-clustering procedure, which exploits user properties from the micro-clusters to find higher-level clusters over a history or time horizon. With this method, we can have both detailed and general views of users at the same time.
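One possible reading of this two-level procedure is sketched below. It is not the authors' algorithm: the micro-clustering is a simple threshold cut on a symmetric similarity array, and the macro step clusters users by how often they share a micro-cluster across spaces (a co-association scheme chosen here purely for illustration). All names and thresholds are assumptions.

```python
def threshold_clusters(S, threshold):
    """Label users by connected components of the graph that links i and j
    whenever S[i][j] >= threshold. S is assumed to be symmetric."""
    n = len(S)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and S[u][v] >= threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def micro_then_macro(space_arrays, micro_threshold, macro_threshold):
    """Micro-cluster each space, then cluster users on the fraction of
    spaces in which they fall into the same micro-cluster."""
    n = len(space_arrays[0])
    k = len(space_arrays)
    micro = [threshold_clusters(S, micro_threshold) for S in space_arrays]
    co_association = [
        [sum(labels[i] == labels[j] for labels in micro) / k for j in range(n)]
        for i in range(n)
    ]
    return micro, threshold_clusters(co_association, macro_threshold)


S_sp1 = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
S_sp2 = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.7], [0.1, 0.7, 1.0]]
micro, macro = micro_then_macro([S_sp1, S_sp2], 0.5, 0.75)
# micro holds one label list per space (the detailed view);
# macro holds one label per user across all spaces (the general view).
```

In this toy input, user 2 joins users 0 and 1 in only one of the two spaces, so the macro step keeps user 2 apart while the per-space labels still show where the overlap occurs.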
Depending on the kind of analysis we want to make about the users of the system, we follow one of two approaches regarding the nature of the arrays that we use: one concerning symmetric undirected arrays, and one concerning directed arrays. Both of them are described below.
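To make the distinction concrete, the sketch below checks whether an array is symmetric and derives an undirected view from a directed one by averaging each cell with its mirror cell. Averaging with the transpose is one common convention and is an assumption here, not something the chapter prescribes; the helper names are hypothetical.

```python
def is_symmetric(X, tol=1e-9):
    """Return True when X[i][j] equals X[j][i] (within tol) for every pair."""
    n = len(X)
    return all(
        abs(X[i][j] - X[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


def symmetrize(X):
    """Average each cell with its mirror cell to obtain an undirected array."""
    n = len(X)
    return [[(X[i][j] + X[j][i]) / 2 for j in range(n)] for i in range(n)]


# Directed: user 0 addressed user 1 twice, user 2 addressed user 0 once.
X_directed = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
X_undirected = symmetrize(X_directed)
```

The directed array keeps the information about who initiated each relationship, which the symmetric version discards.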
Symmetric Undirected Arrays
In this case, we can use any known hierarchical clustering algorithm with relevant distances. Hierarchical clustering partitions objects into optimally homogeneous groups on the basis of empirical measures of similarity among those objects, an approach that has received increasing attention in several different fields (Johnson, 1967). There are two different categories of hierarchical algorithms: those that repeatedly merge two smaller clusters