April 2012 Volume 15 Number 2 - Educational Technology & Society


to discover clusters in a graph. It performs an unsupervised clustering (no need to define the number of clusters) based on weighted graphs, and it has previously been used successfully in biology (Enright, 2002) and text mining (Theodosiou et al., 2008).

MCL is based on the idea that natural clusters in a graph have many edges between the members of each cluster and few across clusters. Once inside a cluster, a hypothetical random walker has little chance of getting out of it. MCL simulates random walks (flow) within the whole graph, strengthening flow where it is already strong and weakening it where it is weak. After many iterations of this process, the underlying cluster structure of the graph gradually becomes visible: regions of the graph with high flow, which describe clusters, are separated by boundaries with no flow.

MCL simulates the random walks within a graph by two algebraic operations, called expansion and inflation, that are applied to a stochastic matrix. The matrix representing the graph is used as input, and expansion and inflation are applied in alternating rounds until the matrix changes little or not at all. The final matrix then represents the clustering of the graph nodes. Expansion raises the stochastic matrix to a power using the normal matrix product. Inflation is the entry-wise Hadamard-Schur product (Radhakrishna & Bhaskara, 1998) combined with diagonal scaling, and it is responsible for both the strengthening and the weakening of the flow. The value of the inflation parameter controls the granularity of the clusters. The MCL algorithm is considered to be very fast and scalable, since its worst-case time complexity is O(N*L^2), where N is the number of documents and L is an MCL parameter usually between 500 and 1000. The space complexity is O(N*L) (Dongen, 2000).
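The alternation of expansion and inflation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in this work: the function names, the default parameter values, the convergence tolerance, and the addition of self-loops (a common practical choice in MCL implementations) are all assumptions, and the pruning step governed by the parameter L is omitted.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def expand(m, power=2):
    """Expansion: matrix power using the normal matrix product."""
    n = len(m)
    result = m
    for _ in range(power - 1):
        result = [[sum(result[i][k] * m[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def inflate(m, r=2.0):
    """Inflation: entry-wise power followed by column rescaling."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(adjacency)
    # Assumption: add self-loops before normalizing (not stated in the text).
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m, expansion), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    return m


def clusters(m, eps=1e-6):
    """Each row with non-zero entries lists the members of one cluster."""
    n = len(m)
    found = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)


# Toy graph (hypothetical): two triangles joined by a single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1.0

print(clusters(mcl(adjacency)))
```

On this toy graph the flow across the single bridging edge is expected to die out, leaving each triangle as its own cluster. Raising the `inflation` argument should yield finer clusters, in line with the role of the inflation value described above.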

An example is presented in Figure 1. The original dense graph (upper left corner) is transformed in each step of the algorithm by pruning weak connections and keeping strong ones. Finally, the algorithm converges to a state where no more connections are pruned and the remaining ones form the clusters (lower right corner of Figure 1).

Since MCL operates on a graph, we first construct a graph from our data. Each node of the graph represents a user. Two users are connected if they have both visited at least one common web page of the e-learning environment. The weight of the connection can be the number of common web pages.
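The graph construction just described can be illustrated as follows. This is a sketch under stated assumptions: the input format (a mapping from each user to the set of pages visited), the function names, and the sample users and page names are all hypothetical and are not taken from the actual log data.

```python
from itertools import combinations


def build_user_graph(visits):
    """Return {(user_a, user_b): weight} for users sharing at least one page.

    `visits` maps each user to the set of pages that user has visited
    (hypothetical input format). The weight is the number of common pages.
    """
    edges = {}
    for a, b in combinations(sorted(visits), 2):
        common = visits[a] & visits[b]
        if common:
            edges[(a, b)] = len(common)
    return edges


def to_adjacency(users, edges):
    """Convert the edge dictionary into a symmetric weighted matrix."""
    index = {user: i for i, user in enumerate(users)}
    n = len(users)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), weight in edges.items():
        matrix[index[a]][index[b]] = float(weight)
        matrix[index[b]][index[a]] = float(weight)
    return matrix


# Hypothetical sample data: four users and the pages they visited.
visits = {
    "alice": {"intro.html", "quiz1.html", "forum.html"},
    "bob": {"intro.html", "quiz1.html"},
    "carol": {"forum.html"},
    "dave": {"syllabus.html"},
}

users = sorted(visits)
edges = build_user_graph(visits)
matrix = to_adjacency(users, edges)

for (a, b), weight in edges.items():
    print(a, b, weight)
```

A user with no page in common with anyone else, such as the hypothetical "dave" here, remains an isolated node. The resulting weighted matrix is the kind of input the clustering step operates on.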

Figure 1. An illustrated example of the MCL algorithm

Visualization

The final step of the methodology involves the visualization of the clustering. This is achieved with BioLayout Express 3D (Goldovsky et al., 2005), which also allows the user to interact with the results, for instance by searching for keywords, highlighting relevant documents, analyzing graph connectivity, linking nodes with external databases, and so forth. BioLayout is also free, open source software implemented in Java that can easily be run under different operating systems, and thus it is ideal for our methodology.
