NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
Generalized Blockmodeling<br />
Samantha Lam, Jeffrey Chan, Conor Hayes<br />
Digital Enterprise Research Institute, National University of Ireland, Galway<br />
samantha.lam@deri.org, jkc.chan@deri.org, conor.hayes@deri.org<br />
1. Introduction<br />
As online social network data become increasingly<br />
available and popular, there is an ongoing need to<br />
analyze and model them in a scalable manner. To<br />
understand these large networks, it is useful to be able<br />
to reduce and summarize them in terms of their<br />
underlying structure. Popular approaches include<br />
community finding [1] and blockmodeling [2], both of<br />
which aim to group the strongly associated vertices<br />
together. Our research is focused on the latter approach.<br />
2. Generalized blockmodeling<br />
Generalized blockmodeling [3] decomposes a<br />
network into partitions and assigns a relation type to<br />
each pair of partitions (called a block), which describes<br />
the relationship between the partitions. A major<br />
component of the generality of this method comes from<br />
the use of regular equivalence as a defining feature of<br />
the blocks. The previously well-studied structural<br />
equivalence proved too restrictive to<br />
describe real-world networks, which led to the<br />
proposal of regular equivalence [3].<br />
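To make the block types concrete, the following minimal Python sketch tests<br />
whether a block is null, complete or regular, using the usual definitions: a null<br />
block contains only 0s, a complete block only 1s, and a regular block has at least<br />
one 1 in every row and every column. The function names and the toy network are<br />
illustrative assumptions and are not taken from [3].<br />

```python
def block(adj, rows, cols):
    """Sub-matrix of adj restricted to a row cluster and a column cluster."""
    return [[adj[i][j] for j in cols] for i in rows]


def is_null(b):
    """Null block: no ties at all."""
    return all(x == 0 for row in b for x in row)


def is_complete(b):
    """Complete block: every possible tie is present."""
    return all(x == 1 for row in b for x in row)


def is_regular(b):
    """Regular block: every row and every column contains at least one tie."""
    n_cols = len(b[0]) if b else 0
    rows_ok = all(any(row) for row in b)
    cols_ok = all(any(row[j] for row in b) for j in range(n_cols))
    return bool(b) and rows_ok and cols_ok


# Hypothetical directed toy network: 0 -> 2 and 1 -> 3.
ADJ = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
A, B = [0, 1], [2, 3]  # two assumed partitions

a_to_b = block(ADJ, A, B)  # [[1, 0], [0, 1]]
b_to_a = block(ADJ, B, A)  # [[0, 0], [0, 0]]

print(is_regular(a_to_b), is_complete(a_to_b))  # regular but not complete
print(is_null(b_to_a))
```

The block from A to B is not complete, so it fails the structural-equivalence<br />
ideal, yet it is regular: each vertex in A sends a tie to some vertex in B, and<br />
each vertex in B receives one.<br />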
Up to now, the generalized blockmodel analysis of<br />
social networks has not received much attention, partly<br />
due to the computational demands of the existing<br />
algorithms. Therefore, we have designed approaches<br />
based on genetic algorithms (GA) and simulated<br />
annealing (SA) to fit generalized blockmodels. We<br />
have found both approaches are at least two orders of<br />
magnitude faster than the existing method.<br />
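The abstract does not describe how the SA approach is set up, so the following is<br />
only a generic simulated-annealing sketch applied to a toy partitioning problem.<br />
The cooling schedule, the single-vertex flip move, the two-partition restriction<br />
and the cost function (within-partition blocks expected complete, between-partition<br />
blocks expected null) are all assumptions made for illustration.<br />

```python
import math
import random


def anneal(cost, random_neighbour, state, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Generic simulated annealing; returns the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = random_neighbour(current, rng)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept a worse state with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost


# Hypothetical undirected toy network: two disjoint pairs, 0-1 and 2-3.
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]


def cost(part):
    """Count cells that disagree with the assumed ideal: a tie is expected
    exactly when both vertices are in the same partition."""
    n = len(ADJ)
    return sum(
        ADJ[i][j] != (part[i] == part[j])
        for i in range(n) for j in range(n) if i != j
    )


def random_neighbour(part, rng):
    """Move one randomly chosen vertex to the other of the two partitions."""
    new = list(part)
    v = rng.randrange(len(new))
    new[v] = 1 - new[v]
    return new


start = [0, 1, 0, 1]
best, best_cost = anneal(cost, random_neighbour, start)
print(start, cost(start), "->", best, best_cost)
```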
3. Improving algorithms<br />
In [3], a greedy KL-based approach was proposed<br />
to fit blockmodels. This algorithm considers the<br />
solution neighborhood of each vertex, and then greedily<br />
makes the move that minimizes the objective cost. Two<br />
kinds of neighborhood move are considered: moving a vertex from one<br />
partition to another, and swapping two vertices in<br />
different partitions. However,<br />
[3] gives no description of how to optimize the<br />
block types themselves. Therefore, we introduce<br />
an additional step, in which the block types are optimized<br />
after the partitions are optimized.<br />
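The two-stage procedure described above can be sketched roughly as follows. This<br />
is a simplified illustration, not the procedure of [3] or the authors'<br />
implementation: it assumes only null and complete block types, ignores self-loops,<br />
and all function names are hypothetical.<br />

```python
def deviations(adj, part, image):
    """Count cells that disagree with the ideal block type in `image`
    ('complete' expects 1s, 'null' expects 0s); self-loops are ignored."""
    n = len(adj)
    return sum(
        adj[i][j] != (1 if image[part[i]][part[j]] == "complete" else 0)
        for i in range(n) for j in range(n) if i != j
    )


def neighbours(part, k):
    """Yield every partition reachable by one vertex move or one swap."""
    n = len(part)
    for v in range(n):
        if part.count(part[v]) > 1:  # never empty a partition
            for c in range(k):
                if c != part[v]:
                    yield part[:v] + [c] + part[v + 1:]
    for u in range(n):
        for v in range(u + 1, n):
            if part[u] != part[v]:
                new = list(part)
                new[u], new[v] = new[v], new[u]
                yield new


def optimise_partition(adj, part, k, image):
    """Greedy descent: take the best neighbour until nothing improves."""
    cost = deviations(adj, part, image)
    while True:
        scored = [(deviations(adj, p, image), p) for p in neighbours(part, k)]
        if not scored:
            return part, cost
        best_cost, best = min(scored, key=lambda s: s[0])
        if best_cost >= cost:
            return part, cost
        part, cost = best, best_cost


def optimise_block_types(adj, part, k):
    """Additional step: per block, pick the type with fewer deviations."""
    ones = [[0] * k for _ in range(k)]
    cells = [[0] * k for _ in range(k)]
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i != j:
                ones[part[i]][part[j]] += adj[i][j]
                cells[part[i]][part[j]] += 1
    return [["complete" if 2 * ones[a][b] > cells[a][b] else "null"
             for b in range(k)] for a in range(k)]


def fit(adj, part, k, image):
    """Optimize the partition first, then re-optimize the block types."""
    part, _ = optimise_partition(adj, part, k, image)
    image = optimise_block_types(adj, part, k)
    return part, image, deviations(adj, part, image)


# Hypothetical undirected toy network: two disjoint pairs, 0-1 and 2-3.
ADJ = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
IMAGE = [["complete", "null"], ["null", "complete"]]  # assumed starting types

part, image, cost = fit(ADJ, [0, 1, 0, 1], 2, IMAGE)
print(part, image, cost)
```

Each pass rescans the whole neighborhood, which is why this style of search<br />
becomes expensive on larger networks.<br />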
3.1. Results<br />
We evaluate the efficiency and optimization<br />
performance of three algorithms: the KL-based approach described above<br />
and the two aforementioned ones, SA and GA. To<br />
measure the scalability and optimizing ability of the<br />
algorithms, we generated synthetic datasets using a<br />
community generating algorithm. The results indicate<br />
that for larger networks, if speed is important, then the<br />
SA algorithm should be used, but if accuracy is more<br />
important, then the GA should be used.<br />
To demonstrate the importance of increasing the<br />
scalability of fitting generalized blockmodels, we fitted<br />
blockmodels to the Enron and flight route datasets that<br />
could not be fitted before because of the limitations<br />
of the KL algorithm.<br />
We used the GA to explore the Enron<br />
dataset over three time periods: before, during and after<br />
the crisis. As a guide to the communications between<br />
the employees, we used the results of [4] to help us<br />
construct our blocks. The best-fitted blockmodels<br />
summarized the roles and the key relationships (block<br />
types) between the different roles.<br />
For the flight route dataset, we decided to use<br />
European airlines, as a hub-and-spoke/hierarchy<br />
model has been shown to exist for them. For this data, we found<br />
the addition of a 'density' block to the nine types<br />
specified by [3] to be fruitful.<br />
4. Improving the objective function<br />
The current definition of the objective function<br />
proposed by [3] is somewhat naïve. It is essentially a<br />
simple count/percentage of the number of deviations of<br />
an element from its ideal block. We are currently<br />
investigating methods to improve upon this definition.<br />
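For reference, a count-based criterion of this kind is commonly written along the<br />
following lines. The notation is assumed here for illustration and does not come<br />
from this abstract: C_u and C_v are partitions, R(C_u, C_v) is the block of ties<br />
between them, T ranges over the permitted block types, and delta counts the cells<br />
that deviate from the ideal block of type T.<br />

```latex
P(\mathbf{C}) \;=\; \sum_{C_u,\, C_v \in \mathbf{C}} \;
  \min_{T \in \mathcal{T}} \; \delta\bigl(R(C_u, C_v),\, T\bigr),
\qquad
\delta(R, \mathrm{null}) = \#\{\text{1s in } R\},
\quad
\delta(R, \mathrm{complete}) = \#\{\text{0s in } R\}
```

A percentage variant would divide each delta by the number of cells in the block,<br />
|C_u| |C_v|, so that large and small blocks are penalized on the same scale.<br />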
5. Future work<br />
We aim to investigate additional ways to measure<br />
and rank discovered blockmodels, such as improving<br />
the objective function and defining other block types.<br />
6. References<br />
[1] A. Clauset, M.E.J. Newman, and C. Moore, “Finding<br />
community structure in very large networks”, Phys. Rev. E,<br />
American Physical Society, Vol. 70 No. 6, 2004, 066111.<br />
[2] S. Wasserman and K. Faust, “Social Network Analysis:<br />
Methods and Applications”, Cambridge University Press, 1994.<br />
[3] P. Doreian, V. Batagelj, and A. Ferligoj, “Generalized<br />
Blockmodeling”, Cambridge University Press, 2005.<br />
[4] J. Diesner, T.L. Frantz, and K.M. Carley,<br />
“Communication networks from the Enron email corpus ‘It's<br />
always about the people. Enron is no different’”,<br />
Computational & Mathematical Organization Theory,<br />
Springer, Vol. 11 No. 3, 2005, pp. 201-228.<br />