29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...


Generalized Blockmodeling

Samantha Lam, Jeffrey Chan, Conor Hayes

Digital Enterprise Research Institute, National University of Ireland, Galway

samantha.lam@deri.org, jkc.chan@deri.org, conor.hayes@deri.org

1. Introduction

As online social network data become increasingly available and popular, there is an ongoing need to analyze and model them in a scalable manner. To understand these large networks, it is useful to be able to reduce and summarize them in terms of their underlying structure. Popular approaches include community finding [1] and blockmodeling [2], both of which aim to group strongly associated vertices together. Our research focuses on the latter approach.

2. Generalized blockmodeling

Generalized blockmodeling [3] decomposes a network into partitions and assigns a relation type to each pair of partitions (called a block), which describes the relationship between the partitions. Much of the generality of this method comes from the use of regular equivalence as a defining feature of the blocks. The previously well-studied structural equivalence proved too restrictive to describe real-world networks, which led to the proposition of regular equivalence [3].
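
To make the difference between the two equivalences concrete, the sketch below checks three ideal block types on a binary adjacency matrix. It assumes the usual reading in which structural equivalence allows only null (all 0) and complete (all 1) blocks, while a regular block only needs at least one tie in every row and every column. The function names, the toy network and the `managers`/`staff` labels are illustrative and do not come from the paper.

```python
def block(adj, rows, cols):
    """Extract the ties sent from the vertices in `rows` to those in `cols`."""
    return [[adj[i][j] for j in cols] for i in rows]


def is_null(b):
    """Null block: no ties at all."""
    return not any(x for row in b for x in row)


def is_complete(b):
    """Complete block: every possible tie is present."""
    return all(x for row in b for x in row)


def is_regular(b):
    """Regular block: every row and every column holds at least one tie."""
    return all(any(row) for row in b) and all(any(col) for col in zip(*b))


# Hypothetical 5-vertex directed network: vertices 0-1 send ties to 2-4.
adj = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
managers, staff = [0, 1], [2, 3, 4]
down = block(adj, managers, staff)  # ties from managers to staff
up = block(adj, staff, managers)    # ties from staff to managers
```

Here `down` is regular but not complete, so structural equivalence would reject it while regular equivalence accepts it; `up` is a null block under both.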

Up to now, the generalized blockmodel analysis of social networks has not received much attention, partly due to the computational demands of the existing algorithms. We have therefore designed approaches based on genetic algorithms (GA) and simulated annealing (SA) to fit generalized blockmodels. We have found that both approaches are at least two orders of magnitude faster than the existing method.
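
The abstract does not describe the SA design, so the following is only a generic simulated-annealing sketch over vertex-to-partition assignments, not the authors' algorithm. The move type (reassign one random vertex), the geometric cooling schedule, the parameter values and the toy cost function are all assumptions chosen for illustration.

```python
import math
import random


def anneal(cost_fn, assign, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimise cost_fn over assignments of vertices to k partitions."""
    rng = random.Random(seed)
    current = list(assign)
    current_cost = cost_fn(current)
    best, best_cost = list(current), current_cost
    t = t0
    for _ in range(steps):
        # Candidate move: send one random vertex to a random partition.
        cand = list(current)
        cand[rng.randrange(len(cand))] = rng.randrange(k)
        delta = cost_fn(cand) - current_cost
        # Always accept improvements; accept worse moves with prob. exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = cand, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        t *= cooling  # geometric cooling (assumed schedule)
    return best, best_cost


# Toy undirected network with two groups, {0, 1, 2} and {3, 4, 5}.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]


def toy_cost(assign):
    """Deviations from complete blocks inside partitions and null blocks between them."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(n)
               if i != j and (adj[i][j] == 1) != (assign[i] == assign[j]))


start = [0, 1, 0, 1, 0, 1]
best, best_cost = anneal(toy_cost, start, k=2)
```

Because the routine keeps the best assignment seen so far, the returned cost can never exceed the cost of the starting assignment.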

3. Improving algorithms

The authors of [3] proposed a greedy KL-based approach to fit blockmodels. This algorithm considers the solution neighborhood of each vertex and then greedily makes the move that minimizes the objective cost. Two kinds of neighborhood move were considered: moving a vertex from one partition to another, and swapping two vertices in different partitions. However, there was no description of how to optimize the block types themselves. We therefore introduce an additional step in which the block types are optimized after the partitions are optimized.
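
The two-step procedure described above can be sketched as follows. This is a simplified steepest-descent version rather than the exact KL-based algorithm of [3]: it only knows two block types (null and complete), ignores self-ties, and never empties a partition. Those restrictions and all function names are assumptions made to keep the example short.

```python
from itertools import combinations

BLOCK_TYPES = ("null", "complete")  # illustrative subset of the types in [3]


def deviation(adj, rows, cols, block_type):
    """Count the cells of a block that disagree with its ideal type."""
    cells = [(i, j) for i in rows for j in cols if i != j]  # skip self-ties
    ties = sum(adj[i][j] for i, j in cells)
    return ties if block_type == "null" else len(cells) - ties


def clusters(assign, k):
    """Turn a vertex -> partition list into k lists of vertices."""
    return [[v for v, c in enumerate(assign) if c == p] for p in range(k)]


def cost(adj, assign, image):
    """Total deviation of the network from the blockmodel `image`."""
    k = len(image)
    parts = clusters(assign, k)
    return sum(deviation(adj, parts[a], parts[b], image[a][b])
               for a in range(k) for b in range(k))


def neighbours(assign, k):
    """Yield every assignment reachable by one vertex move or one swap."""
    for v, old in enumerate(assign):
        if assign.count(old) == 1:
            continue  # assumption: never empty a partition
        for new in range(k):
            if new != old:
                yield assign[:v] + [new] + assign[v + 1:]
    for u, v in combinations(range(len(assign)), 2):
        if assign[u] != assign[v]:
            swapped = list(assign)
            swapped[u], swapped[v] = swapped[v], swapped[u]
            yield swapped


def fit(adj, assign, image):
    """Step 1: greedy partition search. Step 2: re-pick each block's type."""
    k = len(image)
    assign = list(assign)
    while True:
        best = min(neighbours(assign, k),
                   key=lambda a: cost(adj, a, image), default=assign)
        if cost(adj, best, image) >= cost(adj, assign, image):
            break  # no move or swap lowers the cost
        assign = best
    parts = clusters(assign, k)
    image = [[min(BLOCK_TYPES,
                  key=lambda t: deviation(adj, parts[a], parts[b], t))
              for b in range(k)] for a in range(k)]
    return assign, image, cost(adj, assign, image)


# Two disconnected pairs, {0, 1} and {2, 3}, with a deliberately bad start.
adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
start_image = [["complete", "null"], ["null", "complete"]]
assign, image, total = fit(adj, [0, 1, 0, 1], start_image)
```

On this toy network a single swap separates the two pairs, after which the block-type step confirms complete blocks on the diagonal and null blocks off it.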

3.1. Results

We evaluate the efficiency and optimization performance of three algorithms: the KL-based approach and the two aforementioned ones, SA and GA. To measure the scalability and optimizing ability of the algorithms, we generated synthetic datasets using a community generating algorithm. The results indicate that for larger networks, if speed is important, then the SA algorithm should be used, but if accuracy is more important, then the GA should be used.

To demonstrate the importance of increasing the scalability of fitting generalized blockmodels, we fitted blockmodels to the Enron and flight route datasets, which could not be fitted before because of the limitations of the KL algorithm.

We used the GA algorithm to explore the Enron dataset over three time periods: before, during and after the crisis. As a guide to the communications between the employees, we used the results found by [4] to help us construct our blocks. The best-fitted blockmodels summarized the roles and the key relationships (block types) between the different roles.

For the flight route dataset, we decided to use European airlines, as it was shown that a hub-and-spoke/hierarchy model existed. For this data, we found the addition of a 'density' block to be a fruitful extension of the nine types specified by [3].

4. Improving the objective function

The current definition of the objective function proposed by [3] is somewhat naïve. It is essentially a simple count (or percentage) of the elements in a block that deviate from the ideal block. We are currently investigating methods to improve upon this definition.
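
As a small illustration of the count and percentage forms mentioned above, the sketch below scores one binary block against an ideal null or complete block. Treating the ideal block as a single target value (0 for null, 1 for complete) is a simplification made here; the other block types in [3] need their own deviation rules, which are not shown. The function names are illustrative.

```python
def deviation_count(block, ideal_value):
    """Number of cells that differ from the ideal value (0 = null, 1 = complete)."""
    return sum(1 for row in block for x in row if x != ideal_value)


def deviation_share(block, ideal_value):
    """The same deviation expressed as a fraction of the block's cells."""
    cells = sum(len(row) for row in block)
    return deviation_count(block, ideal_value) / cells if cells else 0.0


# A 2 x 3 block with four ties and two missing ties.
example = [[1, 1, 0], [1, 0, 1]]
as_complete = deviation_count(example, 1)  # missing ties count as deviations
as_null = deviation_count(example, 0)      # present ties count as deviations
```

Every deviating cell weighs the same regardless of where it sits in the block, which is one way to read the "naïve" label in the text.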

5. Future work

We aim to investigate additional ways to measure and rank discovered blockmodels, such as improving the objective function and defining other block types.

6. References

[1] A. Clauset, M.E.J. Newman, and C. Moore, “Finding community structure in very large networks”, Phys. Rev. E, American Physical Society, Vol. 70 Issue 6, 2004, 066111.

[2] S. Wasserman and K. Faust, “Social network analysis: Methods and applications”, Cambridge University Press, 1994.

[3] P. Doreian, V. Batagelj, and A. Ferligoj, “Generalized Blockmodeling”, Cambridge University Press, 2005.

[4] J. Diesner, T.L. Frantz, and K.M. Carley, “Communication networks from the Enron email corpus ‘It's always about the people. Enron is no different’”, Computational & Mathematical Organization Theory, Springer, Vol. 11 No. 3, 2005, pp. 201-228.
