10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

7.3.4 Community Discovery Examples

An interesting result reported in [164] is based on an NEC blog dataset, collected by an NEC in-house crawler. The crawled dataset contains 148,681 entry-to-entry links between 407 blogs over 12 consecutive months (one month corresponds to one time stamp in the networked dataset) between August 2005 and September 2006. To determine the number of communities, the Soft Modularity metric [191] is introduced to measure the goodness of the derived communities: the larger the soft modularity value, the better the community partition. The blog dataset is therefore first transformed into a single networked graph, and its soft modularity values Q_s are calculated and plotted in Fig. 7.13(a). As seen from the figure, Q_s shows a clear peak at a community number of 4. Based on this optimized community number, four aggregated community structures were then extracted using the graph partition algorithm [164], which are shown in Fig. 7.13.

Fig. 7.13. (a) Soft modularity and (b) mutual information under different α for the NEC dataset [164]

In addition to the aggregated community structures, Fig. 7.14 lists the top keywords, ranked by tf/idf score, of the four corresponding communities. Interpreting these top keywords (i.e. the representatives of the communities) reveals the essential properties of each community. For example, C1 is about technology, C2 about politics, C3 about entertainment, and C4 about digital libraries [164].
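The idea of choosing the community number at the peak of a modularity score can be sketched in a few lines. The sketch below is an illustration only: it uses a common soft generalization of Newman's modularity, which reduces to the usual modularity for hard memberships and may differ in detail from the Soft Modularity defined in [191]. The toy graph (two triangles joined by one edge), the candidate partitions, and the names `soft_modularity` and `one_hot` are assumptions made for this example and do not come from [164].

```python
def soft_modularity(weights, memberships):
    """Soft generalization of Newman's modularity (illustrative assumption).

    weights     -- symmetric n x n adjacency matrix as a list of lists
    memberships -- n x k matrix; row i holds node i's membership in each
                   community (rows sum to 1; fractional values are allowed)
    """
    n = len(weights)
    k = len(memberships[0])
    degrees = [sum(row) for row in weights]
    total = sum(degrees)  # equals twice the total edge weight
    if total == 0:
        raise ValueError("the graph has no edges")
    q = 0.0
    for c in range(k):
        u = [memberships[i][c] for i in range(n)]
        # Membership-weighted edge weight inside community c.
        within = sum(u[i] * weights[i][j] * u[j]
                     for i in range(n) for j in range(n))
        # Membership-weighted degree mass (volume) of community c.
        volume = sum(degrees[i] * u[i] for i in range(n))
        q += within / total - (volume / total) ** 2
    return q


def one_hot(labels, k):
    """Turn hard community labels into an n x k membership matrix."""
    return [[1.0 if label == c else 0.0 for c in range(k)] for label in labels]


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    W[a][b] = W[b][a] = 1.0

# One hand-picked candidate partition per community number (assumption:
# in practice each would come from a graph partition algorithm).
candidates = {
    1: one_hot([0, 0, 0, 0, 0, 0], 1),
    2: one_hot([0, 0, 0, 1, 1, 1], 2),
    3: one_hot([0, 0, 1, 1, 2, 2], 3),
}

scores = {k: soft_modularity(W, U) for k, U in candidates.items()}
best_k = max(scores, key=scores.get)  # community number at the peak
print(scores, best_k)
```

On this toy graph the score peaks at the two-triangle split, mirroring how the peak of Q_s is read off Fig. 7.13(a) to fix the community number before partitioning.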
With the derived community structures as the ground truth, the proposed algorithm is employed to determine the community memberships while taking the evolution of the communities into consideration. The mutual information between the ground truth and the dynamically extracted results is plotted in Fig. 7.13(b) under different α values. The plots indicate that as α increases, placing less emphasis on temporal smoothness, the extracted community structures deviate more from the ground truth at each time stamp and vary more over time. These results reveal the dynamic evolution of the communities in depth, which again demonstrates the capability of social network analysis in dynamic environments. Further analysis can be found in [164].

7.4 Socio-Sense: A System for Analyzing the Societal Behavior from Web Archive

M. Kitsuregawa et al. [136] introduce Socio-Sense, a Web analysis system. The system applies structural and temporal analysis methods to a long-term Web archive to obtain insight into the
