10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee


7 Extracting and Analyzing Web Social Networks

A Web community differs slightly from a community of people; for example, a Web community may include competing companies. Since a Web community represents a certain topic, we can understand when and how the topic emerged and evolved on the Web.

As introduced in Section 5.4, there are several algorithms for finding Web communities. Here, the extraction of Web communities uses a Web community chart, which is a graph of communities in which related communities are connected by weighted edges. The main advantage of the Web community chart is that it records the relevance between communities. We can navigate through related communities and locate evolution around a particular community.

M. Toyoda and M. Kitsuregawa explain how Web communities evolve, and what kinds of metrics, such as growth rate and novelty, can measure the degree of the evolution. They first explain the details of changes of Web communities, and then introduce evolution metrics that can be used for finding patterns of evolution. The notation used in this section is summarized below.

t_1, t_2, ..., t_n: Times at which each archive was crawled. Currently, a month is used as the unit of time.
W(t_k): The Web archive at time t_k.
C(t_k): The Web community chart at time t_k.
c(t_k), d(t_k), e(t_k), ...: Communities in C(t_k).

7.1.1 Types of Changes

Emerge: A community c(t_k) emerges in C(t_k) when c(t_k) shares no URLs with any community in C(t_{k-1}). Note that not all URLs in c(t_k) newly appear in W(t_k). Some URLs in c(t_k) may already be included in W(t_{k-1}) without having enough connectivity there to form a community.

Dissolve: A community c(t_{k-1}) in C(t_{k-1}) has dissolved when c(t_{k-1}) shares no URLs with any community in C(t_k). Note that not all URLs in c(t_{k-1}) have disappeared from W(t_k). Some URLs in c(t_{k-1}) may still be included in W(t_k), having lost connectivity to any community.

Grow and shrink: When c(t_{k-1}) in C(t_{k-1}) shares URLs with only c(t_k) in C(t_k), and vice versa, only two changes can occur to c(t_{k-1}): new URLs may appear in c(t_k), and URLs may disappear from c(t_{k-1}). When the number of appeared URLs is greater than the number of disappeared URLs, the community grows. In the reverse case, it shrinks.

Split: A community c(t_{k-1}) may split into several smaller communities. In this case, c(t_{k-1}) shares URLs with multiple communities in C(t_k). A split is caused by disconnections of URLs in SDG. Split communities may grow and shrink. They may also merge (see the next item) with other communities.

Merge: When multiple communities (c(t_{k-1}), d(t_{k-1}), ...) share URLs with a single community e(t_k), these communities are merged into e(t_k) by connections of their URLs in SDG. Merged communities may grow and shrink. They may also split before merging.

7.1.2 Evolution Metrics

Evolution metrics measure how a particular community c(t_k) has evolved. For example, we can know how much c(t_k) has grown, and how many URLs newly appeared in c(t_k). The proposed metrics can be used for finding the various patterns of evolution described above. To measure changes of c(t_k), we first identify the community at time t_{k-1} that corresponds to c(t_k). This corresponding community, c(t_{k-1}), is defined as the community that shares the most URLs with c(t_k). If multiple communities share the same number of URLs, the community that has the largest number of URLs is selected.
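The change types and the corresponding-community rule described above can be illustrated with a small Python sketch. It assumes that each community chart is represented simply as a mapping from a community identifier to its set of URLs. This representation, the function names, the sample data, and the "stable" label for the case where equally many URLs appear and disappear are all assumptions made for illustration; the sketch ignores edge weights and the SDG, and is not the authors' implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: community id -> set of member URLs.
Chart = Dict[str, Set[str]]


def overlaps(community: Set[str], chart: Chart) -> List[str]:
    """Ids of communities in `chart` that share at least one URL with `community`."""
    return [cid for cid, urls in chart.items() if community & urls]


def corresponding(community: Set[str], prev_chart: Chart) -> Optional[str]:
    """Community at t_{k-1} that shares the most URLs with `community`.

    Ties on the number of shared URLs are broken by the larger community.
    Returns None when nothing overlaps (the community has emerged).
    """
    best, best_key = None, (0, 0)
    for cid, urls in prev_chart.items():
        shared = len(community & urls)
        if shared and (shared, len(urls)) > best_key:
            best, best_key = cid, (shared, len(urls))
    return best


def classify_changes(prev_chart: Chart, curr_chart: Chart) -> List[Tuple[str, str]]:
    """Label communities with the change types of Section 7.1.1."""
    events: List[Tuple[str, str]] = []
    for cid, urls in curr_chart.items():
        sources = overlaps(urls, prev_chart)
        if not sources:
            events.append(("emerge", cid))
        elif len(sources) > 1:
            events.append(("merge", cid))
    for pid, urls in prev_chart.items():
        targets = overlaps(urls, curr_chart)
        if not targets:
            events.append(("dissolve", pid))
        elif len(targets) > 1:
            events.append(("split", pid))
        elif overlaps(curr_chart[targets[0]], prev_chart) == [pid]:
            # One-to-one match: compare appeared and disappeared URLs.
            successor = curr_chart[targets[0]]
            appeared = len(successor - urls)
            disappeared = len(urls - successor)
            if appeared > disappeared:
                events.append(("grow", pid))
            elif disappeared > appeared:
                events.append(("shrink", pid))
            else:
                # Assumption: the text does not name the equal case.
                events.append(("stable", pid))
    return events


# Made-up sample charts for two consecutive crawls.
prev_chart: Chart = {
    "p1": {"a", "b", "c"},
    "p2": {"x", "y"},
    "p3": {"m", "n", "o", "q"},
    "p4": {"u"},
    "p5": {"v"},
}
curr_chart: Chart = {
    "c1": {"a", "b", "c", "d", "e"},
    "c2": {"m", "n"},
    "c3": {"o", "q"},
    "c4": {"u", "v"},
    "c5": {"new1", "new2"},
}
events = classify_changes(prev_chart, curr_chart)
```

In this sample, `events` pairs each change label with the identifier of the community it applies to, and `corresponding(curr_chart["c1"], prev_chart)` picks the predecessor used when measuring how `c1` has evolved. A community can receive more than one label across time steps, which matches the remark that split communities may later merge.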
