SEKE 2012 Proceedings - Knowledge Systems Institute

The rest of this paper is organized as follows. We present the background of the LIM model and sparse learning in Section II. In Section III, we introduce our SLIM model for hub node selection. An efficient and scalable optimization technique is given in Section IV. In the final Section V, we present the experimental results on real Twitter data.

II. BACKGROUND

A. Linear Influence Model (LIM)

We follow the notation in [7]. Assume there are $N$ nodes (corresponding to users) and $K$ contagions (corresponding to topics) diffusing over the network, and assume the entire time span is normalized into $T$ unit intervals: $t = 1, \ldots, T$. The original LIM paper uses an indicator function $M_{u,k}(t)$ to denote whether node $u$ got infected by contagion $k$ between time $t-1$ and $t$; therefore the value of $M_{u,k}(t)$ is either one or zero. In this paper, we relax this assumption by defining $M_{u,k}(t)$ to be the number of times that node $u$ got infected by contagion $k$ in $(t-1, t]$. Furthermore, let $V_k(t)$ denote the total volume of contagion $k$ between $t$ and $t+1$. Under the linear influence model:

$$V_k(t+1) = \sum_{u=1}^{N} \sum_{l=1}^{L} M_{u,k}(t-l+1)\, I_u(l) + e_k(t), \qquad (1)$$

where $L$ is the maximum lag-length, $I_u(l)$ is the nonnegative influence factor of user $u$ at time-lag $l$, and $e_k(t)$ is i.i.d. zero-mean Gaussian noise. To model the influence of each user, we need to obtain a robust estimator of the $I_u(l)$. Following LIM, we can organize the $V_k(t)$, $I_u(l)$ and $M_{u,k}(t)$ in matrix form. In particular, we define the volume vector $\mathbf{V} \in \mathbb{R}^{KT}$ to be the concatenation of $\mathbf{V}_1, \ldots, \mathbf{V}_K$, where each $\mathbf{V}_k = (V_k(1), \ldots, V_k(T))^\top$; and the users' influence vector $\mathbf{I} \in \mathbb{R}^{NL}$ to be the concatenation of $\mathbf{I}_1, \ldots, \mathbf{I}_N$, where each $\mathbf{I}_u = (I_u(1), \ldots, I_u(L))^\top$. The matrix $\mathbf{M} \in \mathbb{R}^{KT \times NL}$, whose elements are the $M_{u,k}(t)$, is organized so that (1) can be written in matrix form:

$$\mathbf{V} = \mathbf{M}\mathbf{I} + \mathbf{e}, \qquad (2)$$

where $\mathbf{e}$ is the vector of noise. For the details of constructing $\mathbf{M}$, please refer to [7].
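To make the shape of $\mathbf{M}$ concrete, the sketch below builds it from a toy array of infection counts. This is only one plausible layout, assuming a counts array `A[u, k, t]` and zero-padding for lags that fall before the start of the series; the definitive construction is the one in [7]:

```python
import numpy as np

# Toy counts: A[u, k, t] = number of times user u was infected by
# contagion k in time interval t (our relaxed, non-binary indicator).
rng = np.random.default_rng(3)
N, K, T, L = 4, 2, 10, 3
A = rng.poisson(0.5, size=(N, K, T)).astype(float)

# One plausible layout: row (k, t) of M holds the lagged counts
# A[u, k, t - l + 1] for every user u and lag l = 1..L,
# zero-padded when the lagged index precedes the series start.
M = np.zeros((K * T, N * L))
for k in range(K):
    for t in range(T):
        for u in range(N):
            for l in range(1, L + 1):
                if t - l + 1 >= 0:
                    M[k * T + t, u * L + (l - 1)] = A[u, k, t - l + 1]

print(M.shape)
```

With this layout the product $\mathbf{M}\mathbf{I}$ reproduces the double sum in (1), one row per (contagion, time) pair.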

Based on (2), we can formulate the problem of predicting $\mathbf{V}$ as a non-negative least squares problem:

$$\min_{\mathbf{I} \geq 0} \frac{1}{2} \|\mathbf{V} - \mathbf{M}\mathbf{I}\|_2^2.$$
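As an illustrative sketch (not the paper's implementation), this non-negative least squares problem can be solved directly with `scipy.optimize.nnls`; the dimensions and the Poisson-count design below are assumptions made only for the toy example:

```python
import numpy as np
from scipy.optimize import nnls

# Toy dimensions: K contagions observed over T intervals, N users, max lag L.
rng = np.random.default_rng(0)
K, T, N, L = 2, 30, 5, 3

M = rng.poisson(1.0, size=(K * T, N * L)).astype(float)  # lagged counts
I_true = np.abs(rng.normal(size=N * L))                  # nonnegative influence
V = M @ I_true + 0.01 * rng.normal(size=K * T)           # noisy volumes

# Solve min_{I >= 0} ||V - M I||_2^2 with an active-set NNLS solver.
I_hat, res = nnls(M, V)

print(I_hat.shape, bool(np.all(I_hat >= 0)))
```

Note that without any penalty the recovered influence vector is generally dense: every user receives some nonzero influence, which is what motivates the sparse extension below.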

B. Sparse Learning

We present the necessary background on sparse learning, starting with the high-dimensional linear regression model. Let $X \in \mathbb{R}^{n \times p}$ denote the input data matrix and $\mathbf{y} \in \mathbb{R}^{n}$ denote the response vector. Under the linear regression model,

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

where $\boldsymbol{\beta} \in \mathbb{R}^{p}$ is the regression coefficient vector to be estimated and the noise $\boldsymbol{\epsilon}$ is distributed as $N(0, \sigma^2 I_n)$. To select the most predictive features, Lasso [10] provides a sparse estimate of $\boldsymbol{\beta}$ by solving the following optimization problem:

$$\min_{\boldsymbol{\beta}} \frac{1}{2} \|\mathbf{y} - X\boldsymbol{\beta}\|_2^2 + \lambda \|\boldsymbol{\beta}\|_1,$$

where $\|\boldsymbol{\beta}\|_1 = \sum_{j=1}^{p} |\beta_j|$ is the $\ell_1$-norm of $\boldsymbol{\beta}$, which encourages the solution to be sparse, and $\lambda$ is the regularization parameter that controls the sparsity level (the number of non-zero elements in $\boldsymbol{\beta}$). From the sparsity pattern of $\boldsymbol{\beta}$, we obtain a set of important features, which correspond to the non-zero elements in $\boldsymbol{\beta}$.
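To make the effect of the $\ell_1$ penalty concrete, here is a minimal proximal-gradient (ISTA) solver for the Lasso problem above. The synthetic data, step size, and $\lambda$ value are illustrative assumptions, not choices from the paper:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of grad
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
b_true = np.zeros(20)
b_true[:3] = [2.0, -1.5, 1.0]                # only 3 informative features
y = X @ b_true + 0.1 * rng.normal(size=100)

b_hat = lasso_ista(X, y, lam=5.0)
print(int(np.sum(np.abs(b_hat) > 1e-8)))     # number of selected features
```

The recovered coefficient vector is exactly zero on most uninformative features, which is the selection behavior the text describes.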

When the features have a natural group structure, we can use the group Lasso penalty [8] to shrink each group of features to zero all together, instead of each individual feature. In particular, let $\mathcal{G}$ denote the set of groups; the corresponding group Lasso problem can be formulated as:

$$\min_{\boldsymbol{\beta}} \frac{1}{2} \|\mathbf{y} - X\boldsymbol{\beta}\|_2^2 + \lambda \sum_{g \in \mathcal{G}} \|\boldsymbol{\beta}_g\|_2,$$

where $\boldsymbol{\beta}_g$ is the subvector of $\boldsymbol{\beta}$ for the features in group $g$, and $\|\cdot\|_2$ is the vector $\ell_2$-norm. This group Lasso penalty achieves the effect of jointly setting all of the coefficients within each group to zero or nonzero values.
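The mechanics of this penalty are visible in its proximal operator: each group is shrunk toward the origin and zeroed out entirely once its norm drops below the threshold. A small self-contained sketch (the vectors, groups, and threshold are made up for illustration):

```python
import numpy as np

def group_soft_threshold(b, groups, t):
    """Proximal operator of t * sum_g ||b_g||_2: shrink each group toward
    zero, and set the whole group to zero when its norm is at most t."""
    out = b.copy()
    for g in groups:
        norm = np.linalg.norm(b[g])
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * b[g]
    return out

b = np.array([3.0, 4.0, 0.1, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]

# First group has norm 5, so it survives and is shrunk to norm 4;
# second group has norm ~0.14 <= 1, so it is zeroed out jointly.
shrunk = group_soft_threshold(b, groups, t=1.0)
print(shrunk)
```

This all-or-nothing behavior per group is exactly what lets the penalty select whole users rather than individual lag coefficients.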

III. SPARSE LINEAR INFLUENCE MODEL (SLIM)

Utilizing the group Lasso penalty introduced in the previous section, we propose a new model, called SLIM (sparse linear influence model), which can automatically select the hub nodes without prior knowledge of the network structure. We extend the LIM model by introducing a group Lasso penalty on the user influence vectors. In particular, we solve the following optimization problem:

$$\min_{\mathbf{I} \geq 0} \frac{1}{2} \|\mathbf{V} - \mathbf{M}\mathbf{I}\|_2^2 + \lambda \sum_{u=1}^{N} \|\mathbf{I}_u\|_2,$$

where $\|\cdot\|_2$ is the vector $\ell_2$-norm. From the estimated $\mathbf{I}$, we obtain the pattern of the influence for each user.
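As a sketch only (the paper's scalable algorithm is the subject of Section IV), the SLIM objective can be attacked by proximal gradient: the prox of the nonnegativity-constrained group penalty is a projection onto the nonnegative orthant followed by blockwise soft-thresholding. All data sizes, the Poisson design, and the $\lambda$ below are toy assumptions:

```python
import numpy as np

def prox_nonneg_group(z, groups, t):
    """Prox of t * sum_u ||I_u||_2 + indicator{I >= 0}: clip at zero,
    then shrink each user's block, zeroing blocks with norm <= t."""
    z = np.maximum(z, 0.0)
    out = z.copy()
    for g in groups:
        norm = np.linalg.norm(z[g])
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * z[g]
    return out

def slim(M, V, groups, lam, n_iter=1000):
    """Proximal gradient for min_{I>=0} 0.5||V - M I||^2 + lam*sum ||I_u||."""
    step = 1.0 / np.linalg.norm(M, 2) ** 2
    I = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ I - V)
        I = prox_nonneg_group(I - step * grad, groups, step * lam)
    return I

rng = np.random.default_rng(2)
N, L, KT = 6, 3, 200                       # 6 users, lag 3, 200 stacked rows
groups = [np.arange(u * L, (u + 1) * L) for u in range(N)]
M = rng.poisson(1.0, size=(KT, N * L)).astype(float)
I_true = np.zeros(N * L)
I_true[groups[0]] = [1.0, 0.5, 0.2]        # only user 0 is an influential hub
V = M @ I_true + 0.05 * rng.normal(size=KT)

I_hat = slim(M, V, groups, lam=10.0)
hubs = [u for u, g in enumerate(groups) if np.linalg.norm(I_hat[g]) > 1e-8]
print(hubs)
```

Users whose entire influence block is driven to zero are discarded; the surviving blocks identify the hub nodes, which is the selection effect SLIM is designed for.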
