considering autocorrelation in predictive models - Department of ...

Definition of the Problem

Figure 2.4: Spatio-temporal autocorrelation. An example of the effect of spatio-temporal autocorrelation on an observed variable in a map. Different colors indicate different values of the observed variable. The two maps show the observed variable in 1980 and 1984.

Researchers in these fields have shown that systems of different nature can be represented as networks (Newman and Watts, 2006). For instance, the Web can be considered as a network of web pages, which may be connected to each other by edges representing various explicit relations, such as hyperlinks. Social networks can be seen as groups of members who are connected by friendship relations or who follow other members because they are interested in similar topics. Metabolic networks can provide insight into genes and their possible co-regulation relations, based on similarities in their expression levels. Finally, in epidemiology, networks can represent the spread of diseases and infections.

Regardless of where we encounter them, networks consist of entities (nodes), which may be connected to each other by edges. The nodes in a network are generally of the same type, and the edges between nodes express various explicit relations. Information on the nodes is provided as a set of properties (attributes) whose values are associated with each node in the network. The edges reflect the relation or dependence between the properties of the nodes.

Network (relational) autocorrelation is defined as the property that a value observed at a node depends on the values observed at neighboring nodes in the network (Doreian, 1990). It can be seen as a special case of spatial (multi-dimensional and multi-directional) autocorrelation, but with different measures of relationship applied. In social analysis, it is recognized as the principle of homophily, that is, the tendency of nodes with similar values to be linked to each other (McPherson et al., 2001).
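The homophily principle described above can be illustrated with a small sketch. It is not a measure taken from the text: it simply compares the fraction of edges that join same-label nodes with the fraction of all node pairs that share a label, which is the baseline one would expect if edges ignored labels. The toy network, its labels, and the function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def same_label_edge_fraction(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("the network has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def same_label_pair_fraction(labels):
    """Fraction of all unordered node pairs that share a label.

    This serves as a baseline: the value expected if edges were
    placed without regard to the node labels.
    """
    pairs = list(combinations(labels, 2))  # iterating a dict yields its keys
    if not pairs:
        raise ValueError("at least two nodes are required")
    same = sum(1 for u, v in pairs if labels[u] == labels[v])
    return same / len(pairs)


# Hypothetical toy network: two triangles joined by one cross-label edge.
labels = {"a": "red", "b": "red", "c": "red",
          "d": "blue", "e": "blue", "f": "blue"}
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

observed = same_label_edge_fraction(edges, labels)  # 6 of 7 edges
baseline = same_label_pair_fraction(labels)         # 6 of 15 pairs

print(f"same-label edges: {observed:.3f}, same-label pairs: {baseline:.3f}")
```

An observed fraction well above the baseline, as in this toy example, is the pattern that the homophily principle describes. A baseline is needed because a network dominated by one label would show many same-label edges even without any homophily.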
Figure 2.5 shows an example of a data network in which different colors represent different node labels. Nodes with the same color tend to be connected to each other more often than nodes with different colors.

Figure 2.5: Network autocorrelation. Different nodes are given in different colors depending on the nodes' labels.

Formally, a network is a set of entities connected by edges. Each entity is called a node of the network. A number (usually taken to be positive), called the weight, is associated with each edge. In a general formulation, a network can be represented as a (weighted and directed) graph, i.e., a set of nodes and a ternary relation that represents both the edges between nodes and the weights associated with those edges. The network is represented by an adjacency matrix $W$, whose entries are positive ($w_{ij} > 0$) if there is an edge connecting $i$ to $j$, and equal to 0 ($w_{ij} = 0$) otherwise. In practice, when the original data come in the form of a network, the weight $w_{ij}$ represents the strength of the connection from node $i$ to node $j$.

Generalizing Moran's definition of spatial autocorrelation (Moran, 1950) of a random variable $x_n$ to match $n$ dimensions, we deal with an $n$-dimensional vector, and the autocorrelation function is expressed as the covariance between the values $x_d = (x_{1,d}, \ldots, x_{n,d})$ and $x_{d'} = (x_{1,d'}, \ldots, x_{n,d'})$, where $\tau = d' - d$ is the total shift between the two vectors $x_d$ and $x_{d'}$ along the network:

$$\mathrm{Network\_AC}(\tau) = \frac{E[(X_d - \mu_X)(X_{d'} - \mu_X)]}{E[(X_d - \mu_X)^2]} \tag{2.23}$$

In social networks, homophily is defined as the tendency of individuals to associate and bond with similar others (friendship). Homophily shows that people's social networks are homogeneous with regard to many sociodemographic, behavioral, and intra-personal characteristics. Homophily effects among friends have demonstrated their importance in marketing (Chuhay, 2010). Moreover, homophily is the hidden assumption of recommender systems, although there the focus shifts from how people are socially connected to how they are measurably similar to one another.

In the context of Twitter, a special type of social network, homophily implies that a user (twitterer) follows a friend because he/she is interested in some of the topics the friend publishes, and the friend follows back if he/she shares similar topics of interest. This is due to the fact that "a contact between similar people occurs at a higher rate than among dissimilar people" (Weng et al., 2010). Kwak et al. (2010) investigated homophily in two contexts: geographic location and popularity. They considered the time zone of a user as an approximate indicator of the user's location and the number of followers as a measure of the user's popularity. Among reciprocated users, they observed some level of homophily.
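To make the adjacency matrix W and the idea of network autocorrelation more concrete, the following minimal sketch builds W from a weighted edge list and computes a Moran's-I-style statistic over directly connected nodes. This is the standard Moran's I with the network weights used as the neighborhood weights; it is a substitute chosen for illustration, not an implementation of Equation 2.23. The function names and the toy path network are assumptions.

```python
def adjacency_matrix(n, weighted_edges, symmetric=False):
    """Build an n x n matrix W with W[i][j] = weight of the edge i -> j.

    Entries stay 0.0 where no edge exists. With symmetric=True each
    edge is also stored in the reverse direction (undirected network).
    """
    W = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        W[i][j] = w
        if symmetric:
            W[j][i] = w
    return W


def network_morans_i(values, W):
    """Moran's I of node values, using edge weights as neighbor weights.

    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(sum(row) for row in W)
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("need at least one edge and non-constant values")
    num = sum(W[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / s0) * num / denom


# Hypothetical toy network: an undirected path 0 - 1 - 2 - 3, unit weights.
W = adjacency_matrix(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)],
                     symmetric=True)

clustered = network_morans_i([1, 1, 0, 0], W)    # similar values adjacent
alternating = network_morans_i([1, 0, 1, 0], W)  # dissimilar values adjacent

print(f"clustered: {clustered:.3f}, alternating: {alternating:.3f}")
```

On this toy path, the clustered assignment gives a positive value (about 0.333) and the alternating assignment gives about -1.0, which matches the intuition that connected nodes with similar values indicate positive network autocorrelation.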
Frequent causes of network (relational) autocorrelation (Gujarati, 2002) are model misspecifications of the following types:
• Omitted variables
• Fitting an inappropriate functional form, which can give rise to autocorrelation in the errors of the model
• Cobweb phenomenon (it takes time to adjust to a change in policy)
• Manipulation of data (creating subsets over time, resulting in a systematic pattern)
• Data transformations