considering autocorrelation in predictive models - Department of ...


Introduction

analysis of such data needs to take this into account. Such work either removes the autocorrelation dependencies during pre-processing and then uses traditional algorithms (e.g., Hardisty and Klippel, 2010; Huang et al, 2004) or modifies classical machine learning, data mining and statistical methods in order to consider the autocorrelation (e.g., Bel et al, 2009; Rinzivillo and Turini, 2004, 2007). There are also approaches which use a relational setting (e.g., Ceci and Appice, 2006; Malerba et al, 2005), where the autocorrelation is usually incorporated through the data structure or defined implicitly through relationships among the data and other data properties.

However, one limitation of most of the approaches that take autocorrelation into account is that they assume that autocorrelation dependencies are constant (i.e., do not change) throughout the space/network (Angin and Neville, 2008). This means that possibly significant variability in autocorrelation dependencies at different points of the space/network cannot be represented and modeled. Such variability could result from a different underlying latent structure of the space/network that varies among its parts in terms of properties of nodes or associations between them. For example, different research communities may have different levels of cohesiveness and thus cite papers on other topics to varying degrees. As pointed out by Angin and Neville (2008), when autocorrelation varies significantly throughout a network, it may be more accurate to model the dependencies locally rather than globally.

In the dissertation, we extend the predictive clustering framework in the context of predictive clustering trees (PCTs) that are able to deal with data (spatial and network) that do not follow the i.i.d. assumption. The distinctive characteristic of the proposed approach is that it explicitly considers non-stationary (spatial and network) autocorrelation when building the predictive models.
Such a method not only extends the applicability of the predictive clustering approach, but also exploits the autocorrelation phenomenon and uses it to make better predictions and better models. In traditional PCTs (Blockeel, 1998), the tree construction is performed by maximizing variance reduction. This heuristic guarantees, in principle, accurate models since it reduces the error on the training set. However, it neglects the possible presence of autocorrelation in the training data. To address this issue, we propose to simultaneously maximize variance reduction and autocorrelation for spatial/network domains. In this way, we exploit the spatial/network structure of the data in the PCT induction phase and obtain predictive models that naturally deal with the phenomenon of autocorrelation.

The consideration of autocorrelation in clustering has already been investigated in the literature, both for spatial clustering (Glotsos et al, 2004) and network clustering (Jahani and Bagherpour, 2011). Motivated by the demonstrated benefits of considering autocorrelation, we exploit some characteristics of autocorrelated data to improve the quality of PCTs. The consideration of autocorrelation in clustering offers several advantages, since it allows us to:
• determine the strength of the spatial/network arrangement on the variables in the model;
• evaluate stationarity and heterogeneity of the autocorrelation phenomenon across space;
• identify the possible role of the spatial/network arrangement/distance decay on the predictions associated with each of the nodes of the tree;
• focus on the spatial/network “neighborhood” to better understand the effects that it can have on other neighborhoods and vice versa.
These advantages of considering spatial autocorrelation in clustering, identified by Arthur (2008), fit well into the case of PCTs.
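As a rough illustration of what maximizing variance reduction and autocorrelation together could look like for one candidate split, the following Python sketch scores a split on a single numeric target. It is a minimal sketch under stated assumptions, not the heuristic defined in the dissertation: the use of global Moran's I as the autocorrelation measure, the mixing weight `alpha`, the convention that Moran's I is 0 when it is undefined, and all function names are hypothetical choices made for this example.

```python
def variance(xs):
    """Population variance of a non-empty list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def morans_i(values, weights):
    """Global Moran's I; weights[i][j] is the spatial/network weight between i and j.

    Assumption for this sketch: returns 0.0 when the statistic is undefined
    (constant values or no non-zero weights).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_sum = sum(weights[i][j] for i, j in pairs)
    if denom == 0 or w_sum == 0:
        return 0.0
    num = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    return (n / w_sum) * (num / denom)


def restrict(values, weights, idx):
    """Values and weight sub-matrix for the examples whose indices are in idx."""
    return [values[i] for i in idx], [[weights[i][j] for j in idx] for i in idx]


def split_score(values, weights, left, right, alpha=0.5):
    """Hypothetical combined score: alpha * relative variance reduction
    + (1 - alpha) * size-weighted Moran's I inside the two (non-empty) children."""
    n = len(values)
    parent_var = variance(values)
    if parent_var == 0:
        return 0.0
    child_var = sum(len(p) / n * variance([values[i] for i in p]) for p in (left, right))
    var_term = (parent_var - child_var) / parent_var
    auto_term = sum(len(p) / n * morans_i(*restrict(values, weights, p)) for p in (left, right))
    return alpha * var_term + (1 - alpha) * auto_term


def chain_weights(n):
    """Toy weight matrix: examples lie on a line and only direct neighbors are linked."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


values = [1, 2, 3, 4, 5, 6, 7, 8]
w = chain_weights(len(values))
contiguous = split_score(values, w, [0, 1, 2, 3], [4, 5, 6, 7])
interleaved = split_score(values, w, [0, 2, 4, 6], [1, 3, 5, 7])
print(contiguous > interleaved)
```

In this toy setting the split that keeps neighboring examples together scores higher on both terms, which is the behavior the combined criterion is meant to reward.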
Moreover, as recognized by Griffith (2003), autocorrelation implicitly defines a zoning of a (spatial) phenomenon: taking this into account reduces the effect of autocorrelation on prediction errors. Therefore, we propose to perform clustering by maximizing both variance reduction and cluster homogeneity (in terms of autocorrelation) at the same time, during the phase of adding a new node to the predictive clustering tree.

The network (spatial and relational) setting that we address in this work is based on the use of both the descriptive information (attributes) and the network structure during training, whereas we only use the descriptive information in the testing phase and disregard the network structure. More specifically, in the training phase, we assume that all examples are labeled and that the given network is complete. In the testing phase, all testing examples are unlabeled and the network is not given. A key property of our approach is that the existence of the network is not obligatory in the testing phase, where we only need the descriptive information. This can be very beneficial when predictions need to be made for examples whose connections to other examples are not known or need to be confirmed. The more common setting, where a network with some nodes labeled and some nodes unlabeled is given, can be easily mapped to our setting. We can use the nodes with labels and the projection of the network on these nodes for training and only the unlabeled nodes without network information in the testing phase.

This network setting is very different from the existing approaches to network classification and regression, where the descriptive information is typically tightly connected to the network structure. The connections (edges in the network) between the data in the training/testing set are predefined for a particular instance and are used to generate the descriptive information associated with the nodes of the network (see, for example, Steinhaeuser et al, 2011). Therefore, in order to predict the value of the response variable(s), besides the descriptive information, one needs the connections (edges in the network) to related/similar entities.
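The mapping from a partially labeled network to the training/testing setting described above can be sketched as follows. This is an illustrative sketch only: the dictionary-based data layout and the function name `split_partially_labeled` are assumptions introduced here, not part of the described system.

```python
def split_partially_labeled(attributes, labels, edges):
    """Map a partially labeled network to the training/testing setting.

    attributes: dict node -> descriptive attributes (all nodes)
    labels:     dict node -> target value (labeled nodes only)
    edges:      iterable of (u, v) pairs
    Returns (train_nodes, train_edges, test_nodes).
    """
    labeled = set(labels)
    # Training examples: labeled nodes with their attributes and target values.
    train_nodes = {n: (attributes[n], labels[n]) for n in attributes if n in labeled}
    # Projection of the network on the labeled nodes: keep an edge only if
    # both of its endpoints are labeled.
    train_edges = [(u, v) for u, v in edges if u in labeled and v in labeled]
    # Testing examples: unlabeled nodes, descriptive attributes only, no edges.
    test_nodes = {n: attributes[n] for n in attributes if n not in labeled}
    return train_nodes, train_edges, test_nodes


attributes = {"a": [0.1], "b": [0.2], "c": [0.9], "d": [0.8]}
labels = {"a": 1.0, "b": 1.5, "c": 4.0}  # node "d" is unlabeled
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]

train_nodes, train_edges, test_nodes = split_partially_labeled(attributes, labels, edges)
print(train_edges)
print(test_nodes)
```

Edges that touch an unlabeled node are dropped from training, and the test examples carry no network information at all, matching the setting in which the network is not given at prediction time.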
This is very different from what is typically done in network analysis as well. Indeed, the general focus there is on exploring the structure of a network by calculating its properties (e.g., the degrees of the nodes, the connectedness within the network, scalability, robustness, etc.). The network properties are then fitted into an already existing mathematical (theoretical) network (graph) model (Steinhaeuser et al, 2011).

From the predictive perspective, according to the tests in the tree, it is possible to associate an observation (a test node of a network) with a cluster. The predictive model associated with the cluster can then be used to predict its response value (or response values, in the case of multi-target tasks). From the descriptive perspective, the tree models obtained by the proposed algorithm allow us to obtain a hierarchical view of the network, where clusters can be employed to design a federation of hierarchically arranged networks. A hierarchical view of the network can be useful, for instance, in wireless sensor networks, where a hierarchical structure is one of the possible ways to reduce the communication cost between the nodes (Li et al, 2007). Moreover, it is possible to browse the generated clusters at different levels of the hierarchy, where each cluster can naturally consider different effects of the autocorrelation phenomenon on different portions of the network: at higher levels of the tree, clusters will be able to consider autocorrelation phenomena that are spread all over the network, while at lower levels of the tree, clusters will reasonably consider local effects of autocorrelation. This gives us a way to consider non-stationary autocorrelation.

1.3 Contributions

The research presented in this dissertation extends the PCT framework towards learning from autocorrelated data.
We address important aspects of the problem of learning predictive models in the case when the examples in the data are not i.i.d., such as the definition of autocorrelation measures for the variety of learning tasks that we consider, the definition of autocorrelation-based heuristics, the development of algorithms that use such heuristics for learning predictive models, as well as their experimental evaluation. In our broad overview, we consider four different types of autocorrelation: spatial, temporal, spatio-
