considering autocorrelation in predictive models - Department of ...

Abstract

Most machine learning, data mining and statistical methods rely on the assumption that the analyzed data are independent and identically distributed (i.i.d.). More specifically, the individual examples in the training data are assumed to be drawn independently of one another from the same probability distribution. However, cases where this assumption is violated are easy to find: for example, species are distributed non-randomly across a wide range of spatial scales. The i.i.d. assumption is often violated because of the phenomenon of autocorrelation.

The most general definition found in the literature describes autocorrelation as the cross-correlation of an attribute with itself. Specifically, in statistics, temporal autocorrelation is defined as the cross-correlation between the values of an attribute of a process at different points in time. In time series analysis, temporal autocorrelation is defined as the correlation among time-stamped values due to their relative proximity in time. In spatial analysis, spatial autocorrelation has been defined as the correlation among data values that is strictly due to the relative proximity of the locations of the objects that the data refer to. It is justified by Tobler’s first law of geography, according to which “everything is related to everything else, but near things are more related than distant things”. In network studies, autocorrelation is defined by the homophily principle as the tendency of nodes with similar values to be linked with each other.

In this dissertation, we first give a clear and general definition of the autocorrelation phenomenon, which includes spatial and network autocorrelation for continuous and discrete responses. We then present a broad overview of the existing autocorrelation measures for the different types of autocorrelation and of the data analysis methods that consider them.
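As a rough illustration of the temporal and spatial definitions above, the following Python sketch computes the sample autocorrelation of a series at a given lag and the global Moran's I statistic for values on a small set of sites. Moran's I is used here only as one widely known measure of spatial autocorrelation; the abstract does not say which measures the dissertation relies on. The function names and the helper `line_adjacency`, which builds a toy neighbourhood matrix for sites arranged on a line, are assumptions made for this example.

```python
def lag_autocorrelation(values, lag):
    """Sample autocorrelation of a series at a positive lag."""
    n = len(values)
    mean = sum(values) / n
    # Total squared deviation from the mean (normalising term).
    denom = sum((v - mean) ** 2 for v in values)
    # Products of deviations for pairs of values `lag` steps apart.
    num = sum((values[t] - mean) * (values[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def line_adjacency(n):
    """Hypothetical helper: binary weights for n sites on a line,
    where each site neighbours only the sites directly next to it."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I for values with a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all pairs of sites.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)


if __name__ == "__main__":
    w = line_adjacency(4)
    print(lag_autocorrelation([1, 2, 3, 4, 5], 1))
    print(morans_i([1, 2, 3, 4], w))   # smooth gradient: positive
    print(morans_i([1, 4, 1, 4], w))   # alternating values: negative
```

In this toy setting, neighbouring sites with similar values give a positive Moran's I and alternating values give a negative one, which mirrors the sense of Tobler's law quoted above.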
Focusing on spatial and network autocorrelation, we propose three algorithms that handle non-stationary autocorrelation within the framework of predictive clustering, which deals with the tasks of classification, regression and structured output prediction. These algorithms and their empirical evaluation are the major contributions of this thesis.

We first propose a data mining method called SCLUS that explicitly considers spatial autocorrelation when learning predictive clustering models. The method is based on the concept of predictive clustering trees (PCTs), in which hierarchies of clusters of similar data are identified and a predictive model is associated with each cluster. In particular, our approach is able to learn predictive models for both a continuous response (regression task) and a discrete response (classification task). It properly deals with autocorrelation in data and provides a multi-level insight into the spatial autocorrelation phenomenon. The predictive models adapt to the local properties of the data while providing spatially smoothed predictions. We evaluate our approach on several real-world problems of spatial regression and spatial classification.

The problem of “network inference” is known to be a challenging task. In this dissertation, we propose a data mining method called NCLUS that explicitly considers autocorrelation when building predictive models from network data. The algorithm is based on the concept of PCTs, which can be used for clustering, prediction and multi-target prediction, including multi-target regression and multi-target classification. We evaluate our approach on several real-world problems of network regression, coming from the areas of social and spatial networks. Empirical results show that our algorithm performs better
