Semantic Interpretation of Digital Aerial Images Utilizing ...

Chapter 3. From Appearance and 3D to Interpreted Image Pixels

interpretation of images. On the one hand, aerial images provide a nearly constant scale for stuff and object classes, which is mainly determined by the ground sampling distance (GSD) of the aerial project. The local interpretation can thus be performed at a single scale, which is mainly defined by the patch size. On the other hand, the huge variability of the mapped data and the undefined object shapes make a top-down recognition strategy intractable. In addition, the generation of training samples does not require entirely annotated objects, but rather an efficient assignment of object classes by drawing strokes. We therefore concentrate on a local explanation of the images and introduce the exact object extraction in a later step (see Chapter 5).

We extensively exploit statistical Sigma Points features [Julier and Uhlmann, 1996], directly derived from the well-established covariance descriptors [Tuzel et al., 2006], in combination with randomized forest (RF) classifiers [Breiman, 2001] to compactly describe and classify color, basic texture and elevation measurements within local image regions. The combination of the derived statistical features and RF classifiers provides several advantages for large-scale computations in aerial imagery.

Since aerial imagery consists of multiple information sources, these low-level cues need to be combined in a reasonable way. We therefore apply a Sigma Points feature representation to compactly describe different channels within a small local neighborhood. Compared to histograms computed over multi-spectral data, these descriptors are low-dimensional and enable a simple integration of appearance and height information, which is then represented in a Euclidean vector space. Moreover, they can be computed quickly for each pixel using integral image structures and also support parallel computation techniques.
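To make the Sigma Points representation more concrete, the following is a minimal sketch of how such a feature vector could be derived from the mean and covariance of the channel values in one local patch. It is not the implementation used in this thesis: the function names, the scaling factor `alpha`, the small regularization constant `eps` and the use of a Cholesky factor as matrix square root are assumptions made for illustration, and the integral-image acceleration mentioned above is omitted.

```python
import math

def mean_and_covariance(samples):
    """Mean vector and sample covariance matrix of d-dimensional samples (n >= 2)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[k] for s in samples) / n for k in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov

def cholesky(a, eps=1e-9):
    """Lower-triangular L with L * L^T = a + eps * I.

    eps is an assumed regularizer that keeps the factorization defined
    for (nearly) constant patches with a singular covariance matrix.
    """
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            acc = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(acc + eps, eps))
            else:
                L[i][j] = acc / L[j][j]
    return L

def sigma_points_feature(samples, alpha=math.sqrt(2.0)):
    """Concatenate the mean and 2d symmetric sigma points into one vector.

    The points are mu, mu + alpha * L[:, i] and mu - alpha * L[:, i];
    alpha = sqrt(2) is an assumed default, not a value taken from the text.
    The result has d * (2d + 1) entries and lives in a Euclidean vector space.
    """
    mu, cov = mean_and_covariance(samples)
    L = cholesky(cov)
    d = len(mu)
    feature = list(mu)
    for i in range(d):  # points shifted along +column i of L
        feature += [mu[k] + alpha * L[k][i] for k in range(d)]
    for i in range(d):  # points shifted along -column i of L
        feature += [mu[k] - alpha * L[k][i] for k in range(d)]
    return feature

# Hypothetical patch: each sample is (red, green, blue, height in meters).
patch = [(0.20, 0.40, 0.10, 3.0), (0.30, 0.50, 0.20, 3.5),
         (0.25, 0.45, 0.10, 2.8), (0.60, 0.20, 0.30, 9.0),
         (0.55, 0.25, 0.35, 8.7)]
feature = sigma_points_feature(patch)
```

With four channels the sketch yields a 36-dimensional vector, in which color and height enter the same representation without any histogram binning.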
Randomized forests have proven to give robust and accurate results for challenging multi-class classification tasks [Lepetit and Fua, 2006, Shotton et al., 2008]. RFs are very efficient at runtime since the final decision is based on fast binary tests on a small number of selected feature attributes. In addition, the classifier can be trained efficiently on a large amount of data and tolerates some errors in the training data.

In the following sections we first review related work in the context of semantic image interpretation and classification. Then, we outline the core parts of our semantic interpretation, consisting of a powerful feature representation (Section 3.4), the classifier (Section 3.5) and refinement steps to obtain an improved final class labeling (see Sections 3.6 and 3.7).

3.3 Related Work

While recently proposed approaches aim at extracting coarse scene geometry directly from interpretation results [Hoiem et al., 2007, Saxena et al., 2008, Gould et al., 2009, Liu et al., 2010] or try to jointly estimate classification and dense reconstruction [Ladicky et al., 2010b], we instead focus on directly integrating available 3D data into our interpretation workflow to improve the task of semantic labeling. The tight integration of 3D data into image classification as an additional information source is still a new and emerging field of research. As shown in [Brostow et al., 2008, Sturgess et al., 2009, Xiao and Quan, 2009], a combination of color and coarse 3D information, obtained from structure-from-motion (SfM) geometry, is essential for an accurate semantic interpretation of street-level images. Leibe et al. [Leibe et al., 2007] demonstrated that SfM improves the detection and tracking of moving objects. In this thesis we go one step further by utilizing dense 3D reconstruction.

Several approaches in the field of photogrammetry that deal with aerial imagery focus on detecting single object classes, e.g., buildings, by using only LiDAR data [Matei et al., 2008, Toshev et al., 2010] or height models [Lafarge et al., 2008], but also on exploiting appearance cues together with elevation measurements (resulting from a combination of a surface and a terrain model) [Rottensteiner et al., 2004, Zebedin et al., 2006]. While these approaches focus on binary classification tasks, the presented concept handles multiple classes and can be configured for specific objects. In contrast to [Zebedin et al., 2006], where appearance and geometry are treated separately, our approach tightly integrates dense matching results and low-level cues like color and derived edge responses within a compact yet local feature representation. In addition, we train specified object classes directly and do not introduce prior knowledge, such as the assumption that buildings and trees are elevated above the ground, to derive the final classification.

Supervised local classification strategies aim to semantically describe every pixel by considering either a small spatial neighborhood or entire image regions, provided by an unsupervised segmentation, in image space.
In fact, Bag-of-Features (BoF) models have shown excellent performance in various recognition tasks [Winn et al., 2005, Nowak et al., 2006, Rabinovich et al., 2006, Marszalek and Schmid, 2007, Verbeek and Triggs, 2007, Bosch et al., 2007, Pantofaru et al., 2008, Fulkerson et al., 2009, Lazebnik and Raginsky, 2009]. The BoF concept is mainly based on collecting different types of feature vectors within given image regions. The collected feature instances are then quantized into a specified number of words using well-established clustering procedures. Any image region is then represented by a one-dimensional histogram of word occurrences. The resulting histogram representations are then used to train and evaluate arbitrary classifiers. Commonly, the quantized feature instances are composed of combinations of Texton filter bank responses [Winn et al., 2005], SIFT descriptors [Lowe, 2004], Histograms of Oriented Gradients (HOG) [Dalal and Triggs, 2005] or spatial information. However, a reliable combination of different features requires a sophisticated normalization step. An integration of appearance and height information is particularly problematic since height is difficult to quantize, e.g., into one-dimensional histograms, if the value range is not known in advance.

Mainly inspired by popular approaches [Viola and Jones, 2004, Tuzel et al., 2007, Shot-
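The BoF steps described above (collect features, quantize them into words, build a histogram per region) can be illustrated with a minimal sketch. It assumes that a codebook of cluster centers has already been obtained, e.g., by k-means, and it uses an L1 normalization of the histogram; both choices, as well as the function names and the toy data, are illustrative assumptions and not taken from any of the cited approaches.

```python
def nearest_word(feature, codebook):
    """Index of the codebook entry closest to the feature (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda w: sum((f - c) ** 2
                                 for f, c in zip(feature, codebook[w])))

def bof_histogram(features, codebook):
    """L1-normalized histogram of word occurrences for one image region."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_word(feature, codebook)] += 1
    total = sum(counts)
    if total == 0:  # empty region: return an all-zero histogram
        return [0.0] * len(codebook)
    return [c / total for c in counts]

# Hypothetical codebook with three words and four 2D features from one region.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
region_features = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]
histogram = bof_histogram(region_features, codebook)
```

The sketch also hints at the difficulty noted above: a fixed codebook presupposes that the value range of every channel, including height, is covered by the training data used for clustering.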
