
Automated Aerial Image Analysis using Ordnance Survey Vector Data

Brian Sexton

Master of Science

NUI Galway

Department of Information Technology

August 2010

Dr. James Duggan
Dr. Sam Redfern


Certificate of Authorship


Contents

Page
Certificate of Authorship .......................................................................................i
Contents ..................................................................................................................ii
List of Tables ........................................................................................................iii
List of Figures ........................................................................................................iv
Abstract ..................................................................................................................vi
1 Project Outline ...................................................................................................1
1.1 Project Overview .......................................................................................1
1.2 General Introduction and Background.......................................................8
2 Stepping through the Algorithm ....................................................................18
2.1 Initial Inputs .............................................................................................23
2.2 Area Extraction ........................................................................................27
2.3 Spectral Value Comparison .....................................................................29
2.4 Confirmation ............................................................................................32
3 Sampling for the Baseline Image Key ...........................................................34
3.1 Roads .......................................................................................................35
3.2 Water .......................................................................................................42
3.3 Marsh .......................................................................................................49
3.4 Coniferous Forestry .................................................................................55
3.5 Mixed Forestry ........................................................................................61
3.6 Track ........................................................................................................66
3.7 Shade .......................................................................................................72
3.8 Roof Areas ...............................................................................................78
3.9 Pasture .....................................................................................................86
3.10 Rough Pasture ..........................................................................................92
4 Testing ..............................................................................................................98
4.1 Pasture Test ..............................................................................................99
4.2 Rough Pasture Test ................................................................................109
4.3 Marsh Test .............................................................................................119
4.4 Bog Test .................................................................................................132
4.5 Conclusion .............................................................................................146
5 Literature Review ..........................................................................................148
5.1 Spectral and image considerations for the thesis ...................................152
5.2 Vector and polygon based studies of aerial photography ......................159
6 References ......................................................................................................167


List of Tables

Page
Table 1: Road sample values .................................................................................37
Table 2: Road test sample value 1 .........................................................................39
Table 3: Road test sample value 2 .........................................................................40
Table 4: Road test sample value 3 .........................................................................41
Table 5: Water sample values ................................................................................43
Table 6: Water test sample values .........................................................................45
Table 7: Marsh sample values ...............................................................................50
Table 8: Marsh test sample values .........................................................................52
Table 9: Coniferous forestry sample values ..........................................................56
Table 10: Coniferous forestry test sample values ..................................................58
Table 11: Mixed forestry sample values ................................................................62
Table 12: Mixed forestry test sample values .........................................................63
Table 13: Track sample values ..............................................................................67
Table 14: Track test sample values .......................................................................69
Table 15: Shade sample values ..............................................................................73
Table 16: Shade test sample value 1 ......................................................................74
Table 17: Shade test sample value 2 ......................................................................76
Table 18: Shade test sample value 3 ......................................................................77
Table 19: Roof pixel sample values .......................................................................81
Table 20: Roof test sample value 1 ........................................................................84
Table 21: Roof test sample value 2 ........................................................................84
Table 22: Roof test sample value 3 ........................................................................85
Table 23: Pasture sample values ............................................................................87
Table 24: Pasture test sample values .....................................................................89
Table 25: Rough pasture sample values ................................................................93
Table 26: Rough pasture test sample values ..........................................................95


List of Figures

Page
Figure 1: Aerial view of sample area .....................................................................34
Figure 2: Road area and surrounding detail ...........................................................35
Figure 3: Road area and vector data ......................................................................36
Figure 4: Typical Water Area Image .....................................................................42
Figure 5: Water Area Image Modification ............................................................44
Figure 6: Sample area as a mosaic of polygons .....................................................47
Figure 7: Typical Marsh Area Image .....................................................................49
Figure 8: Typical Mixed Forestry Area Image ......................................................61
Figure 9: Typical Track Area Image ......................................................................66
Figure 10: Typical Shade Area Image ...................................................................72
Figure 11: Histogram for Shade and Pasture .........................................................75
Figure 12: Typical Roof Value Area Image ...........................................................78
Figure 13: Distribution of Buildings/Roofs in the Sample ....................................80
Figure 14: Blue colour band pixel count for study area ........................................82
Figure 15: Typical Pasture Area Image .................................................................86
Figure 16: Typical Rough Pasture Area Image ......................................................92
Figure 17: Creating the ASCII file .......................................................................100
Figure 18: Aerial view of pasture test 1 ...............................................................100
Figure 19: Red colour band for pasture test 1 ......................................................101
Figure 20: Green colour band for pasture test 1 ..................................................102
Figure 21: Aerial view of pasture test 2 ...............................................................102
Figure 22: Red colour band for pasture test 2 ......................................................103
Figure 23: Green colour band for pasture test 2 ..................................................104
Figure 24: Aerial view of pasture test 3 ...............................................................104
Figure 25: Red colour band for pasture test 3 ......................................................105
Figure 26: Green colour band for pasture test 3 ..................................................105
Figure 27: Vector data for pasture test 4 ..............................................................106
Figure 28: Aerial view of pasture test 4 ...............................................................107
Figure 29: Red colour band for pasture test 4 ......................................................107
Figure 30: Green colour band for pasture test 4 ..................................................108
Figure 31: Vector data for rough pasture test 1 ...................................................110
Figure 32: Aerial view of rough pasture test 1 .....................................................110
Figure 33: Red colour band for rough pasture test 1 ...........................................111
Figure 34: Green colour band for rough pasture test 1 ........................................112
Figure 35: Aerial view of rough pasture test 2 .....................................................112
Figure 36: Red colour band for rough pasture test 2 ...........................................113
Figure 37: Green colour band for rough pasture test 2 ........................................114
Figure 38: Aerial view of rough pasture test 3 .....................................................114
Figure 39: Red colour band for rough pasture test 3 ...........................................115
Figure 40: Green colour band for rough pasture test 3 ........................................115
Figure 41: Vector data for rough pasture test 4 ...................................................116
Figure 42: Aerial view of rough pasture test 4 .....................................................117
Figure 43: Red colour band for rough pasture test 4 ...........................................117
Figure 44: Green colour band for rough pasture test 4 ........................................118
Figure 45: Vector data for marsh test 1 ...............................................................120
Figure 46: Aerial view of marsh test 1 .................................................................121
Figure 47: Red colour band for marsh test 1 ........................................................122
Figure 48: Green colour band for marsh test 1 ....................................................122
Figure 49: Blue colour band for marsh test 1 .......................................................123
Figure 50: Aerial view of marsh test 2 .................................................................124
Figure 51: Red colour band for marsh test 2 ........................................................125
Figure 52: Green colour band for marsh test 2 ....................................................125
Figure 53: Blue colour band for marsh test 2 .......................................................126
Figure 54: Aerial view of marsh test 3 .................................................................127
Figure 55: Red colour band for marsh test 3 ........................................................128
Figure 56: Blue colour band for marsh test 3 .......................................................129
Figure 57: Aerial view of marsh test 4 .................................................................130
Figure 58: Red colour band for marsh test 4 ........................................................130
Figure 59: Blue colour band for marsh test 4 .......................................................131
Figure 60: Vector data for bog test 1 ...................................................................132
Figure 61: Aerial view for bog test 1 ...................................................................133
Figure 62: Red colour band for bog test 1 ...........................................................134
Figure 63: Green colour band for bog test 1 ........................................................134
Figure 64: Blue colour band for bog test 1 ..........................................................135
Figure 65: Vector data for bog test 2 ...................................................................136
Figure 66: Aerial view for bog test 2 ...................................................................137
Figure 67: Red colour band for bog test 2 ...........................................................137
Figure 68: Green colour band for bog test 2 ........................................................138
Figure 69: Blue colour band for bog test 2 ..........................................................139
Figure 70: Aerial view for bog test 3 ...................................................................140
Figure 71: Red colour band for bog test 3 ...........................................................141
Figure 72: Green colour band for bog test 3 ........................................................141
Figure 73: Blue colour band for bog test 3 ..........................................................142
Figure 74: Aerial view for bog test 4 ...................................................................143
Figure 75: Red colour band for bog test 4 ...........................................................143
Figure 76: Green colour band for bog test 4 ........................................................144
Figure 77: Blue colour band for bog test 4 ..........................................................145


Abstract

This study sets out an algorithm for the automatic analysis of controlled (flattened) aerial photography using Ordnance Survey vector data. It uses the vector data to clip the aerial image into a set of small area polygons, which are then analyzed for their spectral properties and classified according to the result. The study tests sections of aerial photography from a sample area in County Galway for specific spectral properties. This was done to identify the type of ground cover and was achieved using an image key of spectral properties which was developed during the study; this is called training the image key. A testing section shows that it is possible to derive information about the land use type from these areas based on the range of values returned from a pixel count of spectral properties within a small area polygon.

The study uses several open source software frameworks to complete the experiment, most notably the MATLAB based Mirone application, but the approach can be extended to any software capable of handling irregular polygons in a projection system. The body of the study is set out in three chapters, the first detailing the process, the second detailing the sampling for unique values and the third detailing the testing which took place. Chapters 3 and 4 are subdivided into sections describing the research on specific land use types and their spectral signatures.

Chopping an aerial image into a mosaic of (relatively) homogeneous values, e.g. pasture, forestry, marsh etc., increases the accuracy of automated analysis. This is the first time that a spectral analysis has been attempted using Ordnance Survey Ireland small area polygons to clip the image. It is of interest to researchers, planners, developers and others looking to simplify and automate this type of search over a large region.

Note: Contact brian.sexton@osi.ie for a set of sample files for training the image key.



1 Project Outline

1.1 Project Overview

The following study is an attempt to devise an automatic method of analyzing aerial photography based on vector data. It presents an algorithm and calibration data for someone seeking to complete a search of aerial imagery based on a spectral signature. The premise on which the work was undertaken was that, given the small area polygons and coding data present in Ordnance Survey data, it should be possible to cut a controlled aerial photograph into a mosaic of sections and automatically identify the type of ground cover.

The goal of the study was to identify a series of steps with which this could be completed. These steps are intended as a template for either a standalone application which could run searches over large geographical areas, or as a means of achieving the result for smaller areas using existing open source software libraries. This document is aimed at people seeking to develop a generic tool for completing a spectral analysis of aerial photography, or at anyone looking to execute a search of the Irish landscape for data which exhibits a distinct spectral value (crop disease, impermeable surface area, flooding etc.). The study differs from other methods of automatic image processing in that it takes existing analysis (in the form of Ordnance Survey vector data) and uses it to convert the spectral data into manageable sections. This summary presents a chronological overview of the work completed and outlines the process used.

The basis of this study is the clipping of raster data for spectral analysis, and the intention is to prove that this makes the process of image analysis easier. Traditional approaches can refine the analysis itself to reveal more about the region of interest than is completed here. For example, a lidar (aerial laser imaging) survey could provide researchers with data relating to the height of a tree canopy or the depth of peat in a bog. This study, while it does identify specific sets of values relating to land use types, focuses on identifying an easily replicated process for automatically obtaining data from aerial imagery.


The process was designed so that it can be coded into a standalone application and has been tested using open source software. In this way the study aims to simplify what can be an expensive and time-consuming task into a series of steps that someone without a high level of training in either mapping or computer science could run.

One of the difficulties presented by attempting to extract data from imagery is identifying target areas within a region of interest. This is compounded by the nature of the values returned by an aerial image: the clusters of pixels with similar values are often not bounded by clean borders and often display a gradual gradient of values when merging with another cluster. In other words, to automatically determine the true values on the ground a program needs to know the extent of the set of data that the clusters sit in. One analogy might be tables in a relational database: by dividing the total set of pixels for a region of interest into the discrete parcels of land reflected in the photograph, the program has a database of tables to query for specific values. This study takes the vector data and uses it to create a mosaic of separate pixel groups for analysis. This in itself, however, still leaves a huge body of data to be analyzed. To further improve an analysis, the parcels within the mosaic are classified according to known values taken from the vector coding, and these known values are then used to train an image key, which in turn allows the remaining parcels to be analyzed. It is this clipping process which is at the core of this study. It provides a means of accessing the raster data which readers can easily replicate and automate for their own purposes.
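To make the image key concrete, a minimal sketch of one possible structure is given below. The land use types, band ranges and helper function are illustrative assumptions rather than the exact form or figures used in the study; Python is used here purely for readability.

    # A hypothetical sketch of an "image key": for each land use type it records
    # the range within which the mean pixel value of each colour band is expected
    # to fall, built up ("trained") from samples of known polygons.
    image_key = {
        # land use type: {colour band: (minimum expected mean, maximum expected mean)}
        "water":   {"red": (20, 60),   "green": (40, 90),   "blue": (40, 100)},
        "pasture": {"red": (90, 150),  "green": (110, 170), "blue": (60, 120)},
        "road":    {"red": (120, 190), "green": (120, 190), "blue": (110, 180)},
    }

    def matches(key_entry, band_means):
        """Return True when the mean of every band falls inside the trained range."""
        return all(low <= band_means[band] <= high
                   for band, (low, high) in key_entry.items())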

Vector data has been used to target and control aerial image analysis in previous studies with a degree of success. These studies often involve additional user input to refine the region of interest so that automated analysis techniques such as multivariate analysis of variance can be applied to the image. This process of refining the region requires a level of technical expertise which could make a spectral analysis of aerial imagery too time consuming for many users. An example of one of these processes is contained in the 2007 assessment of impermeable surface area by Yuyu Zhou and Y.Q. Wang (Zhou & Wang, 2007). The authors determined that segmentation would be the most important part of the study and applied an algorithm of multiple-agent segmentation and classification. This involved importing transportation data to create buffers along the major roads at varying distances from the centre to divide the imagery. The process allowed the authors to determine the nature of ground cover with a high degree of accuracy, as revealed by random point sampling. A description of similar studies and methods is contained in the literature review at the end of this paper. The preparation required to determine the appropriate extent of image segments for such studies can be difficult to build into an automated process; for example, knowledge of the availability and accuracy of transport data, and of how to use it to segment the photography, was needed in the previously mentioned study (Zhou & Wang, 2007). An alternative to this approach is to use vector data of a known accuracy for the image segmentation. This approach depends on the availability of vector data and confines the focus of this study to an Irish context.

One of the benefits of applying an algorithm which automatically segments an aerial image into small area parcels is that it creates a platform on which further image analysis can take place. This study takes a set of ASCII coordinates from the vector data and clips the imagery. These parcels are then classified according to their land use type. A user could then use these classifications to target specific sets for interpretation; for example, the type of growth present in marsh areas could be determined by taking the set of polygons returned as marsh and identifying the percentage of the pixel count corresponding to the expected value for that growth.
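As a concrete illustration of that kind of follow-on query, the short sketch below counts the proportion of pixels inside a clipped marsh polygon whose values fall within an expected range for a particular type of growth. The band choice and the value range are placeholders, not figures taken from the study.

    import numpy as np

    def percent_in_range(band_pixels, low, high):
        """Percentage of a clipped polygon's pixels whose value lies in [low, high].

        band_pixels: 1-D array of values for one colour band, restricted to the
        pixels inside the marsh polygon.
        low, high: expected value range for the growth of interest (placeholder
        figures which would come from a trained image key)."""
        inside = (band_pixels >= low) & (band_pixels <= high)
        return 100.0 * inside.sum() / band_pixels.size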

There are two pre-requisite data sets for running this type of analysis, and a section from both of these sets (just west of Oughterard, Co. Galway) was used for this study. The first requirement is digital vector data from the Ordnance Survey and the second is colour (RGB) photography stored in GeoTIFF format and projected using Irish Transverse Mercator (to match the vector data). It is possible to automatically re-project the imagery from other projections using GDAL_transform (GDAL, n.d.), but this was not used in this study (a minimal re-projection sketch is given after the list below). The software requirements are:

• Something which can manipulate and interrogate vector data files.
• Something capable of handling irregular polygons within a coordinate system.
• Software capable of analyzing pixel values.
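The re-projection step mentioned above could be scripted with the GDAL Python bindings along the following lines. This is a sketch only: the EPSG code 2157 is assumed for Irish Transverse Mercator, the file names are placeholders, and re-projection was not actually required in this study.

    from osgeo import gdal

    def reproject_to_itm(src_path, dst_path):
        """Re-project an aerial image so that it matches the ITM vector data
        (EPSG:2157 is assumed here for Irish Transverse Mercator)."""
        gdal.Warp(dst_path, src_path, dstSRS="EPSG:2157")

    # Example usage (placeholder file names):
    # reproject_to_itm("ortho_other_projection.tif", "ortho_itm.tif")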

In this study a commercial package called Radius Vision was used to export the coordinate set for each small area polygon. The processing of the image polygons was completed using the MATLAB based Mirone framework developed at the University of the Algarve by Joaquim Luis (Mirone, 2009). The histogram values for the segmented image sections were obtained using PCI Geomatics' Geomatica package. For each polygon tested in the study the following steps were taken, which are the basis for the proposed algorithm (a Python sketch of these steps follows the list):

• Extract the point data which surrounds the polygon(s) within the region of interest.
• Import the point data into a software procedure to clip the aerial imagery and save the segmented image in GeoTIFF format.
• Run a histogram analysis for the image segment.
• Run comparison procedures to classify the segment.
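The sketch below illustrates these four steps in Python using the GDAL bindings, NumPy and matplotlib's point-in-polygon test. It is a simplified stand-in for the Radius/Mirone/Geomatica workflow actually used: the assumed ASCII layout (one "easting northing" pair per line), the in-memory masking in place of writing a clipped GeoTIFF, and the image key structure (sketched earlier) are all assumptions.

    import numpy as np
    from osgeo import gdal
    from matplotlib.path import Path

    def read_polygon(ascii_path):
        """Step 1: read the polygon outline exported from the vector data,
        assumed to be one 'easting northing' pair per line."""
        return [tuple(map(float, line.split()))
                for line in open(ascii_path) if line.strip()]

    def clip_histograms(image_path, coords):
        """Steps 2 and 3: mask the aerial image to the polygon and build a
        256-bin histogram for each colour band (red, green, blue)."""
        ds = gdal.Open(image_path)
        gt = ds.GetGeoTransform()                        # origin and pixel size
        cols = [(x - gt[0]) / gt[1] for x, y in coords]  # ground x -> pixel column
        rows = [(y - gt[3]) / gt[5] for x, y in coords]  # ground y -> pixel row
        c0, r0 = int(min(cols)), int(min(rows))
        c1, r1 = int(max(cols)) + 1, int(max(rows)) + 1
        yy, xx = np.mgrid[r0:r1, c0:c1]
        inside = Path(list(zip(cols, rows))).contains_points(
            np.column_stack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
        histograms = {}
        for index, name in zip((1, 2, 3), ("red", "green", "blue")):
            band = ds.GetRasterBand(index).ReadAsArray(c0, r0, c1 - c0, r1 - r0)
            histograms[name] = np.bincount(band[inside].astype(np.uint8),
                                           minlength=256)
        return histograms

    def classify(histograms, image_key):
        """Step 4: compare the band means against a trained image key."""
        means = {band: float(np.average(np.arange(256), weights=h))
                 for band, h in histograms.items()}
        for land_use, ranges in image_key.items():
            if all(low <= means[band] <= high
                   for band, (low, high) in ranges.items()):
                return land_use
        return "unclassified"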

The point data mentioned above refers to controlled data points which indicate fixed x and y positions on the ground and can be used to analyze the imagery. Attached to those points are vectors and coding which, for much of the image, indicate the type of land use present, e.g. forestry, water, buildings, etc. The comparison procedures mentioned above refer to values obtained during a sampling process conducted in the early part of this study. This sampling first took sections from known polygons and recorded the spectral values for these samples in order to calibrate an image key. Samples for parcels of types not coded into the vector data were then taken and the percentage variance between the two sets was recorded.

The sampling for the study was completed on ten separate types of land use. Five of these types were identified from the vector data, while the remaining types were identified using the techniques developed in this study. Separate sections of the image were extracted for the analysis, ranging from three to ten for each area type. The samples were uniform examples of each type of terrain, free of any biasing factors such as shade or overhanging vegetation, so as to obtain clean baseline data. The samples were extracted as GeoTIFF images and analyzed for their spectral qualities. A full description and tables for each sample are contained in chapter 3, which is divided into sections according to land use type for easy reference.

In general terms the results were what might be expected: large bodies of water (e.g. river, lake) produced a clearly identifiable signature, while mixed forestry and rough pasture areas had a higher level of standard deviation than more uniform cover. The areas sampled fell under the categories of roofs, roads, water, marsh, coniferous forestry, mixed forestry, pasture, rough pasture, track and shade. Although shade is not a distinct area, values for shade (manually identified from the imagery and clipped for analysis) were used to calibrate the image key so that shaded pixels could be recognised when found in polygons identified by the vector data. In a similar way, the values taken as representative of the spectral qualities present in roadways, for example, did not include the overhanging tree cover which is present in polygons extracted based on the ground revised vector data.

The aim of this part of the study was to identify a series of proportional values which could be used to indicate the presence of a land use type for an unknown polygon. For example, the mean red and green pixel values for the known areas of water were identified as 30% and 45% of those for pasture, something which an automatic search could use to flag an area being used as pasture. This sampling was not intended to be comprehensive in terms of creating a key for use in every possible automated image search, but was undertaken to prove the potential for automated image processing based on segmenting the images using small area polygons.
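To show how such proportional values might drive an automatic flag, the fragment below checks a polygon's red and green means against a reference pasture polygon using the 30% and 45% ratios mentioned above. The reference means and the 10% tolerance are assumed values, not figures from the study.

    def matches_water_ratios(red_mean, green_mean, pasture_red, pasture_green, tol=0.10):
        """Flag a polygon whose red and green means sit close to 30% and 45%
        of the corresponding pasture means (the ratios reported above);
        tol is an assumed tolerance."""
        return (abs(red_mean / pasture_red - 0.30) <= tol and
                abs(green_mean / pasture_green - 0.45) <= tol)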

One surprising result from this sampling was in the values returned for roof polygons. These polygons did contain two identifiable ranges of values associated with the pitch of the roof, where the angle of the light created shade on one side, which might facilitate a process to determine the angle of the pitch. At the beginning of the study I had believed that these roof values would provide enough of a control to calibrate most of the image. However, the variation in the ranges of values from shade to light on the angled surfaces made roof values an unreliable source of control values for the study. Of the known value samples, the most useful in terms of providing a consistent control on which to base comparative procedures were water, roads and coniferous forestry. Of the unknown types (in terms of not being automatically identifiable from the vector data), pasture and bog had the most distinct sets of values. The next phase of the study involved testing the algorithm against these identified spectral values to see if the irregular polygons (with internal distorting factors) matched the range expected from the sampling.

The testing process followed the outline of the algorithm. Polygons were extracted from the vector data in the form of a set of coordinate points saved in an ASCII file, which was then used to create a clipping path to cut the relevant section from the aerial image; the section was then saved in GeoTIFF format. This file was then analyzed for its spectral content and the resulting range of pixel values was compared to those expected for the land use type.

The testing focused on sets of known polygon types for four typical areas (not coded to the vector data): pasture, marsh, bog and rough pasture. It should be noted that this testing section of the study represented an execution of the algorithm, but the level of automation can be improved when the vector data is made available in GML format. Coordinate sets for multiple polygons can be extracted in one file with GML format, something which is expected in the next two years.

The areas analyzed were polygons containing marsh, bog, pasture and rough pasture. Of these, pasture and bog produced the most distinctive spectral traits and matched expected values, allowing for any comparative procedure to automatically classify them. Both the marsh and rough pasture sets of samples contained high levels of deviation from the mean pixel value across the red and green colour bands, with a similar range of values. However, these can be distinguished by a trough, between values corresponding to shade and vegetation, present in all of the red colour band values for rough pasture. A full description of the testing can be found in chapter 4 of this study.
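One way the trough check described above could be automated is sketched here: the red-band histogram is smoothed and scanned for a pronounced dip between its two tallest peaks. The smoothing window and depth threshold are arbitrary assumptions rather than values derived in the study.

    import numpy as np

    def has_trough(histogram, window=5, depth=0.5):
        """Return True if a 256-bin histogram shows a clear dip between two peaks,
        as observed in the red band of the rough pasture polygons.

        window: width of the moving-average smoothing (assumed value).
        depth: the dip must fall below this fraction of the smaller peak (assumed)."""
        h = np.convolve(histogram, np.ones(window) / window, mode="same")
        # Local maxima of the smoothed histogram.
        peaks = [i for i in range(1, len(h) - 1) if h[i] >= h[i - 1] and h[i] > h[i + 1]]
        if len(peaks) < 2:
            return False
        # Keep the two tallest peaks and inspect the lowest point between them.
        p1, p2 = sorted(sorted(peaks, key=lambda i: h[i])[-2:])
        if p2 - p1 < 2:
            return False
        trough = h[p1 + 1:p2].min()
        return trough < depth * min(h[p1], h[p2])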



The results of this study point to the value of accessing the spectral values contained in aerial imagery through Ordnance Survey vector data. In almost every land use tested the polygons returned a consistent pixel count for the type. It is important to qualify these results by noting that the values are based on an analysis of the red, green and blue colour bands (so restricted to colour imagery) and that the process relies on the vector data. It does, however, present a relatively simple means of completing an analysis of aerial imagery. This in turn opens up the possibility of coding a standalone application for analyzing and comparing polygons within a region of interest. The procedure takes the form of a series of loops designed to eliminate known values. Once the known values, followed by the derived values, have been eliminated, the user is left with a relatively small set of polygons to examine and can apply a key which has been further trained for the specific study. The thesis layout begins with a description of the background to the study followed by three chapters.
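A rough outline of that series of loops, in Python-style pseudocode, might look like the following. The helper names (vector_coding, histograms, classify) are placeholders for the vector-coding lookup, the clipping and histogram step, and the key comparison described above.

    def classify_region(polygons, image_key):
        """Eliminate polygons whose type is known from the vector coding, then
        those matched by the trained image key, and return the remainder for
        manual examination or a further-trained key."""
        remaining = []
        for polygon in polygons:
            coded = vector_coding(polygon)           # placeholder: OS coding lookup
            if coded is not None:
                polygon.land_use = coded             # known value, eliminated first
                continue
            derived = classify(histograms(polygon), image_key)  # placeholder helpers
            if derived != "unclassified":
                polygon.land_use = derived           # derived value, eliminated second
                continue
            remaining.append(polygon)                # left for the user to examine
        return remaining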



1.2 General Introduction and Background

An overview of the work of this study can be found in the project overview above; this section is intended to provide background information and explain some of the terms used. The study was written with the intention of making it easy for someone to access the part of the study relevant to them and then make use of any techniques identified. For example, if your intention is to identify the percentage of bog in your region of interest, read the sampling section on bog, followed by the testing, and then read how to apply the algorithm (chapter 2).

I think it might be helpful if I first explained my interest in and motivation for this work. For the past decade I have been involved in the photogrammetric capture of the vector data used in this study, and I know that this type of surveying is difficult and can be extremely tedious, but I believe it does present a template for the automatic capture of additional data from aerial imagery. A brief description of the nature of this surveying can be found at the end of this section. I believe that it should be possible, with a robust key of spectral data for known polygons (parcels of surface area enclosed by controlled boundaries such as walls, fences, roads etc.), to automatically search the data for specific values. In other words, someone with little knowledge of mapping or software could select a region of interest and search for a particular value, based either on a selection from the imagery or on coordinates imported from a portable GPS device. This could take the form of a standalone application or work through various freely available software packages. This study focuses on the use of open source software but suggests areas where a specialized tool could be developed. In general terms, a search of aerial imagery requires specialized tools and knowledge to access the information contained in the data (such as the spread of crop disease, the level of impermeable surface area etc.) and can be a time consuming process. This study is an attempt to automate that kind of search using small area polygons, something which is unique in its approach.

The study itself involves a mixture of computer science and mapping. As more and more of the surface of the earth becomes digitally captured and analyzed, these two fields will by necessity start to merge. This study looks at one small aspect of mapping and how automated software could be used to increase the amount of information available to a user. The premise of the study is that, given enough previously captured and accurate data, it is possible to automatically read the landscape. In short, I hope to take point and line data, slice sections from aerial photography, and run a spectral analysis. The focus of the study is on the methodology, so that a means of completing automated updating is identified. I should point out that by updating I am referring to providing information relating to the percentages of ground cover within a small area polygon. The task of physically capturing new structures, roads and height values (even when considering lidar) is probably something that will always require a human eye to interpret the data to some degree, for example to judge whether a structure is temporary or whether road works are underway.

It might be useful at this point to introduce some more background to this study. The island of Ireland was fully digitally mapped in 2005 and recently a new database for this data has been introduced which allows small area polygons to retain unique identifiers linked to the surrounding geometry and features. The mapping is on an update cycle but for the most part it can be assumed that these polygons will remain constant (with the percentage change even lower following the building boom of the last decade). This opens up the opportunity for someone to visit the sections of surface area represented by the area polygons over successive runs of aerial photography and extract land use change data.

I mentioned earlier that the focus of the study is on establishing a methodology; this is because the motivation for assessing land use change will vary according to the user. An example of this might be someone considering the potential for flooding within an area. This person might want to take a look at the water courses and new housing developments to determine the amount of impermeable surface area (paving, patios etc.) over the course of their study. To do this physically, using either a photogrammetry tool such as SOCET SET or field GPS, would be an onerous task. The aim of this thesis is to provide a method for doing this automatically. It might seem an obvious point, but the more information available to a process when commencing an examination of an area, the higher the chance of useful data being returned. This is where this study differs from previous attempts at automated data capture. The process being suggested takes a large amount of previously captured and verified data to aid the algorithm. By this I mean that most sections of the aerial imagery are extracted based on definite boundaries such as walls, streams, buildings etc. Internal polygons within the target area, such as water, buildings, roads and forestry, are also identified and used to aid the search. In this way the study is entirely dependent on previously surveyed data. This is something that has not been attempted before with Irish data, and I did not find any similar study from overseas over the course of my research. I hope to prove that this is something that is possible to do and implement, using sample software.

The software used in the study comes from several open source projects, and also from one commercial vector data manipulation package (Radius). These are all packages which could be considered generic tools. This is not a reference to their capabilities, or a slight on the people who develop them, but reflects the fact that the functions being accessed are common to several similar software packages. For example, the ASCII coordinate files created using Radius could equally have been produced using ArcView or Microstation (among others). The intention was to keep the algorithm as flexible as possible so that users could adapt it to their available resources.

Some of the primary software tools being used in this study come from the GDAL (Geospatial Data Abstraction Library) project. In particular, its facility for handling raster geospatial data formats is used to manipulate the GeoTIFF files containing the aerial imagery used in the study. GDAL came about as a project sponsored by the Open Source Geospatial Foundation, a non-profit, non-governmental organization set up to support the development of open source geospatial software. The foundation also supports projects such as GeoTools, GRASS GIS, Mapbender, MapGuide Open Source and MapServer, among others. In this study the GDAL library is accessed using another open source software library known as OpenEV. This allows the GDAL library to be presented within an application for displaying and analyzing the data. As with GDAL, it is implemented in C but has the potential for manipulation with Python. In this study the processes are run on a Windows platform and used as a means of accessing the raster data from the GeoTIFF imagery. It is necessary to access GDAL in order to open the GeoTIFF using the appropriate ITM coordinates for the point set of the search area. This choice of access to the raster data should not suggest that GDAL and OpenEV are unique; similar software exists that could also have been used in the study. Another example is the ImageTool open source software library. This was developed in the 1990s in the Department of Dental Diagnostic Science at the University of Texas, and written using C++. GDAL and OpenEV were chosen in preference because the data returned could be more easily worked into histogram and statistical outputs with these libraries, due to the larger body of work contained within them. One of the most important software considerations was flexibility (and extensibility), as ideally the users of any suggested methodology would modify the process to suit their particular study. As mentioned above, the preferred option was to build a top down processing tool tailored to the methodology which would accept methods from other libraries as plug-in/additional functions. For this study OpenEV is substituted for that purpose.

Outside of the basic software, two other core components exist. These are the data relating to the specific polygons being extracted and the key representing the colour values being studied. As with most aerial image analysis studies, defining and validating this key forms the major part of the work involved. The process is aided by the availability of data which could be classed as controlled, that is, known areas of roof and water, which can be used to reference the other values in terms of their deviation from these known values.

As mentioned at the start of this general introduction, a commercial geographic software package was used to extract the coordinates from the vector data (a vital first step in the algorithm). This particular software was not chosen for any specific capabilities other than the fact that I already had a licence and wanted to focus on proving the premise of the study (as opposed to executing the function over a specially tailored package). There are a number of commercially available image processing packages which have relevance to this study. One of these, SOCET SET, could potentially assist the study. This software is a photogrammetry package developed by BAE Systems for working on aerial photography. It allows the user to capture three dimensional data points from overlaid aerial imagery, by means of a system of triangulation based on the position of the cameras when the photographs were taken. As much of the data used to extract the sections of aerial photography analyzed in this study was captured using this process, it is possible that the analysis could be completed at this point in data capture. I am referring here to map production, and the stage at which line and point data are taken from remote imagery. At this stage in map production it is possible that an add-on process linked to the photogrammetric software would allow the user to run an analysis of the polygon at the moment it is fully captured and coded, which in turn would mean that the area marker and associated polylines could then be given added data of value to the end user (spectral content of the polygon, percentage land cover, rough pasture, impermeable surface area etc.). I decided against investigating this further for two reasons. Firstly, it would have involved a difficult and time consuming collaboration with the software provider. Secondly, and more importantly, this country has been fully digitally mapped and photogrammetric work is now confined to updates only. This means that the application of any algorithm at the photogrammetric/data capture stage would be confined to small, mostly urban areas.

It is difficult to discuss the manipulation of spatial data without reference to the ArcGIS package of software created by ESRI. This is widely used in both commercial and educational settings for interpreting spatial data. In terms of this study, the ArcView package within ArcGIS could have been utilized for image processing once the input files were converted to shapefile format. The limitations imposed by having to obtain a licence for the software (outside of trial/educational versions) precluded the use of this package. This is not to suggest that the software would not have been a useful tool for manipulating the imagery in the study, only that it was not practical at the time of the study.

The value of this study is in allowing new data to be obtained and added to that originally obtained from an analysis of aerial imagery. While the study identifies one means of doing this, in terms of software and operating platform (i.e. executing the process using parts of the GDAL library and ASCII input files), there are potentially numerous other software processes that could apply. The algorithm, however, is intended to remain as independent of software considerations as possible, being constrained only by the quality of the remote imagery and the accuracy of the captured data points and associated coding.

Looking briefly at some of the other commercially available desktop GIS software, it is intended that the methodologies suggested could be applied to these products. However, due to the limitations involved in both learning to use the software and licensing issues, this application of the study was not explored. These products include AutoDesk, Microstation, the ESRI ArcView product mentioned in the previous paragraph, IDRISI and MapInfo among others, as well as the 1Spatial Radius platform used to edit the geometric input data used in this study. All of the above products are useful in the case of updating and editing, that is to say, dealing with change. This study looks at read only data and could be described as a way of interpreting already captured data. One result of this is that the functions required to store and update change polygons and data values are not needed in the proposed algorithm. The ability to connect the statistical data to the unique identifier for the polygon should be enough to allow it to be input as an attribute by the spatial database management system. Outside of analysis, the GIS software requirements relate only to coordinate transformation. As a result, while storage (of the statistics) remains a consideration, the functions for creating, editing or updating (moving points etc.) do not form part of the requirements for the study.

Even though a specific analysis tool (in terms of a standalone executable) is not presented in this study, it is possible to create one and add to the existing body of open source work. OpenEV, for example, allows for the addition of newly created functions using a Python compiler. This programming language allows a user to interface with GIS applications written in C and has the potential to be a flexible means of accessing C libraries (e.g. accessing GDAL from OpenEV). It has been used to build software such as GeoDjango, Thuban, OpenEV, pyTerra and AVPython. The language itself is not used in this thesis, as it would have added another layer to the process, but it provides a possible means of packaging an extended experiment. One of the advantages of using Python as a programming language in preparing a GIS application is that the assignment of a variable does not have to declare whether it is a string, number, list etc. The variables, however, are case sensitive and follow the ESRI convention of using a combination of lower and upper case, beginning with lower case. The acronym for the variable is at the beginning of the name, while the descriptive part follows, beginning with an upper case letter (e.g. htElev). In addition to modules such as math and string, Python also has several geoprocessing modules. One of these, arcgisscripting, accesses all of the ArcToolbox tools. It should be noted that the geoprocessing object being called is accessed differently depending on the version of ArcGIS being used. Another package, gdal (which accesses the geospatial data abstraction library being utilized in this thesis), allows for the manipulation of that library; in this Python module the language connects to the original GDAL code (written in C, or an object oriented variation) using a SWIG interface compiler. An example of Python in use in GIS is its application alongside ArcGIS: ArcGIS was built using hundreds of Arc objects such as "featureClass", "symbol", "field" etc., each of which has properties and methods accessed by Python using dot notation. An example of this notation is the assignment of a variable name, tr = arcgisscripting.create().
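As a small illustration of that gdal Python module in use, the lines below open a GeoTIFF and request a 256-bucket histogram for one band; the file name is a placeholder.

    from osgeo import gdal

    ds = gdal.Open("segment.tif")        # placeholder file name
    band = ds.GetRasterBand(1)           # band 1 is red in an RGB GeoTIFF
    # GetHistogram(min, max, buckets) returns a list of pixel counts per bucket.
    counts = band.GetHistogram(-0.5, 255.5, 256)
    print(sum(counts), "pixels counted in the red band")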

Another possible method for implementing the process suggested in this study is through the .NET platform. In particular, VB.NET provides a programming language that can be utilized to access GeoMedia software, which is a .NET oriented group of geographic software packages provided by Intergraph. This software allows the user to interact with ESRI shapefiles and also with spatial databases created using Oracle Spatial. It also allows developed tools to tie in with graphical editing platforms such as AutoCAD and Microstation, which would allow for interpretation of both images and associated geospatial data. This would also mean (given the appropriate licence) that the developer could access specific ancillary products for database management (for databases based in Oracle) and others ranging from map production to 3D modeling. This programming language (VB.NET) was not used in this thesis because of the potential licensing issues which may have been involved.

This thesis suggests procedures which involve repeated loops of steps, from the pixel analysis to the classification of the polygons identified by the user throughout the region of interest. These procedures involve processor heavy analysis and lend themselves to being developed in C. In general terms, of the programming languages used in GIS development (and aerial image analysis), the C programming language is the most widely used to interpret geographic information. Many analysis programs such as MITAB and the Shapefile C Library use C as a means of accessing geographical data. One of the advantages of using C is that processor heavy functions, such as the analysis of pixels in order to categorize them into shades and variations from a mean, can be best achieved in this language. This is probably most evident in the fact that most of the open source programming projects, such as ImageTool or GDAL, have been written using the C programming language. This thesis makes use of a small part of these libraries and as such in turn uses the C programming language. This is not to suggest that the default programming language for this type of study should necessarily be C, but that, in order to make use of the available body of knowledge, previous studies can probably be best extended using C. It is important to note that the main problems encountered when analyzing aerial data and imagery are those of coordinating the imagery so that it can be referenced and analyzed properly. The three main methods of referencing are geographic, projected and pixel coordinates.

This thesis uses a mixture of both pixel and geographic referencing. The fact that it is necessary to use both for the relatively simple cutting and analysis of image segments demonstrates the importance to GIS programming of being able to transform coordinates. Geographic coordinates refer to latitude and longitude, while projected coordinates refer to a flat two dimensional coordinate structure. The C programming language (through the available libraries and its high level nature) provides an accurate means of executing these coordinate transformations. Note: other options are available, such as the modules in .NET, and coordinate transformation is something which can be achieved in any programming language once the correct math functions are accessed.
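A short example of such a transformation using the OSGeo bindings (which wrap the underlying C code) is given below. It converts a geographic latitude/longitude pair to projected ITM coordinates; the EPSG codes (4326 for geographic WGS84, 2157 for ITM) and the sample point near the study area are assumptions.

    from osgeo import osr

    geographic = osr.SpatialReference()
    geographic.ImportFromEPSG(4326)      # assumed: geographic WGS84 (lat/long)
    projected = osr.SpatialReference()
    projected.ImportFromEPSG(2157)       # assumed: Irish Transverse Mercator

    transform = osr.CoordinateTransformation(geographic, projected)
    # Note: with GDAL 3 and later the geographic axis order is latitude, longitude.
    easting, northing, _ = transform.TransformPoint(53.43, -9.38)
    print(easting, northing)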

Another factor which needed to be considered while reviewing programming languages for this thesis was the limitations arising from the programming experience of the author. Ideally, a processing algorithm written in C with a Visual Basic front end, which could also tie into OSI metadata, would be the preferred solution. This could then be extended to allow a user to zoom in on a map window, identify a subject area, review the available photography and target a selected study area. This is possible using existing systems but falls outside of what could reasonably be achieved with the available resources for the study. The main purpose of the study is to determine whether the methodology being suggested is applicable and whether useful data can be returned from this type of study. The degree of success indicates the feasibility of tailoring an application.

The benefits of further developing the already well researched methods of analyzing imagery, in terms of aggregating the pixels and deriving statistical data from a selected tile of aerial photography, would be limited, because there is already a vast body of knowledge dealing with the subject. In particular I am referring to work such as ImageTool which, as mentioned above, was developed at the University of Texas and is open source. It (along with several other open source image processing projects) effectively interprets imagery in terms of its spectral content. As I suggested earlier, one of the most important aspects of viewing and studying the surface of the earth remotely is the way it is projected, that is, projecting it into a format which can give valuable information to the user, and this is where a large part of this study focuses. The study takes a captured and referenced coordinate grouping (a set of data points) and uses it to analyze sections of the earth. These coordinate groupings are definite points along fixed boundaries which form physical barriers in the form of walls, streams, buildings and roads. These could perhaps be better explained in terms of a bull in a field: in general terms, any polygon which would prevent the bull from escaping forms a parcel, which is then analyzed. This means that the study areas are bounded by a series of fixed vectors/polylines which are unlikely to deviate over time, allowing the suggested algorithm to be run over successive years of data capture. With a standard mean and control key for spectral values it should be possible to gain an insight into changes in land use in the specific semi-urban areas looked at in the study.

The study areas could possibly be extended to rural areas over time. The reason<br />

why the study does not extend to these areas is that it would have to account for a<br />

much larger study area and less well defined boundaries. This would probably<br />

only be effectively done using tried and tested values derived from more fixed (ie.<br />

no fuzzy data/ hazy boundaries where pixels gradiate and physical features such as<br />

man made fences and walls are not present) polygons.<br />

16


The start of this study contains a glossary of terms, most of which are probably familiar to most readers, and the next three chapters will refer to some terms which are specific to this type of analysis. The first term is aerial imagery, which refers to all the spectral data obtained during the study. When described as aerial imagery or raster polygons, the reference is to an aerial image corrected to allow for distortions such as slopes so that it corresponds to the vector mapping. The second recurring term is the vector data, which refers to ordnance survey data captured through a mix of photogrammetry and field surveying.

Although most readers are probably familiar with the .tiff file format, it is worth noting that the input photography files for the study are in the GeoTIFF format. This format complies with the TIFF 6.0 standard and gives the input data the flexibility to be accessed in a wide range of programs, which allowed the imagery to be viewed outside the study software as the work was undertaken. The key metadata components of this file format for this study are the georeferencing coordinates, which allow the sections being analyzed to be accessed. The format is also recognized by the GDAL library used in the study. The projection used in all the files in the study is ITM. This is not vital to the success of the algorithm, but making use of an additional projection requires the inclusion of a transform function whenever the datasets intersect.
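As an illustration of how the embedded georeferencing can be read, the following is a brief sketch using the GDAL Python bindings; the file name is hypothetical, and the calls shown simply expose the geotransform and projection stored in a GeoTIFF.

from osgeo import gdal

# Open one of the GeoTIFF photography tiles (file name is hypothetical).
dataset = gdal.Open("ortho_tile.tif")

# The affine geotransform holds the georeferencing coordinates:
# (top-left X, pixel width, row rotation, top-left Y, column rotation, pixel height).
origin_x, pixel_w, _, origin_y, _, pixel_h = dataset.GetGeoTransform()

# The embedded projection (ITM in the study data) is stored as WKT text.
print(dataset.GetProjection())
print("Top-left corner: %.2f, %.2f" % (origin_x, origin_y))
print("Pixel size: %.2f x %.2f" % (pixel_w, abs(pixel_h)))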

The following three chapters form a chronological record of the study, starting with the suggested algorithm (Chapter 2), followed by the sampling necessary as a basis for the procedure (Chapter 3) and finishing with a test on known polygons for specific search values (Chapter 4).


2 Stepping through the Algorithm

This thesis introduces a method for analyzing aerial imagery that can be translated into a procedure and run automatically. The operation is specific to two types of data:

• Digital vector files from the Ordnance Survey.
• Controlled aerial photography stored as GeoTIFF files.

Both of these data sources use the ITM projection and are referenced throughout the study. The premise of the study is that it is possible to automatically capture additional information about area polygons from aerial photography using previously captured vector polygons as a guide. It is an attempt to fill in the blanks in terms of polygon attributes not included in the photogrammetry which produced the vector data. The focus is not primarily to obtain an accurate list of all polygons from the sample data but to identify a verifiable method for doing so automatically. Because the focus is on the identification of methods, the process outlined can be extended to searches for specific spectral qualities; in other words, someone searching for a particular crop type might employ the algorithm described here but add a target data set specific to their work. In short, what follows is an attempt to take the two data sets mentioned above (photography and vector data), combine them and return a new set of information derived from both. The process does not merge the data sources but uses the vector data (a large portion of which was derived from the photography) as a reference to cut segments from the imagery and treat these segments as smaller, manageable pixel collections for analysis. This process is helped by the fact that the content of many of these polygons is known and has been coded to the vector data.

The sampling part of the thesis was an attempt to identify specific spectral qualities that can be applied to these known polygons and then used to reference the unknown areas. This had a reasonable level of success, with some polygon types making a more useful reference than others. A description of these can be found in the sampling section of this study.


Automated aerial image analysis generally focuses on attempting to determine the values of the imagery from scratch; for an example of this type of work, see Thomas Knudsen's 2005 study on aerial image analysis. What is unique about this study is that it attempts to use previously captured data as a basis for further image interpretation. From the research into the data and contact with the ordnance survey, this has not been attempted before for Irish digital spatial data. All of the difficult remote sensing is complete (via the vector mapping) before this analysis begins, and control points, physical boundaries and closed polygons have all been identified. This study presents a method for developing software to extend the work completed and to assist users in identifying specific traits in what would otherwise be an impossibly large store of imagery for the human eye to analyze (without using a team of trained analysts). The algorithm proposed here is essentially a way of looking for spectral values in small area polygons and comparing them to known values.

The process outlined is for people intending to scan aerial imagery of Ireland for specific spectral properties. It is intended as an additional facility for users of aerial photography. At present it is possible to conduct specific research using the photography and a GIS tool. This algorithm is intended to make the process accessible to users who do not have the time, resources or software licences to conduct this type of research. It can be used to identify specific land use types outside those currently captured by ordnance digital mapping, and to serve as an add-on tool for anyone using that type of data. The purpose in compiling the data and researching software for the study was to outline a method, and as such the application of the method is dependent on the user. In the case of this study a lot of emphasis was placed on the identification of pasture. This is because it is the major form of land use in rural and peri-urban areas, and its correct identification helps to limit the search for other types of cover to a relatively narrow number of polygons. In a similar way, someone could take the same steps, identifying unique statistical properties of the pixel count in the types of area being studied, and add them to the algorithm. This would involve adding the additional search to the cycle of flagged polygons at the end of the third part of the search execution.


The process can also be coded into a standalone application or as an extension of existing software for a specific use (such as searching for crop disease). One example might be the Python-based raster viewer OpenEV, where a user is analyzing aerial photography using the package. It is possible to extend its functionality to analyze statistical data using the GDAL library. A user concerned with a specific set of spectral values, or wanting to confine the research to a specific area polygon type within the image, could make use of the methods set out here to have the statistical function return target data only (as opposed to a general application of the histogram function). In broad terms this study is for users of aerial raster imagery, and the results of the sampling are based on samples of Irish data. It may be possible to execute similar studies for different regions, but the small, well defined polygon types with clear boundaries that remain consistent over long periods of time are a vital part of the analysis. This is probably a result of relatively small property divisions and rigorous maintenance of the boundaries over hundreds of years, and may be unique to Ireland. In short, the study is a look at a possible coded routine to analyze the Irish landscape using all the available data.
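By way of illustration, per-band statistics and a grey-level histogram of the kind referred to above can be obtained through the GDAL Python bindings as sketched below; the file name is hypothetical and the calls are general GDAL functionality rather than the specific routine used in the study.

from osgeo import gdal

# Open a clipped polygon image (hypothetical file name) and report the per-band
# figures that the proposed image key comparison relies on.
dataset = gdal.Open("clipped_polygon.tif")

for index, name in zip((1, 2, 3), ("red", "green", "blue")):
    band = dataset.GetRasterBand(index)
    minimum, maximum, mean, stddev = band.ComputeStatistics(False)
    print("%s band: mean %.2f, standard deviation %.2f" % (name, mean, stddev))

    # A 256-bin histogram over the 0-255 grey levels, comparable to the
    # histogram output exported from the viewing software during the study.
    histogram = band.GetHistogram(-0.5, 255.5, 256)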

As mentioned above, this study is intended for users of aerial photography. The prerequisites are that the photography is controlled and has the projection embedded in the file, and that the users have access to ordnance survey vector data. Beyond these conditions, the algorithm is intended as much for users who do not have a strong background in information technology as for those who have a good knowledge of code and could easily convert the proposed steps into routines. The open source software described in the study has a user interface familiar from any GIS package (standard toolbars, zoom, measurement and so on) and it is possible to run the algorithm without having to alter any of the steps. Ideally, however, the steps would be converted into an add-on to an existing piece of software already in use (ArcView, MicroStation etc.) so that the user can quickly run through large amounts of data. In this way the routine is designed for anyone interested in targeting specific properties of Irish topography that can be defined in terms of their spectral values. The areas of application range from forestry and agriculture to urban planning.


The limitations of the study lie in the quality of the imagery, and it was shown that some potential applications of spectral analysis would not return accurate results. A study of sediment levels in drains or canals, for example, could not be developed using the methods outlined here because of the difficulty in getting a large enough sample to train the image key.

The method outlined can be used against small area polygons, so it can be applied to land use types across most of the country, with the exception of fully urban areas and remote mountain areas (where the small land divisions are not found). It should be noted that the target areas described refer to peri-urban data. This is because fully urban areas are covered by large scale 1:1000 mapping and spectral analysis would not improve on the available data (outside of highly specialized heat radiation studies and the like, which are not the intended use of this method). The method could also be used by someone seeking to trace patterns in land use over recent decades. Such a study would be confined to RGB photography, as a comparative analysis of the properties of the pixel counts by colour band is a requirement of the process. Once the proposed key is calibrated for the particular run of photography, the algorithm could be set to run for a specified region of interest across the period for which this type of photography is available.

This differs from previous studies of automated aerial photography analysis in two ways. Firstly, the focus is specific to Irish ordnance survey data and concentrates on making use of the codes and known values that can be extracted from it. Secondly, the study uses small area polygons to target the spectral analysis of the imagery at relatively small sample areas. This reduces the difficulty posed by variations in pixel values found in large samples. In this respect the process is unusual in that it takes the small area polygons as a guide and cuts the matching areas from the raster aerial imagery. This allows automatic decisions to be made regarding the level of standard deviation present in the sample. In many ways this simplifies the process of image interpretation, because most of the difficult image control work is already complete and the software can then focus on variations specific to the ground cover being studied. While this limits the extension of the work to other (general small scale) datasets, it does outline a method for automating the search of imagery. Over the course of the last two years of this Masters study I have learned how users can interrogate and manipulate data in large databases.


Aerial imagery is a form of database. Once it is structured into tables it can be interrogated for spectral properties just like vector data in a spatial database. By using the boundary data points from the vector data, the imagery is converted to manageable sections whose properties can be determined and logged. This subdivision of the image into a mosaic of areas (starting with known polygons, moving to polygons which can be easily classified through strong variation from the known values with a low level of standard deviation, such as cut pasture, and flagging any whose values fall outside the image key) makes the job of analyzing the image easier. This sub-dividing of raster imagery is something which has not been attempted with Irish ordnance data and aerial photography (to the best of my knowledge; a search of research papers was conducted and similar work had not been undertaken within the ordnance survey). The focus of the study is on proving that this method is practical and can be applied to a variety of area types. The methods suggested by this study are specific to the area divisions and available vector data, and present the steps necessary to train an image key to look for specific properties in the Irish landscape.

The process works by taking the point data from the polygons contained within the vector data representing an area of an image. This point data is used to crop the corresponding area of the image and to log pixel values for that area. This is repeated for every area in the region of interest. The results are then compared to an image key, and areas are classified according to the presence of values specific to the key. One such key was developed during this thesis but could be re-calibrated for different values; in other words, a higher mean for water bodies within a separate run of photography would increase the corresponding key values by that amount. Within the key are known values (water, forestry, roads etc.), and the proportional difference (in terms of the mean pixel count for values in the red, green and blue colour bands, and the levels of standard deviation) between these known values and search values (such as pasture) is measured against the histogram values for cropped polygons of unknown use, with a category applied for matches. In other words, the process steps through locating, cutting and analyzing small areas of the image to enhance the available data and to search for specific values across the whole image.


2.1 Initial Inputs

The following four sections describe the work in terms of steps through the algorithm being proposed. This first section introduces the initial inputs required to define the region of interest to be analyzed.

This study attempts to find an automatic method for image analysis using vector data as a reference, in particular small area polygons and their associated coding. At the beginning of the proposed algorithm a user is required to input a region of interest for the process. This corresponds to the geographical area in which the user is interested. The most convenient way to do this is to manually select the area from vector data or photography (or a combination of both displayed together) shown in a window on a PC. The result should be a set of co-ordinates from which the study area can be extracted.

The user also needs to input a sample target area for the study. This can be one of the set of values developed in this study or may take the form of a particular variation (such as a distinct type of crop). In the second case a sample of the required value is needed. This can be obtained in the same way as the region of interest selection where, as mentioned above, the user manually selects the target area from a viewing window and the output is a set of co-ordinates. A second way this target data might be obtained is as a co-ordinate set from a field survey completed using a mobile GPS device. In this case the co-ordinates first need to be converted to the Irish Transverse Mercator framework, so that the process can match them to the projection used in the photography and vector mapping.

The general flow of the first part of the algorithm being suggested is the user inputting the region of interest and a required value for the image analysis, which are then converted to a format which can be compared against the data. In the case of the software used in this thesis this takes the form of a simple ASCII file containing a co-ordinate set, but a common alternative format would be the .shp file used by ESRI.


The software required for this step in the proposed algorithm includes an application for viewing and analyzing raster data that is capable of performing transformations on sets of co-ordinates. For this study four sets of libraries were used, packaged into the open source applications OpenEV, Mirone and GDAL. The vector data was clipped using an application which forms part of a geographical information system called Radius Vision. All the processes necessary for the first part of this algorithm can be performed using GDAL, with the exception of clipping to an irregular polygon, which is still under development (GDAL, 2010). There is a license requirement for the Radius software, which was used in this study for the step in which the user selects the extent of the region of interest in the vector mapping. It should be noted that this can also be completed using other vector mapping tools such as ArcView (which would create a .shp file). Another alternative is for the user to manually create an ASCII file of co-ordinates (with the convention of easting northing, separated by newline). This alternative can be frustrating for the user and the suggested process is to make use of software capable of designating the region of interest through a viewer.
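A short sketch of what reading such an ASCII co-ordinate file involves is given below (in Python rather than the C discussed earlier); the file name is hypothetical, and the only assumption about the format is the convention just described of one easting northing pair per line.

# Parse a region-of-interest file: one "easting northing" pair per line.
def read_roi(path):
    points = []
    with open(path) as source:
        for line in source:
            if line.strip():
                easting, northing = map(float, line.split())
                points.append((easting, northing))
    return points

roi = read_roi("region_of_interest.txt")  # hypothetical file name

# The bounding extent of the region, useful when clipping imagery to the study area.
eastings = [p[0] for p in roi]
northings = [p[1] for p in roi]
print("Extent: %.1f %.1f to %.1f %.1f"
      % (min(eastings), min(northings), max(eastings), max(northings)))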

The data required for the aerial image analysis is ordnance survey ortho-rectified colour aerial photography and matching digital mapping. The archive of aerial imagery goes back to the 1970s and the algorithm being suggested is designed to operate with any run of photography, so users can discern dispersal patterns over time through successive photography dates. The process, however, makes use of the three colour bands present in colour photography and is limited to photography with red, green and blue colour bands. The vector data takes the form of 1:5000 or 1:2500 scale digital data and it is this data which forms the basis of the search process. The vector data has the region of interest divided into a mosaic of small area polygons, the majority of which are coded according to their use or content. The aim of this thesis is primarily to automatically register additional data for polygons of unknown use type, and secondly to flag those of known use type with specified (spectral) anomalies from user requests. To be successful the process requires this coding, which may usefully be considered as a data hierarchy at this stage. The following hierarchy is only for illustration; in practice each polygon will be analyzed according to its spectral content and placed in a unique set, and any relationships between those sets would be made post-analysis by the user for the purposes of their particular survey. That said, the known polygons are:

Forestry, divided into categories of mixed, coniferous and deciduous.

Water, divided into categories of stream, lake, river, drain, pond and reservoir.

Road, divided into categories of motorway, national primary, national secondary, regional, third, fourth and track (also coded as footpath and forestry road).

Buildings, variously coded as solid, dwelling and a variety of functions, though for the purposes of the algorithm they are treated as one data type (they were found to be unreliable in terms of consistent spectral values and biased the result sets from the spectral analysis).

Two other aspects of this data, the presence of marsh and pasture symbols, can be used to indicate known values for a polygon when found inside the bounding co-ordinates, though in the study these must be compared against the spectral key to ensure the symbol is representative of the entire polygon.

The output from this stage in the algorithm consists of two required data sets and one optional one. The first necessary return is an area of vector mapping containing a mosaic of vector polygons divided by polylines representing real world physical boundaries between the areas, together with the ordnance survey coding related to this data. This was extracted using the extract_map function from the Radius GIS library, but this software is not a necessary requirement; the data can be exported to a different format and a similar procedure completed (using ArcView, MicroStation, AutoCAD etc.), provided the map projection and co-ordinate attributes associated with the vectors are retained. The study did not explore the possibility of creating a unique vector data cutting tool, as it can be assumed that anyone making use of this algorithm will have access to some level of mapping software (if not, similar open source software can be obtained from Brazil's National Institute for Space Research under the SPRING project; see Appendix). The second necessary return is a region of aerial photography matching the co-ordinate set outlined in the section cut from the vector data. This can be cut using the co-ordinate set outlined from the vector data. In this study the MATLAB-based Mirone software, developed at the University of Algarve for earth sciences, was used to extract the study area from the photography, and the input file was in ASCII format (although other formats, such as .shp, could also be used).
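For users working directly with GDAL, a rectangular cut of this kind can be made as sketched below (recent versions of the GDAL Python bindings provide gdal.Translate); the file names and the projWin extent are hypothetical and, as noted above, clipping to an irregular polygon is not covered by this call.

from osgeo import gdal

# Clip the photography to the bounding box of the region of interest.
# projWin is [ulx, uly, lrx, lry] in the coordinate system of the image (ITM here).
gdal.Translate(
    "study_area.tif",              # output GeoTIFF (hypothetical name)
    gdal.Open("ortho_tile.tif"),   # input photography tile (hypothetical name)
    projWin=[530000.0, 726000.0, 532000.0, 724000.0],
)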

The third, optional output from this stage in the process is a co-ordinate set for possible sample areas in the image for which the user is seeking to complete an inventory. This could be selected from the image using the vector manipulation software described in the previous paragraph, or could be obtained from point data collected by the user in the field. If field data is used there are two requirements: firstly, that it forms a closed polygon so an area can be sampled for spectral values; secondly, that the co-ordinates conform to the ITM projection to match those of the imagery. Transformation of the co-ordinates can be achieved using gdalwarp (GDAL, n.d.), and the result can then be used to extract the required pixel set for examination through software such as Mirone. If a specific value is required, at least three samples are used to obtain a representative value for the image key.


2.2 Area Extraction

The following section introduces the second part of the algorithm, where it steps into a series of loops for cutting known areas (via the vector data) out of the image.

The second step in the process is to extract the known areas from the study area in the imagery. This involves creating a set of polygons conforming to the coded values and excluding them from the image search. This set is then either flagged for analysis later in the algorithm (should a target area search have been created by the user in the first step) or placed in a holding set for inclusion in the statistical data output at the end of the analysis. The output from this step should be a set of unknown polygons and their associated raster image sections, along with the sets of known polygons.
of known polygons.<br />

For this study the software employed for this step in the process was Radius (to<br />

obtain co-ordinate data for cropping the image into area polygons) and Mirone (to<br />

crop the image). This thesis was written with a view to operating on a new spatial<br />

database being developed for ordnance survey data. This database will have the<br />

capacity to return sets of values in GML form, which would mean that the input<br />

sets for this step in the algorithm would be more easily obtained by extracting<br />

… using a text editor to create a master input<br />

set. For the purposes of this study, however the input files were created in ASCII<br />

format from the polygon co-ordinate sets outlined in Radius (by copying and<br />

pasting). These co-ordinate sets were then imported into the Mirone software and<br />

used to create closed polygons, which in turn allowed the target areas to be<br />

exported. It should be noted that this is a user intensive process and was used only<br />

to test the theory being proposed in this thesis. There are many ways in which this<br />

part of the image analysis process could be automated and the process time<br />

reduced, but the focus of this study is to prove that usable data can be obtained<br />

using the methods outlines and they were not expanded on.<br />
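One plausible way of automating the cut, sketched here for illustration only, is to convert the polygon's boundary co-ordinates to pixel positions with the image geotransform and mask the pixels that fall inside it; the GDAL and matplotlib Python libraries are assumed, and the file name and boundary points are hypothetical.

import numpy as np
from matplotlib.path import Path
from osgeo import gdal

# Hypothetical inputs: a GeoTIFF tile and one closed polygon given as a list of
# (easting, northing) boundary points taken from the vector data.
dataset = gdal.Open("ortho_tile.tif")
boundary = [(530120.0, 725340.0), (530180.0, 725340.0),
            (530180.0, 725290.0), (530120.0, 725290.0)]

# Convert map co-ordinates to pixel/line positions using the geotransform
# (rotation terms are assumed to be zero for north-up imagery).
x0, dx, _, y0, _, dy = dataset.GetGeoTransform()
pixel_boundary = [((e - x0) / dx, (n - y0) / dy) for e, n in boundary]

# Build a boolean mask of the pixels falling inside the polygon.
cols, rows = dataset.RasterXSize, dataset.RasterYSize
xs, ys = np.meshgrid(np.arange(cols), np.arange(rows))
inside = Path(pixel_boundary).contains_points(
    np.column_stack([xs.ravel(), ys.ravel()])).reshape(rows, cols)

# Log mean and standard deviation for each colour band inside the polygon.
for index, name in zip((1, 2, 3), ("red", "green", "blue")):
    values = dataset.GetRasterBand(index).ReadAsArray()[inside]
    print("%s: mean %.1f, std dev %.1f" % (name, values.mean(), values.std()))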



The areas extracted were placed into sets according to their nature and the total area of each set recorded (using the area property associated with the input vectors; note that these can also be calculated at the raster extraction stage using the Mirone measurement function). For each run of imagery the image key needs to be reset to match the spectral values present, and the area sets created at this stage can then be used for this calibration. The algorithm itself is concerned with the proportional difference between the pixel values in the polygons, so new values can be applied to the image key using the methods outlined in the sampling section of this thesis. This means that, should a new set of photography be used, the initial sets for this stage are analyzed to create new baseline data. In other words, for each polygon of road, section of coniferous and mixed forestry, river, lake and pond (though not reservoir, as it did not return reliable mean pixel value settings during the sampling), the mean pixel value and standard deviation by colour band are recorded and averaged by group. These averages in turn are tested against the expected proportional relationships between them and, once verified, are applied as the image key for that particular run of photography. In the case of this study this was completed using PCI Geomatics' Geomatica software, which returned statistical data for the clipped polygons using the analysis function against the red, green and blue colour bands. As with the step as a whole, this process would benefit from an application specific to the algorithm which would return these values solely for the purpose of creating an image key.

This stage of the algorithm does not require any user input (the type of photography is entered at the start of the analysis). The outputs are a series of sets of polygons and their associated raster image areas containing the image projection (in GeoTIFF format). Once the input data for this stage is available in GML/XML format it should be possible to code a series of iterative loops to set up the sets and reduce the number of remaining polygons requiring spectral analysis. These improvements would speed up the analysis and make the process neater for the user, but in order to ensure the process was worthwhile they were omitted, and the focus of the study concentrated on defining relational values and determining whether the vector/raster hybrid analysis model would reveal useful data for the user.
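The calibration described above amounts to averaging per-polygon band statistics within each known class and checking the expected proportional relationships between classes. The following sketch shows one way this could look in Python; the numbers are illustrative values loosely based on the sample tables in Chapter 3 (the road standard deviations are placeholders) and the sanity check threshold is an assumption.

import numpy as np

# Per-polygon statistics for each known class:
# (mean_r, mean_g, mean_b, std_r, std_g, std_b), illustrative values only.
samples = {
    "road":  [(215.5, 243.0, 194.7, 9.1, 10.4, 12.0),
              (206.4, 234.5, 157.0, 8.7, 9.9, 11.2)],
    "water": [(36.5, 69.2, 81.6, 4.6, 7.3, 13.2),
              (36.1, 69.5, 83.7, 4.6, 6.9, 12.3)],
}

# Average the statistics by group to form the baseline image key.
image_key = {name: tuple(np.mean(np.array(stats), axis=0))
             for name, stats in samples.items()}

# Verify an expected proportional relationship before accepting the key,
# e.g. the red mean for water should sit well below that of road.
assert image_key["water"][0] < 0.5 * image_key["road"][0]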


2.3 Spectral Value Comparison

The third part of the algorithm consists of a series of procedures to assign known areas and areas whose values can be determined from the known sets.

This part of the algorithm involves comparing the spectral values for the unknown polygon types (from step 2). The first task is to create statistical histogram data for all of these polygons and compare it to an expected value key for classification according to land use type. Areas which do not conform to known values are placed in a set for further analysis, while those matching are categorized according to their values. The first part of this step involves verifying any polygons which were found to include descriptive symbols from the vector data (marsh, pasture; note that pasture in this case refers to known areas of rough pasture). Following this, an analysis according to spectral values was completed, and the set of neighbouring polygons (taken directly from the vector data set) was examined to see whether neighbouring areas might influence the result. For example, if an area with a set of pixel values close to those expected for pasture was identified but displayed a high level of standard deviation, the area was assigned to the pasture set if three or more neighbouring polygons contained pasture (as the deviation is probably caused by shade in the image); otherwise the area is flagged for examination of its histogram results later in the process, to see whether a double spike in the red and green colour bands is present.

In order to complete this step the algorithm cycles through a number of relative values to determine the probable land cover of the polygon being analyzed. For this section of the study the histogram values were exported from the Geomatica software package as tables and graphs and compared manually; to complete the same task on a larger scale this process would be coded into a routine taking the statistical (image) data and image key as input and outputting the closest match. For example, the mean data by colour band would be compared to the values for roads, and if a 50% decrease in the red, 40% in the green and 50% in the blue colour bands was detected the polygon would then be compared to water; if a 70% increase in the red, 55% in the green and 20% in the blue colour bands was detected, the standard deviation would then be checked against an expected value of 10, allowing the polygon to be coded as pasture. This routine is not coded here but was executed using a set of comparative tables.

At the beginning of this step the image consists of several sets of known polygon types, the clipped image polygons, their associated vector codes and an image key. After completion of the step there are several more known polygon sets and a set of unknown areas which fell outside the expected ranges. The latter may be the result of samples being biased by high levels of shade, or of the areas representing a transitional data type (bog to rough pasture etc.). These remaining polygons are further analyzed in the next step, but this stage of the algorithm is used to classify as many known values as possible. These classifications were obtained through the series of comparative steps described below.

The sampling during this study pointed to a number of interdependent relationships between the spectral values found in the polygons studied. The fact that the polygons are clearly defined (through the vector mapping) and that the content of many of the polygons in the image is known prior to analysis means that the algorithm can focus on identifying a narrow range of additional area types. To achieve this it is necessary to loop through a series of criteria for four main land types: pasture, rough pasture, bog and marsh. The last two on this list will have identifying symbols present in most cases, which can be used to assist the automatic search. Once the four target areas have been identified (with pasture being the main land use in most semi-urban imagery), the remaining polygons form a small set of areas which are further analyzed in the next step of the process.
the process.<br />

Each polygon is analyzed using the Geomatica analysis tool and the histogram values exported for comparison to the known values. If the sample has a mean value for the red colour band 40% lower than that of the road polygons, the blue colour band displays a similar 40% lower mean value than roads, and the standard deviation is lower than 15, then the sample is matched to water. If the mean for the red colour band is close to three times that of water, and the green value is close to twice that of water, the polygon is coded as pasture.

If the sample does not match the above criteria but has a mean pixel value for the red and green colour bands close to half that of roads, displays a level of standard deviation three times that of roads, and has red and green values close to double those of the water polygon (taken from the image key), then the polygon is coded as rough pasture.

If the sample has red values around 30% lower than mixed forestry, green values 20% lower than the same known polygon type, and a standard deviation within 10% of the mixed forestry value, then the polygon can be coded as marsh (usually a marsh symbol is present within the polygon in the vector dataset, but not always).

If the sample does not match the criteria outlined so far but has a low standard deviation across all three colour bands and a decrease in the mean value of over 30% for all colour bands when compared to the known road values, then the sample is tested for area size; if it is above the maximum value for pasture it is coded as bog.
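A compact sketch of how these comparative tests might be expressed in code is given below. It is an illustration only, not a routine from the study: the statistics are assumed to be (mean red, mean green, mean blue, standard deviation) tuples, the image key is assumed to hold entries for road, water and mixed forestry, and the tolerance bands around the stated ratios are assumptions of mine.

def classify(stats, key):
    """Assign a land cover type to one polygon's band statistics, following the
    style of proportional tests described above. stats and key entries are
    (mean_r, mean_g, mean_b, std) tuples; the exact bands are assumptions."""
    road, water, mixed = key["road"], key["water"], key["mixed_forestry"]
    mean_r, mean_g, mean_b, std = stats

    # Water: red and blue well below the road means, close to the water baseline,
    # with a low standard deviation.
    if mean_r < 1.5 * water[0] and mean_b < 0.6 * road[2] and std < 15:
        return "water"

    # Pasture: red close to three times and green close to twice the water means.
    if (2.5 * water[0] < mean_r < 3.5 * water[0]
            and 1.5 * water[1] < mean_g < 2.5 * water[1] and std < 15):
        return "pasture"

    # Rough pasture: red and green about half of road, deviation about three
    # times that of road, red and green about double the water values.
    if (mean_r < 0.6 * road[0] and mean_g < 0.6 * road[1]
            and std > 2.5 * road[3] and mean_r > 1.7 * water[0]):
        return "rough pasture"

    # Marsh: red about 30% and green about 20% below mixed forestry, with the
    # deviation staying within roughly 10% of the mixed forestry value.
    if (mean_r < 0.75 * mixed[0] and mean_g < 0.85 * mixed[1]
            and abs(std - mixed[3]) <= 0.1 * mixed[3]):
        return "marsh"

    # Bog and the remaining cases also need the polygon's area and are left
    # to the later steps in this sketch.
    return "unknown"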

The polygons remaining after this step in the algorithm fall into two categories: those surrounding buildings or with mixed use, and those with a high level of shade present. Further analysis is required to step through the remainder to identify areas which have a homogenous pixel value but a higher level of standard deviation due to shade, and those of mixed use. The output from this part of the process is six further area sets: pasture, rough pasture, marsh and bog, areas with high standard deviation for further analysis, and areas containing building polygons (automatically flagged through the vector data).


2.4 Confirmation

The final stage of the algorithm involves stepping through the reduced data set for manual confirmation (or comparing it against an additional set of values determined by the user).

This part of the algorithm is concerned with tidying up the data remaining from the previous sweeps through the polygons. To begin with, the polygon set classified as pasture is selected and analyzed for differences in the mean values of the red and green colour bands. Those with mean values above 190 on the converted greyscale in red, and above 200 in green, are classified as cut pasture (while this may not initially appear to be of direct value to the user, it could help with any subsequent analysis of pasture in particular).

The next loop is designed to remove from the unknown set those polygons containing homogenous pixel values whose standard deviation has been biased by a high proportion of shade in the sample. It involves checking the histogram for two peaks (one for shade and one for pasture) in the pixel count. If both are present, the polygons are assigned to the pasture polygon set.
pasture polygon set.<br />

The process was completed using the Geomatica software to extract the statistical data from the polygons of raster imagery (extracted earlier in the process using a combination of ASCII data from the vector manipulation software and the Mirone clipping function). The remaining areas were cross checked against the polygons containing buildings other than those coded as dwellings. Those found not to contain a building polygon are retained for further analysis and logged according to adjacent polygon types (e.g. 123445.34 232234.34 etc.; neighbouring road, building polygon, pasture; area 7658 m2). This was completed manually for the study using the Radius software and GeoTIFF referencing, but would be best completed inside a routine for larger samples. These unknown polygons can then be visually referenced by a user and manually categorized (displayed according to an input co-ordinate set returned from this algorithm through software such as Mirone). The result of this sample image study was that only a small area of the original search area was processed at this stage in the study.


3 Sampling for the Baseline Image Key

Figure 1: Aerial view of sample area

The main body of research in this study involved identifying areas which would make useful benchmarks for an automated image analysis to use as a search key; ten areas were selected for inclusion as they formed the most distinct sets available. The ten area types sampled were: Roads (of all classes), Water (lake, river, stream, drain and pond), Marsh, Coniferous Forestry, Mixed Forestry, Track, Shade (to obtain reference values for when a high level of standard deviation occurred), Building (roofs), Pasture and Rough Pasture. Of the sample values, four could not be determined from the vector data (pasture, marsh, rough pasture and bog) and were used to test the ability of the process to identify target values by their relationship to known values. The next sections describe the findings for these sampling areas.


3.1 Roads

Figure 2: Road area and surrounding detail

This part of the study looked at sample sections of road (tarred/hard cover) to establish the relationship between the mean spectral value in these areas and the image as a whole. In general terms, areas of road appear lighter than other parts of an aerial image due to increased reflection from the surface, and this was borne out by mean greyscale pixel values roughly 30% above the image average across the three colour bands.

The study involved using OpenEV (an open source raster imaging tool based on the GDAL library) and PCI Geomatics' Geomatica geospatial viewing application. The files were exported as GeoTIFF files from the original image using the GDAL_export facility. In all, ten regions were sampled. These sample areas were taken from within the road polygons (as opposed to sampling the entire polygon) in order to identify true baseline data for these features. Sampling the entire feature would have meant including areas obscured by tree cover and would necessarily have biased the results; the intention of this part of the study was to create a benchmark against which tolerances for deviation could be set.

The ten sample areas were taken from a series of roads in the south east of the image and three sections of the national primary road running along the north of the image (the pixel representation in the example below illustrates the area being sampled but is at a lower resolution than was used in the study).


Figure 3: Road area and vector data

In general terms the sample areas had an equal distribution of values across the red, green and blue colour bands when compared to the image as a whole, and it was not possible to discern any unique variation in the proportion of pixels in each of these bands within the road polygons. The results, however, did not deviate to any great extent between the samples, and the mean greyscale values in the samples remained consistently higher than those of the image as a whole. This was significant as the variation remained at around 30% higher for each band in each sample (34.4 higher on average in red, 26.6 in green and 36 in blue).

Sample    Red       Green     Blue
1         215.5     243       194.7
2         206.444   234.5     157
3         202.3     228.8     183.9
4         163.083   189.75    147.667
5         171.417   196.083   153.333
6         169.444   190.111   151.444
7         167.111   190.444   155.889
8         155.667   185.25    145.833
9         148.417   172.25    143.833
10        187.061   211.545   173.364

Table 1: Road sample values (mean pixel value by colour band)


(The previous samples were compared to statistics for the image as a whole: Red: grey level values 6-255, median 115, mean 111.711, standard deviation 27.2804; Green: grey level values 30-255, median 135, mean 136.542, standard deviation 27.3776; Blue: grey level values 5-255, median 102, mean 102.636, standard deviation 17.8523.)

The mean pixel values are a useful benchmark on which to base further analysis of the imagery, and the fact that each sample displayed a more or less uniform deviation from the image standard implies that there is some merit in applying the results to a key which identifies impermeable surface area. In general, such areas in an urban setting will have properties similar to a road surface. Further iterations of this sampling will involve sampling shingle against concrete and tar (see the analysis of the spectral values contained in areas of track) in order to see if there is a measurable spectral variation between them; this will, however, involve a small amount of post-processing to enhance the differences.

The road network is a useful point of reference in automated image analysis, and ways of analyzing imagery to capture road networks have been well studied (such as the pattern analysis techniques explored by van der Werff & van der Meer, 2008). In this thesis the focus is not on capturing road data, something which has been completed and is on a continuous revision cycle, but on utilizing this data to make the image analysis process easier. Most study areas (and almost every part of the island of Ireland) will have at least some part of the road network present (at the risk of sounding pedantic, this assumption is not verified here because it can be taken as general knowledge and can be discerned from a cursory look at any online small scale representation of the network). This means that polygons of a specific, distinctive spectral value are available for referencing most studies, even when confined to a particular area or series of photographs. In order to get a clearer insight into how this can be applied, three areas close to roads around the image were sampled and compared to sample values from their closest road polygon.


Road test sample 1              Mean pixel value   Standard deviation
Red                             205.889            8.18
Green                           228.167            11.289
Blue                            178.778            12.73

Adjacent sample 1 (pasture)     Mean pixel value   Standard deviation
Red                             91.539             6.573
Green                           138.706            7.504
Blue                            92.519             10.84

Table 2: Road test sample value 1

The first test sample took an area of road and a sample area from an area of pasture adjacent to the road. This can be assumed to be a recurring pairing of values that can be located in most of the aerial imagery this study is considering. In the test, the values for standard deviation from the mean across all three colour bands did not vary to any large extent between the two samples, so they can be omitted as reference values for a search algorithm looking to match a set of values as pasture using the nearest road polygon as key. In contrast, there was a large difference in the mean values for all three colour bands, with the pasture sample returning means roughly 50% lower in the red, 40% lower in the green and 50% lower in the blue colour band than the road polygon. This discernable difference allows a range surrounding this relative difference to be included in the search and areas of pasture to be identified. Note that potential candidates for the pasture set of polygons are also cross checked against other relative values, outlined in later sections of this sampling part of the thesis. As a result of this variance, once the known areas are identified the road set can be compared against unknown polygons adjacent to it and, given similar levels of standard deviation and the proportional mean by colour band outlined above, the unknown polygons can be placed in a pasture polygon set for further reference and confirmation as the analysis progresses.
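The proportional differences quoted above can be checked directly from the Table 2 figures; the short calculation below is only a verification of that arithmetic.

# Mean pixel values from Table 2.
road = {"red": 205.889, "green": 228.167, "blue": 178.778}
pasture = {"red": 91.539, "green": 138.706, "blue": 92.519}

for band in ("red", "green", "blue"):
    drop = 100.0 * (road[band] - pasture[band]) / road[band]
    print("%s: pasture mean is %.0f%% below the road mean" % (band, drop))

# Prints roughly 56%, 39% and 48%, in line with the approximate
# 50%/40%/50% differences described above.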


Road test sample 2              Mean pixel value   Standard deviation
Red                             157.961            7.537
Green                           181.211            8.08
Blue                            145.553            10.129

Adjacent sample 2 (bog)         Mean pixel value   Standard deviation
Red                             111.479            7.086
Green                           125.162            8.764
Blue                            98.326             12.152

Table 3: Road test sample value 2

This sample looked at an area of bog adjacent to a road for discernable pixel variations between the two. Although bog is sampled further at another stage in this thesis, it is worth noting that all bogs have access roads nearby, so a spectral comparison is a useful reference. The surface of the roads in question varies but, as is outlined in the road section of this thesis, surface type has only a small effect on the range of spectral values returned from the road. The standard deviation for all three colour bands was almost identical between the two samples (road and bog), which could be expected given the relative uniformity of the surface cover (from a medium altitude aerial perspective). The mean pixel values for these colour bands, however, varied almost uniformly across the three bands, with values around 30% smaller in the red, green and blue bands of the bog sample. As with road and pasture this is a strong proportional variation for analysis purposes and allows bog to be established in an initial polygon set during processing. The set can then be compared against the other expected variances (water, forestry, pasture) and matched against polygon size (bog will almost always form the largest area polygon in any sample; other large polygons such as lakes and forestry are coded and can be automatically placed in a set during analysis).


Road test sample 3                 Mean pixel value   Standard deviation
Red                                199.833            10.912
Green                              218.667            16.52
Blue                               163                165

Adjacent sample 3 (mixed forestry) Mean pixel value   Standard deviation
Red                                88.365             28.944
Green                              120.627            29.751
Blue                               90.361             20.369

Table 4: Road test sample value 3

The third set of samples for referencing adjacent data against the spectral values from road polygons was a section of mixed forestry close to a road (gravel track). The samples displayed a notable difference in the level of standard deviation for all three colour bands. This level of deviation is consistent with other samples of mixed forestry (and rough pasture) analyzed in this thesis, with roads displaying a deviation of approximately one third of the mixed forestry values for the red and green colour bands. The differences in mean pixel values for all three colour bands were also distinct: 55% lower for mixed forestry in the red, 45% lower in the green and 45% lower in the blue colour bands. This variation, together with the level of standard deviation, provides a useful quality assurance value set for testing the accuracy of the derived values being estimated in the algorithm. Since the areas of mixed forestry and road are both present as polygons in the vector data set, the pixel sets for each can be extracted and matched; any values falling outside a range close to the above expected variances would flag an issue with either the photography or the vector data.


3.2 Water

Figure 4: Typical Water Area Image

This section looked at four water samples present in the sample image (comprising three sections of lake and one of river) to see whether their spectral values could be used to control and calibrate the image processing of land polygons. The results were good and indicated several unique properties of this cover that could be used to calibrate a key in relation to the surfaces being studied. The percentage of the image covered by water was proportionally small, but the lake section made up the biggest single polygon. Of the water polygons, the majority were drains and streams (ranging from three meters to less than a meter in width); a river and a lake were also present. The streams and drains were eliminated from the study for two reasons: firstly, they have a very small width (less than a meter in some cases) and were often obscured by overhanging vegetation; secondly, they are already captured, and any spectral analysis would only be of use as comparative values against the rest of the image, which was not possible due to the vegetation.

Variations between the samples were slight, suggesting that lower flown photography would be necessary to compile any useful information regarding sediment levels, but allowing good baseline figures to be derived from the values present. The mean pixel value was almost uniformly two thirds less than the image average for the red colour band, with a low standard deviation in all samples. The value in the green colour band was just 50% of the image mean (with little variation) for all samples. Similarly, the value returned for the blue colour band was 20% less than the image average.

Sample   Red mean   Red SD    Green mean   Green SD   Blue mean   Blue SD
1        36.455     4.568     69.214       7.254      81.614      13.217
2        36.119     4.562     69.524       6.944      83.718      12.256
3        37.692     5.468     70.758       7.711      83.386      13.056
4        39.714     5.548     73.083       7.07       82.797      16.235

Table 5: Water sample values (mean pixel value and standard deviation by colour band)
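The proportions described above can be checked against the whole-image statistics quoted in the road section; the following short calculation is only a verification of that arithmetic using the first water sample.

# Whole-image means quoted in the road section, and water sample 1 from Table 5.
image_mean = {"red": 111.711, "green": 136.542, "blue": 102.636}
water_mean = {"red": 36.455, "green": 69.214, "blue": 81.614}

for band in ("red", "green", "blue"):
    ratio = 100.0 * water_mean[band] / image_mean[band]
    print("%s: water mean is %.0f%% of the image mean" % (band, ratio))

# Prints roughly 33%, 51% and 80%, matching the two-thirds, one-half and
# one-fifth reductions described above.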

It could also be said that the uniform nature of the results indicates that the relative depth of the water has little effect on the spectral value of the area for photography at this height, introducing the potential for water to be used as one of the main baseline properties in this type of image analysis. It can often be the case that certain areas contain large numbers of temporary ponds following heavy rain; this is particularly so in the 1:5000 scale rural mapping. Applying the above values against pixel histograms for these areas (typically bog or pasture) for photography runs taken following heavy rainfall could reveal useful data with regard to runoff and capacity across land areas. In terms of this study, the values will form part of a key against which the histogram values for pixels across the colour bands can be applied in order to calibrate the key (the set of values used to identify land cover).


The purpose of this thesis is to identify an automated process for image analysis using vector data alongside a spectral analysis. The ability to extract water areas, in particular lakes or ponds with a large concentration of pixels of similar values, and to establish a baseline value by which to calibrate the image key would allow the user to target specific areas across successive runs of photography. This could be completed automatically using a GDAL extract and returning the results of a histogram analysis.
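
As an illustration of how this extract-and-summarise step might be automated with the GDAL library, the following sketch (Python, using the GDAL bindings) clips the orthophoto to the water polygons and returns per-band means, standard deviations and histograms. It is an outline only, not the implementation used in this study; the file names, the optional attribute filter and the nodata handling are assumptions.

    # Sketch only: clip the orthophoto to water polygons and summarise each band.
    # "ortho.tif" and "water_polygons.shp" are hypothetical file names.
    from osgeo import gdal

    def water_baseline(ortho_path, cutline_path, where=None):
        # Pixels outside the cutline are written as the nodata value (0 here).
        clipped = gdal.Warp("/vsimem/water_clip.tif", ortho_path,
                            cutlineDSName=cutline_path,
                            cutlineWhere=where,      # optional attribute filter, e.g. lakes only
                            cropToCutline=True,
                            dstNodata=0)
        stats = {}
        for idx, name in enumerate(("red", "green", "blue"), start=1):
            band = clipped.GetRasterBand(idx)
            minimum, maximum, mean, stddev = band.ComputeStatistics(False)
            histogram = band.GetHistogram(-0.5, 255.5, 256)
            stats[name] = {"mean": mean, "stddev": stddev, "histogram": histogram}
        return stats

    baseline = water_baseline("ortho.tif", "water_polygons.shp")
    print(baseline["red"]["mean"], baseline["red"]["stddev"])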

In order to gain a visual impression of the location of the main water bodies in the sample image, the green and blue colour bands were mapped into the red, allowing the definition between the (relatively) monotone water and the remainder of the image.

Figure 5: Water Area Image Modification
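
The exact band mapping used to produce Figure 5 is not recorded here, so the following is only one plausible interpretation, sketched for illustration: the red channel of a quick-look copy is replaced by the average of the green and blue bands, which leaves the (relatively) monotone water visibly distinct. The file names are hypothetical.

    # Sketch of one possible "green and blue mapped into red" quick-look image.
    # The mapping (red := mean of green and blue) is an assumption.
    import numpy as np
    from osgeo import gdal

    src = gdal.Open("ortho.tif")                    # hypothetical file name
    green = src.GetRasterBand(2).ReadAsArray().astype(np.float32)
    blue = src.GetRasterBand(3).ReadAsArray().astype(np.float32)

    driver = gdal.GetDriverByName("GTiff")
    out = driver.CreateCopy("water_quicklook.tif", src)
    out.GetRasterBand(1).WriteArray(((green + blue) / 2.0).astype(np.uint8))
    out.FlushCache()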

The next part of the study involved taking separate sample areas from around the image and comparing them to the spectral values associated with water. The samples were taken at three separate sections around the image; they did not correspond to samples taken for other parts of this study (specific road, building, pasture areas etc.) so as to increase the variety of input data. The three areas consisted of forestry (coniferous plantation), pasture and track. The pasture sample also contained a high degree of shade, which was not rectified in the table, to see if it would be possible to identify this type of area with shade included.

Water values testing sample 1 (forestry)    Mean Pixel Value    Standard Deviation
Red                                         73.532              19.591
Green                                       112.507             21.605
Blue                                        96.772              16.042

Water values testing sample 2 (pasture)     Mean Pixel Value    Standard Deviation
Red                                         115.153             12.691
Green                                       167.608             15.487
Blue                                        104.872             13.2

Water values testing sample 3 (track)       Mean Pixel Value    Standard Deviation
Red                                         213.429             10.748
Green                                       237.429             12.369
Blue                                        192.821             14.636

Table 6: Water test sample values

As might be expected, the track (artificial surface) showed the greatest difference, with the red colour band value for water being less than 20% of that found in track. The relative disparity between these two values (the red mean pixel value in areas of water and track/road) could be used to calibrate an image key during automatic image analysis, and the percentages of other less distinct land cover derived by comparison. One of the main aims of this study is to see if it would be possible to analyze aerial imagery using an automatic process based on vector data. With known water polygons and road polygons present this can be achieved; however, as was mentioned above, the body of water needs to be large enough to obtain an accurate baseline reading for the photography run. If only drains or streams are present in the target area (or its immediate (~1km) surroundings), then extracting a set of baseline pixel values from water polygons would not benefit the analysis. It can therefore be concluded that water polygons provide a useful reference for image analysis, but in the context of using them to add value to ordnance survey small area polygons they need to be part of areas not less than five pixels in diameter (i.e. belong to classes of rivers, lakes or large ponds).

When the control values for water (from the first table) were compared to pasture, the red and green colour bands showed values which could be used to calculate if pasture was present in a polygon. The mean pixel value for water in the red colour band was 30% of the value for pasture, and only 45% of the value found in the green colour band of the pasture sample (which included a section of shade). The value for the blue colour band also showed a disparity, being just over 20% less than the pasture mean pixel value. This implies that if an algorithm were run on sections of ordnance survey data which took polygon co-ordinates from the vector polygons, calibrated a key from the water polygon, compared the red and green colour band values against the red and green colour values of the neighbouring polygon, and then confirmed the level of standard deviation (which was found to be low in an area of pasture, ~10 on the greyscale), it is probable the area can be labelled as pasture. In itself this does not present much of a breakthrough, but when added to the known polygon it helps complete the picture of a target area being analyzed.
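
The decision step described above can be sketched as a simple rule. The ratios (water red roughly 30%, water green roughly 45% of the neighbouring polygon's values) and the low standard deviation (~10 on the greyscale) are the figures reported in this section; the tolerances and the function name are assumptions rather than values from the study.

    # Sketch of the pasture test: compare the water-calibrated key against a
    # candidate polygon's red/green means and check that its deviation is low.
    # The 0.10 ratio tolerance and the 15-value deviation ceiling are assumptions.
    def looks_like_pasture(water_stats, candidate_stats, tolerance=0.10):
        red_ratio = water_stats["red"]["mean"] / candidate_stats["red"]["mean"]
        green_ratio = water_stats["green"]["mean"] / candidate_stats["green"]["mean"]
        low_deviation = (candidate_stats["red"]["stddev"] <= 15 and
                         candidate_stats["green"]["stddev"] <= 15)
        return (abs(red_ratio - 0.30) <= tolerance and
                abs(green_ratio - 0.45) <= tolerance and
                low_deviation)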

At this point it might be better to think of the image as its vector representation. As the polygons which the vector data encloses are identified, the areas can be filled. In this way the study is filling in the blanks around known values. If the result is thought of as a mosaic of known area properties (type and nature of land cover), then applying the label pasture dramatically reduces the areas left to identify.

Figure 6: Sample area as a mosaic of polygons

The purpose of the study is to identify an automated software process to do this.<br />

The user would start with the requirement to identify the percentage of a certain<br />

property in the photography: throughout this thesis the example of impervious<br />

surface area is given but this could also be a fungal infection affecting crops, the<br />

spread of invasive plant species, the extent of flood damage etc. What the<br />

sampling of water polygons is doing is attempting to create a set of automated<br />

conditions which the software would initially retrieve to set a base for the<br />

algorithm. The user would then select an area from the vector data where the<br />

target values were present. This area would be in the format of a specially coded<br />

polygon composed of vector data; either using those in the ordnance survey data<br />

or appending lines to controlled data to fully enclose the target sample (and<br />

creating the necessary vector set). Once identified the target area could be<br />

calibrated against the value for water (among others) and areas not relevant<br />

eliminated.<br />

As every section of the image will be composed of an area polygon (taken from<br />

the vector data) which in general enclose relatively small areas it should be<br />

47


possible to quickly process each section by clipping and cutting the sections,<br />

comparing the mean pixel values across the colour bands, and classifying the<br />

result. This type of process is linked to the photography and once the edge of a<br />

given run is reached the process needs to be restarted and the values re-calculated.<br />

As the extent of the photography is known the co-ordinates can be included into<br />

the algorithm and the user informed when the extent of the search has been<br />

reached.<br />
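
A rough outline of that clip, compare and classify loop is given below. It assumes an OGR-readable polygon layer with a hypothetical id field ("PID") and reuses the hypothetical water_baseline() and looks_like_pasture() helpers sketched earlier; it is illustrative rather than the process actually implemented in this study.

    # Sketch of the clip/compare/classify loop over small area polygons.
    from osgeo import gdal, ogr

    def classify_run(ortho_path, polygon_path, baseline):
        results = {}
        polygons = ogr.Open(polygon_path)
        layer = polygons.GetLayer(0)
        for feature in layer:
            pid = feature.GetField("PID")           # hypothetical id field
            clipped = gdal.Warp("/vsimem/poly_%d.tif" % pid, ortho_path,
                                cutlineDSName=polygon_path,
                                cutlineWhere="PID = %d" % pid,
                                cropToCutline=True, dstNodata=0)
            stats = {}
            for idx, name in enumerate(("red", "green", "blue"), start=1):
                _, _, mean, stddev = clipped.GetRasterBand(idx).ComputeStatistics(False)
                stats[name] = {"mean": mean, "stddev": stddev}
            results[pid] = "pasture" if looks_like_pasture(baseline, stats) else "unclassified"
        return results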

3.3 Marsh

Figure 7: Typical Marsh Area Image

This study looked at areas of marshy ground. The purpose was to try to identify whether sections of waterlogged surface area had unique values which could be identified in a small area polygon. The areas used for the study were captured examples adjoining a lake; they had been field revised and identified specifically as marsh within an enclosed polygon. The boundary on one side was the edge of the lake, while the boundary on the other side was the border with pasture, enclosed with a notional (mapping) line. The samples (three in total) can be assumed to be typical of marsh (due to their position within the target area), but there was a small amount of variation, around 5%, between results for the different colour bands. This was as expected, and the values were very close to the image average. This suggests that areas of marsh would be difficult to detect using a mean pixel/standard deviation analysis based on area polygons alone, and that additional coding data taken from the original vector mapping is required to make an accurate prediction as to the probability of marshy ground being present.


Marsh Sample 1      Mean Pixel Value    Standard Deviation
Red                 110.968             10.557
Green               135.143             12.497
Blue                102.401             13.700

Marsh Sample 2      Mean Pixel Value    Standard Deviation
Red                 108.672             16.799
Green               132.305             15.717
Blue                93.646              14.9

Marsh Sample 3      Mean Pixel Value    Standard Deviation
Red                 104.652             8.66
Green               123.725             10.028
Blue                94.018              13.032

Table 7: Marsh sample values

One factor which might help to differentiate between areas of marsh and the overall mean is the fact that the standard deviation was less than half of the overall figure for the red and green colour bands in all three samples. This is because, although there is variety in the spectral values for the vegetation present, the area is uniformly covered by vegetation. This difference could become useful when removing known features from a polygon; in other words, taking water and the built environment from an area polygon and analyzing what remains to see if the spectral values display similar levels of deviation on the red and green colour bands.

Marsh areas are generally indicated by a symbol which signifies the presence of this type of ground cover extending to the next logical boundary (or the notional mapping boundary mentioned above). The boundaries in the vector data do not carry a marsh level, so a method to identify the areas from the relatively narrow red colour band identified above could provide an automated way of determining the extent of marsh lands based on existing data. This is one of the areas of the study which does not lend itself to a software solution and relates to the question of "fuzzy data" and how to incorporate it into a digital environment. Without digressing onto a tangent outside the scope of this thesis, it needs to be mentioned, in the context of this part of the study, that certain features of this planet will always have fluid boundaries. In this example the change from marsh to rough pasture is a gradual one, and does not correspond to a single vector. Various solutions, such as an additional transitional polygon instead of a linear boundary, are simply methods of forcing the square peg of a gradual change into a relational database. Although the study is attempting to identify statistical data (percentages of types of land cover) that can be appended to the entry for a given area polygon in a spatial database, in this case a bitmap displaying concentrations of values corresponding to marsh might be more appropriate.
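
Such a bitmap could be sketched as a simple mask over the clipped polygon, using the marsh means and standard deviations from Table 7 as the target window. The plus or minus 1.5 standard deviation tolerance below is an assumption made for illustration, not a figure from the study.

    # Sketch: boolean mask of pixels whose red/green values sit inside a window
    # around the marsh means, giving a "concentration" bitmap for the polygon.
    import numpy as np

    def marsh_concentration_mask(red, green, marsh_key):
        # red, green: 2-D arrays for the polygon; marsh_key: per-band mean/stddev.
        r_lo = marsh_key["red"]["mean"] - 1.5 * marsh_key["red"]["stddev"]
        r_hi = marsh_key["red"]["mean"] + 1.5 * marsh_key["red"]["stddev"]
        g_lo = marsh_key["green"]["mean"] - 1.5 * marsh_key["green"]["stddev"]
        g_hi = marsh_key["green"]["mean"] + 1.5 * marsh_key["green"]["stddev"]
        return (red >= r_lo) & (red <= r_hi) & (green >= g_lo) & (green <= g_hi)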

When the values obtained from marsh were compared to three sample sections from other polygons in the study, some unique proportional variations emerged. The purpose of the study is to iteratively reduce the quantity of unknown (in terms of land usage) polygons in the search area by appending values derived from the aerial photograph. The sample areas used in this test were not from any of the original samples used to obtain baseline spectral data for the land type they represent, but were chosen to see if a reliable (or at least significant) proportional deviation could be observed. The three sections sampled were pasture (which had been recently cut), mixed forestry (chosen because of the variety of spectral values that this type of cover represents) and paving (taken from a yard surrounding buildings but similar to any of the road and track hard surface areas sampled elsewhere in this study).

All three sample areas showed unique properties consistent with the sampling used for their respective baseline values, but also useful in terms of obtaining a key for identifying polygons of marsh. As was mentioned above, these will generally fall within a polygon composed of vector polylines, but it can occasionally be the case that the marsh is not fully enclosed. It may be necessary to introduce a process that retains, for verification, all the polygons containing marsh symbols but displaying spectral values outside those expected for that type of land cover; this, however, was not the case for the samples used in the study.


Marsh test Sample 1 (Pasture)           Mean Pixel Value    Standard Deviation
Red                                     201.617             8.05
Green                                   209.713             9.569
Blue                                    136.645             12.082

Marsh test Sample 2 (Mixed Forestry)    Mean Pixel Value    Standard Deviation
Red                                     69.22               34.492
Green                                   103.352             33.620
Blue                                    86.594              19.885

Marsh test Sample 3 (Paving)            Mean Pixel Value    Standard Deviation
Red                                     246.167             8.1
Green                                   252.542             5.4
Blue                                    206.125             8

Table 8: Marsh test sample values

The first sample, taken from freshly cut pasture, produced the relatively high values for the red and green colour bands that were found in the pasture testing samples for that type of ground cover. In relation to the values for marsh they showed a high level of disparity, with the mean red colour band pixel value for marsh being half of the pasture sample, and the green value for marsh being 60% of the test sample. The disparity in the blue colour band was less, but this aspect of the spectral values could be used to relate the disparities found as specific to pasture, so that an examination of a neighbouring polygon could use a known marsh area (presence of symbol and expected spectral values) as a reference to set the relative differences, and possibly reset the marsh values to within the values for the known polygon for that particular area.

The last suggestion will not be included in this study, but it is worth noting that an algorithm which could constantly recalibrate the relative values as it processed neighbouring polygons might produce better results than one dependent on a key set at the beginning of the processing.


The second test sample looked at an area of mixed forestry for deviation (in terms of mean pixel values across the colour bands) from the values found to be present in areas of marsh. The values had a relatively unique variation from marsh in that while both the red and green colour band mean pixel values were a lot lower (35% and 20% respectively), the blue colour band had a comparable level of standard deviation around a mean which was within 10% of marsh, although this could be attributed to the level of shade present in the forestry due to the tree canopy varying in height across the sample. As is pointed out elsewhere in this study, areas of mixed forestry did not give reliable enough data to calibrate other surface areas from, based on spectral values alone. In the case of these types of areas there is vector data coding present to uniquely identify the forestry; however, knowledge of an expected proportional difference between the (known) forestry and an area of marsh is a useful additional factor to include in the algorithm, and might increase the accuracy of any search for these types of areas (or at least help to eliminate them from a search for other specific properties).

The third test sample took an area of hard cover (paving/track) from a yard between agricultural buildings. This type of cover is the part of this study which revealed the most distinct values and presents a valuable calibration tool for the algorithm. When compared to this third sample, the mean pixel value (converted to greyscale) for the red colour band found in the marsh samples was only 43% of the hard cover, while a similar disparity was found between the green and blue mean pixel values in marsh and the green and blue mean pixel values in the hard cover (with the marsh mean values at only 51% and 46% of the hard cover respectively). These types of areas are well coded in the vector data. Some areas of hard cover surrounding private dwellings and farm buildings may not be captured, and the automated identification of these types of areas through aerial image processing is one of the aims of this thesis. It can be assumed, however, that for any given area (excepting rural mapping covering mountains, which this study is not addressing) the road polygons have been accurately captured and there will be several sample polygons from which to calibrate a hard cover value. In terms of this part of the study, the relative proportional deviation of marsh values from both pasture and road/paving allows a key for its identification to be developed.

3.4 Coniferous Forestry

This part of the study used unprocessed sections of coniferous (commercially planted) forestry to evaluate a deviation from the average for the image that might indicate this type of ground cover. It took seven samples from a total of five areas of this type of forestry in the image. These were then analyzed in terms of mean values through the red, green and blue colour bands to determine if there was any unique deviation from other features. As would be expected for this type of ground cover, the values for red (and near infrared) were lower due to the colour being absorbed by the foliage. This gives a useful indicator for this type of ground cover and provides a comparative value against which the target polygons of the study can be measured. It should be noted that there is also the potential to use a pattern recognition algorithm to accompany any specific search for this type of forestry, as uniform rows are a feature of this type of surface cover. The statistics for the survey are in the table below; it should also be noted that the standard deviation remained relatively consistent for each sample area.


Forest Sample 1     Mean Pixel Value    Standard Deviation
Red                 96.1975             19.943
Green               125.339             18.6976
Blue                98.4551             18.6976

Forest Sample 2     Mean Pixel Value    Standard Deviation
Red                 87.0853             21.8368
Green               125.905             23.8361
Blue                103.376             17.2807

Forest Sample 3     Mean Pixel Value    Standard Deviation
Red                 97.484              20.715
Green               137.59              21.077
Blue                109.5               14.9791

Forest Sample 4     Mean Pixel Value    Standard Deviation
Red                 73.072              20.762
Green               112.015             22.976
Blue                96.784              16.405

Forest Sample 5     Mean Pixel Value    Standard Deviation
Red                 76.670              23.651
Green               111.181             24.107
Blue                94.424              16.962

Forest Sample 6     Mean Pixel Value    Standard Deviation
Red                 75.534              24.507
Green               113.943             27.153
Blue                97.851              17.693

Forest Sample 7     Mean Pixel Value    Standard Deviation
Red                 72.424              24.194
Green               111.688             27.628
Blue                96.596              18.1686

Table 9: Coniferous forestry sample values

An indicator for this type of land cover (as with rough pasture) is the larger value for the blue colour band when compared to the red. This was not the case for the image as a whole, where the red band produced a mean almost 10% higher than blue. Another indicator present in all the samples was an increase in disparity between the red and green colour bands: 18.2% for the image as a whole, compared to almost 30% across the samples. It is the red band which is the most valuable indicator of this type of ground cover, with a value over 25% lower than the image mean. While coniferous forestry will have been captured and indicated as a level in the OSI vector data, smaller areas of this type of ground cover will typically be present along the margins of small area polygons close to urban areas. In particular, a polygon closed by what is called a peck in the vector data layers could be analyzed for the mean red colour pixel value converted to greyscale and compared to the image as a whole. If the variation is close to 25% lower, then it is probable that this type of tree cover is present. It should be noted that further spectral analysis (involving swapping of the colour bands) can present additional indicators, which will be applied later in the study.
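
The red-band check just described might be sketched as follows; the 25% drop and the blue-above-red condition come from the figures above, while the tolerance around that figure is an assumed value.

    # Sketch of the coniferous indicator: red mean roughly 25% below the image
    # mean, with the polygon's blue mean exceeding its red mean. The 5-point
    # tolerance is an assumption, not a value from the study.
    def probable_coniferous(polygon_stats, image_stats, drop=0.25, tolerance=0.05):
        red_drop = 1.0 - (polygon_stats["red"]["mean"] / image_stats["red"]["mean"])
        blue_above_red = polygon_stats["blue"]["mean"] > polygon_stats["red"]["mean"]
        return red_drop >= (drop - tolerance) and blue_above_red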

There are several implications of being able to identify coniferous vegetation in an urban area; it indicates permeable surface area for planning/flood modelling. In the context of this thesis it allows a section within the study area to be identified and adjoining areas to be measured against it; for example, once the presence of coniferous vegetation is detected, image processing could be applied to eliminate this from the result set for the target polygon, allowing another analysis to be run on the remaining surface area.


Coniferous forestry test sample 1 (tree/shade/pasture mix)    Mean Pixel Value    Standard Deviation
Red                                                           81.824              35.235
Green                                                         118.246             34.475
Blue                                                          92.287              17.189

Coniferous forestry test sample 2 (pasture)                   Mean Pixel Value    Standard Deviation
Red                                                           122.813             9.725
Green                                                         174.274             12.991
Blue                                                          105.527             12.604

Coniferous forestry test sample 3 (bog)                       Mean Pixel Value    Standard Deviation
Red                                                           115.672             9.270
Green                                                         133.684             9.933
Blue                                                          105.059             11.881

Table 10: Coniferous forestry test sample values

Three sample areas were chosen to match the data from the coniferous sample areas against. The first of these does not conform to the vector polygons against which the proposed algorithm operates, but was chosen for its mix of ground cover so as to provide a worst possible combination against coniferous values. In other words, the distinguishing feature for coniferous forestry (outside the vector coding; this sampling was only to test the values relative to samples around the image) of high levels of standard deviation in the red and green colour bands would not be a useful comparative feature, as the sample contained a variety of ground cover. The mean and standard deviation alone did not provide strong indicators of the ground cover type, but the sample did demonstrate the usefulness of histogram data. The pixel count for the red and green colour bands displayed two clear spikes when presented as a histogram, corresponding to the expected values for both pasture and shade. This presents the possibility of determining a relative proportional (to the polygon size) pixel count flag which would indicate the percentage of land type within an area of mixed use. As was mentioned in the introduction, the basis of this study is the referencing of areas within the aerial imagery by small vector polygons of uniform ground type. Further analysis of these polygons can be done once initial categorizations have been made and the level of analysis increased. To accurately determine the correct pixel proportion to flag requires a larger sample than is being used in this study, which could be obtained from the data sets of polygons requiring further analysis returned from prolonged use of the algorithm. In other words, this is something which would be developed later in the image analysis cycle, because the nature of the sample (deliberately crossing outside the search polygons) makes it unlikely to be a feature of these types of images.
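
Detecting the two-spike signature from a 256-bucket band histogram (such as the one returned by GDAL's GetHistogram) could be sketched with simple smoothing and local-maxima counting, as below. The smoothing window and the minimum peak size are assumptions.

    # Sketch: find the greyscale values at which a band histogram peaks.
    import numpy as np

    def histogram_peaks(histogram, window=9, min_fraction=0.02):
        counts = np.convolve(np.asarray(histogram, dtype=float),
                             np.ones(window) / window, mode="same")
        threshold = min_fraction * counts.sum()
        peaks = []
        for i in range(1, len(counts) - 1):
            if counts[i] > counts[i - 1] and counts[i] >= counts[i + 1] and counts[i] >= threshold:
                peaks.append(i)          # greyscale value of a local maximum
        return peaks

    # Two peaks, one near the shade red mean (~40) and one near the pasture red
    # mean (~125), would flag a shade/pasture mix of the kind described above.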

The second sampling area involved taking a section of pasture for comparison with the coniferous samples. The values differed, with an increase of approximately 40% in the mean of the red and green colour bands and a standard deviation reduced by 50% relative to those found in coniferous forestry. This data is another useful reference in the identification of pasture, as coniferous areas are coded and outlined in the vector data and so can be automatically fed into a reference table during image analysis. As mentioned throughout this part of the study, the correct identification (and elimination) of areas of pasture from the image analysis is essential for the success of the suggested algorithm. Using spectral analysis for aerial image analysis (and remote sensing in general) is a specialized field of knowledge, and studies tend to focus on a particular study area (such as the analysis and classification undertaken by Coredo-Sancho and Adler in 2007).

The focus of this thesis is to create a generic method for image analysis which makes use of captured vector data to filter the image, reduce the study area, and narrow the range of pixel variations that need to be analyzed. In this way the study has focused on finding an algorithm that can be coded into an easy to use solution for this type of research. By necessity the current mapping methods are labour intensive, and resources are not available to capture the type of secondary data that might be gained from automatic image analysis. In addition to this, specific research (such as an inventory of impermeable surface in a region) could require specialist skills and methods; the algorithm suggested here aims to allow a user to filter through the current data by including their required target area in the process. For this reason the sampling has been generic (in that the image as a whole was not analyzed, but individual sections representative of a specific land cover type were selected).

The third sampling area was a section of bog, which was chosen because it is often found close to coniferous forestry in the Irish landscape. The ability to use coniferous forestry as a reference when analyzing the image for the presence of bog means that for most studies there will be an adjacent source of analysis on which to base the search key. The mean pixel value for the red colour band was similar to the mean pixel value for the red colour band in pasture, but the green value was notably different (to the green colour band in pasture). The green colour band value for bog was just under 20% larger than the value found in the coniferous samples, indicating that an area showing a 40% increase in the red colour band and a 20% increase in the green colour band mean values, coupled with a 50% reduction in the standard deviation, relative to the known coniferous figures has a high probability of being an area of bog. Once this variation is cross checked with other known values in the area (water, road and buildings/roofs), and cross checked with the secondary derived values identified in the algorithm (pasture etc.), areas of bog can be automatically quantified.


3.5 Mixed Forestry

Figure 8: Typical Mixed Forestry Area Image

This part of the study looks at the spectral values for areas of mixed forestry. It is not concerned with finding a set of attributes which would uniquely identify this type of land cover from an aerial photograph, but is intended to investigate whether values corresponding to this type of cover could be separated from those of rough pasture.

One reason for attempting to differentiate between areas of mixed forestry and rough pasture is the age of the surface cover. Mixed forestry generally includes sections of native woodland, which is slow growing and can be assumed to be an area capable of supporting wildlife (it is also less prone to change than rough pasture, due to the difficulty in obtaining permission to clear this type of woodland). Any study looking at the wildlife corridors across the country would benefit from an automatic method of distinguishing smaller linear sections of this type of ground cover (along hedges etc.) from other land use types. As is evidenced below by the similarity between these results and those from an analysis of rough pasture, this remains difficult to do. There is also not much scope for pattern recognition algorithms to be used in the detection of isolated sections of mixed forestry (outside those captured by conventional mapping) because of the seemingly random nature of the shade patterns. Note: an obvious solution is to fly the same areas at different times of the year and compare the red and near infrared signatures to identify the presence of natural deciduous species at the borders of area polygons, but this would be prohibitively costly and beyond the budget of any potential environmental survey. A cheaper solution might become possible in future by identifying patterns in lidar data at polygon boundaries. The focus of this study, however, is on spectral values, and although unique in comparison to the image average, the sample returned values very close to those of rough pasture.

Areas of this type of surface cover, comprising a mixture of coniferous and natural woodland, are already captured in Ireland, and the study took a section of land from one of these (present in the study area) and compared the spectral values to those of the image as a whole.

Mixed Forestry Sample    Mean Pixel Value    Standard Deviation
Red                      70.246              33.337
Green                    104.33              32.726
Blue                     85.974              19.626

Table 11: Mixed forestry sample values

As was expected, the results showed a disparity with the values for the image as a whole similar to that of rough pasture. As with rough pasture, the standard deviation in pixel values for the red and green colour bands was high, with a similarly large difference in values for those bands (37% and 23% lower respectively). This is also an indication that areas of rough pasture can contain coverage similar to mixed forestry, in that rough pasture is often overgrown and contains some tree cover. Once areas without buildings and roads display a comparative variation between the whole image and the sample polygon which matches those above (and in the rough pasture survey), it should be possible to apply the rough pasture attribute. The known mixed forestry polygon set (which is taken from the vector data coding) can then be subtracted from this to give a percentage of rough pasture for a target area.


Mixed Forestry comparative sample 1 (bog)            Mean Pixel Value    Standard Deviation
Red                                                  110.761             6.429
Green                                                121.958             8.253
Blue                                                 103.929             11.605

Mixed Forestry comparative sample 2 (pasture)        Mean Pixel Value    Standard Deviation
Red                                                  128.801             10.469
Green                                                165.843             9.496
Blue                                                 100.921             12.367

Mixed Forestry comparative sample 3 (cut pasture)    Mean Pixel Value    Standard Deviation
Red                                                  220.074             7.623
Green                                                209.855             9.249
Blue                                                 137.92              11.493

Table 12: Mixed forestry test sample values

The mixed forestry part of the sample was compared against three sample areas to add to the proportional deviations in the algorithm. The aim is to achieve a high enough level of relative values in the image to establish the composition of every polygon. The polygon extraction is made difficult by the fact that cropped small area polygons have irregular shapes, and their analysis was done against a blank background, resulting in altered standard deviation values for these samples. In the case of the three sample areas for this type of forestry, the samples were rectangular areas within the uniform types used for the comparison.

The first sample type used for comparison was an area of bog. This was chosen as it often occurs close to areas of mixed forestry and rough pasture (which by nature of the terrain have not been turned over to pasture). The values contained in the sample were similar (relatively) to coniferous forestry, but differed in the red colour band, with an almost 35% higher mean and a notably low level of standard deviation. This low level of standard deviation was found in all three of the colour bands, and could be used to differentiate between the two types of forestry sampled in this study. In relation to this, a sliding scale of standard deviation for pixels in the red colour band between bog, coniferous forestry and mixed forestry is evident: under 10 values for the area of bog, averaging 20 for areas of coniferous forestry and over 30 for areas of mixed forestry. Matching these to the mean could give a useful relative indication for automatic identification of bog present in an area. At this point it is probably useful to comment on the nature of the vector data for areas of bog. These types of areas are generally bounded by polylines (though these are not coded), and it could be the case that the same polygon contains an area of bog and rough pasture (similar to mixed forestry); being able to estimate the relative values for this transitional analysis based on the above data is useful in such cases. If an area polygon bordering one or more known areas contains pixel values close to those of mixed forestry but with a low standard deviation, it is probable that the polygon is describing this type of transitional area.
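
That sliding scale could be expressed as a simple banding of the red colour band standard deviation; the cut-off points of 15 and 27 below are assumed values chosen only to separate the three reported ranges (under 10, around 20, over 30).

    # Sketch of the red-band standard deviation "sliding scale" noted above.
    def red_stddev_class(red_stddev):
        if red_stddev < 15:
            return "bog (probable)"
        if red_stddev < 27:
            return "coniferous forestry (probable)"
        return "mixed forestry / rough pasture (probable)"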

The second type was a pasture sample, taken for comparison as it is the most likely neighbouring polygon to an area of mixed forestry and is common to most (rural/semi urban) areas in the country. This sample of pasture is separate from others used in this thesis but returned similar values (as expected). The most notable difference was in the level of standard deviation present in the sample area, with particular respect to the red and green colour bands. The mean pixel values for these colour bands also showed a distinct relative difference (over 45% and over 35% for the red and green bands respectively). This variation in values is a useful indication of which type of ground cover pixel values belong to. In particular it could serve as one of the primary steps in the algorithm. It is useful to begin the analysis by eliminating known values from the search, so that the target comparison list (ground cover types to pixel values) is smaller and has a higher chance of being successful. One further aspect that can be included in any automated analysis looking for areas of bog is the size of the polygons in the search. Most large polygons will be assigned a value based on the input vector coding. They will generally represent known parts of the image such as plots of forestry and water parcels. The remaining large polygons will (in the context of the Irish landscape being analyzed) most probably be areas of bog. This can further be refined by eliminating areas of flat rock as islands within the large polygons (these island areas are present in the vector data). In this way large polygons (with islands eliminated) matching the deviation from expected mixed forestry values outlined above can be assumed to be representative of bog. There is probably some scope for the use of pattern analysis to further refine this search. The uniform (low standard deviation) values displayed by the bog sample suggest that it may be possible, once the areas are identified by the automatic search algorithm suggested in this study, to examine them for rows and machinery tracks to automatically calculate the level of cutting taking place.

The third sample type taken for comparative analysis with mixed forestry was an area of cut pasture. This was taken as a control to ensure the previous two samples matched expected values (in terms of relative percentages) found in the other survey areas of the image. The high mean pixel values (and low level of standard deviation) matched expected results, in that both the red and green colour bands showed markedly higher values (see table above). This type of pasture will not be factored into a proportional value check against mixed forestry in the algorithm, as it can be referenced against the two stable control value sets present in water and roofs. It should be noted that this particular area type is dependent on the time of year and weather conditions prior to the time the aerial imagery was flown, and is something that must be included in a second or higher loop of the algorithm.


3.6 Track

Figure 9: Typical Track Area Image

This part of the study looks at values for track, corresponding to unpaved or gravel access roads, and takes six sample areas for a comparison of spectral values. The purpose was to see if there is a way of distinguishing between the spectral values for these types of roads and paved/tarred roads (NRA category four upwards). Roads have unique spectral values in terms of an increase in mean pixel value of close to 30% for the three colour bands. This in itself is not particularly useful to an automated image analysis using OSI vector data as a baseline, as the road network has been captured and is updated; however, if areas of paving or hard cover (similar to track) could be shown to have similar unique properties, it might be possible to detect the presence of impermeable surface in recently developed suburban areas. I am conscious that the above explanation is long winded, so the following example might explain things a bit better. A recently developed suburban area is experiencing problems with flooding and runoff; at present the mapping captures the water courses, buildings, roads and property boundaries (and street furniture/utility details etc.) but does not indicate the extent of paving and patios within the individual plots. A survey filtering out pasture using its spectral signature, and known buildings, tarred road and footpath using vector data, and comparing the remainder to expected spectral values for hard cover could return this value. The results from the six samples are in the table below:

Track sample 1      Mean Pixel Value    Standard Deviation
Red                 178.5               14.02
Green               194                 15.231
Blue                146.5               19

Track sample 2      Mean Pixel Value    Standard Deviation
Red                 164                 23.013
Green               193.833             14.729
Blue                147.5               15.149

Track sample 3      Mean Pixel Value    Standard Deviation
Red                 196.75              10.511
Green               219.875             12.240
Blue                145                 16.639

Track sample 4      Mean Pixel Value    Standard Deviation
Red                 193.8               193.8
Green               207.067             207.067
Blue                153.667             153.667

Track sample 5      Mean Pixel Value    Standard Deviation
Red                 190.833             15.014
Green               208.667             11.089
Blue                148.5               10.518

Track sample 6      Mean Pixel Value    Standard Deviation
Red                 190.5               2.738
Green               217.167             7.574
Blue                138                 5.44

Table 13: Track sample values

While the above samples outline unique pixel signatures when compared to the entire image (40% above the mean for red, 35% for green and 30% for the blue colour bands), the difference between these samples and those returned for the standard road network is not significant. This does not necessarily make it difficult to distinguish between standard tarred road and other types of impermeable surface area that might have been introduced to the landscape. The road network is present in the vector dataset, so subtracting it (along with the other known polygons such as water, buildings etc.) from the search area means that polygons with a high number of pixels corresponding to these values would indicate that the surface contains an area of cover similar to track or road (hard impermeable cover).

This unique value has potential to increase the accuracy of flood mapping and prediction, but requires additional processing to ensure that the high values are the result of impermeable surface area. This could take place within an automated software process by swapping the colour bands to increase the difference between these areas and areas of vegetation.

The purpose of sampling the areas of track was to see if there could be any means of determining whether an area surrounding a private dwelling had been paved, or if there were any paved yards/areas of hard cover present in other semi urban polygons. The test sampling took three areas to compare the values for track against: an unpaved dirt track, an area of compacted gravel yard and an area of paved yard. With the exception of the blue colour band, the results were similar to those from cut pasture. These can be discerned from cut pasture by setting the search algorithm to look at the mean pixel value for the blue colour band, which was 30% less for both the paved yard and the dirt track. The values returned for the yard of compacted dirt and gravel were similar to cut pasture but were within a small polygon containing several roofed buildings. The algorithm can therefore be set to accept values similar to cut pasture, for small area polygons containing a number of roofed buildings, as gravel/dirt hard cover. In the particular case of this sample the results obtained are most likely due to pigment in the gravel biasing the sample.


Track test sample 1 (unpaved dirt track)       Mean Pixel Value    Standard Deviation
Red                                            218.917             11.378
Green                                          225.708             13.658
Blue                                           161.583             10.1

Track test sample 2 (compacted dirt/gravel)    Mean Pixel Value    Standard Deviation
Red                                            178.315             8.713
Green                                          195.648             10.802
Blue                                           131.056             16.122

Track test sample 3 (paved yard)               Mean Pixel Value    Standard Deviation
Red                                            174.278             8.77
Green                                          200.722             6.257
Blue                                           171.778             7.075

Table 14: Track test sample values

Taking these values for a larger area would be a difficult task, but the fact that the small area polygons derived from the vector mapping cut apart the image means that greater levels of information can be derived from the same set of values in polygons with different associated coding. The initial sampling displayed values for pasture, in small area land parcels surrounding dwellings, matching those of the mean outside cut pasture. From this it can be inferred that a small area polygon surrounding a dwelling which displays spectral values similar to cut pasture could potentially be gravelled, and the algorithm would then run a specific analysis on the values for the blue colour band. For the third sample area, the paved yard, the values matched those expected for paved covering, and again these values would indicate the high probability of hard cover (patio/concrete etc.) when present in a small area polygon surrounding a dwelling.
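
A sketch of that rule for small polygons surrounding dwellings is given below. The roughly 30% drop in the blue colour band is the figure reported above; the similarity tolerance, the 0.25 blue-drop cut-off and the function name are assumptions.

    # Sketch of the hard cover rule: values close to cut pasture on red/green but
    # with a much lower blue mean suggest paving or dirt track; values close to
    # cut pasture in a polygon containing roofed buildings suggest gravel.
    def hard_cover_flag(polygon_stats, cut_pasture_stats, roofed_buildings, tolerance=0.10):
        def close(band):
            return (abs(polygon_stats[band]["mean"] - cut_pasture_stats[band]["mean"])
                    <= tolerance * cut_pasture_stats[band]["mean"])

        blue_drop = 1.0 - polygon_stats["blue"]["mean"] / cut_pasture_stats["blue"]["mean"]
        if close("red") and close("green") and blue_drop >= 0.25:
            return "paved yard / dirt track (probable)"
        if close("red") and close("green") and roofed_buildings > 0:
            return "compacted gravel yard (probable)"
        return None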

The identification of track has been the subject of a large amount of work which looked at pattern recognition software that might extract the network of roads based on pattern recognition and the unique spectral values for this type of feature (Phynn et al, 2002). In this study all roads and tracks have been captured, and the purpose of analyzing their spectral qualities is to train an algorithm to recognise the specific properties of hard ground/impermeable surface area within small area polygons. The results revealed distinctive qualities, thanks mostly to the high reflective quality of these types of surface for the red and green colour bands.

As is mentioned in other sections of this study, the initial part of the algorithm involves extracting the roads (and water, forestry, known marsh etc.) to leave a smaller number of polygons for analysis. The next step would be the removal (classification) of areas with relatively unique spectral values, such as all the pasture polygons. Following this, the urban polygons would be analyzed, having been identified by the presence of building polygons within them; these would then be classified according to the nature of the building (as this data is only available in some instances, the first iteration of the loop would include all buildings). The nature of the spectral values would then be compared to the values sampled here, as the low level of standard deviation among the colour bands associated with them enables classification with a degree of certainty.

This type of survey has particular benefit in flood mapping and can help the development of models factoring in runoff rates during times of high rainfall. The sample area used here is just outside an urban area and as such has a useful mix of all the possible land cover types, ranging from dirt track to paved yards to forestry to pasture to dwelling houses within small polygons. Specific searches of urban developments could expect to find more homogenous ranges within each polygon. The values taken from this sampling could serve as the baseline for one of these surveys. The user would then select known areas of the target land cover being analyzed (via a mobile GPS unit or selected from the vector mapping draped over the aerial photography). A combination of the standard expected values and the entered key values could then be used to calculate the percentages across a wide area. The fact that for each area analysis the pixel variations are confined to a small area polygon means that there is less chance of a gradual distortion biasing the results, as each separate polygon is calculated based on its own features (i.e. the values of neighbouring polygons and presence of buildings mentioned above).


It should be noted that the samples used in the above section of the study covered small areas relative to those used for forestry, pasture, water etc. This is because paved or hard ground was only a small part of the sample area. It is unlikely, however, that a larger sample of hard ground would have revealed any different results; firstly because the sampling was done over a relatively wide geographical area, and secondly because large expanses of paved area are rare enough to be considered an anomaly in the Irish landscape, which could in any case be flagged by the search algorithm (by setting a maximum expected area for values matching hard cover).


3.7 Shade

Figure 10: Typical Shade Area Image

The purpose of this part of the study is to see if there are any unique spectral qualities in areas of shade which would allow them to be eliminated from an examination of a given polygon. There are a number of ways to prevent shade from distorting the results of a spectral analysis of these types of polygon. The photography and vector data could be imported into a geographic information system capable of manipulating vector data. A number of control points could be taken matching the edge of areas of shade to vertices in the vector data. The vector dataset could then be transformed/moved to match the control points, and the offset calculated and eliminated from the original polygons so that a subsequent spectral analysis would focus on areas outside shade. While this would probably be the most accurate method, it is difficult to automate, as selection of control points requires human input (a random selection would skew results, and there is too much variety in coding and polyline length and shape to set rules).

A second method might be to identify unique spectral values for shade and introduce a process to subtract them from the polygon pixel set results, to leave only values which can be identified. This is what is being attempted here, and the results below reveal a similar signature to that of water in the image. Since all water bodies have been captured, it could be possible to subtract these lower pixel values from a polygon and analyze the remainder.


Shade Sample 1      Mean Pixel Value    Standard Deviation
Red                 33.9264             5.428
Green               71.9221             7.446
Blue                80.2165             14.322

Shade Sample 2      Mean Pixel Value    Standard Deviation
Red                 41                  5.577
Green               81.361              8.077
Blue                78.381              11.5439

Shade Sample 3      Mean Pixel Value    Standard Deviation
Red                 42.813              6.943
Green               83.186              7.878
Blue                84.505              12.848

Shade Sample 4      Mean Pixel Value    Standard Deviation
Red                 45.333              15
Green               85.854              17.921
Blue                85.75               17.118

Table 15: Shade sample values

This is something which has been attempted in previous studies (Martin et al, 1998); however, as with the thesis in general, I believe a hybrid method incorporating aspects of vector mapping and spectral analysis produces the best results. This is because in certain situations removing shade based on pixel values alone might bias the analysis (mixed forestry could produce a signature similar to marsh once lower pixel values are removed). A better method might be to manually take four or five samples per square kilometre and determine a mean percentage of shade based on areas of shade within those samples; this can be done relatively quickly using any vector manipulation software, with the sample areas recorded for reference. Any further automated analysis would not eliminate pixels matching the shade signature above the sample percentages.

Note: in future the addition of lidar data might allow shade to be calculated from the height of the irregular tree canopy/roof pitches etc. and incorporated into the automated image analysis algorithm. This data is not currently available for all areas and has not been included in the study. It should be noted that its addition has the potential to improve the accuracy of this type of study.

A couple of sample sections of the image were selected with varying degrees of shade present. This was to determine if it is possible to introduce a step into the algorithm which would correct for areas of shade in otherwise uniform closed polygons. The first sample involved selecting an area of pasture which contained a large amount of shaded ground from some high trees along one of the border vector boundaries, i.e. along the hedge. The area of shade corresponded to roughly half the area of the test polygon. It should be noted that a degree of shade will be present in all of the orthophotos to which this study applies; this is because aerial survey by necessity takes place in bright sunshine. The sample was analyzed for mean pixel values along the colour bands, and also for the standard deviation displayed.

Shade test sample 1 (pasture with approx. 50% shade)    Mean Pixel Value    Standard Deviation
Red                                                     88.457              37.387
Green                                                   135.151             40.01
Blue                                                    100.616             20.386

Table 16: Shade test sample value 1

The results for this were as might be expected, with an increase in the standard deviation for the red and green colour bands reflecting the variety of tone present in the sample. This, however, does not give a complete representation of the data, and a histogram (see below) reveals peaks for the shade value and for the pasture value in both the red and green colour bands. These two peaks indicate that the area is pasture, and that the high pixel count for the lower values in the red and green colour bands is a result of shade.

Figure 11: Histogram for Shade and Pasture

There are two factors for inclusion in the automated search algorithm that can be taken from this. Firstly, a reading of high levels of standard deviation from the mean value for the red and green colour bands for a given polygon is a strong indication that the analysis might be distorted by shade. Secondly, if a histogram of the pixel count for the red and green colour bands shows a peak at the mean for shade and at the mean for pasture, then the area is pasture (given that the other probable cause, water, has already been identified through vector mapping).

The algorithm could account for this by taking a sample of neighbouring polygons; if these contain areas of pasture then the standard deviation is flagged. If the standard deviation is above 30 values on the greyscale then the polygon is flagged and a histogram appended for further processing. The resulting set of these types of polygons would be returned with the result set from the analysis, so that the user could accept or reject the anomaly as the result of shade (in terms of large amounts of standard deviation for the red and green colour bands in a small area polygon).

The second test sample for shade took an area of pasture with a smaller percentage of shade present for comparison with the areas of shade. The values for the sample matched those expected for pasture, with the exception of a higher level of standard deviation in the red and green colour bands. These could again be flagged once a level of standard deviation above that expected for pasture, together with the trends of bordering polygons mentioned above, was detected, and included in the flagged polygon set, where the histogram could be analyzed.

Shade test sample 2 (pasture containing small areas of shade)    Mean Pixel Value    Standard Deviation
Red                                                              136.056             23.905
Green                                                            179.235             25.065
Blue                                                             114.285             14.684

Table 17: Shade test sample value 2

Although the mean pixel values matched those expected for pasture across the three colour bands, the high level of standard deviation suggests further examination could be necessary. A histogram representation shows the relatively higher pixel count across the values associated with shade but displays a clear peak for pasture. From the evidence of these samples it would seem safe to set the tolerance for standard deviation in both the red and green colour bands to a figure between 25 and 30, where any polygons exceeding this (even if they are bordering pasture polygons) would be flagged and exported to an examination set together with appended histograms.
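
That flagging step could be sketched as below. The tolerance of 28 sits inside the 25 to 30 range suggested above and is chosen purely for illustration; the structure of the returned record and the function name are assumptions.

    # Sketch: flag polygons whose red or green standard deviation exceeds the
    # tolerance and attach their band histograms for manual examination.
    def flag_for_shade(polygon_stats, histograms, tolerance=28.0):
        flagged = (polygon_stats["red"]["stddev"] > tolerance or
                   polygon_stats["green"]["stddev"] > tolerance)
        if not flagged:
            return None
        return {"reason": "possible shade distortion",
                "red_histogram": histograms["red"],
                "green_histogram": histograms["green"]}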

The final part of the testing took a sample from relatively clear pasture (little or no shade present) to see if the trend identified in the first two samples would continue. In other words, if a large amount of shade results in a high pixel count and peak at the values for shade, and a smaller amount less so, then an area with little or no shade present should show a single peak. While this produced the expected results, it was necessary to confirm the trend.


Shade test sample 3 (pasture containing little shade)

         Mean pixel value   Standard deviation
Red      123.828            8.645
Green    177.68             9.8
Blue     105.261            12.69

Table 18: Shade test sample value 3

The sampling for shade highlighted the need for any automatic aerial image algorithm to account for levels of standard deviation, and also the necessity to detect peaks of values within the colour bands. The two peaks for lower and higher values in the second test sample demonstrated the usefulness of categorizing the images according to spectral values, and are a good example of how, once the imagery is broken into its constituent area polygons using vector mapping, useful data can be derived. The properties of shade in the sample imagery were uniform, allowing the distortion that shade causes to the spectral values of the target polygons to be accounted for in image processing. The results of this part of the study go some way towards proving the central premise of this thesis: it is possible to automatically scan aerial imagery using good quality vector data.

The process of eliminating known areas and flagging borderline values (such as<br />

the distortion in the first test sample) means that the land cover can be examined<br />

through a series of scans until all the surface area is accounted for. This is a<br />

process which could then be adapted for specific projects (flood mapping etc.).<br />

The software process outlined here would act as a template for these projects and<br />

allow the users to input specific search values themselves without relying on<br />

obtaining the data from previous studies.<br />



3.8 Roof Areas<br />

Figure 12: Typical Roof Value Area Image<br />

This study involved taking a sample of roof imagery from a test area and comparing the luminance values to those of the entire image, to determine whether a distinguishing deviation in values existed for these features of the image. The initial part of the experiment involved taking a sample number of roof values from the orthophotography and converting them to a format suitable for individual examination. The study involved a section of rural landscape in south county Galway (Ordnance Survey sheet no. 3012-c).

The aim of this part of the thesis is to obtain one of a series of benchmark control values which can then be applied to the area polygons when determining the relative deviation of pixel values. In other words, this section of the study is an attempt to obtain a unique indicator for roof pixel values to form part of a key for interpreting statistical values from the polygon being processed. It involved the use of three image analysis software packages, but can be achieved using the GDAL library alone (using gdal_translate).
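For example, a clip of this kind could be scripted with GDAL's Python bindings; the sketch below is illustrative only, and the file name and projWin coordinates are placeholders rather than values from the study.

# Sketch: clip a rectangular window around a roof polygon from the orthophoto.
# The file names and projWin coordinates are placeholders.
from osgeo import gdal

gdal.UseExceptions()
src = gdal.Open("orthophoto_3012_c.tif")
# projWin is [ulx, uly, lrx, lry] in the image's georeferenced units.
gdal.Translate("roof_sample_01.tif", src,
               projWin=[531200.0, 721800.0, 531260.0, 721750.0])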

The first step involved editing the image (the aerial orthophoto corresponding to the c quadrant of OSI sheet no. 3012) using OpenEV. This involved targeting the areas of interest in the imagery based on building polygons captured by OSI vector data.



The vector data was overlaid on the image and the target areas exported using the GDAL export tool to create ten separate .tiff files containing roof values. The entire image was then loaded into the Geomatica software and statistical data for the red, green and blue primary values (converted to greyscale) obtained. This gave a mean value for each band within the entire image, which contained a range of features including forestry, pasture and several water bodies. The image was not enhanced and no post-processing was applied, in order to obtain standard baseline values against which future iterations of the study could be measured.

It should be noted that the sample area contained 76 buildings and this study<br />

concentrated on the south east of the image (which contained the largest number<br />

of buildings). For each building polygon there was a clear division between<br />

sections of shade and light due to the roof pitch. While there is little value in<br />

pursuing this unique aspect of the image features (as buildings are already being<br />

well captured for mapping purposes by traditional photogrammetry and field<br />

survey), this might be useful for automatic key generation. By this I mean that<br />

pattern recognition software could be applied to separate these two areas and the<br />

resultant variations analyzed to determine the roof pitch; this is, however, beyond<br />

the scope of this study.<br />

Below is a screenshot of the buildings present in the image (to give an impression of their dispersal in the study area):



Figure 13: Distribution of Buildings/Roofs in the Sample<br />

The ten sample roof values were all taken from the concentration of buildings in<br />

the south east, and the mean values for the three primary values (when converted<br />

to greyscale) were as follows:<br />

Roof sample    Red mean    Green mean    Blue mean
Sample 1       102.683     130.923       120.673
Sample 2       146.152     168.429       161.457
Sample 3       51.0667     79.2          85.5176
Sample 4       145.414     165.429       161.457
Sample 5       95.2596     123.74        113.106
Sample 6       92.416      122.633       118.177
Sample 7       220.818     194           142.896
Sample 8       133.528     124.556       116.528
Sample 9       119.315     148.351       142.967
Sample 10      173.111     187.506       149.494

Table 19: Roof pixel sample values (mean pixel value per colour band)


The mean greyscale pixel values for the entire image were 111.711 for the red channel, 136.542 for the green and 102.636 for the blue. This meant an average deviation of 22% in the blue channel for areas of roof (an average mean of 130.776 across the roof samples compared to 102.636 for the entire image). In terms of automated image analysis this variation could help develop a control by training a key against the known building values.

Figure 14: Blue colour band pixel count for study area<br />

Histogram of mean blue channel pixel values for the c quadrant of Ordnance Survey sheet no. 3012. Building polygons in this area were found to vary by 20%, indicating that this variation has the potential to form part of a training algorithm used in the development of a key for automated image analysis.

It should be noted that image quality and the degree of shade and light can vary<br />

according to the environmental variations present when the image was captured.<br />

This study is not concerned with determining an exact value for the capture of<br />

buildings but is instead looking for variations from the mean values in the entire<br />

image. The aim is to gather enough unique deviations (such as the variance found<br />

in the blue channel above) to narrow the remaining values into categories.<br />



An iterative run through a practical application might be (a code sketch follows the list):

• Obtain the mean for unique known features (e.g. the blue channel in<br />

buildings).<br />

• Subtract these known features from the area polygon.<br />

• Obtain the mean values for the remaining pixels.<br />

• Compare these values relative to the known features and determine the<br />

most probable land coverage.<br />
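A minimal sketch of that loop is given below (Python); the polygon fields, reference structure and tolerance are illustrative assumptions, not the data model used in the study.

# Sketch of the iterative elimination described above. `polygons` is assumed to be
# a list of dicts holding per-band means plus any land cover already known from
# the vector data (e.g. buildings), which is removed from the search first.
import numpy as np

def classify_remaining(polygons, reference_means, tolerance=0.15):
    """Label unknown polygons by comparing their band means to reference means
    (mean relative difference within `tolerance`)."""
    unknown = [p for p in polygons if p.get("known_cover") is None]
    for poly in unknown:
        means = np.array([poly["red_mean"], poly["green_mean"], poly["blue_mean"]])
        best, best_score = "unclassified", float("inf")
        for cover, ref in reference_means.items():
            score = float(np.mean(np.abs(means - np.array(ref)) / np.array(ref)))
            if score < best_score:
                best, best_score = cover, score
        poly["probable_cover"] = best if best_score <= tolerance else "unclassified"
    return unknown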

Following this initial sampling of roof values, it emerged that the values were not consistent enough to serve as a reference for determining adjacent land use types (though the fact that the polygons are coded vector polygons means they can be excluded from pixel value searches of small area polygons). One aspect of the pixel values which did emerge, and which has potential for further analysis, was the clear division and the pixel count peaks for roofs with a pitch. Taken further, and matched to the set of proportional values that this study sets out to establish, these values could allow roof pitch to be determined. This would necessitate a separate study (over a large number of buildings across a wide geographical area) and will not form part of this study, although the algorithm outlined in the thesis would be extremely beneficial for such work.

The sampling for roof values in proportional reference to adjacent areas took three samples of separate land cover types and the roof polygons adjacent to them. The three areas analyzed were: a section of pasture close to a farm dwelling, a section of mixed forest close to another dwelling (with several pitches in the roof), and a section of cut pasture close to a flat-roofed shed. As mentioned above, the sampling did not produce any useful data for the algorithm (except to eliminate the possibility of buildings being used for spectral reference). It did, however, outline the potential for a separate building analysis based on roof pitches. The following few paragraphs outline the results of the comparison.



Roof test sample 1
         Mean pixel value   Standard deviation
Red      171.121            25.028
Green    202.818            22.936
Blue     178.773            17.709

Roof test adjacent sample 1 (pasture)
         Mean pixel value   Standard deviation
Red      113.004            7.169
Green    156.632            8.779
Blue     99.867             11.634

Table 20: Roof test sample value 1

The first sample took an area of pasture close to a dwelling, where the pasture gave the expected values for this type of land cover (as outlined in the pasture sampling in this section of the thesis). The roof sample showed a relatively high level of standard deviation across the colour bands (although low by roof value standards), and the mean colour values returned, although high for a small area, could not be used to uniquely identify other features. The double peak (of shade and light) in the red and green colour bands indicated that the roof had a high pitch. One problem which makes using this type of surface difficult is the degree of reflection that can take place from the relatively smooth surface of the roof covering, further distorting spectral values.

Roof test sample 2
         Mean pixel value   Standard deviation
Red      121.394            42.959
Green    144.111            41.557
Blue     128.182            32.159

Roof test adjacent sample 2 (mixed forestry)
         Mean pixel value   Standard deviation
Red      70.209             34.505
Green    104.58             33.996
Blue     86.887             20.602

Table 21: Roof test sample value 2



The second sample took an area of mixed forestry and a dwelling adjacent to it; the dwelling had multiple pitches due to dormer windows. As with the first sample, the values of the area sampled adjacent to the roof were consistent with values calculated from samples of this land type (mixed forestry) elsewhere in this thesis. The roof sample itself had a high level of standard deviation (revealing a high level of reflection and shade) and a variation in mean colour band values from other roof samples consistent with the initial roof survey. From this it can be concluded that roof values are not a reliable indicator of forestry values and cannot be reliably used as a reference.

Roof test sample 3
         Mean pixel value   Standard deviation
Red      213.815            41.112
Green    223.148            34.272
Blue     208.148            42.432

Roof test adjacent sample 3 (cut pasture)
         Mean pixel value   Standard deviation
Red      205.074            7.903
Green    206.539            8.182
Blue     134.537            9.784

Table 22: Roof test sample value 3

The third set of samples, for determining the potential of using roof spectral values as a tool for validating adjacent area values, used a section of cut pasture and a flat-roofed shed. The values for the shed were high (with a correspondingly high level of standard deviation), indicating the level of reflection on the roof surface. By comparison, the standard deviation across all three colour bands was low in the cut pasture, which is consistent with values identified elsewhere in this thesis. The mean values of the two samples were similar and this, together with the fact that buildings returned inconsistent spectral values throughout the study, suggests there is little to be gained by examining the relationship between the spectral values of roofs and adjacent polygons. This is something of a surprise, as at the outset of the research I had expected the relatively uniform nature of the shades and texture of roofing used in Ireland to provide an ideal reference for spectral analysis of aerial imagery.


3.9 Pasture<br />

Figure 15: Typical Pasture Area Image<br />

This part of the study looks at spectral values for areas of pasture within a section of aerial photography corresponding to 3 sq km of semi-rural county Galway. The aim is to establish baseline data for the relatively stable spectral values returned from this type of land cover. The polygons in question correspond to field areas bounded by walls or fences (and occasionally roads and water bodies). They can be described as stable in terms of their usage; they will almost always be turned over to one particular use, in other words they will not contain large areas of separate land cover within them (other than islands of forestry or water, which are already captured and can be eliminated from the image processing). The study looked at nine separate areas, two of which were freshly cut. The values returned were uniform, with the variation in the two freshly cut fields remaining consistent. Another feature which emerged, across all the samples, was the low level of deviation for values in the red colour band, approximately one third of that returned from the image as a whole.

Pasture sample   Red mean   Red SD    Green mean   Green SD   Blue mean   Blue SD
Sample 1         126.131    8.538     173.848      9.45       109.745     12.646
Sample 2         139.318    6.267     182.748      8.016      115.605     12.547
Sample 3         133.267    8.871     173.4        10.926     108.128     11.748
Sample 4         198.363    9.629     207.759      10.881     134.204     11.83
Sample 5         197.046    14.712    207.816      13.067     135.054     14.474
Sample 6         110.019    6.96      163.79       8.215      100.187     11.92
Sample 7         126.238    9.459     168.753      9.56       100.17      12.073
Sample 8         117.301    10.329    169.254      11.042     104.371     12.349
Sample 9         126.357    8.542     183.48       9.611      110.941     12.799

Table 23: Pasture sample values (mean pixel value and standard deviation per colour band)


Of the above samples, samples 4 and 5 were of freshly cut fields, which is reflected in the high mean pixel value for the red colour band (almost 50% higher than the average for the image as a whole). This gives areas of freshly cut pasture a unique trait: a high red colour band mean and a low standard deviation for a given polygon. The trait also distinguishes cut pasture from pasture in general, as the red colour band mean values are over 35% higher. The mean green colour band pixel value for pasture is 25% higher than that of the image as a whole and, when matched to a low standard deviation, gives another strong indication of the type of land use.

The above data is useful for indicating the presence of earthworks and landscaped areas within a small polygon. With a key calibrated to identify polygons containing the above values it should be possible to accurately calculate the percentage of land area given over to pasture, and also to improve the accuracy of any process run on the imagery (in conjunction with vector data) by reducing the number of polygons for analysis.
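A hedged sketch of such a pasture key is shown below (Python); the numeric thresholds are drawn loosely from the sample means and standard deviations above and are indicative only, as are the polygon field names.

# Sketch: label polygons as pasture or cut pasture using band means and the low
# standard deviation observed above, then report the percentage of pasture cover.
def pasture_class(p):
    low_sd = p["red_sd"] < 15 and p["green_sd"] < 15
    if low_sd and p["red_mean"] > 180 and p["green_mean"] > 195:
        return "cut pasture"
    if low_sd and 105 <= p["red_mean"] <= 145 and 160 <= p["green_mean"] <= 190:
        return "pasture"
    return None

def pasture_percentage(polygons):
    """Percentage of total polygon area classified as pasture or cut pasture."""
    total = sum(p["area_m2"] for p in polygons)
    pasture = sum(p["area_m2"] for p in polygons if pasture_class(p))
    return 100.0 * pasture / total if total else 0.0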

Pasture also plays an important part in the algorithm this thesis is attempting to map out, for two reasons. Firstly, its uniform properties (in terms of the standard deviation of the pixels from the mean values, and also the relative proportional difference between those mean values and the mean values for other land coverage types) allow it to form the basis of keys generated at the beginning of image processing with which to identify land use. Secondly, in an Irish context it is by far the biggest form of land use, and any algorithm processing aerial photography (even in semi-urban areas) will have successfully identified the majority of the surface area by correctly flagging area polygons of pasture. Of the remaining areas, most can be identified by polyline coding from the vector data, leaving the study with a higher chance of success in obtaining useful information about what remains. It is useful to divide pasture into the two categories identified in the samples above, that is, to create a separate category of cut pasture for the purposes of executing a comparative survey of polygons in the aerial photography. With this in mind, the following test samples were compared against the spectral values of both.


Pasture test sample 1 (track)
         Mean pixel value   Standard deviation
Red      213.952            18.444
Green    236.762            16.532
Blue     194.833            19

Pasture test sample 2 (coniferous forestry)
         Mean pixel value   Standard deviation
Red      72.566             19.204
Green    111.643            21.468
Blue     96.381             16.026

Pasture test sample 3 (mixed forestry)
         Mean pixel value   Standard deviation
Red      69.968             32.486
Green    104.287            32.054
Blue     86.217             19.492

Table 24: Pasture test sample values

The above samples were chosen from areas outside those already captured for the baseline survey of the spectral values for track, coniferous and mixed forestry. The track was chosen because hard cover gives uniquely high values and is of benefit in calibrating a proportional difference between target areas (such as pasture in this case) and those high values. As noted above, pasture itself is also useful in calibrating a key for an algorithm, owing to its specific mean values in the red colour band and its low level of standard deviation from those means, but it is not coded in the original vector input data and can only be derived from the image processing. The area of mixed forestry was chosen for its high level of standard deviation and the fact that it has similar spectral values to rough pasture. Mixed forestry will be identified and present in most imagery (or its specific properties could be pre-set into a key for the automated analysis), so using those values would allow an automated processing technique to eliminate areas with this high level of standard deviation from the process.



For the track/hard cover, the mean greyscale pixel value in the red colour band was significantly higher than for pasture. This was also the case for the mean of the green colour band, where the variation was upwards of 40% for pasture which had not been cut. It should be noted, however, that the mean pixel value in the green colour band was higher for cut pasture, and this variation brought it closer to the value of hard ground. The mean of the blue colour band for the test sample was consistently 40% higher than for all the samples of pasture (including the cut pasture samples). This differential (in conjunction with a low standard deviation and a mean 10% to 40% lower than hard ground in the red and green colour bands) could be used to determine the presence of pasture.

The two forestry samples showed a mean pixel value in the red colour band approximately 50% lower than pasture, with an increased standard deviation for both. In terms of distinguishing between coniferous and mixed forestry, the standard deviation was one third higher for the red and green colour bands in the mixed forestry. The mean value for the green colour band in both forestry types was 40% lower than the mean for pasture, giving another relative indicator from which to calibrate pasture values. Small areas of forestry are found close to most semi-urban areas, and the fact that they are coded and measured means they are useful in training an algorithm to match land use to neighbouring polygons.

The aim of this sampling is to try to achieve a number of probability factors that will allow an automated process to determine land use or coverage types based on the available data. With this in mind a broad variety of samples were taken and the relative proportions between them assessed. In the above example the pasture area was compared against other known area types. The increased amount of cross-checking allows the elimination of areas which conform to a particular type. In this way the image is gradually broken down until every polygon is referenced. The value of this is not necessarily in the ability to classify field use (although this has some merit in that it adds value to available mapping) but in the identified process. This enables a piece of software to be developed which can be reused according to a user's requirements. The process would remain constant (known values eliminated until the user is left with a set of polygons not conforming to known values and matching a set of target values identified by the user). The system could accept a set of co-ordinates recorded on a mobile GPS device, or alternatively allow the user to view Ordnance Survey vector mapping draped over the orthophoto and manually enter the location of sample target values. In this way the process would allow the user to quickly establish the initial conditions for an automated search of aerial photography and return the spatial extent of the values input.



3.10 Rough Pasture<br />

Figure 16: Typical Rough Pasture Area Image<br />

This part of the study focuses on areas of land cover known as rough pasture. These correspond to areas where scrub, mixed forestry or overgrown vegetation are present. In general terms the category covers areas of poor pasture, but rough pasture can also be found close to urban areas where the land is not in use. It is similar to mixed forestry but slightly lighter than forested areas (and returned spectral values accordingly, by approximately 75 across all colour bands). The values for this type of coverage could be particularly useful in determining whether the target polygon is in use or not. The results from the sample areas returned a high level of deviation from the mean value in both the red and green colour bands; this would not be expected where the land was in use. If the polygon being examined was not part of an urban development (comprising a mixture of artificial and landscaped surface cover) and was in use, then a relatively low level of deviation from the mean value would result from human activity. For example, the polygons used for pasture in this survey returned a deviation of almost one quarter of what was found here.



Rough pasture sample   Red mean   Red SD    Green mean   Green SD   Blue mean   Blue SD
Sample 1               81.61      31.475    120.334      31.922     94.606      17.356
Sample 2               73.432     34.752    114.44       36.594     95.898      18.988
Sample 3               69.694     35.687    111.165      37.813     95.857      18.771
Sample 4               66.43      27.096    105.61       28.348     94.081      17.243

Table 25: Rough pasture sample values (mean pixel value and standard deviation per colour band)

As mentioned above, it is possible to obtain values similar to rough pasture in a spectral analysis of mixed forestry or certain urban developments. This, however, does not present a problem for the potential image key, as both of those areas can be eliminated from an automatic survey: mixed forestry and areas containing buildings and dwellings can be identified by previously captured attributes. Relative to the spectral values for the image as a whole, the red band was 35% lower, the green 17% lower and the blue 7% lower across the samples. While this alone does not give a complete indication of rough pasture, when coupled with high levels of standard deviation in the red and green colour bands the probability of this type of land cover is high. With an additional vetting process (eliminating known land cover types), it should be possible to identify rough pasture using lower red and green colour band values with high standard deviation.
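The sketch below illustrates that combined rule (Python); the image-wide band means would be supplied from the baseline statistics, the percentage thresholds simply restate the figures quoted above, and the field names are illustrative assumptions.

# Sketch: rough pasture candidate test using band means relative to the whole image
# plus high red/green standard deviation, applied after known cover is eliminated.
def is_rough_pasture_candidate(poly, image_means, sd_threshold=27.0):
    """`image_means` holds whole-image band means; thresholds follow the quoted
    percentages (red ~35%, green ~17%, blue ~7% below the image mean)."""
    if poly.get("known_cover") is not None:  # already coded (forestry, buildings, ...)
        return False
    red_low = poly["red_mean"] < 0.65 * image_means["red"]
    green_low = poly["green_mean"] < 0.83 * image_means["green"]
    blue_low = poly["blue_mean"] < 0.93 * image_means["blue"]
    noisy = poly["red_sd"] > sd_threshold and poly["green_sd"] > sd_threshold
    return red_low and green_low and blue_low and noisy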



The identification of rough pasture using an automated process would also lend itself to the study of abandoned developments. It could reasonably be assumed that the presence of rough pasture within more than half the plots of land in a development indicates that it has been abandoned. This might allow a quick survey of developments to determine their state of repair, by feeding the associated polygon set into a process designed to identify the percentage of rough pasture. In particular, pixel values in the red colour band similar to those above would indicate that a site has been abandoned. On an anecdotal level, the rate at which former construction sites can be reclaimed by vegetation has been impressive over the last few years of good growing weather. Although there are several other ways to identify incomplete dwellings, such as missing access roadways and Ordnance Survey field revision text, the presence of spectral values corresponding to rough pasture would indicate long-term neglect.
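Building on the candidate test sketched earlier, that survey step might look as follows; the 50% cut-off simply restates the assumption above, and the data structures are illustrative.

# Sketch: flag a development as probably abandoned when more than half of its plots
# are classified as rough pasture by the candidate test from the earlier sketch.
def probably_abandoned(development_plots, image_means):
    if not development_plots:
        return False
    rough = sum(1 for plot in development_plots
                if is_rough_pasture_candidate(plot, image_means))
    return rough / len(development_plots) > 0.5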

Because of the relatively high level of standard deviation associated with the sampling for rough pasture, it is probably one of the more important cover types for automated image processing to identify: once these areas are removed from the search, remaining areas displaying similar levels of deviation in the red and green colour bands can be analyzed more closely (possibly even with flagged histograms for users to examine). In order to place the values discovered relative to the rest of the image, three separate areas of cover were sampled.



Rough pasture testing sample 1 (pasture)
         Mean pixel value   Standard deviation
Red      131.413            10.813
Green    167.157            9.494
Blue     102.256            12.591

Rough pasture testing sample 2 (water)
         Mean pixel value   Standard deviation
Red      35.191             4.374
Green    68.865             6.877
Blue     83.6               12.964

Rough pasture testing sample 3 (road)
         Mean pixel value   Standard deviation
Red      218.867            7.405
Green    240.2              9.321
Blue     197.533            10.802

Table 26: Rough pasture test sample values

The first testing sample for comparison to rough pasture was pasture; this contrasts well with rough pasture and is a useful benchmark in the algorithm. Like the similar (in terms of pixel values) area of mixed forestry, rough pasture has high levels of standard deviation around a relatively low mean pixel value for the red colour band (approximately 70 on the converted greyscale). This contrasts well with the mean red pixel value expected for pasture (almost double), something which was borne out in the pasture sample. The level of standard deviation in pasture is similarly low (approximately one third of the value found in rough pasture for the red and green colour bands). The fact that pasture is often adjacent to rough pasture also makes this a very useful comparative measurement. In terms of the vector spatial data this is also useful, as the same boundary polyline will flag both areas and could be used to refine the algorithm. This helps automated image interpretation because it reduces the possible value set, allowing a reduced set of values to be applied over the first iteration of the analysis. In this way an initial analysis by polygon can apply the spectral values to a smaller subset of possible neighbouring polygons and save time, opening up the possibility for the software cycle being suggested here to present an early estimate of the type of surface coverage of which the image is composed.

One other factor that could be included in a search for rough pasture is the<br />

presence of symbols within polygons of this type in the vector mapping. Initial<br />

steps within the algorithm could extract all polygons and all indication symbols<br />

(the polylines surrounding known areas of rough pasture are not coded) and retain<br />

the set of polygons where they intersect. These could then be discounted from the<br />

automated image analysis.<br />
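A hedged sketch of that pre-filtering step is given below (Python with the Shapely library); reading the polygons and symbol locations from the vector data is assumed to have happened elsewhere, and the structures are illustrative.

# Sketch: retain polygons containing a rough-pasture indication symbol so that they
# can be discounted from the spectral search.
from shapely.geometry import Point, Polygon

def polygons_with_symbols(polygon_rings, symbol_points):
    """`polygon_rings` maps a polygon id to its coordinate ring;
    `symbol_points` is a list of (x, y) symbol locations from the vector mapping."""
    flagged = set()
    for poly_id, ring in polygon_rings.items():
        poly = Polygon(ring)
        if any(poly.contains(Point(x, y)) for x, y in symbol_points):
            flagged.add(poly_id)
    return flagged  # these polygon ids are excluded from the automated analysis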

The second sample area was an area of water. This was taken from a lake present on the plan because, as was indicated earlier in the study, only water polygons above the level of a stream (wider than 3 m) return a pixel sample useful for reference. The values obtained matched those identified in the water area sampling in this thesis and contrasted with the values for rough pasture (less than 50% of the red and 50% of the green colour band values of rough pasture). The level of standard deviation also varied greatly, with values almost six times higher in rough pasture. This makes water a good benchmark against which to rate the probability of rough pasture, using the percentage increase in the values for the red and green colour bands as a reference.

The third value sampled was a section of road. This sample is unique to this part of the study, so as to avoid overlap with road samples taken elsewhere and ensure the values remain consistent for the feature type. Road was selected for sampling because, as with water, it has relatively unique spectral properties with a low level of standard deviation. This allows it to be used as a stable contrast to the values found in rough pasture (which are similar to some types of forestry and contain a high level of standard deviation in all three colour bands). For the red colour band the road returned a much higher mean value than rough pasture (roughly three times the sample means above), with a standard deviation of less than a third of the rough pasture value. The green colour band also showed a clear difference, returning a value over twice that found in the rough pasture samples and, as with the red colour band, a level of standard deviation less than a third of that of rough pasture. The blue colour band also returned a distinct difference, with a mean value for road over twice that of rough pasture.

The last two sample areas gave a clear indicator for use in the identification of rough pasture. When these are combined with the vector attributes (symbol search and nearest-neighbour probability), the algorithm is equipped with a robust means of ensuring that as many areas of this type of land cover as possible are identified and removed from the search. In other words, if a user of the proposed image search process was seeking to identify a crop type (or disease within that crop type, etc.) they could easily set the process to discount rough pasture from the analysis.



4 Testing<br />

This chapter describes a set of tests for known polygon types in the imagery being used in the study. Four land use types were selected (pasture, rough pasture, marsh and bog) and the polygons were clipped from the raster imagery for spectral analysis. The process followed the outline presented in Chapter 2, with the vector coordinates being identified from the Ordnance Survey data. This set of coordinates was then used to extract the corresponding section of image, which in turn was analyzed for its spectral values. The overall results were good, in that unique proportional pixel count values were present in the polygons (selected from across the image, with typical biasing factors such as shade present). Of the four sample areas, bog and pasture had the least standard deviation; it was possible to automatically classify pasture even with a high proportion of shade present in the image section. Marsh and rough pasture had similar levels of standard deviation but can be distinguished by a dip in values between the shade and vegetation ranges in rough pasture, which was not present in the marsh polygons. This chapter is divided into four sections; as with the previous chapter the results are grouped according to a particular area of interest, starting with a look at the values from the known pasture polygons.



4.1 Pasture Test<br />

Four areas of pasture were sampled, with varying degrees of shade and of internal ground cover other than pasture that is not bounded in the vector data (exposed rock). The first of these was an area in the south west of the sample area and comprised a polygon surrounded by five fence vectors close to the road. The sample was chosen to see whether the expected values for the mean and standard deviation across the colour bands would be reflected in a section of image with a relatively high degree of variety in terms of ground cover.

The data was sampled using the Radius software, where the polyline values were exported to an ASCII file. This process will be easier once GML format spatial data becomes available (not available at the time of writing, but expected over the next few years, OSI 2010). The process has two requirements in order for the polygon to be correctly sampled (a validation sketch is given after the list):
• Firstly, the line orientation needs to be the same for all polylines which bound the sample polygon; by convention this is anticlockwise (the left of the line direction falling on the inside of the area to be extracted).
• Secondly, the first and last co-ordinates must match, with no other duplicate values present in the file.
These qualifiers can be validated at the time of extraction once GML data is available; for this study, however, a text editor was used to search for duplicate values.
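A minimal sketch of those checks is shown below (the anticlockwise convention is tested via the signed area); it assumes the ASCII file lists one easting,northing pair per line, which is an illustrative assumption rather than the exact Radius export format.

# Sketch: validate an exported polygon co-ordinate file before it is used for clipping.
# Assumes one "easting,northing" pair per line; that format is an assumption.
def validate_polygon_file(path):
    with open(path) as f:
        coords = [tuple(float(v) for v in line.split(",")) for line in f if line.strip()]
    if coords[0] != coords[-1]:
        raise ValueError("first and last co-ordinates do not match (ring not closed)")
    interior = coords[:-1]
    if len(set(interior)) != len(interior):
        raise ValueError("duplicate co-ordinates present in the file")
    # Shoelace formula: a positive signed area means the ring runs anticlockwise.
    area2 = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(coords[:-1], coords[1:]))
    if area2 <= 0:
        raise ValueError("polyline orientation is not anticlockwise")
    return coords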



Figure 17: Creating the ASCII file<br />

Once the ASCII file was created, it was used to extract the region of interest from the image using the Mirone cropping tool:

Figure 18: Aerial view of pasture test 1<br />



This then presented a sample area which could be analyzed for pixel values; note the high degree of shade in the south east and the exposed rock in the north west of the area. The next step in the process involved analyzing the histogram data for the target sample. This was completed in Geomatica, with the pixel count capped to remove the values counted outside the image area. In other words, the irregular clipped image was stored as a GeoTiff which placed the area onto a blank background, resulting in a high pixel count at a value outside the expected range (in the order of thousands of pixels), which was discounted by reducing the pixel count.
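The same adjustment can be made programmatically by masking the background value before building the histogram; the sketch below uses the GDAL Python bindings and NumPy, with the file name and the background value of 0 as assumptions.

# Sketch: per-band statistics and histogram of a clipped GeoTiff, ignoring the blank
# background pixels introduced when the irregular polygon was saved as a rectangle.
import numpy as np
from osgeo import gdal

gdal.UseExceptions()
ds = gdal.Open("pasture_test_1.tif")  # placeholder file name
BACKGROUND = 0                        # assumed background/nodata value

for band_index, name in enumerate(["red", "green", "blue"], start=1):
    data = ds.GetRasterBand(band_index).ReadAsArray()
    valid = data[data != BACKGROUND]  # drop the blank background pixels
    counts, _ = np.histogram(valid, bins=256, range=(0, 256))
    print(name, "mean:", valid.mean(), "std dev:", valid.std())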

Figure 19: Red colour band for pasture test 1<br />

The above histogram shows the values for the red colour band, with a clear peak for pasture values (see sampling section). The lesser spike of pixels at lower values is consistent with the type of shade expected in this type of area, while the slightly higher values returned by the exposed rock distorted the standard deviation slightly. The area's content, however, can be clearly seen in the spike for pasture, something which is even clearer in the histogram for the green colour band:



Figure 20: Green colour band for pasture test 1<br />

In the above sample, the values for both shade and pasture are evident in spikes in<br />

the pixel count.<br />

The second sample gave a clearer representation of the area type with a relatively<br />

monochrome sample taken from the south east of the imagery:<br />

Figure 21: Aerial view of pasture test 2<br />

The relatively low level of shade and lack of exposed earth allows the algorithm to<br />

classify the area as pasture, based on the values from both the red and green<br />

colour bands:<br />



Figure 22: Red colour band for pasture test 2<br />

The red colour band histogram (above) displayed a peak at the expected pasture value, with the small count at lower values indicating the shade present in the east of the area. Note: the high value at the far right is the count for blank pixels, represented by null values outside the cropped section of the GeoTiff image. These values were discounted by lowering the total pixel count to a range within which the expected values fall. The values for the green colour band also showed that the sample could correctly be identified as pasture from the peak count values found in this band (see the sampling section for an elaboration on how these values were arrived at).



Figure 23: Green colour band for pasture test 2<br />

The third sample took a polygon towards the centre of the plan, close to areas of bog, rough pasture and marsh. It had a high degree of shade and also contained a relatively high degree of variation in use (the darker band to the north contained trees; the dark band to the south is shade), although it is still an area of pasture.

Figure 24: Aerial view of pasture test 3<br />

The histogram values for both the red and green colour bands still, however,<br />

pointed to a polygon with spectral values which falls within the expected range of<br />

pasture:<br />



Figure 25: Red colour band for pasture test 3<br />

The above red colour band sample shows the peak at the pasture band, with the variance towards darker values representing the scrub/bushy part of the sample. The values are slightly higher than the overall mean for pasture, but still fall inside what might be expected for an area of cut pasture.

Figure 26: Green colour band for pasture test 3<br />



The peak for the green colour band, around 190 on the converted greyscale, means<br />

that the slightly higher peak for the red colour band could be accepted as variance<br />

within the pasture range.<br />

The fourth sample that was used to test the expected ranges for the presence of<br />

pasture from automatically extracted polygons was an area to the south east of the<br />

image which was close to farm buildings and used as pasture, giving a relatively<br />

clean example of the land use type to test the algorithm against:<br />

Figure 27: Vector data for pasture test 4<br />

The above screen grab from the vector data (farm buildings are to the right) shows how the process being developed in this study uses known controlled coordinates to crop the imagery into a mosaic of tiles for processing (same area below):

Figure 28: Aerial view of pasture test 4<br />

The values for the red colour band were consistent with the expected values and,<br />

with the small level of distortion as a result of shade increasing the level of values<br />

counted in the lower part of the colour range, the polygon can easily be processed<br />

as an area of pasture:<br />

Figure 29: Red colour band for pasture test 4<br />

Similar results were returned for the green colour band (below), with the shade<br />

causing some distortion to the pixel count but the overall values indicate an area<br />

of pasture:<br />



Figure 30: Green colour band for pasture test 4<br />

The above sampling demonstrates that it is possible to cut up an image based on controlled vector data and analyze each section with some degree of success. The aim of this sampling was not to show absolute values for each land type; it was only to prove the theory that, given correct vector data representing small area polygons, valuable data can be further derived based on spectral values. A further advantage for anyone using the suggested algorithm is that for this type of land cover (peri-urban and rural but close to settlements, which covers most of the country) the controlled vector data has changed little over time, which means the process also lends itself to studies looking at change over time.



4.2 Rough Pasture Test<br />

Four areas of pasture were sampled, with varying degrees of shade and of internal ground cover other than pasture that is not bounded in the vector data (exposed rock). The first of these was an area in the south west of the sample area and comprised a polygon surrounded by five fence vectors close to the road. The sample was chosen to see whether the expected values for the mean and standard deviation across the colour bands would be reflected in a section of image with a relatively high degree of variety in terms of ground cover.

The data was sampled using the Radius software, where the polyline values were exported to an ASCII file. This process will be easier once GML format spatial data becomes available (not available at the time of writing, but expected over the next few years, OSI 2010). The process has two requirements in order for the polygon to be correctly sampled:
• Firstly, the line orientation needs to be the same for all polylines which bound the sample polygon; by convention this is anticlockwise (the left of the line direction falling on the inside of the area to be extracted).
• Secondly, the first and last co-ordinates must match, with no other duplicate values present in the file.
These qualifiers can be validated at the time of extraction once GML data is available; for this study, however, a text editor was used to search for duplicate values.



Figure 31: Vector data for rough pasture test 1<br />

Once the ASCII file was created, it was used to extract the region of interest from the image using the Mirone cropping tool:

Figure 32: Aerial view of rough pasture test 1<br />



This then presented a sample area which could be analyzed for pixel values; note the high degree of shade in the south east and the exposed rock in the north west of the area. The next step in the process involved analyzing the histogram data for the target sample. This was completed in Geomatica, with the pixel count capped to remove the values counted outside the image area. In other words, the irregular clipped image was stored as a GeoTiff which placed the area onto a blank background, resulting in a high pixel count at a value outside the expected range (in the order of thousands of pixels), which was discounted by reducing the pixel count.

Figure 33: Red colour band for rough pasture test 1<br />

The above histogram shows the values for the red colour band, with a clear peak for pasture values (see sampling section). The lesser spike of pixels at lower values is consistent with the type of shade expected in this type of area, while the slightly higher values returned by the exposed rock distorted the standard deviation slightly. The area's content, however, can be clearly seen in the spike for pasture, something which is even clearer in the histogram for the green colour band:



Figure 34: Green colour band for rough pasture test 1<br />

In the above sample, the values for both shade and pasture are evident in spikes in<br />

the pixel count.<br />

The second sample gave a clearer representation of the area type with a relatively<br />

monochrome sample taken from the south east of the imagery:<br />

Figure 35: Aerial view of rough pasture test 2<br />

The relatively low level of shade and lack of exposed earth allows the algorithm to<br />

classify the area as pasture, based on the values from both the red and green<br />

colour bands:<br />



Figure 36: Red colour band for rough pasture test 2<br />

The red colour band histogram (above) displayed a peak at the expected pasture value, with the small count at lower values indicating the shade present in the east of the area. Note: the high value at the far right is the count for blank pixels, represented by null values outside the cropped section of the GeoTiff image. These values were discounted by lowering the total pixel count to a range within which the expected values fall. The values for the green colour band also showed that the sample could correctly be identified as pasture from the peak count values found in this band (see the sampling section for an elaboration on how these values were arrived at).



Figure 37: Green colour band for rough pasture test 2<br />

The third sample took a polygon towards the centre of the plan, close to areas of bog, rough pasture and marsh. It had a high degree of shade and also contained a relatively high degree of variation in use (the darker band to the north contained trees; the dark band to the south is shade), although it is still an area of pasture.

Figure 38: Aerial view of rough pasture test 3<br />

The histogram values for both the red and green colour bands still, however,<br />

pointed to a polygon with spectral values which falls within the expected range of<br />

pasture:<br />



Figure 39: Red colour band for rough pasture test 3<br />

The above red colour band sample shows the peak at the pasture band, with the variance towards darker values representing the scrub/bushy part of the sample. The values are slightly higher than the overall mean for pasture, but still fall inside what might be expected for an area of cut pasture.

Figure 40: Green colour band for rough pasture test 3<br />



The peak for the green colour band, around 190 on the converted greyscale, means<br />

that the slightly higher peak for the red colour band could be accepted as variance<br />

within the pasture range.<br />

The fourth sample that was used to test the expected ranges for the presence of<br />

pasture from automatically extracted polygons was an area to the south east of the<br />

image which was close to farm buildings and used as pasture, giving a relatively<br />

clean example of the land use type to test the algorithm against:<br />

Figure 41: Vector data for rough pasture test 4<br />

The above screenshot from the vector data (farm buildings are to the right) shows how the process being developed in this study uses known controlled coordinates to crop the imagery into a mosaic of tiles for processing (same area below):



Figure 42: Aerial view of rough pasture test 4<br />

The values for the red colour band were consistent with the expected values and,<br />

with the small level of distortion as a result of shade increasing the level of values<br />

counted in the lower part of the colour range, the polygon can easily be processed<br />

as an area of pasture:<br />

Figure 43: Red colour band for rough pasture test 4<br />

Similar results were returned for the green colour band (below), with the shade<br />

causing some distortion to the pixel count but the overall values indicate an area<br />

of pasture:<br />



Figure 44: Green colour band for rough pasture test 4<br />

The above sampling demonstrates that it is possible to cut up an image based on controlled vector data and analyze each section with some degree of success. The aim of this sampling was not to show absolute values for each land type; it was only to prove the theory that, given correct vector data representing small area polygons, valuable data can be further derived based on spectral values. A further advantage for anyone using the suggested algorithm is that for this type of land cover (peri-urban and rural but close to settlements, which covers most of the country) the controlled vector data has changed little over time, which means the process also lends itself to studies looking at change over time.



4.3 Marsh Test<br />

The testing for the marsh areas involved taking four known (but geographically separate) marsh areas from the imagery as ASCII co-ordinate files and using this data to extract the raster sections for spectral analysis. The sampling section of this study found marsh to have a relatively low level of standard deviation across the colour bands (when compared to the similar rough pasture type polygons). This might have been expected to change, as there can be a relatively large degree of shade in these areas due to the presence of other vegetation, and to some extent this was the case. The samples, however, despite a high degree of standard deviation, did display some unique properties in line with the sampling section which allow them to be automatically classified. In general terms, polygons with the value ranges displayed below will belong to either rough pasture or marsh (see the sampling section; this is in reference to Irish small area polygons only). Rough pasture displays two peaks of values where the shade and vegetation contrast; the marsh samples did not have this distinction and the range of values graduated towards a single peak (slightly lower than rough pasture, by around 10 on the converted greyscale).
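One hedged way to automate that distinction is to count histogram peaks after light smoothing; the sketch below uses SciPy's peak finder, and the smoothing and prominence settings are illustrative rather than calibrated values.

# Sketch: distinguish a two-peak histogram (rough pasture) from a single graded
# peak (marsh) in the red or green colour band. Thresholds are illustrative.
import numpy as np
from scipy.signal import find_peaks

def count_band_peaks(hist_counts, smooth_window=5, prominence_fraction=0.05):
    counts = np.asarray(hist_counts, dtype=float)
    kernel = np.ones(smooth_window) / smooth_window
    smoothed = np.convolve(counts, kernel, mode="same")  # light smoothing
    peaks, _ = find_peaks(smoothed, prominence=prominence_fraction * smoothed.max())
    return len(peaks)

def marsh_or_rough_pasture(red_hist):
    return "rough pasture" if count_band_peaks(red_hist) >= 2 else "marsh"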



The first sample area came from a polygon of marsh north of a lake and included a<br />

large amount of vegetation.<br />

Figure 45: Vector data for marsh test 1<br />

The samples were taken from separate areas across the imagery used in the study, to achieve as consistent a picture as possible of this type of area. The co-ordinates of the area were extracted from the vector data (above) in ITM projection and used to clip the polygon from the raster imagery (below). As with all other samples in this test, the final results were adjusted by lowering the pixel count so as to account for blank pixels created in the output GeoTiff of the irregular clipped polygon:



Figure 46: Aerial view of marsh test 1<br />

The results for the red colour band pixel count were similar to the spectral values for rough pasture, but had a definite gradient between the areas of shade and vegetation, as opposed to the two peaks consistent with high vegetation growth; this gradient emerged as typical of the marsh samples taken during the testing of this algorithm (the sampling section had a lower level of standard deviation, but the full polygon samples also included shade and other growth, as in the east of the sample above):



Figure 47: Red colour band for marsh test 1<br />

As with the spectral values for the red colour band, the green colour band showed<br />

a gradual increase in pixel count values along the greyscale from those related to<br />

areas of shade to those falling within the expected range for marsh:<br />

Figure 48: Green colour band marsh test 1<br />

The above trend would emerge as consistent throughout the marsh value testing, with a consistent graph for blue (within the sampling range) whose values were evenly distributed either side of a peak between 90 and 100 on the converted greyscale. The fact that these values mirror rough pasture closely means that the marsh areas can only be classified once rough pasture values have been fully identified (towards the end of the third step in the algorithm).

Figure 49: Blue colour band for marsh test 1<br />

The second known marsh area selected (outlined below) to test the algorithm was<br />

taken from an area east of the first sample, which included some areas of tree<br />

cover (north east) and vegetation close to rough pasture (south):<br />



Figure 50: Aerial view of marsh test 2<br />

One of the advantages of using vector data which has already been controlled to cut the photography into a mosaic of polygons is the extent to which the captured data points mirror detail that would be extremely difficult to identify using automatic methods (such as the outline of the drain, visible in the indent on the eastern edge of the above polygon). The values for this larger sample area, although it contained more tree cover than the first sample, remained consistent with all four samples, with a gradual incline from shade values to the expected values for marsh in the red colour band (following page):



Figure 51: Red colour band for marsh test 2<br />

This gradual increase in values from shade to marsh (as per the original sampling<br />

for this study, see sampling section) was also present in the green colour band:<br />

Figure 52: Green colour band marsh test 2<br />

The blue colour band remained consistent both with the other three test polygons used in this study and with the sampling for marsh. This suggests that the blue colour band pixel count values (below) can be added as an additional point of reference, from which any variance would indicate a problem with the polygon analysis and flag it for further examination (step 4 in the algorithm):

Figure 53: Blue colour band for marsh test 2<br />

The third known marsh polygon used to test the expected values was taken from an area in the east of the image, with a high degree of shade and trees (and, as a consequence, overhanging growth making aerial analysis difficult). This was used to see if a relatively small area of marsh (in comparison to the other areas used in this section of the study) could return values close to what an automatic analysis would expect, even with some distortion from neighbouring features:



Figure 54: Aerial view of marsh test 3<br />

The values for the red colour band for this smaller section of known marsh were<br />

not consistent with all the other test polygons, with a small rise from shade values<br />

to those expected for marsh. This suggests, to a greater extent than the larger<br />

samples, that it is difficult to automatically classify marsh based on spectral values<br />

alone. The classification of this type of polygon must, as a result, take place towards the end of the search, so that all known values, including those automatically registered in the algorithm (such as pasture and bog), are first removed from the pool of areas being studied. It should be noted that this part of the study is not intended to be an exhaustive search for a means of automatically identifying marsh, but to show that such identification is possible when an aerial image is divided into discrete small area polygons. Other factors, for example the percentage of rough pasture and water polygons in the same region of the image (taken from the vector data), could potentially be used to increase accuracy in identifying this type of ground cover.



Figure 55: Red colour band for marsh test 3<br />
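A minimal sketch of how this ordering might be expressed is given below, assuming each polygon has already been reduced to simple per-band statistics. The attribute names, thresholds and value ranges are hypothetical stand-ins for the image key built in the sampling section, not values taken from it.

# Sketch: classify marsh only after the more distinctive land types are assigned.
# The polygon dictionaries, band statistics and ranges are illustrative only.

def in_range(value, lo, hi):
    return lo <= value <= hi

def classify_polygon(poly, key):
    """poly: per-band statistics for one polygon; key: expected ranges."""
    # Steps 1 and 2: land types with distinctive, low-deviation signatures first.
    for land_type in ("pasture", "rough_pasture", "bog"):
        lo_r, hi_r = key[land_type]["red"]
        if poly["red_std"] < key[land_type]["max_std"] and in_range(poly["red_mean"], lo_r, hi_r):
            return land_type
    # Step 3: only what remains is tested against the weaker marsh signature,
    # leaning on the stable blue-band peak as the main point of reference.
    lo_b, hi_b = key["marsh"]["blue"]
    if in_range(poly["blue_peak"], lo_b, hi_b):
        return "marsh"
    return "unclassified"  # flagged for further analysis (step 4)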

The values for the green colour band showed a similar level of distortion and, as<br />

with the red colour band values from this area and unlike the other three samples

in this section, the count did not peak neatly in the expected range for marsh. The<br />

blue colour band, however, was consistent with all other test polygons and<br />

sampled imagery. This suggests that for small areas (below 2ha.) marsh is difficult<br />

to detect automatically and could only be assigned the attribute when all other<br />

values in the region of interest being analyzed have been assigned.<br />



Figure 56: Blue colour band for marsh test 3<br />

The fourth sample known marsh polygon used in the test was from an area to the<br />

south of the sample image (below). As with all others it formed an irregular<br />

polygon bordered by other non-marsh areas. There were also some trees and other<br />

vegetation present in the area, something that was present in all the marsh

samples. This suggests that the type of ground cover being described is often<br />

something that occurs in transitional areas, and the difficulties associated with<br />

getting clear spectral values (in terms of low levels of standard deviation) are a<br />

result of this variation. The samples, however, were consistent with the other two<br />

large marsh test polygons, and the pixel count for the red and green colour bands<br />

peaked within the expected range.<br />



Figure 57: Aerial view of marsh test 4<br />

The values for the red colour band for the fourth test polygon showed a gradual<br />

incline from the values for shade to the expected range for marsh:<br />

Figure 58: Red colour band for marsh test 4<br />

The green colour band values were also consistent with this trend and, once shade<br />

is removed from the analysis, these could be used to identify marsh. The values<br />



for the blue colour band, as with the previous three test polygons, remained

consistent with an expected range for marsh (see sampling section):<br />

Figure 59: Blue colour band for marsh test 4<br />

The values returned for the test of marsh polygons were not as distinct as the<br />

previous three polygon tests (pasture, rough pasture and bog). There were sets of<br />

values (such as a consistent blue range and gradual increase in count from shade<br />

to the expected marsh range) which can be used to classify this type of area, but<br />

the classification needs to be confined to the latter (step 3) part of an automatic<br />

search algorithm to increase the probability of the spectral values matching<br />

correctly.<br />



4.4 Bog Test<br />

The sampling for areas of bog took larger polygons which, with forestry and water<br />

bodies already eliminated from the search algorithm (with the vector data), could

automatically be assumed to have a high probability of belonging to an area of<br />

bog. The first sample was taken from an area of bog bounded by roads on all sides<br />

in the northeast of the imagery being used in this study (see the general<br />

introduction for details). This polygon was chosen for sampling as it could be<br />

considered a good example of this type of land cover:<br />

Figure 60: Vector data for bog test 1<br />

The data points which form the polylines bounding the polygon were exported in<br />

ASCII format. As with all clip files for this algorithm the points are recorded in<br />



anti-clockwise format with a space separating eastings and northings, a newline

separating coordinate pairs and the start/ end point appearing twice. The<br />

projection is, as with all the data in this study, Irish Transverse Mercator. This<br />

coordinate set then formed the boundaries for a clipped area of the raster image:<br />

Figure 61: Aerial view for bog test 1<br />
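As a concrete illustration of the clip file convention just described, a short routine along the following lines could read such a file back into coordinate pairs; the file name in the usage comment is a placeholder, and only the repeated start/end point is checked.

# Sketch: read an ASCII clip file of ITM coordinates as described above.
# Each line holds "easting northing", the pairs are listed anti-clockwise and
# the first point is repeated as the last one to close the polygon.

def read_clip_file(path):
    points = []
    with open(path) as clip_file:
        for line in clip_file:
            line = line.strip()
            if not line:
                continue
            easting, northing = (float(value) for value in line.split())
            points.append((easting, northing))
    if points and points[0] != points[-1]:
        raise ValueError("clip polygon is not closed (start/end point should repeat)")
    return points

# Example with a hypothetical file name:
# boundary = read_clip_file("bog_test_1_clip.txt")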

As with pasture, areas of bog produced a uniform pixel count with a low level of<br />

standard deviation (see sampling section for more detail on pixel values). One<br />

aspect of this type of ground cover, however, which sets it apart from other

searches in this study is the absence of shade (due to the absence of trees and thick<br />

vegetation). The small area of shade present in the south of the above sample is<br />

not enough to distort the mean values and proved to be typical for the imagery. As

with all the samples in this section of the study, the histogram values for the<br />

polygons have been adjusted to remove the distortion caused by blank pixels<br />

adjacent to the irregular shape created when it is exported to GeoTiff format. This<br />

was achieved by reducing the pixel count to eliminate values over 5000 (400 in<br />

the case of pasture and rough pasture, where sample areas are much smaller). The<br />

red colour band histogram (below) returned a clear spike, consistent with expected<br />

bog spectral values (see sampling section) and unique among the polygons in the<br />

imagery being studied (note the slight peak in lower values, giving a clear indication of the presence, and proportion of, shade in the sample, something which facilitates automatic attribute detection):



Figure 62: Red colour band for bog test 1<br />
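A sketch of how these adjusted per-band histograms might be produced is given below. It assumes the clipped polygon has already been read into a three-band array, that the blank padding pixels hold the value 0, and that the count ceiling described above (5000, or 400 for the smaller pasture and rough pasture samples) is applied.

import numpy as np

# Sketch: per-band greyscale histograms for a clipped polygon, with the blank
# padding pixels suppressed. 'bands' is assumed to be a (3, rows, cols) array
# read from the exported GeoTiff, with 0 assumed to be the blank value.

def band_histograms(bands, blank_value=0, cap=5000):
    histograms = {}
    for name, band in zip(("red", "green", "blue"), bands):
        values = band[band != blank_value]          # drop the padding pixels
        counts, _ = np.histogram(values, bins=256, range=(0, 256))
        histograms[name] = np.minimum(counts, cap)  # truncate any residual spike
    return histograms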

The green colour band values for the same polygon also revealed a clear pattern to<br />

the values, peaking at the expected range for an area of bog:<br />

Figure 63: Green colour band for bog test 1<br />

The blue colour band also displayed a range of values which peaked for expected<br />

pixel values for an area of bog, with the result that the area can be classified

as bog. This has implications for the determination of the growth/ decline of this<br />



type of area over time. The focus of this study is to determine the value of<br />

cropping raster imagery based on small area polygons so as to automatically<br />

detect values in the image, so the specifics of the type of bog or other factors<br />

which can be determined from further processing have not been developed, save<br />

to flag the necessary properties (i.e. the spectral values set in the sampling section and identified in this sample, and the area property of being larger than four hectares, which eliminates all other probable land cover types).

Figure 64: Blue colour band for bog test 1<br />

The second sample area of bog was taken from an area bounded by river, rough pasture, drain and road and produced an irregular polygon, which was then extracted from the imagery using Mirone software and saved as a GeoTiff file.

This file was examined for the spectral values across the colour bands. As is clear<br />

from the cropped area of the image, the properties contain the uniform values (and associated low level of standard deviation) which are useful for automatic analysis and classification.



Figure 65: Vector data for bog test 2<br />



As with the first sample, little or no shade is present in the area being analyzed:<br />

Figure 66: Aerial view for bog test 2<br />

The red colour band produced a clear indication of an area of bog, consistent with<br />

expected values but with a small distortion due to the vegetation in the west of the<br />

image (evidenced by the very slight grade of values below the expected range):<br />

Figure 67: Red colour band for bog test 2<br />



The histogram for the green colour band also displayed a distribution of spectral values which was consistent with an area of bog. It should be noted that the values are very similar to the benchmark values identified in the sampling section, but are from a real world polygon; the sampling section used sections of the land type

from within known areas to set the benchmark. This part of the study is looking at<br />

typical polygons extracted using vector data. It is therefore significant that the<br />

results are so similar:<br />

Figure 68: Green colour band for bog test 2<br />

As with the first sample, all three colour bands matched the expected range, with<br />

the blue colour band peaking within a range consistent with bog. This makes the proportional method possible (comparing the values, as a proportion, against known areas such as water bodies and roads), and suggests that during the second step of the algorithm the areas flagged as bog could be added to a known set of values against which areas with high standard deviation and multiple peaks could be tested:



Figure 69: Blue colour band for bog test 2<br />
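A loose sketch of this proportional comparison is given below; the band names, the reference polygon and the tolerance are illustrative assumptions rather than figures taken from the sampling.

# Sketch: compare an unknown polygon's band means, expressed as ratios to a
# known reference polygon from the vector data (e.g. a road or water body),
# against the ratios recorded for bog in the image key.

def proportional_match(unknown_means, reference_means, key_ratios, tolerance=0.1):
    for band in ("red", "green", "blue"):
        ratio = unknown_means[band] / reference_means[band]
        if abs(ratio - key_ratios[band]) > tolerance:
            return False
    return True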

The third sample for the bog land area type took a large section of bog to the north<br />

west of the imagery, which was exported from the vector software as a set of co-<br />

ordinates in ASCII format. It is worth noting that this format has been used<br />

throughout this study as it allows for easy manipulation of the files and would<br />

make them compatible with a GML routine, but the sets of co-ordinates can also<br />

be contained in an ESRI shape file (.shp) and this could then be used to cut the

sections with the software used here. The downside to this, however, is that the<br />

sets cannot be (as easily) fed into reports or user created routines.<br />



Figure 70: Aerial view for bog test 3<br />

The results across all three colour bands were similar to the first two test samples<br />

and matched the expected values identified in the sampling section of this study,<br />

which indicates that for general areas of raised bog this algorithm gives an accurate indicator of total area. This may not be the case for more remote sections of this land type, found on mountain ranges. The algorithm is dependent on closed vector

polygons, which are available for almost all of the country, with the exception of<br />

high mountains. In those cases it would be necessary to re-set the image key to<br />

search for values of exposed rock and determine the proportional difference<br />

between the expected bog values and those for exposed rock in much larger<br />

mountainous polygons –this is something which the algorithm can be adapted for<br />

but is not attempted here as the focus is on matching generic land use types as<br />

much as possible (the larger mountain areas would contain a mix of marsh, bog,<br />

exposed rock and vegetation).<br />



Figure 71: Red colour band for bog test 3<br />

As with all the bog samples this test area returned values within the expected<br />

range for the land use type, and the pixel count for the colour bands fell within the expected range for red (above) and green (below):

Figure 72: Green colour band for bog test 3<br />



This trend was continued for the blue colour band and indicates that there is a high probability of bog once a low standard deviation and the expected value ranges are met; the graph for the blue colour band for the third test sample is below:

Figure 73: Blue colour band for bog test 3<br />

The final test sample for an area of bog was taken from the west of the image and<br />

contained some vegetation and shade which may have distorted the results. As with all the samples of this type of land cover, the percentage of shade is very low, allowing it to be a distinguishing feature for inclusion in an automatic search:



Figure 74: Aerial view for bog test 4<br />

The red colour band results showed a slight grade from values of shade into the expected range for the land type, a result of the small area of shade to the south and west of the polygon, but the values still fell within the range expected:

Figure 75: Red colour band for bog test 4<br />



The green colour band pixel count also produced similar results, with the<br />

indications being that the polygon can (coupled with the relatively small level of<br />

standard deviation, large area size and low values of shade) be automatically<br />

recorded as an area of bog:<br />

Figure 76: Green colour band for bog test 4<br />

As with all the bog samples and test areas in this study the blue colour band values<br />

remained within a small range with a low level of standard deviation (see<br />

sampling section).<br />



Figure 77: Blue colour band for bog test 4<br />

In general terms bog is a very useful land use type for inclusion in an automatic aerial image survey such as this one, as it can be quickly identified and logged early in a looping cycle through the spectral values of polygons. As mentioned above, this is dependent on controlled polylines bounding the area, something which is available for most of the country, but results may be distorted on remote high ground (identified by 1:5000 mapping and the presence of exposed rock).
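A minimal sketch of such a looping cycle is given below, assuming each polygon has been reduced to a small set of statistics. The four hectare area test comes from the rule noted earlier in this chapter, while the attribute names, the shade cut-off and the use of the red band alone are illustrative assumptions.

# Sketch: log bog early in a loop over polygon statistics and leave the rest
# for the later, harder searches. Thresholds and field names are illustrative.

def flag_bog_polygons(polygons, key):
    remaining = []
    for poly in polygons:
        lo, hi = key["bog"]["red"]
        is_bog = (poly["area_ha"] > 4.0                        # large polygon
                  and poly["red_std"] < key["bog"]["max_std"]  # low deviation
                  and poly["shade_fraction"] < 0.05            # little or no shade
                  and lo <= poly["red_peak"] <= hi)            # expected peak
        if is_bog:
            poly["land_type"] = "bog"
        else:
            remaining.append(poly)
    return remaining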

The purpose of this test was to determine the value of using the vector data to clip<br />

aerial imagery into a mosaic of polygons for spectral analysis. As mentioned in<br />

the sampling section of the study it is possible to include pattern analysis testing to<br />

determine the level of cutting taking place (drains are included in the vector data<br />

and could also be factored into such a search).



4.5 Conclusion<br />

Image segmentation is one of the most important parts of automatic analysis of<br />

aerial imagery (Zhou & Wang, 2007). A set of reference data is necessary to know<br />

where to divide image sections. This can be obtained from a survey input by the<br />

user or from spatial data specific to the area being studied (peat, forestry etc.).<br />

Ordnance Survey vector data provides a comprehensive set of reference points and

allows an aerial image to be cropped into small discrete area polygons. These<br />

polygons can also benefit from the previously captured coding which identifies<br />

many of them as a specific land type. The result of adding this data to an<br />

automatic search for specific spectral values is that the user can gain context from<br />

known neighbouring polygons and calibrate the specific search accordingly. This<br />

in turn means that the process of image analysis can be simplified by applying a<br />

generic technique for identifying polygons and refining it to search for a given<br />

value.<br />

This study looked at the value of cropping aerial imagery into a mosaic of known<br />

and unknown polygons. It attempted to automatically derive probable types for the

unknown areas based on the known data and a sampled image key. The sampling<br />

and testing undertaken during the study indicated that it is possible to derive<br />

useful value from a spectral analysis based on a pixel count alone. This was<br />

because the vector data introduced into the process reduced the number of possible values that could be attributed to a pixel set. For example, as the extent of forestry is known, similar values returned from an unknown polygon must represent marsh or rough pasture, while further analysis of the shape of the range can distinguish between the two.

The process is possible using open source software but could also be coded into a<br />

standalone application, e.g. using the GDAL library and a function to crop<br />

irregular polygons. The potential for automation will be supported by the release<br />

of the vector data in GML format, from which a large ASCII file of coordinate<br />

sets could be fed into the process. By removing the requirement for a user to<br />

control areas of the image through the use of the vector data, and by presenting the<br />



user with a pre-calibrated image key of expected spectral values, it is possible to automatically classify aerial image sections. It should be noted that this study was confined to a specific type of vector data (Ordnance Survey) and the landscape of

small polygons with a single land use may not apply to all landscapes. However,<br />

the study proves that large scale vector data can be used to simplify aerial image<br />

processing.<br />
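By way of illustration, one way such a standalone cropping routine might look is sketched below, assuming GDAL's Python bindings and a cutline polygon already stored in a shapefile or GML file in the same Irish Transverse Mercator projection as the imagery; the file names are placeholders.

from osgeo import gdal

# Sketch: crop the orthophotography to an irregular vector polygon with GDAL,
# as suggested above. The cutline is assumed to be a single polygon in the
# same projection as the imagery.

def crop_to_polygon(image_path, cutline_path, output_path):
    result = gdal.Warp(
        output_path,
        image_path,
        cutlineDSName=cutline_path,  # e.g. a shapefile or GML polygon
        cropToCutline=True,          # shrink the output to the polygon extent
        dstNodata=0,                 # mark pixels outside the polygon as blank
    )
    if result is None:
        raise RuntimeError("GDAL failed to crop the image")
    result = None  # flush and close the output dataset
    return output_path

# Example with hypothetical paths:
# crop_to_polygon("ortho.tif", "bog_polygon.shp", "bog_polygon.tif")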



5 Literature Review<br />

The goal of this thesis is one that is in line with most work completed using<br />

remote sensing processing in that it is looking for traits in aerial imagery which<br />

can be used to derive useful information about the surface of the earth. One of the<br />

early studies of aerial image interpretation (Kittler's 'Image processing for remote sensing' paper) described the process as "the interpretation of image segments that exhibit similar statistical properties" (Kittler, 1983). One addition that could be made to that definition is that, for the majority of studies in this field, the interpretation should be automatic, or close enough to automatic to make interpretation of a large volume of data viable.

There is a vast body of work available which documents various methods for<br />

interpreting aerial and satellite imagery. In general terms there is always a focus<br />

for each study, e.g. identifying coffee plantations in Costa Rica (Cordero-Sancho and Sader, 2005), and this influences the methodology. One result of this is that

there are a large variety of methods employed. This literature review considers a<br />

representative sample of these in terms of their focus, in other words treating work<br />

that uses patterns or shapes as one category, spectral deviation for agricultural/<br />

forestry purposes as another and urban analysis as a further category. In terms of<br />

previous studies, the one that is closest to what this thesis is attempting is the body of work that has been completed on what have been termed ISAs (impermeable surface areas, or, more usefully, hard ground). The focus of these works is to identify the percentage of hard ground within urban areas, which can then be used in modeling flood events. During the early part of the study I made use of the SWAP technique recommended by T. Knudsen in his 2005 analysis

of color in aerial imagery to identify grey areas in the test imagery (Figure 5:<br />

Water Area Image Modification). In the context of the data being analyzed by this<br />

thesis (Irish peri-urban land parcels) these grey areas within the image can be<br />

made to correspond to hard ground.<br />

Before considering the body of work underlying this thesis it is probably useful to<br />

answer two questions which the reader might ask; is this not just<br />

148


photogrammetry?, and why not use an established algorithm such as the 2000<br />

vegetation-impervious surface-soil sub pixel analysis techniques published in the<br />

2000 issue of Remote Sensing (Phinn et al)?<br />

In answer to these questions this review will not be considering a history of<br />

photogrammetry other than a general outline of established (traditional)<br />

processing techniques. It will also not be describing some of the segmentation,<br />

target area identification pre-processing methods used in the various studies. This<br />

work is often a major component of this type of analysis. The answer to the first<br />

question is that in general terms this study is photogrammetry but takes as a<br />

starting point controlled photography and captured polygon data so to consider the<br />

body of work underlying theses techniques falls outside the scope of what this<br />

thesis is attempting<br />

The answer to the second question is that this study differs from previous<br />

techniques in that it pre-supposes a large amount of information form the data<br />

(features of the built environment, feature coding, water parcels, forestry parcels,<br />

roads by category, footpaths and buildings by category) so feature capture is not<br />

part of the study. A possible addition to the study would be a consideration of<br />

feature capture using pattern analysis. In particular, the identification of outbuildings adjoining existing dwellings would be useful. However, this is outside

the scope of this study. It can be assumed from the outset that most of the major

physical features present in the built environment are present in the data in vector<br />

format. This narrows the application of the technique to areas that are covered by<br />

large scale mapping but results in an automatic method for adding data to this<br />

mapping. One possible application is calculating the percentage of hard ground in<br />

a region of interest.<br />

In general terms the study can be seen as specific to urban areas which have been<br />

digitally mapped at a large scale (1:2500 or 1:1000 scales). Arbitrarily segmenting<br />

an image is a technique that has been used in previous studies (Kettig &

Landgrebe, 1976) but this study differs in that the segments are specific small area<br />

polygons corresponding to property divisions and physical features. Results<br />



shown give a more detailed picture of these areas and in this way eliminate some<br />

of the problems encountered in previous studies.<br />

There are two broad areas within the body of work on processing of remotely<br />

captured spatial data which will be considered in the remainder of this review: spectral analysis methods, and the methods associated

with identifying polygons and the deviation of data from polygon patterns.<br />

Before discussing these it might be useful to briefly run through the data capture<br />

process as it stands in a traditional mapping environment (such as in OSI, the<br />

former OSNI and OSGB). This process has been neatly explained by Bingcai<br />

Zhang and Neal Olander in their paper to the 2000 ESRI user conference. They set<br />

out the process as the captured imagery being manipulated by a user in a software<br />

package (e.g. SOCET SET) to produce a shape file of vector data from the

original hardcopy imagery. In this study the process of creating the data has<br />

already been completed (along with the prior image control as described by Zhang<br />

and Olander), so the thesis can be considered to be a method of re-visiting the<br />

imagery to add value to the captured vectors/ features. It should be noted that the<br />

process developed by this thesis is not dependent on expensive packages (such as SOCET SET) and could be applied to lower-budget environmental monitoring

systems. Using low cost photogrammetric packages such as ShapeCapture there<br />

would be a trade off in terms of consistency and accuracy (Aguilar et al, 2005).<br />

Josef Kittler's 1983 Philosophical Transactions paper, as cited above, provides a useful

introduction to the subject matter of this thesis. It was written as an attempt to<br />

summarize the various attempts that had been made towards automating the<br />

analysis of aerial imagery at that time. The text is useful not just as a background<br />

to the historical development of the field but also as an outline of the processes<br />

involved (with respect to multispectral image segmentation). The author looked at<br />

a wide number of previous studies (27 are cited) and summarized the work into a<br />

series of categories. The technology available to process imagery has evolved<br />

massively over the intervening quarter century but the basic techniques (in terms<br />

of pixel analysis) and motivations (in terms of data required) remain similar today.<br />

I have chosen this paper for the review as in some ways it sets the context for the<br />

work being undertaken, that is the “interpretation of image segments that exhibit<br />



similar statistical properties” (Kittler, P.323). Kittler divides image processing into<br />

six sections: (1) the sensor and (2) data collection, (4) image preprocessing, (5) segmentation and classification and (6) image interpretation. The work undertaken in this thesis relates to part five of this system, although it will differ from the work considered by Kittler's paper in that some image interpretation will have taken place beforehand.

Kittler has identified the first part (of the segmentation and classification step<br />

described above) as analyzing remotely sensed data in order to “identify<br />

homogeneous segments in the image” (Kittler, P.324). He introduced a method for<br />

pixel-by-pixel classification to achieve this. For this method he suggests<br />

identifying and classifying all the pixels on a pixel by pixel basis and then linking<br />

identical pixels to form connected segments. In many ways this is probably the holy grail of image interpretation since, if perfected, a machine could automatically identify change and update maps. Kittler also proposes that segments of pixels

exhibiting similar properties could be used for this purpose. This thesis does not propose an identical method; instead it attempts to identify the proportion of

pixels corresponding to hard ground in a small area polygon, subtract buildings,<br />

roads and water polygons and attach a value for impermeable surface to the area<br />

data. Kittler suggests that a Bayesian probability formula can be used to determine<br />

the class of a segment. He suggests using what he terms “ground truth data”<br />

(Kittler, P.325) for the probability function to assign pixels to a class and<br />

determine the composition of a segment. He cites the class homogeneity of land<br />

surface covers as the means to initially segment and classify the pixels. This work<br />

can be made difficult by weather conditions and instrumental scanning errors<br />

(Note: Kittler’s paper is from a time when GPS controlled measurements were not<br />

readily available and such errors are less of a factor today, though small<br />

inconsistencies can occur, particularly at high altitude).<br />
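A loose illustration of the kind of per-pixel Bayesian rule described here is sketched below; it is not Kittler's formulation, and the Gaussian class statistics and priors are assumptions made purely for the example.

import numpy as np

# Sketch: assign a pixel to the class with the highest posterior, with each
# class summarised by a mean, standard deviation and prior estimated from
# ground truth samples. The log posterior drops the shared constant term.

def classify_pixel(value, classes):
    best_name, best_score = None, -np.inf
    for name, (mean, std, prior) in classes.items():
        score = np.log(prior) - np.log(std) - 0.5 * ((value - mean) / std) ** 2
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Example with hypothetical class statistics (mean, std, prior):
# classes = {"water": (40.0, 8.0, 0.2), "pasture": (120.0, 15.0, 0.8)}
# classify_pixel(110, classes)  # -> "pasture"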

Kittler further suggests a method for partitioning the image into segments (initially<br />

into cells of 2*2 pixels as suggested by Kettig & Landgrebe, 1976). In this way

he estimates that the larger size of land cover would allow an analysis to identify<br />

neighboring pixels with similar properties. He also suggests another method for<br />

identifying segments of the image which he terms “two dimensional spatial<br />



dependencies" (Kittler, P.330). This is the use of the four neighbours of a given pixel to identify the probability of them being part of the same group. In terms of this thesis, Kittler's paper suggests that an analysis of aerial imagery could potentially reveal large amounts of data about selected features (particularly if the segmentation has been completed already by vector mapping). It also suggests that it is possible to identify homogeneous areas on a pixel by pixel basis, which is the focus of this thesis.

5.1 Spectral and image considerations for the thesis<br />

This body of work forms the basis for what will be the main argument of this<br />

thesis, that it is possible to automatically capture spatial data relating to<br />

impervious ground in Irish towns, using controlled photography and matching<br />

vector data. This requires processing which makes use of the spectral data<br />

contained in aerial imagery of the sample data, which in turn presents a number of<br />

separate problems. One of these is bidirectional reflectance, and while this is not<br />

expected to be a major consideration while developing the algorithm for the thesis<br />

it nevertheless warrants consideration. One method of correcting for this is to adjust the imagery based on either ground sampled data or imagery from a higher (possibly satellite) vantage, an approach that was considered by Sakari

Tuominen and Anssi Pekkarinen in their 2004 study of forestry in southern<br />

Finland. The authors consider a method of improving the value of data being<br />

retrieved from aerial photography (in conjunction with satellite data) by reducing<br />

the presence of bidirectional reflectance. This is a problem caused by the way light hits the surface of the earth, causing the spectral values of image pixels to depend on their location in the image.

One approach would be to focus any study on the centre of the image, where<br />

bidirectional reflectance would not be as big an issue. However, the study was<br />

attempting to find a more effective method of correcting this using overlaying<br />

satellite images and a correcting algorithm. The reason they chose satellite<br />

imagery was that it is less affected by bidirectional reflectance and this benchmark would allow the authors to conduct local adjustment for the pixel

values. The study covered 4500 hectares of boreal forest located in the<br />

municipality of Kuru in the south of Finland.<br />

The core of the study is a local radiometric correction method for reducing the<br />

effect of bidirectional reflectance. This problem was not a major issue in the thesis<br />

but the methods used by the authors (finding a larger scale benchmark image to reference study areas against) were considered. At the heart of the Finnish study the problem was one of similar objects possessing different spectral characteristics in different parts of the image. This was a problem which was less relevant to the focus of this study (the authors are focused on forestry data). The authors conclude, unsurprisingly, that the value of remote sensing is dependent on what is visible and what can be registered by the airborne sensor.

As mentioned in the introduction to this chapter, the body of work which<br />

examines automatic capture of hard ground within urban areas is of particular<br />

relevance to this thesis. The 2007 study by Yuyu Zhou and Yu Wang of urban<br />

examples in Rhode Island is a good illustration of the type of factors that need to<br />

be considered in this type of survey. The study, which used true-colour digital<br />

orthophotographic data with a 1m spatial resolution (forming a controlled dataset<br />

in .tiff format with red, green and blue spectral bands present) segmented the<br />

imagery according to urban districts. The authors note that “successful image<br />

segmentation is the most important prerequisite in object-oriented classification”<br />

(P.644). It is hoped that by using previously captured and verified vector data this<br />

thesis will have met this prerequisite.<br />

The algorithm which was employed for this survey was broken down into four parts: segmentation, compensation for shadow effect, analysis of variance classification and post-classification of the data. There is a large body of work that has been completed using automatic interpretation of aerial imagery; the focus of this work is usually towards a specific purpose, such as the 2007 analysis of coffee crops in Costa Rica outlined by S. Cordero-Sancho and S. Sader. The study is a

useful example of some of the problems that can be encountered when attempting<br />

automatic image analysis. In the study the authors consider the problem of<br />



identifying coffee plantations from remotely sensed data of tropical forestry. The<br />

main focus of the study was to identify a means of separating the areas of coffee<br />

from similar data (in terms of wavelength and spectral values) representing<br />

tropical forest. This problem is made even more difficult due to the fact that the<br />

coffee plantations are often set under forestry (due to the shelter the cover<br />

provides for the crop). In addition the terrain is often mountainous so the authors<br />

had the additional problems of variety in terms of elevation, associated mist/ cloud<br />

cover and shade to overcome. While these problems were not a large factor in the low-flown aerial photography of the Irish suburban landscape used

in this thesis, the methods the authors employ to deal with cloud and haze are<br />

relevant. It was advantageous to the thesis to be able to reduce the impact of any<br />

of the areas of shading that existed.<br />

The authors took rectified Landsat imagery of a large tract of land in central Costa<br />

Rica, the Central Valley surrounding San Jose. This imagery had been captured<br />

during the rainy season. They broke the study down into a series of steps, starting<br />

with classifying three different waveband combinations in the imagery. They then<br />

developed what they termed a “Coffee Environmental Stratification Model”<br />

(Cordero-Sancho & Sader, P.1581) before comparing this with supervised results

(from known data). They then set out to identify which waveband combination<br />

best matched coffee crops.<br />

The identification of a control section of water which the authors used to reduce<br />

haze in their imagery helped with the development of the image key for this thesis.<br />

The method the authors used in the study was to find an area of deep water to<br />

identify a minimum reflectance value and subtract this from each of the non<br />

thermal bands. The next step the authors took was to remove clouds from the imagery; they did this by creating a binary mask using a classification of arbitrary clusters they had developed. This allowed them to recode areas which it identified as being contaminated by cloud or shadow. When this was applied the clusters were recoded to a zero value; they also digitized "isolated" (Cordero-Sancho & Sader, P.1582) clusters on a case by case basis. It is very beneficial to smooth areas

of shade within a polygon. This might be done by estimating a percentage of<br />

shade that should be present in the polygon (based on time of day and vector code<br />



making up the boundary, e.g. fence/ forestry/ building/ water). By identifying a<br />

value that this shade should fall under it is possible to derive a corresponding<br />

relationship in the histogram and adjust the results of the study accordingly.<br />
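A minimal sketch of how such a shade adjustment might be expressed is given below; the shade threshold and the expected shade fraction are assumptions that would, in practice, come from the time of day and the vector codes bounding the polygon.

import numpy as np

# Sketch: treat greyscale values at or below 'shade_threshold' as shade,
# compare their share of the histogram with the proportion expected for the
# polygon, and return the histogram with the shade bins removed together with
# the excess shade that was observed.

def adjust_for_shade(counts, shade_threshold, expected_shade_fraction):
    counts = np.asarray(counts, dtype=float)
    observed_fraction = counts[: shade_threshold + 1].sum() / counts.sum()
    excess = max(0.0, observed_fraction - expected_shade_fraction)
    adjusted = counts.copy()
    adjusted[: shade_threshold + 1] = 0.0
    return adjusted, excess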

The final result of the authors' work in Costa Rica was only "moderately successful" (Cordero-Sancho & Sader, P.1589). This would seem to have been largely due to the altitude at which the imagery they used was captured; by their own admission the results would probably have been better had the imagery been low flown or of better resolution. The methods the authors employed in the study are useful examples of how inconsistencies in results obtained during the process of completing this thesis might be countered; in particular, potential methods for reducing the effect shade might have on altering the results could be applied.

In this thesis, as with most automatic aerial imagery analysis, the classification of<br />

target areas in the photo (usually according to spectral values) is a vital part of the<br />

process. One solution is to develop a key to differentiate between features. The<br />

level of detail that can be obtained can be quite precise, but is dependent on the

resolution of the imagery and the complexity in the patterns of distribution of the<br />

target. This is illustrated by Megan Lewis's 1998 study of vegetation communities in an area of western New South Wales. In this study she attempted to develop a

key to differentiate between vegetation types. The problem she was attempting to<br />

counter was that of identifying particular species. She noted that existing aerial<br />

analysis could detect the presence of vegetation and in an attempt to improve this<br />

process she divided these species into colour bands. She took sample plots of<br />

250sqm corresponding to 8*8 blocks of pixels in a relatively homogenous area<br />

and calibrated the relationship between field verified data and fifty of these blocks<br />

(using them as training areas for the study). The study used 12 colour bands which<br />

were allocated into nine classes and identified a link between these and vegetation<br />

classes. In the conclusion to the study the author noted that it was possible to<br />

portray sub-polygon variation using pixel-based imagery.<br />

In order to complete this study it was necessary to make the best use of the<br />

available imagery. This imagery can benefit from preprocessing in order to<br />

highlight the areas being captured. One method for achieving this might be to<br />



apply an algorithm to colour the data so that the target areas are easily captured.

This is something which was identified by Thomas Knudsen in his 2005 study of<br />

pseudo natural colour aerial imagery for urban and suburban mapping (and in<br />

previous studies by the same author). In his study he suggests an algorithm for<br />

automatic urban and suburban aerial image interpretation. The paper uses test data<br />

from (pseudo) natural color images used in traditional photogrammetry (as<br />

opposed to airborne four channel imagers). His aim is to discriminate between<br />

vegetation and human made materials, which was also one of the aims of this<br />

thesis. The author cites the relative importance of separating vegetation (which he<br />

considers to be void of mapping objects) and human-made materials in respect to<br />

automated photogrammetric mapping. It is worth noting at this point that imagery<br />

captured in the near-infrared band is generally indicative of vegetation and<br />

Knudsen’s work is an attempt to identify this band using only aerial photography<br />

captured using red, green, blue three channel instruments. His work is very<br />

relevant to this thesis as over the course of his study he identifies a method of<br />

obtaining “excellent” (Knudsen, P.2691) reproduction of grey surfaces (which in<br />

an urban area correspond to paving and exposed rock).<br />

The author takes a look at three algorithms in terms of their effectiveness in<br />

discriminating between areas in scanned aerial photograph. The first is a pseudo<br />

natural color algorithm developed by the author in a previous study (Knudsen<br />

2002) where he managed to create a blue channel based on green, red and near<br />

infrared values and left the green and red values as captured. This allowed for<br />

good reproduction of red surfaces (which corresponded to roof surfaces in the<br />

Danish sample data) but suffers slightly from haze effect.<br />

The second algorithm the author considers is one which creates a blue channel<br />

between green and near infrared and a green channel from similar values and

leaves the red as captured. This, similar to the first algorithm, gave good<br />

reproduction of red surfaces and of vegetation, but failed in reproducing clear grey<br />

surfaces. The third algorithm the author considers involved swapping the green<br />

data for blue, the near infrared for green and leaving the red as was. This allowed<br />

him to reproduce grey surfaces accurately (making them stand out in the<br />



photography) but was less useful for vegetation, leaving an “artificial looking<br />

hue” (Knudsen, P.2691).<br />
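The channel swap described for this third algorithm is simple enough to sketch directly; the snippet below assumes the red, green and near-infrared bands are already available as separate arrays and simply reorders them as described (red kept, near-infrared used as the new green, the original green used as the new blue).

import numpy as np

# Sketch: the swap described above, producing an output ordered (red, green,
# blue) in which grey (hard) surfaces stand out.

def swap_channels(red, green, nir):
    return np.stack([red, nir, green])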

The final part of the paper sets out a method of modifying the first algorithm to<br />

improve its value in creating data for interpretation. The steps are to restore<br />

black/grey/white, vegetation-covered and red/yellow-reddish areas lost in<br />

preprocessing, to re-whiten very bright objects and to amplify the pixels (enhance<br />

the colour saturation). The author provides reference to a more detailed technical<br />

implementation (using information from previous papers he published) but these<br />

are less relevant to this thesis as the result would not be suitable for identifying<br />

hard ground.<br />

One area where there is considerable information to be gained is in the area of<br />

forestry, particularly in capturing the spread of disease or invasive species in a<br />

plantation. One such study was undertaken by M.Martin, S.Newman, J.Aber and<br />

R.Congalton in 1998. I have included it here as I think it is a good example of<br />

what appears to be a standard remotely sensed image analysis. In this study the<br />

authors set out to obtain remote data relating to tree species in an area called<br />

Prospect Hill in central Massachusetts. Their target data was species identified by<br />

11 forest cover types. To do this they used a maximum likelihood algorithm<br />

assigning all pixels in the aerial image to one of the 11 categories they were<br />

searching. The survey was validated using field data (taken from a database of<br />

species type). They note that at that time (the late 1990s) spectral data had already

been used to identify categories of forest cover. These prior surveys had been<br />

successful in discriminating between coniferous and deciduous cover (the authors<br />

cite the examples of Nelson et al., 1985, Shen et al., 1985 and Landthrop et al.,<br />

1994). The primary goal was identification of species composition from the forest<br />

canopy and in this the authors had reasonable success –a random selection of<br />

pixels yielded an overall classification accuracy of 75% (Martin et al 1998). The<br />

study used photographic tiles of 10*10km with a high spectral resolution. The<br />

authors suggest further improvements in accuracy could be made by identifying<br />

the (deciduous) species with both their leaves on and off which would allow for a<br />

foliar biomass calculation to be made.<br />



In 2002 S. Phinn, M. Stanford, P. Scarth, A. Murray and P. Shyy attempted to<br />

apply a similar technique to that used by Megan Lewis in her 1998 study of<br />

vegetation communities, only with reference to vegetation impervious hard<br />

ground in urban areas. This study has similar goals to this thesis, but was<br />

undertaken in an Australian urban context, and does not make use of existing<br />

vector mapping and polygons. In the study by Phinn et al the authors conducted a<br />

survey of the area surrounding the city of Brisbane, in an attempt to establish an<br />

image processing method which would yield information about the urban<br />

environment. In particular they focused on vegetation –impervious surface-soil, or<br />

hard ground. They identified 60 spectral zones which they aggregated based on<br />

their location, they were able to identify distinctive zones of hard ground based on<br />

their per-pixel classification. The data used was from the Landsat5 Thematic<br />

mapper and 1:5000 scale aerial photography. Their aims were to deliminate land<br />

cover and use types, identify areas of impervious and pervious surfaces. One of<br />

the most difficult aspects they encountered was classifying non-vegetation areas<br />

into exposed soil and developed surfaces. They extracted the information along<br />

transects radiating from Brisbane’s city centre. The study noted that water bodies<br />

and vegetation were “separate in all spectral bands” (Phinn et al, 2002). The<br />

maximum separation along the cleared hard ground was along bands 3, 4 and 7. The authors concluded that the increased resolution enabled more detailed assessment of the surface.

One recurring theme within these studies of spectral analysis of aerial<br />

photography (and satellite imagery) was that the success of the study is dependent

on three main factors; the resolution of the photography, the correct segmentation<br />

of the target areas and an accurate knowledge of the colour bands which apply to<br />

the study. To a lesser extent it is also important to introduce methods to correct for<br />

haze, cloud cover and shade. These last three considerations are not the focus of<br />

this study; the fact that they warrant the complete focus of previously published<br />

papers indicates that it would not be possible to completely eliminate their<br />

presence. By applying some of the aspects of masking (Cordero-Sancho & Sader)

and adjusting for shade (Tuominen & Pekkarinen) it might be possible to reduce<br />

the influence of these in the outcome to within an acceptable error margin for high<br />



flown and satellite imagery. This was not necessary in this thesis due to the clarity<br />

of the photography.<br />

5.2 Vector and polygon based studies of aerial photography<br />

I have labeled this body of knowledge of aerial (and satellite) image processing as<br />

vector and polygon based as the trend that unites the studies is the fact that the<br />

authors sought a pattern or shape based method for extracting the information. As<br />

with spectral (and hybrid spectral and spatial) methods, the underlying cause for<br />

the studies can vary, from understanding Alaskan watercourses (van der Werff &<br />

van der Meer, 2008) to examining the built environment in Moscow (Dudarev,<br />

2009). I did not make use of a particular algorithm or technique from these studies,<br />

but have considered them in this review for their potential in offering a method for<br />

subdividing small area polygons. It should be noted that there appears to be a

point where pattern analysis is less beneficial, such as the 500-pixel minimum suggested by H. van der Werff and F. van der Meer.

This thesis makes use of building polygons captured from the Irish peri-urban<br />

landscape. These served both as indicators as to the type of land use in the small<br />

area polygon surrounding them (and possibly a spectral control in terms of the<br />

roof tile value). In terms of pattern analysis any automated identification of<br />

modifications or new buildings should recognize the polygon outline as a building.<br />

This is a particular problem in peri-urban areas; in a rural context newly built slatted sheds etc. will conform to a standard outline, but the shapes are more varied

in urban areas. This is particularly so in the peri-urban Irish landscape, where one-off housing and a fashion for ugly-looking (in terms of aerial analysis) extensions

and sections of building jutting from a main structure mean that establishing a<br />

template pattern for dwellings would be difficult. Outside of considering other<br />

data (such as presence of tarred road etc.) an automated study relying on pattern<br />

identification would need a complicated signature algorithm. This was the basis for Roman Dudarev's 2009 study of building polygon signature point definition. In this he was considering buildings in the context of the city of Moscow, but intended

the algorithm he created for use in a wider variety of data sets. The study tied in<br />



with the author's work for a software development company (Enterra) which is

involved in developing GIS software and the algorithm was also an attempt to find<br />

a solution to identifying building signatures within the software. The paper<br />

develops a workaround for identifying polygons which are consistent with

buildings on a map. He describes this as an ordered point set. This could provide<br />

useful background information should spatial deviation pattern recognition ever<br />

become an option. It is not within the scope of the thesis to modify the algorithm<br />

but nevertheless Dudarev's study provides a possible alternative to the spectral

deviation method of identifying additional data.<br />

The author notes that building signatures in many cases will not look “neat and<br />

beautiful" (Dudarev, P.109). By this he is referring to the fact that the polygon

will not conform to a standard shape which would be easily identified. It should be<br />

noted that the author acknowledges that a similar algorithm exists for the product<br />

of another software provider, ESRI (with ArcView software), and that the test of

the algorithm was carried out on uncoordinated data; implying that any application<br />

of the algorithm in this thesis would involve a high level of modification for<br />

something that could be obtained from a desktop application. It is however, useful<br />

to consider the methodology for breaking down the problem (the author outlined<br />

an implementation algorithm before considering the process steps required).<br />

Dudarev broke down the necessary implementation into five steps, starting with a

search for a convex polygon inscribed within the shape being identified. He then<br />

suggested that if the polygon shape returned from this was bigger than the<br />

maximum (original shape) then the new polygon should be considered as the<br />

maximum, otherwise another convex polygon within the shape should be<br />

identified. This step is repeated until all the polygons have been searched at which<br />

point the centre of the polygon is searched and the result returned.<br />

Dudarev then stated the mathematical steps that would be required to implement the steps he outlined; namely: take a start point (on the polygon) for the search, then take the neighbor vertex in a set rotation; if this point matches the starting point, return the resulting polygon; if the point addition results in a positive vertex then it is added to the polygon, otherwise the neighbor vertex is selected again. This



algorithm is accompanied by sample C code which can be used to test it. I believe<br />

it could be used if a method of spectral analysis of aerial photography could be<br />

developed to return sharp enough edge detail to identify the component points of<br />

these polygons. This seems unlikely in relation to the imagery (and processing<br />

techniques) that are currently available and the algorithm is probably of more use<br />

in a situation where the vector detail had already been manually captured (in<br />

which case the appropriate building code should also be present).<br />

Much of the other work involved in manipulation of polygons could be said to fall<br />

under the banner of graphic editing of GIS data (as could also be the case with<br />

Dudarev's algorithm). This type of work (physically manipulating and extracting specific polygons in vector format) would be of particular significance if pattern

analysis was being used in this study. If particular patterns could be identified then<br />

algorithms for clipping and determining intersections between polygons, such as the one developed by Kui Liu et al in 2007, would be a central part of the process.

This would then mean the study would take polygon edges as the basis for<br />

captured data and perform calculations to construct an output polygon. As these<br />

polygons have already been manually captured, this body of work is slightly less

relevant. That is not to say that they would not be of central value to an automated<br />

image processing technique should it become viable.<br />

In terms of studies, this concept has been the focus of a lot of effort, such as Pal & Foody's 2009 feature selection study, which showed that accuracy of classification

declines with additional features when using support vector machines. The fact<br />

that an underlying verifiable automatic technique for identifying change in<br />

photography and converting it to accurate vector data has not been yielded from<br />

these studies indicates that it is probably something that will always be specific to<br />

the terrain being analyzed. This work is beyond the scope of this thesis so it<br />

focused on an aspect of polygon identification that could be applied to a more<br />

general spectral analysis. One previous attempt at this is H. van der Werff and F. van der Meer's 2008 study into a shape-based algorithm for identifying spectrally

identical objects. In this study the authors took a look at the potential of shape<br />

signatures in aerial imagery in order to establish a means of identifying and<br />

classifying the object. They look at three broad methods for this: solely shape-based analysis, solely spectral-based analysis and a combined "spatial-spectral classification" (H. van der Werff & F. van der Meer, P.251). These studies are slightly different from the one being undertaken in this thesis in that the shape and classification of much of the data will already be known; however, the study is

useful to this thesis in that it suggests the potential for a method of identifying new<br />

farm buildings based on a similar classification. The authors are seeking a method<br />

to enhance pixel-based spectral classifications (as will be used in this thesis) by<br />

adding spatial information. It is worth noting that the results of the study were not<br />

satisfactory in terms of automatically correctly identifying features.<br />

The first step the authors used to determine the shape of the areas being examined<br />

was to "seed" (H. van der Werff & F. van der Meer, P.252) the object. This

involved beginning an object with a single pixel of a set value and increasing the<br />

size of the area until a spectral variance in a non-overlapping 3*3 pixel block occurs.

This part of the study continued until all the image pixels were segmented into<br />

objects. The authors noted that size was a factor at this point and objects of 500<br />

pixels or less were more successfully determined. The study itself was looking at<br />

parts of Alaska, and the objects being classified were water bodies; i.e. separating streams from ox-bow lakes, thaw waters from rivers and sediment-rich water. This is a difficult task due to the relatively random nature of these shapes when compared

to a well defined linear pattern that can be observed in the Irish landscape. The<br />

authors conclude (in the case of water bodies) “(that) an object should consist of<br />

approximately 500 pixels at minimum to be able to use the absolute value of shape<br />

measurements" (H. van der Werff & F. van der Meer, P.257).
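A much-simplified sketch of this kind of seeding and growing step is given below; it is not the authors' implementation, and the 4-neighbour growth rule and tolerance are assumptions made only to illustrate the idea.

import numpy as np
from collections import deque

# Sketch: grow a region from a single seed pixel while neighbouring values
# stay within a tolerance of the seed value, returning a boolean mask of the
# object found.

def grow_region(band, seed, tolerance=5):
    rows, cols = band.shape
    seed_value = float(band[seed])
    region = np.zeros(band.shape, dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                if abs(float(band[nr, nc]) - seed_value) <= tolerance:
                    region[nr, nc] = True
                    queue.append((nr, nc))
    return region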

The authors created a combined analysis method by classifying shapes according<br />

to threes spectral bands from the imagery being used and comparing the results<br />

against the pixel based shape measurements. Using these results they were unable<br />

to distinguish between the water bodies being considered by the study and the<br />

authors suggest that further research is required to better combine the two (shape<br />

and spectral) classifications. In some ways this thesis is a continuation of this, in<br />

that it will be using a spectral analysis in combination with spectral signatures (in<br />

the form of previously captured and coded vector data). The aim the authors had<br />

was to established a means of measurement using an “unbiased software<br />

162


algorithm” (H. van der Meer & F. van der Meer, P.257), this would seem difficult<br />

to achieve in the case of a relatively chaotic Alaskan wilderness but might be<br />

better applied to peri-urban land parcels.<br />
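
As a rough illustration of what a combined spatial-spectral measurement can look like, the sketch below pairs a basic shape descriptor (compactness derived from area and perimeter) with the mean of each spectral band for a segmented object. It is an assumption-laden example written for this review, not the authors' method; the resulting feature vector is simply the kind of input a later classifier could consume.

```python
import numpy as np

def object_features(mask, bands):
    """Build a combined shape/spectral feature vector for one object.

    mask  : 2-D boolean array marking the object's pixels
    bands : list of 2-D arrays (e.g. red, green and near-infrared)
    """
    area = int(mask.sum())
    # Crude perimeter estimate: object pixels with at least one
    # 4-connected neighbour outside the object.
    padded = np.pad(mask, 1, mode='constant', constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    # Compactness is close to 1 for rounded shapes and falls for
    # elongated ones.
    compactness = 4.0 * np.pi * area / (perimeter ** 2) if perimeter else 0.0
    spectral_means = [float(b[mask].mean()) for b in bands]
    return [area, compactness] + spectral_means
```

Compactness sits near 1 for rounded lakes and falls sharply for narrow streams, which is the sort of separation the authors were hoping shape measurements would provide.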

Another example of a study which combines a number of different aspects of remote sensing to analyze aerial data is the 2007 random field model for urban area detection developed by Ping Zhong and Runsheng Wang. In this study the authors presented a method for interpreting remote images of urban environments that makes use of what they call “conditional random fields” (Zhong & Wang, 2007). The study is a response to the fact that, although considerable research has been completed on land cover analysis, the algorithms generally adapt to only a narrow range of image resolutions and therefore only a few types of urban area. They see previous attempts at urban analysis as being based on either gray-level-based spectral analysis or texture descriptors. They further note that edge strength measures can be used to extract homogeneous regions. This is an interesting concept, and may have an application in the automatic capture of large utility features in rural areas, such as silage pits.

The authors establish a discriminative method for identifying regions in the photography based on interactions with the neighboring regions. This allows them to use the conditional random fields contextually to identify areas. The authors broke the technique into the tasks of configuring the features, selecting classifiers and fusing the classifiers. The proposed algorithm compares the fields against the data segments and places them in a classified, segmented model; the authors compare their results against two previous algorithms, Stacked Feature Based (where a number of different feature types are concatenated into one model) and Straight Line Statistics (where areas of high incidence are used to identify urban areas). They observed a higher output rate than the first method (based on processing time on a 2.4 GHz Pentium machine) and decreased accuracy in detecting smaller rural areas against the second (where straight line statistics were not effective for urban areas smaller than 400*400 pixels). The method the authors use, of allowing each component part of the search to train based on “its own aspects” (Zhong & Wang, 2007), appeared to give positive results against the 60 training and 91 test images used, and was able to successfully identify blocks of 16*16 pixels as urban or non-urban. The study reported 85.3% accuracy in correctly identifying blocks as urban (Zhong & Wang, p. 3986). The overall methodology is probably best suited to a larger study area; however, it may be possible to apply the multiple conditional random fields model at a smaller scale with success.
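
The block-level framing of the problem is easy to sketch. The fragment below is an illustration written for this review, not Zhong and Wang's ensemble: it tiles a single band into 16*16 blocks, lets a few placeholder classifiers vote on each tile and fuses the votes by simple majority. The texture and edge thresholds are arbitrary assumptions.

```python
import numpy as np

def classify_blocks(image, classifiers, block=16):
    """Label each block*block tile as urban (True) or non-urban (False).

    image       : 2-D array (a single band, for simplicity)
    classifiers : list of functions, each mapping a tile to True/False
    """
    rows, cols = image.shape
    labels = np.zeros((rows // block, cols // block), dtype=bool)
    for i in range(rows // block):
        for j in range(cols // block):
            tile = image[i * block:(i + 1) * block,
                         j * block:(j + 1) * block]
            votes = [clf(tile) for clf in classifiers]
            # Simple majority-vote fusion of the individual classifiers.
            labels[i, j] = sum(votes) > len(votes) / 2
    return labels

# Placeholder classifiers: high local variance or strong horizontal
# gradients are taken as loose hints of built-up land.
texture_clf = lambda t: t.std() > 25.0
edge_clf = lambda t: np.abs(np.diff(t.astype(float), axis=1)).mean() > 8.0
```

In the paper each component classifier is trained on its own aspects of the data and the fusion itself is learned; the sketch only shows the tiling and voting structure.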

A further example of similar methodology being applied to aerial data on a large scale is the 2009 study of the Guangzhou urban area by Fenglei Fan, Yunpeng Wang, Maohui Qiu and Zhishi Wang. Although results similar to those the authors achieved in their much larger study would amount to a failure for this thesis, the study indicates that a great deal can be determined through automatic image analysis, even with the disadvantages of poor imagery, random settlement patterns and a large test area. In the study the authors set out to examine urban growth as experienced by the people of Guangzhou (a city of 7.5 million inhabitants in the southern Chinese province of Guangdong). They were limited by the available imagery: their study attempted to extract urban areas from a series of images dating back to the 1970s, and some cycles were not available. They determined that fractal geometry was useful in studying the development of the city and that a “fractal dimension index is an effective index to evaluate urban form” (Fan et al, 2009). The study area covered 3,178 sq km, and five separate years were taken as sample points in time to identify a pattern in the city's development.
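
Because the fractal dimension index is central to how the study evaluates urban form, a worked sketch may help. The routine below is a generic box-counting estimator applied to a binary urban mask; it is a textbook formulation rather than the specific index used by Fan et al.

```python
import numpy as np

def box_counting_dimension(mask, box_sizes=(2, 4, 8, 16, 32)):
    """Estimate the box-counting fractal dimension of a binary urban mask.

    The dimension is the slope of log N(s) against log(1/s), where N(s)
    is the number of s*s boxes that contain at least one urban pixel.
    """
    counts = []
    for s in box_sizes:
        rows, cols = mask.shape
        # Trim so the mask divides evenly into s*s boxes.
        trimmed = mask[:rows - rows % s, :cols - cols % s]
        r, c = trimmed.shape
        boxes = trimmed.reshape(r // s, s, c // s, s)
        occupied = boxes.any(axis=(1, 3))
        counts.append(max(int(occupied.sum()), 1))
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)),
                          np.log(np.array(counts)), 1)
    return slope
```

Values approaching 2 indicate a compact, space-filling urban form, while lower values point to more fragmented growth.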

The data capture was completed using a maximum likelihood algorithm performed on the images. The algorithm used seven categories to classify the imagery: the target urban settlement, forestry, cropland, orchard, natural water, artificial water and bare land (vegetation-free surface area outside the urban settlement). In order to verify the accuracy of this classification the authors took reference data captured from fieldwork and separate land use mapping, and sampled the results of their study against each category in the reference. They achieved an accuracy in correct classification of over 80% using this method.
The study segmented the imagery using two transects: one running from west to east, comprising nine blocks of 130*130 pixels, and one running south-west to north-east, comprising ten blocks of the same size.
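
A maximum likelihood classifier of the general kind used for this data capture can be sketched briefly. The version below is a simplified illustration, not the study's software: it assumes each class follows a multivariate Gaussian estimated from training pixels and assigns every pixel to the class with the highest log-likelihood.

```python
import numpy as np

def train_class_stats(training_pixels):
    """training_pixels: dict mapping class name -> (n_samples, n_bands) array."""
    stats = {}
    for name, samples in training_pixels.items():
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)
        stats[name] = (mean, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return stats

def classify_pixels(pixels, stats):
    """pixels: (n_pixels, n_bands) array; returns a list of class names."""
    names = list(stats)
    scores = np.empty((pixels.shape[0], len(names)))
    for k, name in enumerate(names):
        mean, inv_cov, logdet = stats[name]
        d = pixels - mean
        # Gaussian log-likelihood up to a constant: -0.5*(log|C| + d'C^-1 d)
        mahal = np.einsum('ij,jk,ik->i', d, inv_cov, d)
        scores[:, k] = -0.5 * (logdet + mahal)
    return [names[i] for i in scores.argmax(axis=1)]
```

The seven training sets would correspond to the study's urban settlement, forestry, cropland, orchard, natural water, artificial water and bare land categories, with accuracy then checked against independent reference data in the same spirit as the authors' fieldwork-based verification.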

The Guangzhou study is useful in proving that a relatively high level of accuracy can be obtained when automatically capturing urban data over a large scale. This thesis improved on that accuracy by using a smaller area, higher-resolution imagery and additional indicators (vector and code data imported from large-scale mapping).

This thesis is fortunate not to have the variety in landscape patterns that previous studies have had to contend with, which meant that a high accuracy level was possible. In much of the available literature the studies are completed on a very large scale (as in the previous two papers) with very specific data in mind: they attempt to identify particular plant species or types of urban development. The methodology used for this study may benefit from applying some of those techniques to a more stable sample. There are several advantages present in the area being targeted. The temperate nature of the Irish climate means that areas which are not developed will be covered by vegetation, and so may fall into the near-infrared category, while areas under development should display values consistent with earthworks or paving. At the outset of the study it was expected that most roofing would fall within a relatively small range of colors and could be used to calibrate the search. This was not the case; however, tarmac road data proved a useful replacement, providing a consistent spectral property throughout the image.
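
As an illustration of how consistent road spectra can calibrate a search, the sketch below samples the image pixels that fall under known road areas and summarises them into a reference signature with a simple tolerance test. The prior rasterisation of the vector road polygons and the two-standard-deviation tolerance are assumptions made for this example, not a description of the software built for this thesis.

```python
import numpy as np

def road_signature(bands, road_mask):
    """Summarise the spectral values found under known road areas.

    bands     : list of 2-D arrays (one per spectral band)
    road_mask : 2-D boolean array, True where vector road polygons fall
                (assumed to have been rasterised beforehand)
    """
    means = np.array([b[road_mask].mean() for b in bands])
    stds = np.array([b[road_mask].std() for b in bands])
    return means, stds

def matches_signature(pixel, means, stds, k=2.0):
    """True if a pixel lies within k standard deviations of the road
    signature in every band, a simple calibration-style test."""
    return bool(np.all(np.abs(np.asarray(pixel, float) - means) <= k * stds))
```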

The thesis looked at a very specific aspect of this body of knowledge and attempted to bridge the gap between automatic aerial data capture and traditional photogrammetric methods. It is noted that in most of the study areas the authors did not benefit from the availability of large-scale coded vector data, and the premise for this study was that, where such data is available, the accuracy of automatic capture can be increased. At the core of all of the literature mentioned in this review is the classification of imagery (with the exception of the point signature algorithm proposed by Dudarev). In the course of this review I encountered one study which posed one of the same questions that are considered in this thesis: can the use of geometric information increase classification accuracies in aerial image processing? This study (Bellens et al, 2008) proposed a method of morphological profiling to improve the data capture, and the authors identified a “substantial improvement” (Bellens et al, p. 2803). The study points out that urban areas such as roads and car parks are so similar spectrally that they cannot be separated by a spectral analysis alone. The authors further divide spectral analysis into pixel-based and object-based approaches; object-based methods group pixels together in a meaningful way, something which the authors identify as a difficult task. It is the intention of this thesis to use the former method. The authors identify a method for automatically obtaining structuring elements to help construct the segmentation (such as solid rectangular objects, roofs etc.), allowing a shape index to help extract man-made structures from the image.
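
A directional morphological operation of the broad kind Bellens et al. describe can be hinted at in a few lines. The sketch below is only loosely inspired by their profiles: it applies greyscale openings with elongated structuring elements in two orientations, which suppress bright structures narrower than the element and therefore respond differently to long, thin roads than to compact rectangular roofs. The element lengths are arbitrary choices for the example.

```python
import numpy as np
from scipy.ndimage import grey_opening

def directional_openings(band, lengths=(5, 15, 31)):
    """Apply openings with horizontal and vertical line-like elements.

    band    : 2-D array of intensity values
    lengths : element lengths in pixels; larger elements remove
              progressively wider bright structures
    Returns a list of (horizontal_opening, vertical_opening) pairs,
    a crude stand-in for a directional morphological profile.
    """
    profile = []
    for n in lengths:
        # Opening with a 1*n element removes bright features whose
        # horizontal extent is smaller than n; n*1 does the same vertically.
        horiz = grey_opening(band, size=(1, n))
        vert = grey_opening(band, size=(n, 1))
        profile.append((horiz, vert))
    return profile
```

Differences between successive openings then act as shape-sensitive features that can sit alongside the raw spectral values in a classifier, which is the combination the authors credit with the improvement.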

One observation that can be made from the available literature on automatic aerial image processing is that, even with accurately segmented imagery (such as clearly divided vegetation and urban areas), a considerable amount of work is involved in training the algorithms to classify target areas. The creation of a standard key, which can be extended by the user, became one of the main focuses of this study. This thesis enables the user to reduce the workload by presenting a method for quickly calibrating an automated search.

6 References

Geospatial Data Abstraction Library (2010) GDAL utility programs. Retrieved on 18th August 2010 from: http://www.gdal.org/gdal_utilities.html

Universidade do Algarve (2010) MIRONE. Retrieved on 8th July 2010 from: http://w3.ualg.pt/~jluis/mirone/

PCI Geomatics (2010) Geomatica. Retrieved on 5th June 2010 from: http://www.pcigeomatics.com/index.php?option=com_content&view=article&id=5&Itemid=4

OpenEV (2006) Geospatial Toolkit. Retrieved on 5th June 2010 from: http://openev.sourceforge.net/

Josef Kittler (1983) Image processing for remote sensing. Philosophical Transactions of the Royal Society of London, Series A, 309, 323–335.

Thomas Knudsen (2005) Pseudo natural colour aerial imagery for urban and suburban mapping. International Journal of Remote Sensing, Vol. 26, No. 12, 2689–2698.

Roman Dudarev (2009) Plain Polygon Signature Point Definition Algorithm. Surveying and Land Information Science, 69, No. 2.

S. Cordero-Sancho & S. A. Sader (2007) Spectral analysis and classification accuracy of coffee crops using Landsat and a topographic-environmental model. International Journal of Remote Sensing, Vol. 28, No. 7, 1577–1593.

H. van der Werff and F. van der Meer (2008) Shape-based classification of spectrally identical objects. ISPRS Journal of Photogrammetry & Remote Sensing, 63, 251–258.

S. Phinn, M. Stanford, P. Scarth, A. Murray and P. Shyy (2002) Monitoring the composition of urban environments based on the vegetation–impervious surface–soil (VIS) model by sub-pixel analysis techniques. International Journal of Remote Sensing, Vol. 23, No. 20, 4131–4153.

Sakari Tuominen and Anssi Pekkarinen (2004) Local radiometric correction of digital aerial photographs for multi-source forest inventory. Remote Sensing of Environment, 89, 72–82.

Manuel A. Aguilar, Fernando J. Aguilar and Francisco Aguilera (2005) Mapping small areas using a low-cost close range photogrammetric package with aerial photography. The Photogrammetric Record, 20(112), 335–350.

Yuyu Zhou and Y. Q. Wang (2007) An assessment of impervious surface areas in Rhode Island. Northeastern Naturalist, 14(4), 643–650.

Megan M. Lewis (1998) Numeric classification as an aid to spectral mapping of vegetation communities. Plant Ecology, 136, 133–149.

Yong Kui Liu, Xiao Qiang Wang, Shu Zhe Bao, Matej Gombosi and Borut Zalik (2007) An algorithm for polygon clipping, and for determining polygon intersections and unions. Computers & Geosciences, 33, 589–598.

Rama Rao Nidamanuri and Bernd Zbell (2010) A method for selecting optimal spectral resolution and comparison metric for material mapping by spectral library search. Progress in Physical Geography, 34(1), 47–58.

Xiaoping Liu, Xia Li and Xiaohu Zhang (2009) Determining Class Proportions Within a Pixel Using a New Mixed-Label Analysis Method. IEEE Transactions on Geoscience and Remote Sensing.

Mahesh Pal and Giles M. Foody (2009) Feature Selection for Classification of Hyperspectral Data by SVM. IEEE Transactions on Geoscience and Remote Sensing.

Bingcai Zhang and Neal Olander (2000) How to get GIS Data from Imagery. ESRI User Conference 2000 Proceedings. Retrieved on 2 March 2010 from: proceedings.esri.com/library/userconf/proc00/professional/papers/pap427/p427.htm

Ping Zhong and Runsheng Wang (2007) A Multiple Conditional Random Fields Ensemble Model for Urban Area Detection in Remote Sensing Optical Images. IEEE Transactions on Geoscience and Remote Sensing, Vol. 45, No. 12.

Fenglei Fan, Yunpeng Wang, Maohui Qiu and Zhishi Wang (2009) Evaluating the Temporal and Spatial Urban Expansion Patterns of Guangzhou from 1979 to 2003 by Remote Sensing and GIS Methods. International Journal of Geographical Information Science, Vol. 23, No. 11, 1371–1388.

M. E. Martin, S. D. Newman, J. D. Aber and R. G. Congalton (1998) Determining Forest Species Composition Using High Spectral Resolution Remote Sensing Data. Remote Sensing of Environment, 65, 249–254.

Nelson, R. F., Latty, R. S. and Mott, G. (1985) Classifying northern forests using Thematic Mapper Simulator data. Photogrammetric Engineering and Remote Sensing, 50, 607–617.

Shen, S. S., Badhwar, G. D. and Carnes, J. G. (1985) Separability of boreal forest species in the Lake Jennette area. Photogrammetric Engineering and Remote Sensing, 51, 1775–1783.

Lathrop, R. G., Aber, J. D., Bognar, J. A., Ollinger, S. V., Casset, S. and Ellis, J. M. (1994) GIS development to support regional simulation modeling of northeastern (USA) forest ecosystems. In Environmental Information Management and Analysis (W. Michener, J. W. Brunt and S. Stafford, Eds.), Taylor and Francis, London, pp. 431–451.

Skidmore, A. K. (1989) An expert system classifies eucalypt forest types.

Rik Bellens, Sidharta Gautama, Leyden Martinez-Fonte, Wilfried Philips, Jonathan Cheung-Wai Chan and Frank Canters (2008) Improved Classification of VHR Images of Urban Areas Using Directional Morphological Profiles. IEEE Transactions on Geoscience and Remote Sensing, Vol. 46, No. 10.
