Automated Aerial Image Analysis using Ordnance Survey Vector Data
Brian Sexton
Master of Science
NUI Galway
Department of Information Technology
August 2010
Dr. James Duggan
Dr. Sam Redfern
Certificate of Authorship
Contents
Certificate of Authorship
Contents
List of Tables
List of Figures
Abstract
1 Project Outline
1.1 Project Overview
1.2 General Introduction and Background
2 Stepping through the Algorithm
2.1 Initial Inputs
2.2 Area Extraction
2.3 Spectral Value Comparison
2.4 Confirmation
3 Sampling for the Baseline Image Key
3.1 Roads
3.2 Water
3.3 Marsh
3.4 Coniferous Forestry
3.5 Mixed Forestry
3.6 Track
3.7 Shade
3.8 Roof Areas
3.9 Pasture
3.10 Rough Pasture
4 Testing
4.1 Pasture Test
4.2 Rough Pasture Test
4.3 Marsh Test
4.4 Bog Test
4.5 Conclusion
5 Literature Review
5.1 Spectral and image considerations for the thesis
5.2 Vector and polygon based studies of aerial photography
6 References
List of Tables
Table 1: Road sample values
Table 2: Road test sample value 1
Table 3: Road test sample value 2
Table 4: Road test sample value 3
Table 5: Water sample values
Table 6: Water test sample values
Table 7: Marsh sample values
Table 8: Marsh test sample values
Table 9: Coniferous forestry sample values
Table 10: Coniferous forestry test sample values
Table 11: Mixed forestry sample values
Table 12: Mixed forestry test sample values
Table 13: Track sample values
Table 14: Track test sample values
Table 15: Shade sample values
Table 16: Shade test sample value 1
Table 17: Shade test sample value 2
Table 18: Shade test sample value 3
Table 19: Roof pixel sample values
Table 20: Roof test sample value 1
Table 21: Roof test sample value 2
Table 22: Roof test sample value 3
Table 23: Pasture sample values
Table 24: Pasture test sample values
Table 25: Rough pasture sample values
Table 26: Rough pasture test sample values
List of Figures
Figure 1: Aerial view of sample area
Figure 2: Road area and surrounding detail
Figure 3: Road area and vector data
Figure 4: Typical Water Area Image
Figure 5: Water Area Image Modification
Figure 6: Sample area as a mosaic of polygons
Figure 7: Typical Marsh Area Image
Figure 8: Typical Mixed Forestry Area Image
Figure 9: Typical Track Area Image
Figure 10: Typical Shade Area Image
Figure 11: Histogram for Shade and Pasture
Figure 12: Typical Roof Value Area Image
Figure 13: Distribution of Buildings/Roofs in the Sample
Figure 14: Blue colour band pixel count for study area
Figure 15: Typical Pasture Area Image
Figure 16: Typical Rough Pasture Area Image
Figure 17: Creating the ASCII file
Figure 18: Aerial view of pasture test 1
Figure 19: Red colour band for pasture test 1
Figure 20: Green colour band for pasture test 1
Figure 21: Aerial view of pasture test 2
Figure 22: Red colour band for pasture test 2
Figure 23: Green colour band for pasture test 2
Figure 24: Aerial view of pasture test 3
Figure 25: Red colour band for pasture test 3
Figure 26: Green colour band for pasture test 3
Figure 27: Vector data for pasture test 4
Figure 28: Aerial view of pasture test 4
Figure 29: Red colour band for pasture test 4
Figure 30: Green colour band for pasture test 4
Figure 31: Vector data for rough pasture test 1
Figure 32: Aerial view of rough pasture test 1
Figure 33: Red colour band for rough pasture test 1
Figure 34: Green colour band for rough pasture test 1
Figure 35: Aerial view of rough pasture test 2
Figure 36: Red colour band for rough pasture test 2
Figure 37: Green colour band for rough pasture test 2
Figure 38: Aerial view of rough pasture test 3
Figure 39: Red colour band for rough pasture test 3
Figure 40: Green colour band for rough pasture test 3
Figure 41: Vector data for rough pasture test 4
Figure 42: Aerial view of rough pasture test 4
Figure 43: Red colour band for rough pasture test 4
Figure 44: Green colour band for rough pasture test 4
Figure 45: Vector data for marsh test 1
Figure 46: Aerial view of marsh test 1
Figure 47: Red colour band for marsh test 1
Figure 48: Green colour band marsh test 1
Figure 49: Blue colour band for marsh test 1
Figure 50: Aerial view of marsh test 2
Figure 51: Red colour band for marsh test 2
Figure 52: Green colour band marsh test 2
Figure 53: Blue colour band for marsh test 2
Figure 54: Aerial view of marsh test 3
Figure 55: Red colour band for marsh test 3
Figure 56: Blue colour band for marsh test 3
Figure 57: Aerial view of marsh test 4
Figure 58: Red colour band for marsh test 4
Figure 59: Blue colour band for marsh test 4
Figure 60: Vector data for bog test 1
Figure 61: Aerial view for bog test 1
Figure 62: Red colour band for bog test 1
Figure 63: Green colour band for bog test 1
Figure 64: Blue colour band for bog test 1
Figure 65: Vector data for bog test 2
Figure 66: Aerial view for bog test 2
Figure 67: Red colour band for bog test 2
Figure 68: Green colour band for bog test 2
Figure 69: Blue colour band for bog test 2
Figure 70: Aerial view for bog test 3
Figure 71: Red colour band for bog test 3
Figure 72: Green colour band for bog test 3
Figure 73: Blue colour band for bog test 3
Figure 74: Aerial view for bog test 4
Figure 75: Red colour band for bog test 4
Figure 76: Green colour band for bog test 4
Figure 77: Blue colour band for bog test 4
Abstract
This study sets out an algorithm for the automatic analysis of controlled (flattened)
aerial photography using Ordnance Survey vector data. It uses the vector data to clip the
aerial image into a set of small area polygons which are then analyzed for their
spectral properties and classified according to the result. The study tests sections
of aerial photography from a sample area in County Galway for specific spectral<br />
properties. This was to identify the type of ground cover and was achieved using<br />
an image key of spectral properties which was developed during the study. This is<br />
called training the image key. A testing section shows that it is possible to derive<br />
information about the land use type from these areas based on the range of values<br />
returned from a pixel count of spectral properties within a small area polygon.<br />
The study uses several open source software frameworks to complete the<br />
experiment, most notably the MATLAB based Mirone application, but can be<br />
extended to any software capable of handling irregular polygons in a projection<br />
system. The body of the study is set out in three chapters: the first details the
process, the second details the sampling for unique values and the third details
the testing which took place. Chapters 3 and 4 are subdivided into sections
describing the research on specific land use types and their spectral signatures.<br />
Chopping an aerial image into a mosaic of (relatively) homogeneous values, e.g.
pasture, forestry, marsh etc., increases the accuracy of automated analysis. This is<br />
the first time that a spectral analysis has been attempted using Ordnance Survey
Ireland small area polygons to clip the image. It is of interest to researchers,
planners, developers etc., looking to simplify and automate this type of search<br />
over a large region.<br />
Note: Contact brian.sexton@osi.ie for a set of sample files for training the image<br />
key.<br />
1 Project Outline<br />
1.1 Project Overview<br />
The following study is an attempt to devise an automatic method of analyzing<br />
aerial photography based on vector data. It presents an algorithm and calibration<br />
data for someone seeking to complete a search of aerial imagery based on a<br />
spectral signature. The premise on which the work was undertaken was that, given<br />
the small area polygons and coding data present in Ordnance Survey data, it should
be possible to cut a controlled aerial photograph into a mosaic of sections and<br />
automatically identify the type of ground cover.<br />
The goal of the study was to identify a series of steps with which this could be<br />
completed. These steps are intended as a template for either a standalone<br />
application which could run searches over large geographical areas, or as a means<br />
of achieving the result for smaller areas using existing open source software<br />
libraries. This document is aimed at people seeking to develop a generic tool for<br />
completing a spectral analysis of aerial photography, or for anyone looking to<br />
execute a search of the Irish landscape for data which exhibits a distinct spectral<br />
value (crop disease, impermeable surface area, flooding etc.). The study differs<br />
from other methods of automatic image processing in that it takes existing<br />
analysis (in the form of Ordnance Survey vector data) and uses it to convert the<br />
spectral data into manageable sections. This summary presents a chronological<br />
overview of the work completed, and outlines the process used.<br />
The basis of this study is the clipping of raster data for spectral analysis, and the<br />
intention is to prove that this makes the process of image analysis easier.<br />
Traditional approaches can refine the analysis itself to reveal more about the<br />
region of interest than is completed here. For example, a lidar (aerial laser imaging)<br />
survey could provide researchers with data relating to the height of a tree canopy<br />
or the depth of peat in a bog. This study, while it does identify specific sets of<br />
values relating to land use types, focuses on identifying an easily replicated<br />
process for automatically obtaining data from aerial imagery.<br />
The process was designed so that it can be coded into a standalone process and has<br />
been tested using open source software. In this way the study is aimed at<br />
simplifying what can be expensive and time consuming into a series of steps that<br />
someone without a high level of training in either mapping or computer science<br />
could run.<br />
One of the difficulties presented by attempting to find data from imagery is<br />
identifying target areas within a region of interest. This is compounded by the<br />
nature of values returned by an aerial image: the clusters of pixels with similar
values are often not bounded by clean borders and often display a gradual gradient
of values when merging with another cluster. In other words, to automatically
determine the true values on the ground a program needs to know the extent of the
set of data that the clusters sit in. One analogy might be tables in a relational
database: by dividing the total set of pixels for a region of interest into discrete
parcels of land reflected in the photograph, the program has a database of tables to
query for specific values. This study takes the vector data and uses it to create a
mosaic of separate pixel groups for analysis. This in itself, however, still leaves a<br />
huge body of data to be analyzed. To further improve an analysis the parcels<br />
within the mosaic are classified according to known values taken from the vector<br />
coding, and these known values are then used to train an image key, which in turn<br />
allows the remaining parcels to be analyzed. It is this clipping process which is at<br />
the core of this study. This provides a means of accessing the raster data which<br />
readers can easily replicate and automate for their own purpose.<br />
Vector data has been used to target and control aerial image analysis in previous<br />
studies with a degree of success. The studies often involve additional user input to<br />
refine the region of interest so that automated analysis techniques like multivariate<br />
analysis of variance can be applied to the image. This process of refining the<br />
region requires a level of technical expertise which could make a spectral analysis<br />
of aerial imagery too time consuming for many users. An example of one of these<br />
processes is contained in the 2007 assessment of impermeable surface area by<br />
Yuyu Zhou and Y.Q. Wang (Zhou & Wang, 2007). The authors determined that<br />
segmentation would be the most important part of the study and applied an<br />
algorithm of multiple-agent segmentation and classification. This involved<br />
importing transportation data to create buffers along the major roads at varying<br />
distances from the centre to divide the imagery. This process allowed the authors<br />
to determine the nature of ground cover with a high degree of accuracy, revealed<br />
by random point sampling. A description of similar studies and methods is<br />
contained in the literature review at the end of this paper. The preparation
needed to determine the appropriate extent of image segments in such studies can
be difficult to automate. For example, it requires knowledge of the availability
and accuracy of transport data and of how to use it to segment the photography, as
in the previously mentioned study (Zhou & Wang, 2007). An alternative to this
approach is to use vector data of a known accuracy for the image segmentation.
This approach depends on the availability of vector data and confines the focus
of this study to an Irish context.
One of the benefits of applying an algorithm which automatically segments an
aerial image into small area parcels is that it creates a platform on
which further image analysis can take place. This study takes a set of ASCII
coordinates from the vector data and clips the imagery. These parcels are then
classified according to their land use type. A user could then use these
classifications to target specific sets for interpretation. For example, the
type of growth present in marsh areas could be determined by taking the set of
polygons returned for marsh and identifying the percentage of the pixel count
corresponding to the expected value for the growth.
There are two prerequisite data sets for running this type of analysis, and a
section from both of these sets (just west of Oughterard, Co. Galway) was used for
this study. The first requirement is digital vector data from the Ordnance Survey
and the second is colour (RGB) photography stored in GeoTiff format and
projected using Irish Transverse Mercator (to match the vector data). It is possible
to automatically re-project the imagery using GDAL_transform (GDAL, n.d.)
given other projections, but this was not used in this study. The software
requirements are:
• Something which can manipulate and interrogate vector data files.<br />
• Something capable of handling irregular polygons within a coordinate<br />
system.<br />
• Software capable of analyzing pixel values.<br />
In this study a commercial package called Radius Vision was used to export the
coordinate set for each small area polygon. The processing of the image polygons
was completed using Mirone, a MATLAB-based framework tool developed at
the University of the Algarve by Joaquim Luis (Mirone, 2009). The histogram
values for the segmented image sections were obtained using PCI Geomatics'
Geomatica package. For each polygon tested in the study the following steps were
taken, which are the basis for the proposed algorithm:
• Extract the point data which surrounds the polygon(s) within the region of<br />
interest.<br />
• Import the point data into a software procedure to clip the aerial imagery<br />
and save the segmented image in GeoTiff format.<br />
• Run a histogram analysis for the image segment.<br />
• Run comparison procedures to classify the segment.<br />
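The first three steps above can be sketched in a few lines of Python. This is an illustration only and not the Radius Vision, Mirone or Geomatica workflow used in the study. The function names (point_in_polygon, clip_pixels, band_histogram, band_mean) are hypothetical, the image is assumed to be a nested list of RGB tuples rather than a GeoTiff, and the polygon is assumed to have been converted from ground coordinates to pixel units already.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_pixels(image: List[List[Pixel]], polygon: Sequence[Point]) -> List[Pixel]:
    """Collect the pixels whose centres fall inside the polygon.

    Polygon vertices are (column, row) positions in pixel units (assumption).
    """
    clipped = []
    for row, line in enumerate(image):
        for col, pixel in enumerate(line):
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                clipped.append(pixel)
    return clipped


def band_histogram(pixels: Sequence[Pixel], band: int) -> List[int]:
    """256-bin pixel count for one colour band (0 = red, 1 = green, 2 = blue)."""
    counts = [0] * 256
    for pixel in pixels:
        counts[pixel[band]] += 1
    return counts


def band_mean(pixels: Sequence[Pixel], band: int) -> float:
    """Mean value of one colour band over the clipped segment."""
    return sum(p[band] for p in pixels) / len(pixels)


# Made-up 4 x 4 image: a dark 2 x 2 parcel surrounded by bright pixels.
image = [[(200, 200, 200)] * 4 for _ in range(4)]
for row in (1, 2):
    for col in (1, 2):
        image[row][col] = (10, 20, 30)

polygon = [(1, 1), (3, 1), (3, 3), (1, 3)]  # square around the dark parcel
segment = clip_pixels(image, polygon)
print(len(segment), band_mean(segment, 0), band_histogram(segment, 1)[20])
```

The fourth step, comparison against the image key, would then operate on the histogram or mean returned for each segment.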
The point data mentioned above refers to controlled data points which indicate<br />
fixed x and y positions on the ground and can be used to analyze the imagery.<br />
Attached to those points are vectors and coding which, for much of the image,<br />
indicate the type of land use present, e.g. forestry, water, buildings, etc. The<br />
comparison procedures mentioned above refer to values obtained during a<br />
sampling process conducted in the early part of this study. This sampling first took<br />
sections from known polygons and recorded the spectral values for these samples<br />
in order to calibrate an image key. Samples for parcels of types not coded into the<br />
vector data were then taken and the percentage variance between the two sets was<br />
recorded.<br />
The sampling for the study was completed on ten separate types of land use. Five<br />
of these types were identified from the vector data, while the remaining types<br />
were identified using the techniques developed in this study. Separate sections of<br />
the image were extracted for the analysis, ranging from three to ten for each area<br />
type. The samples were uniform, clear examples of each type of terrain, free of
any biasing factors such as shade or overhanging vegetation so as to obtain clear<br />
baseline data. The samples were extracted as GeoTiff images and analyzed for<br />
their spectral qualities. A full description and tables for each sample are contained<br />
in chapter 3, which is divided into sections according to the land use type for easy<br />
reference.<br />
In general terms the results were what might be expected; large bodies of water<br />
(e.g. river, lake) produced a clearly identifiable signature while mixed forestry and<br />
rough pasture areas had a higher level of standard deviation than more uniform<br />
cover. The areas sampled fell under the categories of roofs, roads, water, marsh,
coniferous forestry, rough pasture, mixed forestry, pasture, track and shade. Although shade is not a
distinct area, values for shade (manually identified from the imagery and clipped<br />
for analysis) were used to calibrate the image key so that these could be<br />
recognised when found in polygons identified by the vector data. In a similar way<br />
values taken as representative for spectral qualities present in roadways, for<br />
example, did not include overhanging tree cover which is present in polygons<br />
extracted based on the ground revised vector data.<br />
The aim of this part of the study was to identify a series of proportional values<br />
which could be used to indicate the presence of a land use type for an unknown<br />
polygon. For example, the mean red and green pixel values for the known areas of
water were identified as 30% and 45% of those for pasture, something which an automatic
search could use to flag an area being used as pasture. This sampling was not
intended to be comprehensive in terms of creating a key for use in every possible<br />
automated image search, but was undertaken to prove the potential for automated<br />
image processing based on segmenting the images using small area polygons.<br />
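A proportional comparison of this kind could be sketched as follows. The sketch assumes that mean band values have already been obtained for a known water polygon and for each candidate polygon. The function names, the tolerance of 0.05 and the example mean values are all hypothetical and are not taken from the study; only the 30% and 45% ratios come from the text.

```python
from typing import List, Sequence


def band_ratios(control_means: Sequence[float],
                candidate_means: Sequence[float]) -> List[float]:
    """Ratio of the control mean to the candidate mean for each colour band."""
    return [c / m for c, m in zip(control_means, candidate_means)]


def matches_key(control_means: Sequence[float],
                candidate_means: Sequence[float],
                expected_ratios: Sequence[float],
                tolerance: float = 0.05) -> bool:
    """True if every band ratio is within `tolerance` of the expected ratio.

    The tolerance is an assumed value chosen for illustration.
    """
    if any(m <= 0 for m in candidate_means):
        return False  # avoid dividing by an empty or black segment
    ratios = band_ratios(control_means, candidate_means)
    return all(abs(r - e) <= tolerance for r, e in zip(ratios, expected_ratios))


# Water-to-pasture ratios for the red and green bands, as stated in the text.
WATER_TO_PASTURE = (0.30, 0.45)

# Illustrative mean (red, green) values only, not measurements from the study.
water_means = (30.0, 45.0)
field_a = (100.0, 100.0)
field_b = (60.0, 60.0)

print(matches_key(water_means, field_a, WATER_TO_PASTURE))  # flagged as pasture
print(matches_key(water_means, field_b, WATER_TO_PASTURE))  # not flagged
```

Working with ratios against a control taken from the same image, rather than with absolute pixel values, is one way to make such a key less sensitive to lighting differences between photographs, although the study itself does not test this.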
One surprising result from this sampling was in the values returned for roof<br />
polygons. These polygons did contain two identifiable ranges of values associated<br />
with the pitch in the roof, where the angle of the light created shade on one side,<br />
which might facilitate a process to determine the angle of the pitch. At the<br />
beginning of the study I had believed that these roof values would provide enough<br />
of a control to calibrate most of the image. However, the variation in the ranges of<br />
values from shade to light on the angled surfaces made roof values an unreliable<br />
source of control values for the study. Of the known-value samples, the most
useful in providing a consistent control on which to base comparative procedures
were water, roads and coniferous forestry. Of the unknown types (those not
automatically identified from vector data), pasture and bog had the most distinct
sets of values. The next phase of the study involved testing the algorithm against<br />
these identified spectral values to see if the irregular polygons (with internal<br />
distorting factors) matched the range expected from the sampling.<br />
The testing process followed the outline for the algorithm. Polygons were<br />
extracted from the vector data in the form of a set of coordinate points saved in an<br />
ASCII file which were then used to create a clipping path to cut the relevant<br />
section from the aerial image, which was then saved in GeoTiff format. This file<br />
was then analyzed for its spectral content and the resulting range of pixel values<br />
was compared to those expected for the land use type.<br />
The testing focused on sets of known polygon types for four typical areas (not
coded to the vector data): pasture, marsh, bog and rough pasture. It should be
noted that this testing section of the study represented an execution of the<br />
algorithm but the level of automation can be improved when the vector data is<br />
made available in GML format. Coordinate sets for multiple polygons can be<br />
extracted in one file with GML format, something which is expected in the next<br />
two years.<br />
The areas analyzed were polygons containing marsh, bog, pasture and rough<br />
pasture. Of these, pasture and bog produced the most distinctive spectral traits and<br />
matched expected values, allowing for any comparative procedure to<br />
automatically classify them. Both the marsh and rough pasture sets of samples<br />
contained high levels of deviation from the mean pixel value across the red and<br />
green colour bands with a similar range of values. However, these can be<br />
distinguished by a trough between values corresponding to shade and vegetation<br />
present in all of the red colour band values for rough pasture. A full description of<br />
testing can be found in chapter 4 of this study.<br />
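The idea of separating two classes by a trough in the red band histogram can be sketched as<br />
follows. This is a minimal illustration under stated assumptions, not the procedure used in the<br />
testing: the bin count, the thresholds and the sample pixel values are all invented for the example.<br />

```python
def histogram(values, bins=8, max_value=256):
    """Count pixel values (0..max_value-1) into equal-width bins."""
    width = max_value // bins
    counts = [0] * bins
    for v in values:
        counts[min(v // width, bins - 1)] += 1
    return counts


def has_trough(counts, min_drop=0.5, min_peak_share=0.1):
    """True if an interior bin sits well below substantial peaks on both sides.

    A bin counts as a trough when it holds fewer than min_drop times the
    smaller of the highest bins to its left and right, and both of those
    peaks hold at least min_peak_share of all pixels.
    """
    floor = min_peak_share * sum(counts)
    for i in range(1, len(counts) - 1):
        lower_peak = min(max(counts[:i]), max(counts[i + 1:]))
        if lower_peak >= floor and counts[i] < min_drop * lower_peak:
            return True
    return False


# Made-up red band samples: shade and vegetation form two separate groups
# in the first set, while the second has a single broad spread.
rough_pasture_red = [35] * 20 + [45] * 15 + [100] * 2 + [150] * 18 + [165] * 22
marsh_red = [60] * 5 + [90] * 15 + [110] * 25 + [140] * 15 + [170] * 5

rough_counts = histogram(rough_pasture_red)
marsh_counts = histogram(marsh_red)
```

With these invented samples, `has_trough(rough_counts)` is true and `has_trough(marsh_counts)` is<br />
false; real imagery would need thresholds tuned against the sampled key.<br />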
The results of this study point to the value of accessing the spectral values<br />
contained in aerial imagery through ordnance survey vector data. In almost every<br />
land use tested the polygons returned a consistent pixel count for the type. It is<br />
important to qualify these results by noting that the values are based on an<br />
analysis of the red, green and blue colour bands (so restricted to colour imagery)<br />
and the process relies on the vector data. It does, however, present a relatively<br />
simple means for completing an analysis of aerial imagery. This in turn opens up<br />
the possibility of coding a standalone application for analyzing and comparing<br />
polygons within a region of interest. The procedure takes the form of a series of<br />
loops designed to eliminate known values. Once the known values, followed by<br />
the derived values have been eliminated, the user is left with a relatively small set<br />
of polygons to examine and can apply a key which has been further trained for the<br />
specific study. The thesis layout begins with a description of the background to<br />
the study followed by three chapters.<br />
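The series of elimination loops described above could be sketched along the following lines.<br />
This is a hedged outline rather than the study's implementation: the polygon records, the land<br />
use codes and the value ranges in the key are hypothetical placeholders.<br />

```python
def classify_by_key(mean_rgb, key):
    """Return the first land use whose (low, high) range holds for every band."""
    for land_use, ranges in key.items():
        if all(low <= value <= high
               for value, (low, high) in zip(mean_rgb, ranges)):
            return land_use
    return None


def eliminate(polygons, known_codes, derived_key):
    """Remove known polygons, then derived ones; return results and leftovers."""
    results = {}
    remaining = []
    # Loop 1: polygons already coded in the vector data are eliminated first.
    for poly in polygons:
        if poly["code"] in known_codes:
            results[poly["id"]] = poly["code"]
        else:
            remaining.append(poly)
    # Loop 2: polygons matching a derived (sampled) key are eliminated next.
    flagged = []
    for poly in remaining:
        land_use = classify_by_key(poly["mean_rgb"], derived_key)
        if land_use is None:
            flagged.append(poly["id"])  # left for the user to examine
        else:
            results[poly["id"]] = land_use
    return results, flagged


# Hypothetical inputs: mean (red, green, blue) values per polygon.
polygons = [
    {"id": "P1", "code": "water", "mean_rgb": (40, 60, 90)},
    {"id": "P2", "code": None, "mean_rgb": (95, 140, 70)},
    {"id": "P3", "code": None, "mean_rgb": (85, 70, 55)},
    {"id": "P4", "code": "road", "mean_rgb": (120, 120, 120)},
    {"id": "P5", "code": None, "mean_rgb": (200, 200, 200)},
]
known_codes = {"water", "road", "coniferous forestry"}
derived_key = {
    "pasture": ((80, 110), (120, 160), (50, 90)),
    "bog": ((70, 100), (55, 85), (40, 70)),
}

results, flagged = eliminate(polygons, known_codes, derived_key)
```

Only the polygons left in `flagged` would then need a further, study-specific key.<br />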
1.2 General Introduction and Background<br />
An overview of the work of this study can be found in the executive summary;<br />
this section is intended to provide background information and explain some of<br />
the terms used. The study was written with the intention of making it easy for<br />
someone to access the part of the study relevant to them and then make use of any<br />
techniques identified. For example, if your intention is to identify the percentage<br />
of bog in your region of interest, read the sampling section on bog, followed by<br />
the testing and then read how to apply the algorithm (chapter 2).<br />
I think it might be helpful if I first explained my interest and motivation for this<br />
work. For the past decade I have been involved in the photogrammetric capture of<br />
the vector data used in this study and know that this type of surveying is difficult<br />
and can be extremely tedious, but I believe it does present a template for<br />
automatic capture of additional data from aerial imagery. A brief description of<br />
the nature of this surveying can be found at the end of this section. I believe that it<br />
should be possible with a robust key of spectral data for known polygons (parcels<br />
of surface area enclosed by controlled boundaries such as walls/ fences/ roads etc.)<br />
to automatically search the data for specific values. In other words someone with<br />
little knowledge of mapping or software could select a region of interest and<br />
search for a particular value from either a selection from the imagery or<br />
coordinates imported from a portable GPS device. This could take the form of a<br />
standalone application or through various freely available software packages. This<br />
study focuses on the use of open source software but suggests areas where a<br />
specialized tool could be developed. In general terms, a search of aerial imagery<br />
requires specialized tools and knowledge to access the information contained in<br />
the data (such as the spread of crop disease, level of impermeable surface area etc.)<br />
and can be a time consuming process. This study is an attempt to automate that<br />
kind of search using small area polygons; something which is unique in its<br />
approach.<br />
The study itself involves a mixture of computer science and mapping. As more<br />
and more of the surface of the earth becomes digitally captured and analyzed,<br />
these two fields will by necessity start to merge. This study looks at one small<br />
aspect of mapping and how automated software could be used to increase the<br />
amount of information available to a user. The premise of the study is that, given<br />
enough previously captured and accurate data, it is possible to automatically read<br />
the landscape. In short I hope to take point and line data, slice sections from aerial<br />
photography, and run a spectral analysis. The focus of the study is in the<br />
methodology so that a means for completing automated updating is identified. I<br />
should point out that by updating I am referring to providing information relating<br />
to the percentages of ground cover within a small area polygon. The task of<br />
physically capturing new structures, roads, height values (even when considering<br />
lidar) is probably something that will always require a human eye to interpret the<br />
data to some degree, for example, if the structure is temporary or if road works are<br />
underway.<br />
It might be useful at this point to introduce some more background to this study.<br />
The island of Ireland was fully digitally mapped in 2005 and recently a new<br />
database for this data has been introduced which allows small area polygons to<br />
retain unique identifiers linked to the surrounding geometry and features. The<br />
mapping is on an update cycle but for the most part it can be assumed that these<br />
polygons will remain constant (with the percentage change even lower following<br />
the building boom of the last decade). This opens up the opportunity for someone<br />
to visit the sections of surface area represented by the area polygons over<br />
successive runs of aerial photography and extract land use change data.<br />
I mentioned earlier that the focus of the study is on establishing a methodology;<br />
this was because the motivation for assessing land use change would vary<br />
according to the user. An example of this might be someone considering the<br />
potential for flooding within an area. This person might want to take a look at the<br />
water courses and new housing developments to determine the amount of<br />
impermeable surface area (paving, patios etc.) over the course of their study. To<br />
physically do this, either using a photogrammetry tool such as SOCET SET or<br />
field GPS, would be an onerous task. The aim of this thesis is to provide a method<br />
for automatically doing this. It might seem an obvious point, but the more<br />
information available to a process when commencing an examination of an area,<br />
the higher the chances of successful data being returned. This is where this study<br />
differs from previous attempts at automated data capture. The process being<br />
suggested takes a large amount of previously captured and verified data to aid the<br />
algorithm. By this I mean most sections of the aerial imagery are extracted based<br />
on definite boundaries such as walls, streams, buildings etc. Internal polygons<br />
within the target area such as water, buildings, roads, forestry are also identified<br />
and used to aid the search. In this way the study is entirely dependent on<br />
previously surveyed data. This is something that has not been attempted before<br />
with Irish data. I did not find any similar study from overseas over the course of<br />
my research. I hope to prove to you that this is something that is possible to do<br />
and implement, using sample software.<br />
The software used in the study comes from several open source projects, and also<br />
from one commercial vector data manipulation package (Radius). These are all<br />
packages which could be considered to be generic tools. It is important to note that<br />
this is not in reference to their capabilities or any slight on the people who develop<br />
them but in that the functions being accessed are common to several similar<br />
software packages. For example, the ASCII coordinate files created using Radius<br />
could equally have been produced using ArcView or MicroStation (among others).<br />
The intention was to keep the algorithm as flexible as possible so that users could<br />
adapt it to their available resources.<br />
Some of the primary software tools used in this study come from GDAL (the<br />
Geospatial Data Abstraction Library). In particular, its facility for reading and<br />
writing raster geospatial data formats is used to manipulate the GeoTIFF files<br />
containing the aerial imagery used in the study. GDAL is a project sponsored by the<br />
Open Source Geospatial Foundation, which is a non-profit, non-governmental<br />
organization set up to support the development of open source geospatial software.<br />
The foundation also supports projects such as GeoTools, GRASS GIS, Mapbender,<br />
MapGuide Open Source and MapServer, among others. In this study the GDAL library is<br />
accessed using another open source software package known as OpenEV. This<br />
allows the GDAL library to be presented within an application for displaying and<br />
analyzing the data. As with GDAL, it is implemented in C but has the potential for<br />
manipulation with Python. In this study the processes will be run on a Windows<br />
platform and used as a means of accessing the raster data from the GeoTIFF<br />
imagery. It is necessary to access GDAL in order to open the GeoTIFF using the<br />
appropriate ITM coordinates for the point set defining the search area. This<br />
choice of access to the raster data should not suggest that GDAL and OpenEV are<br />
unique. Similar software exists that could also have been used in the study.<br />
One example is the ImageTool open source software library. This was<br />
developed in the 1990s in the Department of Dental Diagnostic Science at the<br />
University of Texas, and written using C++. GDAL and OpenEV were chosen in<br />
preference because the data returned could be more easily modified to striate<br />
histogram and statistical data with these libraries due to the larger body of work<br />
contained within them. One of the most important software considerations was<br />
flexibility (and extensibility), as ideally the users of any suggested methodology<br />
would modify the process to suit their particular study. As was mentioned above,<br />
the preferred option was to build a top down processing tool tailored to the<br />
methodology which would accept methods from other libraries to create plug-in or<br />
additional functions. For this study OpenEV is substituted for that purpose.<br />
Outside of the basic software, two other core components exist. These are the data<br />
relating to the specific polygons being extracted and the key to represent the<br />
colour values being studied. As with most aerial image analysis studies, defining<br />
and validating this key forms the major part of the work involved. The process is<br />
aided by the availability of data which could be classed as controlled, that is the<br />
knowledge of areas of roof and water, which can be used to reference the other<br />
values in terms of their deviation from these known values.<br />
As mentioned at the start of this general introduction, a commercial geographic<br />
software package was used to extract the coordinates from the vector data (a vital<br />
first step in the algorithm). This particular software was not chosen for any<br />
specific capabilities other than the fact that I already had a licence and wanted to<br />
focus on proving the premise of the study (as opposed to executing the function<br />
over a specially tailored package). There are a number of commercially available<br />
image processing packages which have relevance to this study. One of these,<br />
SOCET SET, could potentially assist the study. This software is a<br />
photogrammetry package developed by BAE Systems for working on aerial<br />
photography. It allows the user to capture three dimensional data points from<br />
overlaid aerial imagery. This is due to a system of triangulation based on the<br />
position of the cameras when the photograph was taken. As much of the data used<br />
to extract the sections of aerial photography being analyzed in this study was<br />
captured using this process it is possible that the study could be completed during<br />
this point in data capture. I am referring here to map production, and the stage at<br />
which line and point data are taken from remote imagery. At this stage in map<br />
production it is possible that an add-on process linked to the photogrammetric<br />
software would allow the user to run an analysis of the polygon at the moment it is<br />
fully captured and coded, which in turn would mean that the area marker and<br />
associated polylines could then be given added data of value to the end user<br />
(spectral content of the polygon/ percentage land cover/ rough pasture/<br />
impermeable surface area etc.). I decided against investigating this further for two<br />
reasons. Firstly it would have involved a difficult and time consuming<br />
collaboration with the software provider. Secondly, and more importantly, this<br />
country has been fully digitally mapped and photogrammetric work is now<br />
confined to update only. This means that the application of any algorithm at the<br />
photogrammetric/ data capture stage would be confined to small, mostly urban<br />
areas.<br />
It is difficult to discuss the manipulation of spatial data without reference to the<br />
ArcGIS package of software created by ESRI. This is widely used in both<br />
commercial and educational entities for interpreting spatial data. In terms of this<br />
study the ArcView package within ArcGIS could have been utilized for image<br />
processing once the input files were converted to shapefile format. The limitations<br />
imposed by having to obtain a licence for the software (outside of trial/<br />
educational versions) precluded the use of this package. This is not to indicate that<br />
the software would not have been a useful tool for manipulating the imagery in the<br />
study, only that it was not practical at the time of the study. The value of this study<br />
is in allowing new data to be obtained and added to that originally obtained from<br />
an analysis of aerial imagery. While the study identifies one means of doing this,<br />
in terms of software and operating platform (i.e. executing the process using parts<br />
of the GDAL library and ASCII input files), there are potentially numerous other<br />
software processes that could apply. The algorithm, however, is intended to<br />
remain as independent of software considerations as possible; being constrained<br />
only by the quality of the remote imagery and the accuracy of the captured data<br />
points and associated coding.<br />
Looking briefly at some of the other commercially available desktop GIS software,<br />
it is intended that the methodologies suggested could be applied to these products.<br />
However, due to the limitations involved in both learning to use the software and<br />
licensing issues this application of the study was not explored. These products<br />
include AutoDesk, MicroStation, the ESRI ArcView product mentioned in the<br />
previous paragraph, IDRISI and MapInfo among others, such as the 1Spatial<br />
Radius platform used to edit the geometric input data being used in this study. All<br />
of the above products are useful in the case of updating and editing, that is to say –<br />
dealing with change. This study looks at read only data and could be described as<br />
a way of interpreting already captured data. One result of this is that the functions<br />
required to store and update change polygons and data values are not needed in<br />
the proposed algorithm. The ability to connect the statistical data to the unique<br />
identifier for the polygon should be enough to allow it to be input as an attribute<br />
by the spatial database management system. Outside of analysis the GIS software<br />
requirements relate only to coordinate transformation. As a result, while storage<br />
(of the statistics) remains a consideration, functions for creating, editing or<br />
updating (moving points etc.) do not form part of the requirements for the study.<br />
Even though a specific analysis tool (in terms of a standalone executable) is not<br />
presented in this study it is possible to create one and add to the existing body of<br />
open source work. OpenEV, for example, allows for the addition of newly created<br />
functions written in Python. This programming language allows a user to<br />
interface with GIS applications written in C and has the potential to be a flexible<br />
means of accessing C libraries (e.g., accessing GDAL from OpenEV). It has been<br />
used to build different software packages such as GeoDjango, Thuban, OpenEV,<br />
pyTerra, and AVPython. The language itself is not used in this thesis as it would have added<br />
another layer to the process but provides a possible means of packaging an<br />
extended experiment. One of the advantages of using Python as a programming<br />
language in preparing a GIS application is that the assignment of a variable does<br />
not have to indicate whether it is declaring a string, number, list etc. The variables,<br />
however, are case sensitive and follow the ESRI convention of combining lower and<br />
upper case, beginning with lower case. The acronym for the variable is at the<br />
beginning of the name, while the descriptive part follows, beginning with an upper<br />
case (e.g. htElev). In addition to modules such as math and string, Python also has<br />
several geoprocessing modules available. One of these, arcgisscripting, accesses all the<br />
ArcToolbox tools. It should be noted that the geoprocessing object being<br />
called is accessed differently, depending on the version of ArcGIS being used.<br />
Another package called gdal (which accesses the geospatial data abstraction library<br />
being utilized in this thesis) allows for the manipulation of this library. In this<br />
Python module the language connects to the library's native code<br />
(usually C or an object oriented variation) using a SWIG interface compiler. An<br />
example of Python in use in GIS is its application alongside ArcGIS; ArcGIS<br />
was built using hundreds of ArcObjects such as “featureClass”, “symbol”, “field”<br />
etc., each of which has properties and methods accessed by Python using dot<br />
notation. An example of this notation is the assignment of a variable name tr =<br />
arcgisscripting.create().<br />
Another possible method for implementing the process suggested in this study is<br />
through the .NET platform. In particular VB.NET provides a programming<br />
language that can be utilized to access GeoMedia software, which is a .NET<br />
oriented group of geographic software packages provided by Intergraph. This<br />
software allows the user to interact with ESRI shapefiles and also with spatial<br />
databases created using Oracle Spatial. It also allows for developed tools to tie in<br />
with graphical editing platforms such as AutoCAD and MicroStation which would<br />
allow for interpretation of both images and associated geospatial data. This would<br />
also mean (given appropriate licence) that the developer could access specific<br />
ancillary products for database management (for databases based in Oracle) and<br />
others ranging from map production to 3D modeling. This programming language<br />
(VB.NET) was not used in this thesis because of the potential licensing issues<br />
which may have been involved.<br />
This thesis suggests procedures which lend themselves to C, as they involve<br />
repeated loops of steps, from the pixel analysis to the classification of the<br />
polygons identified by the user through the region of interest, and this<br />
analysis is processor heavy. In<br />
general terms, of the programming languages used in GIS development (and aerial<br />
image analysis), the C programming language is the most widely used to interpret<br />
geographic information. Many analysis programmes such as MITAB and Shape<br />
Library use C as a means of accessing geographical data. One of the advantages of<br />
using C is that processor heavy functions such as the analysis of pixels in order to<br />
categorize them into shades and variations from a mean can be best achieved in<br />
this language. This is probably most evident from the fact that most of the open<br />
source projects such as ImageTool or GDAL have been written<br />
using C or C++. This thesis makes use of a small aspect of<br />
these libraries and as such in turn uses the C programming language. This is not to<br />
suggest that the default programming language for this type of study should<br />
necessarily be C, but that in order to make use of the available body of knowledge,<br />
previous studies can probably be best extended using C. It is important to note that<br />
the main problems encountered when analyzing aerial data/imagery are those of<br />
assigning coordinates to the imagery so that it can be referenced and analyzed properly. The<br />
three main coordinate systems used for this are geographic, projected and pixel.<br />
This thesis uses a mixture of both pixel and geographic. The fact that it is<br />
necessary to use both for the relatively simple cutting and analysis of image<br />
segments demonstrates the importance to GIS programming of being able to<br />
transform coordinates. Geographic coordinates refer to latitude and longitude,<br />
while projected coordinates refer to a flat two dimensional coordinate structure.<br />
The C programming language (through the available libraries and its high level<br />
nature) provides an accurate means to execute these coordinate transformations.<br />
Note: Other options are available, such as the modules in .NET, and coordinate<br />
transformation is something which can be achieved across all programming<br />
languages once the correct math functions are accessed.<br />
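To make the point about coordinate transformation concrete, the conversion between map<br />
coordinates and pixel coordinates can be written with basic arithmetic. The sketch below uses the<br />
six-coefficient affine geotransform convention followed by GDAL; the function names and the tile<br />
origin and pixel size are made-up values for illustration, not figures from the study data.<br />

```python
def pixel_to_map(gt, col, row):
    """Apply the affine geotransform: pixel (col, row) -> map (x, y)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def map_to_pixel(gt, x, y):
    """Invert the affine geotransform: map (x, y) -> pixel (col, row)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx = x - gt[0]
    dy = y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row


# Hypothetical north-up tile: top-left corner at (530000, 725000) in map
# units, 0.25 m pixels, no rotation. Pixel height is negative because row
# numbers increase downwards while northings increase upwards.
geotransform = (530000.0, 0.25, 0.0, 725000.0, 0.0, -0.25)

col, row = map_to_pixel(geotransform, 530010.0, 724995.0)
x, y = pixel_to_map(geotransform, col, row)
```

Each vertex of a vector polygon would be passed through `map_to_pixel` before the polygon is used<br />
to cut a section from the image.<br />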
Another factor which needed to be considered while reviewing programming<br />
languages for this thesis was the limitations that would occur due to the<br />
programming experience of the author. Ideally, a processing algorithm, written in<br />
C with a Visual Basic front end, which could also tie into OSI metadata, would be<br />
the preferred solution. This could then be extended to allow a user to zoom in on a<br />
map window, identify a subject area, review the available photography and target<br />
a selected study area. This is possible using existing systems but falls outside of<br />
what could reasonably be achieved with the available resources for the study. The<br />
main purpose of the study is to determine whether the methodology being<br />
suggested is applicable and whether useful data can be returned from this type of<br />
study. The degree of success indicates the feasibility of tailoring an application.<br />
The benefits from further developing the already well researched methods of analyzing<br />
imagery, in terms of aggregating the pixels and deriving statistical data from a<br />
selected tile of aerial photography, would be limited. This is because there is<br />
already a vast body of knowledge available dealing with the subject. In particular I<br />
am referring to work such as ImageTool, which, as noted earlier, was written in C++,<br />
developed at the University of Texas and is open source. It (along with several other open<br />
source image processing projects) effectively interprets imagery in terms of its<br />
spectral content. As I suggested earlier, one of the most important aspects of<br />
viewing and studying the surface of the earth remotely is the way it is projected.<br />
That is, it must be projected into a format which can give valuable information to the user,<br />
and this is where a large part of this study will focus. The study takes a captured<br />
and referenced coordinate grouping (set of data points) and uses these to analyze<br />
sections of the earth. These coordinate groupings are definite points along fixed<br />
boundaries which form physical barriers in terms of walls, streams, buildings and<br />
roads. These could probably be better explained in terms of a bull in a field. In<br />
general terms, any polygon which would prevent the bull from escaping forms a<br />
parcel, which is then analyzed. This means that the study areas are bounded by a<br />
series of fixed vectors/ polylines which are unlikely to deviate over time, allowing<br />
the suggested algorithm to be run over successive years of data capture. With a<br />
standard mean and control key for spectral values it should be possible to gain an<br />
insight into changes in land use in the specific semi-urban areas looked at in the<br />
study.<br />
The study areas could possibly be extended to rural areas over time. The reason<br />
why the study does not extend to these areas is that it would have to account for a<br />
much larger study area and less well defined boundaries. This would probably<br />
only be effectively done using tried and tested values derived from more fixed polygons (i.e.<br />
those without fuzzy data or hazy boundaries, where pixels grade into one another and physical features such as<br />
man-made fences and walls are not present).<br />
The start of this study contains a glossary of terms, which are probably familiar to<br />
most readers, and the next three chapters will refer to some terms which are<br />
specific to this type of analysis. The first term is aerial imagery which is a<br />
reference to all the spectral data obtained during the study. When described as<br />
aerial imagery or raster polygons the reference is to an aerial image corrected to<br />
allow for distortions such as slopes so that it corresponds to the vector mapping.<br />
The second term which is repeated is the vector data, which refers to ordnance<br />
survey data which was captured through a mix of photogrammetry and field<br />
surveying. Although most readers are probably familiar with the .tiff file format it<br />
is probably worth noting that the input photography files for the study are in the<br />
GeoTIFF format. This format complies with the TIFF 6.0 standard and gives the<br />
input data the flexibility to be accessed in a wide range of programs, which<br />
allowed the imagery to be viewed outside the study software as the work was<br />
undertaken. The key metadata components of this file format for this study are the<br />
georeferencing coordinates which allow the sections being analyzed to be<br />
accessed. This format is also recognized by the GDAL library being used in the<br />
study. The projection used in all the files used in the study is ITM. This is not vital<br />
to the success of the algorithm but making use of an additional projection requires<br />
the inclusion of a transform function whenever the datasets intersect. The<br />
following three chapters form a chronological record of the study, starting with the<br />
suggested algorithm (Chapter 2), followed by the sampling section necessary for<br />
the basis of the procedure (Chapter 3) and finishing with a test on known polygons<br />
for specific search values (Chapter 4).<br />
2 Stepping through the Algorithm<br />
This thesis introduces a method for analyzing aerial imagery that can be translated<br />
into a procedure and run automatically. The operation is specific to two types of<br />
data:
• Digital vector files from the Ordnance Survey.<br />
• Controlled aerial photography stored as GeoTiff files.<br />
Both of these data sources are projected using ITM projection and are referenced<br />
during the study. The premise of the study is that it is possible to automatically<br />
capture additional information about area polygons from aerial photography using<br />
previously captured vector polygons as a guide. It is an attempt to fill in the blanks<br />
in terms of polygon attributes not included in the photogrammetry which led to the<br />
vector data. The focus is not primarily to obtain an accurate list of all polygons<br />
from the sample data but instead to identify a verifiable method for automatically<br />
doing so. As the focus is on the identification of methods, the process that is<br />
outlined can be extended to apply to searches for specific spectral qualities. In<br />
other words, someone searching for a particular crop type might employ the<br />
algorithm here, but add a target set of data specific to their work. In short, what<br />
follows is an attempt to take the two data sets mentioned above (photography and<br />
vector data), combine them and return a new set of information derived from<br />
both. The process does not merge the data sources but uses the vector data (a large<br />
portion of which was derived from the photography) as a reference to cut<br />
segments from the imagery and treat these segments as smaller manageable pixel<br />
collections for analysis. This process is helped by the fact that the content of many<br />
of these polygons is known and has been coded to the vector data.<br />
What was completed in the sampling part of the thesis was an attempt to identify<br />
specific spectral qualities that can be applied to these known polygons, and then<br />
used to reference the unknown areas. This had a reasonable level of success with<br />
some polygon types making a more useful reference than others. A description of<br />
these can be found in the sampling section of this study. Automated aerial image<br />
analysis generally focuses on attempting to determine the values of the imagery<br />
from scratch. For an example of this type of work see Thomas Knudsen's 2005<br />
study on aerial image analysis. What is unique about this study is that it attempts<br />
to use previously captured data as a basis for further image interpretation. This is<br />
something which, from the research into the data and contact with Ordnance<br />
Survey, has not been attempted before for Irish digital spatial data. All of the<br />
difficult remote sensing is complete (via the vector mapping) before this analysis<br />
begins, and control points, physical boundaries and closed polygons have all been<br />
identified. This study presents a method for developing software to extend the<br />
work completed, and assist users to identify specific traits in what would<br />
otherwise be an impossibly large store of imagery for the human eye to analyze<br />
(without using a team of trained analysts). The algorithm proposed here is<br />
essentially a way of looking for spectral values in small area polygons and<br />
comparing them to known values.<br />
The process outlined is for people intending to scan aerial imagery of Ireland for<br />
specific spectral properties. It is intended as an additional facility for users of<br />
aerial photography. At present it is possible for people to conduct specific research<br />
using the photography and a GIS tool. This algorithm is intended to make the<br />
process accessible to users who do not have the time or access to the resources or<br />
software licences to conduct this type of research. It can be used to identify<br />
specific land use types which are outside those currently captured by Ordnance<br />
Survey digital mapping and serve as an add-on tool for anyone using that type of data. The<br />
purpose in compiling the data and researching software for the study was to<br />
outline a method and as such the application of the method is dependent on the<br />
user. In the case of this study a lot of emphasis was placed on the identification of<br />
pasture. This is because it is the major form of land use in rural and peri-urban<br />
areas and its correct identification helps in limiting the search for other types of<br />
cover to a relatively narrow number of polygons. In a similar way, someone could<br />
take the same steps, identifying unique statistical properties of the pixel count in<br />
the types of area being studied, and add them to the algorithm. This would involve<br />
adding the additional search to the cycle of flagged polygons at the end of the<br />
third part of the search execution.<br />
The process can also be coded into a standalone application or as an extension of<br />
existing software for a specific use (such as searching for crop disease). One<br />
example might be with the Python-based raster viewer OpenEV, where a user is<br />
analyzing aerial photography using the package. It is possible to extend the<br />
functionality of this to analyze statistical data using the GDAL library. A user<br />
concerned with a specific set of spectral values, or wanting to confine the research<br />
to a specific area polygon type within the image, could make use of the methods<br />
set out here to set the statistical function to return target data only (as opposed to a<br />
general application of the histogram function). In broad terms this study is for<br />
users of aerial raster imagery and the results of the sampling are based on samples<br />
of Irish data. It may be possible to execute similar studies for different regions but<br />
the small well defined polygon types with clear consistent (over large periods of<br />
time) boundaries are a vital part of the analysis. This is probably a result of<br />
relatively small property divisions and rigorous maintenance of the boundaries<br />
over hundreds of years and may be unique to Ireland. In short the study is a look<br />
at a possible coded routine to analyze the Irish landscape using all the available<br />
data.<br />
As mentioned in the previous paragraph this study is intended for users of aerial<br />
photography. The prerequisites are that the photography is controlled and has the<br />
projection embedded in the file, and that users have access to Ordnance Survey<br />
vector data. Outside of these conditions, the algorithm is intended as much for users<br />
who do not have a strong background in information technology as for those who<br />
have a good knowledge of code and could easily convert the proposed steps into<br />
routines. The open source software described in the study has a user interface<br />
familiar from any GIS package (standard toolbars, zoom, measurement etc.) and it is<br />
possible for someone to run the algorithm without having to alter any of the steps.<br />
Ideally, however, the steps would be converted into an add-on to an existing piece of<br />
software that is being used (ArcView, Microstation etc.) so that the user can<br />
quickly run through large amounts of data. In this way the routine is designed for<br />
anyone who is interested in targeting specific properties of Irish topography that<br />
can be defined in terms of their spectral values. These areas range from forestry<br />
and agriculture to urban planning. The limitations of the study are in the quality of<br />
the imagery and it was shown that some potential applications of spectral analysis<br />
would not return accurate results. A study of sediment levels in drains or canals,<br />
for example, could not be developed using the methods outlined here because of<br />
the difficulty in getting a large enough sample to train the image key.<br />
The method outlined can be used against small area polygons, so can be applied to<br />
land use types across most of the country, with the exception of full urban areas<br />
and remote mountain areas (where the small land divisions are not found). It<br />
should be noted that the target areas described refer to peri-urban data. This is<br />
because fully urban areas are covered by large scale 1:1000 mapping and spectral<br />
analysis would not improve on the available data (outside of highly specialized<br />
heat radiation studies etc., which are not the intended use for this method). The<br />
method could also be used by someone seeking to trace patterns in land use over<br />
recent decades. Such a study would be confined to RGB photography, as a<br />
comparative analysis of the properties of the pixel counts by colour band is a<br />
requirement for the process. Once the proposed key is calibrated for the particular<br />
run of photography, the algorithm could be set to run for a specified region of<br />
interest across the period when this type of photography was available.<br />
This differs from previous studies looking at automated analysis of aerial photography in two<br />
ways. Firstly, the focus is specific to Irish Ordnance Survey data and concentrates<br />
on making use of the codes and known values that can be extracted from this.<br />
Secondly, the study uses small area polygons to target the spectral analysis of the<br />
imagery to relatively small sample areas. This is to reduce the difficulty posed by<br />
variations in pixel values found in large samples. In this respect the process is<br />
unusual, as it takes the small area polygons as a guide and cuts the matching areas<br />
from the raster aerial imagery. This allows for automatic decisions to be made<br />
regarding the level of standard deviation present in the sample. In many ways this<br />
simplifies the process of image interpretation because most of the difficult image<br />
control work is completed and the software can then focus on variations specific<br />
to the ground cover being studied. While this has limitations on the extension of<br />
the work to other (general small scale) datasets it does outline a method for<br />
automating the search of imagery. Over the course of the last two years of this<br />
Master's study I have learned how users can interrogate and manipulate data<br />
in large databases.<br />
Aerial imagery is a form of database. Once it is structured into tables it can be<br />
interrogated for spectral properties just like vector data in a spatial database. By<br />
using the boundary data points from the vector data the imagery is converted to<br />
manageable sections and properties can be determined and logged. This<br />
subdivision of the image into a mosaic of areas makes the job of analyzing the<br />
image easier: it starts with known polygons, moves to polygons which can be easily<br />
classified (strong variation from the other known values with a low level of<br />
standard deviation, such as cut pasture) and flags any whose values fall outside<br />
the image key. This subdividing of raster imagery is something which<br />
has not been attempted with Irish Ordnance Survey data and aerial photography (to the<br />
best of my knowledge: I have conducted a search of research papers, and similar<br />
work had not been undertaken within Ordnance Survey). The focus of the study<br />
is on proving that this method is practical, and can be applied to a variety of area<br />
types. The methods suggested by this study are unique to the area divisions and<br />
available vector data, and present the steps necessary to train an image key to look<br />
for specific properties in the Irish landscape.<br />
The process works by taking the point data from the polygons contained within<br />
vector data representing an area of an image. This point data is used to crop the area<br />
of the image that the polygon represents and to log pixel values for that area. This is<br />
repeated for every area in the region of interest. These are then compared to an<br />
image key and areas classified according to the presence of values specific to the<br />
key. One such key was developed during this thesis but could be re-calibrated for<br />
higher values. In other words, a higher mean for water bodies within a separate<br />
run of photography would increase the values in the key by that amount.<br />
Within the key are known values (water, forestry, roads etc.). The proportional<br />
difference between these known values and search values (such as pasture), in<br />
terms of the mean pixel count for values in the red, green and blue colour bands<br />
and the levels of standard deviation, is measured against the histogram values for<br />
cropped polygons of unknown use, and a category is applied for matches. In short,<br />
the process steps through locating, cutting and analyzing small areas of the<br />
image to enhance the available data and search for specific values across the whole<br />
image.<br />
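The cropping and logging step described above can be sketched in a few lines. This is a minimal illustration only, not the routine used in the thesis (where Mirone and Geomatica did this work): it assumes the image is a nested list of (red, green, blue) tuples and that the polygon vertices have already been converted to pixel co-ordinates; the function names are hypothetical.<br />

```python
from statistics import mean, pstdev


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_band_stats(image, vertices):
    """Log mean and standard deviation per colour band for pixels inside a polygon.

    image[row][col] is an (r, g, b) tuple; vertices are (col, row) pairs in
    pixel units (an assumption made for this sketch). A polygon that covers
    no pixel centres raises statistics.StatisticsError.
    """
    inside = [
        image[row][col]
        for row in range(len(image))
        for col in range(len(image[0]))
        if point_in_polygon(col + 0.5, row + 0.5, vertices)  # test pixel centres
    ]
    stats = {"pixel_count": len(inside)}
    for index, band in enumerate(("red", "green", "blue")):
        values = [pixel[index] for pixel in inside]
        stats[band] = {"mean": mean(values), "std": pstdev(values)}
    return stats
```

Repeating `polygon_band_stats` for every polygon in the region of interest would give the per-area values that are then compared against the image key.<br />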
2.1 Initial Inputs<br />
The following four sections describe the work in terms of steps through the<br />
algorithm being proposed. This first section introduces the initial inputs<br />
required to define the region of interest to be analyzed:<br />
This study attempts to find an automatic method for image analysis using vector<br />
data as a reference, and in particular small area polygons and their associated<br />
coding. At the beginning of the proposed algorithm a user is required to input a<br />
region of interest for the process. This corresponds to the geographical area in<br />
which the user is interested. The most convenient way for someone to do this is to<br />
manually select the area from vector data or photography (or a combination of<br />
both displayed together) shown in a window on a PC. The result of this should<br />
be a set of co-ordinates from which the study area can be extracted.<br />
The user also needs to input a sample target area for the study. This can be one of<br />
a set of values developed in this study or may take the form of a particular<br />
variation (such as a distinct type of crop etc.). In the second case a sample of the<br />
required value is needed. This can be obtained in the same way as the first part of<br />
the region of interest selection where, as mentioned above, the user manually<br />
selects the target area from a viewing window and the output is a set of<br />
co-ordinates. A second way this target data might be obtained would be from a<br />
co-ordinate set input from a field survey completed using a mobile GPS device. In this<br />
case the co-ordinates first need to be converted to the Irish Transverse Mercator<br />
framework, so as to allow the process to match them to the projection used in the<br />
photography and vector mapping.<br />
The general flow of the first part of the algorithm being suggested is the user inputting<br />
the region of interest and a required value from the image analysis, which are then<br />
converted to a format which can be compared against the data. In the case of the<br />
software used in this thesis this takes the form of a simple ASCII file containing a<br />
co-ordinate set, but a common format might also be the .shp file used by ESRI.<br />
The software required for this step in the proposed algorithm includes an<br />
application for viewing and analyzing raster data and capable of performing<br />
transformations on sets of co-ordinates. For this study four sets of libraries were<br />
used, packaged into open source applications known as OpenEV, Mirone and<br />
GDAL. The vector data was clipped using an application which forms part of a<br />
geographical information system called Radius Vision. All the processes<br />
necessary for the first part of this algorithm can be performed using GDAL, with<br />
the exception of clipping to an irregular polygon, which is still under development<br />
(GDAL, 2010). There is a license requirement for the Radius software, which was<br />
used in this study for the step involving the user selecting the extent of the region<br />
of interest in the vector mapping. It should be noted that this can also be<br />
completed using any other vector mapping tools such as ArcView (which would<br />
create a .shp file). Another alternative is for the user to manually create an ASCII<br />
file of co-ordinates (with the convention of easting northing, separated by<br />
newline). This alternative can be frustrating for the user and the suggested process<br />
is to make use of software capable of designating the region of interest through a<br />
viewer.<br />
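As an illustration of the ASCII input described above, the following sketch reads a co-ordinate set written one "easting northing" pair per line and returns a closed ring plus its bounding extent. It is an assumption-laden example rather than the format handling used by Mirone or Radius: the function names are hypothetical and commas are tolerated as separators only for convenience.<br />

```python
def parse_coordinate_text(text):
    """Parse 'easting northing' pairs, one per line, into a closed ring."""
    points = []
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        points.append((float(parts[0]), float(parts[1])))
    if len(points) < 3:
        raise ValueError("a region of interest needs at least three points")
    if points[0] != points[-1]:
        points.append(points[0])  # repeat the first point to close the ring
    return points


def bounding_box(points):
    """Return (min easting, min northing, max easting, max northing)."""
    eastings = [p[0] for p in points]
    northings = [p[1] for p in points]
    return min(eastings), min(northings), max(eastings), max(northings)
```

The bounding extent is what a rectangular clip of the photography would need; the ring itself is needed for clipping to an irregular polygon.<br />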
The data required for the aerial image analysis is ordnance survey ortho-rectified<br />
colour aerial photography and matching digital mapping. The archive of aerial<br />
imagery goes back to the 1970s and the algorithm being suggested is designed to<br />
operate with any run of photography so users can discern dispersal patterns over<br />
time through successive photography dates. The process, however, makes use of<br />
the three colour bands present in colour photography and is limited to<br />
photography with the red, green and blue colour bands. The vector data used will<br />
take the form of 1:5000 or 1:2500 scale digital data and it is this data which forms<br />
the basis of the search process. The vector data has the region of interest divided<br />
into a mosaic of small area polygons the majority of which are coded according to<br />
their use or content. The aim of this thesis is primarily to automatically register<br />
additional data for those of unknown use type, and secondly to flag those of<br />
known use type with specified (spectral) anomalies from user requests. In order to<br />
be successful the data requires the coding, which may be useful to consider as a<br />
data hierarchy at this stage in the process. The following hierarchy is only for<br />
illustration. In practice each polygon will be analyzed according to its spectral<br />
content and placed in a unique set. Any relationships between those sets would be<br />
made post analysis by the user for the purposes of their particular survey. Having<br />
said that, the known polygons are:<br />
Forestry: divided into categories of mixed, coniferous and deciduous.<br />
Water: divided into categories of stream, lake, river, drain, pond and reservoir.<br />
Road: divided into categories of motorway, national primary, national secondary,<br />
regional, third, fourth and track (also coded as footpath and forestry road).<br />
Buildings: variously coded as solid, dwelling and a variety of functions, though for<br />
the purposes of the algorithm they will be treated as one data type (this is because<br />
they were found to be unreliable in terms of consistent spectral values and biased<br />
result sets from the spectral analysis).<br />
Two other aspects of this data, the presence of marsh and pasture symbols, can be<br />
used to indicate known values for a polygon when found inside the bounding<br />
co-ordinates, though in the study these must be compared against the spectral key to<br />
ensure the symbol is representative of the entire polygon.<br />
The output from this stage in the algorithm is two required and one optional data<br />
set. The first necessary return is an area of vector mapping, containing a mosaic of<br />
vector polygons divided by polylines representing real world physical boundaries<br />
between the areas and the Ordnance Survey coding related to this data. This was<br />
extracted using the extract_map function from the Radius GIS library, but this<br />
software is not a necessary requirement: the data can be exported to a different<br />
format and a similar procedure completed (using ArcView, Microstation,<br />
AutoCAD etc.), provided the map projection and co-ordinate attributes associated with<br />
the vectors are retained. The study did not explore the possibility of creating a<br />
unique vector data cutting tool as it can be assumed anyone making use of this<br />
algorithm will have access to some level of mapping software (if not, similar open<br />
source software can be obtained from Brazil’s National Institute for Space<br />
Research under the SPRING project; see Appendix). The second necessary return<br />
is a region of aerial photography matching the co-ordinate set outlined in the<br />
section cut from the vector data. This can be cut using the co-ordinate set outlined<br />
from the vector data. In this study the MATLAB-based Mirone software,<br />
developed at the University of Algarve for earth sciences, was used to extract the<br />
study area from the photography and the input file was in ASCII format (although<br />
other formats, such as .shp could also be used).<br />
The third, optional output from this stage in the process is a co-ordinate set for<br />
possible sample areas in the image for which a user is seeking to complete an<br />
inventory. This could be selected from the image using the vector manipulation<br />
software described in the previous paragraph, or could be obtained from point data<br />
collected by the user in the field. If field data is used there are two requirements:<br />
firstly, that it makes a closed polygon so an area can be sampled for spectral<br />
values; secondly, that the co-ordinates conform to ITM projection to match those<br />
of the imagery. Transformation of the co-ordinates can be achieved using<br />
gdalwarp (GDAL, n.d.) which can then be used to extract the required pixel set for<br />
examination through software such as Mirone. If a specific value is required, then<br />
at least three samples are used to obtain a representative value for the image key.<br />
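The two field-data requirements and the three-sample rule lend themselves to simple checks. The sketch below is hypothetical and assumes the co-ordinates have already been transformed to ITM (metres); it tests that a ring is closed, computes its area with the shoelace formula, and averages per-band means over the sample polygons.<br />

```python
def is_closed(ring):
    """A usable sample ring has at least three distinct points and repeats the first."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def ring_area(ring):
    """Shoelace formula for a closed ring; co-ordinates are assumed to be in metres."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def representative_value(sample_means, minimum_samples=3):
    """Average per-band means over several sample polygons.

    sample_means is a list of dicts such as {"red": 110, "green": 130, "blue": 90}.
    """
    if len(sample_means) < minimum_samples:
        raise ValueError("at least three samples are needed for the image key")
    bands = ("red", "green", "blue")
    return {
        band: sum(sample[band] for sample in sample_means) / len(sample_means)
        for band in bands
    }
```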
2.2 Area Extraction<br />
The following section introduces the second part of the algorithm, where it steps<br />
into a series of loops for cutting out known areas (via vector data) from the<br />
image:<br />
The second step in the process is to extract the known areas from the study area in<br />
the imagery. This involves creating a set of polygons conforming to the coded<br />
values and excluding them from the image search. This set is then either flagged<br />
for analysis further in the algorithm (should a target area search have been created<br />
by the user in the first step) or placed in a holding set for inclusion in the statistical<br />
data output at the end of the analysis. The output from this step should be a set of<br />
unknown polygons and their associated raster image sections, along with the sets<br />
of known polygons.<br />
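The separation of known and unknown polygons can be illustrated as follows. This is a sketch under stated assumptions: each polygon is a dict with an optional "code" entry, and the code labels shown are placeholders rather than the actual Ordnance Survey feature codes.<br />

```python
# Placeholder labels for this sketch; real vector data carries its own feature codes.
KNOWN_CODES = {"forestry", "water", "road", "building"}


def partition_polygons(polygons, target_code=None):
    """Split polygons into unknown, flagged and held sets.

    unknown: no recognised code, so they go forward to spectral analysis.
    flagged: known polygons matching the user's target search, if one was given.
    held:    remaining known polygons, grouped by code for the final output.
    """
    unknown, flagged, held = [], [], {}
    for polygon in polygons:
        code = polygon.get("code")
        if code not in KNOWN_CODES:
            unknown.append(polygon)
        elif code == target_code:
            flagged.append(polygon)
        else:
            held.setdefault(code, []).append(polygon)
    return unknown, flagged, held
```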
For this study the software employed for this step in the process was Radius (to<br />
obtain co-ordinate data for cropping the image into area polygons) and Mirone (to<br />
crop the image). This thesis was written with a view to operating on a new spatial<br />
database being developed for ordnance survey data. This database will have the<br />
capacity to return sets of values in GML form, which would mean that the input<br />
sets for this step in the algorithm would be more easily obtained by extracting<br />
… using a text editor to create a master input<br />
set. For the purposes of this study, however, the input files were created in ASCII<br />
format from the polygon co-ordinate sets outlined in Radius (by copying and<br />
pasting). These co-ordinate sets were then imported into the Mirone software and<br />
used to create closed polygons, which in turn allowed the target areas to be<br />
exported. It should be noted that this is a user intensive process and was used only<br />
to test the theory being proposed in this thesis. There are many ways in which this<br />
part of the image analysis process could be automated and the process time<br />
reduced, but the focus of this study is to prove that usable data can be obtained<br />
using the methods outlined, so these options were not expanded on.<br />
The areas extracted were placed into sets according to their nature and the total<br />
area of the sets recorded (using the area property associated with the input vectors;<br />
note that these can also be calculated at the raster extraction stage using the Mirone<br />
measurement function). For each run of imagery the image key needs to be<br />
reset to match the spectral values present, and the area sets created at this stage<br />
can then be used for this calibration. The algorithm itself is concerned with the<br />
proportional difference between the pixel values in the polygons so new values<br />
can be applied for the image key using the methods outlined during the sampling<br />
section of this thesis. This means that, should a new set of photography be used,<br />
the initial sets for this stage are analyzed to create new baseline data. In other<br />
words, for each polygon of road, section of coniferous and mixed<br />
forestry, river, lake and pond (though not reservoir, as it did not return reliable mean<br />
pixel value settings during the sampling), the mean pixel value and standard<br />
deviation by colour band are recorded and averaged by group. These averages in<br />
turn are tested against the expected proportional relationships between them and once<br />
verified are applied as image keys for that particular run of photography. In the<br />
case of this study this was completed using PCI Geomatics Geomatica software,<br />
which returned statistical data for the clipped polygons using the analysis function<br />
against the red, green and blue colour bands. As with the step as a whole, this<br />
process could benefit from an application specific to the algorithm which would<br />
return these values for the purposes of creating an image key alone.<br />
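A dedicated key-building routine of the kind suggested here might look like the sketch below. It is not the Geomatica workflow used in the study: the record layout and function names are assumptions, and the proportional check is left to the caller because the expected ratios depend on the run of photography.<br />

```python
BANDS = ("red", "green", "blue")


def build_image_key(records):
    """Average per-polygon mean and standard deviation by group and colour band.

    Each record is assumed to look like:
    {"group": "road", "mean": {"red": ..., ...}, "std": {"red": ..., ...}}
    """
    grouped = {}
    for record in records:
        grouped.setdefault(record["group"], []).append(record)
    key = {}
    for group, members in grouped.items():
        count = len(members)
        key[group] = {
            "mean": {b: sum(m["mean"][b] for m in members) / count for b in BANDS},
            "std": {b: sum(m["std"][b] for m in members) / count for b in BANDS},
        }
    return key


def proportional_difference(value, reference):
    """Fractional difference of value relative to reference (-0.5 means 50% lower)."""
    return (value - reference) / reference
```

For example, `proportional_difference(key["lake"]["mean"]["red"], key["road"]["mean"]["red"])` gives the relative drop in the red band between lakes and roads, which can then be compared with the relationship expected from the sampling.<br />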
This stage of the algorithm does not require any user input (the type of<br />
photography is entered at the start of the analysis). The outputs are a series of sets<br />
of polygons and their associated raster image areas containing the image<br />
projection (in GeoTiff format). Once the input data for this stage is available in<br />
GML/ XML format then it should be possible to code a series of iterative loops to<br />
set up the sets and reduce the number of remaining polygons required for spectral<br />
analysis. These improvements on the step being described would serve to speed up<br />
the analysis and make the process neater for the user, but in order to first establish that the<br />
process was worthwhile they were omitted and the focus of the study concentrated<br />
on defining relational values and determining if the vector/ raster analysis hybrid<br />
model would reveal useful data for the user.<br />
2.3 Spectral Value Comparison<br />
The third part of the algorithm consists of a series of procedures to assign<br />
known areas and areas with values that can be determined from the known sets:<br />
This part of the algorithm involves comparing the spectral values for the unknown<br />
polygon types (from step 2). The first part involves creating statistical histogram<br />
data for all of these polygons and comparing it to an expected value key for<br />
classification according to land use type. Areas which do not conform to known<br />
values are placed in a set for further analysis while those matching are categorized<br />
according to their values. The first part of this step involves verifying any<br />
polygons which were found to include descriptive symbols from the vector data<br />
(marsh, pasture; note that pasture in this case refers to known areas of rough pasture).<br />
Following from this an analysis according to spectral values was completed, and<br />
the set of neighbouring polygons (taken directly from the vector data set)<br />
examined to see if neighbouring areas might influence the result. For<br />
example, if an area with a set of pixel values close to those expected for pasture<br />
was identified but displayed a high level of standard deviation, this area was assigned<br />
to the pasture set if three or more neighbouring polygons contained pasture (as the<br />
deviation is probably caused by shade in the image); otherwise the area is<br />
flagged for examination of the histogram results later in the process, to see if a<br />
double spike in the red and green colour bands is present.<br />
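The neighbour rule in this paragraph reduces to a small decision function. The sketch below is illustrative only; the function name, the label strings and the way neighbour types are supplied are all assumptions.<br />

```python
def resolve_pasture_candidate(std_is_high, neighbour_types, minimum_pasture_neighbours=3):
    """Decide what to do with a polygon whose mean values resemble pasture.

    std_is_high:     True if the standard deviation is above the expected range.
    neighbour_types: land use labels of adjacent polygons from the vector data.
    """
    if not std_is_high:
        return "pasture"
    pasture_neighbours = sum(1 for label in neighbour_types if label == "pasture")
    if pasture_neighbours >= minimum_pasture_neighbours:
        # Deviation is probably shade, so accept the polygon as pasture.
        return "pasture"
    # Otherwise defer to the later histogram check for a double spike.
    return "flag for histogram check"
```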
In order to complete this step the algorithm cycles through a number of relative<br />
values to determine the probable land area of the polygon being analyzed. For this<br />
section of the study the histogram values were exported from the Geomatica<br />
software package as tables and graphs and compared manually. In order to<br />
complete the same task on a larger scale this process would be coded into a<br />
routine taking the statistical (image) data and image key as input and outputting<br />
the closest match. For example, the mean data by colour band would be compared<br />
to the values for roads, and if a 50% decrease in the red, 40% in the green and 50%<br />
in the blue colour bands was detected the polygon would then be compared to<br />
water. If a 70% increase in red, 55% in green and 20% in the blue colour<br />
bands was then detected, the standard deviation would be checked for how far it falls outside<br />
an expected value of 10, allowing the polygon to be coded as pasture. This routine<br />
is not coded here but was executed using a set of comparative tables.<br />
At the beginning of this step the image consists of several sets of known polygon<br />
types, the clipped image polygons, associated vector codes and an image key.<br />
After completion of the step there are several more known polygon sets and a set<br />
of unknown areas which fell outside the ranges expected. This may be the result of<br />
the samples being biased by high levels of shade or the fact that they represent a<br />
transitional data type (bog to rough pasture etc.). These remaining polygons are<br />
further analyzed in the next step but this stage of the algorithm is used to classify<br />
as many known values as possible. These were obtained through a series of<br />
comparative steps as follows:<br />
The sampling during this study pointed to a number of interdependent<br />
relationships between the spectral values found in the polygons studied. The fact<br />
that the polygons are clearly defined (through the vector mapping) and that the<br />
content of many of these polygons in the image is known prior to analysis<br />
means that the algorithm can focus on identifying a narrow range of additional<br />
area types. To achieve this it is necessary to loop through a series of criteria for<br />
four main land types: pasture, rough pasture, bog and marsh. The last two on this<br />
list will have identifying symbols present in most cases, which can be used to<br />
assist the automatic search. Once the four target areas have been identified (with<br />
pasture being the main land use in most semi-urban imagery) the remaining<br />
polygons form a small set of areas which are further analyzed in the next step of<br />
the process.<br />
The polygon is analyzed using the Geomatica analysis tool and the histogram<br />
values exported for comparison to the known values. If the sample has a mean<br />
value for the red colour band 40% lower than that of road polygons, the blue<br />
colour band displays a similar 40% lower mean value than roads, and the<br />
standard deviation is lower than a value of 15, then the sample is matched against water:<br />
if the mean for the red colour band is close to three times that of water, and the<br />
green value is close to twice that of water, the polygon is coded as pasture.<br />
If the sample does not match the above criteria but has a mean pixel value for the<br />
red and green colour bands close to half that of roads, displays a level of<br />
standard deviation three times that of roads, and the red and green values represent<br />
close to double the value of those in the water polygon (taken from the image key),<br />
then the polygon is coded as rough pasture.<br />
If the sample has red values around 30% lower than mixed forestry, and 20%<br />
lower in green for the same known polygon type and the standard deviation<br />
remains within 10% of the mixed forestry then the polygon can be coded as marsh<br />
(usually a symbol under the level is present within the polygon in the vector<br />
dataset, but not always).<br />
If the sample does not match those criteria outlined so far but has a low standard<br />
deviation across all three colour bands and contains a decrease in the mean value<br />
of over 30% for all colour bands when compared to the known road values then<br />
the sample is tested for area size; if it is above the maximum value for pasture then<br />
it is coded as bog.<br />
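The four criteria above can be read as a cascade of tests against the image key. The sketch below is one possible reading, not the comparative tables used in the study. Several points are assumptions made for illustration: "close to" is taken as within a 20% tolerance, "40% lower" as at least 40% lower, the largest per-band standard deviation stands in for the sample's deviation, "low" deviation reuses the threshold of 15, and the maximum pasture area is a placeholder parameter.<br />

```python
BANDS = ("red", "green", "blue")


def close_to(value, target, tolerance=0.2):
    """Assumed reading of 'close to': within a fractional tolerance of target."""
    return abs(value - target) <= tolerance * target


def classify(sample, key, low_std=15.0, max_pasture_area=50000.0):
    """Apply the pasture, rough pasture, marsh and bog criteria in order.

    sample: {"mean": {band: value}, "std": {band: value}, "area": square metres}
    key:    {"road" | "water" | "mixed_forestry": {"mean": {...}, "std": {...}}}
    """
    mean = sample["mean"]
    std = max(sample["std"].values())
    road, water, forest = key["road"], key["water"], key["mixed_forestry"]
    road_std = max(road["std"].values())
    forest_std = max(forest["std"].values())

    # Pasture: red and blue at least 40% below roads, deviation under 15,
    # then red about three times water and green about twice water.
    if (mean["red"] <= 0.6 * road["mean"]["red"]
            and mean["blue"] <= 0.6 * road["mean"]["blue"]
            and std < low_std
            and close_to(mean["red"], 3 * water["mean"]["red"])
            and close_to(mean["green"], 2 * water["mean"]["green"])):
        return "pasture"

    # Rough pasture: red and green about half of roads and about double water,
    # with deviation about three times that of roads.
    if (all(close_to(mean[b], 0.5 * road["mean"][b]) for b in ("red", "green"))
            and close_to(std, 3 * road_std)
            and all(close_to(mean[b], 2 * water["mean"][b]) for b in ("red", "green"))):
        return "rough pasture"

    # Marsh: red about 30% and green about 20% below mixed forestry,
    # deviation within 10% of mixed forestry.
    if (close_to(mean["red"], 0.7 * forest["mean"]["red"])
            and close_to(mean["green"], 0.8 * forest["mean"]["green"])
            and abs(std - forest_std) <= 0.1 * forest_std):
        return "marsh"

    # Bog: low deviation, every band over 30% below roads, area too large for pasture.
    if (std < low_std
            and all(mean[b] < 0.7 * road["mean"][b] for b in BANDS)
            and sample["area"] > max_pasture_area):
        return "bog"

    return "unclassified"
```

Polygons returned as "unclassified" correspond to the remainder that is passed on for further analysis.<br />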
The remaining polygons after this step in the algorithm fall into two categories –<br />
those surrounding buildings or with mixed use and those with a high level of<br />
shade present. Further analysis is required to step through the remainder to<br />
identify areas that have a homogeneous pixel value but a higher level of standard<br />
deviation due to levels of shade, and those of mixed use. The outputs from this part<br />
of the process are six further area sets: pasture, rough pasture, marsh, bog,<br />
areas with high standard deviation for further analysis, and areas containing<br />
building polygons (automatically flagged through the vector data).<br />
2.4 Confirmation<br />
The final stage of the algorithm involves the reduced data set being stepped<br />
through for manual confirmation (or compared against an additional set of<br />
values determined by the user):<br />
This part of the algorithm is concerned with tidying up some of the remaining data<br />
from the previous sweeps through the polygons. To begin with, the polygon set<br />
classified as pasture is selected and analyzed for differences in the mean values of<br />
the red and green colour bands. Those with mean values above 190 on the<br />
converted greyscale in red, and 200 on the scale in green are classified as cut<br />
pasture (while initially this may not appear to be of direct value to the user, it<br />
could help with any subsequent analysis of pasture in particular).<br />
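The cut pasture test is a pair of thresholds on the band means, which can be sketched as below. The thresholds of 190 (red) and 200 (green) come from the text; the record layout and function name are assumptions, and the thresholds would need recalibrating for a different run of photography.<br />

```python
def split_cut_pasture(pasture_polygons, red_threshold=190, green_threshold=200):
    """Separate cut pasture from other pasture using red and green mean values."""
    cut, other = [], []
    for polygon in pasture_polygons:
        means = polygon["mean"]
        if means["red"] > red_threshold and means["green"] > green_threshold:
            cut.append(polygon)
        else:
            other.append(polygon)
    return cut, other
```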
The next loop is designed to remove polygons containing homogeneous pixel<br />
values whose standard deviation has been biased by a high proportion of shade in<br />
the sample. It involves checking the histogram for two peaks (one for shade and<br />
one for pasture) in the pixel count. If present, the polygons are assigned to the<br />
pasture polygon set.<br />
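One way to check a histogram for two peaks is to smooth the pixel counts and look for well-separated local maxima. The following sketch is a simple stand-in for the manual inspection carried out in the study; the smoothing window, minimum peak height and minimum separation are all assumed values that would need tuning against real imagery.<br />

```python
def smooth(counts, window=5):
    """Moving average over the histogram bins, shrinking the window at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo, hi = max(0, i - half), min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


def find_peaks(counts, min_fraction=0.1, min_separation=20):
    """Return bin indices of local maxima that are tall enough and far enough apart."""
    smoothed = smooth(counts)
    threshold = min_fraction * max(smoothed)
    candidates = [
        i for i in range(1, len(smoothed) - 1)
        if smoothed[i] >= threshold
        and smoothed[i] > smoothed[i - 1]
        and smoothed[i] >= smoothed[i + 1]
    ]
    peaks = []
    # Keep the tallest candidates first, dropping any too close to a kept peak.
    for i in sorted(candidates, key=lambda index: smoothed[index], reverse=True):
        if all(abs(i - kept) >= min_separation for kept in peaks):
            peaks.append(i)
    return sorted(peaks)


def has_two_peaks(counts):
    """True when the histogram shows exactly two separated peaks (e.g. shade and pasture)."""
    return len(find_peaks(counts)) == 2
```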
The process was completed using the Geomatica software to extract the statistical<br />
data from the polygons of raster imagery (extracted earlier in the process using a<br />
combination of ASCII data from the vector manipulation software and the Mirone<br />
clipping function). The remaining areas were cross checked with the polygons<br />
containing buildings other than those coded as dwellings. Those found not<br />
containing a building polygon are retained for further analysis and logged<br />
according to adjacent polygon types (e.g. 123445.34<br />
232234.34 etc.; neighbouring road, building polygon, pasture; area 7658 m²).<br />
This was completed manually for the study using the Radius software and GeoTiff<br />
referencing but would be best completed inside a routine for larger samples. These<br />
unknown polygons can then be visually referenced by a user and manually<br />
categorized (displayed according to an input co-ordinate set returned from this<br />
algorithm through software such as Mirone). The result of this sample image study<br />
was that only a small area of the original search area was processed at this stage in<br />
the study.<br />
3 Sampling for the Baseline Image Key<br />
Figure 1: Aerial view of sample area<br />
The main body of research in this study involved identifying areas which would<br />
make useful benchmarks for an automated image analysis to use as a search key;<br />
ten areas were selected for inclusion as they formed the most distinct sets which<br />
could be used. These ten area types sampled were: Roads (of all classes), Water<br />
(Lake, River, Stream, Drain and Pond), Marsh, Coniferous forestry, Mixed<br />
Forestry, Track, Shade (to obtain reference values when a high level of standard<br />
deviation occurred), Building (roofs), Pasture and Rough pasture. Of these<br />
sample values four could not be determined from the vector data (Pasture, Marsh,<br />
Rough pasture and Bog) and were used to test the ability of the process to identify<br />
target values by their relationship to known values. The next section describes the<br />
findings for these sampling areas.<br />
3.1 Roads<br />
Figure 2: Road area and surrounding detail<br />
This part of the study looked at sample sections of road (tarred/ hard cover) to see<br />
the relationship between the mean spectral value in these areas and the image as a<br />
whole. In general terms areas of road appear lighter than other parts of an aerial<br />
image due to increased reflection along the surface and this was borne out by a<br />
mean greyscale pixel value of 30% above the image average across the three<br />
colour bands.<br />
The study involved using OpenEV (an open source raster imaging tool based on the<br />
GDAL library) and the PCI Geomatics Geomatica geospatial viewing application.<br />
The files were exported as GeoTiff files from the original image using the<br />
GDAL_export facility. In all, ten regions were sampled. These sample areas were<br />
taken from within the road polygons (as opposed to sampling the entire polygon)<br />
in order to identify true baseline data for these features. Sampling the entire<br />
feature would also have meant including areas obscured by tree cover, and would<br />
necessarily have biased the results. The intention of this part of the study was to<br />
create a benchmark against which tolerances for deviation could be included.<br />
The ten sample areas were taken from a series of roads in the south east of the<br />
image and three sections of the national primary road running along the north of<br />
the image (the pixel representation in the example below is to illustrate the area<br />
being sampled but is at a lower resolution than was used in the study).<br />
Figure 3: Road area and vector data<br />
In general terms the sample areas had an equal distribution of values across the<br />
red, green and blue colour bands when compared to the image as a whole and it<br />
was not possible to discern any unique variation in the proportion of pixels in<br />
each of these bands contained in the road polygons. The results, however, did not<br />
deviate to any great extent between the samples and the mean greyscale values in<br />
the samples remained consistently higher than those of the image as a whole. This<br />
was significant as the variation remained at around 30% higher for each band in<br />
each sample (34.4 on av. in red, 26.6 on av. in green and 36 on av. in blue).<br />
Road value sample 1 Mean pixel value<br />
Red 215.5<br />
Green 243<br />
Blue 194.7<br />
Road value sample 2 Mean pixel value<br />
Red 206.444<br />
Green 234.5<br />
Blue 157<br />
Road value sample 3 Mean pixel value<br />
Red 202.3<br />
Green 228.8<br />
Blue 183.9<br />
Road value sample 4 Mean pixel value<br />
Red 163.083<br />
Green 189.75<br />
Blue 147.667<br />
Road value sample 5 Mean pixel value<br />
Red 171.417<br />
Green 196.083<br />
Blue 153.333<br />
Road value sample 6 Mean pixel value<br />
Red 169.444<br />
Green 190.111<br />
Blue 151.444<br />
Road value sample 7 Mean pixel value<br />
Red 167.111<br />
Green 190.444<br />
Blue 155.889<br />
Road value sample 8 Mean pixel value<br />
Red 155.667<br />
Green 185.25<br />
Blue 145.833<br />
Road value sample 9 Mean pixel value<br />
Red 148.417<br />
Green 172.25<br />
Blue 143.833<br />
Road value sample 10 Mean pixel value<br />
Red 187.061<br />
Green 211.545<br />
Blue 173.364<br />
Table 1: Road sample values<br />
(The previous samples were compared to statistics from the image as a whole of<br />
Red: Grey Level Values: 6 – 255, Median: 115, Mean: 111.711, StdDev:27.2804;<br />
Green: Grey Level Values: 30 – 255, Median: 135, Mean: 136.542,<br />
StdDev:27.3776; Blue: Grey Level Values: 5 – 255, Median: 102, Mean: 102.636,<br />
StdDev:17.8523)<br />
The mean pixel values are a useful benchmark to base further analysis of the<br />
imagery on and the fact that each sample displayed a more or less uniform<br />
deviation from the image standard implies that there is some merit to applying the<br />
results to a key which identifies impermeable surface area. In general such areas<br />
in an urban area will contain similar properties to a road surface. Further iterations<br />
of this sampling will involve sampling shingle against concrete and tar (see the<br />
analysis of the spectral values contained in areas of track) in order to see if there is<br />
a measurable spectral variation between them; this will, however, involve a small<br />
amount of post processing to enhance the differences between them.<br />
The road network is a useful point of reference in automated image analysis and<br />
ways of analyzing imagery to capture road networks have been well studied (such<br />
as pattern analysis techniques explored by van der Werff & van der Meer, 2008).<br />
In this thesis the focus is not on capturing road data, something which has been<br />
completed and is on a continuous revision cycle, but on utilizing this data to make<br />
the image analysis process easier. Most study areas (and almost every part of the<br />
island of Ireland) will have at least some part of the road network present (at the<br />
risk of sounding pedantic this assumption is not verified here because it can be<br />
assumed as general knowledge and can be discerned from a cursory look at any<br />
online small scale representation of the network). This means that there are<br />
polygons of a specific unique spectral value available for referencing most studies,<br />
even when confined to a particular area or series of photographs. In order to get a<br />
clearer insight into how this can be applied, three areas close to roads around the<br />
image were sampled and compared to sample values from their closest road<br />
polygon.<br />
Road test sample 1 Mean pixel value Standard deviation<br />
Red 205.889 8.18<br />
Green 228.167 11.289<br />
Blue 178.778 12.73<br />
Adjacent spectral values sample 1 (pasture) Mean pixel value Standard deviation<br />
Red 91.539 6.573<br />
Green 138.706 7.504<br />
Blue 92.519 10.84<br />
Table 2: Road test sample value 1<br />
The first test sample took an area of road and a sample area from an area of<br />
pasture adjacent to the road. This can be assumed to be a recurring set of values<br />
that can be located in most aerial imagery that this study is considering. In the test<br />
the values for standard deviation from the mean across all three colour bands in<br />
both samples did not vary to any large extent and can be omitted as reference<br />
values for the search algorithm looking to match a set of values as pasture using<br />
the nearest road polygon as key. In contrast there was a large difference in the<br />
mean values for all three colour bands, with the pasture sample containing pixels<br />
with a mean roughly 50% lower for the red, 40% lower for the green and 50%<br />
lower for the blue colour band than the road polygon. This discernible difference<br />
allows a range surrounding this relative difference to be included in the search<br />
and areas of pasture to be identified. Note: potential candidates for a pasture set of<br />
polygons are also cross checked against other relative values, outlined in later<br />
sections of this sampling study. As a result of this variance, once the known areas<br />
are identified the road set can be compared against unknown polygons adjacent to<br />
it and, given similar levels of standard deviation and the proportional mean by<br />
colour band outlined above, the unknown polygons can be placed in a pasture<br />
polygon set for further reference and confirmation as the analysis progresses.<br />
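The use of the nearest road polygon as a key can be sketched as a comparison of band ratios. The reference means below are those of Table 2; the tolerance of 0.10 and the function names are assumptions made for illustration and were not values tested in the study.<br />

```python
BANDS = ("red", "green", "blue")

# Mean pixel values from Table 2 (road test sample 1 and adjacent pasture).
ROAD_REF = {"red": 205.889, "green": 228.167, "blue": 178.778}
PASTURE_REF = {"red": 91.539, "green": 138.706, "blue": 92.519}


def band_ratios(target, reference):
    """Ratio of the target mean to the reference mean for each colour band."""
    return {band: target[band] / reference[band] for band in BANDS}


# Expected pasture-to-road ratio per band, derived from Table 2.
EXPECTED = band_ratios(PASTURE_REF, ROAD_REF)


def matches_pasture_key(candidate, nearest_road, tolerance=0.10):
    """True if the candidate sits at the expected proportion of its nearest road.

    `tolerance` is an assumed allowance on each ratio, not a figure
    taken from the study.
    """
    observed = band_ratios(candidate, nearest_road)
    return all(abs(observed[band] - EXPECTED[band]) <= tolerance
               for band in BANDS)
```

A polygon passing this test would only be a candidate for the pasture set; as noted above, it would still be cross checked against the other relative values.<br />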
Road test sample 2 Mean pixel value Standard deviation<br />
Red 157.961 7.537<br />
Green 181.211 8.08<br />
Blue 145.553 10.129<br />
Adjacent test values sample 2 (bog) Mean pixel value Standard deviation<br />
Red 111.479 7.086<br />
Green 125.162 8.764<br />
Blue 98.326 12.152<br />
Table 3: Road test sample value 2<br />
This sample looked at an area of bog adjacent to a road for discernible pixel<br />
variations between both. Although bog will be sampled further at another stage in<br />
this thesis, it is worth noting that all bogs have access roads nearby and a spectral<br />
comparison is a useful reference. The surface of the roads mentioned varies but, as<br />
is outlined in the road section of this thesis, surface type has only a small effect on<br />
the range of spectral values returned from the road. The standard deviation for all<br />
three colour bands was almost identical between both samples (road and bog)<br />
which could be expected due to the relative uniformity of surface cover (from a<br />
medium altitude aerial perspective). The mean pixel value for these colour bands,<br />
however, varied almost uniformly across the three bands with a 30% smaller value<br />
obtained for the red, green and blue bands in the bog sample. As with road and<br />
pasture this is a strong proportional variation for analysis purposes and allows bog<br />
to be established in an initial polygon set during processing. The set can then be<br />
compared against the other expected variances (water, forestry, pasture) and<br />
matched against polygon size (bog will almost always be the largest area polygon in<br />
any sample – other large polygons such as lakes and forestry are coded and can be<br />
automatically placed in a set during analysis).<br />
Road test sample 3 Mean pixel value Standard deviation<br />
Red 199.833 10.912<br />
Green 218.667 16.52<br />
Blue 163 165<br />
Adjacent test values sample 3 (Mixed forestry) Mean pixel value Standard deviation<br />
Red 88.365 28.944<br />
Green 120.627 29.751<br />
Blue 90.361 20.369<br />
Table 4: Road test sample value 3<br />
The third set of samples for referencing adjacent data to the spectral values from<br />
road polygons was a section of mixed forestry close to a road (gravel track). The<br />
samples displayed a notable difference in the level of standard deviation for all<br />
three colour bands. This level of deviation is consistent with other samples of<br />
mixed forestry (and rough pasture) analyzed in this thesis; with roads displaying a<br />
deviation of approximately one third of the mixed forestry values for the red and<br />
green colour bands. The differences in mean pixel values for all three colour<br />
bands were also distinct: 55% lower for mixed forestry in the red, 45% lower in the<br />
green and 45% lower in the blue colour bands. This variation, together with the<br />
level of standard deviation, provides a useful quality assurance value set to test the<br />
accuracy of the derived values being estimated in the algorithm. Since the areas of<br />
mixed forestry and road are both present as polygons in the vector data set the<br />
pixel sets for each can be extracted and matched – any values falling outside a<br />
range close to the above expected variances would flag an issue with either the<br />
photography or vector data.<br />
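One way such a quality assurance check might look is sketched below. The expected proportional drops are the 55%, 45% and 45% figures quoted above; the tolerance of ten percentage points and the function name are assumptions for illustration only.<br />

```python
BANDS = ("red", "green", "blue")

# Proportional drop in mean value from road to mixed forestry quoted in the text.
EXPECTED_DROP = {"red": 0.55, "green": 0.45, "blue": 0.45}


def qa_flags(road_mean, forestry_mean, tolerance=0.10):
    """List the bands whose road-to-forestry drop falls outside the expected range.

    An empty list means the pair behaves as expected; any entries suggest a
    problem with either the photography or the vector data. `tolerance` is an
    assumed allowance, not a value from the study.
    """
    flags = []
    for band in BANDS:
        drop = 1.0 - forestry_mean[band] / road_mean[band]
        if abs(drop - EXPECTED_DROP[band]) > tolerance:
            flags.append(f"{band}: drop of {drop:.0%} is outside the expected range")
    return flags
```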
3.2 Water<br />
Figure 4: Typical Water Area Image<br />
This section looked at four water samples present in the sample image<br />
(comprising three sections of lake and one of river) to see if the spectral values<br />
could be used to control and calibrate image processing of land polygons. The<br />
results were good and indicated several unique properties for this cover that could<br />
be used to calibrate a key in relation to surfaces being studied. The percentage of<br />
the image covered by water was proportionally small but the lake section made up<br />
the biggest single polygon. Of the water polygons, the majority were drains,<br />
together with several streams (ranging from three to less than a meter in width);<br />
a river and a lake were also present. The streams and drains were<br />
eliminated from the study. This was for two reasons: firstly, they have very small<br />
width (less than a meter in some cases) and were often obscured by overhanging<br />
vegetation, and secondly, they are already captured and any spectral analysis would<br />
only be of use as comparative values to use against the rest of the image – which<br />
was not possible due to the vegetation.<br />
Variations between the samples were slight, suggesting that lower flown<br />
photography would be necessary to compile any useful information regarding<br />
sediment levels, but allowing good baseline figures to be derived from the values<br />
present. The pixel value was almost uniformly two thirds less than the image<br />
average for the red colour band, with a low standard deviation in all samples.<br />
The value on the green colour band was also just 50% (with little variation) of<br />
the image mean for all samples. Similarly the value returned for the blue colour<br />
band was 20% less than the image average.<br />
Water Sample 1 Mean Pixel Value Standard Deviation<br />
Red 36.455 4.568<br />
Green 69.214 7.254<br />
Blue 81.614 13.217<br />
Water Sample 2 Mean Pixel Value Standard Deviation<br />
Red 36.119 4.562<br />
Green 69.524 6.944<br />
Blue 83.718 12.256<br />
Water Sample 3 Mean Pixel Value Standard Deviation<br />
Red 37.692 5.468<br />
Green 70.758 7.711<br />
Blue 83.386 13.056<br />
Water Sample 4 Mean Pixel Value Standard Deviation<br />
Red 39.714 5.548<br />
Green 73.083 7.07<br />
Blue 82.797 16.235<br />
Table 5: Water sample values<br />
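The proportions quoted for water can be reproduced from Table 5 and the whole-image band means given after Table 1. The following is a short arithmetic check only; the dictionary layout and the function name are illustrative choices.<br />

```python
from statistics import mean

# Whole-image band means quoted after Table 1.
IMAGE_MEAN = {"red": 111.711, "green": 136.542, "blue": 102.636}

# Mean pixel values of the four water samples in Table 5.
WATER_SAMPLES = [
    {"red": 36.455, "green": 69.214, "blue": 81.614},
    {"red": 36.119, "green": 69.524, "blue": 83.718},
    {"red": 37.692, "green": 70.758, "blue": 83.386},
    {"red": 39.714, "green": 73.083, "blue": 82.797},
]


def water_to_image_ratio(samples, image_mean):
    """Average each band over the samples and divide by the image mean."""
    return {band: mean(s[band] for s in samples) / image_mean[band]
            for band in image_mean}


ratios = water_to_image_ratio(WATER_SAMPLES, IMAGE_MEAN)
for band, ratio in ratios.items():
    print(f"{band}: water mean is {ratio:.0%} of the image mean")
```

With the Table 5 figures the ratios come out at roughly one third, one half and four fifths of the image mean for the red, green and blue bands respectively, in line with the proportions described above.<br />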
It could also be said that the uniform nature of the results indicates that the relative<br />
depth of the water has little effect on the spectral value of the area for photography<br />
at that height, introducing the potential for water to be used as one of the main<br />
baseline properties in this type of image analysis. It can often be the case that<br />
certain areas contain large numbers of temporary ponds following heavy rain; this<br />
is particularly so in the 1:5000 scale rural mapping. Applying the above values<br />
against pixel histograms for these areas (typically bog or pasture) for photography<br />
runs taken following heavy rainfall could reveal useful data with regards to runoff<br />
and capacity across land areas. In terms of this study the values will form part of a<br />
key against which the histogram values for pixels across the colour bands can be<br />
applied in order to calibrate the key (set of values to identify land cover).<br />
The purpose of this thesis is to identify an automated process for image analysis<br />
using vector data alongside a spectral analysis. The ability to extract water areas,<br />
in particular lakes or ponds with a large concentration of pixels of similar values,<br />
and establish a baseline value to calibrate the image key by would allow the user<br />
to target specific areas across successive runs of photography. This could be<br />
completed automatically using GDAL extract and returning the results of a<br />
histogram analysis.<br />
In order to gain a visual impression of the location of the main water bodies in the<br />
sample image the green and blue colour bands were mapped into the red, bringing<br />
out the definition between the (relatively) monotone water and the remainder of the<br />
image.<br />
Figure 5: Water Area Image Modification<br />
The next part of the study involved taking separate sample areas from around the<br />
image and comparing them to the spectral values associated with water. The<br />
samples were taken at three separate sections around the image; they did not<br />
correspond to samples taken for other parts of this study (specific road, building,<br />
pasture areas etc.) so as to increase the variety of input data. The three areas<br />
consisted of forestry (coniferous plantation), pasture and track. The pasture<br />
sample also contained a high degree of shade, which was not rectified in the table<br />
to see if it could be possible to identify this type of area with shade included.<br />
Water values testing sample 1 (forestry) Mean Pixel Value Standard Deviation<br />
Red 73.532 19.591<br />
Green 112.507 21.605<br />
Blue 96.772 16.042<br />
Water values testing sample 2 (pasture) Mean Pixel Value Standard Deviation<br />
Red 115.153 12.691<br />
Green 167.608 15.487<br />
Blue 104.872 13.2<br />
Water values testing sample 3 (track) Mean Pixel Value Standard Deviation<br />
Red 213.429 10.748<br />
Green 237.429 12.369<br />
Blue 192.821 14.636<br />
Table 6: Water test sample values<br />
As might be expected the track (artificial surface) showed the greatest difference,<br />
with the red colour band value for water less than 20% of that found in track. The<br />
relative disparity between these two values (the red mean pixel value in areas of<br />
water and track/road) could be used to calibrate an image key during automatic<br />
image analysis, and the percentages of other less distinct land cover derived by<br />
comparison. One of the main aims of this study is to see if it would be possible to<br />
analyze aerial imagery using an automatic process based on vector data. With<br />
known water polygons and road polygons present this can be achieved; however,<br />
as was mentioned above, the body of water needs to be large enough to obtain an<br />
accurate baseline reading for the photography run. If only drains or streams are<br />
present in the target area (or its immediate (~1 km) surroundings) then extracting a<br />
set of baseline pixel values from water polygons would not benefit the analysis. It<br />
can therefore be concluded that water polygons provide a useful reference for<br />
image analysis, but in the context of using them to add value to Ordnance Survey<br />
small area polygons they need to be part of areas not less than five pixels in<br />
diameter (i.e. belong to classes of rivers, lakes or large ponds).<br />
When the control values for water (from Table 5) were compared to pasture the<br />
red and green colour bands showed values which could be used to calculate if<br />
pasture was present in a polygon. The mean pixel value for water in the red colour<br />
band was 30% of the value for pasture, and in the green colour band only 45% of<br />
the value found in the pasture sample (which included a section of shade). The<br />
value for the blue colour band was also just over 20% less than the pasture mean<br />
pixel value. This implies that an algorithm could be run on sections of Ordnance<br />
Survey data which took polygon co-ordinates from the vector polygons, calibrated<br />
a key from the water polygon, compared its red and green colour band values<br />
against those of the neighbouring polygon and then confirmed the level of<br />
standard deviation (which was found to be low in an area of pasture, ~10 on the<br />
greyscale); where all of these checks agree it is probable the area can be labelled<br />
as pasture. In itself this does not present much of a breakthrough but when added<br />
to the known polygons it helps complete the picture of a target area being<br />
analyzed.<br />
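The sequence of checks described above (calibrate from water, compare the red and green bands with the neighbouring polygon, then confirm the standard deviation) might be sketched as follows. The water key is the average of the Table 5 samples and the expected proportions are the 30% and 45% figures from the text; the ratio tolerance, the standard deviation ceiling of 20 (chosen loosely to admit the shaded pasture sample in Table 6) and the function name are assumptions.<br />

```python
# Water key: red and green means averaged over the four samples in Table 5.
WATER_KEY = {"red": 37.495, "green": 70.645}

# Water mean as a proportion of the pasture mean, as reported in the text.
EXPECTED_WATER_TO_PASTURE = {"red": 0.30, "green": 0.45}


def is_probable_pasture(neighbour_mean, neighbour_std,
                        ratio_tolerance=0.10, max_std=20.0):
    """Apply the ratio check, then the standard deviation check.

    `ratio_tolerance` and `max_std` are assumed values for this sketch.
    """
    for band, expected in EXPECTED_WATER_TO_PASTURE.items():
        ratio = WATER_KEY[band] / neighbour_mean[band]
        if abs(ratio - expected) > ratio_tolerance:
            return False
    # Pasture was found to have a low standard deviation on the greyscale.
    return all(neighbour_std[band] <= max_std
               for band in EXPECTED_WATER_TO_PASTURE)
```

Applied to the three test samples in Table 6, this sketch accepts the pasture sample and rejects the forestry and track samples.<br />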
At this point it might be better to think of the image as its vector representation.<br />
As the polygons which the vector data encloses are identified the areas can be<br />
filled. In this way the study is filling the blanks around known values. If the result<br />
was thought of as a mosaic of known area properties (type and nature of land<br />
cover) then applying the label pasture dramatically reduces the areas left to<br />
identify.<br />
Figure 6: Sample area as a mosaic of polygons<br />
The purpose of the study is to identify an automated software process to do this.<br />
The user would start with the requirement to identify the percentage of a certain<br />
property in the photography: throughout this thesis the example of impervious<br />
surface area is given but this could also be a fungal infection affecting crops, the<br />
spread of invasive plant species, the extent of flood damage etc. What the<br />
sampling of water polygons is doing is attempting to create a set of automated<br />
conditions which the software would initially retrieve to set a base for the<br />
algorithm. The user would then select an area from the vector data where the<br />
target values were present. This area would be in the format of a specially coded<br />
polygon composed of vector data; either using those in the Ordnance Survey data<br />
or appending lines to controlled data to fully enclose the target sample (and<br />
creating the necessary vector set). Once identified the target area could be<br />
calibrated against the value for water (among others) and areas not relevant<br />
eliminated.<br />
As every section of the image will be composed of an area polygon (taken from<br />
the vector data) which in general encloses a relatively small area, it should be<br />
possible to quickly process each section by clipping and cutting the sections,<br />
comparing the mean pixel values across the colour bands, and classifying the<br />
result. This type of process is linked to the photography and once the edge of a<br />
given run is reached the process needs to be restarted and the values re-calculated.<br />
As the extent of the photography is known the co-ordinates can be included in<br />
the algorithm and the user informed when the extent of the search has been<br />
reached.<br />
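The per-polygon loop and the extent check could be organised along the following lines. This is a structural sketch only: the Box type, the tuple layout of each polygon and the classify callback are hypothetical, and the clipping of the raster itself is assumed to have been done beforehand.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in map co-ordinates (hypothetical helper type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, other: "Box") -> bool:
        return (self.min_x <= other.min_x and self.min_y <= other.min_y
                and self.max_x >= other.max_x and self.max_y >= other.max_y)


def process_run(polygons, photo_extent, classify):
    """Classify every polygon lying inside one photography run.

    `polygons` yields (polygon_id, bounds, band_means) tuples and `classify`
    maps band means to a label. Polygons beyond the extent are returned
    separately so the user can be told the edge of the run was reached and
    the key can be recalculated for the next run.
    """
    classified = {}
    beyond_extent = []
    for polygon_id, bounds, band_means in polygons:
        if not photo_extent.contains(bounds):
            beyond_extent.append(polygon_id)
            continue
        classified[polygon_id] = classify(band_means)
    return classified, beyond_extent
```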
3.3 Marsh<br />
Figure 7: Typical Marsh Area Image<br />
This study looked at areas of marshy ground. The purpose was to try and identify<br />
if sections of waterlogged surface area had unique values which could be<br />
identified in a small area polygon. The areas used for the study were captured<br />
examples adjoining a lake –they had been field revised and identified specifically<br />
as marsh within an enclosed polygon. The boundary on one side was the edge of<br />
the lake, while the boundary on the other side was the border with pasture<br />
enclosed with a notional (mapping) line. The samples (three in total) can be<br />
assumed to be typical of marsh (due to the position within the target area) but<br />
there was a small amount of variation, around 5%, between results for the<br />
different colour bands. This was as expected and the values were very close to the<br />
image average. This suggests that areas of marsh would be difficult to detect using<br />
a mean pixel/ standard deviation analysis based on area polygons alone, and<br />
additional coding data taken from the original vector mapping is required to make<br />
an accurate prediction as to the probability of marshy ground being present.<br />
Marsh Sample 1 Mean Pixel Value Standard Deviation<br />
Red 110.968 10.557<br />
Green 135.143 12.497<br />
Blue 102.401 13.700<br />
Marsh Sample 2 Mean Pixel Value Standard Deviation<br />
Red 108.672 16.799<br />
Green 132.305 15.717<br />
Blue 93.646 14.9<br />
Marsh Sample 3 Mean Pixel Value Standard Deviation<br />
Red 104.652 8.66<br />
Green 123.725 10.028<br />
Blue 94.018 13.032<br />
Table 7: Marsh sample values<br />
One factor which might help to differentiate between areas of marsh and the<br />
overall mean is the fact that the standard deviation was well below the overall<br />
figure for the red and green colour bands in all three samples (less than half of it in<br />
samples 1 and 3, and roughly 60% of it in sample 2). This is due to the fact that,<br />
although there is variety in the spectral values for the vegetation present, the area<br />
is uniformly covered by vegetation. This difference could become useful when<br />
removing known features from a polygon; in other words taking water and the<br />
built environment from an area polygon, and analyzing to see if the spectral values<br />
displayed similar levels of deviation on the red and green colour bands.<br />
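A combined test of this kind, a mean close to the image average together with a reduced standard deviation in the red and green bands, might be sketched as below. The image statistics are those quoted after Table 1; the two thresholds and the function name are assumptions chosen loosely around the Table 7 samples and were not evaluated in the study.<br />

```python
# Whole-image statistics for the red and green bands quoted after Table 1.
IMAGE_MEAN = {"red": 111.711, "green": 136.542}
IMAGE_STD = {"red": 27.2804, "green": 27.3776}


def looks_like_marsh(mean, std, mean_tolerance=0.15, max_std_share=0.65):
    """True if the band means sit near the image mean but the spread is narrow.

    `mean_tolerance` and `max_std_share` are assumed thresholds for this sketch.
    """
    for band in ("red", "green"):
        relative_mean = mean[band] / IMAGE_MEAN[band]
        if abs(relative_mean - 1.0) > mean_tolerance:
            return False
        if std[band] / IMAGE_STD[band] > max_std_share:
            return False
    return True
```

Because the marsh means are so close to the image average, a test like this would only be meaningful once known features have been removed and alongside the marsh symbol coding in the vector data.<br />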
Marsh areas are generally indicated by a symbol which signifies the presence of<br />
this type of ground cover extending to the next logical boundary (or the notional<br />
mapping boundary mentioned above). The boundaries in the vector data do not<br />
contain the level of marsh, so a method to identify the areas from the relatively<br />
narrow red colour band identified above could provide an automated way of<br />
determining the extent of marsh lands based on existing data. This is one of the<br />
areas of the study which does not lend itself to a software solution and relates to<br />
the question of “fuzzy data” and how to incorporate it into a digital environment.<br />
Without digressing onto a tangent outside the scope of this thesis it needs to be<br />
mentioned, in the context of this part of the study, that certain features of this<br />
planet will always have fluid boundaries. In this example the change from marsh<br />
to rough pasture is a gradual one, and does not correspond to a single vector.<br />
Various solutions such as an additional transitional polygon instead of a linear<br />
boundary are simply methods of belting the square peg of a gradual change into a<br />
relational database. Although the study is attempting to identify statistical data<br />
(percentages of types of land cover) that can be appended to the entry for a given<br />
area polygon in a spatial database, in this case a bitmap displaying concentrations<br />
of values corresponding to marsh might be more appropriate.<br />
When the values obtained from marsh were compared to three sample sections<br />
from other polygons in the study some unique proportional variations emerged.<br />
The purpose of the study is to iteratively reduce the quantity of unknown (in terms<br />
of land usage) polygons in the search area by appending values derived from the<br />
aerial photograph. The sample areas used in this test were not from any of the<br />
original samples used to obtain baseline spectral data for the land type they<br />
represent but were chosen to see if a reliable (or at least significant) proportional<br />
deviation could be observed. The three sections sampled were pasture (which had<br />
been recently cut), mixed forestry (chosen because of the variety of spectral values<br />
that this type of cover represents) and paving (taken from a yard surrounding<br />
buildings but similar to any of the road and track hard surface areas sampled<br />
elsewhere in this study).<br />
All three sample areas showed unique properties consistent with the sampling used<br />
for their respective baseline values but also useful in terms of obtaining a key for<br />
identifying polygons of marsh. As was mentioned above these will generally fall<br />
within a polygon composed of vector polylines but it can occasionally be the case<br />
where the marsh was not fully enclosed. It may be necessary to introduce a<br />
process that retains all the polygons containing marsh symbols but displaying<br />
spectral values outside those expected for that type of land cover for verification –<br />
this, however, was not the case for the samples used in the study.<br />
Marsh test Sample 1 (Pasture) Mean pixel value Standard Deviation<br />
Red 201.617 8.05<br />
Green 209.713 9.569<br />
Blue 136.645 12.082<br />
Marsh test Sample 2 (Mixed Forestry) Mean pixel value Standard Deviation<br />
Red 69.22 34.492<br />
Green 103.352 33.620<br />
Blue 86.594 19.885<br />
Marsh test Sample 3 (Paving) Mean pixel value Standard Deviation<br />
Red 246.167 8.1<br />
Green 252.542 5.4<br />
Blue 206.125 8<br />
Table 8: Marsh test sample values<br />
The first sample, taken from freshly cut pasture, produced the relatively high<br />
values for the red and green colour bands that were found in the pasture testing<br />
samples for that type of ground cover. In relation to the values for marsh they<br />
showed a high level of disparity, with the mean red colour band pixel value for<br />
marsh being half of the pasture sample, and the green value for marsh being 60%<br />
of the test sample. The disparity in the blue colour band was less but this aspect of<br />
the spectral values could be used to relate the disparities found as specific to<br />
pasture, so that an examination of neighbouring polygons could use a known marsh<br />
area (presence of symbol and expected spectral values) as a reference to set the<br />
relative differences and possibly reset the marsh values to within the values for the<br />
known polygon for that particular area.<br />
The last suggestion will not be included in this study but it is worth noting that an<br />
algorithm which could constantly recalibrate the relative values as it processed<br />
neighbouring polygons might produce better results than one dependent on a key<br />
set during the beginning of the processing.<br />
The second test sample looked at an area of mixed forestry for deviation (in terms<br />
of mean pixel values across the colour bands) from those found to be present in<br />
areas of marsh. The values had a relatively unique variation from marsh in that<br />
while both the red and green colour band mean pixel values were a lot lower (35%<br />
and 20% respectively) the blue colour band had a comparable level of standard<br />
deviation and a mean which was within 10% of marsh, although this could be<br />
attributed to the level of shade present in the forestry due to the tree canopy<br />
varying in height across the sample. As is pointed out elsewhere in this study,<br />
areas of mixed forestry did not give reliable enough data to calibrate other surface<br />
areas from, based on spectral values alone. In the case of these types of areas there<br />
is vector data coding present to uniquely identify the forestry; however,<br />
knowledge of an expected proportional difference between the (known) forestry<br />
and an area of marsh is a useful additional factor to include in the algorithm and<br />
might increase the accuracy of any search for these types of areas (or at least help<br />
to eliminate them from a search for other specific properties).<br />
The third test sample took an area of hard cover (paving/ track) from a yard<br />
between agricultural buildings. This type of cover is a part of this study which<br />
revealed the most distinct values and presents a valuable calibration tool for the<br />
algorithm. When compared to this third sample the mean pixel value (converted to<br />
greyscale) for the red colour band found in the marsh samples was only 43% of<br />
the hard cover, while similar disparity was found between the green and blue<br />
mean pixel values in marsh and the green and blue mean pixel values in the hard<br />
cover (with the marsh mean values at only 51% and 46% of the hard cover<br />
respectively). These types of areas are well coded in the vector data. Some areas<br />
of hard cover surrounding private dwellings and farm buildings may not be<br />
captured and the automated identification of these types of areas through aerial<br />
image processing is one of the aims of this thesis. It can be assumed, however,<br />
that for any given area (excepting rural mapping covering mountains, which this<br />
study is not addressing) the road polygons have been accurately captured and<br />
there will be several sample polygons to calibrate a hard cover value from. In<br />
terms of this part of the study, the relative proportional deviation of marsh values<br />
from both pasture and road/ paving allows a key for its identification to be<br />
developed.<br />
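The proportional comparison described above can be sketched in a few lines. This is only an illustration: the function names and the tolerance are assumptions, and the calibration and candidate band means are invented values, not measurements from the study. Only the expected marsh-to-hard-cover ratios (43%, 51% and 46%) come from the text.<br />

```python
def band_ratios(sample_means, reference_means):
    """Return each band's sample mean as a fraction of the reference mean."""
    return {band: sample_means[band] / reference_means[band]
            for band in ("red", "green", "blue")}


def matches_key(ratios, key, tolerance=0.05):
    """True if every band ratio lies within `tolerance` of the expected key.

    The tolerance is a hypothetical setting, not a value from the study.
    """
    return all(abs(ratios[band] - key[band]) <= tolerance for band in key)


# Expected marsh-to-hard-cover ratios quoted in the text.
MARSH_VS_HARD_COVER = {"red": 0.43, "green": 0.51, "blue": 0.46}

# Illustrative (not measured) band means for a hard cover calibration
# polygon taken from the vector data, and for a candidate polygon.
hard_cover = {"red": 200.0, "green": 210.0, "blue": 150.0}
candidate = {"red": 88.0, "green": 105.0, "blue": 70.0}

ratios = band_ratios(candidate, hard_cover)
is_marsh_like = matches_key(ratios, MARSH_VS_HARD_COVER)
```

In practice the hard cover reference would be taken from road polygons already coded in the vector data, so the key stays relative to each image rather than fixed in absolute pixel values.<br />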
3.4 Coniferous Forestry<br />
This part of the study used unprocessed sections of coniferous (commercially<br />
planted) forestry to evaluate a deviation from the average for the image that might<br />
indicate this type of ground cover. It took seven samples from a total of five areas<br />
of this type of forestry in the image. These were then analyzed in<br />
terms of mean values through the red, green and blue colour bands to determine if<br />
there was any unique deviation from other features. As would be expected for this<br />
type of ground cover the values for red (and near infra red) were lower due to<br />
colour being absorbed by the foliage. This gives a useful indicator for this type of<br />
ground cover and provides a comparative value that the target polygons of the<br />
study can be compared against. It should be noted that there is also the potential to<br />
use a pattern recognition algorithm to accompany any specific search for this type<br />
of forestry as uniform rows are a feature of this type of surface cover. The<br />
statistics for the survey are in the table below. It should also be noted that the<br />
standard deviation remained relatively consistent for each sample area.<br />
Forest Sample 1 Mean pixel value Standard deviation<br />
Red 96.1975 19.943<br />
Green 125.339 18.6976<br />
Blue 98.4551 18.6976<br />
Forest Sample 2 Mean pixel value Standard deviation<br />
Red 87.0853 21.8368<br />
Green 125.905 23.8361<br />
Blue 103.376 17.2807<br />
Forest Sample 3 Mean pixel value Standard deviation<br />
Red 97.484 20.715<br />
Green 137.59 21.077<br />
Blue 109.5 14.9791<br />
Forest Sample 4 Mean pixel value Standard deviation<br />
Red 73.072 20.762<br />
Green 112.015 22.976<br />
Blue 96.784 16.405<br />
Forest Sample 5 Mean pixel value Standard deviation<br />
Red 76.670 23.651<br />
Green 111.181 24.107<br />
Blue 94.424 16.962<br />
Forest Sample 6 Mean pixel value Standard deviation<br />
Red 75.534 24.507<br />
Green 113.943 27.153<br />
Blue 97.851 17.693<br />
Forest Sample 7 Mean pixel value Standard deviation<br />
Red 72.424 24.194<br />
Green 111.688 27.628<br />
Blue 96.596 18.1686<br />
Table 9: Coniferous forestry sample values<br />
An indicator for this type of land cover (as with rough pasture) is the higher mean<br />
value for the blue colour band when compared to the red. This was not the case for<br />
the image as a whole, where the red band produced a mean almost 10% higher than<br />
blue. Another indicator present in all the samples was an<br />
increased disparity between the red and green colour bands: 18.2% for the image<br />
as a whole, compared to almost 30% across the samples. It is the red band which<br />
is the most valuable indicator of this type of ground cover, with a value over 25%<br />
lower than the image mean. While coniferous forestry will have been captured and<br />
indicated as a level in the OSI vector data, smaller areas of this type of ground<br />
cover will be typically present along the margins of small area polygons close to<br />
urban areas. In particular a polygon closed by what is called a peck in the vector<br />
data layers could be analyzed for the mean red colour pixel value converted to<br />
greyscale and compared to the image whole. If the variation is close to 25% lower,<br />
then it is probable that this type of tree cover is present. It should be noted<br />
that further spectral analysis (involving swapping of the colour bands) can present<br />
additional indicators, which will be applied later in the study.<br />
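As a rough check on the indicators above, the Table 9 means can be averaged and a candidate polygon tested against the image mean. This is a sketch only: the function names, the tolerance, and the choice to express the red/green gap relative to the green mean are assumptions, and the image-wide red mean is not given in this section so it is left as a parameter.<br />

```python
from statistics import mean

# Mean pixel values (red, green, blue) for the seven samples in Table 9.
CONIFEROUS_SAMPLES = [
    (96.1975, 125.339, 98.4551),
    (87.0853, 125.905, 103.376),
    (97.484, 137.59, 109.5),
    (73.072, 112.015, 96.784),
    (76.670, 111.181, 94.424),
    (75.534, 113.943, 97.851),
    (72.424, 111.688, 96.596),
]


def band_averages(samples):
    """Average each colour band across all samples."""
    reds, greens, blues = zip(*samples)
    return mean(reds), mean(greens), mean(blues)


def looks_coniferous(red_mean, image_red_mean, expected_drop=0.25, tolerance=0.05):
    """Flag a polygon whose red mean sits roughly `expected_drop` below the image mean.

    `tolerance` is a hypothetical setting, not a value from the study.
    """
    drop = (image_red_mean - red_mean) / image_red_mean
    return abs(drop - expected_drop) <= tolerance


avg_red, avg_green, avg_blue = band_averages(CONIFEROUS_SAMPLES)
# Assumption: the red/green disparity is measured relative to the green mean.
red_green_gap = (avg_green - avg_red) / avg_green
blue_exceeds_red = all(blue > red for red, _, blue in CONIFEROUS_SAMPLES)
```

A polygon closed by a peck could then be passed to `looks_coniferous` with its own red mean and the red mean of the whole image.<br />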
There are several implications of being able to identify coniferous vegetation in an<br />
urban area; it indicates permeable surface area for planning/ flood modelling. In<br />
the context of this thesis it allows a section within the study area to be identified<br />
and adjoining areas to be measured against; for example, once the presence of<br />
coniferous vegetation is detected image processing could be applied to eliminate<br />
this from the result set for the target polygon, allowing another analysis to be<br />
run on the remaining surface area.<br />
Coniferous forestry test sample 1 (tree/ shade/ pasture mix) Mean pixel value Standard deviation<br />
Red 81.824 35.235<br />
Green 118.246 34.475<br />
Blue 92.287 17.189<br />
Coniferous forestry test sample 2 (pasture) Mean pixel value Standard deviation<br />
Red 122.813 9.725<br />
Green 174.274 12.991<br />
Blue 105.527 12.604<br />
Coniferous forestry test sample 3 (bog) Mean pixel value Standard deviation<br />
Red 115.672 9.270<br />
Green 133.684 9.933<br />
Blue 105.059 11.881<br />
Table 10: Coniferous forestry test sample values<br />
Three sample areas were chosen against which to compare the data from the<br />
coniferous sample areas. The first of these does not conform to the vector polygons<br />
on which the proposed algorithm operates, but was chosen for its mix of ground<br />
cover so as to provide a worst possible combination against coniferous values. In<br />
other words, the distinguishing feature for coniferous forestry (outside the vector<br />
coding; this sampling was only to test the values relative to samples around the<br />
image) of high levels of standard deviation in the red and green colour bands would<br />
not be a useful comparative feature, as the sample contained a variety of ground cover. The<br />
mean and standard deviation alone did not provide strong indicators of the ground<br />
cover type but the sample did demonstrate the usefulness of histogram data. The<br />
pixel count for the red and green colour bands displayed two clear spikes when<br />
presented as a histogram, corresponding to the expected values for both pasture<br />
and shade. This presents the possibility of determining a relative proportional (to<br />
the polygon size) pixel count flag which would indicate the percentage of land<br />
type within an area of mixed use. As was mentioned in the introduction, the basis<br />
of this study is the referencing of areas within the aerial imagery by small vector
polygons of uniform ground type. Further analysis of these polygons can be done<br />
once initial categorizations have been made and the level of analysis increased. To<br />
accurately determine the correct pixel proportion to flag requires a larger sample<br />
than that used in this study, which could be obtained from the data sets of<br />
polygons requiring further analysis returned from prolonged use of the algorithm.<br />
In other words this is something which would be developed later in the image<br />
analysis cycle because the nature of the sample (deliberately crossing outside the<br />
search polygons) makes it unlikely to be a feature of these types of images.<br />
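The histogram observation above (two clear spikes for shade and pasture in a mixed sample) can be illustrated with a small peak finder. This is a minimal sketch under stated assumptions: the bin count, the `min_share` threshold and the synthetic pixel values are all hypothetical, chosen only to show how a proportional pixel count flag might be derived.<br />

```python
def histogram(pixels, bins=32, max_value=256):
    """Count 8-bit pixel values into equal-width bins."""
    counts = [0] * bins
    width = max_value / bins
    for value in pixels:
        counts[min(int(value / width), bins - 1)] += 1
    return counts


def find_peaks(counts, min_share=0.05):
    """Return indices of bins that are local maxima and hold at least
    `min_share` of all pixels (a hypothetical threshold)."""
    total = sum(counts)
    peaks = []
    for i, count in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if count >= min_share * total and count > left and count >= right:
            peaks.append(i)
    return peaks


# Synthetic red-band values: 600 shade-like pixels and 400 pasture-like pixels.
red_pixels = [40] * 600 + [120] * 400

counts = histogram(red_pixels)
peaks = find_peaks(counts)
# Share of the polygon's pixels falling in each peak bin.
peak_shares = {i: counts[i] / len(red_pixels) for i in peaks}
```

With real imagery the spikes would be spread over several bins, so the share would more sensibly be summed over a window around each peak.<br />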
The second sampling area involved taking a section of pasture for comparison<br />
with the coniferous samples. The values differed with an increase of<br />
approximately 40% for the mean of the red and green colour bands with a<br />
standard deviation 50% reduced on those found in coniferous forestry. This data is<br />
another useful reference in the identification of pasture as coniferous areas are<br />
coded and outlined in the vector data so can be automatically fed into a reference<br />
table during image analysis. As mentioned throughout this part of the study, the<br />
correct identification (and elimination) of areas of pasture from the image analysis<br />
is essential for the success of the suggested algorithm. Using spectral analysis for<br />
aerial image analysis (and remote sensing in general) is a specialized field of<br />
knowledge and studies tend to focus on a particular study area (such as the<br />
analysis and classification undertaken by Coredo-Sancho and Adler in 2007).<br />
The focus of this thesis is to create a generic method for image analysis which<br />
makes use of captured vector data to filter the image, reduce the study area, and<br />
narrow the range of pixel variations that can be analyzed. In this way the study has<br />
focused on finding an algorithm that can be coded into an easy to use solution for<br />
this type of research. By necessity the current mapping methods are labour<br />
intensive and resources are not available to capture the type of secondary data that<br />
might be gained from automatic image analysis. In addition, specific<br />
research (such as an inventory of impermeable surface in a region) could require<br />
specialist skills and methods; the algorithm suggested here aims to allow a<br />
user to filter through the current data by including their required target area in<br />
the process. For this reason the sampling has been generic (in that the image as a<br />
whole was not analyzed but individual sections representative of a specific land<br />
cover type were selected).<br />
The third sampling area was a section of bog, which was chosen because it is<br />
often found close to coniferous forestry in the Irish landscape. The ability to use<br />
coniferous as a reference when analyzing the image for the presence of bog means<br />
for most studies there will be an adjacent source of analysis to base the search key<br />
on. The mean pixel value for the red colour band was similar to the mean pixel<br />
value for the red colour band in pasture but the green value was notably different<br />
(to the green colour band in pasture). The green colour band value for bog was<br />
just under 20% larger than the value found in the coniferous samples. This indicates<br />
that an area showing a 40% increase in the red colour band and a 20% increase in the<br />
green colour band mean values, coupled with a 50% reduction in the standard deviation,<br />
relative to the known coniferous figures has a high probability of being an area<br />
of bog. Once this variation is cross checked with other known values in the area<br />
(water, road and buildings/ roofs), and cross checked with the secondary derived<br />
values identified in the algorithm (pasture etc.) areas of bog can be automatically<br />
quantified.<br />
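The bog rule above can be expressed as a comparison against the coniferous reference. The sketch below uses the Table 9 averages as that reference and the bog values from Table 10; the function names and the tolerance are assumptions. With the averaged Table 9 figures the green increase works out nearer 12% than 20%, so the tolerance here is deliberately loose.<br />

```python
from statistics import mean

# Table 9 coniferous samples: red means, green means, red standard deviations.
CONIFER_RED = [96.1975, 87.0853, 97.484, 73.072, 76.670, 75.534, 72.424]
CONIFER_GREEN = [125.339, 125.905, 137.59, 112.015, 111.181, 113.943, 111.688]
CONIFER_RED_SD = [19.943, 21.8368, 20.715, 20.762, 23.651, 24.507, 24.194]

# Expected changes relative to coniferous forestry, as stated in the text.
EXPECTED = {"red": 0.40, "green": 0.20, "red_sd": -0.50}


def relative_change(value, reference):
    """Fractional change of `value` with respect to `reference`."""
    return (value - reference) / reference


def bog_check(red, green, red_sd, tolerance=0.10):
    """Compare a polygon with the coniferous reference.

    Returns the observed changes and a pass/fail flag per indicator.
    `tolerance` is a hypothetical setting.
    """
    changes = {
        "red": relative_change(red, mean(CONIFER_RED)),
        "green": relative_change(green, mean(CONIFER_GREEN)),
        "red_sd": relative_change(red_sd, mean(CONIFER_RED_SD)),
    }
    passed = {k: abs(changes[k] - EXPECTED[k]) <= tolerance for k in EXPECTED}
    return changes, passed


# Bog test sample from Table 10.
changes, passed = bog_check(red=115.672, green=133.684, red_sd=9.270)
```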
3.5 Mixed Forestry<br />
Figure 8: Typical Mixed Forestry Area Image<br />
This part of the study looks at the spectral values for areas of mixed forestry. It is<br />
not concerned with finding a set of attributes which would uniquely<br />
identify this type of land cover from an aerial photograph, but is intended to<br />
investigate if values corresponding to this type of cover could be separated from<br />
those of rough pasture.<br />
One reason for attempting to differentiate between areas of mixed forestry and<br />
rough pasture is the age of the surface cover. Mixed forestry generally includes<br />
sections of native woodland, which is slow growing and can be assumed to be an<br />
area capable of supporting wildlife (it is also less prone to change than rough<br />
pasture; due to the difficulty in obtaining permission to clear this type of<br />
woodland). Any study looking at the wildlife corridors across the country would<br />
benefit from an automatic method of distinguishing smaller linear sections of this<br />
type of ground cover (along hedges etc.) from other land use types. As is<br />
evidenced below by the similarity between these results and those from an<br />
analysis of rough pasture, this remains difficult to do. There is also not much scope<br />
for pattern recognition algorithms to be used in the detection of isolated sections<br />
of mixed forestry (outside those captured by conventional mapping) because of<br />
the seemingly random nature of the shade patterns. Note: An obvious solution is<br />
to fly the same areas at different times of the year and compare the red and near<br />
infrared signatures to identify the presence of natural deciduous species at the<br />
borders of area polygons but this would be prohibitively costly and beyond the<br />
budget of any potential environmental survey. A cheaper solution might become<br />
possible in future by identifying patterns in lidar data at polygon boundaries. The<br />
focus of this study, however, is on spectral values and, although unique in<br />
comparison to the image average, the sample returned values very close to those<br />
of rough pasture.<br />
Areas of this type of surface cover, comprising a mixture of coniferous and<br />
natural woodland, are already captured in Ireland and the study took a section of<br />
land from one of these (present in the study area) and compared the spectral<br />
values to those of the image as a whole.<br />
Mixed Forestry Sample Mean Pixel Value Standard Deviation<br />
Red 70.246 33.337<br />
Green 104.33 32.726<br />
Blue 85.974 19.626<br />
Table 11: Mixed forestry sample values<br />
As was expected, the results showed a disparity with the values for the image as a<br />
whole similar to that of rough pasture. As with rough pasture the standard deviation in<br />
pixel values for the red and green colour bands was high, with a similar large<br />
difference in values for those bands (37% and 23% lower respectively). This is<br />
also an indication that areas of rough pasture can contain similar coverage to<br />
mixed forestry, in that rough pasture is often overgrown and contains some tree<br />
cover. Once areas without buildings and roads are found whose comparative variation<br />
between the whole image and sample polygons matches that above (and in<br />
the rough pasture survey), it should be possible to apply the rough pasture attribute.<br />
The known mixed forestry polygon set (which is taken from the vector data<br />
coding) can then be subtracted from this to give a percentage of rough pasture for<br />
a target area.<br />
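The subtraction step described above amounts to a set difference between the polygons matching the shared spectral signature and the polygons coded as mixed forestry in the vector data. A minimal sketch, assuming hypothetical polygon identifiers and areas:<br />

```python
def rough_pasture_share(polygon_areas, spectral_matches, known_mixed_forestry):
    """Share of the target area attributed to rough pasture.

    polygon_areas: mapping of polygon id to area (any consistent unit).
    spectral_matches: ids whose values match the rough pasture / mixed
        forestry signature.
    known_mixed_forestry: ids coded as mixed forestry in the vector data.
    """
    rough_ids = set(spectral_matches) - set(known_mixed_forestry)
    total_area = sum(polygon_areas.values())
    rough_area = sum(polygon_areas[pid] for pid in rough_ids)
    return rough_area / total_area


# Illustrative values only; ids and areas are invented for the example.
areas = {"p1": 400.0, "p2": 100.0, "p3": 250.0, "p4": 250.0}
share = rough_pasture_share(
    areas,
    spectral_matches={"p2", "p3", "p4"},
    known_mixed_forestry={"p4"},
)
```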
Mixed Forestry comparative sample 1 (bog) Mean Pixel Value Standard Deviation<br />
Red 110.761 6.429<br />
Green 121.958 8.253<br />
Blue 103.929 11.605<br />
Mixed Forestry comparative sample 2 (pasture) Mean Pixel Value Standard Deviation<br />
Red 128.801 10.469<br />
Green 165.843 9.496<br />
Blue 100.921 12.367<br />
Mixed Forestry comparative sample 3 (cut pasture) Mean Pixel Value Standard Deviation<br />
Red 220.074 7.623<br />
Green 209.855 9.249<br />
Blue 137.92 11.493<br />
Table 12: Mixed forestry test sample values<br />
The mixed forestry sample was compared against three sample areas to<br />
add to the proportional deviations in the algorithm. The aim is to achieve a high<br />
enough level of relative values in the image to establish the composition of every<br />
polygon. The polygon extraction is made difficult by the fact that cropped small<br />
area polygons have irregular shapes; these were analyzed against a<br />
blank background, resulting in altered standard deviation values for these samples.<br />
In the case of the three sample areas for this type of forestry the samples were<br />
rectangular areas within the uniform types used for the comparison.<br />
The first sample type used for comparison was an area of bog. This was chosen as<br />
it often occurs close to areas of mixed forestry and rough pasture (which by nature<br />
of the terrain have not been turned over to pasture). The values contained in the<br />
sample were similar (relatively) to coniferous forestry, but differed in the red<br />
colour band with almost a 35% higher mean and a notable low level of standard<br />
deviation. This low level of standard deviation was found in all three of the colour<br />
bands, and could be used to differentiate between the two types of forestry
sampled in this study. In relation to this, a sliding scale of standard deviation for<br />
pixels in the red colour band between bog, coniferous forestry and mixed forestry<br />
is evident: under 10 for the area of bog, an average of around 20 for areas of<br />
coniferous forestry and over 30 for areas of mixed forestry. Matching these to the<br />
mean could give a useful relative indication for automatic identification of bog<br />
present in an area. At this point it is probably useful to comment on the nature of<br />
the vector data for areas of bog. These types of areas are generally bounded by<br />
polylines (though these are not coded). It could be the case that the same polygon<br />
contains an area of bog and rough pasture (similar to mixed forestry); being able<br />
to estimate the relative values for this transitional analysis based on the above data<br />
is useful in such cases. If an area polygon bordering one or more such areas<br />
contains pixel values close to those of mixed forestry but with a low<br />
standard deviation it is probable that the polygon is describing this type of<br />
transitional area.<br />
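The sliding scale of red band standard deviation noted above (under 10 for bog, around 20 for coniferous forestry, over 30 for mixed forestry) suggests a simple banding rule. In the sketch below the cut-off points of 15 and 25 are assumptions, taken as the midpoints between the quoted figures, and the rule is only meaningful for polygons already shortlisted as one of these three types, since pasture also shows a low standard deviation.<br />

```python
def cover_from_red_sd(red_sd, bog_max=15.0, coniferous_max=25.0):
    """Band a shortlisted polygon by its red-band standard deviation.

    The cut-offs are hypothetical midpoints between the approximate
    figures quoted in the text (10, 20 and 30).
    """
    if red_sd < bog_max:
        return "bog"
    if red_sd < coniferous_max:
        return "coniferous forestry"
    return "mixed forestry"


# Red-band standard deviations taken from Tables 9 to 12.
examples = {
    "bog (Table 12)": cover_from_red_sd(6.429),
    "coniferous sample 2 (Table 9)": cover_from_red_sd(21.8368),
    "mixed forestry (Table 11)": cover_from_red_sd(33.337),
}
```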
The second type was a pasture sample taken for comparison as it is the most likely<br />
neighbouring polygon to an area of mixed forestry and is common to most (rural/<br />
semi urban) areas in the country. This sample of pasture is separate from others<br />
used in this thesis but returned similar values (as expected). The most notable<br />
difference was in the level of standard deviation present in the sample area, with<br />
particular respect to the red and green colour bands. The mean pixel values for<br />
these colour bands also showed a distinct relative difference (over 45% and over<br />
35% for the red and green bands respectively). This variation in values is a useful<br />
indication of which type of ground cover pixel values belong to. In particular it<br />
could serve as one of the primary steps in the algorithm. It is useful to begin the<br />
analysis by eliminating known values from the search so that the target comparison<br />
list (ground cover types to pixel values) is smaller and has a higher chance of<br />
being successful. One further aspect that can be included in any automated<br />
analysis looking for areas of bog is the size of the polygons in the search. Most<br />
large polygons will be assigned a value based on the input vector coding. They<br />
will generally represent known parts of the image such as plots of forestry and<br />
water parcels. The remaining large polygons will (in the context of the Irish<br />
landscape being analyzed) most probably be areas of bog. This can further be<br />
refined by eliminating areas of flat rock as islands within the large polygons (these<br />
island areas are present in the vector data). In this way large polygons (with<br />
islands eliminated) matching the deviation from expected mixed forestry values<br />
outlined above can be assumed to be representative of bog. There is probably<br />
some scope for the use of pattern analysis to further refine this search. The<br />
uniform (low standard deviation) values displayed by the bog sample suggest that<br />
it may be possible, once the areas are identified in the automatic search algorithm<br />
suggested in this study, to examine these for rows and machinery tracks to<br />
automatically calculate the level of cutting taking place.<br />
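The size-based rule for bog described above can be sketched as follows. The polygon fields, the minimum area and the standard deviation limit are all hypothetical; the text does not give a threshold for what counts as a large polygon.<br />

```python
def net_area(polygon):
    """Polygon area with island areas (for example flat rock) removed."""
    return polygon["area"] - sum(polygon["islands"])


def likely_bog(polygon, min_area=50000.0, max_red_sd=10.0):
    """True for an uncoded polygon that is large once islands are removed
    and has a uniform (low standard deviation) red band.

    `min_area` and `max_red_sd` are hypothetical settings.
    """
    return (
        polygon["code"] is None
        and net_area(polygon) >= min_area
        and polygon["red_sd"] <= max_red_sd
    )


# Illustrative polygons; the field names and values are invented.
uncoded = {"code": None, "area": 80000.0, "islands": [2500.0, 1500.0], "red_sd": 6.4}
forestry = {"code": "forestry", "area": 90000.0, "islands": [], "red_sd": 21.0}

results = [likely_bog(uncoded), likely_bog(forestry)]
```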
The third sample type taken for comparative analysis with mixed forestry was an<br />
area of cut pasture. This was taken as a control to ensure the previous two samples<br />
matched expected values (in terms of relative percentages) found in the other<br />
survey areas of the image. The high mean pixel values (and low level of standard<br />
deviation) matched expected results in that both the red and green colour bands<br />
showed markedly higher values (see table above). This type of pasture will not be<br />
factored into a proportional value check against mixed forestry in the algorithm as<br />
it can be referenced against the two stable control value sets present in water and<br />
roofs. It should be noted that this particular area type is dependent on the time of<br />
year and weather conditions prior to the time the aerial imagery was flown and is<br />
something that must be included in a second or higher loop of the algorithm.<br />
3.6 Track<br />
Figure 9: Typical Track Area Image<br />
This part of the study looks at values for track, corresponding to unpaved or<br />
gravel access roads, and takes six sample areas for a comparison of spectral values.<br />
The purpose of the study was to see if there was a way of distinguishing between<br />
the spectral values for these types of roads and paved/ tarred roads (NRA category<br />
four upwards). Roads have unique spectral values in terms of an increase in mean<br />
pixel value of close to 30% for the three colour bands. This in itself is not<br />
particularly useful to an automated image analysis using OSI vector data as a baseline, as<br />
the road network has been captured and is updated; however, if areas of paving or<br />
hard cover (similar to track) could be shown to have similar unique properties it<br />
might be possible to detect the presence of impermeable surface in recently<br />
developed suburban areas. The following example illustrates this. A recently<br />
developed suburban area is experiencing problems with flooding and runoff. At<br />
present the mapping captures the water courses, buildings, roads and property<br />
boundaries (and street furniture/ utility details etc.) but does not indicate the extent<br />
of paving and patios within the individual plots. A survey filtering pasture using its<br />
spectral signature and known buildings, tarred road and footpath using vector data<br />
and comparing the remainder to expected spectral values for hard cover could<br />
return this value. The results from the six samples are in the table below:<br />
Track sample 1 Mean Pixel Value Standard Deviation<br />
Red 178.5 14.02<br />
Green 194 15.231<br />
Blue 146.5 19<br />
Track sample 2 Mean Pixel Value Standard Deviation<br />
Red 164 23.013<br />
Green 193.833 14.729<br />
Blue 147.5 15.149<br />
Track sample 3 Mean Pixel Value Standard Deviation<br />
Red 196.75 10.511<br />
Green 219.875 12.240<br />
Blue 145 16.639<br />
Track sample 4 Mean Pixel Value Standard Deviation<br />
Red 193.8 193.8<br />
Green 207.067 207.067<br />
Blue 153.667 153.667<br />
Track sample 5 Mean Pixel Value Standard Deviation<br />
Red 190.833 15.014<br />
Green 208.667 11.089<br />
Blue 148.5 10.518<br />
Track sample 6 Mean Pixel Value Standard Deviation<br />
Red 190.5 2.738<br />
Green 217.167 7.574<br />
Blue 138 5.44<br />
Table 13: Track sample values<br />
While the above samples outline unique pixel signatures when compared to the<br />
entire image (40% above the mean for red, 35% for green and 30% for the blue<br />
colour bands) the difference between these samples and those returned for the<br />
standard road network is not significant. This does not necessarily make it difficult<br />
to distinguish between standard tarred road and other types of impermeable<br />
surface area that might have been introduced to the landscape. The road network<br />
is present in the vector dataset so subtracting this (along with the other known<br />
polygons such as water, buildings etc.) from the image means that polygons with a high<br />
number of pixels corresponding to these values would indicate that the surface<br />
contains an area of cover similar to track or road (hard impermeable cover).<br />
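The subtraction and pixel count described above can be sketched as below. The band ranges are an assumption: they span the Table 13 track means padded by a margin of about 5, and the polygon identifiers and pixel values are invented for the example.<br />

```python
# (low, high) limits for red, green and blue. Assumption: the spread of the
# Table 13 track means, padded by a hypothetical margin of about 5.
HARD_COVER_RANGES = ((159, 202), (189, 225), (133, 159))


def residual_polygons(polygons, known_ids):
    """Drop polygons already classified by the vector data (roads, water, buildings)."""
    return {pid: pixels for pid, pixels in polygons.items() if pid not in known_ids}


def hard_cover_fraction(pixels, ranges=HARD_COVER_RANGES):
    """Fraction of (r, g, b) pixels whose three bands all fall inside the ranges."""
    hits = sum(
        all(low <= value <= high for value, (low, high) in zip(pixel, ranges))
        for pixel in pixels
    )
    return hits / len(pixels)


# Illustrative pixel sets: a known road polygon and an unclassified plot.
polygons = {
    "road_1": [(185, 205, 150)] * 10,
    "plot_7": [(180, 200, 145)] * 30 + [(120, 170, 100)] * 70,
}

remaining = residual_polygons(polygons, known_ids={"road_1"})
fractions = {pid: hard_cover_fraction(pixels) for pid, pixels in remaining.items()}
```

A polygon whose fraction exceeds some chosen level could then be flagged as containing hard cover similar to track or road.<br />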
This unique value has potential to increase the accuracy of flood mapping and<br />
prediction but requires additional processing to ensure that the high values are the<br />
result of impermeable surface area. This could take place within an automated<br />
software process by swapping the colour bands to increase the difference between<br />
these areas and areas of vegetation.<br />
The purpose for sampling the areas of track was to see if there could be any means<br />
of determining if an area surrounding a private dwelling had been paved, or if<br />
there were any paved yards/ areas of hard cover present in other semi urban<br />
polygons. The test sampling took three areas to compare the values for track<br />
against; an unpaved dirt track, an area of compacted gravel yard and an area of<br />
paved yard. With the exception of the blue colour band the results were similar to<br />
those from cut pasture. These can be discerned from cut pasture by setting the<br />
search algorithm to look at the mean pixel value for the blue colour band, which<br />
was 30% less for both the paved yard and dirt track. The values returned for the<br />
yard of compacted dirt and gravel were similar to cut pasture but were within a<br />
small polygon containing several roofed buildings. The algorithm can therefore be<br />
set to accept values similar to cut pasture for small area polygons containing a<br />
number of roofed buildings as gravel/ dirt hard cover. In the particular case of this<br />
sample the results obtained are most likely due to pigment in the gravel biasing<br />
the sample.<br />
Track test sample 1 (unpaved dirt track) Mean pixel value Standard deviation<br />
Red 218.917 11.378<br />
Green 225.708 13.658<br />
Blue 161.583 10.1<br />
Track test sample 2 (compacted dirt/ gravel) Mean pixel value Standard deviation<br />
Red 178.315 8.713<br />
Green 195.648 10.802<br />
Blue 131.056 16.122<br />
Track test sample 3 (paved yard) Mean pixel value Standard deviation<br />
Red 174.278 8.77<br />
Green 200.722 6.257<br />
Blue 171.778 7.075<br />
Table 14: Track test sample values<br />
Taking these values for a larger area would be a difficult task but the fact that the<br />
small area polygons derived from the vector mapping cut apart the image means<br />
that greater levels of information can be derived from the same set of values in<br />
polygons with different associated coding. The initial sampling displayed values<br />
for pasture in small area land parcels surrounding dwellings matching those of the<br />
mean outside cut pasture. From this it can be inferred that a small area polygon<br />
surrounding a dwelling which displays spectral values similar to cut pasture could<br />
potentially be gravelled and the algorithm would then run a specific analysis on<br />
the values for the blue colour band. For the third sample area, the paved yard, the<br />
values matched those expected for paved covering and again these values would<br />
indicate the high probability of hard cover (patio/ concrete etc.) when present in a<br />
small area polygon surrounding a dwelling.<br />
The identification of track has been the subject of a large amount of work<br />
on software which might extract the network of roads<br />
based on pattern recognition and the unique spectral values for this type of feature<br />
(Phynn et al, 2002). In this study all roads and tracks have been captured and the<br />
purpose of analyzing the spectral qualities of these is in an effort to train an<br />
algorithm to recognise the specific properties of hard ground/ impermeable<br />
surface area within small area polygons. The results revealed distinctive qualities<br />
thanks mostly to the high reflective quality of these types of surface for red and<br />
green colour bands.<br />
As is mentioned in other sections of this study, the initial part of the algorithm<br />
involves extracting the roads (and water, and forestry, known marsh etc.) to leave<br />
a smaller number of polygons for analysis. The next step would be the removal<br />
(classification) of areas with relatively unique spectral values such as all the<br />
pasture polygons. Following this, the urban polygons would be analyzed, having<br />
been identified by the presence of building polygons within them; these would<br />
then be classified according to the nature of the building (as this data is only<br />
available in some instances, the first iteration of the loop would include all<br />
buildings). The nature of the spectral values would then be compared to the values<br />
sampled here, as the low level of standard deviation among the colour bands<br />
associated with them enables classification with a degree of certainty.<br />
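The order of steps outlined above can be sketched as a single classification pass. This is not the thesis implementation: the code labels, field names and thresholds are hypothetical, with the pasture and hard cover tests loosely based on the sample tables in this chapter.<br />

```python
# Hypothetical labels for polygon types already coded in the vector data.
KNOWN_CODES = {"road", "water", "forestry", "marsh"}


def is_pasture(stats):
    # Illustrative thresholds loosely based on the pasture test samples.
    return stats["green"] > 160 and stats["red"] < 150


def is_hard_cover(stats):
    # Illustrative thresholds loosely based on the track samples.
    return stats["red"] > 160 and stats["red_sd"] < 25


def classify_polygons(polygons):
    """Apply the steps in order: known codes, pasture, then urban hard cover."""
    results = {}
    for pid, poly in polygons.items():
        if poly.get("code") in KNOWN_CODES:
            results[pid] = poly["code"]        # step 1: extracted via vector coding
        elif is_pasture(poly):
            results[pid] = "pasture"           # step 2: distinctive spectral values
        elif poly.get("has_building") and is_hard_cover(poly):
            results[pid] = "hard cover"        # step 3: urban polygons
        else:
            results[pid] = "unclassified"      # left for a later loop
    return results


polygons = {
    "p1": {"code": "road", "red": 190.0, "green": 205.0, "red_sd": 12.0},
    "p2": {"code": None, "red": 125.0, "green": 170.0, "red_sd": 10.0},
    "p3": {"code": None, "has_building": True,
           "red": 178.3, "green": 195.6, "red_sd": 8.7},
    "p4": {"code": None, "red": 80.0, "green": 118.0, "red_sd": 35.0},
}
labels = classify_polygons(polygons)
```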
This type of survey has particular benefit in flood mapping and can help the<br />
development of models factoring in runoff rates during times of high rainfall. The<br />
sample area used here is just outside an urban area and as such has a useful mix of<br />
all the possible land cover types, ranging from dirt track to paved yards to<br />
forestry to pasture to dwelling houses within small polygons. Specific searches of<br />
urban developments could expect to find more homogeneous ranges within each<br />
polygon. The values taken from this sampling could serve as the baseline for one<br />
of these surveys. The user would then select known areas of the target land cover<br />
being analyzed (via a mobile GPS unit or selected from the vector mapping draped<br />
over the aerial photography). A combination of the standard expected values and<br />
the entered key values could then be used to calculate the percentages across a<br />
wide area. The fact that for each area analysis the pixel variations are confined to<br />
a small area polygon means that there is less chance of a gradual distortion biasing<br />
the results, as each separate polygon is calculated based on its own features (i.e.<br />
the values of neighbouring polygons and presence of buildings mentioned above).<br />
It should be noted that the samples used in the above section of the study covered<br />
small areas relative to those used for forestry, pasture, water etc. This is because<br />
paved or hard ground was only a small part of the sample area. It is unlikely,<br />
however, that a larger sample of hard ground would have revealed any different<br />
results; firstly because the sampling was done over a relatively wide geographical<br />
area and secondly because large expanses of paved areas are rare enough to be<br />
considered an anomaly in the Irish landscape, which in any case could be<br />
flagged by the search algorithm (by setting a maximum expected area for values<br />
matching hard cover).<br />
3.7 Shade<br />
Figure 10: Typical Shade Area Image<br />
The purpose of this part of the study is to see if there are any unique spectral<br />
qualities in areas of shade which would allow them to be eliminated from an<br />
examination of a given polygon. There are a number of ways to prevent shade<br />
from distorting the results of a spectral analysis of these types of polygon. The<br />
photography and vector data could be imported into a geographic information<br />
system capable of manipulating vector data. A number of control points could be<br />
taken matching the edge of areas of shade to vertices in the vector data. The vector<br />
dataset could be then transformed/ moved to match the control points and the<br />
offset calculated and eliminated from the original polygons so as a subsequent<br />
spectral analysis would focus on areas outside shade. While this would probably<br />
be the most accurate method it is difficult to automate as selection of control<br />
points requires human input (a random selection would skew results and there is<br />
too much variety in coding and polyline length and shapes to set rules).<br />
A second method might be to identify unique spectral values for shade and<br />
introduce a process to subtract them from the polygon pixel set results to leave<br />
only values which can be identified. This is what is being attempted here and the<br />
results below reveal a similar signature to that of water in the image. Since all<br />
water bodies have been captured it could be possible to subtract these lower pixel<br />
values from a polygon and analyze the remainder.<br />
Shade Sample 1 Mean Pixel Value Standard Deviation<br />
Red 33.9264 5.428<br />
Green 71.9221 7.446<br />
Blue 80.2165 14.322<br />
Shade Sample 2 Mean Pixel Value Standard Deviation<br />
Red 41 5.577<br />
Green 81.361 8.077<br />
Blue 78.381 11.5439<br />
Shade Sample 3 Mean Pixel Value Standard Deviation<br />
Red 42.813 6.943<br />
Green 83.186 7.878<br />
Blue 84.505 12.848<br />
Shade Sample 4 Mean Pixel Value Standard Deviation<br />
Red 45.333 15<br />
Green 85.854 17.921<br />
Blue 85.75 17.118<br />
Table 15: Shade sample values<br />
This is something which has been attempted in previous studies (Martin et al.,<br />
1998); however, as with the thesis in general, I believe a hybrid method<br />
incorporating aspects of vector mapping and spectral analysis produces the best<br />
results. This is because in certain situations removing shade based on pixel values<br />
alone might bias the analysis (mixed forestry could produce a signature similar to<br />
marsh once lower pixel values are removed). A better method might be to<br />
manually take four or five samples per square kilometre and determine a mean<br />
percentage of shade based on areas of shade within those samples; this can be<br />
done relatively quickly using any vector manipulation software and the sample<br />
areas recorded for reference. Any further automated analysis would then not<br />
eliminate more pixels matching the shade signature than the sampled percentage allows.<br />
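The capped removal described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `remove_shade`, the window of two standard deviations and the example pixel list are hypothetical, while the red band mean and standard deviation are those of Shade Sample 3 in Table 15.

```python
def remove_shade(pixels, shade_mean, shade_std, max_fraction, k=2.0):
    """Drop pixels inside the shade window, but never more than max_fraction of them.

    The window of k standard deviations around the shade mean is an assumption.
    Pixels are examined in list order, so which shade pixels survive the cap is
    arbitrary in this sketch.
    """
    low = shade_mean - k * shade_std
    high = shade_mean + k * shade_std
    budget = int(len(pixels) * max_fraction)  # cap taken from the manual shade samples
    kept, removed = [], 0
    for value in pixels:
        if removed < budget and low <= value <= high:
            removed += 1  # treated as shade and excluded
        else:
            kept.append(value)
    return kept, removed


# Hypothetical red band values for one polygon: four dark pixels, six pasture-like.
red_band = [40, 45, 38, 50, 126, 130, 122, 128, 135, 120]
kept, removed = remove_shade(red_band, shade_mean=42.813, shade_std=6.943,
                             max_fraction=0.2)
print(removed, kept)
```

With a sampled shade share of 20%, only two of the four dark pixels are discarded; the others stay in the polygon statistics, which limits the bias that removing every low value could introduce.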
Note: In future the addition of lidar data might allow shade to be calculated from<br />
the height of the irregular tree canopy, roof pitches etc. and<br />
incorporated into the automated image analysis algorithm. This data is not<br />
currently available for all areas and has not been included in the study. It should<br />
be noted that its addition has the potential to improve the accuracy of this type of<br />
study.<br />
A couple of sample sections of the image were selected with varying degrees of<br />
shade present in the image. This was to determine if it is possible to introduce a<br />
step into the algorithm which would correct for areas of shade in otherwise<br />
uniform closed polygons. The first sample involved selecting an area of pasture<br />
which contained a large amount of ground shaded by some high trees along one<br />
of the vector boundaries, i.e. along the hedge. The area of shade<br />
corresponded to roughly half the area of the test polygon. It should be noted that a<br />
degree of shade will be present in all of the orthophotos to which this study<br />
applies; this is because aerial survey takes place in bright sunshine by necessity.<br />
The sample was analyzed for mean pixel values in each of the colour bands, and<br />
the standard deviation was also recorded.<br />
Shade test sample 1 (pasture with approx. 50% shade) Mean pixel value Standard deviation<br />
Red 88.457 37.387<br />
Green 135.151 40.01<br />
Blue 100.616 20.386<br />
Table 16: Shade test sample value 1<br />
The results for this were as might be expected, with an increase in the standard<br />
deviation for the red and green colour bands reflecting the variety of tone<br />
present in the sample. This, however, does not give a complete representation of<br />
the data, and a histogram (see below) reveals a peak for the shade value and a peak<br />
for the pasture value in both the red and green colour bands. These two peaks<br />
indicate that the area is pasture, and that the high pixel count for the lower values<br />
in the red and green colour bands is a result of shade.<br />
Figure 11: Histogram for Shade and Pasture<br />
There are two factors for inclusion in the automated search algorithm that can be<br />
taken from this. Firstly, a high standard deviation from the<br />
mean value in the red and green colour bands for a given polygon is a strong<br />
indication that the analysis might be distorted by shade. Secondly, if a histogram of<br />
the pixel count for the red and green colour bands shows a peak at the mean for<br />
shade and another at the mean for pasture, then the area is pasture (given that the other<br />
probable cause, water, has already been identified through vector mapping).<br />
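The second factor, a peak at both the shade mean and the pasture mean, could be tested along the following lines. The bin width, the minimum share of pixels that counts as a peak and the example pixel values are assumptions made for illustration; the shade mean is taken from Shade Sample 3 in Table 15 and the pasture mean from Pasture Sample 1 in Table 23.

```python
from collections import Counter


def band_histogram(pixels, bin_width=10):
    """Count pixels per bin, keyed by the lower edge of each bin."""
    return Counter((int(v) // bin_width) * bin_width for v in pixels)


def has_peak_near(hist, centre, bin_width=10, min_share=0.2):
    """True if the bin containing `centre` holds at least min_share of all pixels."""
    total = sum(hist.values())
    bin_start = (int(centre) // bin_width) * bin_width
    return total > 0 and hist.get(bin_start, 0) / total >= min_share


# Hypothetical red band values: half shaded, half open pasture.
red_band = [41, 43, 44, 42, 46, 125, 128, 123, 126, 127]
hist = band_histogram(red_band)
shade_peak = has_peak_near(hist, 42.813)     # red mean of Shade Sample 3
pasture_peak = has_peak_near(hist, 126.131)  # red mean of Pasture Sample 1
is_shaded_pasture = shade_peak and pasture_peak
```

A fixed bin test is deliberately crude; a fuller treatment would smooth the histogram and look for local maxima, but the two-peak logic is the same.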
The algorithm could account for this by taking a sample of neighbouring polygons;<br />
if these contain areas of pasture then the standard deviation is checked. If the<br />
standard deviation is above 30 values on the greyscale then the polygon is flagged<br />
and a histogram appended for further processing. The resulting set of these<br />
polygons would be returned with the result set from the analysis so that the user<br />
could accept or reject the anomaly as the result of shade (in terms of large<br />
amounts of standard deviation for the red and green colour bands in a small area<br />
polygon).<br />
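A minimal sketch of this flagging rule is given below. The function name and the neighbour labels are hypothetical, and taking the larger of the red and green deviations is an assumption; the threshold of 30 and the test values come from the text and from Tables 16 and 18.

```python
def needs_review(red_std, green_std, neighbour_types, threshold=30.0):
    """Flag a polygon whose neighbours include pasture and whose deviation is high.

    Using the larger of the red and green deviations is an assumption.
    """
    borders_pasture = "pasture" in neighbour_types
    return borders_pasture and max(red_std, green_std) > threshold


flagged = needs_review(37.387, 40.01, ["pasture", "road"])  # Table 16 values
clear = needs_review(8.645, 9.8, ["pasture"])               # Table 18 values
```

Polygons for which the function returns True would be exported, with their histograms, to the set presented to the user for acceptance or rejection.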
The second test sample for shade took an area of pasture with a smaller percentage<br />
of shade present for comparison with the areas of shade. The values for the sample<br />
matched those expected for pasture with the exception of a higher level of<br />
standard deviation in the red and green colour bands. Such polygons could again be<br />
flagged once a level of standard deviation above that expected for pasture, and the<br />
trends of bordering polygons mentioned above, were detected; they would then be<br />
included in the flagged polygon set, where the histogram could be analyzed.<br />
Shade test sample 2 (pasture containing small areas of shade) Mean pixel value Standard deviation<br />
Red 136.056 23.905<br />
Green 179.235 25.065<br />
Blue 114.285 14.684<br />
Table 17: Shade test sample value 2<br />
Although the mean pixel values matched those expected for pasture across the<br />
three colour bands, the high level of standard deviation suggests further<br />
examination could be necessary. A histogram representation shows the relatively<br />
higher pixel count across the values associated with shade but displays a clear<br />
peak for pasture. From the evidence of these samples it would seem safe to set the<br />
tolerance for standard deviation in both the red and green colour bands to a figure<br />
between 25 and 30; any polygons exceeding this (even if they are bordering<br />
pasture polygons) would be flagged and exported to an examination set together<br />
with appended histograms.<br />
The final part of the testing took a sample from relatively clear pasture (little or no<br />
shade present) to see if the trend identified in the first two samples would continue.<br />
In other words, if a large amount of shade resulted in a high pixel count and a peak<br />
at the values for shade, and a smaller amount less so, then the trend should show a<br />
single peak for an area with little or no shade present. While this produced<br />
expected results, it was necessary to confirm the trend.<br />
Shade test sample 3 (pasture containing little shade) Mean pixel value Standard deviation<br />
Red 123.828 8.645<br />
Green 177.68 9.8<br />
Blue 105.261 12.69<br />
Table 18: Shade test sample value 3<br />
The sampling for shade highlighted the need for any automatic aerial image<br />
algorithm to account for levels of standard deviation; and also the necessity to<br />
detect peaks of values within the colour bands. The two peaks for lower and<br />
higher values in the second test sample demonstrated the usefulness of<br />
categorizing the images according to spectral values and is a good example of how,<br />
once the imagery can be broken into its constituent area polygons using vector<br />
mapping, useful data can be derived. The properties of shade in the sample<br />
imagery were uniform, allowing the distortion caused by shade to the spectral<br />
values of the target polygons to be accounted for in image processing. The results of<br />
this part of the study go some way to proving the central premise of this thesis: that it<br />
is possible to automatically scan aerial imagery using good quality vector data.<br />
The process of eliminating known areas and flagging borderline values (such as<br />
the distortion in the first test sample) means that the land cover can be examined<br />
through a series of scans until all the surface area is accounted for. This is a<br />
process which could then be adapted for specific projects (flood mapping etc.).<br />
The software process outlined here would act as a template for these projects and<br />
allow the users to input specific search values themselves without relying on<br />
obtaining the data from previous studies.<br />
3.8 Roof Areas<br />
Figure 12: Typical Roof Value Area Image<br />
This study involved taking a sample of roof imagery from a test area and<br />
comparing the luminance values to those of the entire image to determine if a<br />
distinguishing deviation in values existed for these features of the image. The<br />
initial part of the experiment involved taking a sample number of roof values from<br />
the orthophotography and converting them to a format suitable for individual<br />
examination. The study involved a section of rural landscape in south County<br />
Galway (Ordnance Survey sheet no. 3012-c).<br />
The aim of this part of the thesis is to obtain one of a series of benchmark<br />
control values which can then be applied to the area polygons in determining<br />
relative deviation of pixel values. In other words, this section of the study is an<br />
attempt to obtain a unique indicator for roof pixel values to form part of a key<br />
for interpreting statistical values from the polygon being processed. It involved the<br />
use of three image analysis software packages, but can be achieved using the<br />
GDAL library alone (using gdal_translate).<br />
The first step involved editing the image (aerial orthophoto corresponding to the c<br />
quadrant of OSI #3012-c) using OpenEV. This involved targeting the areas of<br />
interest from the imagery based on building polygons captured by OSI vector data.<br />
The vector data was overlaid on the image and the target areas exported using the<br />
GDAL export tool to create ten separate .tiff files containing roof values.<br />
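The thesis does not list the export commands used. As a sketch of how the same extraction could be scripted with the GDAL command line utility `gdal_translate`, the helper below builds one command per building polygon from its bounding box. The function name, file names and coordinates are hypothetical, and clipping to a bounding box rather than to the exact polygon outline is a simplifying assumption.

```python
def translate_command(src, dst, vertices):
    """Build a gdal_translate call that clips `src` to a polygon's bounding box.

    vertices: list of (x, y) map coordinates for one building polygon.
    -projwin expects the window as: upper-left x, upper-left y, lower-right x,
    lower-right y, in georeferenced coordinates.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    window = [min(xs), max(ys), max(xs), min(ys)]
    return ["gdal_translate", "-of", "GTiff",
            "-projwin", *map(str, window), src, dst]


# Hypothetical building footprint and file names.
footprint = [(100.0, 200.0), (110.0, 200.0), (110.0, 190.0), (100.0, 190.0)]
command = translate_command("sheet_3012c.tif", "roof_01.tif", footprint)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine with GDAL installed; the command is only constructed here, not executed.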
The entire image was then loaded into the Geomatica software and statistical data<br />
for the red, green and blue primary values (converted to greyscale) obtained. This<br />
was done to obtain a mean value for these within the entire image, which<br />
contained a range of features including forestry, pasture and several water bodies.<br />
The image was not enhanced and no post processing was used, in order to obtain<br />
standard baseline values to measure future iterations of the study against.<br />
It should be noted that the sample area contained 76 buildings and this study<br />
concentrated on the south east of the image (which contained the largest number<br />
of buildings). For each building polygon there was a clear division between<br />
sections of shade and light due to the roof pitch. While there is little value in<br />
pursuing this unique aspect of the image features (as buildings are already being<br />
well captured for mapping purposes by traditional photogrammetry and field<br />
survey), this might be useful for automatic key generation. By this I mean that<br />
pattern recognition software could be applied to separate these two areas and the<br />
resultant variations analyzed to determine the roof pitch; this is, however, beyond<br />
the scope of this study.<br />
Below is a screenshot of the buildings present in the image (so as to give an<br />
impression of their dispersal in the study area):<br />
Figure 13: Distribution of Buildings/Roofs in the Sample<br />
The ten sample roof values were all taken from the concentration of buildings in<br />
the south east, and the mean values for the three primary values (when converted<br />
to greyscale) were as follows:<br />
Roof Sample 1 Mean pixel value<br />
Red 102.683<br />
Green 130.923<br />
Blue 120.673<br />
Roof Sample 2 Mean pixel value<br />
Red 146.152<br />
Green 168.429<br />
Blue 161.457<br />
Roof Sample 3 Mean pixel value<br />
Red 51.0667<br />
Green 79.2<br />
Blue 85.5176<br />
Roof Sample 4 Mean pixel value<br />
Red 145.414<br />
Green 165.429<br />
Blue 161.457<br />
Roof Sample 5 Mean pixel value<br />
Red 95.2596<br />
Green 123.74<br />
Blue 113.106<br />
Roof Sample 6 Mean pixel value<br />
Red 92.416<br />
Green 122.633<br />
Blue 118.177<br />
Roof Sample 7 Mean pixel value<br />
Red 220.818<br />
Green 194<br />
Blue 142.896<br />
Roof Sample 8 Mean pixel value<br />
Red 133.528<br />
Green 124.556<br />
Blue 116.528<br />
Roof Sample 9 Mean pixel value<br />
Red 119.315<br />
Green 148.351<br />
Blue 142.967<br />
Roof Sample 10 Mean pixel value<br />
Red 173.111<br />
Green 187.506<br />
Blue 149.494<br />
Table 19: Roof pixel sample values<br />
The mean greyscale pixel values for the image as a whole were 111.711 for the red<br />
channel, 136.542 for the green and 102.636 for the blue. This meant an average deviation<br />
of 22% in the blue channel for areas of roof (average mean for the roof samples of<br />
130.776 compared to 102.636 for the entire image). In terms of automated image<br />
analysis the variation could help develop a control by training a key against the<br />
known building values.<br />
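The arithmetic behind the blue channel deviation can be reproduced as follows. The text does not say which value the percentage is based on; that the figure of 22% expresses the difference as a share of the roof mean is an assumption, and the helper name is hypothetical.

```python
def relative_difference(sample_mean, image_mean, base="sample"):
    """Percentage difference between two means, relative to the chosen base."""
    diff = sample_mean - image_mean
    denominator = sample_mean if base == "sample" else image_mean
    return 100.0 * diff / denominator


roof_blue = 130.776   # average blue mean of the roof samples, as stated in the text
image_blue = 102.636  # blue mean of the entire image, as stated in the text

share_of_roof = relative_difference(roof_blue, image_blue, base="sample")
share_of_image = relative_difference(roof_blue, image_blue, base="image")
print(round(share_of_roof, 1), round(share_of_image, 1))
```

Relative to the roof mean the difference is about 21.5%, which rounds to the 22% quoted; relative to the whole-image mean the same difference is about 27.4%. Whichever base is used, it needs to be applied consistently when a key is calibrated.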
Figure 14: Blue colour band pixel count for study area<br />
Histogram of mean blue channel pixel values for the c quadrant of Ordnance<br />
Survey sheet # 3012; building polygons in this area were found to vary by 20%,<br />
indicating that this variation has the potential to form part of a training algorithm<br />
used in the development of a key for automated image analysis.<br />
It should be noted that image quality and the degree of shade and light can vary<br />
according to the environmental variations present when the image was captured.<br />
This study is not concerned with determining an exact value for the capture of<br />
buildings but is instead looking for variations from the mean values in the entire<br />
image. The aim is to gather enough unique deviations (such as the variance found<br />
in the blue channel above) to narrow the remaining values into categories.<br />
An iterative run through a practical application might be:<br />
• Obtain the mean for unique known features (e.g. the blue channel in<br />
buildings).<br />
• Subtract these known features from the area polygon.<br />
• Obtain the mean values for the remaining pixels.<br />
• Compare these values relative to the known features and determine the<br />
most probable land coverage.<br />
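The steps above can be sketched in a few lines. This is an illustration rather than the thesis implementation: the function name, the boolean mask marking known pixels, the example pixels and the use of the nearest reference mean as the "most probable" cover are all assumptions; the reference means are taken from Tables 23 and 24 in Section 3.9.

```python
from statistics import mean


def classify_remainder(pixels, known_mask, land_cover_key):
    """Average the pixels not yet identified and return the closest key entry.

    pixels: list of (red, green, blue) tuples for one area polygon.
    known_mask: parallel list of booleans, True where the pixel belongs to a
    known feature (for example a coded building) and should be subtracted.
    Assumes at least one pixel remains after subtraction.
    """
    remaining = [p for p, known in zip(pixels, known_mask) if not known]
    band_means = tuple(mean(p[i] for p in remaining) for i in range(3))

    def distance(label):
        # Squared distance between the polygon means and a reference entry.
        return sum((a - b) ** 2 for a, b in zip(band_means, land_cover_key[label]))

    return min(land_cover_key, key=distance), band_means


# Reference means taken from Tables 23 and 24.
land_cover_key = {
    "pasture": (126.131, 173.848, 109.745),
    "mixed forestry": (69.968, 104.287, 86.217),
    "track": (213.952, 236.762, 194.833),
}
# Hypothetical polygon: two roof pixels (masked out) and three other pixels.
pixels = [(171, 203, 179), (168, 200, 175),
          (120, 170, 105), (130, 176, 112), (128, 175, 110)]
known_mask = [True, True, False, False, False]
label, band_means = classify_remainder(pixels, known_mask, land_cover_key)
```

A production version would also compare standard deviations, since the sampling in this section shows that mean values alone do not separate every land cover type.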
Following this initial sampling of roof values, it emerged that the values were not<br />
consistent enough for reference to determine adjacent land use types (though the<br />
fact that the polygons are coded vector polygons means they can be excluded from<br />
pixel value searches of small area polygons). One aspect of the pixel values which<br />
did emerge, and has potential for further analysis, was the clear division and pixel<br />
count peaks for roofs with a pitch. Taken further, and matched to the set of<br />
proportional values that this study sets out to establish, these values could allow<br />
the determination of roof pitch to be included. This would<br />
necessitate a separate study (over a large quantity of buildings across a wide<br />
geographical area) and will not form part of this study, although the algorithm<br />
outlined in the thesis would be extremely beneficial for such a study.<br />
The sampling for roof values in proportional reference to adjacent areas took three<br />
samples of separate land cover types and the road polygons adjacent to them. The<br />
three areas analyzed were: a section of pasture close to a farm dwelling, a section<br />
of mixed forest close to another dwelling (with several pitches in the roof) and a<br />
section of cut pasture close to a flat roofed shed. As mentioned above the<br />
sampling did not produce any useful data for the algorithm (except to eliminate<br />
the possibility of buildings being used for spectral reference). It did, however,<br />
outline the potential for a separate building analysis based on roof pitches. The<br />
following few paragraphs outline the results of the comparison.<br />
Roof test sample 1 Mean pixel value Standard deviation<br />
Red 171.121 25.028<br />
Green 202.818 22.936<br />
Blue 178.773 17.709<br />
Roof test adjacent sample 1 (pasture) Mean pixel value Standard deviation<br />
Red 113.004 7.169<br />
Green 156.632 8.779<br />
Blue 99.867 11.634<br />
Table 20: Roof test sample value 1<br />
The first sample took an area of pasture close to a dwelling where the pasture gave<br />
expected values for the type of land cover (which is outlined in the pasture<br />
sampling in this section of the thesis). The roof sample showed a relatively high<br />
level of standard deviation across the colour bands (although low by roof value<br />
standards) and the mean colour values returned, although high for a small area,<br />
could not be used to uniquely identify other features. The double peak (of shade<br />
and light) in the red and green colour bands indicated that the roof had a high<br />
pitch. One problem which makes using this type of surface difficult is the degree<br />
of reflection that can take place from the relatively smooth surface of roof<br />
covering, further distorting spectral values.<br />
Roof test sample 2 Mean pixel value Standard deviation<br />
Red 121.394 42.959<br />
Green 144.111 41.557<br />
Blue 128.182 32.159<br />
Roof test adjacent sample 2 (mixed forestry) Mean pixel value Standard deviation<br />
Red 70.209 34.505<br />
Green 104.58 33.996<br />
Blue 86.887 20.602<br />
Table 21: Roof test sample value 2<br />
The second sample took an area of mixed forestry and a dwelling adjacent to it;<br />
the dwelling had multiple pitches due to dormer windows. As with the first sample, the<br />
values of the area sampled adjacent to the roof were consistent with values<br />
calculated from samples of this land type (mixed forestry) elsewhere in this thesis.<br />
The roof sample itself had a high level of standard deviation (revealing a high<br />
level of reflection and shade) and a variation in mean colour band values from<br />
other roof samples consistent with the initial roof survey. From this it can be<br />
concluded that roof values are not a reliable indicator of forestry values and<br />
cannot be reliably used as a reference.<br />
Roof test sample 3 Mean pixel value Standard deviation<br />
Red 213.815 41.112<br />
Green 223.148 34.272<br />
Blue 208.148 42.432<br />
Roof test adjacent sample 3 (cut pasture) Mean pixel value Standard deviation<br />
Red 205.074 7.903<br />
Green 206.539 8.182<br />
Blue 134.537 9.784<br />
Table 22: Roof test sample value 3<br />
The third set of samples for determining the potential for using roof spectral<br />
values as a tool for validating adjacent area values used a section of cut pasture<br />
and a flat roofed shed. The values for the shed were high (with a correspondingly<br />
high level of standard deviation), indicating the level of reflection on the roof<br />
surface. By comparison the standard deviation across all three colour bands was<br />
low in the cut pasture, which is consistent with values identified elsewhere in this<br />
thesis. The mean values of the two samples were similar and this,<br />
together with the fact that buildings have been returning inconsistent spectral<br />
values in the study, suggests there is little to be gained by examining the<br />
relationship between the spectral values in roofs and adjacent polygons. This is<br />
something of a surprise, as at the outset of the research I had expected the<br />
relatively uniform nature of the shades and texture of roofing used in Ireland to<br />
provide an ideal reference for spectral analysis of aerial imagery.<br />
3.9 Pasture<br />
Figure 15: Typical Pasture Area Image<br />
This part of the study looks at spectral values for areas of pasture within a section<br />
of aerial photography, corresponding to 3 sq km of semi-rural County Galway. The<br />
aim is to establish baseline data for the relatively stable spectral values returned<br />
from this type of land cover. The polygons in question correspond to field areas<br />
bounded by walls or fences (and occasionally roads and water bodies). They can<br />
be described as stable in terms of their usage; they will almost always be turned<br />
over to one particular usage. In other words, they will not contain large areas of<br />
separate land cover within them (other than islands of forestry or water, which are<br />
already captured and can be eliminated from the image processing). The study<br />
looked at nine separate areas, two of which were freshly cut. The values returned<br />
were uniform with the variation in the two freshly cut fields remaining consistent.<br />
Another feature which emerged was the low level of deviation for values in the<br />
red colour band, approximately one third of that returned from the image as a<br />
whole, across all the samples.<br />
Pasture Sample 1 Mean Pixel Value Standard Deviation<br />
Red 126.131 8.538<br />
Green 173.848 9.45<br />
Blue 109.745 12.646<br />
Pasture Sample 2 Mean Pixel Value Standard Deviation<br />
Red 139.318 6.267<br />
Green 182.748 8.016<br />
Blue 115.605 12.547<br />
Pasture Sample 3 Mean Pixel Value Standard Deviation<br />
Red 133.267 8.871<br />
Green 173.4 10.926<br />
Blue 108.128 11.748<br />
Pasture Sample 4 Mean Pixel Value Standard Deviation<br />
Red 198.363 9.629<br />
Green 207.759 10.881<br />
Blue 134.204 11.83<br />
Pasture Sample 5 Mean Pixel Value Standard Deviation<br />
Red 197.046 14.712<br />
Green 207.816 13.067<br />
Blue 135.054 14.474<br />
Pasture Sample 6 Mean Pixel Value Standard Deviation<br />
Red 110.019 6.96<br />
Green 163.79 8.215<br />
Blue 100.187 11.92<br />
Pasture Sample 7 Mean Pixel Value Standard Deviation<br />
Red 126.238 9.459<br />
Green 168.753 9.56<br />
Blue 100.17 12.073<br />
Pasture Sample 8 Mean Pixel Value Standard Deviation<br />
Red 117.301 10.329<br />
Green 169.254 11.042<br />
Blue 104.371 12.349<br />
Pasture Sample 9 Mean Pixel Value Standard Deviation<br />
Red 126.357 8.542<br />
Green 183.48 9.611<br />
Blue 110.941 12.799<br />
Table 23: Pasture sample values<br />
Of the above samples, samples 4 and 5 were of freshly cut fields, which is reflected in<br />
the high mean pixel value for the red colour band (almost 50% higher than the<br />
average for the image as a whole). This gives areas of freshly cut pasture a unique<br />
trait in a high red colour band mean and low standard deviation for a given<br />
polygon. This trait also distinguishes cut pasture as the red colour band mean<br />
values are over 35% higher than those of pasture in general. The mean green<br />
colour band pixel value for pasture is 25% more than that of the image as a whole;<br />
and when matched to a low standard deviation gives another strong indication of<br />
the type of land use.<br />
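The two traits described above (a low standard deviation combined with either a high red mean or a high green mean) could be turned into a simple rule such as the one below. The thresholds are assumptions chosen to sit between the sample values in Table 23 and are not figures given in the thesis; the function name is hypothetical.

```python
def classify_pasture(red_mean, green_mean, red_std, green_std,
                     max_std=15.0, cut_red_min=180.0, pasture_green_min=160.0):
    """Label a polygon as pasture, cut pasture or unclassified.

    All three thresholds are illustrative assumptions derived from Table 23.
    """
    if red_std > max_std or green_std > max_std:
        return "unclassified"  # too varied to be a uniform field
    if red_mean >= cut_red_min:
        return "cut pasture"   # high red mean with low deviation
    if green_mean >= pasture_green_min:
        return "pasture"       # high green mean with low deviation
    return "unclassified"


sample_4 = classify_pasture(198.363, 207.759, 9.629, 10.881)      # Pasture Sample 4
sample_1 = classify_pasture(126.131, 173.848, 8.538, 9.45)        # Pasture Sample 1
mixed_forest = classify_pasture(69.968, 104.287, 32.486, 32.054)  # Table 24 mixed forestry
```

Hard cover can also return high red and green means, so a rule of this kind would only be applied to polygons that the vector coding has not already identified.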
The above data is useful for indicating the presence of earthworks and landscaped<br />
areas within a small polygon. With a key calibrated to identify polygons<br />
containing the above values it should be possible to accurately calculate the<br />
percentage of land area given to pasture, and also to improve the accuracy of any<br />
process run on imagery (in conjunction with vector data) by reducing the number<br />
of polygons for analysis.<br />
Pasture also plays an important part in the algorithm this thesis is attempting to<br />
map out. This is for two reasons. Firstly, its uniform property (in terms of standard<br />
deviation of the pixels from mean values and also the relative proportional<br />
difference between those mean values and the mean values for other land<br />
coverage types) allows it to form the basis of keys generated at the beginning of<br />
image processing to identify land use. The second reason is that in an Irish<br />
context it is by far the biggest form of land use, and any algorithm processing<br />
aerial photography (even in semi-urban areas) will have successfully identified the<br />
majority of the surface area by correctly flagging area polygons of pasture. Of the<br />
remaining areas most can be identified by polyline coding from the vector data,<br />
leaving the study with a higher chance of success in obtaining useful information<br />
about the remaining areas. It is useful to divide the classes of pasture into the two<br />
categories identified in the samples above, that is, to create a separate category of<br />
cut pasture for the purposes of executing a comparative survey of polygons in the<br />
aerial photography. With this in mind the following test samples were compared<br />
against the spectral values of both.<br />
Pasture test sample 1 (track) Mean pixel value Standard deviation<br />
Red 213.952 18.444<br />
Green 236.762 16.532<br />
Blue 194.833 19<br />
Pasture test sample 2 (coniferous forestry) Mean pixel value Standard deviation<br />
Red 72.566 19.204<br />
Green 111.643 21.468<br />
Blue 96.381 16.026<br />
Pasture test sample 3 (mixed forestry) Mean pixel value Standard deviation<br />
Red 69.968 32.486<br />
Green 104.287 32.054<br />
Blue 86.217 19.492<br />
Table 24: Pasture test sample values<br />
The above samples were chosen from areas outside those already captured for the<br />
baseline survey of the spectral values for track, coniferous and mixed forestry.<br />
The track was chosen as hard cover gives unique high values and is of benefit in<br />
calibrating a proportional difference between target areas (such as pasture in this<br />
case) and its high values. As was noted above, pasture itself is also useful in<br />
calibrating a key for an algorithm due to its specific mean colour values in the red<br />
colour band and low level of standard deviation from those mean values, but is not<br />
coded in the original vector input data and can only be derived from the image<br />
processing. The area of mixed forestry was chosen for its high level of standard<br />
deviation and the fact that it has similar spectral values to rough pasture. Mixed<br />
forestry will be identified and present in most imagery (or the specific properties<br />
could be pre-set into a key for the automated analysis) so using those values<br />
would allow an automated processing technique to eliminate areas with this level<br />
of high standard deviation from the process.<br />
For the track/hard cover the mean greyscale pixel value for the red colour band<br />
was significantly higher than that of pasture. This was<br />
also the case for the mean of the green colour band, where the variation was<br />
upwards of 40% for the pasture which had not been cut. It should be noted,<br />
however, that the mean pixel value in the green colour band was higher for cut<br />
pasture and this variation brought it closer to the value of hard ground. The<br />
mean of the blue colour band in the test sample was consistently 40% higher than<br />
in all the samples of pasture (including the cut pasture samples). This<br />
differential (in conjunction with a low standard deviation and a mean 10% to 40%<br />
lower than hard ground in the red and green colour bands) could be used to<br />
determine the presence of pasture.<br />
The two forestry samples showed a mean pixel value in the red colour band of<br />
approximately 50% less than pasture, with an increased standard deviation for<br />
both. In terms of distinguishing between coniferous and mixed forestry, the standard<br />
deviation was one third higher for the red and green colour bands in the mixed<br />
forestry. The mean value for the green colour band in both of the forestry types<br />
was 40% lower than the value of the mean in the pasture, giving another relative<br />
indicator from which to calibrate pasture values. Small areas of forestry are found close<br />
to most semi-urban areas and the fact that they are coded and measured means<br />
they are useful in training an algorithm to match land use to neighbouring<br />
polygons.<br />
The aim of this sampling is to try and achieve a number of probability factors that<br />
will allow an automated process to determine land use or coverage types based on<br />
the available data. With this in mind a broad variety of samples were taken and the<br />
relative proportions between them assessed. In the above example the pasture area<br />
was compared against other known area types. The increased amount of cross<br />
checking allows the elimination of areas which conform to a particular type. In<br />
this way the image is gradually broken down until every polygon is referenced.<br />
The value of this is not necessarily in the ability to classify field use (although this<br />
has some merit in that it adds value to available mapping) but in the identified<br />
process. This enables a piece of software to be developed which can be reused<br />
according to a user's requirements. The process would remain constant (known<br />
values eliminated until the user is left with a set of polygons not conforming to<br />
known values and matching a set of target values identified by the user). The<br />
system could accept a set of co-ordinates recorded on a mobile GPS device or<br />
alternatively allow the user to view Ordnance Survey vector mapping draped over<br />
the orthophoto and manually enter the location of sample target values. In this<br />
way the process would allow the user to quickly establish the initial conditions for<br />
an automated search of aerial photography and return the spatial extent of the<br />
values input.<br />
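The constant process described here, eliminating polygons that match known values and then matching the remainder against user-entered targets, might look like the sketch below. The function name, the polygon identifiers, the example band means and the single tolerance applied to every colour band are assumptions for illustration; the known and target values are taken from Tables 23 and 24.

```python
def search_polygons(polygons, known_keys, target, tolerance=10.0):
    """Return ids of polygons that match `target` after known types are removed.

    polygons: dict of polygon id -> (red, green, blue) mean values.
    known_keys: dict of land cover name -> (red, green, blue) reference means.
    target: (red, green, blue) means entered by the user.
    """
    def matches(stats, key):
        return all(abs(s - k) <= tolerance for s, k in zip(stats, key))

    # Step 1: eliminate polygons that conform to any known land cover type.
    unknown = {pid: stats for pid, stats in polygons.items()
               if not any(matches(stats, key) for key in known_keys.values())}
    # Step 2: keep only the remaining polygons that match the user's target.
    return [pid for pid, stats in unknown.items() if matches(stats, target)]


known_keys = {
    "pasture": (126.131, 173.848, 109.745),        # Pasture Sample 1
    "mixed forestry": (69.968, 104.287, 86.217),   # Table 24
}
target = (213.952, 236.762, 194.833)  # track values entered as the search target
polygons = {
    "A": (128.0, 175.0, 108.0),
    "B": (71.0, 106.0, 88.0),
    "C": (210.0, 233.0, 190.0),
    "D": (160.0, 150.0, 140.0),
}
hits = search_polygons(polygons, known_keys, target)
```

Polygon "D" matches neither a known type nor the target, so it would remain in the unexplained set for a later scan or for manual inspection.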
3.10 Rough Pasture<br />
Figure 16: Typical Rough Pasture Area Image<br />
This part of the study focuses on areas of land cover known as rough pasture.<br />
They correspond to areas where scrub/mixed forestry or overgrown vegetation are<br />
present. In general terms it could be used to refer to areas of bad pasture but could<br />
also be found close to an urban area where the land is not in use. It is similar to<br />
mixed forestry but would be slightly lighter than forested areas (and returned<br />
spectral values accordingly, by approx. 75 across all colour bands). The values for<br />
this type of coverage could be particularly useful in determining whether the<br />
target polygon was in use or not. The results from the sample areas returned a high<br />
level of deviation from the mean value in both the red and green colour bands.<br />
This would not be expected where the land was in use. If the polygon being<br />
examined was not part of an urban development (comprising a mixture of<br />
artificial and landscaped surface cover) and was in use, then a relatively low level<br />
of deviation from the mean value would result from human activity. For example,<br />
the results from a study of polygons used for pasture in this survey returned a<br />
deviation of almost one quarter of what was found here.<br />
Rough Pasture Sample 1 Mean pixel value Standard Deviation<br />
Red 81.61 31.475<br />
Green 120.334 31.922<br />
Blue 94.606 17.356<br />
Rough Pasture Sample 2 Mean pixel value Standard Deviation<br />
Red 73.432 34.752<br />
Green 114.44 36.594<br />
Blue 95.898 18.988<br />
Rough Pasture Sample 3 Mean pixel value Standard Deviation<br />
Red 69.694 35.687<br />
Green 111.165 37.813<br />
Blue 95.857 18.771<br />
Rough Pasture Sample 4 Mean pixel value Standard Deviation<br />
Red 66.43 27.096<br />
Green 105.61 28.348<br />
Blue 94.081 17.243<br />
Table 25: Rough pasture sample values<br />
As mentioned above, similar values to rough pasture could be obtained<br />
in a spectral analysis of mixed forestry or certain urban developments. This,<br />
however, does not present a problem for the potential image key, as both those<br />
areas can be eliminated from an automatic survey: both mixed<br />
forestry and areas containing buildings and dwellings can be identified by<br />
previously captured attributes. Relative to the spectral values for the image as a<br />
whole, the red band was 35% lower, the green 17% lower and the blue 7% lower<br />
across the samples. While this alone does not give a complete indication of rough<br />
pasture, when coupled with high levels of standard deviation in the red and green<br />
colour bands the probability of this type of land cover is high. When combined with<br />
an additional vetting process (eliminating known land cover types), it should be<br />
possible to identify rough pasture using lower red and green colour band values<br />
with high standard deviation.<br />
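The rule suggested above (lower red and green means combined with high standard deviation) can be sketched as below. The sample figures are copied from Table 25; the three threshold values are hypothetical and were picked only so that the example runs, they are not results of the study.<br />

```python
from statistics import mean

# (mean pixel value, standard deviation) per band, copied from Table 25
SAMPLES = [
    {"red": (81.61, 31.475), "green": (120.334, 31.922), "blue": (94.606, 17.356)},
    {"red": (73.432, 34.752), "green": (114.44, 36.594), "blue": (95.898, 18.988)},
    {"red": (69.694, 35.687), "green": (111.165, 37.813), "blue": (95.857, 18.771)},
    {"red": (66.43, 27.096), "green": (105.61, 28.348), "blue": (94.081, 17.243)},
]


def band_summary(samples, band):
    """Average the per-sample mean and standard deviation for one band."""
    return (mean(s[band][0] for s in samples),
            mean(s[band][1] for s in samples))


def looks_like_rough_pasture(stats, max_red=100.0, max_green=130.0, min_std=25.0):
    """Low red/green means with high deviation; thresholds are assumptions."""
    red_mean, red_std = stats["red"]
    green_mean, green_std = stats["green"]
    return (red_mean <= max_red and green_mean <= max_green
            and red_std >= min_std and green_std >= min_std)
```

In practice such a rule would only be applied after the known land cover types have been eliminated, as the text describes.<br />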
The identification of rough pasture using an automated process would also lend<br />
itself to the study of abandoned developments. It could reasonably be assumed that<br />
the presence of rough pasture within more than half the plots of land in a<br />
development indicates that it has been abandoned. This might allow for a quick<br />
survey of developments to determine the level of repair by feeding the associated<br />
polygon set into a process designed to identify the percentage of rough pasture. In<br />
particular, pixel values in the red colour band similar to those above<br />
would indicate that the site has been abandoned. On an anecdotal level, the rate at<br />
which former construction sites can be reclaimed by vegetation has been<br />
impressive over the last few years of good growing weather. Although there are<br />
several other ways to identify incomplete dwellings, such as missing access<br />
roadways and Ordnance Survey field revision text, the presence of spectral values<br />
corresponding to rough pasture would indicate long-term neglect.<br />
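The "more than half the plots" assumption could be expressed as a simple share calculation. The function names and the label string are hypothetical; the 0.5 threshold is the one stated in the text.<br />

```python
def rough_pasture_share(plot_labels):
    """Fraction of plots in a development classified as rough pasture."""
    if not plot_labels:
        return 0.0
    hits = sum(1 for label in plot_labels if label == "rough pasture")
    return hits / len(plot_labels)


def is_probably_abandoned(plot_labels, threshold=0.5):
    """True when strictly more than `threshold` of the plots are rough pasture."""
    return rough_pasture_share(plot_labels) > threshold
```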
Because of the relatively high level of standard deviation associated with the<br />
sampling for rough pasture, it is probably one of the more important land cover types for<br />
automated image processing to identify: once these areas are removed from the<br />
search, remaining areas displaying similar levels of deviation in the red and green<br />
colour bands can be analyzed more closely (possibly even with flagged histograms<br />
for users to identify). In order to place the values discovered relative to the<br />
rest of the image, three separate areas of cover were sampled.<br />
Rough Pasture testing sample 1 (pasture) Mean pixel value Standard deviation<br />
Red 131.413 10.813<br />
Green 167.157 9.494<br />
Blue 102.256 12.591<br />
Rough Pasture testing sample 2 (water) Mean pixel value Standard deviation<br />
Red 35.191 4.374<br />
Green 68.865 6.877<br />
Blue 83.6 12.964<br />
Rough Pasture testing sample 3 (road) Mean pixel value Standard deviation<br />
Red 218.867 7.405<br />
Green 240.2 9.321<br />
Blue 197.533 10.802<br />
Table 26: Rough pasture test sample values<br />
The first testing sample for comparison to rough pasture was pasture. This<br />
contrasts well with rough pasture and is a useful benchmark in the algorithm. As<br />
with the similar (in terms of pixel values) area of mixed forestry, rough pasture<br />
has high levels of standard deviation around a relatively low mean pixel value for<br />
the red colour band (approximately 70 on the converted greyscale). This contrasts well with<br />
the mean red pixel value expected for pasture (almost double), something which<br />
was borne out in the pasture sample. The level of standard deviation for pasture is correspondingly<br />
low (approximately one third of the value found in rough pasture for the red and green<br />
colour bands). The fact that pasture is often adjacent to rough pasture makes this<br />
also a very useful comparative measurement. It should also be noted that in terms<br />
of the vector spatial data this is also useful as the same boundary polyline will flag<br />
both areas and could be used to refine the algorithm. This is helpful to automated<br />
image interpretation as it reduces the possible value set by allowing a reduced set<br />
of values to be applied over the first iteration of the analysis. In this way an initial<br />
analysis by polygon can apply the spectral values to a smaller subset of possible<br />
neighbouring polygons and save time, opening up the possibility for the software
cycle being suggested here to present an early estimate of the type of surface<br />
coverage the image is composed of.<br />
One other factor that could be included in a search for rough pasture is the<br />
presence of symbols within polygons of this type in the vector mapping. Initial<br />
steps within the algorithm could extract all polygons and all indication symbols<br />
(the polylines surrounding known areas of rough pasture are not coded) and retain<br />
the set of polygons where they intersect. These could then be discounted from the<br />
automated image analysis.<br />
The second sample area was an area of water. This was taken from a lake present<br />
on the plan because, as was indicated earlier in the study, only water polygons<br />
above the level of a stream (wider than 3m) return a pixel sample useful for reference.<br />
The values obtained matched those identified in the water area sampling in this<br />
thesis and contrasted with values for rough pasture (less than 50% of the red and<br />
roughly 60% of the green colour band values of rough pasture). The level of standard<br />
deviation also varied greatly, with values almost six times higher in rough pasture.<br />
This makes water a good benchmark to rate the probability of rough pasture<br />
against, using the percentage increase on the values for the red and green colour<br />
bands as a reference.<br />
The third value sampled was a section of road. This sample is unique to this part<br />
of the study, so as to avoid overlap with road samples taken elsewhere and ensure<br />
the values remain consistent for the feature type. Road was selected for sampling<br />
because, as with water, it contains relatively unique spectral properties with a low<br />
level of standard deviation. This allows it to be used as a stable contrast to the<br />
values found in rough pasture (which are similar to some types of forestry and<br />
contain a high level of standard deviation in all three colour bands). For the red<br />
colour band the road had a mean value roughly three times that found in<br />
rough pasture (an increase of around 200%), with a standard deviation of less than a third of the rough pasture<br />
value. The green colour band showed the highest difference returning a value of<br />
over twice that found in the rough pasture samples and, as with the red colour<br />
band, a level of standard deviation less than a third of rough pasture. The blue<br />
colour band also returned a distinct difference, with a mean value for road over<br />
twice that of rough pasture.<br />
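The comparisons between the benchmark samples and rough pasture can be checked directly against the tabulated figures. In this sketch the rough pasture reference is the average of the four samples in Table 25 (rounded to two decimals), the benchmark figures are copied from Table 26, and the `ratios` helper is a hypothetical name introduced for the example.<br />

```python
# Averages of the four rough pasture samples in Table 25: (mean, std deviation)
ROUGH_PASTURE = {"red": (72.79, 32.25), "green": (112.89, 33.67), "blue": (95.11, 18.09)}

# Benchmark samples copied from Table 26: (mean, std deviation)
BENCHMARKS = {
    "pasture": {"red": (131.413, 10.813), "green": (167.157, 9.494), "blue": (102.256, 12.591)},
    "water": {"red": (35.191, 4.374), "green": (68.865, 6.877), "blue": (83.6, 12.964)},
    "road": {"red": (218.867, 7.405), "green": (240.2, 9.321), "blue": (197.533, 10.802)},
}


def ratios(benchmark, reference=ROUGH_PASTURE):
    """Return {band: (mean ratio, std ratio)} of a benchmark to the reference."""
    return {
        band: (benchmark[band][0] / reference[band][0],
               benchmark[band][1] / reference[band][1])
        for band in reference
    }
```

For example, `ratios(BENCHMARKS["road"])["green"][0]` gives the road green mean as a multiple of the rough pasture green mean.<br />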
The last two sample areas gave a clear indicator for use in the identification of<br />
rough pasture. When these are combined with the vector attributes (symbol search<br />
and nearest neighbour probability) the algorithm is equipped with a robust means<br />
of ensuring as many areas of this type of land cover are identified and removed<br />
from the search as possible. In other words, if a user (of the proposed image search<br />
process) was seeking to identify a crop type (or disease within that crop type etc.)<br />
they could easily set the process to discount rough pasture from the analysis.<br />
4 Testing<br />
This chapter describes a set of tests for known polygon types in the imagery being<br />
used in the study. Four land use types were selected (pasture, rough pasture, marsh<br />
and bog) and the polygons were clipped from the raster imagery for spectral analysis.<br />
The process followed the outline presented in Chapter 2, with the vector<br />
coordinates being identified from the Ordnance Survey data. This set of<br />
coordinates was then used to extract the corresponding section of image, which in<br />
turn was analyzed for its spectral values. The overall results were good in terms of<br />
there being unique proportional pixel count values present in the polygons<br />
(selected from across the image with typical biasing factors such as shade present).<br />
Of the four sample areas, bog and pasture had the least amount of standard<br />
deviation; it was possible to automatically classify pasture even with a high<br />
proportion of shade present in the image section. Marsh and rough pasture had<br />
similar levels of standard deviation but can be distinguished by a dip in values<br />
between the shade and vegetation range in rough pasture, which was not present in<br />
the marsh polygons. This chapter is divided into four sections; as with the<br />
previous chapter the results are grouped according to a particular area of interest,<br />
starting with a look at the values from the known pasture polygons.<br />
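The clipping step outlined above (vector coordinates used to extract the matching section of image) can be illustrated with a ray-casting test. This is a generic sketch, not the Mirone or Geomatica procedure used in the study; it assumes the polygon has already been converted from map coordinates to pixel coordinates, and the function names are hypothetical.<br />

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices, open or closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_pixels(image, ring):
    """Return pixels whose centres fall inside the ring.

    `image` is a list of rows of pixel values; the centre of the pixel at
    (row, col) is taken to be (col + 0.5, row + 0.5).
    """
    return [
        pixel
        for row_idx, row in enumerate(image)
        for col_idx, pixel in enumerate(row)
        if point_in_polygon(col_idx + 0.5, row_idx + 0.5, ring)
    ]
```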
4.1 Pasture Test<br />
Four areas of pasture were sampled across varying degrees of shade and<br />
unbounded (from vector data) internal ground cover outside pasture (exposed<br />
rock). The first of these was an area in the south west of the sample area and<br />
comprised a polygon surrounded by five fence vectors close to the road. The<br />
sample was chosen to see if the expected values for the mean and standard<br />
deviation across the colour bands would be reflected in a section of image with a<br />
relatively high degree of variety in terms of ground cover.<br />
The data was sampled using the Radius software where the polyline values were<br />
exported to an ASCII file. This process will be easier once GML format spatial<br />
data becomes available (not available at the time of writing but will be over the<br />
next few years, OSI 2010). This process has two requirements in order for the<br />
polygon to be correctly sampled:<br />
• Firstly the line orientation needs to be the same for all polylines which<br />
bound the sample polygon, by convention this is anticlockwise (left of the<br />
line direction falling on the inside of the area to be extracted).<br />
• Secondly the first and last co-ordinates must match, with no other<br />
duplicate values present in the file.<br />
These requirements can be validated at the time of extraction once GML data is<br />
available; for this study, however, a text editor was used to search for duplicate values.<br />
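The two requirements listed above lend themselves to an automated check. The sketch below assumes the exported ASCII file has already been parsed into a list of (x, y) tuples; `signed_area` and `validate_ring` are hypothetical helper names, and orientation is tested with the shoelace formula (positive area for an anticlockwise ring when x increases east and y increases north).<br />

```python
def signed_area(ring):
    """Shoelace formula; positive for anticlockwise rings."""
    pts = ring if ring[0] == ring[-1] else ring + [ring[0]]
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def validate_ring(ring):
    """Return a list of problems; an empty list means the ring is usable."""
    if len(ring) < 4:
        return ["fewer than four coordinates"]
    problems = []
    closed = ring[0] == ring[-1]
    if not closed:
        problems.append("first and last coordinates differ")
    # Apart from the closing point, no coordinate may repeat
    interior = ring[:-1] if closed else ring
    if len(set(interior)) != len(interior):
        problems.append("duplicate coordinate")
    if signed_area(ring) <= 0:
        problems.append("not anticlockwise")
    return problems
```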
Figure 17: Creating the ASCII file<br />
Once the ASCII file was created, it was used to extract the region of interest from<br />
the image using the Mirone cropping tool:<br />
Figure 18: Aerial view of pasture test 1<br />
This then presented a sample area which can be analyzed for pixel values. Note<br />
the high degree of shade in the south east and exposed rock in the north west of<br />
the area. The next step in the process involved analyzing the histogram data for<br />
the target sample. This was completed in Geomatica, with the pixel count reduced<br />
to remove the values counted outside the clipped area. In other words, the<br />
irregular image was stored as a GeoTiff image which placed the area onto a blank<br />
background, resulting in a high pixel count for values above the expected range<br />
(in the order of thousands), which were discounted by reducing the pixel count.<br />
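An alternative to reducing the pixel count by hand is to mask the background value before counting. The sketch below is not the Geomatica procedure used in the study; it assumes the blank background is stored as the value 255 in each band, which would also discard any genuine pixels with that value.<br />

```python
from statistics import mean, pstdev


def band_histogram(values, nodata=255):
    """256-bin count of one colour band, skipping the background value."""
    counts = [0] * 256
    for v in values:
        if v != nodata:
            counts[v] += 1
    return counts


def band_stats(values, nodata=255):
    """Mean and population standard deviation of the non-background pixels."""
    kept = [v for v in values if v != nodata]
    return mean(kept), pstdev(kept)
```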
Figure 19: Red colour band for pasture test 1<br />
The above histogram shows the values for the red colour band, with a clear peak<br />
for pasture values (see sampling section). The lesser spike of pixels of lower<br />
values is consistent with the type of shade expected in this type of area, while the<br />
slightly higher values returned by the exposed rock distorted the standard<br />
deviation slightly. The area's content, however, can be clearly seen in the spike for<br />
pasture, something which is clearer to see in the histogram for the green colour<br />
band:<br />
Figure 20: Green colour band for pasture test 1<br />
In the above sample, the values for both shade and pasture are evident in spikes in<br />
the pixel count.<br />
The second sample gave a clearer representation of the area type with a relatively<br />
monochrome sample taken from the south east of the imagery:<br />
Figure 21: Aerial view of pasture test 2<br />
The relatively low level of shade and lack of exposed earth allows the algorithm to<br />
classify the area as pasture, based on the values from both the red and green<br />
colour bands:<br />
Figure 22: Red colour band for pasture test 2<br />
The red colour band histogram (above) displayed a peak at the expected pasture<br />
value, with the small count of lower values indicating the shade present in<br />
the east of the area. Note: The high value at the far right is the count for blank<br />
pixels represented by null values outside the cropped section present in the<br />
GeoTiff image. These values were discounted by lowering the total pixel count to<br />
a range within which expected levels of values will fall. The values for the green<br />
colour band also showed that the sample could correctly be identified as pasture<br />
from the peak count values found in this band (see sampling section for an<br />
elaboration on how these values were arrived at).<br />
Figure 23: Green colour band for pasture test 2<br />
The third sample took a polygon to the centre of the plan, close to areas of bog,<br />
rough pasture and marsh. It had a high degree of shade and also contained a<br />
relatively high degree of variation in use (the darker band to the north, which<br />
contained trees –the dark band to the south is shade), although it is still an area of<br />
pasture.<br />
Figure 24: Aerial view of pasture test 3<br />
The histogram values for both the red and green colour bands still, however,<br />
pointed to a polygon with spectral values which fall within the expected range of<br />
pasture:<br />
Figure 25: Red colour band for pasture test 3<br />
The above red colour band sample shows the peak at the pasture band with the<br />
variance in terms of darker values representing the scrub/bushy part of the sample.<br />
The values are slightly higher than the overall mean for pasture, but still fall inside<br />
what might be expected for an area of cut pasture.<br />
Figure 26: Green colour band for pasture test 3<br />
The peak for the green colour band, around 190 on the converted greyscale, means<br />
that the slightly higher peak for the red colour band could be accepted as variance<br />
within the pasture range.<br />
The fourth sample that was used to test the expected ranges for the presence of<br />
pasture from automatically extracted polygons was an area to the south east of the<br />
image which was close to farm buildings and used as pasture, giving a relatively<br />
clean example of the land use type to test the algorithm against:<br />
Figure 27: Vector data for pasture test 4<br />
The above screen grab from the vector data (farm buildings are to the right),<br />
shows how the process being developed in this study uses known controlled<br />
coordinates to crop the imagery into a mosaic of tiles for processing (same area<br />
below);<br />
Figure 28: Aerial view of pasture test 4<br />
The values for the red colour band were consistent with the expected values and,<br />
with the small level of distortion as a result of shade increasing the level of values<br />
counted in the lower part of the colour range, the polygon can easily be processed<br />
as an area of pasture:<br />
Figure 29: Red colour band for pasture test 4<br />
Similar results were returned for the green colour band (below), with the shade<br />
causing some distortion to the pixel count but the overall values indicate an area<br />
of pasture:<br />
Figure 30: Green colour band for pasture test 4<br />
The above sampling demonstrates that it is possible to cut up an image based on<br />
controlled vector data and analyze each section with some degree of success. The<br />
aim of this sampling was not to show absolute values for each land type; it was<br />
only to prove the theory that, given correct vector data representing small<br />
area polygons, valuable data can be further derived based on spectral values. A<br />
further advantage for anyone using the suggested algorithm is that for this type of<br />
land cover (peri-urban and rural but close to settlements, which covers most of<br />
the country) the controlled vector data has changed little over time, which means the<br />
process also lends itself to studies looking at change over time.<br />
4.2 Rough Pasture Test<br />
Four areas of pasture were sampled across varying degrees of shade and<br />
unbounded (from vector data) internal ground cover outside pasture (exposed<br />
rock). The first of these was an area in the south west of the sample area and<br />
comprised a polygon surrounded by five fence vectors close to the road. The<br />
sample was chosen to see if the expected values for the mean and standard<br />
deviation across the colour bands would be reflected in a section of image with a<br />
relatively high degree of variety in terms of ground cover.<br />
The data was sampled using the Radius software where the polyline values were<br />
exported to an ASCII file. This process will be easier once GML format spatial<br />
data becomes available (not available at the time of writing but will be over the<br />
next few years, OSI 2010). This process has two requirements in order for the<br />
polygon to be correctly sampled:<br />
• Firstly the line orientation needs to be the same for all polylines which<br />
bound the sample polygon, by convention this is anticlockwise (left of the<br />
line direction falling on the inside of the area to be extracted).<br />
• Secondly the first and last co-ordinates must match, with no other<br />
duplicate values present in the file.<br />
These requirements can be validated at the time of extraction once GML data is<br />
available; for this study, however, a text editor was used to search for duplicate values.<br />
Figure 31: Vector data for rough pasture test 1<br />
Once the ASCII file was created, it was used to extract the region of interest from<br />
the image using the Mirone cropping tool:<br />
Figure 32: Aerial view of rough pasture test 1<br />
This then presented a sample area which can be analyzed for pixel values. Note<br />
the high degree of shade in the south east and exposed rock in the north west of<br />
the area. The next step in the process involved analyzing the histogram data for<br />
the target sample. This was completed in Geomatica, with the pixel count reduced<br />
to remove the values counted outside the clipped area. In other words, the<br />
irregular image was stored as a GeoTiff image which placed the area onto a blank<br />
background, resulting in a high pixel count for values above the expected range<br />
(in the order of thousands), which were discounted by reducing the pixel count.<br />
Figure 33: Red colour band for rough pasture test 1<br />
The above histogram shows the values for the red colour band, with a clear peak<br />
for pasture values (see sampling section). The lesser spike of pixels of lower<br />
values is consistent with the type of shade expected in this type of area, while the<br />
slightly higher values returned by the exposed rock distorted the standard<br />
deviation slightly. The area's content, however, can be clearly seen in the spike for<br />
pasture, something which is clearer to see in the histogram for the green colour<br />
band:<br />
Figure 34: Green colour band for rough pasture test 1<br />
In the above sample, the values for both shade and pasture are evident in spikes in<br />
the pixel count.<br />
The second sample gave a clearer representation of the area type with a relatively<br />
monochrome sample taken from the south east of the imagery:<br />
Figure 35: Aerial view of rough pasture test 2<br />
The relatively low level of shade and lack of exposed earth allows the algorithm to<br />
classify the area as pasture, based on the values from both the red and green<br />
colour bands:<br />
Figure 36: Red colour band for rough pasture test 2<br />
The red colour band histogram (above) displayed a peak at the expected pasture<br />
value, with the small count of lower values indicating the shade present in<br />
the east of the area. Note: The high value at the far right is the count for blank<br />
pixels represented by null values outside the cropped section present in the<br />
GeoTiff image. These values were discounted by lowering the total pixel count to<br />
a range within which expected levels of values will fall. The values for the green<br />
colour band also showed that the sample could correctly be identified as pasture<br />
from the peak count values found in this band (see sampling section for an<br />
elaboration on how these values were arrived at).<br />
Figure 37: Green colour band for rough pasture test 2<br />
The third sample took a polygon to the centre of the plan, close to areas of bog,<br />
rough pasture and marsh. It had a high degree of shade and also contained a<br />
relatively high degree of variation in use (the darker band to the north, which<br />
contained trees –the dark band to the south is shade), although it is still an area of<br />
pasture.<br />
Figure 38: Aerial view of rough pasture test 3<br />
The histogram values for both the red and green colour bands still, however,<br />
pointed to a polygon with spectral values which fall within the expected range of<br />
pasture:<br />
Figure 39: Red colour band for rough pasture test 3<br />
The above red colour band sample shows the peak at the pasture band with the<br />
variance in terms of darker values representing the scrub/bushy part of the sample.<br />
The values are slightly higher than the overall mean for pasture, but still fall inside<br />
what might be expected for an area of cut pasture.<br />
Figure 40: Green colour band for rough pasture test 3<br />
The peak for the green colour band, around 190 on the converted greyscale, means<br />
that the slightly higher peak for the red colour band could be accepted as variance<br />
within the pasture range.<br />
The fourth sample that was used to test the expected ranges for the presence of<br />
pasture from automatically extracted polygons was an area to the south east of the<br />
image which was close to farm buildings and used as pasture, giving a relatively<br />
clean example of the land use type to test the algorithm against:<br />
Figure 41: Vector data for rough pasture test 4<br />
The above screenshot from the vector data (farm buildings are to the right), shows<br />
how the process being developed in this study uses known controlled coordinates<br />
to crop the imagery into a mosaic of tiles for processing (same area below);<br />
Figure 42: Aerial view of rough pasture test 4<br />
The values for the red colour band were consistent with the expected values and,<br />
with the small level of distortion as a result of shade increasing the level of values<br />
counted in the lower part of the colour range, the polygon can easily be processed<br />
as an area of pasture:<br />
Figure 43: Red colour band for rough pasture test 4<br />
Similar results were returned for the green colour band (below), with the shade<br />
causing some distortion to the pixel count but the overall values indicate an area<br />
of pasture:<br />
Figure 44: Green colour band for rough pasture test 4<br />
The above sampling demonstrates that it is possible to cut up an image based on<br />
controlled vector data and analyze each section with some degree of success. The<br />
aim of this sampling was not to show absolute values for each land type; it was<br />
only to prove the theory that, given correct vector data representing small<br />
area polygons, valuable data can be further derived based on spectral values. A<br />
further advantage for anyone using the suggested algorithm is that for this type of<br />
land cover (peri-urban and rural but close to settlements, which covers most of<br />
the country) the controlled vector data has changed little over time, which means the<br />
process also lends itself to studies looking at change over time.<br />
4.3 Marsh Test<br />
The testing for the marsh areas involved taking four known (but<br />
geographically separate) marsh areas from the imagery as ASCII co-ordinate files and using<br />
this data to extract the raster sections for spectral analysis. The sampling section<br />
of this study found marsh to have a relatively low level of standard deviation<br />
across the colour bands (when compared to the similar rough pasture type<br />
polygons). This might have been expected to change as there can be a relatively<br />
large degree of shade in these areas due to the presence of other vegetation –and to<br />
some extent this was the case. The samples, however, despite a high degree of<br />
standard deviation, did display some unique properties in line with the sampling<br />
section which allow them to be automatically classified. In general terms polygons<br />
with the value ranges displayed below will belong to either rough pasture or marsh<br />
(see sampling section, this is in reference to Irish small area polygons only).<br />
Rough pasture displays two peaks of values where the shade and vegetation<br />
contrast; the marsh samples did not have this distinction and the range of values<br />
graduated towards a single peak (slightly lower than rough pasture, by about 10 on the converted<br />
greyscale).<br />
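The distinction drawn above (two peaks for rough pasture, a single graded peak for marsh) could be tested by counting local maxima in a smoothed histogram. This is a rough sketch only: the moving-average window and the 20% height floor are hypothetical parameters, not values from the study.<br />

```python
def smooth(counts, radius=2):
    """Moving average over a window of up to 2 * radius + 1 bins."""
    n = len(counts)
    out = []
    for i in range(n):
        window = counts[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def count_peaks(counts, radius=2, min_fraction=0.2):
    """Count local maxima at least `min_fraction` as high as the tallest bin."""
    s = smooth(counts, radius)
    floor = max(s) * min_fraction
    peaks = 0
    for i in range(1, len(s) - 1):
        # A flat-topped peak is counted once, at its first bin
        if s[i] >= floor and s[i] > s[i - 1] and s[i] >= s[i + 1]:
            peaks += 1
    return peaks
```

Under these assumptions a polygon whose red or green histogram returns two peaks would lean towards rough pasture, and one returning a single peak towards marsh.<br />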
The first sample area came from a polygon of marsh north of a lake and included a<br />
large amount of vegetation.<br />
Figure 45: Vector data for marsh test 1<br />
The sampling for these was from separate areas across the imagery used in the<br />
study to try and achieve as consistent a picture as possible of this type of area. The<br />
co-ordinates of the area were extracted from the vector data (above) in ITM<br />
projection and used to clip the polygon from the raster imagery (below). As with<br />
all other samples in this test, the final results were adjusted by lowering the pixel<br />
count so as to account for blank pixels created in the output GeoTiff of the<br />
irregular clipped polygon:<br />
Figure 46: Aerial view of marsh test 1<br />
The results for the red colour band pixel count were similar to the spectral values<br />
for rough pasture but had a definite gradient between the areas of shade and vegetation,<br />
as opposed to the two peaks consistent with high vegetation. This gradient<br />
emerged as typical of the marsh samples taken during the testing of this algorithm<br />
(the sampling section had a lower level of standard deviation but the full polygon<br />
samples also included shade and other growth, as in the east of the sample above):<br />
Figure 47: Red colour band for marsh test 1<br />
As with the spectral values for the red colour band, the green colour band showed<br />
a gradual increase in pixel count values along the greyscale from those related to<br />
areas of shade to those falling within the expected range for marsh:<br />
Figure 48: Green colour band for marsh test 1<br />
The above trend would emerge as consistent through the marsh value testing, with<br />
a consistent graph for blue (within the sampling range) with values evenly<br />
distributed either side of a peak between 90 and 100 on the converted greyscale.<br />
The fact that these values mirror rough pasture closely means that the marsh areas<br />
can only be classified once rough pasture values have been fully identified<br />
(towards the end of the third step in the algorithm).<br />
Figure 49: Blue colour band for marsh test 1<br />
The second known marsh area selected (outlined below) to test the algorithm was<br />
taken from an area east of the first sample, which included some areas of tree<br />
cover (north east) and vegetation close to rough pasture (south):<br />
Figure 50: Aerial view of marsh test 2<br />
One of the advantages of using vector data which has already been controlled to<br />
cut the photography into a mosaic of polygons is the extent to which the set data<br />
points mirror detail that would be extremely difficult to identify using automatic<br />
methods (such as the outline of the drain, visible in the indent on the eastern edge<br />
of the above polygon). The values for this larger sample area, although it<br />
contained more tree cover than the first sample, remained consistent with all four<br />
samples with a gradual incline from shade values to the expected values for marsh<br />
in the red colour band (following page):<br />
Figure 51: Red colour band for marsh test 2<br />
This gradual increase in values from shade to marsh (as per the original sampling<br />
for this study, see sampling section) was also present in the green colour band:<br />
Figure 52: Green colour band for marsh test 2<br />
The blue colour band remained consistent both with the other three test polygons<br />
used in this study, and the sampling for marsh. This suggests that the blue colour<br />
band pixel count values (below) can be added as an additional point of reference,<br />
from which any variance would flag a problem with the polygon analysis and flag<br />
it for further analysis (step 4 in the algorithm):<br />
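Using the blue band as a reference point might look like the sketch below. The expected range of 90 to 100 is the blue peak reported for marsh test 1; the function names are hypothetical, and the input is assumed to be a 256-bin histogram with the background already excluded.<br />

```python
def peak_bin(counts):
    """Index of the most frequent value (the lowest index wins ties)."""
    return max(range(len(counts)), key=counts.__getitem__)


def needs_review(blue_counts, low=90, high=100):
    """Flag the polygon when the blue peak falls outside the expected range."""
    return not (low <= peak_bin(blue_counts) <= high)
```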
Figure 53: Blue colour band for marsh test 2<br />
The third known marsh polygon used to test the expected values was taken from<br />
an area east of the image, with a high degree of shade and trees (and as a<br />
consequence overhanging growth making aerial analysis difficult). This was used<br />
to see if a relatively small area of marsh (in comparison to the other areas used in<br />
this section of the study) could return values close to what an automatic<br />
analysis would expect, even with some distortion from neighbouring features:<br />
Figure 54: Aerial view of marsh test 3<br />
The values for the red colour band for this smaller section of known marsh were<br />
not consistent with the other test polygons, showing only a small rise from shade values<br />
to those expected for marsh. This suggests, to a greater extent than the larger<br />
samples, that it is difficult to automatically classify marsh based on spectral values<br />
alone. The classification of this type of polygon must, as a result, take place<br />
towards the end of the search –so that all known values, including those<br />
automatically registered in the algorithm (such as pasture and bog) are first<br />
removed from the pool of areas being studied. It should be noted that this part of<br />
the study is not intended to be an exhaustive search for a means of automatically<br />
identifying marsh, but to show that it is possible when an aerial image is divided<br />
into discrete small area polygons –other factors, for example the percentage of<br />
rough pasture and water polygons in the same region of the image (taken from the<br />
vector data) could potentially be used to increase accuracy in identifying this type of<br />
ground cover.<br />
Figure 55: Red colour band for marsh test 3<br />
The values for the green colour band showed a similar level of distortion and, as<br />
with the red colour band values from this area and unlike the other three samples<br />
in this section, the count did not peak neatly in the expected range for marsh. The<br />
blue colour band, however, was consistent with all other test polygons and<br />
sampled imagery. This suggests that for small areas (below 2ha.) marsh is difficult<br />
to detect automatically and could only be assigned the attribute when all other<br />
values in the region of interest being analyzed have been assigned.<br />
Figure 56: Blue colour band for marsh test 3<br />
The fourth sample known marsh polygon used in the test was from an area to the<br />
south of the sample image (below). As with all others it formed an irregular<br />
polygon bordered by other non-marsh areas. There were also some trees and other<br />
vegetation present in the area, which was something present in all the marsh<br />
samples. This suggests that the type of ground cover being described is often<br />
something that occurs in transitional areas, and the difficulties associated with<br />
getting clear spectral values (in terms of low levels of standard deviation) are a<br />
result of this variation. The samples, however, were consistent with the other two<br />
large marsh test polygons, and the pixel count for the red and green colour bands<br />
peaked within the expected range.<br />
Figure 57: Aerial view of marsh test 4<br />
The values for the red colour band for the fourth test polygon showed a gradual<br />
incline from the values for shade to the expected range for marsh:<br />
Figure 58: Red colour band for marsh test 4<br />
The green colour band values were also consistent with this trend and, once shade<br />
is removed from the analysis, these could be used to identify marsh. The values<br />
for the blue colour band, as with the previous three test polygons, remained<br />
consistent with an expected range for marsh (see sampling section):<br />
Figure 59: Blue colour band for marsh test 4<br />
The values returned for the test of marsh polygons were not as distinct as the<br />
previous three polygon tests (pasture, rough pasture and bog). There were sets of<br />
values (such as a consistent blue range and gradual increase in count from shade<br />
to the expected marsh range) which can be used to classify this type of area, but<br />
the classification needs to be confined to the latter (step 3) part of an automatic<br />
search algorithm to increase the probability of the spectral values matching<br />
correctly.<br />
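The "gradual increase in count from shade to the expected marsh range" can be expressed as a simple test on a band histogram.<br />
The following is a minimal sketch in Python, not the routine used in this study: the function names, the window count and the<br />
value limits (40 for the top of the shade range, 120 for the start of the marsh range) are hypothetical placeholders for figures<br />
that would come from the sampled image key.<br />

```python
def window_sums(histogram, start, stop, windows):
    """Sum pixel counts in equal-width windows across the range [start, stop)."""
    width = (stop - start) / windows
    sums = [0] * windows
    for value, count in histogram.items():
        if start <= value < stop:
            index = min(int((value - start) / width), windows - 1)
            sums[index] += count
    return sums


def rises_gradually(histogram, shade_top, marsh_low, windows=4):
    """True when the counts never fall from one window to the next and end higher."""
    sums = window_sums(histogram, shade_top, marsh_low, windows)
    never_falls = all(a <= b for a, b in zip(sums, sums[1:]))
    return never_falls and sums[-1] > sums[0]


# Hypothetical histogram (pixel value -> count) that climbs steadily out of shade.
marsh_like = {value: value - 30 for value in range(40, 120)}
# Hypothetical histogram with one isolated shade spike and little else.
spiky = {45: 500, 100: 20}

marsh_like_rises = rises_gradually(marsh_like, shade_top=40, marsh_low=120)
spiky_rises = rises_gradually(spiky, shade_top=40, marsh_low=120)
```

A test of this kind would only be run on polygons left over after the more distinct types have been assigned, in line with<br />
the ordering described above.<br />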
4.4 Bog Test<br />
The sampling for areas of bog took larger polygons which, with forestry and water<br />
bodies already eliminated from the search algorithm (with the vector data) could<br />
automatically be assumed to have a high probability of belonging to an area of<br />
bog. The first sample was taken from an area of bog bounded by roads on all sides<br />
in the northeast of the imagery being used in this study (see the general<br />
introduction for details). This polygon was chosen for sampling as it could be<br />
considered a good example of this type of land cover:<br />
Figure 60: Vector data for bog test 1<br />
The data points which form the polylines bounding the polygon were exported in<br />
ASCII format. As with all clip files for this algorithm the points are recorded in<br />
anti-clockwise order with a space separating eastings and northings, a newline<br />
separating coordinate pairs and the start/ end point appearing twice. The<br />
projection is, as with all the data in this study, Irish Transverse Mercator. This<br />
coordinate set then formed the boundaries for a clipped area of the raster image:<br />
Figure 61: Aerial view for bog test 1<br />
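The clip file conventions described above (one space-separated easting and northing per line, anti-clockwise order, start/end<br />
point repeated) can be checked before a file is used to cut the raster. The following is a minimal sketch in Python; the function<br />
names and the sample coordinates are hypothetical and are not taken from the tooling used in this study.<br />

```python
def parse_clip_file(text):
    """Read 'easting northing' pairs, one pair per line, into a list of tuples."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        easting, northing = line.split()
        points.append((float(easting), float(northing)))
    return points


def signed_area(points):
    """Shoelace formula on a closed ring; positive when the ring is anti-clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def validate_ring(points):
    """Check the two conventions: repeated start/end point and anti-clockwise order."""
    if len(points) < 4:
        raise ValueError("a closed polygon needs at least four coordinate pairs")
    if points[0] != points[-1]:
        raise ValueError("start/end point must appear twice")
    if signed_area(points) <= 0:
        raise ValueError("points are not in anti-clockwise order")
    return True


# Hypothetical 200 m x 200 m square; the coordinates are invented for illustration.
SAMPLE = """\
500000.0 700000.0
500200.0 700000.0
500200.0 700200.0
500000.0 700200.0
500000.0 700000.0
"""

ring = parse_clip_file(SAMPLE)
ring_is_valid = validate_ring(ring)
area_ha = signed_area(ring) / 10000.0  # ITM units are metres; 1 ha = 10,000 m^2
```

Because the ring is closed, the same routine also yields the polygon area, which is one of the properties used later to<br />
separate bog from smaller land cover types.<br />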
As with pasture, areas of bog produced a uniform pixel count with a low level of<br />
standard deviation (see sampling section for more detail on pixel values). One<br />
aspect of this type of ground cover, however, which sets it aside from other<br />
searches in this study is the absence of shade (due to the absence of trees and thick<br />
vegetation). The small area of shade present in the south of the above sample is<br />
not enough to distort the mean values and proved to be typical for the imagery. As<br />
with all the samples in this section of the study, the histogram values for the<br />
polygons have been adjusted to remove the distortion caused by blank pixels<br />
adjacent to the irregular shape created when it is exported to GeoTiff format. This<br />
was achieved by reducing the pixel count to eliminate values over 5000 (400 in<br />
the case of pasture and rough pasture, where sample areas are much smaller). The<br />
red colour band histogram (below) returned a clear spike, consistent with expected<br />
bog spectral values (see sampling section) and unique among the polygons in the<br />
imagery being studied (Note the slight peak in lower values, giving a clear<br />
indication of the presence, and proportion of, shade in the sample –something<br />
which facilitates automatic attribute detection):<br />
Figure 62: Red colour band for bog test 1<br />
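The adjustment described above, where pixel counts over 5000 (or 400 for the smaller pasture samples) are removed so that blank<br />
padding pixels do not dominate the histogram, can be sketched as follows. This is an illustrative Python sketch under the<br />
assumption that the adjustment drops any histogram bin whose count exceeds the cap; the function names, the shade limit of 40<br />
and the pixel values are hypothetical.<br />

```python
from collections import Counter


def band_histogram(values, max_count):
    """Count pixels per value, dropping any bin whose count exceeds max_count."""
    counts = Counter(values)
    return {value: n for value, n in sorted(counts.items()) if n <= max_count}


def peak_value(histogram):
    """Pixel value with the highest remaining count."""
    return max(histogram, key=histogram.get)


def shade_fraction(histogram, shade_top):
    """Share of the remaining pixels that fall below the shade limit."""
    total = sum(histogram.values())
    shaded = sum(n for value, n in histogram.items() if value < shade_top)
    return shaded / total


# Invented red-band values: 6000 blank padding pixels, a bog-like peak and a little shade.
red_band = [255] * 6000 + [90] * 1200 + [91] * 1500 + [92] * 900 + [30] * 50

hist = band_histogram(red_band, max_count=5000)
peak = peak_value(hist)
shade = shade_fraction(hist, shade_top=40)
```

The small share of pixels below the shade limit corresponds to the slight peak in lower values noted above.<br />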
The green colour band values for the same polygon also revealed a clear pattern to<br />
the values, peaking at the expected range for an area of bog:<br />
Figure 63: Green colour band for bog test 1<br />
The blue colour band also displayed a range of values which peaked for expected<br />
pixel values for an area of bog; the result of which is that the area can be classified<br />
as bog. This has implications for the determination of the growth/ decline of this<br />
type of area over time. The focus of this study is to determine the value of<br />
cropping raster imagery based on small area polygons so as to automatically<br />
detect values in the image, so the specifics of the type of bog or other factors<br />
which can be determined from further processing have not been developed, save<br />
to flag the necessary properties (i.e. the spectral values set in the sampling section<br />
and identified in this sample, and the area property of being larger than four<br />
hectares –which eliminates all other probable land cover types).<br />
Figure 64: Blue colour band for bog test 1<br />
The second sample area of bog was extracted from an area bounded by river,<br />
rough pasture, drain and road and produced an irregular polygon which was then<br />
extracted from the imagery using Mirone software and saved as a GeoTiff file.<br />
This file was examined for the spectral values across the colour bands. As is clear<br />
from the crop area from the image, the properties contain the uniform values (and<br />
associated low level of standard deviation) which are useful for automatic<br />
analysis and classification.<br />
Figure 65: Vector data for bog test 2<br />
As with the first sample, little or no shade is present in the area being analyzed:<br />
Figure 66: Aerial view for bog test 2<br />
The red colour band produced a clear indication of an area of bog, consistent with<br />
expected values but with a small distortion due to the vegetation in the west of the<br />
image (evidenced by the very slight grade of values below the expected range):<br />
Figure 67: Red colour band for bog test 2<br />
The histogram for the green colour band also displayed a proportion of spectral<br />
values which was consistent with an area of bog. It should be noted that the values<br />
are very similar to the benchmark values identified in the sampling section, but are<br />
from a real world polygon –the sampling section used sections of the land type<br />
from within known areas to set the benchmark. This part of the study is looking at<br />
typical polygons extracted using vector data. It is therefore significant that the<br />
results are so similar:<br />
Figure 68: Green colour band for bog test 2<br />
As with the first sample, all three colour bands matched the expected range, with<br />
the blue colour band peaking for a range consistent with bog. This makes the<br />
proportional method (comparing the values to their proportional range to known<br />
areas such as water bodies and roads) possible, and suggests that during the<br />
second step of the algorithm the areas flagged as bog could be included into a<br />
known set of values to test areas with high standard deviation and multiple peaks<br />
of values against:<br />
Figure 69: Blue colour band for bog test 2<br />
The third sample for the bog land area type took a large section of bog to the north<br />
west of the imagery, which was exported from the vector software as a set of co-<br />
ordinates in ASCII format. It is worth noting that this format has been used<br />
throughout this study as it allows for easy manipulation of the files and would<br />
make them compatible with a GML routine, but the sets of co-ordinates can also<br />
be contained in an ESRI shape file (.shp) and this could then be used to cut the<br />
sections with the software used here. The downside to this, however, is that the<br />
sets cannot be (as easily) fed into reports or user created routines.<br />
Figure 70: Aerial view for bog test 3<br />
The results across all three colour bands were similar to the first two test samples<br />
and matched the expected values identified in the sampling section of this study,<br />
which indicates for general areas of raised bog this algorithm gives an accurate<br />
indicator of total area. This may not be the case for more remote sections of this<br />
land type, found on mountain ranges. The algorithm is dependent on closed vector<br />
polygons, which are available for almost all of the country, with the exception of<br />
high mountains. In those cases it would be necessary to re-set the image key to<br />
search for values of exposed rock and determine the proportional difference<br />
between the expected bog values and those for exposed rock in much larger<br />
mountainous polygons –this is something which the algorithm can be adapted for<br />
but is not attempted here as the focus is on matching generic land use types as<br />
much as possible (the larger mountain areas would contain a mix of marsh, bog,<br />
exposed rock and vegetation).<br />
Figure 71: Red colour band for bog test 3<br />
As with all the bog samples this test area returned values within the expected<br />
range for the land use type and the pixel count for the colour bands fell within<br />
expected range for red (above) and green (below):<br />
Figure 72: Green colour band for bog test 3<br />
This trend was continued for the blue colour band and indicates that there is a high<br />
degree of probability of bog once a low standard deviation and the value ranges<br />
are met –the graph for the blue colour band for the third test sample is below:<br />
Figure 73: Blue colour band for bog test 3<br />
The final test sample for an area of bog was taken from the west of the image and<br />
contained some vegetation and shade which may have distorted the results –as<br />
with all the samples of this type of land cover, the percentage shade is very low,<br />
allowing it to be a distinguishing feature for inclusion in an automatic search:<br />
Figure 74: Aerial view for bog test 4<br />
The red colour band results showed a slight grade from values of shade into the<br />
expected range of the land type, which was a result of the small area to the south<br />
and west of the polygon, but the values still fell within the range expected:<br />
Figure 75: Red colour band for bog test 4<br />
The green colour band pixel count also produced similar results, with the<br />
indications being that the polygon can (coupled with the relatively small level of<br />
standard deviation, large area size and low values of shade) be automatically<br />
recorded as an area of bog:<br />
Figure 76: Green colour band for bog test 4<br />
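The combination of properties used to record a polygon as bog (band values within the expected range, a low standard deviation,<br />
an area above four hectares and very little shade) can be written as a single rule. The following Python sketch is illustrative<br />
only: the four hectare limit is taken from the text, but the band ranges, the standard deviation limit, the shade limits and<br />
all names are hypothetical stand-ins for the values set in the sampling section.<br />

```python
from statistics import mean, pstdev

# Hypothetical image key; the real ranges come from the sampling section.
BOG_KEY = {"red": (95, 125), "green": (90, 120), "blue": (70, 100)}
MAX_STDEV = 12.0            # assumed limit for a "low" standard deviation
MIN_AREA_HA = 4.0           # area property given in the text
SHADE_CEILING = 40          # assumed pixel value below which a pixel counts as shade
MAX_SHADE_FRACTION = 0.05   # assumed limit for "very low" shade


def looks_like_bog(bands, area_ha):
    """Apply the area, shade, range and deviation checks to every colour band."""
    if area_ha <= MIN_AREA_HA:
        return False
    for name, (low, high) in BOG_KEY.items():
        pixels = bands[name]
        if not pixels:
            return False
        shade = sum(1 for p in pixels if p < SHADE_CEILING) / len(pixels)
        if shade > MAX_SHADE_FRACTION:
            return False
        lit = [p for p in pixels if p >= SHADE_CEILING]  # shade removed before measuring
        if not (low <= mean(lit) <= high) or pstdev(lit) > MAX_STDEV:
            return False
    return True


# Invented pixel values for a uniform polygon with a few shaded pixels.
uniform = {
    "red": [108, 110, 112] * 30 + [20] * 4,
    "green": [100, 104, 108] * 30 + [25] * 4,
    "blue": [80, 85, 90] * 30 + [30] * 4,
}
# Same polygon, but with a widely spread red band.
mixed = dict(uniform, red=[60, 110, 160] * 30)

large_uniform_is_bog = looks_like_bog(uniform, area_ha=6.2)
small_uniform_is_bog = looks_like_bog(uniform, area_ha=2.0)
large_mixed_is_bog = looks_like_bog(mixed, area_ha=6.2)
```

Only the first case passes every check; the second fails on area and the third on standard deviation.<br />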
As with all the bog samples and test areas in this study the blue colour band values<br />
remained within a small range with a low level of standard deviation (see<br />
sampling section).<br />
Figure 77: Blue colour band for bog test 4<br />
In general terms bog is a very useful land use type for inclusion into an automatic<br />
aerial image survey such as this one, as it can be quickly identified and logged<br />
early in a looping cycle through the spectral values of polygons. As mentioned<br />
above, this is dependent on controlled polylines bounding the area, something<br />
which is available for most of the country but results may be distorted on remote<br />
high ground (identified by 1:5000 mapping and the presence of cropped rock).<br />
The purpose of this test was to determine the value of using the vector data to clip<br />
aerial imagery into a mosaic of polygons for spectral analysis. As mentioned in<br />
the sampling section of the study it is possible to include pattern analysis testing to<br />
determine the level of cutting taking place (drains are included in the vector data<br />
and could also be factored into such a search).<br />
4.5 Conclusion<br />
Image segmentation is one of the most important parts of automatic analysis of<br />
aerial imagery (Zhou & Wang, 2007). A set of reference data is necessary to know<br />
where to divide image sections. This can be obtained from a survey input by the<br />
user or from spatial data specific to the area being studied (peat, forestry etc.).<br />
Ordnance Survey vector data provides a comprehensive set of reference points and<br />
allows an aerial image to be cropped into small discrete area polygons. These<br />
polygons can also benefit from the previously captured coding which identifies<br />
many of them as a specific land type. The result of adding this data to an<br />
automatic search for specific spectral values is that the user can gain context from<br />
known neighbouring polygons and calibrate the specific search accordingly. This<br />
in turn means that the process of image analysis can be simplified by applying a<br />
generic technique for identifying polygons and refining it to search for a given<br />
value.<br />
This study looked at the value of cropping aerial imagery into a mosaic of known<br />
and unknown polygons. It attempts to automatically derive probable types for the<br />
unknown areas based on the known data and a sampled image key. The sampling<br />
and testing undertaken during the study indicated that it is possible to derive<br />
useful value from a spectral analysis based on a pixel count alone. This was<br />
because the vector data introduced into the process reduced the number of<br />
possible values that can be attributed to a pixel set –for example, as the extent of<br />
forestry is known, similar values returned from an unknown polygon must<br />
represent marsh or rough pasture while further analysis of the shape of the range<br />
can distinguish between either.<br />
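The way the vector data narrows the set of possible types for an unknown polygon can be illustrated with a small lookup.<br />
This is a minimal Python sketch, assuming a hypothetical image key of red band ranges; the ranges and names are invented for<br />
illustration and are not the sampled values from this study.<br />

```python
# Hypothetical red-band ranges for each land type in the image key.
IMAGE_KEY = {
    "forestry": (40, 80),
    "marsh": (50, 90),
    "rough pasture": (60, 100),
    "bog": (100, 130),
    "pasture": (135, 170),
}

# Types whose extent is already fixed by the vector data.
KNOWN_FROM_VECTOR = {"forestry", "water"}


def candidate_types(peak, image_key, known_types):
    """Types whose range contains the peak value, minus those already mapped."""
    return sorted(
        land_type
        for land_type, (low, high) in image_key.items()
        if low <= peak <= high and land_type not in known_types
    )


with_vector = candidate_types(70, IMAGE_KEY, KNOWN_FROM_VECTOR)
without_vector = candidate_types(70, IMAGE_KEY, set())
```

With forestry excluded, a peak of 70 leaves only marsh and rough pasture, which is the situation described above where the<br />
shape of the range is then used to separate the two.<br />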
The process is possible using open source software but could also be coded into a<br />
standalone application, e.g. using the GDAL library and a function to crop<br />
irregular polygons. The potential for automation will be supported by the release<br />
of the vector data in GML format, from which a large ASCII file of coordinate<br />
sets could be fed into the process. By removing the requirement for a user to<br />
control areas of the image through the use of the vector data, and by presenting the<br />
user with a pre-calibrated image key of expected spectral values it is possible to<br />
automatically classify aerial image sections. It should be noted that this study was<br />
confined to a specific type of vector data (Ordnance Survey) and the landscape of<br />
small polygons with a single land use may not apply to all landscapes. However,<br />
the study proves that large scale vector data can be used to simplify aerial image<br />
processing.<br />
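What a function to crop irregular polygons would need to do can be shown without any library. The following Python sketch is<br />
an assumption about how such a function might work, not the method used in this study (which relied on Mirone) and not a GDAL<br />
call: it keeps the pixels whose centres fall inside a closed coordinate ring. The function names, the raster and the ring<br />
are hypothetical.<br />

```python
def point_in_polygon(x, y, ring):
    """Ray casting test against a closed ring (first point repeated at the end)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_pixels(raster, origin_x, origin_y, pixel_size, ring):
    """Return values of pixels whose centres lie inside the ring.

    raster is a list of rows with row 0 at the northern edge; origin_x and
    origin_y give the north-west corner of the raster in map units.
    """
    kept = []
    for row_index, row in enumerate(raster):
        y = origin_y - (row_index + 0.5) * pixel_size
        for col_index, value in enumerate(row):
            x = origin_x + (col_index + 0.5) * pixel_size
            if point_in_polygon(x, y, ring):
                kept.append(value)
    return kept


# Invented 4 x 4 single-band raster with 1-unit pixels and an L-shaped polygon.
raster = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4), (0, 0)]

clipped = clip_pixels(raster, origin_x=0, origin_y=4, pixel_size=1, ring=l_shape)
```

Returning only the enclosed pixel values, rather than a padded rectangle, would avoid the blank pixel distortion that had to<br />
be removed from the histograms in the test section.<br />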
5 Literature Review<br />
The goal of this thesis is one that is in line with most work completed using<br />
remote sensing processing in that it is looking for traits in aerial imagery which<br />
can be used to derive useful information about the surface of the earth. One of the<br />
early studies of aerial image interpretation (Kittler’s ‘Image processing for remote<br />
sensing’ paper) described the process as “the interpretation of image segments that<br />
exhibit similar statistical properties” (J.Kittler, 1983). One addition that could be<br />
made to that definition is that, for the majority of studies in this field the<br />
interpretation should be automatic, or as close to automatic as to make<br />
interpretation of a large volume of data viable.<br />
There is a vast body of work available which documents various methods for<br />
interpreting aerial and satellite imagery. In general terms there is always a focus<br />
for each study, e.g. identifying coffee plantations in Costa Rica (Cordero-Sancho<br />
and Sader, 2007) and this influences the methodology. One result of this is that<br />
there is a large variety of methods employed. This literature review considers a<br />
representative sample of these in terms of their focus, in other words treating work<br />
that uses patterns or shapes as one category, spectral deviation for agricultural/<br />
forestry purposes as another and urban analysis as a further category. In terms of<br />
previous studies, the ones that are closest to what this thesis is attempting are the<br />
body of work that has been completed on what has been termed ISAs<br />
(impermeable surface areas, or, more usefully, hard ground). The focus of these<br />
works is to identify the percentage of hard ground within urban areas which can<br />
then be used in modeling flood events. During the early part of the study I made<br />
use of the SWAP technique recommended by T.Knudsen in his 2005 analysis<br />
of color in aerial imagery to identify grey areas in the test imagery (Figure 5:<br />
Water Area Image Modification). In the context of the data being analyzed by this<br />
thesis (Irish peri-urban land parcels) these grey areas within the image can be<br />
made to correspond to hard ground.<br />
Before considering the body of work underlying this thesis it is probably useful to<br />
answer two questions which the reader might ask: is this not just<br />
photogrammetry? And why not use an established algorithm such as the<br />
vegetation-impervious surface-soil sub pixel analysis techniques published in the<br />
2000 issue of Remote Sensing (Phinn et al)?<br />
In answer to these questions this review will not be considering a history of<br />
photogrammetry other than a general outline of established (traditional)<br />
processing techniques. It will also not be describing some of the segmentation,<br />
target area identification pre-processing methods used in the various studies. This<br />
work is often a major component of this type of analysis. The answer to the first<br />
question is that in general terms this study is photogrammetry but takes as a<br />
starting point controlled photography and captured polygon data so to consider the<br />
body of work underlying these techniques falls outside the scope of what this<br />
thesis is attempting.<br />
The answer to the second question is that this study differs from previous<br />
techniques in that it pre-supposes a large amount of information from the data<br />
(features of the built environment, feature coding, water parcels, forestry parcels,<br />
roads by category, footpaths and buildings by category) so feature capture is not<br />
part of the study. A possible addition to the study would be a consideration of<br />
feature capture using pattern analysis. In particular the identification of out<br />
buildings adjoining existing dwellings would be useful. However, this is outside<br />
the scope of this study. It can be assumed from the outset most of the major<br />
physical features present in the built environment are present in the data in vector<br />
format. This narrows the application of the technique to areas that are covered by<br />
large scale mapping but results in an automatic method for adding data to this<br />
mapping. One possible application is calculating the percentage of hard ground in<br />
a region of interest.<br />
In general terms the study can be seen as specific to urban areas which have been<br />
digitally mapped at a large scale (1:2500 or 1:1000 scales). Arbitrarily segmenting<br />
an image is a technique that has been used in previous studies (Kettig &<br />
Landgrebe, 1976) but this study differs in that the segments are specific small area<br />
polygons corresponding to property divisions and physical features. Results<br />
shown give a more detailed picture of these areas and in this way eliminate some<br />
of the problems encountered in previous studies.<br />
There are two broad areas within the body of work on processing of remotely<br />
captured spatial data which will be considered in the remainder of this review;<br />
these can be considered spectral analysis methods, and the methods associated<br />
with identifying polygons and the deviation of data from polygon patterns.<br />
Before discussing these it might be useful to briefly run through the data capture<br />
process as it stands in a traditional mapping environment (such as in OSI, the<br />
former OSNI and OSGB). This process has been neatly explained by Bingcai<br />
Zhang and Neal Olander in their paper to the 2000 ESRI user conference. They set<br />
out the process as the captured imagery being manipulated by a user in a software<br />
package (e.g. SOCET SET) to produce a shape file of vector data from the<br />
original hardcopy imagery. In this study the process of creating the data has<br />
already been completed (along with the prior image control as described by Zhang<br />
and Olander), so the thesis can be considered to be a method of re-visiting the<br />
imagery to add value to the captured vectors/ features. It should be noted that the<br />
process developed by this thesis is not dependent on expensive packages (such as<br />
SOCET SET) and could be applied to lower budget environmental monitoring<br />
systems. Using low cost photogrammetric packages such as ShapeCapture there<br />
would be a trade off in terms of consistency and accuracy (Aguilar et al, 2005).<br />
Josef Kittler’s 1983 Philosophical Transactions paper, as cited above, provides a useful<br />
introduction to the subject matter of this thesis. It was written as an attempt to<br />
summarize the various attempts that had been made towards automating the<br />
analysis of aerial imagery at that time. The text is useful not just as a background<br />
to the historical development of the field but also as an outline of the processes<br />
involved (with respect to multispectral image segmentation). The author looked at<br />
a wide number of previous studies (27 are cited) and summarized the work into a<br />
series of categories. The technology available to process imagery has evolved<br />
massively over the intervening quarter century but the basic techniques (in terms<br />
of pixel analysis) and motivations (in terms of data required) remain similar today.<br />
I have chosen this paper for the review as in some ways it sets the context for the<br />
work being undertaken, that is the “interpretation of image segments that exhibit<br />
similar statistical properties” (Kittler, P.323). Kittler divides image processing into<br />
six sections; (1) the sensor and (2) data collection, (4) image preprocessing, (5)<br />
segmentation and classification and (6) image interpretation. The work undertaken<br />
in this thesis relates to part five of this system, although it will differ from the<br />
work considered by Kittler’s paper in that some image interpretation will have<br />
taken place beforehand.<br />
Kittler has identified the first part (of the segmentation and classification step<br />
described above) as analyzing remotely sensed data in order to “identify<br />
homogeneous segments in the image” (Kittler, P.324). He introduced a method for<br />
pixel-by-pixel classification to achieve this. For this method he suggests<br />
identifying and classifying all the pixels on a pixel by pixel basis and then linking<br />
identical pixels to form connected segments. In many ways this is probably the<br />
holy grail of image interpretation as, if perfected, a machine could automatically<br />
identify change and update maps. Kittler also proposes that segments of pixels<br />
exhibiting similar properties could be used for this purpose. This thesis does not<br />
suppose an identical method; instead it attempts to identify the proportion of<br />
pixels corresponding to hard ground in a small area polygon, subtract buildings,<br />
roads and water polygons and attach a value for impermeable surface to the area<br />
data. Kittler suggests that a Bayesian probability formula can be used to determine<br />
the class of a segment. He suggests using what he terms “ground truth data”<br />
(Kittler, P.325) for the probability function to assign pixels to a class and<br />
determine the composition of a segment. He cites the class homogeneity of land<br />
surface covers as the means to initially segment and classify the pixels. This work<br />
can be made difficult by weather conditions and instrumental scanning errors<br />
(Note: Kittler’s paper is from a time when GPS controlled measurements were not<br />
readily available and such errors are less of a factor today, though small<br />
inconsistencies can occur, particularly at high altitude).<br />
Kittler further suggests a method for partitioning the image into segments (initially<br />
into cells of 2*2 pixels as suggested by Kettig & Landgrebe, 1976). In this way<br />
he estimates that the larger size of land cover would allow an analysis to identify<br />
neighboring pixels with similar properties. He also suggests another method for<br />
identifying segments of the image which he terms “two dimensional spatial<br />
dependencies” (Kittler, P.330). This is the use of the four neighbours of a given<br />
pixel to identify the probability of them being part of the same group.<br />
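The general idea of linking identically classified pixels through their four neighbours can be sketched as a labelling pass<br />
over a grid of per-pixel classes. This Python sketch shows plain four-neighbour grouping only; it is not Kittler's published<br />
formulation (which works with probabilities rather than exact matches), and the function name and sample grid are hypothetical.<br />

```python
from collections import deque


def label_segments(classes):
    """Group identically classified pixels into four-connected segments.

    Returns a grid of segment labels and the number of segments found.
    """
    rows, cols = len(classes), len(classes[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                # Visit the north, south, west and east neighbours only.
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (
                        0 <= nr < rows
                        and 0 <= nc < cols
                        and labels[nr][nc] is None
                        and classes[nr][nc] == classes[cr][cc]
                    ):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels, next_label


# Invented per-pixel classification for a 3 x 3 image.
grid = [
    ["bog", "bog", "water"],
    ["bog", "marsh", "water"],
    ["marsh", "marsh", "bog"],
]

segment_labels, segment_count = label_segments(grid)
```

The bog pixel in the bottom right corner forms its own segment because it touches the other bog pixels only diagonally,<br />
which is the practical consequence of using four rather than eight neighbours.<br />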
In terms of this thesis Kittler’s paper suggests that an analysis of aerial imagery<br />
could potentially reveal large amounts of data about selected features (particularly<br />
if the segmentation has been completed already by vector mapping). Kittler’s<br />
paper does suggest that it is possible to identify homogenous areas on a pixel by<br />
pixel basis, and this is the focus of this thesis.<br />
5.1 Spectral and image considerations for the thesis<br />
This body of work forms the basis for what will be the main argument of this<br />
thesis, that it is possible to automatically capture spatial data relating to<br />
impervious ground in Irish towns, using controlled photography and matching<br />
vector data. This requires processing which makes use of the spectral data<br />
contained in aerial imagery of the sample data, which in turn presents a number of<br />
separate problems. One of these is bidirectional reflectance, and while this is not<br />
expected to be a major consideration while developing the algorithm for the thesis<br />
it nevertheless warrants consideration. One method of calculating for this is to<br />
adjust the imagery based on either ground sampled data or imagery from a higher<br />
(possibly satellite) vantage, and is something that was considered by Sakari<br />
Tuominen and Anssi Pekkarinen in their 2004 study of forestry in southern<br />
Finland. The authors consider a method of improving the value of data being<br />
retrieved from aerial photography (in conjunction with satellite data) by reducing<br />
the presence of bidirectional reflectance. This is a problem with the way light is<br />
hitting the surface of the earth causing the spectral values of image pixels to<br />
depend on their location in the image.<br />
One approach would be to focus any study on the centre of the image, where<br />
bidirectional reflectance would not be as big an issue. However, the study was<br />
attempting to find a more effective method of correcting this using overlaying<br />
satellite images and a correcting algorithm. The reason they chose satellite<br />
imagery was that they are less affected by bidirectional reflectance and this<br />
benchmark would allow the authors to conduct local adjustment for the pixel<br />
values. The study covered 4500 hectares of boreal forest located in the<br />
municipality of Kuru in the south of Finland.<br />
The core of the study is a local radiometric correction method for reducing the<br />
effect of bidirectional reflectance. This problem was not a major issue in the thesis<br />
but the methods used by the authors (finding a larger scale benchmark image to<br />
reference study areas against) was considered. At the heart of the Finnish study the<br />
problem was of similar objects possessing different spectral characteristics in<br />
different parts of the image. This was a problem which was less relevant to the<br />
focus of this study (the authors are focused on forestry data). The authors<br />
conclude, not surprisingly, that the value of remote sensing is dependent on<br />
what is visible and what can be registered by the airborne sensor.<br />
As mentioned in the introduction to this chapter, the body of work which<br />
examines automatic capture of hard ground within urban areas is of particular<br />
relevance to this thesis. The 2007 study by Yuyu Zhou and Yu Wang of urban<br />
examples in Rhode Island is a good illustration of the type of factors that need to<br />
be considered in this type of survey. The study, which used true-colour digital<br />
orthophotographic data with a 1m spatial resolution (forming a controlled dataset<br />
in .tiff format with red, green and blue spectral bands present) segmented the<br />
imagery according to urban districts. The authors note that “successful image<br />
segmentation is the most important prerequisite in object-oriented classification”<br />
(P.644). It is hoped that by using previously captured and verified vector data this<br />
thesis will have met this prerequisite.<br />
The algorithm which was employed for this survey was broken down into four<br />
parts: segmentation, compensation for shadow effect, analysis of variance<br />
classification and post-classification of the data. There is a large body of work that<br />
has been completed using automatic interpretation of aerial imagery; the focus of<br />
this work is usually towards a specific purpose, such as the 2007 analysis of coffee<br />
crops in Costa Rica outlined by S. Cordero-Sancho and S. Adler. The study is a<br />
useful example of some of the problems that can be encountered when attempting<br />
automatic image analysis. In the study the authors consider the problem of<br />
identifying coffee plantations from remotely sensed data of tropical forestry. The<br />
main focus of the study was to identify a means of separating the areas of coffee<br />
from similar data (in terms of wavelength and spectral values) representing<br />
tropical forest. This problem is made even more difficult by the fact that the<br />
coffee plantations are often set under forestry (due to the shelter the cover<br />
provides for the crop). In addition the terrain is often mountainous, so the authors<br />
had the additional problems of variation in elevation, associated mist and cloud<br />
cover, and shade to overcome. While these problems were not a<br />
large factor in the low flown aerial photography of Irish suburban landscape used<br />
in this thesis, the methods the authors employ to deal with cloud and haze are<br />
relevant. It was advantageous to the thesis to be able to reduce the impact of any<br />
areas of shading that existed.<br />
The authors took rectified Landsat imagery of a large tract of land in central Costa<br />
Rica, the Central Valley surrounding San Jose. This imagery had been captured<br />
during the rainy season. They broke the study down into a series of steps, starting<br />
with classifying three different waveband combinations in the imagery. They then<br />
developed what they termed a “Coffee Environmental Stratification Model”<br />
(Cordero-Sancho & Ader, P.1581) before comparing this with supervised results<br />
(from known data). They then set out to identify which waveband combination<br />
best matched coffee crops.<br />
The identification of a control section of water which the authors used to reduce<br />
haze in their imagery helped with the development of the image key for this thesis.<br />
The method the authors used in the study was to find an area of deep water to<br />
identify a minimum reflectance value and subtract this from each of the<br />
non-thermal bands. The next step the authors took was to remove clouds from the<br />
imagery; they did this by creating a binary mask using a classification of arbitrary<br />
clusters they had developed. This allowed them to recode areas which the mask identified<br />
as being contaminated by cloud or shadow. When this was applied the clusters<br />
were recoded to a zero value; they also digitized “isolated” (Cordero-Sancho &<br />
Ader, P.1582) clusters on a case by case basis. It is very beneficial to smooth areas<br />
of shade within a polygon. This might be done by estimating a percentage of<br />
shade that should be present in the polygon (based on time of day and vector code<br />
making up the boundary, e.g. fence/ forestry/ building/ water). By identifying a<br />
value that this shade should fall under it is possible to derive a corresponding<br />
relationship in the histogram and adjust the results of the study accordingly.<br />
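The haze and cloud steps described above can be illustrated with a minimal Python sketch.<br />
It assumes a single band held as a list of rows, a hand-picked list of deep water pixel<br />
positions and a ready-made binary mask; the function names and sample values are<br />
hypothetical and are not taken from the Costa Rican study.<br />

```python
def subtract_dark_object(band, water_pixels):
    """Subtract the minimum value found over deep water from every pixel.

    band: list of rows of pixel values for one non-thermal band.
    water_pixels: (row, col) positions assumed to lie over deep water.
    """
    dark_value = min(band[r][c] for r, c in water_pixels)
    # Clamp at zero so no pixel becomes negative after the subtraction.
    return [[max(value - dark_value, 0) for value in row] for row in band]


def apply_mask(band, mask):
    """Recode pixels flagged in a binary mask (1 = cloud or shadow) to zero."""
    return [[0 if flagged else value for value, flagged in zip(row, mask_row)]
            for row, mask_row in zip(band, mask)]


# Illustrative values only: the left column stands in for deep water and
# the bright centre pixel stands in for cloud.
band = [[12, 40, 55],
        [10, 200, 60],
        [11, 45, 58]]
water = [(0, 0), (1, 0), (2, 0)]
cloud_mask = [[0, 0, 0],
              [0, 1, 0],
              [0, 0, 0]]

corrected = apply_mask(subtract_dark_object(band, water), cloud_mask)
print(corrected)
```

The same two functions would be applied to each band in turn; how the mask itself is<br />
produced (the cluster classification the authors describe) is outside this sketch.<br />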
The final result of the authors’ work in Costa Rica was only “moderately<br />
successful” (Cordero-Sancho & Ader, P.1589). This would seem to have been<br />
largely due to the altitude at which the imagery they used was captured; by their<br />
own admission the results would probably be better had the imagery been low<br />
flown or of better resolution. The methods the authors employed in the study are<br />
useful examples of how inconsistencies in results obtained during the process of<br />
completing this thesis might be countered; in particular, potential methods for<br />
reducing the effect of shade on the results could be applied.<br />
In this thesis, as with most automatic aerial imagery analysis, the classification of<br />
target areas in the photo (usually according to spectral values) is a vital part of the<br />
process. One solution is to develop a key to differentiate between features. The<br />
level of detail that can be obtained can be quite precise, but is dependent on the<br />
resolution of the imagery and the complexity in the patterns of distribution of the<br />
target. This is illustrated by Megan Lewis’s 1998 study of vegetation communities<br />
in an area of western New South Wales. In this study she attempted to develop a<br />
key to differentiate between vegetation types. The problem she was attempting to<br />
counter was that of identifying particular species. She noted that existing aerial<br />
analysis could detect the presence of vegetation and in an attempt to improve this<br />
process she divided these species into colour bands. She took sample plots of<br />
250 sq m corresponding to 8*8 blocks of pixels in a relatively homogeneous area<br />
and calibrated the relationship between field verified data and fifty of these blocks<br />
(using them as training areas for the study). The study used 12 colour bands which<br />
were allocated into nine classes and identified a link between these and vegetation<br />
classes. In the conclusion to the study the author noted that it was possible to<br />
portray sub-polygon variation using pixel-based imagery.<br />
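As a rough illustration of working with fixed blocks of pixels as training areas, the<br />
Python sketch below averages one band over non-overlapping square blocks. The block size<br />
of 8 follows the description above; the function name and the sample band are<br />
hypothetical, and the calibration against field data is not shown.<br />

```python
def block_means(band, size=8):
    """Return the mean value of each non-overlapping size*size block.

    band: list of rows of pixel values. Partial blocks at the right and
    bottom edges are ignored.
    """
    rows, cols = len(band), len(band[0])
    means = []
    for r0 in range(0, rows - size + 1, size):
        row_means = []
        for c0 in range(0, cols - size + 1, size):
            total = sum(band[r][c]
                        for r in range(r0, r0 + size)
                        for c in range(c0, c0 + size))
            row_means.append(total / (size * size))
        means.append(row_means)
    return means


# Small 4*4 band split into 2*2 blocks, purely for illustration.
sample = [[1, 1, 3, 3],
          [1, 1, 3, 3],
          [5, 5, 7, 7],
          [5, 5, 7, 7]]
print(block_means(sample, size=2))
```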
In order to complete this study it was necessary to make the best use of the<br />
available imagery. This imagery can benefit from preprocessing in order to<br />
highlight the areas being captured. One method for achieving this might be to<br />
apply an algorithm to colour the data so that the target areas are easily captured.<br />
This is something which was identified by Thomas Knudsen in his 2005 study of<br />
pseudo natural colour aerial imagery for urban and suburban mapping (and in<br />
previous studies by the same author). In his study he suggests an algorithm for<br />
automatic urban and suburban aerial image interpretation. The paper uses test data<br />
from (pseudo) natural color images used in traditional photogrammetry (as<br />
opposed to airborne four channel imagers). His aim is to discriminate between<br />
vegetation and human made materials, which was also one of the aims of this<br />
thesis. The author cites the relative importance of separating vegetation (which he<br />
considers to be void of mapping objects) and human-made materials in respect to<br />
automated photogrammetric mapping. It is worth noting at this point that imagery<br />
captured in the near-infrared band is generally indicative of vegetation and<br />
Knudsen’s work is an attempt to identify this band using only aerial photography<br />
captured using red, green, blue three channel instruments. His work is very<br />
relevant to this thesis as over the course of his study he identifies a method of<br />
obtaining “excellent” (Knudsen, P.2691) reproduction of grey surfaces (which in<br />
an urban area correspond to paving and exposed rock).<br />
The author takes a look at three algorithms in terms of their effectiveness in<br />
discriminating between areas in scanned aerial photographs. The first is a pseudo<br />
natural color algorithm developed by the author in a previous study (Knudsen<br />
2002) where he managed to create a blue channel based on green, red and near<br />
infrared values and left the green and red values as captured. This allowed for<br />
good reproduction of red surfaces (which corresponded to roof surfaces in the<br />
Danish sample data) but suffers slightly from haze effect.<br />
The second algorithm the author considers is one which creates a blue channel<br />
between green and near infrared and a green channel from similar values and<br />
leaves the red as captured. This, similar to the first algorithm, gave good<br />
reproduction of red surfaces and of vegetation, but failed in reproducing clear grey<br />
surfaces. The third algorithm the author considers involved swapping the green<br />
data for blue, the near infrared for green and leaving the red as was. This allowed<br />
him to reproduce grey surfaces accurately (making them stand out in the<br />
photography) but was less useful for vegetation, leaving an “artificial looking<br />
hue” (Knudsen, P.2691).<br />
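The third algorithm amounts to a channel reassignment, which can be sketched in a few<br />
lines of Python. The sketch assumes each captured pixel is a (green, red, near infrared)<br />
tuple and that the output is a (red, green, blue) display tuple; this ordering and the<br />
function names are assumptions made for illustration, not Knudsen’s implementation.<br />

```python
def swap_channels(pixel):
    """Map a captured (green, red, nir) pixel to a display (red, green, blue) pixel.

    Captured green is shown as blue, captured near infrared is shown as
    green, and red is left as captured.
    """
    green, red, nir = pixel
    return (red, nir, green)


def false_colour(image):
    """Apply the channel swap to every pixel of an image (list of rows)."""
    return [[swap_channels(pixel) for pixel in row] for row in image]


# One grey-ish pixel and one vegetation-like pixel (high near infrared).
image = [[(100, 100, 100), (60, 40, 200)]]
print(false_colour(image))
```

A pixel with equal values in all three captured channels stays grey after the swap, which<br />
is consistent with the accurate grey reproduction noted above, while a high near infrared<br />
value is pushed into the green display channel.<br />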
The final part of the paper sets out a method of modifying the first algorithm to<br />
improve its value in creating data for interpretation. The steps are to restore<br />
black/grey/white, vegetation-covered and red/yellow-reddish areas lost in<br />
preprocessing, to re-whiten very bright objects and to amplify the pixels (enhance<br />
the colour saturation). The author provides reference to a more detailed technical<br />
implementation (using information from previous papers he published) but these<br />
are less relevant to this thesis as the result would not be suitable for identifying<br />
hard ground.<br />
One area where there is considerable information to be gained is in the area of<br />
forestry, particularly in capturing the spread of disease or invasive species in a<br />
plantation. One such study was undertaken by M.Martin, S.Newman, J.Aber and<br />
R.Congalton in 1998. I have included it here as I think it is a good example of<br />
what appears to be a standard remotely sensed image analysis. In this study the<br />
authors set out to obtain remote data relating to tree species in an area called<br />
Prospect Hill in central Massachusetts. Their target data was species identified by<br />
11 forest cover types. To do this they used a maximum likelihood algorithm<br />
assigning all pixels in the aerial image to one of the 11 categories they were<br />
searching. The survey was validated using field data (taken from a database of<br />
species type). They note that at that time (late 1990s) spectral data had already<br />
been used to identify categories of forest cover. These prior surveys had been<br />
successful in discriminating between coniferous and deciduous cover (the authors<br />
cite the examples of Nelson et al., 1985, Shen et al., 1985 and Landthrop et al.,<br />
1994). The primary goal was identification of species composition from the forest<br />
canopy and in this the authors had reasonable success: a random selection of<br />
pixels yielded an overall classification accuracy of 75% (Martin et al 1998). The<br />
study used photographic tiles of 10*10km with a high spectral resolution. The<br />
authors suggest further improvements in accuracy could be made by identifying<br />
the (deciduous) species with both their leaves on and off which would allow for a<br />
foliar biomass calculation to be made.<br />
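A maximum likelihood classifier of the general kind described above can be sketched in<br />
Python as follows. The sketch assumes each class is modelled as a Gaussian with<br />
independent bands (a simplification of the full covariance form); the class names,<br />
training values and function names are hypothetical and are not taken from Martin et al.<br />

```python
import math


def train(samples):
    """Estimate per-band mean and variance for each class.

    samples: {class_name: [pixel_vector, ...]} of training pixels.
    Returns {class_name: (means, variances)}.
    """
    model = {}
    for name, pixels in samples.items():
        n = len(pixels)
        bands = list(zip(*pixels))
        means = [sum(band) / n for band in bands]
        # A small floor avoids division by zero for constant bands.
        variances = [max(sum((v - m) ** 2 for v in band) / n, 1e-6)
                     for band, m in zip(bands, means)]
        model[name] = (means, variances)
    return model


def log_likelihood(pixel, means, variances):
    """Gaussian log likelihood of a pixel, treating bands as independent."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
               for v, m, var in zip(pixel, means, variances))


def classify(pixel, model):
    """Assign the pixel to the class with the highest log likelihood."""
    return max(model, key=lambda name: log_likelihood(pixel, *model[name]))


# Two-band training pixels for two illustrative cover types.
training = {
    "conifer": [(30, 80), (32, 82), (28, 78)],
    "deciduous": [(60, 120), (62, 118), (58, 122)],
}
model = train(training)
print(classify((31, 81), model), classify((59, 119), model))
```

Every pixel is assigned to exactly one of the trained classes, which matches the<br />
description of all pixels being allocated to one of the target categories.<br />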
In 2002 S. Phinn, M. Stanford, P. Scarth, A. Murray and P. Shyy attempted to<br />
apply a similar technique to that used by Megan Lewis in her 1998 study of<br />
vegetation communities, but with reference to vegetation and impervious hard<br />
ground in urban areas. This study has similar goals to this thesis, but was<br />
undertaken in an Australian urban context, and does not make use of existing<br />
vector mapping and polygons. In the study by Phinn et al the authors conducted a<br />
survey of the area surrounding the city of Brisbane, in an attempt to establish an<br />
image processing method which would yield information about the urban<br />
environment. In particular they focused on vegetation, impervious surface and soil, or<br />
hard ground. They identified 60 spectral zones which they aggregated based on<br />
their location; they were able to identify distinctive zones of hard ground based on<br />
their per-pixel classification. The data used was from the Landsat 5 Thematic<br />
Mapper and 1:5000 scale aerial photography. Their aims were to delineate land<br />
cover and use types and to identify areas of impervious and pervious surfaces. One of<br />
the most difficult aspects they encountered was classifying non-vegetation areas<br />
into exposed soil and developed surfaces. They extracted the information along<br />
transects radiating from Brisbane’s city centre. The study noted that water bodies<br />
and vegetation were “separate in all spectral bands” (Phinn et al, 2002). The<br />
maximum separation along the cleared hard ground was along bands 3, 4 and 7.<br />
The authors concluded that the increased resolution enabled more detailed<br />
assessment of the surface.<br />
One recurring theme within these studies of spectral analysis of aerial<br />
photography (and satellite imagery) was that the success of the study is dependent<br />
on three main factors: the resolution of the photography, the correct segmentation<br />
of the target areas and an accurate knowledge of the colour bands which apply to<br />
the study. To a lesser extent it is also important to introduce methods to correct for<br />
haze, cloud cover and shade. These last three considerations are not the focus of<br />
this study; the fact that they warrant the complete focus of previously published<br />
papers indicates that it would not be possible to completely eliminate their<br />
presence. By applying some of the aspects of masking (Cordero-Sancho & Adler)<br />
and adjusting for shade (Tuominen & Pekkarinen) it might be possible to reduce<br />
the influence of these in the outcome to within an acceptable error margin for high<br />
flown and satellite imagery. This was not necessary in this thesis due to the clarity<br />
of the photography.<br />
5.2 Vector and polygon based studies of aerial photography<br />
I have labeled this body of knowledge of aerial (and satellite) image processing as<br />
vector and polygon based as the trend that unites the studies is the fact that the<br />
authors sought a pattern or shape based method for extracting the information. As<br />
with spectral (and hybrid spectral and spatial) methods, the underlying cause for<br />
the studies can vary, from understanding Alaskan watercourses (van der Werff &<br />
van der Meer, 2008) to examining the built environment in Moscow (Dudarev,<br />
2009). I did not make use of a particular algorithm or technique from these studies,<br />
but have considered them in this review for their potential in offering a method for<br />
subdividing small area polygons. It should be noted that there appears to be a<br />
point where pattern analysis is less beneficial, such as the 500-pixel minimum<br />
suggested by H. van der Werff and F. van der Meer.<br />
This thesis makes use of building polygons captured from the Irish peri-urban<br />
landscape. These served as indicators of the type of land use in the small<br />
area polygon surrounding them (and possibly as a spectral control in terms of the<br />
roof tile value). In terms of pattern analysis any automated identification of<br />
modifications or new buildings should recognize the polygon outline as a building.<br />
This is a particular problem in peri-urban areas: in a rural context newly built<br />
slatted sheds etc. will conform to a standard outline, but the shapes are more varied<br />
in urban areas. This is particularly so in the peri-urban Irish landscape, where one-off<br />
housing and a fashion for ugly looking (in terms of aerial analysis) extensions<br />
and sections of building jutting from a main structure mean that establishing a<br />
template pattern for dwellings would be difficult. Without considering other<br />
data (such as presence of tarred road etc.) an automated study relying on pattern<br />
identification would need a complicated signature algorithm. This was the basis<br />
for Roman Dudarev’s 2009 study of building polygon signature point definition. In<br />
this he was considering buildings in the context of the city of Moscow, but intended<br />
the algorithm he created for use in a wider variety of data sets. The study tied in<br />
with the author’s work for a software development company (Enterra) which is<br />
involved in developing GIS software, and the algorithm was also an attempt to find<br />
a solution to identifying building signatures within the software. The paper<br />
develops a workaround for identifying polygons which are consistent with<br />
buildings on a map. He describes this as an ordered point set. This could provide<br />
useful background information should spatial deviation pattern recognition ever<br />
become an option. It is not within the scope of the thesis to modify the algorithm<br />
but nevertheless Dudarev’s study provides a possible alternative to the spectral<br />
deviation method of identifying additional data.<br />
The author notes that building signatures in many cases will not look “neat and<br />
beautiful” (Dudarev, P.109). By this he is referring to the fact that the polygon<br />
will not conform to a standard shape which would be easily identified. It should be<br />
noted that the author acknowledges that a similar algorithm exists for the product<br />
of another software provider, ESRI (with ArcView software) and that the test of<br />
the algorithm was carried out on uncoordinated data, implying that any application<br />
of the algorithm in this thesis would involve a high level of modification for<br />
something that could be obtained from a desktop application. It is, however, useful<br />
to consider the methodology for breaking down the problem (the author outlined<br />
an implementation algorithm before considering the process steps required).<br />
Dudarev broke down the necessary implementation into five steps, starting with a<br />
search for a convex polygon inscribed within the shape being identified. He then<br />
suggested that if the polygon shape returned from this was bigger than the<br />
maximum (original shape) then the new polygon should be considered as the<br />
maximum, otherwise another convex polygon within the shape should be<br />
identified. This step is repeated until all the polygons have been searched at which<br />
point the centre of the polygon is searched and the result returned.<br />
Dudarev then stated the mathematical steps that would be required to implement<br />
the steps he outlined, namely: take a start point (on the polygon) for the search;<br />
take the neighbor vertex in a set rotation; if this point matches the starting point,<br />
return the resulting polygon; if the point addition results in a positive vertex then it<br />
is added to the polygon, otherwise the neighbor vertex is selected again. This<br />
algorithm is accompanied by sample C code which can be used to test it. I believe<br />
it could be used if a method of spectral analysis of aerial photography could be<br />
developed to return sharp enough edge detail to identify the component points of<br />
these polygons. This seems unlikely in relation to the imagery (and processing<br />
techniques) that are currently available and the algorithm is probably of more use<br />
in a situation where the vector detail had already been manually captured (in<br />
which case the appropriate building code should also be present).<br />
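One possible reading of the vertex walk described above is sketched below in Python. It<br />
is only an interpretation: it assumes vertices are listed counter-clockwise and treats a<br />
“positive vertex” as one that produces a left turn (positive cross product). The function<br />
names are hypothetical, the paper’s sample C code is not reproduced, and this greedy walk<br />
does not guarantee a convex or inscribed result for every input.<br />

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_walk(vertices, start=0):
    """Walk the polygon from a start vertex, keeping left-turn vertices only.

    vertices: list of (x, y) tuples in counter-clockwise order (assumed).
    A neighbour is added when it forms a positive (left) turn with the
    last two kept vertices; otherwise the next neighbour is tried. The
    walk stops when it arrives back at the start vertex.
    """
    n = len(vertices)
    kept = [vertices[start], vertices[(start + 1) % n]]
    for step in range(2, n + 1):
        index = (start + step) % n
        if index == start:
            break  # back at the starting point: return what was built
        candidate = vertices[index]
        if cross(kept[-2], kept[-1], candidate) > 0:
            kept.append(candidate)
    return kept


# L-shaped building outline, counter-clockwise, for illustration only.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(convex_walk(l_shape))
```

For the L-shaped outline the walk drops the two vertices of the jutting section and keeps<br />
a four-sided convex piece of the footprint.<br />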
Much of the other work involved in manipulation of polygons could be said to fall<br />
under the banner of graphic editing of GIS data (as could also be the case with<br />
Dudarev’s algorithm). This type of work (physically manipulating and extracting<br />
specific polygons in vector format) would be of particular significance if pattern<br />
analysis was being used in this study. If particular patterns could be identified then<br />
algorithms for clipping and determining intersections between polygons, such as<br />
the one developed by Kui Liu et al in 2007, would be a central part of the process.<br />
This would then mean the study would take polygon edges as the basis for<br />
captured data and perform calculations to construct an output polygon. As these<br />
polygons have already been manually captured, this body of work is slightly less<br />
relevant. That is not to say that it would not be of central value to an automated<br />
image processing technique should it become viable.<br />
In terms of studies this concept has been the focus of a lot of effort, such as Pal &<br />
Foody’s 2009 feature selection study, which showed that accuracy of classification<br />
declines with additional features when using support vector machines. The fact<br />
that an underlying verifiable automatic technique for identifying change in<br />
photography and converting it to accurate vector data has not been yielded from<br />
these studies indicates that it is probably something that will always be specific to<br />
the terrain being analyzed. This work is beyond the scope of this thesis, so the thesis<br />
focused on an aspect of polygon identification that could be applied to a more<br />
general spectral analysis. One previous attempt at this is H. van der Werff and F.<br />
van der Meer’s 2008 study into a shape based algorithm for identifying spectrally<br />
identical objects. In this study the authors took a look at the potential of shape<br />
signatures in aerial imagery in order to establish a means of identifying and<br />
classifying the object. They look at three broad methods for this: solely shape<br />
based analysis, solely spectral based analysis and a combined “spatial-spectral<br />
classification” (H. van der Werff & F. van der Meer, P.251). This study is<br />
slightly different from the one being undertaken in this thesis in that the shape and<br />
classification of much of the data will already be known; however, the study is<br />
useful to this thesis in that it suggests the potential for a method of identifying new<br />
farm buildings based on a similar classification. The authors are seeking a method<br />
to enhance pixel-based spectral classifications (as will be used in this thesis) by<br />
adding spatial information. It is worth noting that the results of the study were not<br />
satisfactory in terms of automatically correctly identifying features.<br />
The first step the authors used to determine the shape of the areas being examined<br />
was to “seed” (H. van der Werff & F. van der Meer, P.252) the object. This<br />
involved beginning an object with a single pixel of a set value and increasing the<br />
size of the area until a spectral variance in a non-overlapping 3*3 pixel block occurs.<br />
This part of the study continued until all the image pixels were segmented into<br />
objects. The authors noted that size was a factor at this point and objects of 500<br />
pixels or more were more successfully determined. The study itself was looking at<br />
parts of Alaska, and the objects being classified were water bodies; i.e. separating<br />
streams from ox-bow lakes, thaw waters from rivers and sediment rich water. This<br />
is a difficult task due to the relatively random nature of these shapes when compared<br />
to a well defined linear pattern that can be observed in the Irish landscape. The<br />
authors conclude (in the case of water bodies) “(that) an object should consist of<br />
approximately 500 pixels at minimum to be able to use the absolute value of shape<br />
measurements” (H. van der Werff & F. van der Meer, P.257).<br />
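The seeding step can be illustrated with a simple region growing sketch in Python. It<br />
assumes a single band, four-connected neighbours and a fixed tolerance around the seed<br />
value as the stopping rule, which is a simplification of the 3*3 variance test described<br />
above; the function name, the tolerance and the sample band are hypothetical. The<br />
500-pixel figure is the minimum quoted from the study.<br />

```python
from collections import deque

MIN_SHAPE_PIXELS = 500  # minimum object size quoted from the study


def grow_region(band, seed, tolerance):
    """Grow a four-connected region of pixels close in value to the seed.

    band: list of rows of pixel values.
    seed: (row, col) of the starting pixel.
    tolerance: largest allowed absolute difference from the seed value.
    Returns the set of (row, col) positions in the region.
    """
    rows, cols = len(band), len(band[0])
    seed_value = band[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(band[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Dark object in the top left corner of a brighter background.
band = [[10, 11, 50],
        [10, 12, 52],
        [49, 51, 50]]
region = grow_region(band, (0, 0), tolerance=5)
print(sorted(region), len(region) >= MIN_SHAPE_PIXELS)
```

Repeating the call from a new seed in any pixel not yet assigned would segment the whole<br />
image into objects; objects smaller than the minimum would then be excluded from shape<br />
measurement.<br />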
The authors created a combined analysis method by classifying shapes according<br />
to three spectral bands from the imagery being used and comparing the results<br />
against the pixel based shape measurements. Using these results they were unable<br />
to distinguish between the water bodies being considered by the study and the<br />
authors suggest that further research is required to better combine the two (shape<br />
and spectral) classifications. In some ways this thesis is a continuation of this, in<br />
that it will be using a spectral analysis in combination with shape signatures (in<br />
the form of previously captured and coded vector data). The aim the authors had<br />
was to establish a means of measurement using an “unbiased software<br />
algorithm” (H. van der Werff & F. van der Meer, P.257); this would seem difficult<br />
to achieve in the case of a relatively chaotic Alaskan wilderness but might be<br />
better applied to peri-urban land parcels.<br />
Another example of a study which combines a number of different aspects of<br />
remote sensing to analyze aerial data is the 2007 random field model for urban<br />
area detection developed by Ping Zhong and Runsheng Wang. In this study the authors<br />
presented a method for interpreting remote images of urban environments that<br />
makes use of what they call “conditional random fields” (Zhong and Wang, 2007).<br />
The study is a response to the fact that although considerable research has been<br />
completed on land cover analysis, the algorithms generally adapt for only a<br />
narrow range of image resolutions and therefore only a few types of urban areas.<br />
They see previous attempts at urban analysis as being based on either gray-level-<br />
based spectral analysis or using texture descriptors. They further note that edge<br />
strength measures can be used to extract homogeneous regions. This is an<br />
interesting concept, and may have an application in the automatic capture of large<br />
utility features in rural areas, such as silage pits.<br />
The authors establish a discriminative method for identifying regions in the<br />
photography based on interactions with the neighboring regions. This allows them<br />
to utilize the conditional random fields in terms of context to identify areas. The<br />
authors broke this technique into the jobs of configuring the features, selecting<br />
classifications and classifier fusion. The proposed algorithm compares the fields<br />
against the data segments and places them in a classified segmented model; the<br />
authors compare their results against two previous algorithms, Stacked Feature<br />
Based (where a number of different feature types are concatenated into one model)<br />
and Straight Line Statistics (where areas of high incidence are used to identify<br />
urban areas). They observed a higher output rate against the first method (based<br />
on time on a 2.4 GHz Pentium machine) and decreased accuracy in detecting<br />
smaller rural areas against the second (where straight line statistics were not<br />
effective against urban areas smaller than 400*400 pixels). The method the authors<br />
use, of allowing each component part of the search to train based on “its own<br />
aspects” (Zhong & Wang, 2007) appeared to give positive results against the 60<br />
training and 91 test images used, and was able to successfully identify blocks of<br />
16*16 pixels as urban or nonurban. The results of the study gave 85.3% accuracy<br />
in terms of correctly identifying blocks as urban (Zhong & Wang, P.3986). The<br />
overall methodology is probably best suited to a larger study area, however it may<br />
be possible to apply the multiple conditional random fields model to a smaller<br />
scale with success.<br />
A further example of similar methodology being applied to aerial data on a large<br />
scale is the 2009 study of the Guangzhou urban area by Fenglei Fan, Yunpeng<br />
Wang, Maohui Qiu and Zhishi Wang. Although similar results to what the authors<br />
achieved in their much larger study would be an effective failure of this thesis, the<br />
study indicates that it is possible to determine a lot through automatic image<br />
analysis, even with the disadvantage of poor imagery, random settlement patterns<br />
and a large test area. In the study the authors set out to examine urban growth as<br />
experienced by the people of Guangzhou (a city of 7.5 million inhabitants in the<br />
southern Chinese Guangdong province). They were limited by available imagery:<br />
their study attempted to extract urban areas from a series of images dating back to<br />
the 1970s and some cycles were not available. They determined that fractal<br />
geometry was useful in studying the development of the city and that a “fractal<br />
dimension index is an effective index to evaluate urban form” (Fan et al, 2009).<br />
The study area covered 3,178 sq km and took five separate years as sample points in<br />
time to identify a pattern in the city’s development.<br />
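To give a sense of how a fractal dimension can be estimated from a classified urban<br />
mask, the Python sketch below uses box counting. This is one common estimator and is an<br />
assumption here: the description above does not state which fractal index Fan et al<br />
computed. The function names, box sizes and sample grids are hypothetical.<br />

```python
import math


def box_count(grid, size):
    """Count the size*size boxes that contain at least one urban cell."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(grid[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count


def fractal_dimension(grid, sizes=(1, 2, 4)):
    """Least squares slope of log(box count) against log(1 / box size).

    The grid must contain at least one urban cell, otherwise the
    logarithm of a zero count is undefined.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(grid, s)) for s in sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# A fully built-up 8*8 block and a single built-up strip, for illustration.
filled = [[1] * 8 for _ in range(8)]
strip = [[1] * 8] + [[0] * 8 for _ in range(7)]
print(round(fractal_dimension(filled), 3), round(fractal_dimension(strip), 3))
```

A compact, fully built-up block gives a value near 2 and a thin linear strip gives a value<br />
near 1; irregular urban form would fall between the two.<br />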
The data capture was completed using a maximum likelihood algorithm<br />
performed on the images. The algorithm used seven categories to classify the<br />
imagery: the target urban settlement, forestry, cropland, orchard, natural<br />
water, artificial water and bare land (vegetation free surface area outside the urban<br />
settlement). In order to verify the accuracy of this classification the authors took<br />
reference data captured from fieldwork and separate land use mapping and<br />
sampled the results of their study against each category in the reference. They<br />
achieved an accuracy in correct classification of over 80% using this method. The<br />
study completed segmentation of the imagery by using two transects, running<br />
from west to east, comprising nine blocks of 1306130pixels and south-west to<br />
north-east, comprising ten blocks of the same quantity of pixels.<br />
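The verification step described above, where classified samples are compared with<br />
reference data for each category, can be sketched in Python as follows. The category<br />
labels follow the list above; the function names and the sample values are hypothetical<br />
and do not reproduce the authors’ figures.<br />

```python
def overall_accuracy(reference, predicted):
    """Fraction of samples whose predicted class matches the reference."""
    matches = sum(1 for ref, pred in zip(reference, predicted) if ref == pred)
    return matches / len(reference)


def per_class_accuracy(reference, predicted):
    """For each reference class, the fraction of its samples classified correctly."""
    totals, hits = {}, {}
    for ref, pred in zip(reference, predicted):
        totals[ref] = totals.get(ref, 0) + 1
        if ref == pred:
            hits[ref] = hits.get(ref, 0) + 1
    return {name: hits.get(name, 0) / totals[name] for name in totals}


# Five illustrative sample points: reference labels against classified labels.
reference = ["urban", "urban", "forestry", "cropland", "urban"]
predicted = ["urban", "forestry", "forestry", "cropland", "urban"]
print(overall_accuracy(reference, predicted))
print(per_class_accuracy(reference, predicted))
```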
The Guangzhou study is useful in proving that a relatively high level of accuracy<br />
can be obtained when automatically capturing urban data over a large scale. This<br />
thesis improved on that accuracy by using a smaller area, higher<br />
resolution imagery and additional indicators (vector and code data imported from<br />
large scale mapping).<br />
This thesis is fortunate not to have the variety in landscape patterns that previous<br />
studies have had to contend with, which meant that a high accuracy level was<br />
possible. In much of the available literature the studies are completed on a very<br />
large scale (as in the previous two papers) with very specific data in mind. They<br />
attempt to identify particular plant species or types of urban development. The<br />
methodology being used for this study may benefit from applying some of those<br />
techniques to a more stable sample. There are several advantages present in the<br />
area being targeted. The temperate nature of the Irish climate means that areas<br />
which are not developed will be covered by vegetation, so may fall into the near<br />
infrared category, while areas under development should display values consistent<br />
with earthworks or paving. At the outset of the study it was expected that most<br />
roofing would fall within a relatively small range of colors and could be used to<br />
calibrate the search. This was not the case; however, tarmac road data proved a<br />
useful replacement in terms of consistent spectral property throughout the image.<br />
The thesis looked at a very specific aspect of this body of knowledge and attempted<br />
to bridge the gap between automatic aerial data capture and traditional<br />
photogrammetric methods. It is noted that in most of the study areas the authors<br />
did not benefit from the availability of large scale coded vector data and the<br />
premise for the study was that if this is available then the accuracy of automatic<br />
capture can be increased. At the core of all of the literature mentioned in this<br />
review is the classification of imagery (with the exception of the point signature<br />
algorithm proposed by Dudarev). In the course of this review I encountered one<br />
study which posed one of the same questions that are considered in this thesis: can<br />
the use of geometric information increase classification accuracies in aerial image<br />
processing? This study (Bellens et al, 2008) proposed a method of morphological<br />
profiling to improve the data capture. The authors identified “substantial<br />
improvement” (Bellens et al, 2008, p. 2803). The study points out that urban areas such<br />
as roads and car parks are so similar spectrally that they cannot be separated by a<br />
spectral analysis alone. They further divide spectral analysis into pixel-based and<br />
object-based methods. Object-based methods group pixels together in a meaningful<br />
way, something which the authors identify as a difficult task. It is the intention of<br />
this thesis to use the former, pixel-based method. The authors identify a method for<br />
automatically obtaining structuring elements to help construct the segmentation<br />
(such as solid rectangular objects, roofs etc.), allowing a shape index to help<br />
extract man-made structures from the image.<br />
One observation that can be made from the available literature on automatic aerial<br />
image processing is that even with accurately segmented imagery (such as clearly<br />
divided vegetation and urban areas) a considerable amount of work is involved in<br />
training the algorithms to classify target areas. The creation of a standard key,<br />
which can be extended by the user, became one of the main focuses of this study.<br />
This thesis enables the user to reduce the workload by presenting a method for<br />
quickly calibrating an automated search.<br />
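As an illustration only, the idea of calibrating a search from a spectrally consistent<br />
surface such as tarmac can be sketched as follows. The sketch assumes that sample pixels<br />
are taken from inside a road polygon supplied by the vector data, and that a candidate<br />
pixel matches when every band lies within k standard deviations of the sample mean. The<br />
function names, the tolerance k and the sample values are hypothetical and are not taken<br />
from the implementation described in this thesis.<br />

```python
from statistics import mean, stdev


def build_key(samples):
    """Build a spectral key: one (mean, standard deviation) pair per band.

    `samples` is a list of pixel tuples, e.g. (R, G, B), assumed to come
    from a surface of known type such as a road polygon.
    """
    bands = list(zip(*samples))
    return [(mean(band), stdev(band)) for band in bands]


def matches_key(pixel, key, k=2.0):
    """Return True if every band of `pixel` is within k standard deviations
    of the corresponding key mean (k is an assumed tolerance)."""
    return all(abs(value - m) <= k * s for value, (m, s) in zip(pixel, key))


def classify(pixels, key, k=2.0):
    """Pixel-based test of each candidate pixel against the key."""
    return [matches_key(p, key, k) for p in pixels]


# Hypothetical pixel values sampled inside a road polygon.
road_samples = [(98, 100, 104), (102, 101, 99), (100, 99, 101),
                (96, 98, 100), (104, 102, 103)]
road_key = build_key(road_samples)

# Hypothetical candidates: grey tarmac-like, green vegetation-like, bright roof-like.
candidates = [(101, 100, 102), (60, 140, 55), (200, 190, 185)]
flags = classify(candidates, road_key)
print(flags)
```

In this sketch only the first candidate falls within the tolerance of the road key. A user<br />
could extend the key by building further entries from samples of other surface types.<br />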
6 References<br />
Geospatial Data Abstraction Library (2010) GDAL utility programs. Retrieved on<br />
18th August 2010 from: http://www.gdal.org/gdal_utilities.html<br />
Universidade do Algarve (2010) MIRONE. Retrieved on 8th July 2010 from:<br />
http://w3.ualg.pt/~jluis/mirone/<br />
PCI Geomatics (2010) Geomatica. Retrieved on 5th June 2010 from:<br />
http://www.pcigeomatics.com/index.php?option=com_content&view=article&id=<br />
5&Itemid=4<br />
OpenEV (2006) Geospatial Toolkit. Retrieved on 5th June 2010 from:<br />
http://openev.sourceforge.net/<br />
Josef Kittler (1983) Image processing for remote sensing.<br />
Philosophical Transactions of the Royal Society of London A, 309, 323-335.<br />
Thomas Knudsen (2005) Pseudo natural colour aerial imagery for urban and<br />
suburban mapping.<br />
Int. Journal of Remote Sensing, Vol. 26, No.12, 2689-2698<br />
Roman Dudarev (2009) Plain Polygon Signature Point Definition Algorithm.<br />
Survey and Land Information Science 69, No2.<br />
S. Cordero-Sancho & S. A. Sader (2007) Spectral analysis and classification<br />
accuracy of coffee crops using Landsat and a topographic-environmental model<br />
International Journal of Remote Sensing Vol. 28, No. 7, 10 April 2007, 1577–1593<br />
H. van der Werff and F. van der Meer (2008) Shape-based classification of<br />
spectrally identical objects.<br />
ISPRS Journal of Photogrammetry & Remote Sensing 63, 251-258<br />
S. Phinn, M. Stanford, P. Scarth, A. Murray and P. Shyy (2002) Monitoring the<br />
composition of urban environments based on the vegetation–impervious surface–<br />
soil (VIS) model by sub pixel analysis techniques.<br />
International Journal of Remote Sensing, vol. 23, no. 20, 4131–4153<br />
Sakari Tuominen and Anssi Pekkarinen (2004) Local radiometric correction of<br />
digital aerial photographs for multi source forest inventory.<br />
Remote Sensing of Environment 89, 72–82<br />
Manuel A. Aguilar, Fernando J. Aguilar and Francisco Aguilera (2005) Mapping<br />
small areas using a low-cost close range Photogrammetric package with aerial<br />
photography. The Photogrammetric Record 20(112): 335–350<br />
Yuyu Zhou and Y.Q. Wang (2007) An Assessment of Impervious Surface Areas in<br />
Rhode Island.<br />
Northeastern Naturalist 14(4): 643-650<br />
Megan M. Lewis (1998) Numeric classification as an aid to spectral mapping of<br />
vegetation communities. Plant Ecology 136: 133–149<br />
Yong Kui Liu, Xiao Qiang Wang, Shu Zhe Bao, Matej Gombosi, Borut Zalik<br />
(2007) An algorithm for polygon clipping, and for determining polygon<br />
intersections and unions<br />
Computers & Geosciences 33 (2007) 589–598<br />
Rama Rao Nidamanuri and Bernd Zbell (2010) A method for selecting optimal<br />
spectral resolution and comparison metric for material mapping by spectral library<br />
search Progress in Physical Geography 34(1) 47–58<br />
Xiaoping Liu, Xia Li, and Xiaohu Zhang (2009) Determining Class Proportions<br />
Within a Pixel Using a New Mixed-Label Analysis Method<br />
IEEE Transactions on Geoscience and Remote Sensing<br />
Mahesh Pal and Giles M. Foody (2009) Feature Selection for Classification of<br />
Hyperspectral Data by SVM<br />
IEEE Transactions on Geoscience and Remote Sensing<br />
Bingcai Zhang and Neal Olander (2000) How to get GIS Data from Imagery. ESRI<br />
user conference 2000 proceedings.<br />
Retrieved on 2nd March 2010 from:<br />
proceedings.esri.com/library/userconf/proc00/professional/papers/pap427/p427.htm<br />
Ping Zhong and Runsheng Wang (2007) A Multiple Conditional Random Fields<br />
Ensemble Model for Urban Area Detection in Remote Sensing Optical Images.<br />
IEEE Transactions on geoscience and remote sensing, Vol. 45, No. 12<br />
Fenglei Fan, Yunpeng Wang, Maohui Qiu and Zhishi Wang. (2009) Evaluating<br />
the Temporal and Spatial Urban Expansion Patterns of Guangzhou from 1979 to<br />
2003 by Remote Sensing and GIS Methods. International Journal of Geographical<br />
Information Science, Vol. 23, No. 11, 1371–1388<br />
M. E. Martin, S. D. Newman, J. D. Aber, and R. G. Congalton (1998)<br />
Determining Forest Species Composition Using High Spectral Resolution Remote<br />
Sensing Data. Remote Sens. Environ. 65:249–254 (1998)<br />
Nelson, R. F., Latty, R. S., and Mott, G. (1985), Classifying<br />
northern forests using Thematic Mapper Simulator data.<br />
Photogramm. Eng. Remote Sens. 50:607–617.<br />
Shen, S. S., Badhwar, G. D., and Carnes, J. G. (1985), Separability of boreal forest<br />
species in the Lake Jennette area,<br />
Photogramm. Eng. Remote Sens. 51:1775–1783.<br />
Lathrop, R. G., Aber, J. D., Bognar, J. A., Ollinger, S. V., Casset, S., and Ellis, J.<br />
M. (1994), GIS development to support regional simulation modeling of<br />
northeastern (USA) forest [...] Analysis (W. Michener, J. W. Brunt, and S. Stafford,<br />
Eds.), Taylor and Francis, London, pp. 431–451.<br />
Skidmore, A. K. (1989), An expert system classifies eucalypt [...]<br />
Rik Bellens, Sidharta Gautama, Leyden Martinez-Fonte, Wilfried Philips,<br />
Jonathan Cheung-Wai Chan, and Frank Canters (2008) Improved Classification of<br />
VHR Images of Urban Areas Using Directional Morphological Profiles<br />
IEEE Transactions on Geoscience and Remote Sensing, Vol. 46,<br />
No. 10.<br />