Towards soft classification of satellite data
A case study based upon Resurs MSU-SK satellite data and land cover classification within the Baltic Sea Region.

Anna Haglund
Master of Science Project in Geoinformatics
Royal Institute of Technology
Department of Geodesy and Photogrammetry
Stockholm, Sweden
March 2000
Universitetsservice US AB
Stockholm 2000
08-790 7400
Preface
This master of science project has been performed at the Department of Geodesy and Photogrammetry, Division of Geoinformatics, on assignment from Satellus, during the autumn of 1999. The project comprises 20 credits and completes the education for technical surveyors in the Surveying Program at the Royal Institute of Technology, KTH, in Stockholm.

I would like to thank my supervisors at KTH: Sindre Langaas, Dept. of Civil and Environmental Engineering, and Maria Roslund, Dept. of Geodesy and Photogrammetry. I would also like to thank Satellus in Solna, where I have performed the project. Especially I want to thank Kjell Wester, who has been my supervisor at Satellus and has given much technical advice and contributed many ideas.

Stockholm, March 2000
Anna Haglund
Sammanfattning
When using medium resolution satellite images with a pixel size of 150-250 m on the ground, it can be difficult to select homogeneous training sites. The pixel size is also larger than many objects on the ground, so these may disappear in a "hard classification", where each pixel can only be assigned one class. Sometimes one wants each pixel to contain as many classes as there are in reality. To deal with these problems, so-called soft classifiers have been developed.

The purpose of this report is to examine how many classes can be obtained from medium resolution satellite data when higher resolution data are used as a reference for generating training sites, and to establish how good the classification is.

The study was conducted on parts of two RESURS images from two different dates, with three spectral bands, over a part of southern Sweden. The Swedish land cover data (SLD) were used as reference. For four different sampling methods, the results from IDRISI's soft classifier modules were compared with each other. The estimated RMS error varied between the classes and was smallest for the small classes, and the occurrence of each class compared with the reference image differed considerably. One reason for the insufficient agreement may be that the reference image is not entirely correct. The reference image is based on information from the digital topographic map, from which class boundaries were taken, with the classification usually performed only within each boundary using maximum likelihood. Under the assumption that the reference image is correct, errors in the reference data will be attributed to the RESURS classification and thus degrade the assessed accuracy. The registration between the RESURS scenes and the reference image is also an error source, as is the registration between the RESURS scenes themselves. There are thus both thematic and geometric errors. Another reason is too few training sites, combined with classes that are not always spectrally separable.

The software must also be improved before any use in production can be considered.
Abstract
When using medium resolution satellite images with a pixel size of 150-250 m on the ground, it can be difficult to collect homogeneous training sites. The pixel size is also larger than many objects on the ground, so these may disappear in a "hard classification", where every pixel can only be assigned one class. Often one wants every pixel to contain as many classes as it does in reality. To manage these problems, so-called soft classifiers have been developed.

The purpose of this paper is to examine how many classes can be extracted from medium resolution satellite data when higher resolution satellite data are used as reference, and how well the classification performs.

The examination was conducted using parts of two RESURS images from two different dates, with three spectral bands, covering a section of southern Sweden. The Swedish national land cover classification was used as reference. For four different sampling methods, I compared the results from the IDRISI soft classification modules with each other. The RMS errors varied between the classes and were smallest for the small classes, and the occurrence of each class in the image differed considerably from the reference image. One reason for the insufficient agreement may be that the reference image is not completely correct. The reference image is based on information from the digital topographic map, from which class boundaries have been taken, with the classification mostly performed within these boundaries using maximum likelihood. Under the assumption that the reference image is correct, its errors will be attributed to the RESURS classification and thus worsen the assessed accuracy. The geometric registration between the RESURS images and the reference image, as well as between the RESURS images themselves, is also an error source. There are thus both thematic and geometric errors. Another reason is too few training sites, combined with classes that are not always spectrally separable.

The software also needs improvement before any use in production can be considered.
Contents
PREFACE ................................................................ I
SAMMANFATTNING ....................................................... III
ABSTRACT .............................................................. IV
CONTENTS ............................................................... V
1. INTRODUCTION ........................................................ 1
1.1 BACKGROUND ......................................................... 1
1.2 OBJECTIVES ......................................................... 1
1.3 THE BALANS PROJECT ................................................. 1
1.4 STRUCTURE OF THE THESIS ............................................ 2
2. MEDIUM RESOLUTION SENSORS ........................................... 3
2.1 AVAILABLE SENSORS .................................................. 3
2.2 FUTURE SENSORS ..................................................... 4
3. A REVIEW OF SOFT CLASSIFIERS ........................................ 5
3.1 INTRODUCTION ....................................................... 5
3.2 FUZZY SET THEORY ................................................... 6
3.2.1 Fuzzy c-means (k-means) .......................................... 7
3.2.2 Applications with Fuzzy c-means classifier ....................... 7
3.3 SOFTENING MAXIMUM LIKELIHOOD CLASSIFIER (MCL) ...................... 8
3.3.1 Applications with softening MCL .................................. 8
3.4 LINEAR MIXTURE MODELLING ........................................... 8
3.4.1 Applications with linear mixture modelling ....................... 9
3.5 METHODS BASED ON BAYESIAN PROBABILITY THEORY ....................... 9
3.5.1 Applications with Bayesian probability theory ................... 10
3.6 METHODS BASED ON DEMPSTER-SHAFER THEORY ........................... 10
3.6.1 Applications with Dempster-Shafer theory ........................ 11
3.7 NEURAL NETWORK .................................................... 11
3.7.1 The Multilayer Perceptron Classifier using feed-forward and back-propagation .. 11
3.7.2 Applications with Multilayer Perceptron Classifier .............. 13
3.7.2 Neural Network using Cascade-correlation ........................ 14
3.7.3 Applications with Cascade-correlation ........................... 14
3.7.4 Fuzzy ARTMAP .................................................... 14
3.7.5 Applications with Fuzzy ARTMAP .................................. 16
4. DATA AND METHODS ................................................... 17
4.1 INTRODUCTION ...................................................... 17
4.2 DATA .............................................................. 17
4.2.1 Satellite data .................................................. 17
4.2.2 Reference data .................................................. 18
4.3 STUDY AREA ........................................................ 19
4.3.1 General description of the Baltic Sea region landscape .......... 19
4.3.2 Quantitative characterisation ................................... 20
4.4 TOOLS ............................................................. 20
4.4.1 IDRISI .......................................................... 20
4.4.2 ERDAS IMAGINE ................................................... 21
4.5 METHODOLOGY ....................................................... 22
4.5.1 Outline ......................................................... 22
4.5.2 Initial data preprocessing ...................................... 22
4.5.3 Choice of soft classifiers ...................................... 24
4.5.4 Application of soft classifiers ................................. 24
4.5.4 Evaluation and comparison of soft classifiers ................... 25
5. RESULTS ............................................................ 27
5.1 INTRODUCTION ...................................................... 27
5.2 FUZZY SIGNATURES .................................................. 27
5.3 RMS ERRORS ........................................................ 27
5.4 OVER- AND UNDER CLASSIFIED PIXELS ................................. 29
5.5 AVERAGE VALUES .................................................... 33
6. DISCUSSION ......................................................... 40
7. CONCLUSIONS AND RECOMMENDATIONS .................................... 41
REFERENCES ............................................................ 42
APPENDIX 1: ACTION STEPS FOR SOFT CLASSIFICATION ...................... 44
APPENDIX 2: MACROFILE SCHEME .......................................... 47
1. Introduction

1.1 Background
Satellite images have been used for land cover classification ever since the first earth resource satellites were launched. The most widely used classification method has been the maximum-likelihood algorithm. Recently, medium resolution satellite data (150-250 m) have become available, and the maximum-likelihood algorithm has been used to classify these data as well. Several studies have been done with the maximum-likelihood classifier; one using medium resolution satellite data was made by Malmberg and Furberg (1997). Over the last five years there has been progress on the method side: new classifiers, soft classifiers, have been developed. They have emerged because coarse and medium resolution satellite data often contain mixed pixels, e.g. forests have no sharp boundaries between coniferous and deciduous forest.

It is a new and exciting field. These new soft classifiers can extract more information than the maximum-likelihood classifier.
1.2 Objectives
The main objective is to evaluate and compare the suitability of some soft classifiers for deriving land cover information in a patchy landscape using medium resolution satellite data, in particular for a study area in the Baltic Sea region for the purposes of the BALANS project (Landcover Information for the Baltic Sea Drainage Basin).

Secondary objectives are to provide information about present and future satellites carrying sensors that provide medium resolution data, and to review the soft classifiers available today and their requirements on input and reference data.
1.3 The BALANS Project
The Remote Sensing Technology Division at Satellus, a subsidiary of the Swedish Space Corporation, has, in partnership with six organisations in Sweden, Norway, Finland and Poland, started a project called BALANS. The aim of the BALANS project is to develop a satellite-based land cover database for the complete Baltic Sea Drainage Basin, see figure 1, using medium resolution satellite data (Olsson 1999). Many organisations in the Baltic Sea region have a common need for basic geographical data sets. Up to now the main need has come from the environmental, hydrometeorological and marine sciences, but there are now growing requirements from the physical planning, political and business communities (Langaas 1996). Today, there are many growing pressures on the Baltic Sea, and there is a critical need for development that is sustainable, both economically and environmentally.

The database is envisaged to have a maximum resolution on the order of 200 meters.

The project will last three years and will be finished sometime at the end of 2001 or the beginning of 2002.
Figure 1. The Baltic Drainage Basin and the BALANS study area (adapted from http://www.grida.no/baltic/htmls/maps.htm).
1.4 Structure of the thesis
Chapter 2 reviews present and future satellites providing medium resolution satellite data. Chapter 3 is a review of available soft classifiers and some applications made with them. Chapter 4 describes the data and methods used in this study, chapter 5 presents the results and chapter 6 the discussion, followed by chapter 7 with the conclusions and recommendations.
2. Medium Resolution Sensors

2.1 Available sensors
Medium resolution sensors are sensors with a pixel size on the ground of about 150-250 meters. They bridge the gap in image extent and spatial resolution between SPOT/Landsat TM (20-30 m) and NOAA AVHRR (1.1 km) and are appropriate for mapping large areas, since fewer images are needed and the images are not too coarse.

Two satellites with medium resolution sensors covering the Nordic area are the Russian satellite RESURS-O1 and the Indian Remote Sensing satellite, IRS. The satellites carry different sensors: RESURS-O1 has an MSU-SK sensor and IRS has a wide field sensor (WiFS); see descriptions and data in table 1. The only MSU-SK data available today are archived RESURS-O1 #3 data (SSC Satellitbild 1999). One orbit takes 98 minutes and the repetition cycle for the satellite is 21 days, with potential coverage of a specific area every third to fourth day at the equator. Every scene covers 600 x 600 km². It is a wide-angle instrument with a conical scan in which every pixel has the same size and viewing angle. The data from the curved scanning lines are resampled by cubic convolution from the nominal 170 meter raw pixels to 160 meter pixels to obtain a quadratic image grid.
The WiFS has two spectral bands, red and near infrared, intended for observing the chlorophyll absorption of plants and for surveying biomass and water bodies, respectively (Euromap 1997). One orbit around the earth takes 101.35 minutes to complete and it takes 341 orbits to cover the entire earth. The repetition cycle for the IRS satellites is 24 days, but IRS-1C/D work as a system, so their combined repetition cycle is 12 days. At Nordic latitudes a specific area is covered every 1-2 days. The WiFS scanning lines are also resampled by cubic convolution to obtain a quadratic image grid.
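The cubic convolution resampling mentioned above can be sketched in one dimension. This is an illustrative sketch only, not the actual RESURS or IRS processing chain: the `resample_1d` helper and the choice a = -0.5 for the Keys kernel are assumptions made for the example.

```python
import numpy as np

def cubic_kernel(s, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is the common choice)."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0

def resample_1d(values, src_spacing, dst_spacing):
    """Resample a 1-D profile of pixel values from src_spacing (m)
    to dst_spacing (m) by cubic convolution over a 4-pixel window."""
    n_out = int(len(values) * src_spacing / dst_spacing)
    out = np.empty(n_out)
    for j in range(n_out):
        x = j * dst_spacing / src_spacing   # position in source pixel units
        i = int(np.floor(x))
        acc = 0.0
        for k in range(i - 1, i + 3):       # the 4 nearest source pixels
            kk = min(max(k, 0), len(values) - 1)  # clamp at the edges
            acc += values[kk] * cubic_kernel(x - k)
        out[j] = acc
    return out

profile = np.linspace(0, 100, 17)           # 17 raw pixels of 170 m
resampled = resample_1d(profile, 170, 160)  # 18 pixels of 160 m
```

Because the kernel is interpolating (it is 1 at distance 0 and 0 at integer distances), output samples that land exactly on source pixel centres reproduce the original values; in two dimensions the same kernel would be applied along both axes.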
Both satellite types have a large swath width. The largest difference between the satellites is the number of wavelength bands: while RESURS-O1 MSU-SK has five bands (red, green, two near infrared (NIR) and one thermal infrared (TIR)), IRS WiFS only has two, the red and the near infrared. In practice, however, the TIR band of MSU-SK cannot be acquired in parallel with the others. Experience with the data indicates that the radiometric dynamics are better in data from WiFS on IRS than in data from MSU-SK on RESURS-O1 #3 (Hyyppä 1999).
Table 1. Summary of satellite and sensor data.

Satellites:            RESURS-O1 MSU-SK           IRS-1C WiFS          IRS-1D WiFS
Orbit                  sun-synchronous, circular   sun-synchronous, circular, near polar
Average altitude       678 km                      817 km
Inclination            98.04 deg                   98.69 deg
Orbit period           98 min                      101.35 min
Orbit repeat cycle     21 days                     12 days (24 days for each)
Launch date            4/11 1994                   28/12 1995           29/12 1997
Sensors:
Imaging mechanism      conical scan                2048-element linear array CCD
Viewing angle          39 deg                      ?
Spectral band:         wavelength / pixel size     wavelength / pixel size
1: Green               0.5-0.6 µm / 160 m          -
2: Red                 0.6-0.7 µm / 160 m          0.62-0.68 µm / 188 m
3: NIR                 0.7-0.8 µm / 160 m          0.77-0.86 µm / 188 m
4: NIR                 0.8-1.1 µm / 160 m          -
5: TIR                 10.4-12.6 µm / 600 m        -
Swath width            600 km                      810 km               728-812 km
Potential coverage     3-4 days at the equator     1-2 days at Nordic latitudes
Overlap                ?                           ca. 80%
2.2 Future sensors
Two new medium resolution sensors are coming: MODIS on the EOS AM-1 satellite and MERIS on the ENVISAT satellite (Hyyppä 1999).

MODIS has a cross-track scan mirror and a set of linear arrays with spectral interference filters located in four focal planes. It has a viewing swath width of 2330 km and will cover the earth every 1-2 days. It will have 36 spectral bands in the range 0.4-14.4 µm, optimized for measuring surface temperature, ocean colour, global vegetation, cloud characteristics, aerosol concentrations, temperature and moisture soundings, snow cover and ocean currents. The spatial resolution will be 250 m for bands 1-2, 500 m for bands 3-7 and 1000 m for bands 8-36. EOS AM-1 MODIS data will be available in spring 2000.

MERIS has radiation-sensitive arrays (CCDs) that provide spatial sampling in the across-track direction, while the satellite's motion provides scanning in the along-track direction. It will have 15 spectral bands in the range 0.39-1.04 µm, which can be selected by ground command. The swath width will be 1150 km. The spatial resolution will be 300 m over coastal zones and land surfaces where communication capabilities exist. Otherwise, the resolution is reduced to 1200 m to reduce the amount of data recorded onboard. ENVISAT MERIS data will be available in spring 2001.
3. A review of soft classifiers

3.1 Introduction
In conventional classification in remote sensing, pixels are treated as discrete, i.e. the result is only one class per pixel. Much information about the pixel's membership in other classes is lost, and this additional information could increase the accuracy of the classification.

Mixed pixels, or 'mixels', occur because the pixel size may not be fine enough to capture the detail on the ground necessary for specific applications, or where ground properties such as vegetation and soil types vary continuously, as they do almost everywhere.

Fuzziness means that a given pixel, owing to its spectral reflectance properties, may be placed into more than one informational/spectral class. The output of a soft classification is a set of images (one per class) that express for each pixel the degree of membership in the class in question.
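The output format described above can be sketched with a toy example. The membership values and class labels here are invented for illustration; the sketch also shows the "hardening" step a conventional classifier would apply, which discards the sub-pixel information.

```python
import numpy as np

# Hypothetical soft-classification output: one membership image per class.
# Shape: (n_classes, rows, cols); each pixel's memberships sum to 1.
memberships = np.array([
    [[0.7, 0.2], [0.1, 0.5]],   # class 0, e.g. coniferous forest
    [[0.2, 0.5], [0.3, 0.5]],   # class 1, e.g. deciduous forest
    [[0.1, 0.3], [0.6, 0.0]],   # class 2, e.g. water
])

# Hardening: a crisp map keeps only the class with the largest
# membership, losing the information that e.g. the lower-right pixel
# is an even mixture of the two forest classes.
hard_map = memberships.argmax(axis=0)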
Soft classifiers can be useful in delineating forest boundaries, shorelines and other continuous classes. They can also bring out objects that cover small areas, which would otherwise have disappeared with conventional classifiers.

In the training and testing stages of a classification, mixed pixels are usually avoided. But it may be difficult to acquire a training set of an appropriate size if only pure pixels are selected for training, since large homogeneous regions of each class are then needed in the image. The training statistics defined may not be fully representative of the classes and so provide a poor base for the remainder of the analysis.
By recognising that there are various degrees to which fuzziness may be incorporated at each stage, a continuum of classification fuzziness may be defined (Foody 1999). As fuzziness may be a characteristic feature of both the remotely sensed data and the ground data, the use of a fuzzy classification algorithm alone may be insufficient to resolve the mixed-pixel problem.

The continuum ranges from fully fuzzy, where fuzziness is accommodated in all three stages of the classification (the training, allocation and testing stages), to completely crisp, which is the conventional classification.

A modified maximum-likelihood classifier can accommodate fuzziness in any or all three stages of the analysis, and neural networks can provide a classification at any point along the continuum of classification fuzziness.

In training, the data are hard if the pixels are pure, and fuzzy to varying degrees for mixed pixels. Data on the fuzzy membership properties of training samples may be used to derive refined class descriptors for use in the class allocation stage. By refining the training statistics where some or even all of the pixels are mixed, the statistics that would have been derived if the pixels had in fact been pure can be obtained (Foody 1999).
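As a sketch of this idea (not Foody's actual method), one simple refinement weights each training pixel's contribution to the class statistics by its membership in the class. All pixel values and memberships below are invented for illustration.

```python
import numpy as np

# Hypothetical training set: 5 pixels with 3 spectral bands each, and the
# fuzzy membership of each pixel in one class (1.0 = pure pixel).
pixels = np.array([
    [42.0, 30.0, 18.0],
    [40.0, 28.0, 17.0],
    [60.0, 55.0, 40.0],   # mixed pixel, low membership
    [41.0, 29.0, 19.0],
    [55.0, 50.0, 35.0],   # mixed pixel, low membership
])
membership = np.array([1.0, 0.9, 0.2, 1.0, 0.3])

# Membership-weighted mean: mixed pixels contribute in proportion to
# their membership, pulling the class signature toward the pure pixels.
weighted_mean = (membership[:, None] * pixels).sum(axis=0) / membership.sum()

# The unweighted mean, by contrast, is dragged away from the pure-pixel
# signature by the two mixed pixels.
unweighted_mean = pixels.mean(axis=0)
```

The same weighting can be extended to the covariance matrix, giving refined class descriptors for a statistical classifier such as maximum likelihood.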
In the allocation stage, a soft classifier can allocate a pixel to anything from a single class to every class in which it has membership.
This chapter describes the theory and algorithms behind the existing soft classifiers. To make the theory more understandable, it also describes some applications that have been made with them.
3.2 Fuzzy Set Theory
Fuzzy sets are sets without sharp boundaries, applied to handle uncertainty in the classification process (Palubinskas 1994). The result is often a more detailed and precise classification. Fuzziness can effectively extend the usefulness of map products developed from remote sensing imagery. Fuzzy set theory is particularly interesting because the analyst controls the degree of fuzziness (Foody 1996).

It is the shape of the membership function that defines the 'fuzziness' of the phenomenon represented by the set. Four membership functions are usually used: sigmoidal (s-shaped), J-shaped, linear and user-defined (Eastman 1997). The sigmoidal membership function is the most commonly used in fuzzy set theory, see figure 2. It can be produced using a cosine function that requires the positions (along the x-axis) of four points governing the shape of the curve.

The J-shaped membership function is also quite common, although in most cases the sigmoidal function would be better. The linear function is used extensively in electronic devices advertising fuzzy set logic, and the user-defined function is used when the relationship between the value and the fuzzy membership does not follow any of the above three functions. The user-defined function is the most widely applicable (Eastman 1997). The control points used in this function can be as many as necessary to define the fuzzy membership curve; the fuzzy membership between any two control points is linearly interpolated.
Figure 2. The Sigmoidal Membership Function (based on Eastman 1997).

The outputs for each pixel are the different membership values for each class. A high membership value indicates that the relevant land cover type covers a large area of the pixel (Bastin 1997). When summation of memberships is used for a post-classification of fuzzy classification results, most of the increase in average classification accuracy is achieved after the first iteration (Palubinskas 1994).
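As an illustration of the cosine-produced sigmoidal function described above, the following sketch evaluates a symmetric sigmoidal membership. The function name and the labels a-d for the four control points are our own; this is a minimal illustration, not the implementation used in any particular software package.

```python
import math

def sigmoidal_membership(x, a, b, c, d):
    """Symmetric sigmoidal fuzzy membership defined by four control
    points along the X-axis (a <= b <= c <= d), produced with a cosine
    function: membership rises from 0 at a to 1 at b, stays at 1
    between b and c, and falls back to 0 at d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:                        # rising limb: cos^2 goes 0 -> 1
        t = (x - a) / (b - a)
        return math.cos((1.0 - t) * math.pi / 2.0) ** 2
    t = (x - c) / (d - c)            # falling limb: cos^2 goes 1 -> 0
    return math.cos(t * math.pi / 2.0) ** 2
```

For example, with control points 0, 1, 2 and 3, membership is 0.5 halfway up the rising limb and 1.0 anywhere on the plateau between 1 and 2.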
3.2.1 Fuzzy c-means (k-means)

Fuzzy c-means is a clustering algorithm used for either unsupervised or supervised classification. The algorithm subdivides a data set into c clusters or classes (Foody 1996). It begins by randomly assigning pixels to classes and then iteratively moves pixels between classes with the aim of minimizing the generalised least-squared error,
$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \| y_k - v_i \|_A^2$$

where U is a fuzzy c-partition of the data set Y containing n pixels, c is the number of classes, $\| \cdot \|_A$ is an inner product norm, v is a vector of cluster centres, $v_i$ is the centre of cluster i, and m is a weighting exponent in the range $1 \le m \le \infty$ which determines the degree of fuzziness. The inner product norm is derived from,

$$\| y_k - v_i \|_A^2 = (y_k - v_i)^T A (y_k - v_i)$$

A number of norms may be selected, for instance the Mahalanobis norm, $A = C_y^{-1}$, where $C_y$ is the covariance matrix of the data set Y.
To implement the fuzzy c-means algorithm, additional parameters are required to guide the partitioning process: a distance measure must be selected and a weighting exponent chosen. The weighting exponent m controls the 'hardness' or 'fuzziness' of the classification; the range of useful values is 1.5 < m < 3.0.

The fuzzy c-means algorithm is particularly useful in circumstances where it is not reasonable to make assumptions about the statistical distributions of the sample data (e.g. where training sets of pure pixels are small).

For each pixel a fractional value is obtained for each class in the form of a real number between 0 and 1, and these values generally sum to 1.0 across all candidate classes. As output one also gets a residual error map showing the RMS error across the image. The values need to be adjusted (e.g. using the slope from a linear regression) before area predictions can be made.
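The iteration that minimizes $J_m$ can be sketched as follows. This is a minimal illustration, not the implementation used in any of the studies cited below: it assumes the Euclidean norm (A equal to the identity matrix) rather than the Mahalanobis norm, and the function and parameter names are our own.

```python
import numpy as np

def fuzzy_c_means(Y, c, m=2.0, iters=100, v0=None, seed=0):
    """Minimal fuzzy c-means sketch using the Euclidean norm (A = I).

    Y  : (n, d) array of pixel feature vectors
    c  : number of classes
    m  : weighting exponent controlling the fuzziness (1 < m < inf)
    v0 : optional (c, d) array of initial cluster centres; randomly
         chosen pixels are used when it is omitted
    Returns memberships U of shape (n, c), summing to 1 for each
    pixel, and the cluster centres v."""
    n = Y.shape[0]
    if v0 is None:
        rng = np.random.default_rng(seed)
        v = Y[rng.choice(n, size=c, replace=False)].astype(float)
    else:
        v = np.array(v0, dtype=float)
    for _ in range(iters):
        # squared Euclidean distance of every pixel to every centre
        d2 = ((Y[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)            # guard against division by zero
        # membership update: u_ik proportional to d2_ik^(-1/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # centre update: means weighted by memberships raised to m
        W = U ** m
        v = (W.T @ Y) / W.sum(axis=0)[:, None]
    return U, v
```

Run on two well-separated groups of pixels, the memberships converge towards 0/1 for the respective clusters while still summing to 1 per pixel.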
3.2.2 Applications with Fuzzy c-means classifier

A comparative study between traditional unsupervised classification and the fuzzy c-means algorithm was made by Manyara and Lein (1994), in which Landsat MSS subsets were used to detect environmental change. Fuzzy c-means was chosen because the boundary between forest and other categories of land cover may be difficult to delineate precisely. Four smaller study sites depicting different forest environments were extracted from the image. The fuzzy c-means approach showed that a pixel often belongs to more than one class: of the total 2 500 pixels per site, only 249 were pure in site I, 219 in site II, 29 in site III and 87 in site IV.

Another comparison was made by Bastin (1997) between fuzzy c-means and the linear mixture model, used for unmixing coarse pixel signatures to identify four land cover classes. The Mahalanobis norm was used as the distance measure, and after a little experimentation the weighting exponent, m, was set to 1.5. With these settings, dominant and minor land cover classes in each pixel were easily identified, without the classification being overly hard. The fuzzy c-means classifier performed best overall at locating and quantifying inclusions in mixed pixels.
3.3 Softening the maximum likelihood classifier (MCL)

MCL operates by using band means and standard deviations from training data to model land cover classes as centroids in feature space, surrounded by probability contours. The probability density function assumes that the sample values for each class are normally distributed. The unclassified pixels are plotted in the same feature space and are given a posteriori probabilities. Usually the pixels are then assigned to the class for which they have the highest membership probability, but it is possible to soften the maximum likelihood classification by using the a posteriori membership probability values as indices of class membership (Bastin 1997).

The data must still satisfy the assumptions and requirements of the classification technique used, which is often unlikely with the probability-based classifiers.
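The softening step can be illustrated with a minimal sketch (names and structure are our own): the multivariate normal density assumed by the classifier is evaluated for each class, and the full vector of a posteriori probabilities is returned instead of a single hard label.

```python
import numpy as np

def soft_maximum_likelihood(x, means, covs, priors=None):
    """Softened maximum likelihood sketch: instead of assigning the
    pixel to the class with the highest a posteriori probability, the
    whole vector of posterior probabilities is returned as soft
    indices of class membership.

    x     : (d,) pixel vector
    means : list of (d,) class mean vectors from training data
    covs  : list of (d, d) class covariance matrices"""
    c = len(means)
    if priors is None:
        priors = np.full(c, 1.0 / c)          # equal priors by default
    d = x.shape[0]
    likes = np.empty(c)
    for i in range(c):
        diff = x - means[i]
        inv = np.linalg.inv(covs[i])
        det = np.linalg.det(covs[i])
        # multivariate normal density assumed by the classifier
        likes[i] = np.exp(-0.5 * diff @ inv @ diff) / \
                   np.sqrt((2 * np.pi) ** d * det)
    post = likes * priors
    return post / post.sum()                  # posteriors sum to 1
```

A pixel lying midway between two equally likely class centroids receives memberships of 0.5 and 0.5 rather than an arbitrary hard label.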
3.3.1 Applications with softening MCL

In the comparison made by Bastin (1997) described above, a shell script was used to combine each set of eight maximum likelihood classification output maps, and to produce a map showing the a posteriori probabilities for each class. The result consisted mostly of extremely classified pixels, i.e. most of them were classified as pure pixels or as habitats that were entirely absent. To avoid an incorrect evaluation of this method, the classification should be softened by broadening the spread of the normal distribution functions that represent the land cover classes.
3.4 Linear mixture modelling

For linear unmixing of the spectral data, a 'pure' spectral signature needs to be obtained from the training data for each of the defined land cover types (Bastin 1997). The model views a mixed pixel signature as a simple weighted linear sum of the component land cover signatures in that pixel. The weights are directly determined by the relative proportions of ground covered by each land cover class, according to,

x = Mf + e

where x is the set of digital values measured by the sensor in each of n measurement wavelengths, f is a vector of ground cover proportions for each of c land covers, M is an n × c matrix of coefficients in which the columns are the vectors µ1…µc, representing the pure spectral signatures of the c cover classes in the absence of noise, and e is an error (noise) term, which is minimized in some way in order to obtain an estimate of the proportion vector f.
The expected signature for a mixed pixel is the sum,

$$f_1 \mu_1 + f_2 \mu_2 + \ldots + f_c \mu_c = Mf$$

where, for class c, $f_c$ is the proportion of the pixel covered and $\mu_c$ is the characteristic signature.
The noise caused by the sensor and by natural variability within a scene is quantified as the vector of errors, e, which can be measured as the sum of squares of its elements or as the variance of the training data.

The linear mixture model assumes that each photon reaching the ground interacts with only one land cover type before being reflected back to the sensor. In a complex and non-random landscape the linear mixture model is a reasonable approximation.
The model is trained using the matrix of values M, generated either from ground truth information on ground cover proportions within selected pixels or from selected 'purest' pixels for each component. The effectiveness of the model depends on the degree of separation between the different signatures and on the level of noise present in the scene.

Rather than incorporating the noise and variability of the sample data into statistical or fuzzy limits around each cluster centroid, the linear mixture model uses the centroid vectors to define the spectral signatures of the pure land cover classes, and separates scene/sensor noise into a distinct error term.

The number of distinguishable land cover types is strictly limited by the number of spectral bands in the data.
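As a sketch of the unmixing step, the proportion vector f can be estimated by ordinary least squares, minimising the sum of squared elements of e. This is a simplified, unconstrained variant; the function name and the post-hoc clipping of out-of-range proportions (mirroring the reclassification of outliers discussed in the next subsection) are our own.

```python
import numpy as np

def linear_unmix(x, M):
    """Least-squares sketch of linear mixture modelling: solves
    x = Mf + e for the proportion vector f, minimising the sum of
    squared elements of the error term e.

    x : (n,) pixel signature in n spectral bands
    M : (n, c) matrix whose columns are the pure class signatures"""
    f, residuals, rank, sv = np.linalg.lstsq(M, x, rcond=None)
    f = np.clip(f, 0.0, 1.0)          # reclassify outliers to 0 or 1
    return f / f.sum()                # enforce proportions summing to 1
```

With orthogonal, well-separated pure signatures the recovered proportions match the true mixture; as the text notes, poorly separable signatures make the solution unstable.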
3.4.1 Applications with linear mixture modelling

The comparison mentioned above under the fuzzy c-means and softened MCL applications showed that the linear mixture model assigned extreme values to most of the pixels (Bastin 1997). One reason was that some pixels received proportion estimates outside the range from zero to one, so-called outliers. These outliers were reclassified to the minimum or maximum value respectively. The linear mixture model was fairly successful at picking up the general spatial patterns and gradations of the different cover classes, but it had problems with specific land cover signatures because of poor spectral separability (Bastin 1997).
3.5 Methods based on Bayesian probability theory

The prime use of methods based on Bayesian probability theory is to determine the extent to which mixed pixels exist in the image, and their relative proportions. The methods can only be used when complete information is available or assumed (Eastman 1997). They are based on Bayesian probability theory, which is the primary tool for evaluating the relationship between indirect evidence and the decision set. Bayesian probability theory allows us to combine new evidence about a hypothesis with prior knowledge to arrive at an estimate of the likelihood that the hypothesis is true. The basis for this is Bayes' theorem, which states that,
$$p(h \mid e) = \frac{p(e \mid h) \cdot p(h)}{\sum_i p(e \mid h_i) \cdot p(h_i)}$$

where:

p(h|e) = the probability of the hypothesis being true, given the evidence (posterior probability)

p(e|h) = the probability of finding that evidence, given that the hypothesis is true

p(h) = the probability of the hypothesis being true regardless of the evidence (prior probability)
The variance/covariance matrix derived from training site data is what allows one to assess the multivariate conditional probability p(e|h). This quantity is then modified by the prior probability of the hypothesis being true, and normalized by the sum of such considerations over all classes. This modification and normalization step is important because it assumes that the only possible interpretations of a pixel are the classes for which training site data have been provided. Thus even weak support for a specific interpretation may appear to be strong if it is the strongest of the possible choices given. Even if the reflectance data for a certain class in a pixel give only very weak support, the pixel is treated as unequivocally belonging to that class (p = 1.0) if no support exists for any other class. The method therefore admits no ignorance and is a confident classifier.
The posterior probability p(h|e) is the same quantity that the maximum likelihood module evaluates to determine the most likely class.

If a pixel has posterior probabilities of 0.72 and 0.28 of belonging to two classes, this would be interpreted as evidence that the pixel contains 72 % of the first class and 28 % of the second. But this requires that the classes for which training site data have been provided are the only ones that exist, and that the conditional probability distributions p(e|h) do not overlap in the case of pure pixels. These requirements may in practice be very difficult to meet.
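The normalisation behaviour described above is easy to demonstrate numerically. The following is a toy sketch (the function name is our own) of Bayes' theorem over a set of mutually exclusive, exhaustive classes:

```python
def bayes_posteriors(likelihoods, priors):
    """Bayes' theorem over mutually exclusive, exhaustive classes:
    p(h_i|e) = p(e|h_i) p(h_i) / sum_j p(e|h_j) p(h_j).
    Because of the normalisation, even uniformly weak evidence yields
    confident posteriors - the classifier admits no ignorance."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]
```

With likelihoods (0.001, 0.0) and equal priors, the posteriors become (1.0, 0.0): the very weak evidence is treated as unequivocal support, because no support exists for the other class.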
3.5.1 Applications with Bayesian probability theory

No reports on studies using Bayesian probability theory have been found.
3.6 Methods based on Dempster-Shafer theory

The methods based on Dempster-Shafer theory are the most important of the soft classifiers. The theory allows for the expression of ignorance in uncertainty management (Eastman 1997). The prime use of these methods is to check the quality of one's training site data and the possible presence of unknown classes.

The basic assumptions of Dempster-Shafer theory are that ignorance exists in the body of knowledge, and that belief in a hypothesis is not necessarily the complement of belief in its negation.

The degree to which evidence provides concrete support for a hypothesis is known as belief, and the degree to which the evidence does not refute the hypothesis is known as plausibility. The difference between the two is the belief interval, which acts as a measure of uncertainty about a specific hypothesis.

If the evidence supports one class to the degree of 0.3 and all others to 0.0, Dempster-Shafer theory would assign a belief of 0.3 to that class and a plausibility of 1.0, yielding a belief interval of 0.7. Furthermore, it would assign a belief of 0.0 to all other classes and a plausibility of 0.7.
Dempster-Shafer theory defines hypotheses in a hierarchical structure and will accept all possible combinations of hypotheses, see figure 3. Combinations of hypotheses are accepted because it often happens that the evidence we have supports some combination of hypotheses without the ability to further distinguish its subsets. The basic classes are called singletons and the combinations non-singletons.

[A,B,C]
[A,B] [A,C] [B,C]
[A] [B] [C]

Figure 3. The Hierarchical Structure of the Subsets in the Whole Set [A,B,C] (based on Eastman 1997).
When evidence provides some degree of commitment to one of these non-singleton classes, and not to any of its constituents separately, this commitment is called a Basic Probability Assignment (BPA). Thus, belief in a non-singleton class is the sum of the BPAs for that class and all its sub-classes; belief represents the total commitment to all members of a set combined. The non-singletons represent mixtures and might be used for a more detailed examination of sub-pixel classification.
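A small sketch of these definitions (the representation of hypotheses as frozensets and the function name are our own):

```python
def belief_and_plausibility(bpa, hypothesis):
    """Dempster-Shafer sketch. bpa maps frozensets of classes
    (singletons and non-singletons) to basic probability assignments.
    Belief(H) sums the BPA of every subset of H; plausibility(H) sums
    the BPA of every set that intersects H."""
    h = frozenset(hypothesis)
    belief = sum(m for s, m in bpa.items() if s <= h)   # subsets of H
    plaus = sum(m for s, m in bpa.items() if s & h)     # sets meeting H
    return belief, plaus
```

This reproduces the example from section 3.6: with m([A]) = 0.3 and the remaining 0.7 assigned to the whole set [A,B,C], the hypothesis [A] gets belief 0.3 and plausibility 1.0 (belief interval 0.7), while [B] gets belief 0.0 and plausibility 0.7.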
3.6.1 Applications with Dempster-Shafer theory

Dempster-Shafer theory was used in a study of fire scars in Indonesia by Fuller and Fulky (1998). The algorithm was chosen because each AVHRR pixel had a resolution of 1.1 km and could contain several patches of fire scars. The only training sites used were for fire scars; all other classes were to be 'other' in the classification. Some surface temperature layers and vegetation change detection layers were used as input. No rigorous ground validation could be done, owing to the lack of ground data at a scale meaningful to the AVHRR analysis. But compared with the other method used in the study, a multiple threshold approach, the Dempster-Shafer method managed to extract more fire scars and came closer to the result produced by the Center for Remote Sensing, Imaging and Processing (CRISP) of the National University of Singapore. The CRISP analysis relied on visual interpretation of 766 SPOT quicklook images mapped to 100 m spatial resolution.
3.7 Neural Network

Neural networks learn by example (Kanellopoulos 1997). From the training set the network learns the values of its internal parameters. Neural networks usually deal with large amounts of training data (i.e. thousands of samples), whereas statistical methods use much smaller training sets. The goal is the representation of complicated phenomena. A neural network does not make any explicit a priori assumptions about the data.
3.7.1 The Multilayer Perceptron Classifier using feed-forward and back-propagation

The multilayer perceptron classifier (MLP) is the most widely used neural classifier today. These networks are general-purpose, flexible, non-linear models consisting of a number of units organised into multiple layers (Kanellopoulos 1997). Varying the number of layers and the number of units in each layer changes the complexity of the MLP. These networks are valuable tools for problems where one has little or no knowledge about the form of the relationship between the input vectors and their corresponding outputs. The principle of the multilayer perceptron neural network is that processing elements, or nodes, divided into layers perform calculations until an output value is computed at each of the output nodes (Foody 1996). The data presented to the input layer are the satellite channel values used in the classification procedure. One or more hidden layers then perform the calculations and send the results to the output layer, which has one node for each class. Every node in a layer is connected to every node in the layers above and below, see figure 4. The connections carry weights, which encapsulate the behaviour of the network and are adjusted during training.
Figure 4. The Architecture of Multi-Layer Perceptron (based on Kanellopoulos 1997). The network consists of an input layer receiving the input pattern feature values, one or more hidden layers, and an output layer with one node per output class, with connection weights between the layers.
For the hidden layers, the input to each node is the sum of the scalar products of the incoming vector components with their respective weights,

$$\mathrm{input}_j = \sum_i w_{ji} \, \mathrm{out}_i$$

where $w_{ji}$ is the weight connecting node i to node j and $\mathrm{out}_i$ is the output from node i.

The output of a node j is,

$$\mathrm{out}_j = f(\mathrm{input}_j)$$

The function f denotes the activation function of each node. The most frequently used is the sigmoid activation function,

$$f(x) = \frac{1}{1 + \exp(-x)}$$

where x = input_j. This ensures that the node acts like a thresholding device.

The network learns iteratively using the back-propagation algorithm, which minimizes the mean square error between the network's output and the desired output,

$$E = \frac{1}{2P} \sum_P \sum_k (d_k - \mathrm{out}_k)^2$$

where P are all input patterns and $d_k$ is the desired output.
It compares the calculated output with the desired one, and adjusts the weights attached to the connections, until the difference between the outputs is reduced to an acceptable level and the set of weights is stable.

The weights are initially small and random, and are then adjusted by the generalised delta rule,

$$w_{kj}(t+1) = w_{kj}(t) + \eta \, (\delta_k \, \mathrm{out}_k)$$

where $w_{kj}(t+1)$ and $w_{kj}(t)$ are the weights connecting nodes k and j at iterations (t+1) and t respectively, and η is a learning rate parameter.

For nodes in the output layer,

$$\delta_k = (d_k - \mathrm{out}_k) \, f'(\mathrm{input}_k)$$

and for nodes in the hidden layers,

$$\delta_j = f'(\mathrm{input}_j) \sum_k \delta_k w_{kj}$$
To measure the generalisation ability it is common to have one set of data to train the network and a separate set to assess its performance. Once the neural network has been trained, the weights are saved to be used in the classification phase.

The activation level of an output unit is positively related to the strength of membership in the class associated with that unit and lies on a 0-1 scale. These levels are significantly correlated with the sub-pixel land cover composition and may be mapped as fraction images.

The classifier has considerable problems separating classes that are very similar. A priori knowledge of the relative class distribution could be used to apply non-random weights to the network inputs to further distinguish similar classes.

Another disadvantage is that the back-propagation method is extremely slow. About a thousand iterations need to be performed before the network converges to a solution, and sometimes no solution is found (Augusteijn and Warrender 1998).
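The equations above can be sketched as a tiny network. This is our own minimal illustration with one hidden layer and no bias terms, not any of the architectures used in the cited studies; names and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    """Sigmoid activation: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """One-hidden-layer perceptron trained with the generalised delta
    rule (plain back-propagation), following the equations above."""
    def __init__(self, n_in, n_hidden, n_out, eta=0.5):
        # weights start small and random, as in the text
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.W2 = rng.normal(0.0, 0.5, (n_out, n_hidden))
        self.eta = eta

    def forward(self, x):
        self.x = x
        self.h = f(self.W1 @ x)         # hidden activations
        self.o = f(self.W2 @ self.h)    # outputs on a 0-1 scale
        return self.o

    def backward(self, d):
        # output layer: delta_k = (d_k - out_k) f'(input_k),
        # with f'(input) = out (1 - out) for the sigmoid
        delta_o = (d - self.o) * self.o * (1.0 - self.o)
        # hidden layer: delta_j = f'(input_j) sum_k delta_k w_kj
        delta_h = self.h * (1.0 - self.h) * (self.W2.T @ delta_o)
        # generalised delta rule: w(t+1) = w(t) + eta * delta * out
        self.W2 += self.eta * np.outer(delta_o, self.h)
        self.W1 += self.eta * np.outer(delta_h, self.x)
```

Repeatedly calling `forward` and `backward` on a small training set reduces the mean square error E between the network output and the desired output.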
3.7.2 Applications with Multilayer Perceptron Classifier

Several applications have been performed with the multilayer perceptron classifier. Howald (1989) classified Landsat TM data into seven land cover classes using a three-layer network, with a slightly better overall classification accuracy than a maximum likelihood classifier. Kanellopoulos et al. (1991) used a four-layer neural network for classifying multi-temporal SPOT multispectral imagery into twenty land cover classes; owing to the complexity of the classification, a relatively large network architecture was required and the computation time was therefore quite long. Gualtieri and Withers (1988) used clustering of multispectral data. Ryan et al. (1991) reported the use of a neural network to categorise small blocks of pixels to extract shoreline features from images.

Foody and Boyd (1999) used a neural network to derive fuzzy classifications of land cover along a transect crossing the transition from moist semi-deciduous forest to savannah in West Africa in February and December 1990. They used NOAA AVHRR data for the study, and the training data were acquired from core regions of each class. The AVHRR provides five spectral channels, so the network had five input units. The architecture also consisted of five hidden units in a single layer and four output units, and a logistic sigmoid transfer function was used. The initial network weights were set randomly between the limits of ± 0.5, and the learning rate and momentum were set to 0.1 and 0.9 respectively. When 1000 iterations of
the stochastic back-propagation learning algorithm had been performed, the RMS error was 0.096 for the February data set and 0.117 for the December data set. The method was able to represent both gradual and sharp changes in land cover, such as those associated with the forest-savannah and savannah-water boundaries respectively. It also enabled important properties of the boundaries to be inferred, for instance that the width of the forest-savannah transitional zone remained relatively constant, but had migrated significantly, between the two dates.
3.7.3 Neural Network using Cascade-correlation

Improvements have been made on back-propagation, since it is so slow. One of these improved architectures is cascade-correlation (Augusteijn and Warrender 1998). Cascade-correlation is a feed-forward neural network that builds its internal structure incrementally during training. The initial network consists of only two layers, an input layer and an output layer, which are completely connected. These connections are trained until no more significant changes occur between iterations. If the total training error is still too high, a hidden node is allocated and trained to further reduce this error. This goes on until the termination criterion is reached, or until a maximum number of hidden nodes is reached and the conclusion is drawn that the network cannot learn the problem.
3.7.4 Applications with Cascade-correlation

A study by Augusteijn and Warrender (1998) investigated the ability of a cascade-correlation neural network to delineate upland and forested wetland areas, and to distinguish between different levels of wetness in a forested wetland. They used NASA's Airborne Terrestrial Applications Sensor (ATLAS) multispectral data and Airborne Synthetic Aperture Radar (AIRSAR) data. The input values used for training and testing the neural network classifier were not simply the digital number (DN) values indicating the reflectance of each pixel in a given band, but averages of DN values calculated over 3 pixel by 3 pixel neighbourhoods inside the training and test polygons. The same sample size had to be used in each category for the network to learn all categories equally well. A five-class categorization was tried first, but three of the five classes were too similar to be distinguishable, so they were merged. Several networks were trained using different band combinations. Each network was always trained five times on the same data set, each time using a different selection of random initial values for the connection strengths, to reduce the influence of a specific set of initial values on network performance. Test sample agreement scored well over 80 per cent in most cases.
3.7.5 Fuzzy ARTMAP

Fuzzy ARTMAP achieves a synthesis of fuzzy logic and ART networks, where ART stands for Adaptive Resonance Theory (Mannan et al. 1998). The architecture of fuzzy ARTMAP consists of four layers of neurons: the input, category, mapfield and output layers. The values of the spectral bands and their complements, of dimension n, are the input to the first layer, which consists of 2n neurons. The category layer starts with one neuron, but grows in number as the learning proceeds. The output and mapfield layers consist of as many neurons as there are classes. Two vigilance parameters, ρ1 and ρ2, control the operation during the learning and operational phases of the network.
The learning phase is an iterative process where for each input a category choice is
calculated according to

    S = |A ∧ W1| / (α + |W1|)

where ∧ denotes the component-wise minimum (fuzzy AND), |·| the sum of a vector's
components, A is the input feature vector and W1 is the weight vector between the input
layer and a node in the category layer.
For the node which has the largest value of S, the match ratio at the mapfield is calculated by

    Rm = |B ∧ W2| / |B|

where B is the output class vector and W2 is the weight vector between the chosen category
layer node and the mapfield layer.
If the Rm value is larger than ρ2, the weights are changed and a new input is processed. The
learning process of the mapfield and category layer weights is called resonance, and the
weights are calculated by

    W1(new) = β(A ∧ W1(old)) + (1 − β) W1(old)
    W2(new) = β(A ∧ W2(old)) + (1 − β) W2(old)

If not, ρ1 is set to Rc for that input, where Rc is calculated by

    Rc = |A ∧ W1| / |A|

For the nodes which have Rc ≥ ρ1, it is checked whether any has Rm ≥ ρ2. If so, the
weights are changed and a new input is processed; otherwise a new node is committed, its
weights are changed and a new input is processed. This goes on until all training samples are
exhausted and the category layer nodes stop growing, or until the number of iterations
exceeds T, a chosen positive constant.
After this process a score S is calculated for each of the committed nodes. The node in the
output layer which corresponds to the node with the largest value of S indicates the category
of the input pixel.
α can be chosen around 0.01; β can be set to 1.0 in the beginning of the learning phase and
to a smaller value later. ρ1 and ρ2 are set to 1.0 for the most accurate results.
Compared to the maximum-likelihood and multilayer perceptron (MLP) classifiers there is
an advantage of about 5 % in the overall accuracy (Mannan et al. 1998). The method is
stable, easy to use, has a smaller number of parameters to manage, and is faster than MLP.
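The category choice and resonance update above can be sketched in Python as follows. This is a minimal illustration of the two formulas only, not a complete ARTMAP training loop; the function names are my own, and the fuzzy AND is taken as the component-wise minimum with |·| as the sum of components.

```python
import numpy as np

def category_choice(A, W1, alpha=0.01):
    """Choice value S for every committed category node.
    A: input feature vector; W1: (nodes, features) weight matrix."""
    match = np.minimum(A, W1)                     # fuzzy AND: component-wise minimum
    return match.sum(axis=1) / (alpha + W1.sum(axis=1))

def resonance_update(A, w, beta=1.0):
    """Move one node's weight vector toward A ∧ w; beta = 1.0 is fast learning."""
    return beta * np.minimum(A, w) + (1.0 - beta) * w
```

With β = 1.0 the new weights are simply A ∧ w, which is why early learning is fast and later a smaller β makes the weights change more cautiously.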
3.7.5 Applications with Fuzzy ARTMAP
Six multispectral images acquired by the Linear Imaging Self-scanning Sensor (LISS-II)
camera of the Indian Remote Sensing Satellite (IRS-1B) were used in an experiment with
fuzzy ARTMAP classification by Mannan et al. (1998). The training samples were selected
by visual interpretation of the scenes. In order to preserve the samples' relative values, the
grey levels were normalized. 40 per cent of the samples were used in the training phase and
the rest were applied in the operational phase to assess the accuracy. The four spectral band
pixel grey values and their complements were input to the network. In the output binary
vector, the bit belonging to the input's class was set to 1 and the rest to 0.
All weights were initially set to 1, but decreased as the learning progressed. The result was
about 5 per cent better overall accuracy compared to the multilayer perceptron and
maximum-likelihood classifiers. In this experiment 7483 pixels were correctly classified and
397 misclassified.
4. Data and methods
4.1 Introduction
This chapter describes the study area and the data that have been used in the classification.
Two different software packages have been used to manage the problem, and the method is
described.
4.2 Data
4.2.1 Satellite data
For the case area, only one Indian IRS-1C image is available and that one is very cloudy.
Since IRS-1C WiFS only has two spectral bands, at least two images are needed to get a
reliable classification. The choice has therefore fallen on images from the Russian
RESURS-O1 #3 with the MSU-SK sensor. Several RESURS images are available for the
study area, but only three are more or less cloud-free. These are from July and August 1995
and September 1996. These RESURS images have five spectral bands, but only four can be
acquired in parallel. Of these four spectral bands, only three are sufficiently noise-free to be
used: the red band and the two near-infrared bands, see figure 5. Only seven bands are
allowed in the software, so the image from August 1995 is excluded.
The study area constituted only a small part of each image and was therefore extracted to
reduce the amount of data. The area was 22 km × 44 km.
Figure 5. RESURS multitemporal colour composite of the first near-infrared band, the red
band and the second near-infrared band from July 1995 (RGB) over the study area.
4.2.2 Reference data
As reference data for training and evaluation, the existing land cover data from the pilot
production of the Swedish CORINE land cover over the Gripen area was used, see figure 6.
CORINE stands for 'Co-ordinating Information on the Environment' and was an EU
programme that began in 1985. This land cover is mainly based on information from digital
topographic maps and on interpretation and classification of Landsat TM images with 30 m
resolution, resampled to a pixel size of 25 m.
In most of Europe the CORINE land cover contains 44 classes, but Sweden has added 11
classes to better fulfil the national needs (SSC 1998).
Figure 6. The reference image, the Swedish CORINE land cover.
The main classes are produced as follows:
♦ The wetland classes are mapped using masks from the digital topographic map and by
classifying satellite data. The map provides the spatial limits and the satellite data is used
to update the masks for wet mires and dry mires and for division of those classes. Mires
near a lake or the sea are, by a GIS operation, named inland or coastal marshes.
♦ The water bodies are classified by thresholding in TM band 5, where that class has low
digital values. The classified water is then subtracted from the water mask in the
topographic map. In this way floating vegetation and lakes overgrown with reeds are
obtained, which can be candidates for the inland marshes class.
♦ Forest is classified using codes from National Forest Survey field data. For given
coordinates, these codes say what kind of forest grows there. For the same coordinates,
spectral signatures are extracted. In this way the signatures for different kinds of forest
are known and the image can be classified under the forest mask. Filtered signatures are
used.
♦ Clear-cut forest is extracted from an image showing change, which is produced using
histogram matching between the same channel in each scene. Filtered signatures are used
for classification (nearest-neighbour).
♦ Urban boundaries are interpreted on screen using the Statistics Sweden urban database
and satellite images as background. Buildings within 200 m are included.
♦ Agricultural areas are taken from the digital topographic map, where it exists, as in my
study area.
The forest mask also contains the roads. These roads are mostly classified as forest
regeneration or clear-cut forest. Narrow and long areas of these classes which coincide with
roads narrower than 7 meters are allotted to the surrounding class. Then all areas in the image
that were smaller than 2 pixels were removed.
The resulting image was again resampled, now to 12.5 m.
4.3 Study area
4.3.1 General description of the Baltic Sea region landscape
The Baltic Sea region is the area (i.e. the countries) surrounding the Baltic Sea. From the
vegetation's point of view the countries are very young, in particular Sweden, because of the
glaciation. The immigration of plants has occurred in several steps and from different
directions (Bråkenhielm et al. 1998). The spreading of the plants is strongly dependent on
the climate and on their ability to compete with other species. Human activities, like
agriculture and forestry, have also affected the vegetation. Where a species grows is for
example a matter of the bedrock, i.e. the content of nutrients in the minerals and rock and
their ability to disintegrate and hold water. The disintegration gives rise to fine-grained soils,
which in turn help keep water, which in turn makes the chemical disintegration easier and
results in more nutrients. The bedrock has developed from magmatic, sedimentary and
metamorphic processes during billions of years. Different phases have contributed to bedrock
of varying composition. This, along with agricultural fields with groups of trees and the
variations in the terrain, makes the landscape patchy. How patchy a person or an animal
experiences the landscape depends on the person's frame of reference or on the animal's
field of view and habitat. A hawk, for example, does not perceive the same area as patchy as
a mouse does.
This patchy landscape makes it difficult to generate a good and reliable hard classification
from medium resolution satellite images, since they consist of pixels with sizes between 150
and 250 m. Such pixels seldom contain just one class. Therefore, the main task for this
project is to analyze whether soft classifiers can better show the real situation for such
satellite images.
The study area is very suitable for this task, since it contains all kinds of land covers, from
large agricultural fields with groves of trees to large forests with patches of agricultural
fields. In figure 7, it can be seen that there are several classes in a 150 m pixel and therefore
a hard classifier is not suitable.
The study area, called Gripen, is situated between lake Hjälmaren and the southern part of
the Baltic Sea bay Bråviken, in the southern part of Sweden. The area contains two big
cities, Örebro and Norrköping. Around these cities, the landscape is very flat with large areas
of arable land. The forested area between these cities is very patchy with many smaller lakes
and hills. The forest contains both deciduous and coniferous forest and a mixture of both
with no sharp boundaries.
There are also different kinds of wetlands, which are of special interest in the BALANS
project, since they act as filters of nutrients before the water runs into the Baltic Sea.
Figure 7. A part of the study area in Gripen showing the reference image with 12.5 m pixel
size and a 150 m grid to elicit the fuzziness in the medium resolution satellite image pixels
(adapted from http://www.grida.no/baltic/htmls/maps.htm).
4.3.2 Quantitative characterisation
The area distribution of each class in the reference image is: water 5.19%, urban 1.82%,
agricultural areas 20.08%, marsh 0.71%, deciduous forest 11.79%, coniferous forest 36.80%,
mire 2.42%, clear-cut forest 7.35%, rock 0.82%, barren land 0.63%, peat 0.44% and grass
3.51%. This results in 13% unknown classes when using eight classes and 9% unknown
classes when using twelve classes.
4.4 Tools
4.4.1 IDRISI
IDRISI is a geographic information and image processing software system developed by the
Graduate School of Geography at Clark University, Worcester, U.S.A. It covers the full
spectrum of GIS and remote sensing needs. The software has over 150 modules that provide
facilities for the input, display and analysis of geographic data. For this project the fuzzy
signature development and soft classifier modules have been especially important. But at
least this version, version 2.0 for Windows, has its limitations, which force the users to find
other solutions to their problems, and sometimes to import data from other systems, for
example already radiometrically and geometrically corrected images.
The module based on the logic of fuzzy sets in IDRISI is FUZCLASS. In that module, fuzzy
set membership is determined from the distance of pixels from the signature means. To use
this module, two parameters have to be set. The first is the z-score distance at which fuzzy
membership becomes zero. If a pixel has the same location as the class mean, the
membership grade is 1.0; away from this point the grade decreases until it reaches zero. The
second parameter is whether or not the membership values should be normalized.
Normalization makes the assumption that the given classes are the only ones existing, and
thus the membership values for all classes for a single pixel must sum to 1.0.
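The idea behind FUZCLASS can be illustrated as follows, here for a single spectral band and assuming a linear decrease of membership with z-score distance. This is a sketch of the concept only; IDRISI's actual distance function and implementation may differ.

```python
import numpy as np

def fuzzy_membership(pixels, means, stds, z_max=2.0, normalize=True):
    """Membership per class falls from 1.0 at the class mean to 0.0 at
    z_max standard deviations away (one band, linear decrease assumed)."""
    z = np.abs(pixels[:, None] - means[None, :]) / stds[None, :]
    mu = np.clip(1.0 - z / z_max, 0.0, 1.0)
    if normalize:                       # force memberships to sum to 1.0 per pixel
        s = mu.sum(axis=1, keepdims=True)
        mu = np.divide(mu, s, out=np.zeros_like(mu), where=s > 0)
    return mu
```

Without normalization a pixel far from every class mean gets low membership in all classes; with normalization the memberships are forced to sum to 1.0, which hides that uncertainty.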
The module using Dempster-Shafer theory in IDRISI is BELCLASS. In normal use,
Dempster-Shafer theory requires the classes under consideration to be mutually exclusive
and exhaustive. But in BELCLASS the pixel may belong to some unknown class, for which
a training site has not been provided. An additional category, called [other], is added to
every analysis. The result is consistent with Dempster-Shafer theory, but recognizes the
possibility that there may be classes present about which we have no knowledge. The
uncertainty is almost identical to that of Dempster-Shafer ignorance.
In BELCLASS you have two choices of output: beliefs or plausibilities. In either case, one
image of belief or plausibility is produced for each class. A classification uncertainty image
is also produced.
4.4.2 ERDAS IMAGINE
ERDAS Imagine is a complete GIS and image processing software produced by the
Engineering department of ERDAS, Inc., Atlanta, U.S.A. It is easy to use and has modules
for most problems. The limitation in this case is that it only has one fuzzy module. For this
project it has been used for the preparation of the images and for the extraction of pixel
information from the resulting images.
4.5 Methodology
The outline of the used method is shown in figure 8. A scheme showing the main action
steps is included as appendix 1. The produced macro files are listed by name in a scheme in
appendix 2.
4.5.1 Outline
Figure 8. The used methodology. [Flow chart: preprocessing of the reference image
(binarisation, thematic aggregation, spatial aggregation) and of the satellite images
(radiometric and geometric correction), followed by sampling of training sites, preparation
of training statistics, running the modules, and evaluation and comparison of soft classifiers.]
4.5.2 Initial data preprocessing
4.5.2.1 Reference data
4.5.2.1.1 Binarisation
The preprocessing of the reference image began with extraction of the clouds that were in
the RESURS images. This was done by using the mask made from the satellite images. To
make one image layer for each class, a reclassification was made. The result was a binary
image for each class.
4.5.2.1.2 Thematic aggregation
The reference image was reclassified to contain eight or twelve classes. These classes were
water, urban, agricultural field (agri), marshes (marsh), deciduous forest (decid), coniferous
forest (conif), wet/other mires (mire) and clear-cut forest (clear). The urban class consisted
of the classes coarse and dense city structures and smaller communities from the Swedish
CORINE land cover. The coniferous and deciduous classes consisted of conifers/deciduous
trees of different age and height. When twelve classes were used, rock, barren land (barr),
peat and grass were added.
The choices were based on the more detailed classes that the BALANS project wants to
extract. The BALANS project has envisaged at least five broad classes. These are:
• Artificial surfaces
• Agricultural areas
• Forests and semi-natural areas
• Wetlands
• Water bodies
More detailed classes envisaged, if possible, are urban fabric, arable land, inland wetlands,
and coniferous and deciduous forest. Therefore, a classification of these more detailed
classes was tried out too.
4.5.2.1.3 Spatial aggregation
The reference image contained pixels with a size of 12.5 meter. To be able to compare the
reference image with the RESURS images, which have a pixel size of 150 meter, an
aggregation was made in IDRISI. In the process, the average of twelve 12.5 m pixels in both
the X- and Y-direction was calculated and output as one 150 m pixel. That made the border
pixels of each class patch fuzzy, i.e. they did not contain 100% of that class any more. One
such layer was made for each class, and these were seen as the 'true' fuzzy classes, since
now the percentage of fuzziness in each pixel was known.
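The block averaging described above can be sketched in numpy as follows. This illustrates the principle only, not the IDRISI module itself; the function name is my own.

```python
import numpy as np

def aggregate(mask, factor=12):
    """Block-average a binary class mask so each output pixel holds the
    class fraction: 12 x 12 pixels of 12.5 m become one 150 m pixel."""
    h, w = mask.shape
    h, w = h - h % factor, w - w % factor        # drop incomplete edge blocks
    blocks = mask[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```

A 150 m pixel that straddles a class boundary then gets a fractional value, which is exactly the fuzziness described above.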
4.5.2.2 Satellite data
4.5.2.2.1 Radiometric correction
Since RESURS images from two different dates were used, they had different illumination.
Therefore the image which visually seemed to be the best was set as reference in this case.
Each band in the other image was histogram-matched against the corresponding band in the
'reference' image.
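Histogram matching of one band against the corresponding reference band can be sketched as follows, by mapping grey levels so that the cumulative histograms agree (a minimal illustration; image processing packages offer equivalent built-in functions).

```python
import numpy as np

def histogram_match(source, reference):
    """Map the grey levels of one band so its cumulative histogram
    follows that of the same band in the reference image."""
    s_vals, s_idx, s_cnt = np.unique(source.ravel(), return_inverse=True,
                                     return_counts=True)
    r_vals, r_cnt = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_cnt) / source.size
    r_cdf = np.cumsum(r_cnt) / reference.size
    matched = np.interp(s_cdf, r_cdf, r_vals)    # look up reference grey levels
    return matched[s_idx].reshape(source.shape)
```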
4.5.2.2.2 Geometric correction
The RESURS images did not have any known projection and did not overlay each other. In
one image there was also one band that did not overlay the other bands. So the images were
geometrically corrected image to image by taking ground control points (GCPs). The RMS
errors were at most about 25 m, that is about one sixth of a pixel. The images were then
resampled with cubic convolution and corrected against the reference image.
The reference image was projected with Transverse Mercator and had the WGS 84 datum
and spheroid, so after the correction against the reference image the RESURS images
received the same projection.
A cloud mask was created by taking training sites in the clouds and in other 'close' classes.
A supervised classification with maximum likelihood was performed and the output classes
were tested to see which were actually clouds. These clouds were marked in an
'area-of-interest' layer and cut out in all images.
4.5.3 Choice of soft classifiers
The software available for soft classification was IDRISI. This software included soft
classifiers using Bayesian probability theory and Dempster-Shafer theory, reviewed earlier.
It also included one method where a distance from the class mean is set at which the fuzzy
membership of that class becomes zero.
4.5.4 Application of soft classifiers
4.5.4.1 Sampling of 'training data' from reference data
Different training data samplings were examined.
♦ The first sampling was to take random sample points in the aggregated masks as training
sites. In this sampling, as well as in samplings two and three, different points were
sampled for each class.
♦ The second sampling was to make majority masks from the aggregated ones, where only
the pixels with class values of 75-100% (90-100% for water) were included, to receive
purer sample points and to see if there was any improvement.
♦ The third sampling was to first (in ERDAS Imagine) manually take training sites within
the wetland mask to see where the wetland had a signature separable from the other
classes. Then a mask from these wetland areas was made under the whole wetland layer
and used as input to the aggregation in IDRISI. The sample points were taken in the
majority masks.
♦ The fourth sampling was to take random sample points in the whole image and divide
them into eight equally large classes. In this case the points for a specific class could be
sampled where that class did not have any pixel values. Most of the time the training
samples for a specific class did not contain a majority of that class. All training samples
were quite similar.
4.5.4.2 Preparation of training statistics
Each class' training sites were seen as one training site group. After the sampling step, every
training site group was compared to the aggregated masks to see how much of each class
there was in every training site group. An average value was calculated for each class for
each training site group, i.e. the output is an average value of the content of water in the
urban training site group, the content of water in the agricultural training site group, etc.
These average values were imported into the fuzzy partition matrix and linked to the image
with the training site groups. The fuzzy partition matrix indicates the membership grades of
each training site group in each class (Eastman 1997). After the linking to the training site
groups image, one image for each class is created showing the membership grade in each
training site group. The fuzzy signatures are created by giving each training site group a
weight proportional to its membership grade when determining the mean, variance and
covariance of each satellite spectral band for each class.
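The weighted signature statistics described above can be sketched as follows for one class. The function name is illustrative, not an IDRISI function; each training pixel's membership grade acts as its weight in the mean and covariance.

```python
import numpy as np

def fuzzy_signature(bands, weights):
    """Weighted mean and covariance of spectral bands for one class.
    bands: (pixels, bands) array; weights: membership grade per training pixel."""
    w = weights / weights.sum()                  # normalize weights to sum to 1
    mean = w @ bands
    centered = bands - mean
    cov = (w[:, None] * centered).T @ centered   # weighted covariance matrix
    return mean, cov
```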
4.5.4.3 Running the modules
After the creation of the fuzzy signatures, the different modules were tested. Since the
proportion of each class in the reference image is known, a prior probability can be input as
a weight to improve the result.
4.5.4 Evaluation and comparison of soft classifiers
4.5.4.1 Introduction to thematic accuracy assessment of soft classifiers
The ground data can rarely be assumed to be error free. Testing a classification is therefore
an evaluation of the level of agreement or correspondence between two sets of class
allocations, both of which have their own error characteristics, rather than a quantification of
the accuracy.
In hard classification, accuracy assessment is generally made with the use of an error matrix.
Error matrices compare, on a class-by-class basis, the relationship between known reference
data and the result of a classification. Such matrices are square, with as many rows and
columns as there are classes to be assessed. Several characteristics are expressed by the error
matrix, for example the error of omission (exclusion) and the error of commission
(inclusion). From these, the producer's accuracy, user's accuracy and overall accuracy can
be calculated. The producer's accuracy tells how much of, for example, the coniferous class
is correctly classified, and the user's accuracy tells how much of the classified category
coniferous truly is coniferous in reality.
Another measure to extract from the error matrix is the kappa statistic, which is a measure
of the difference between the observed accuracy and a chance agreement between the
reference data and a random classifier.
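These measures can be computed from an error matrix as sketched below. This is a minimal illustration; note that the row/column convention (classified versus reference) varies between texts, and here rows are the classified data.

```python
import numpy as np

def accuracies(m):
    """Producer's, user's and overall accuracy from an error matrix
    (rows: classified data, columns: reference data)."""
    m = np.asarray(m, dtype=float)
    d = np.diag(m)
    producers = d / m.sum(axis=0)      # 1 - error of omission, per reference class
    users = d / m.sum(axis=1)          # 1 - error of commission, per classified class
    overall = d.sum() / m.sum()
    return producers, users, overall

def kappa(m):
    """Kappa statistic: agreement beyond what chance would give."""
    m = np.asarray(m, dtype=float)
    po = np.diag(m).sum() / m.sum()                      # observed accuracy
    pe = (m.sum(axis=0) @ m.sum(axis=1)) / m.sum() ** 2  # chance agreement
    return (po - pe) / (1.0 - pe)
```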
The problem is that the error matrix is only appropriate for use with hard classification
(Foody 1996). In soft classification one matrix would be received for each class, i.e. each
pixel would be compared, but each pixel would only be allowed to consist of one class and
consequently to be either correct or incorrect.
A number of approaches have been suggested to assess the accuracy of soft classification.
For example, entropy can be used (Zhang and Foody 1998). Entropy describes the variation
in the class membership probabilities associated with each pixel.
The entropy is calculated by

    H(p) = − Σ p(x) log2 p(x)

where p(x) are the class membership probabilities and the sum is taken over the classes x.
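The entropy of one pixel's membership probabilities can be computed as:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of one pixel's class membership probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # by convention 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())
```

A pixel classified confidently to one class (p = 1.0) gets entropy 0; a pixel split evenly over two classes gets entropy 1 bit.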
But entropy is only suitable when the ground data are hard. When both the classified data
and the ground data are fuzzy, cross-entropy is more appropriate (Zhang and Foody 1998).
Other possible indices of classification accuracy may be based on correlation analysis and
on distance measures between the representations of the land cover in the image
classification and in the ground data.
One other alternative is to degrade the data to contain just one or two classes in each pixel
and use conventional methods, but the result is then not an evaluation of the fuzzy
classification.
There are a number of points to note. Firstly, the predictive accuracy of partial values tends
to vary between the land cover classes being considered (Bastin 1997). Secondly, the
estimates of pixel composition give no reliable estimate of where in the pixel objects or
features are likely to be.
To deal with the second point, there are methods for sharpening the fuzzy classification
output (Foody 1998). This is done by fusion between a fine spatial resolution image and the
fuzzy classification derived at a coarser resolution.
Evaluating soft classifications is thus difficult, and more research is needed on better
methods. There is no general measure that says when a classification is good or bad.
4.5.4.2 Criteria and methods available
One method that can be used along the whole continuum of classification fuzziness, from
the case where every stage is hard to the case where every stage is fuzzy, is the RMS error
(root-mean-square error) between the estimated and actual class composition of the pixels
(Foody 1999),

    RMS = sqrt( Σi (xi − t)² / n )

where xi is a measurement, t is the true value and n is the number of data to be assessed.
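The RMS error can be computed directly from the estimated and true class proportions:

```python
import numpy as np

def rms_error(estimated, true):
    """RMS error between estimated and true class proportions."""
    x = np.asarray(estimated, dtype=float)
    t = np.asarray(true, dtype=float)
    return float(np.sqrt(((x - t) ** 2).mean()))
```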
4.5.4.3 A suggested new approach
One new measure can be to calculate how many pixels are under- or overclassified by a
certain percentage, in 10% intervals, for each module.
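The suggested measure can be sketched as a histogram of the per-pixel difference between estimated and true class proportion, binned in 10% intervals (the function name is my own):

```python
import numpy as np

def over_under_counts(estimated, true):
    """Count pixels per 10 % interval of (estimated - true) class proportion.
    Negative intervals are underclassified, positive ones overclassified."""
    diff = (np.asarray(estimated, dtype=float)
            - np.asarray(true, dtype=float)) * 100.0
    bins = np.arange(-100.0, 101.0, 10.0)        # -100 %, -90 %, ..., +100 %
    counts, _ = np.histogram(diff, bins=bins)
    return bins, counts
```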
4.5.4.4 Implementation
4.5.4.4.1 Validation data from reference data
In the evaluation, RMS errors were calculated for all samplings and modules to see how
well the classification performed for the classes. The training site pixels and the 'unknown'
areas in the reference image, i.e. classes that had not been used, were excluded, and then the
whole images were used in the validation. To be able to compare the modules and
samplings, a weighted total average RMS error was calculated. Each class' RMS error was
weighted with the reference area of the same class,

    Average_RMS_error = Σ RMS_error × percentage area in the reference image

Also the average value for each class, in percent, was calculated. This was done in two
ways, one compared to the reference image and one compared to the classified pixels. This
was done to be able to compare BELclass with BAYclass and the reference image, since
BELclass underclassifies and may classify just 10% of the image.
An analysis was made to see if the classification improved when more training pixels were
used, in the sampling where the pixels were randomly taken, using BELclass, and how
many classes could be extracted in the same sampling.
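The weighted overall RMS error above amounts to a simple weighted sum:

```python
def weighted_rms(rms_per_class, area_fraction):
    """Overall RMS error: each class's RMS weighted by its share
    of the reference area (fractions summing to 1)."""
    return sum(r * a for r, a in zip(rms_per_class, area_fraction))
```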
4.5.4.4.2 Statistical analysis
I used Microsoft Excel for the calculations.
5. Results
5.1 Introduction
This chapter contains the results in terms of RMS errors, over- and underclassified pixels
and average values.
5.2 Fuzzy signatures
Some of the eight chosen classes have very similar spectral signatures when using two
near-infrared bands and one red band. That is why the soft classifiers assign so many pixels,
with different degrees of probability, to the wrong classes. The mean values of each spectral
band for each class, for the sampling in majority masks, are shown in figure 9. There it can
be seen that coniferous, clear-cut forest, deciduous forest, mire and urban have mean values
close to each other, and therefore the training pixels can be mixed.
[Figure 9: line chart of the mean value (y-axis, −20 to 120) in each of the six spectral bands
(x-axis, 1-6) for the classes agri, clear, conif, decid, marsh, mire, urban and water.]
Figure 9. The mean values for the fuzzy signatures in the six spectral bands used (1-2 near
infrared from September 1996, 3-4 near infrared from July 1995, 5 red from July 1995 and 6
red from September 1996).
5.3 RMS errors
To be able to compare the different modules and samplings, each class' RMS error was
weighted with the percentage area of the reference image for the respective class, to retrieve
an overall RMS error, see table 2 below. This table shows that BAYclass gives the best
classification result, followed by FUZclass; BELclass is not far behind. The results differ
only a little between the modules and samplings, but they improve when the training pixels
get purer
and separable signatures are used. The modules performed differently well for the different classes, as seen in the more detailed tables 3-5, which show the RMS errors for each class, sampling and module. Tables 3 and 4 show that the RMS errors are reduced for the wetland classes when a mask with areas manually selected to be separable is used. A thematic class can often not be captured by a single spectral class, which is why clustering or supervised classification is needed to achieve separable classes within the masks.
The number of classes you try to extract in BELclass makes no difference to the RMS errors. This is because BELclass allows unknown classes; only the spectral signature counts, so one class is independent of another, see table 4. More classes in BAYclass, on the other hand, improve the RMS errors for most classes.
Table 2. Overall RMS errors for eight classes, weighted with the area of each class in the reference image, for the different modules and samplings (×100 for percent).

Overall RMS for the four samplings:
  1. training sites within aggregated mask
  2. within majority masks
  3. in majority mask and with wetland signatures
  4. randomly taken training sites

module      1       2       3       4
BAYclass    0.299   0.259   0.244   0.362
BELclass    0.404   0.400   0.398   0.436
FUZclass    0.304   0.283   0.258   0.373
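The area-weighted overall RMS of table 2 amounts to a weighted sum of the per-class RMS errors. A small sketch (class names, areas and RMS values below are illustrative, not the figures of this study):

```python
def overall_rms(class_rms, class_area_fraction):
    """Weight each class' RMS error by its share of the reference
    image and sum, giving one comparable figure per module/sampling."""
    assert abs(sum(class_area_fraction.values()) - 1.0) < 1e-9
    return sum(class_rms[c] * class_area_fraction[c] for c in class_rms)

# illustrative numbers only
rms = {"water": 0.11, "conif": 0.33, "agri": 0.23}
area = {"water": 0.10, "conif": 0.55, "agri": 0.35}
print(round(overall_rms(rms, area), 3))  # 0.273
```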
Table 3. RMS errors for BAYclass.

RMS for 8 classes (samplings 1-4, see table 2) and for 12 classes with randomly taken training sites:

class   1      2      3      4      12 cl., random
water   0.13   0.11   0.11   0.23   0.18
urban   0.16   0.15   0.15   0.19   0.13
agri    0.23   0.23   0.22   0.35   0.40
marsh   0.15   0.16   0.07   0.13   0.11
decid   0.21   0.21   0.22   0.24   0.24
conif   0.42   0.33   0.30   0.48   0.50
mire    0.21   0.20   0.12   0.20   0.19
clear   0.22   0.22   0.24   0.21   0.20
rock    -      -      -      -      0.16
barr    -      -      -      -      0.13
peat    -      -      -      -      0.07
grass   -      -      -      -      0.08
Table 4. RMS errors for BELclass.

RMS for 8 classes (samplings 1-4, see table 2) and for 12 classes with randomly taken training sites:

class   1      2      3      4      12 cl., random
water   0.19   0.20   0.20   0.23   0.22
urban   0.11   0.11   0.11   0.13   0.13
agri    0.41   0.41   0.41   0.46   0.46
marsh   0.08   0.08   0.03   0.08   0.08
decid   0.24   0.24   0.24   0.27   0.27
conif   0.56   0.55   0.55   0.59   0.59
mire    0.12   0.12   0.06   0.13   0.13
clear   0.22   0.22   0.22   0.22   0.22
rock    -      -      -      -      0.03
barr    -      -      -      -      0.06
peat    -      -      -      -      0.04
grass   -      -      -      -      0.06
Table 5. RMS errors for FUZclass.

RMS for 8 classes (samplings 1-4, see table 2):

class   1      2      3      4
water   0.13   0.11   0.11   0.22
urban   0.13   0.13   0.15   0.18
agri    0.22   0.21   0.18   0.37
marsh   0.16   0.15   0.04   0.16
decid   0.22   0.21   0.21   0.24
conif   0.44   0.40   0.36   0.50
mire    0.16   0.18   0.09   0.15
clear   0.21   0.22   0.24   0.21
5.4 Over- and underclassified pixels
Looking at the number of correctly classified pixels (±5%) in the summaries of tables 6 and 7, BELclass appears superior, with an average of 80% compared to 63% for BAYclass. But the values in these tables are not weighted with area, so the small classes, which contain few pixels but are well classified by BELclass, pull up the average. The under- and overestimated values for each class in the summary tables 6 and 7 are percentages of pixels, not total RMS errors. The detailed tables for each 10% interval are shown as tables 8 and 9.
Table 6. Percent pixels over- and underestimated for BELclass (×100 for percent).

Over-, correctly or underestimated for BELclass, sampling 2:
                 water  urban  agri   marsh  decid  conif  mire   clear   average
underestimated:  0.082  0.027  0.285  0.011  0.255  0.585  0.044  0.161   0.181
correct ±5%:     0.918  0.968  0.708  0.970  0.702  0.409  0.929  0.808   0.802
overestimated:   0.001  0.005  0.006  0.019  0.042  0.007  0.027  0.030   0.017
Table 7. Percent pixels over- and underestimated for BAYclass (×100 for percent).

Over-, correctly or underestimated for BAYclass, sampling 2:
                 water  urban  agri   marsh  decid  conif  mire   clear   average
underestimated:  0.026  0.014  0.140  0.009  0.187  0.486  0.029  0.109   0.125
correct ±5%:     0.907  0.848  0.820  0.823  0.522  0.354  0.423  0.341   0.630
overestimated:   0.067  0.139  0.040  0.169  0.291  0.160  0.548  0.550   0.245
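The ±5% tally behind tables 6 and 7 can be sketched as follows (the error array is illustrative; errors are classified proportion minus reference proportion per pixel):

```python
import numpy as np

def tally(errors, tol=0.05):
    """Fraction of pixels under-, correctly (within ±tol) and
    overestimated, given per-pixel errors (classified - reference)."""
    e = np.asarray(errors, dtype=float)
    under = float(np.mean(e < -tol))
    over = float(np.mean(e > tol))
    return under, 1.0 - under - over, over

# illustrative errors for six pixels of one class
print(tally([-0.2, -0.01, 0.0, 0.03, 0.10, -0.30]))
```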
BELclass systematically underclassifies, as seen in table 8 and summary table 6. Figures 10 and 11 visualize the clear-cut forest class from tables 8 and 9, to make it easier to see how many pixels are over- and underclassified with BAYclass and BELclass, and by how much.
Table 8. Percent pixels over- and underclassified for BELclass. For example, 5.1% of agricultural areas are underestimated by 84-95%.

Over- or underclassified, in 10% intervals, for BELclass, sampling 2:
interval                     water  urban  agri   marsh  decid  conif  mire   clear
100-94% underestimated       0.012  0.004  0.070  0.002  0.004  0.099  0.004  0.010
95-84%                       0.013  0.003  0.051  0.001  0.008  0.096  0.003  0.009
85-74%                       0.009  0.002  0.036  0.001  0.013  0.077  0.002  0.011
75-64%                       0.008  0.003  0.030  0.001  0.020  0.067  0.004  0.013
65-54%                       0.009  0.003  0.025  0.001  0.027  0.064  0.004  0.017
55-44%                       0.007  0.002  0.020  0.001  0.027  0.047  0.004  0.014
45-34%                       0.006  0.002  0.013  0.001  0.024  0.030  0.003  0.012
35-24%                       0.004  0.002  0.012  0.001  0.031  0.032  0.004  0.017
25-14%                       0.005  0.002  0.012  0.001  0.040  0.031  0.006  0.022
15-4%                        0.009  0.002  0.016  0.002  0.060  0.043  0.010  0.037
5% underest. - 5% overest.   0.918  0.968  0.708  0.970  0.702  0.409  0.929  0.808
6-15% overestimated          0      0.003  0.004  0.009  0.025  0.004  0.021  0.016
16-25%                       0      0.001  0.001  0.004  0.010  0.002  0.005  0.007
26-35%                       0      0      0      0.003  0.004  0      0.001  0.003
36-45%                       0      0      0      0.001  0.002  0      0      0.001
46-55%                       0      0      0      0.001  0.001  0      0      0.001
56-65%                       0      0      0      0      0      0      0      0.001
66-75%                       0      0      0      0      0      0      0      0
76-85%                       0      0      0      0      0      0      0      0
86-95%                       0      0      0      0      0      0      0      0
96-100%                      0      0      0      0      0      0      0      0
[Figure 10 is a bar chart, "Percent over- or underclassified pixels in the clear-cut forest class, classified with BELclass": percent pixels (y axis, 0.000-0.900) per 10% over/underclassification interval of table 8.]
Figure 10. Percent under- and overclassified pixels for the clear-cut forest class with BELclass.
Table 9. Percent pixels over- and underclassified for BAYclass. For example, 1.4% of agricultural areas are underestimated by 84-95%.

Over- or underclassified, in 10% intervals, for BAYclass, sampling 2:
interval                     water  urban  agri   marsh  decid  conif  mire   clear
100-94% underestimated       0.001  0      0.012  0      0      0.001  0      0
95-84%                       0.001  0.001  0.014  0      0.002  0.005  0.001  0.002
85-74%                       0.001  0.001  0.012  0      0.004  0.014  0.001  0.005
75-64%                       0.002  0.001  0.013  0      0.009  0.035  0.003  0.010
65-54%                       0.003  0.002  0.014  0.001  0.015  0.060  0.003  0.012
55-44%                       0.003  0.001  0.013  0.001  0.022  0.081  0.004  0.015
45-34%                       0.002  0.001  0.011  0.001  0.025  0.083  0.004  0.015
35-24%                       0.003  0.002  0.013  0.001  0.031  0.080  0.004  0.015
25-14%                       0.004  0.002  0.015  0.001  0.036  0.069  0.004  0.016
15-4%                        0.007  0.003  0.023  0.003  0.044  0.058  0.005  0.019
5% underest. - 5% overest.   0.907  0.848  0.820  0.823  0.522  0.354  0.423  0.341
6-15% overestimated          0.036  0.081  0.014  0.087  0.152  0.074  0.217  0.172
16-25%                       0.011  0.018  0.009  0.025  0.070  0.043  0.173  0.226
26-35%                       0.008  0.009  0.006  0.013  0.038  0.022  0.094  0.121
36-45%                       0.005  0.006  0.005  0.009  0.018  0.011  0.036  0.026
46-55%                       0.002  0.005  0.002  0.008  0.009  0.005  0.014  0.004
56-65%                       0.002  0.004  0.002  0.005  0.004  0.003  0.006  0
66-75%                       0.001  0.003  0.001  0.005  0.001  0      0.003  0
76-85%                       0.001  0.003  0.001  0.006  0      0      0      0
86-95%                       0      0.005  0.001  0.007  0      0      0      0
96-100%                      0      0.005  0      0.004  0      0      0      0
[Figure 11 is a bar chart, "Percent pixels under- or overclassified in the 'clear' class with BAYclass": percent pixels (y axis, 0.000-1.000) per 10% over/underclassification interval of table 9.]
Figure 11. Percent under- and overclassified pixels for the clear-cut forest class with BAYclass.
5.5 Average values
Looking at the average value, in percent of the reference image, for each class classified with BELclass in table 10, no sampling performs very well. This may be due to a low number of training sites: the software only allows 10 000 random points, while the small classes would need up to 40 000 points to get about 100 training pixels per class in the aggregated mask. The result would therefore probably improve if the training sites were collected manually and separable signatures extracted. BAYclass has somewhat higher occurrence and therefore comes closer to the true average value for some classes, but it also overclassifies many classes. That BAYclass has higher occurrence is not surprising, since this module assumes that the trained classes are the only ones that exist; if there are other classes in the image, it assigns them to the most similar class. BELclass, on the other hand, treats a pixel as belonging to an unknown class if the pixel does not support any of the given classes.
As seen in average table 11, BELclass classifies more of the image when more classes are added: 13% compared to 0.90% for eight classes. But the increase goes to the wrong classes.
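The class averages behind tables 10 and 11 amount to the mean of each class' soft membership image, expressed in percent. A sketch with a toy membership array (illustrative values only):

```python
import numpy as np

def class_percentage(membership):
    """Average soft membership over all pixels, in percent of the image.
    membership: 2-D array of per-pixel class proportions (0-1)."""
    return 100.0 * float(np.mean(membership))

# toy 2x3 membership image for one class
m = np.array([[1.0, 0.5, 0.0],
              [0.25, 0.0, 0.0]])
print(class_percentage(m))  # ≈ 29.17
```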
Table 10. Average values for 8 classes in percent of the reference image and in percent of the classified image.

Average values (in %) classified with BELclass for samplings 1-4 (each given as % of reference image / % of classified area), and with BAYclass and FUZclass in the majority mask (% of reference image):

class   reference  BELclass 1       BELclass 2       BELclass 3       BELclass 4       BAYclass  FUZclass
        image      ref. / class.    ref. / class.    ref. / class.    ref. / class.
water   6.03       0.98 / 7.88      0.78 / 6.90      0.75 / 6.44      0.03 / 2.8       5.81      6.16
urban   2.11       0.37 / 2.95      0.35 / 3.10      0.35 / 3.03      0.38 / 42.6      5.71      9.23
agri    23.3       4.04 / 32.57     2.98 / 26.46     3.15 / 27.21     0.00 / 0.4       15.47     17.03
marsh   0.82       0.82 / 6.65      0.46 / 4.12      0.01 / 0.08      0.36 / 39.9      6.35      7.40
decid   13.68      2.37 / 19.11     2.23 / 19.77     2.47 / 21.33     0.00 / 0.3       12.74     12.35
conif   42.71      2.84 / 22.93     3.07 / 27.23     3.18 / 27.47     0.00 / 0.0       25.18     18.30
mire    2.81       0.46 / 3.75      0.46 / 4.13      0.06 / 0.51     0.06 / 6.9       13.54     14.25
clear   8.53       0.52 / 4.16      0.93 / 8.29      1.61 / 13.94     0.06 / 7.1       15.20     15.28
Sum %:  100.00     12.40 / 100.00   11.26 / 100.00   11.58 / 100.00   0.90 / 100.0     100.00    100.00
Table 11. Average values for 12 classes in percent of the reference image and of the classified image.

Average values (in %) for 12 classes classified with BELclass and BAYclass, sampling 4 (random training sites):

class   reference  BELclass   BELclass         BAYclass
        image      % of ref.  % of classified  % of classified
water   5.67       0.16       1.16             12.07
urban   1.99       0.01       0.04             5.98
agri    21.93      0.00       0.00             5.68
marsh   0.77       7.81       56.84            5.23
decid   12.87      0.00       0.00             7.09
conif   40.19      0.00       0.01             10.39
mire    2.65       0.00       0.02             13.86
clear   8.03       0.01       0.08             8.72
rock    0.89       0.02       0.12             13.61
barr    0.69       3.47       25.22            6.62
peat    0.48       1.43       10.42            4.97
grass   3.83       0.84       6.08             5.80
Sum %:  100.00     13.74      100.00           100.00
As can be seen in table 12, the number of training pixels does not matter when they are taken randomly. This is because the fuzzy partitioning matrix in the software does not allow a non-square matrix, so an average value is calculated for each class.
Table 12. Comparison of RMS errors with 3 000 and 10 000 training sample points.

RMS for 8 classes classified with BELclass, randomly taken training sites:
class   RESURS 3000  RESURS 10000
water   0.23         0.23
urban   0.13         0.13
agri    0.46         0.46
marsh   0.08         0.08
decid   0.27         0.27
conif   0.59         0.59
mire    0.13         0.13
clear   0.22         0.22
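The square-matrix restriction described above effectively collapses all training sites of a class to one mean membership row. A sketch of that collapse (the membership values are hypothetical, not from the study):

```python
import numpy as np

def collapse_to_square(memberships):
    """Average the fuzzy membership rows of each class' training sites,
    leaving one row per class, i.e. a square class-by-class partition
    matrix as the software requires."""
    return {cls: np.mean(rows, axis=0) for cls, rows in memberships.items()}

# two training sites for 'water', one for 'mire' (columns: water, mire)
memberships = {"water": np.array([[0.9, 0.1], [0.7, 0.3]]),
               "mire":  np.array([[0.2, 0.8]])}
square = collapse_to_square(memberships)
print(square["water"])  # [0.8 0.2]
```

Whatever spread the individual training sites had is lost in the averaging, which is why many well-chosen training sites add little value here.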
Figures 12-37 show the reference images for all classes along with the result for each class for BAYclass and BELclass, for visual comparison. The agricultural, deciduous, coniferous and clear-cut forest classes are abbreviated in the figure captions.
Figure 12. The agri class from the reference image.
Figure 13. The agri class classified with BAYclass.
Figure 14. The agri class classified with BELclass.
Figure 15. The water class from the reference image.
Figure 16. The water class classified with BAYclass.
Figure 17. The water class classified with BELclass.
Figure 18. The urban class from the reference image.
Figure 19. The urban class classified with BAYclass.
Figure 20. The urban class classified with BELclass.
Figure 21. The mire class from the reference image.
Figure 22. The mire class classified with BAYclass.
Figure 23. The mire class classified with BELclass.
Figure 24. The decid class from the reference image.
Figure 25. The decid class classified with BAYclass.
Figure 26. The decid class classified with BELclass.
Figure 27. The clear class from the reference image.
Figure 28. The clear class classified with BAYclass.
Figure 29. The clear class classified with BELclass.
Figure 30. The conif class from the reference image.
Figure 31. The conif class classified with BAYclass.
Figure 32. The conif class classified with BELclass.
Figure 33. The marsh class from the reference image.
Figure 34. The marsh class classified with BAYclass.
Figure 35. The marsh class classified with BELclass.
Figure 36. Uncertainty image for BAYclass.
Figure 37. Uncertainty image for BELclass.
6. Discussion
The reason why water is classified surprisingly badly in all tryouts may be errors in the reference image. One lake in the reference image is practically covered with reed in the RESURS images. Training pixels in that lake can lower the training-site average for the class, and make at least BELclass refuse to classify water pixels that lie too far from the mean value. BAYclass handles the problem somewhat better, since it only chooses among the given classes. Coniferous forest and agricultural areas have the highest RMS errors, which may be because these classes are not spectrally homogeneous. Coniferous pixels may also have been classified as water, since water and coniferous forest are the darkest pixels in the image. One class that is surprisingly well classified is the urban class, which may be due to the use of time series.
There are many reasons why the classifiers perform unsatisfactorily. The major one is the input data. The RESURS images had only three, or in practice only two, spectral bands of good enough radiometric quality: the two near-infrared bands and the red band. Therefore two images from different dates were used, to have at least six bands. But in one of these images the spectral bands did not fit each other, so a geometric correction had to be made. All images were then geometrically corrected to each other and to the reference image. Small errors can still remain in the geometry, which affects both the classification and the evaluation.
The reference data is not really true either. It was developed using topographic maps to get the class boundaries, and for some classes the pixels were simply classified within the mask. Some other classes were extracted using GIS operations.
Since the reference image is based on interpretation, GIS operations and maximum-likelihood classification, the true percentage distribution is not really known. Whatever lies between 50-100% is set to 100%, so the assumptions about what really is 100% may not be true. Classes that are mixed in the reference image, for example mixed forest, are not considered in the validation; the content of each forest type in a pixel is unknown. But they have not participated in the training phase either.
7. Conclusions and recommendations
The advantages of soft classifiers are that small classes do not vanish as with maximum-likelihood classification, and that they give a measure of the occurrence of the classes that is not restricted to whole pixels.
To achieve an acceptable classification result, the training areas need to be spectrally separable. This can be accomplished with clustering or expert knowledge. Enough training areas also need to be collected to capture the spectral variation of each class.
If only red and near-infrared bands are available, a time series is required to get enough information. The images can be chosen close in time to get rid of clouds and, more importantly, from different vegetation periods for a better classification. The satellite images also need good geometric and radiometric quality.
Forest and agricultural areas are the hardest to classify because of their heterogeneous spectral signatures. Urban, on the other hand, is surprisingly well classified, which can be due to the time series. BAYclass is the soft classifier that gives the best result.
Some improvements are needed in the software before any of these soft classifiers can be used in production. The main drawback is that the fuzzy partition matrix has to be square: even if you manually take training sites with separable signatures, an average value has to be calculated over all training sites of a class, which reduces the value of having many training sites.
The second drawback, if the computer is still to take the training sites, is that not enough training sites can be taken for small classes: too few randomly spread points will hit the areas where a small class is located.
The third drawback, if images from different dates are to be used, is that the maximum number of spectral bands allowed in IDRISI is seven.
When these improvements are made, these soft classifiers can be a valuable tool for extracting more information from medium- or coarse-resolution satellite data.
References
Arora, M.K. and Foody, G.M. 1997, Log-linear modelling for the evaluation of the variables affecting the accuracy of probabilistic, fuzzy and neural network classification. International Journal of Remote Sensing 4:785-798
Augusteijn, M.F. and Warrender, C.E. 1998, Wetland classification using optical and radar data and neural network classification. International Journal of Remote Sensing 8:1545-1560
Bastin, L. 1997, Comparison of fuzzy c-means classification, linear mixture modelling and MLC probabilities as tools for unmixing coarse pixels. International Journal of Remote Sensing 17:3629-3648
Bråkenhielm, S., Drakenberg, B., Hellgren, M., Holmgren, P., Karltun, E., Melkerud, P-A. and Olsson, M. 1998, Markinfo. Institutionen för skoglig marklära, SLU, http://wwwmarkinfo.slu.se (visited October 15th 1999)
Eastman, J.R. 1997, IDRISI for Windows Version 2.0: User's Guide. Clark Labs, Clark University, Worcester, MA, USA
Euromap Satellitendaten-Vertriebsgesellschaft mbH, 1997, IRS-1C Handbook. http://www.euromap.de/ (visited October 10th 1999)
Foody, G.M. 1996, Approaches for the production and evaluation of fuzzy land cover classification from remotely-sensed data. International Journal of Remote Sensing 7:1317-1340
Foody, G.M. 1998, Sharpening fuzzy classification output to refine the representation of sub-pixel land cover distribution. International Journal of Remote Sensing 13:2593-2599
Foody, G.M. 1999, The continuum of classification fuzziness in thematic mapping. Photogrammetric Engineering & Remote Sensing 4:443-451
Foody, G.M. and Boyd, D.S. 1999, Fuzzy mapping of tropical land cover along an environmental gradient from remotely sensed data with an artificial neural network. Journal of Geographical Systems 1:23-35
Fuller, D. and Fulk, M. 1998, An Assessment of Fire Distribution and Impacts during 1997 in Kalimantan, Indonesia Using Satellite Remote Sensing and Geographic Information Systems. The George Washington University and Clark University
Gualtieri, J.A. and Withers, J. 1988, Goddard researchers simulate neural networks using parallel processing. NASA Information Systems Newsletter 15:12-15
Howald, K.J. 1989, Neural network image classification. Proceedings ASPRS-ACSM Fall Convention From Compass to Computer, Cleveland, 207-215
Hyyppä, J. 1999, WP 3200 State of the art in EO mapping, monitoring and data accessibility: Report on Earth Observation methods and data accessibility of medium-resolution satellite data. BALANS – Planning and management in the Baltic Sea Region with land information from EO, Environment and Climate Programme, Area 3.3: Centre for Earth Observation
Kanellopoulos, I., Varfis, A., Wilkinson, G.G. and Mégier, J. 1991, Classification of remotely-sensed satellite images using multi-layer perceptron networks. In Proceedings of the 1991 International Conference on Artificial Neural Networks (ICANN-91), Espoo, Finland (Kohonen, T., Mäkisara, K., Simula, O. and Kangas, J., eds.), Elsevier Science Publications (North-Holland) 2:1067-1070
Kanellopoulos, I. 1997, Use of Neural Networks for Improving Satellite Image Processing Techniques for Land Cover/Land Use Classification, SUPCOM'95 LOT 16. Provision of Statistical Services – Support to the Commission EUROSTAT Remote Sensing and Statistics Programme. Institute for Systems, Informatics and Safety (Neural Networks Laboratory) & Space Applications Institute (EMAP Unit - Advanced Methods Sector), European Commission, Joint Research Centre, I-21020 Ispra (VA), Italy, http://ams.egeo.sai.jrc.it/eurostat/Lot16-SUPCOM95/ (visited September 15th 1999)
Langaas, S. 1996, Transboundary European GIS databases: A review of the Baltic Sea region experiences. GISDATA Specialist Meeting on Geographic Information: the European Dimension. BALANS – Planning and management in the Baltic Sea Region with land information from EO, Environment and Climate Programme, Area 3.3: Centre for Earth Observation
Lillesand, T.M. and Kiefer, R.W. 1994, Remote Sensing and Image Interpretation. John Wiley & Sons, Inc., 587-618
Malmberg, U. and Furberg, O. 1997, Klassnings- och kompositmetoder för medelupplösande satellitdata, Pilotstudie med RESURS-data över centrala Östersjöområdet. Swedish Space Corporation, Rymdbolaget X-PUBL-11
Mannan, B., Roy, J. and Ray, A.K. 1998, Fuzzy ARTMAP supervised classification of multi-spectral remotely-sensed images. International Journal of Remote Sensing 4:767-774
Manyara, C.G. and Lein, J.K. 1994, Exploring the Suitability of Fuzzy Set Theory in Image Classification: A Comparative Study Applied to the Mau Forest Area, Kenya. ASPRS/ACSM, http://www.odyssey.maine.edu/gisweb/spatdb/acsm/ac94044.htm (visited September 16th 1999)
Maselli, F., Rodolfi, A. and Conese, C. 1996, Fuzzy classification of spatially degraded Thematic Mapper data for the estimation of sub-pixel components. International Journal of Remote Sensing 3:537-551
Olsson, B. 1999, BALANS Land Cover Information for the Baltic Sea Drainage Basin. BALANS – Planning and management in the Baltic Sea Region with land information from EO, http://www.ssc.se/rst/rss/BALANS/ (visited October 15th 1999)
Palubinskas, G. 1994, Post-processing of fuzzy classification of forests for Landsat TM imagery. EGIS Foundation
Pouncey, R., Flynn, T. and Curry, K. 1997, ERDAS IMAGINE Ver. 8.3.1 On-Line Manuals. ERDAS, Inc.
Ryan, T.W., Sementelli, P.J., Yuen, P. and Hunt, B.R. 1991, Extraction of shoreline features by neural nets and image processing. Photogrammetric Engineering and Remote Sensing 7:947-955
SSC Rymdbolaget, 1998, Produktspecifikation av landtäckedata version 1.0. SSC
SSC Satellitbild, 1999, RESURS-O1 Technical Specifications. http://www.ssc.se/sb/RESURS/technic.html (visited October 11th 1999)
Zhang, J. and Foody, G.M. 1998, A fuzzy classification of sub-urban land cover from remotely sensed imagery. International Journal of Remote Sensing 14:2721-2738
Appendix 1: Action steps for soft classification
This is an example for water.

Actions                                  Input                  Module    Output
Make a water mask in 12.5 m pixel size   reference_gripen       RECLASS   rcwater
Aggregate to a pixel size of 150 m       rcwater                CONTRACT  aggwater
Scale the pixel values to the range
1-100, but still real values             aggwater               SCALAR    scwater
Convert the pixel values to byte
binary                                   scwater                CONVERT   cvwater

For samplings 2 and 3:
Mask out 75-100% of a class
(90-100% for water)                      cvwater, cvdecid...    RECLASS   majwater, majdecid...
Convert the majority masks to
integer binary                           majwater, majdecid...  CONVERT   cv2water, cv2decid...
Take out sample points randomly
from the mask to be training sites       majwater, majdecid...  SAMPLE    vecwater, vecdecid...
Initialize images with the same
parameters as an aggregated image        aggwater               INITIAL   trswater, trsdecid...
Update the initialized images with
the rasterized sample points             vecwater, vecdecid...  POINTRAS  trswater, trsdecid...
Reclass the sample points to 1 and 0     trswater, trsdecid...  RECLASS   rc2water, rc2decid...
Overlay to extract only the sample
points that are within the mask          majwater, rc2water;    OVERLAY   ovwater, ovdecid...
                                         majdecid, rc2decid...
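The CONTRACT step above reduces the 12.5 m class mask to 150 m pixels, i.e. each coarse pixel averages a 12x12 block of fine pixels. A numpy sketch of that block aggregation (illustrative, not the IDRISI implementation):

```python
import numpy as np

def contract(mask, factor=12):
    """Aggregate a binary class mask by block-averaging, giving the
    class proportion (0-1) of each coarse pixel."""
    h, w = mask.shape
    assert h % factor == 0 and w % factor == 0
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# 24x24 toy mask -> 2x2 proportions; upper-left block is all water
mask = np.zeros((24, 24))
mask[:12, :12] = 1.0
print(contract(mask))  # upper-left coarse pixel 1.0, the rest 0.0
```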
Numbering the sample ovwater GROUP grwater<br />
points ovdecid… grdecid…<br />
Reclass the sample grwater RECLASS grwater1<br />
points for one class to grdecid… grdecid1…<br />
only one classnumber<br />
Make an 8-layer image, grwater1 OVERLAY gr1_8<br />
(if you have 8 classes), grdecid1…<br />
with the classes (sample<br />
point groups).<br />
Extract an average value grwater1, aggwater, EXTRACT valwater<br />
for each class in each grdecid1, aggdecid… valurban…<br />
class mask.<br />
Create a fuzzy partitioning matrix. In Database Workshop<br />
Number 8 rows for the use <strong>of</strong> 8 classes. -Number the rows. One more<br />
Name 8 columns with class names, row comes up with the down<br />
e.g water, decidious. arrow. To create one more<br />
column:<br />
-Modify<br />
-Add field<br />
Real (4 byte) and class name<br />
Put in the values in the valwater In Database Workshop<br />
matrix valurban… -File<br />
-import/export Idridi values file<br />
free format ASCII (VAL file)<br />
valwater… and corresponding<br />
class<br />
Link the matrix to the gr1_8, water In Database Workshop fzwater<br />
training sites (sample gr1_8, decid… -Assign field values to fzdecid….<br />
points groups) image gr1_8 fzwater..<br />
and corresponding class<br />
Create the fuzzy signatures | gr1_8, sig. names, satellite bands | FUZSIG | water.sig, decid.sig…
Classify with different hard/soft classifiers | signature names | MAXLIKE; BAYCLASS; BELCLASS; FUZCLASS | mlclass; baywater…; belwater…; fuzwater…
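The difference between the hard and soft classifiers can be illustrated schematically: a maximum-likelihood classifier keeps only the most probable class per pixel, while a Bayesian soft classifier outputs the full set of posterior probabilities. The sketch below assumes equal priors and a single band with Gaussian class likelihoods; the signature means and standard deviations are invented for the example and do not reproduce the IDRISI modules exactly:

```python
import numpy as np

# Illustrative one-band signatures: class name -> (mean, standard deviation)
signatures = {"water": (20.0, 5.0), "decid": (60.0, 10.0)}

def gaussian_likelihood(x, mean, std):
    """Gaussian probability density of pixel value x for one class."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def bayclass(pixel):
    """Soft output: posterior probability for each class (equal priors)."""
    like = {c: gaussian_likelihood(pixel, m, s)
            for c, (m, s) in signatures.items()}
    total = sum(like.values())
    return {c: l / total for c, l in like.items()}

def maxlike(pixel):
    """Hard output: the single class with the highest posterior."""
    post = bayclass(pixel)
    return max(post, key=post.get)

print(maxlike(30.0))              # 'water'
print(bayclass(30.0))             # posteriors summing to 1
```

The soft posteriors are what make the over-/underclassification analysis in the next steps possible: they can be compared pixel by pixel against the reference masks.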
Calculate the over- and underclassified pixel values | baywater, aggwater; baydecid, aggdecid… | OVERLAY | errwater, errdecid…
Add 1 to the extracted error values and multiply by 100 to get positive values between 1 and 200 | errwater, errdecid… | SCALAR | sc2water, sc2decid…
Convert the pixel values to byte binary | sc2water, sc2decid… | CONVERT | cv3water, cv3decid…
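The three steps above subtract the reference mask from the soft classification output (giving a signed error in [-1, 1]), shift and scale it into a positive range, and store it as a byte image. The arithmetic can be sketched in NumPy; the array names and values are illustrative, and an explicit rounding step is added here because plain float-to-byte truncation can be off by one:

```python
import numpy as np

def error_image(soft_output, reference_mask):
    """Signed over-/underclassification per pixel, in [-1, 1]
    (positive = overclassified, negative = underclassified)."""
    return soft_output - reference_mask

def to_byte(err):
    """Shift and scale the error into a positive range and store as bytes,
    mirroring the SCALAR (+1, *100) and CONVERT steps."""
    return ((err + 1.0) * 100.0).round().astype(np.uint8)

bay = np.array([[0.8, 0.1],
                [0.4, 0.9]])   # soft membership image, e.g. baywater
ref = np.array([[1.0, 0.0],
                [0.0, 1.0]])   # aggregated reference mask, e.g. aggwater
err = error_image(bay, ref)
print(to_byte(err))            # rows: [80, 110] and [140, 90]
```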
Export the error images and classification images to Imagine to obtain pixel information:

Export the images to ERDAS Imagine | cv3water, baywater; cv3decid… | File - Export - Software-Specific Formats - ERDIDRIS - IDRISI to ERDAS 7.4; give the name of the file to export and the name in Imagine with the extension .img | expwater.img, ex2water.img

In Imagine:
-Open the image
-Info
-Compute statistics
In the viewer:
-Raster
-Attributes
-Edit
-Mark the column
-Copy
Copy the pixel information into Excel to calculate RMS and average values for each class and each module.
In Excel:
-Paste the number of pixels into the Excel files "mall_RMS.xls" and "mall_medeltal.xls"
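The Excel step computes, per class and per classifier module, the RMS and the average (bias) of the signed error values. The same two statistics can be sketched directly in Python; the error vector here is invented for the example:

```python
import numpy as np

def rms(err):
    """Root-mean-square of the per-pixel classification errors."""
    return float(np.sqrt(np.mean(np.square(err))))

def mean_error(err):
    """Average (bias) of the per-pixel classification errors."""
    return float(np.mean(err))

err = np.array([-0.2, 0.1, 0.4, -0.1])  # signed errors for one class/module
print(rms(err))         # about 0.2345
print(mean_error(err))  # about 0.05
```

RMS measures the overall magnitude of the misclassification, while the mean error shows whether a class is systematically over- or underclassified.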
Appendix 2: Macrofile scheme

8 classes, aggregated mask
NEWMASK
8 classes, majority mask
MACMAJ_3
MACMAJ_5
MACR_8_2
TRSITES3
MACR8_4
MCAGG8_5
8 classes, miresig, aggregated mask
NEWMASK2
MACR8_6, create matrix
FUZSIG (FUZSIG4 if Landsat TM)
MAC8_BAY MAC8_BEL MAC8_FUZ MAC8_ML
Export to Imagine and Excel: eval.. for RMS for random
cv10.. for average for random
cv4.. for RMS for the other methods
cv3.. for average for the other methods
8 classes, random
MAC_RD_2, create matrix
FUZSIG2 (FUZSIG3 if Landsat TM)
The TRITA-GEOFOTO series is published by:
Kungl Tekniska Högskolan
Department of Geodesy and Photogrammetry
100 44 STOCKHOLM
Drottning Kristinas väg 30
Telephone: 08-790 7342
Fax: 08-790 7343
URL: http://www.geomatics.kth.se/
TRITA-GEOFOTO-serien 1999 - The TRITA-GEOFOTO series, 1999
1999:1 Friborg, Klas. Differential GPS in A Tested Mobil Environment – An Accuracy Study. March 1999. (Master's thesis in geodesy no. 3061. Supervisor: Horemuz)
1999:2 Torlegård, Kennert. Analytisk fotogrammetri och dess felteori. (Compendium)
1999:3 Horemuz, Milan and Lars E Sjöberg. Quick GPS Ambiguity Resolution For Short And Long Baselines. March 1999.
1999:4 Anna-Maria Dorfh and Maria Nilsson. Digitalt ortofoto för ajourhållning av baskarta. April 1999. (Master's thesis in photogrammetry no. 112. Supervisor: Anders Boberg)
1999:5 Rose-Marie Kvål and Annica Larsson. Algal Bloom Monitoring and Detection in the Baltic Sea, the Usefulness of Marine Sensors Compared to a Meteorological Sensor. April 1999. (Master's thesis in photogrammetry. Supervisors: Boberg, Lundén and Rud)
1999:6 Krzysztof Gajdamowicz. Automated Processing of Georeferenced Colour Stereo Images for Road Inventory. April 1999. (Doctoral dissertation)
1999:7 Daniel Öhman. Automatic Text Localisation on Traffic Signs from Colour Stereoscopic Image Sequences. April 1999. (Master's thesis in photogrammetry no. 113. Supervisors: Boberg, Gajdamowicz)
1999:8 Stefan Wulcan. Samlingskarta - En förstudie avseende samordning av ledningar inom Solna. May 1999. (Master's thesis. Supervisor: Hauska)
1999:9 Magnus Johansson and Christer Bylund. Landhöjning i södra Sverige beräknad från tredje precisionsavvägningen och RH70 höjder. June 1999. (Master's thesis in geodesy no. 3062. Supervisor: Fan)
1999:10 Jan Danielsen. Determination of Land Uplift in areas not covered by repeated levellings. With application to South-Norway. June 1999. (Licentiate thesis)
1999:11 Mattias Pettersson. Utvärdering av vektoriseringsprogram. June 1999. (Master's thesis in geoinformatics. Supervisor: Hauska)
1999:12 Anders Vidén. Semiautomatiserad fotogrammetrisk mätning av signalerade punkter, linje- och ellipsobjekt i MIDOC, Multi Image Documentation. June 1999. (Master's thesis in photogrammetry no. 114. Supervisor: Axelsson)
1999:13 Albert Morales Pérez and Pere Carmona Ortega. Movements monitoring of a high building. June 1999. (Master's thesis in geodesy no. 3063. Supervisor: Horemuz)
1999:14 Lina Nyberg and Andreas Prevodnik. A Pilot Study of Soil Erosion in the Close Environment of Idrija, Western Slovenia. An Application of Geographical Information Systems. December 1999. (Master's thesis in geoinformatics. Supervisor: Henkel)
1999:15 Sanna Sparr Olivier. Évaluation de la possibilité d'utiliser des images satellites à haute résolution et le GPS différentiel pour la cartographie en haute montagne - Avec réalisation d'un prototype. August 1999. (Master's thesis in photogrammetry no. 115. Supervisor: Boberg)
1999:16 Rannala, Marek. Surveying of road geometry with Real Time Kinematics GPS. October 1999. (Master's thesis in geodesy no. 3064. Supervisor: Horemuz)
1999:17 Klang, Dan. Reconstruction of Geometric Road Data Using Remotely Sensed Imagery. October 1999. Photogrammetric Reports No 67. (Doctoral Dissertation)
TRITA-GEOFOTO-serien 2000 - The TRITA-GEOFOTO series, 2000
2000:1 Carballo, Gabriel. Statistically-based multiresolution network flow phase unwrapping in SAR interferometry. January 2000. Photogrammetric Reports No 68. (Doctoral Dissertation)
2000:2 Jonas Andersson and Jens Hedlund. Geografiska analyser över Internet. January 2000. (Master's thesis in geoinformatics. Supervisor: Roslund)
2000:3 Thomas Rehders. Noggrannhetsstudier vid RTK-mätning och kvalitetsundersökning av GPS-mottagare. February 2000. (Master's thesis in geodesy no. 3065. Supervisor: Horemuz)
2000:4 Åsa Hesson. Publicering av historiska kartor på Internet. February 2000. (Master's thesis in geoinformatics. Supervisor: Hauska)
2000:5 Malin Skagerström and Karin Wiklander. Extension of a control network in Dar es Salaam using GPS technology. February 2000. (Master's thesis in geodesy no. 3066. Supervisor: Stoimenov)
2000:6 Maria Sjödin and Anna Strid. Automatisk generalisering med objektorienterad kartdatabas. February 2000. (Master's thesis in geoinformatics. Supervisor: Hauska)
2000:7 Per Koldestam. IT-projektet Ortnamnslinjen. March 2000. (Master's thesis in geoinformatics. Supervisor: Hauska)
2000:8 Anna Källander and Elin Söderström. Deformationsanalys i horisontal- och vertikalled på Slussen-konstruktionen. March 2000. (Master's thesis in geodesy no. 3067. Supervisors: Horemuz and Lewen)
2000:9 Anna Haglund. Towards soft classification of satellite data - A case study based upon Resurs MSU-SK satellite data and land cover classification within the Baltic Sea Region. March 2000. (Master's thesis in geoinformatics. Supervisors: Roslund and Langaas)