On the Automatic Analysis of Stellar Spectra - Armagh Observatory

On the Automatic Analysis of Stellar Spectra

A thesis submitted for the degree of 

Doctor of Philosophy 

by 

Christopher Winter, B.Eng. 

Armagh Observatory 

Armagh, Northern Ireland 

& 

Faculty of Science and Agriculture 

Department of Pure and Applied Physics 

The Queen’s University of Belfast 

Belfast, Northern Ireland 

March 2006

“Quia non erit impossibile apud Deum omne verbum”

To Stacey 

“Qui invenit mulierem invenit bonum 

et hauriet iucunditatem a Domino”

Acknowledgements 

I would like to acknowledge and thank my supervisor, C.S. Jeffery, for his sound advice 

and direction over the course of this project, and the staff and students of the Armagh 

Observatory for their helpful support and assistance. 

I am very grateful to J.S. Drilling, E.M. Green, and A. Ahmad, all of whom supplied 

spectroscopic data that was used in this project. In addition, my thanks go to C.A.L.

Bailer-Jones for the use of his neural network code, STATNET. 

This work was carried out as part of the CosmoGrid project, funded under the 

Programme for Research in Third Level Institutions (PRTLI) administered by the Irish 

Higher Education Authority under the National Development Plan and with partial 

support from the European Regional Development Fund. 

This work also uses data from the Sloan Digital Sky Survey (SDSS) data archive. 

Funding for the creation and distribution of the SDSS Archive has been provided by the 

Alfred P. Sloan Foundation, the Participating Institutions, the National Aeronautics 

and Space Administration, the National Science Foundation, the U.S. Department of 

Energy, the Japanese Monbukagakusho, and the Max Planck Society. The SDSS Web 

site is http://www.sdss.org/. 

The SDSS is managed by the Astrophysical Research Consortium (ARC) for the Participating 

Institutions. The Participating Institutions are The University of Chicago, 

Fermilab, the Institute for Advanced Study, the Japan Participation Group, The Johns 

Hopkins University, the Korean Scientist Group, Los Alamos National Laboratory, 

the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics 

(MPA), New Mexico State University, University of Pittsburgh, University of 

Portsmouth, Princeton University, the United States Naval Observatory, and the University 

of Washington. 

Chris Winter 

March, 2006 


Abstract 

This project investigates the problem of automatically searching for and analysing 

astronomical spectra from large data sets. The three core problems of (1) spectral classification, 

(2) physical parameterisation, and (3) searching are examined, and a generalisable 

set of tools is established based on the techniques of artificial neural networks 

(ANNs), χ² minimisation, and principal components analysis (PCA). These tools are

then applied to the archives of the Sloan Digital Sky Survey (SDSS) to automatically 

search for and analyse the spectra of hot subdwarf stars. 

Spectral classification is tackled by the versatile statistical machine learning method 

of ANNs. An ANN is trained to classify hot subdwarf spectra onto the classification 

system defined by Drilling et al. (2006), obtaining global errors (σ_rms) of ∼2 subtypes

for spectral type, ∼1 subclass for luminosity class, and ∼4 subclasses for the helium

class. These errors are in line with accuracies achieved by human classifiers. 

Physical parameters are obtained by fitting observations to grids of theoretical models 

using a χ² minimisation procedure. A new methodology has been developed for

managing and indexing large grids of theoretical models in the χ² minimisation code,

SFIT. Concepts from the field of computational geometry are used to remove several 

limitations from this code, and pave the way for its use in a distributed parallel 

computing environment. 

Searching for the spectra of a particular type of object in large, unknown data sets 

is accomplished using the multivariate statistical technique, PCA. The mechanics of 

this tool are outlined, and its use demonstrated by searching for hot subdwarf spectra 

in the SDSS. This solution provides a means to reduce unknown data sets to quantities 

suitable for visual inspection. 

282 spectra of hot subdwarf candidates are obtained from the SDSS and analysed. 

The results evidence several unexplained phenomena of extended horizontal branch 

stars, namely: 1) the existence of the second horizontal branch gap of Newell (1973); 

2) two sdB n_He–T_eff sequences; and 3) a clustering of hot, helium-rich stars at T_eff ≈

44,000 K, log g = 5.7. These findings pose important questions for stellar evolution

theory in the realms of the extended horizontal branch. 


Contents 

Acknowledgements
Abstract
List of Tables
List of Figures

1 Introduction
1.1 Astronomical Data Mining
1.2 Large Data Sets And Their Sources
1.3 Astronomical Spectra
1.3.1 Types Of Objects And Their Spectra
1.3.2 Automatic Methods of Analysis
1.4 Hot Subdwarf Stars
1.4.1 Spectroscopy
1.4.2 Stellar Evolution
1.4.3 Why Study Them?
1.4.4 Why Search For Them In The SDSS?
1.5 Summary

2 Classification - Artificial Neural Networks
2.1 Classifying Hot Subdwarfs
2.1.1 The Training Sample
2.1.2 Methodology
2.1.3 Results
2.2 Physical Parameters
2.2.1 Methodology
2.2.2 Results
2.3 Summary

3 Parameterisation - χ² Fitting
3.1 Analysing Stellar Spectra
3.2 SFIT
3.2.1 Limitations of SFIT
3.2.2 Proposal to Remove SFIT’s Limitations
3.3 Tetrahedralisation: Interpolation and Indexing
3.3.1 Simplex Interpolation
3.3.2 Grid Index - Delaunay Triangulation
3.3.3 Navigating the Index - Point Location
3.4 Testing the Modifications
3.5 Summary

4 Filtering - Principal Components Analysis
4.1 Constructing A PCA-Based Filter
4.1.1 Mathematics of PCA
4.1.2 Building A Hot Subdwarf Filter
4.2 Searching the SDSS for Hot Subdwarfs
4.3 Summary

5 Application I - SDSS Hot Subdwarfs
5.1 Search Criteria And Data Sets
5.2 PCA Filtering
5.3 Analysis
5.4 Results
5.4.1 Parameterisation
5.4.2 Classification
5.4.3 Radial Velocities
5.5 Sources of Error
5.6 Analysis of PCA Filter Efficiency
5.7 Summary

6 Application II - Other Data Sets
6.1 2MASS-Selected Sample
6.2 SDSS sdB-He Stars of Harris et al. (2003)
6.3 Ahmad & Jeffery (2003) He-sdBs
6.4 Summary

7 Conclusions And Future Work

Bibliography

Appendices
A Results for 192 Drilling et al. (2006) Hot Subdwarfs
B Results for 282 SDSS DR3 Hot Subdwarf Candidates
C Results for 83 2MASS-Selected Hot Subdwarf Candidates
D The Armagh Observatory Cluster
D.1 Hardware Configuration
D.2 Software Configuration
D.3 MPICH 1.2.4 RPM Spec File
E LTE-CODES
E.1 Directory Layout
E.2 Build System Organisation
E.3 Installation Instructions

List of Tables 

2.1 Results of the leave-one-out procedure as applied to a committee of five 901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations.
2.2 As Table 2.1, but for the committee of five 901:5:5:3 ANNs.
2.3 Results of parameterising the 60 calibration stars.
2.4 A comparison between ANNs and χ² minimisation for parameterising the 133 unparameterised stars.
3.1 Details of the model grid used in the comparison.
3.2 Initial parameters used for the Amoeba and Levenberg-Marquardt optimisation routines. The step sizes used for Amoeba are also given.
3.3 Results of BD+10 2179 analysis with the unmodified version of SFIT.
3.4 Results of BD+10 2179 analysis with the modified version of SFIT.
3.5 The model grid used to obtain physical parameters of the set of test models.
3.6 RMS comparison of parameterisation results from each interpolation method with the original parameters of each model. Also given is the RMS difference between the methods, and a comparison between the results in the region of parameter space for which both schemes seem to give their best results (see Figures 3.6 and 3.7).
5.1 Summary of data quantities obtained from the SDSS DR3.
5.2 The model grid used to obtain physical parameters from the SDSS hot subdwarf candidates.
6.1 Parameters of the two calibration stars as obtained by χ²-fitting to NLTE (Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given in parentheses.
6.2 Classification results for the sdB-He stars of Harris et al. (2003).
6.3 Classification results for the Ahmad & Jeffery (2003) He-sdBs.
A.1 Parameterisation Results for 192 Drilling et al. (2006) Hot Subdwarfs
B.1 Results for 282 SDSS Hot Subdwarf Candidates
C.1 Results for 83 2MASS-Selected Hot Subdwarf Candidates

List of Figures 

1.1 A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from the SDSS)
1.2 Example of a quasar (top) and carbon star (bottom) spectrum. (Taken from the SDSS)
1.3 The emission spectrum of the Orion nebula (M42).
1.4 Examples from each hot subdwarf spectrographic subgroup. Classifications listed are those from Drilling et al. (2006).
1.5 Schematic temperature-luminosity diagrams showing: a) the positions of stars belonging to the main stellar groups; b) the normal sequence of stellar evolution experienced by a star of a few solar masses; c) possible evolution of an sdB star in a binary system. (Diagram courtesy of C.S. Jeffery).
2.1 The training sample shows clustering in certain regions of the classification space. For clarity, points have been offset by small random shifts in both coordinates.
2.2 Results of the leave-one-out procedure for both ANN architectures at the near-optimal training time of 300 iterations for the 901:10:3 architecture (left column), and 500 iterations for the 901:5:5:3 architecture (right column). Also plotted is the best-fit linear least squares line.
2.3 Parameterisations of the 60 calibration stars. Results from each method have been combined onto each plot. ANN results are indicated by blue crosses, and χ² minimiser results by red pluses.
2.4 Parameterisations of the 133 unparameterised stars using the ANNs and χ² minimiser. Also shown is the best-fit linear least squares line.
3.1 Example of a k-D tree in two dimensions. On the left is the representation of how the k-D tree on the right splits up the x,y plane. (Adapted from Moore 1991.)
3.2 A 1-simplex is a line segment. A 2-simplex is a triangle. A 3-simplex is a tetrahedron.
3.3 In two dimensions, the Delaunay triangulation guarantees that no other points lie in the circumcircle of any simplex.
3.4 The line segment, L, is constructed using the centroid of the starting tetrahedron, T, and the interpolation point, p. The tetrahedra visited on the walk-through are coloured grey.
3.5 Parameterisation results from the linear interpolation in tables method. Clearly visible are anomalous results arising from a suspected defect in the method’s implementation.
3.6 Parameterisation results from the linear interpolation in tables method. Axes have been restricted to give a view of the grid boundaries described in Table 3.5.
3.7 Parameterisation results from the simplex-based interpolation scheme. In contrast with Figures 3.5 and 3.6, the simplex-based scheme clearly restricts the optimisers to the grid boundaries.
4.1 Principal component analysis. u_1 is the first principal component and the axis onto which the projected positions of the data have their maximum sum. u_2 is the second principal component, and u_1 · u_2 = 0.
4.2 Mean spectrum of the Drilling et al. (2006) sample.
4.3 First five PCs of the Drilling et al. (2006) sample.
4.4 Second five PCs of the Drilling et al. (2006) sample.
4.5 Cumulative variance of the first ten PCs of the Drilling et al. (2006) sample.
4.6 Illustration of projecting hot subdwarf spectra onto the first four PCs of the Drilling et al. (2006) standards.
4.7 Histogram of reconstruction errors from the SDSS data sample.
4.8 Spectra in first three reconstruction error histogram bins (R ≲ 3.0).
4.9 Spectra in first three reconstruction error histogram bins (R ≲ 3.0).
4.10 Sample of spectra from the eighth error bin (R ∼ 3.0).
4.11 Sample of spectra from the fourteenth error bin (R ∼ 4.5).
4.12 Sample of high S/N DA white dwarfs from the 22nd–24th error bins (R ∼ 6.4–7.1).
4.13 Sample of spectra from the fifty-third error bin (R > 15.0).
5.1 Histogram of reconstruction errors for the colour-colour selected SDSS sample.
5.2 Parameterisation results of the 282 SDSS hot subdwarf candidates. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
5.3 Four example fits from the 282 SDSS hot subdwarfs. The classification and physical parameters (T_eff (K), log g, log(n_He/n_H)) obtained for each star are printed in the lower corners of each plot.
5.4 The results of applying a kernel density estimate analysis to the data from Figure 5.2. The low-density region at T_eff ≈ 22,500 K is prominent, along with another possible low-density region at T_eff ≈ 41,000 K.
5.5 Classification results of the 282 SDSS hot subdwarf candidates. Points have been given small random offsets in each axis for clarity.
5.6 A comparison of the ANN classifications of the 282 SDSS hot subdwarf candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (right-most plots). Points have been given small random offsets in each axis for clarity.
5.7 A calibration of the ANN classifications onto the Drilling et al. (2006) system using the 282 SDSS hot subdwarf candidates.
5.8 The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates.
5.9 Examples of white dwarf and BHB contaminants. A - BHB star with deep Balmer lines. B - DA white dwarf with strong, broad Balmer lines due to high surface gravity. C - DB white dwarf. D - Uncertain (some evidence of weak carbon absorption, so possibly a DQ white dwarf).
5.10 This gray-shaded region of the log g–T_eff plane represents an area of good probability that the stars within it are subdwarfs.
5.11 TP rates (red) and FP rates (blue) of the PCA filter as a function of the reconstruction error threshold, R. The green curve is the difference between the TP and FP rates.
5.12 A closer examination of the TP and FP rates. The peak in the green TP-FP curve occurs at R ∼ 7.0 and signifies the optimum value for R in the SDSS sample.
6.1 SFIT physical parameters for 2MASS-selected sample. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
6.2 ANN classification for 2MASS-selected sample. Points have been given small random offsets in each axis for clarity.
6.3 The stars assigned late-A and early-F spectral types by the neural network.
6.4 Comparison of ANN classifications with those of Drilling et al. (2006) for the 17 He-sdBs of Ahmad & Jeffery (2003). Points have been given small random offsets in each axis for clarity. Also plotted is the best fit least squares regression line with error bars showing the RMS of the residuals.
7.1 Schematic diagram showing how the work of this thesis fits in with the wider system envisaged by Jeffery (2003).

Chapter 1 

Introduction 

The spectroscopy of light from astronomical objects is one of the most important 

methods for understanding the physics at work in the universe. Many fundamental 

parameters of those objects can be determined by analysing their spectrum, including 

temperature, chemical composition, motion, and other clues about their origin and 

evolution. 

Advances in information technology over the past 35 years, and their subsequent influence 

on observational methods, have allowed spectroscopic studies of unprecedented 

numbers of objects to be carried out over a short period of time. Modern astronomy 

is now about dealing with very large quantities of data, and the problems associated 

with its management and analysis. 

This project develops a collection of tools to assist astronomers in data mining large 

sets of astronomical spectra. The tools are general in nature, and can be used to search 

for and automatically study the spectra of potentially any type of astronomical object. 

Together, the tools form a semi-automatic pipeline allowing a fast progression from 

large quantities of unknown spectra to useful scientific results. 

In the past, studies of automatic methods of spectral analysis have mainly centred 

around the problem of object classification. This makes sense from the point of view 


of a survey mission because it is desirable to know what types of objects have been 

observed, with particular interest being paid to those objects not falling into any known 

category. 

However, an individual astronomer studying a particular type of object is not always interested in large-scale classification. What is needed instead is a way to search a data set exclusively for the samples that most resemble the object of interest. Once located, those samples are likely to exist in large enough numbers to require further automatic assistance in their analysis.

The techniques needed to help solve this problem already exist in the field, but they 

have not yet been brought together and adapted to form any sort of useful, coherent 

system. As such, scientific insights contained in large data sets remain mostly untapped. 

The work in this project represents what seems to be the first attempt at rectifying 

this issue. Three major algorithms are employed to construct a general data mining 

tool set. 

1. Principal Components Analysis is applied in a supervised classification role to 

create a filter that can help search for a specific type of object in an unknown 

data set. 

2. Artificial Neural Networks have been shown to be a robust and versatile tool for 

many tasks in astronomy. They are used here to provide spectral classifications. 

3. χ² minimisation is used to derive physical parameters for spectra by fitting them

to grids of theoretical models. 
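The filtering role described in the first item can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from this project's software: only one principal component is retained, it is estimated by power iteration, the toy "spectra" are flat continua with a single Gaussian absorption line, and every function name is hypothetical. The point it shows is that a spectrum resembling the training sample is reconstructed well from the retained component, whereas a dissimilar spectrum leaves a large residual.

```python
import math
import random


def mean_vector(rows):
    """Element-wise mean of a list of equal-length spectra."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def leading_component(rows, mean, iterations=100, seed=0):
    """Estimate the first principal component by power iteration."""
    centred = [[x - m for x, m in zip(row, mean)] for row in rows]
    rng = random.Random(seed)
    vector = [rng.uniform(-1.0, 1.0) for _ in mean]
    for _ in range(iterations):
        # Apply X^T X to the vector without forming the covariance matrix.
        scores = [sum(c * v for c, v in zip(row, vector)) for row in centred]
        product = [sum(s * row[j] for s, row in zip(scores, centred))
                   for j in range(len(mean))]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    return vector


def reconstruction_error(spectrum, mean, components):
    """RMS residual left after projecting onto orthonormal components."""
    residual = [x - m for x, m in zip(spectrum, mean)]
    for component in components:
        score = sum(r * c for r, c in zip(residual, component))
        residual = [r - score * c for r, c in zip(residual, component)]
    return math.sqrt(sum(r * r for r in residual) / len(residual))


def line_spectrum(centre, depth, pixels=20, width=2.0):
    """Toy spectrum (hypothetical): unit continuum, one Gaussian line."""
    return [1.0 - depth * math.exp(-0.5 * ((i - centre) / width) ** 2)
            for i in range(pixels)]


# Train on spectra that differ only in line depth, then test two spectra.
training = [line_spectrum(10, depth) for depth in (0.2, 0.3, 0.4, 0.5, 0.6)]
mean = mean_vector(training)
pc1 = leading_component(training, mean)

similar = reconstruction_error(line_spectrum(10, 0.45), mean, [pc1])
different = reconstruction_error(line_spectrum(4, 0.45), mean, [pc1])
print(similar < different)
```

In a filter of this kind, a threshold on the reconstruction error would decide which spectra are passed on for inspection; the choice of threshold is not addressed by this sketch.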

Additional minor tools to facilitate data processing, management, and visualisation 

are also prototyped. 

Furthermore, a new and original methodology has been developed to extend the 

functionality of the χ² minimisation code, SFIT, used at the Armagh Observatory.

The code is modified using concepts from the field of computational geometry to allow 

the use of arbitrarily large, three-dimensional grids of theoretical models. This removes 

several severe limitations from the program, and prepares it for further modification to 

permit its use in a distributed computational environment. 
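The kind of geometric interpolation referred to here can be sketched in Python. This is a minimal example under stated assumptions, not the SFIT implementation: it treats one tetrahedron whose four vertices are grid models in a three-dimensional parameter space, interpolates linearly using barycentric weights, and uses hypothetical function names and toy data throughout.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def barycentric_weights(vertices, point):
    """Weights w0..w3 (summing to 1) with sum(w_k * vertex_k) == point."""
    v0 = vertices[0]
    # Edge vectors from v0 are the columns of the system T w = point - v0.
    edges = [[v[k] - v0[k] for k in range(3)] for v in vertices[1:]]
    rhs = [point[k] - v0[k] for k in range(3)]
    T = [[edges[col][row] for col in range(3)] for row in range(3)]
    d = det3(T)
    if abs(d) < 1e-12:
        raise ValueError("degenerate tetrahedron")
    weights = []
    for j in range(3):
        # Cramer's rule: replace column j of T with the right-hand side.
        Tj = [[rhs[row] if col == j else T[row][col] for col in range(3)]
              for row in range(3)]
        weights.append(det3(Tj) / d)
    return [1.0 - sum(weights)] + weights


def interpolate_spectrum(vertices, spectra, point):
    """Linearly interpolate model spectra stored at tetrahedron vertices."""
    weights = barycentric_weights(vertices, point)
    if min(weights) < -1e-9:
        raise ValueError("point lies outside the tetrahedron")
    return [sum(w * s[i] for w, s in zip(weights, spectra))
            for i in range(len(spectra[0]))]


# Toy data (hypothetical): a unit tetrahedron and two-pixel "spectra".
unit = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
spectra = [[5.0, 1.0], [7.0, 1.0], [8.0, 1.0], [4.0, 1.0]]
result = interpolate_spectrum(unit, spectra, (0.2, 0.3, 0.1))
print(result)
```

A negative weight signals that the requested point lies outside the tetrahedron, which is why the sketch refuses to extrapolate. Locating the tetrahedron that contains a given point in a large grid is a separate problem and is not shown.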

The specific outcome of this project is a set of general tools which can be used 

to study the spectra of any astronomical object, and a “real-world” demonstration 

of these tools through their application to search for and analyse the spectra of hot 

subdwarf stars from the archives of the Sloan Digital Sky Survey. The results evidence 

several unexplained phenomena of extended horizontal branch stars that pose important 

questions for the theory of stellar evolution. 

The work undertaken in this project is a step towards the larger computational 

framework of Jeffery (2003) which outlines a wider system incorporating the management 

of atomic data, dynamic generation and storage of grids of theoretical models, 

parameter space visualisation, and automated analysis. The use of distributed computational 

resources, such as the Grid, is also envisaged. 

1.1 Astronomical Data Mining 

The term “data mining” refers to the use of a broad set of techniques and algorithms for 

extracting useful patterns and models from very large data sets. Typically, the goal is 

to discover either something hitherto unknown about a phenomenon that only becomes 

apparent when it is studied en masse, or else a new phenomenon that only becomes 

apparent when observations are gathered in large enough quantities over a sufficiently 

wide range. 

Traditionally, in astronomy, much effort was invested in gathering observations of 

one particular object, such as a star, in an attempt to understand that object in detail. 

Given the universality of physics, the insights gained are usually applicable to other 

objects of the same type, allowing a wider understanding to be achieved. 



However, advances in technology, such as large-area mosaic CCDs and multi-object 

fibre-fed spectrographs, mean that modern telescopes can be made to gather observations 

of thousands of objects in a single night. This opens up the possibility of 

discovering new facts about particular objects by studying their properties in large 

numbers, and also the possibility of discovering completely new objects. 

Unfortunately, this abundance of data brings with it a set of new problems. Managing 

all of the information requires knowledge of data formats, storage mechanisms, and 

techniques for indexing, searching, and analysing it all. Indeed, modern astronomy is 

fast becoming a cross-disciplinary endeavour, providing a rich area for exploring many 

aspects of computer science and statistics in the context of real-world applications. 

Data Types 

The nature of astronomical data means that it is inherently heterogeneous in both 

format and content, with observations now being gathered over all regions of the electromagnetic 

spectrum. Broadly speaking, astronomical data can be classified into five 

domains. 

• Imaging data are the fundamental component of astronomical observations, capturing 

a two-dimensional picture of the universe within a narrow wavelength 

region at a particular point in time. 

• Catalogues of objects are constructed by analysing imaging data, and recording 

many different parameters about each object such as brightness and colour, 

morphological information, and coordinates. 

• Spectroscopy provides detailed physical quantification of objects including temperature, 

chemical composition, and kinematical information. 

• Studies of objects in the time-domain provide valuable insight into the nature 

of the universe by identifying moving objects, variable sources (e.g., pulsating


stars), or transient objects such as supernovae and gamma-ray bursts. 

• Finally, theoretical simulations of astronomical objects are an important source 

of data. Comparing theoretical models with observational data is the central 

mechanism in understanding how these objects formed and have evolved. 

Each of these data domains carries its own particular problems to be solved in a data management and mining context. Imaging data and catalogue construction require robust, automatic techniques to identify sources distinct from background-level noise, to differentiate between types of objects (e.g., stars, galaxies, and comets), and finally to index these data to allow fast searching based on spatial criteria.

Spectroscopy and time-domain data require more involved algorithms for the automated 

reduction and calibration of observations – algorithms which often have to 

be tailored for a specific instrument and telescope setup. The automatic analysis of 

spectroscopic data typically seeks to classify an object onto a predefined categorical 

system by somehow comparing the object with the set of standards which define the 

system. The physics of an object which are manifest in its spectrum are determined 

by computing accurate theoretical models and comparing them with the observations. 

Any results then need to be stored and indexed with the observations in a manner that 

allows for further re-analysis as improved observations and theoretical models

become available. 
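The comparison of theoretical models with an observation can be sketched as a χ² evaluation over a small grid. The Python below is an illustration only, under stated assumptions: the grid is searched exhaustively rather than with an optimiser, the parameter pairs and three-pixel fluxes are invented toy values, and the function names are hypothetical.

```python
def chi_squared(observed, model, sigma):
    """Sum of squared, error-weighted residuals between data and model."""
    return sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))


def best_fit(observed, sigma, grid):
    """Return (parameters, chi2) of the grid model closest to the data.

    `grid` maps a tuple of model parameters to a list of model fluxes
    sampled at the same pixels as the observation.
    """
    scored = [(chi_squared(observed, flux, sigma), params)
              for params, flux in grid.items()]
    chi2, params = min(scored)
    return params, chi2


# Toy grid (hypothetical values): keys are (temperature in K, log g).
grid = {
    (30000, 5.5): [1.0, 0.8, 1.0],
    (35000, 5.5): [1.0, 0.6, 1.0],
    (40000, 6.0): [1.0, 0.4, 1.0],
}
observed = [1.01, 0.62, 0.99]
sigma = [0.02, 0.02, 0.02]

params, chi2 = best_fit(observed, sigma, grid)
print(params, round(chi2, 3))
```

An exhaustive search is used here only because the toy grid has three models. For grids of realistic size, the parameter space would normally be explored with an optimisation routine and with interpolation between grid points.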

Numerical simulations to generate theoretical models are always in need of powerful 

and plentiful computational resources to allow more detail and precision to be attained. 

As models will always have a shorter shelf-life than observations, appropriate meta-data 

needs to be recorded and stored with the models so a historical record can be kept as 

the underlying physics improves. This meta-data is also needed to help automate 

the parameterisation of observations by providing a means to explore grids of models, 

and ascertain when new models need to be generated to cover a required part of the 

parameter space. 



1.2 Large Data Sets And Their Sources 

Three main sources contribute to large observational data sets in astronomy, namely, 

those generated by specific surveys, general-purpose observatories, and space missions. 

In recent years, Virtual Observatory projects have been investigating ways to combine the

various databases generated by these sources, mapping out the computational infrastructures 

and tools needed to explore large data volumes. 

Specific Surveys 

Digital sky surveys generate very large quantities of homogeneous data over multiple 

wavelengths. As such, they are the main drivers behind the study of data mining 

methods in astronomy. 

The Digitized Palomar Observatory Sky Survey 1 (DPOSS; Djorgovski et al., 

1998) is a digital survey of the entire Northern sky in three visible-light bands, based 

on the photographic sky atlas, POSS-II, the second Palomar Observatory Sky Survey 

(Reid et al., 1991). A set of three photographic plates (one in each filter), each covering 

36 square degrees, were taken at each of 894 pointings spaced by 5 degrees, covering the 

Northern sky. The plates were then digitised at the Space Telescope Science Institute 

(STScI), producing about 1 gigabyte per plate, and about 3 terabytes of data in total. 
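The quoted total follows from the plate counts given above, as the short Python check below shows. It assumes exactly 1 gigabyte per plate and 1000 gigabytes per terabyte; both are simplifications of the approximate figures in the text, and the variable names are illustrative.

```python
pointings = 894              # pointings covering the Northern sky
plates_per_pointing = 3      # one plate in each filter
gigabytes_per_plate = 1.0    # assumption: "about 1 gigabyte per plate"

total_plates = pointings * plates_per_pointing
# Assumption: decimal units, 1 terabyte = 1000 gigabytes.
total_terabytes = total_plates * gigabytes_per_plate / 1000.0

print(total_plates, total_terabytes)
```

The result, 2682 plates and roughly 2.7 terabytes, is consistent with the figure of about 3 terabytes.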

Specially developed data mining software called SKICAT (Weir et al., 1995) was 

used to perform object classification and measure around 40 parameters for each object, 

storing this information in a database which will eventually be released to the 

community as the Palomar-Norris Sky Catalog. 

The Two Micron All-Sky Survey 2 (2MASS; Skrutskie et al., 2006) is a nearinfrared 

(J, H, and K S ) all-sky survey. The project is a collaboration between the 

1 http://dposs.caltech.edu/ 

2 http://www.ipac.caltech.edu/2mass/


University of Massachusetts which constructed the observatory facilities and operated 

the survey, and the Infrared Processing and Analysis Center at Caltech which is responsible 

for all data processing and archive issues. The survey began in the spring 

of 1997, completing survey-quality operations in 2000, with the final catalogue being 

released in March, 2003. 

The survey includes over 12 terabytes of imaging data, with the final catalogue 

containing over one million resolved galaxies, and more than three hundred million 

stars and other unresolved sources to a limiting magnitude of K_s < 14.3. 2MASS is

currently producing the following data products for the entire sky: 

• A digital atlas of the sky comprising approximately 4 million 8´×16´ images, 

having about 4´´ spatial resolution in each of the three wavelength bands, 

• A point source catalog containing accurate positions and fluxes for ∼ 300 million 

stars and other unresolved objects, 

• An extended source catalog containing positions and total magnitudes for more 

than one million galaxies and other nebulae. 

The 2dF Galaxy Redshift Survey 3 (2dFGRS; Colless et al., 2001) is a major 

spectroscopic survey taking full advantage of the unique capabilities of the 2dF facility 

built by the Anglo-Australian Observatory 4 . The 2dFGRS obtained spectra for 245,591 

objects, mainly galaxies, brighter than a nominal extinction-corrected magnitude limit 

of b_J = 19.45. Reliable redshifts were obtained for 221,414 galaxies. The galaxies cover

an area of approximately 1,500 square degrees selected from the extended APM Galaxy 

Survey of the South Galactic cap. 

The final release dataset comprises the following elements: 

• source catalogues for the full survey, containing data for 382,323 objects, together

with related material,

• spectroscopic catalogues for 245,591 objects, containing the spectroscopic parameters

such as redshifts and spectral types.

3 http://www.mso.anu.edu.au/2dFGRS/

4 http://www.aao.gov.au/2df/

The Sloan Digital Sky Survey 5 (SDSS; York et al., 2000) is a project to survey 

a 10,000 square degree area (1/4 of the entire sky) of the North Galactic hemisphere 

over a 5 year period. The estimated 100 million catalogued sources from this survey 

will then be used as the foundation for the largest ever spectroscopic survey of galaxies, 

quasars and stars. 

A dedicated 2.5m telescope is specially designed to take wide field (3×3 degree)

images using a 5×6 mosaic of 2048×2048 CCDs, in five wavelength bands, operating

in scanning mode. Spectroscopic targets are then observed using two spectrographs 

each with 320 fibres feeding in light from the focal plane. A total of four 2048×2048 

CCDs (one for each channel of each spectrograph) collect the spectra. 

The total raw data will exceed 40 terabytes, and a processed subset of about 1 

terabyte in size will consist of 1 million spectra, positions, and image parameters for 

over 100 million objects, plus a mini-image centered on each object in every colour. 

The data will be made available to the public at specific milestone releases, and upon 

completion of the survey. 

General-Purpose Observatories 

Traditional ground-based observatories have been saving data, primarily as backups 

for the users, for a significant time, accumulating large quantities of valuable, but 

heterogeneous, data. Unfortunately, lack of funding, and this inherent heterogeneity, 

make it difficult to archive the data in such a way as to make it available and easy

to access for the wider astronomical community. However, some notable exceptions do

exist.

5 http://www.sdss.org/

The National Optical Astronomy Observatory 6 (NOAO) is a US organisation 

that manages ground-based national astronomical observatories including the Kitt Peak 

National Observatory, Cerro Tololo Inter-American Observatory, and the National Solar 

Observatory. 

The NOAO has been archiving all data from their telescopes in a program called 

“Save-the-Bits” which, prior to the introduction of survey-grade instrumentation, generated 

around half a terabyte and over 250,000 images a year. With the introduction of 

survey instruments and related programs, the rate of data accumulation has increased, 

and NOAO now manages over 10 terabytes of data. 

The European Southern Observatory 7 (ESO) operates a number of telescopes

(including the four 8m class VLT telescopes) at two observatories in the southern

hemisphere: the La Silla Observatory, and the Paranal Observatory. As with many

other ground-based observatories, ESO has been archiving data for some time, with 

storage rates approaching a steady rate of approximately 20 terabytes of data per year 

from all of their telescopes. This number will eventually increase to several hundred 

terabytes with the completion of the rest of the planned facilities, including the VST, a 

dedicated survey telescope similar in nature to the telescope built for the SDSS project. 

Space Missions 

Although ground-based observatories are aided by the advancement of technology and 

continue to make important discoveries, they will always be encumbered by the restrictions 

imposed by the Earth’s atmosphere. Thus, space missions, although extremely 

expensive, are critical components in the study of the universe, and all of the data they 

produce are very valuable and therefore archived. 

6 http://www.noao.edu/ 

7 http://www.eso.org/ 



The Multimission Archive at the Space Telescope Science Institute 8 (MAST)

archives a variety of astronomical data gathered from space missions, with the primary 

emphasis on the optical, ultraviolet, and near-infrared parts of the spectrum. MAST 

provides a cross correlation tool allowing users to search all archived data for all observations 

which contain sources from either archived or user-supplied catalogue data. In 

addition, MAST provides individual mission query capabilities. 

The dominant holding for MAST is the data archive from the Hubble Space Telescope,

but total holdings currently exceed ten terabytes and include (or provide

links to) archival data for the following missions or projects: Hubble Data Archive,

Galaxy Evolution Explorer, Far Ultraviolet Spectroscopic Explorer, International

Ultraviolet Explorer Final Archive, Extreme Ultraviolet Explorer, Hopkins Ultraviolet

Telescope Archive, Ultraviolet Imaging Telescope Archive, Wisconsin Ultraviolet

Photopolarimeter Experiment Archive, Copernicus UV Satellite Archive, Berkeley

Extreme and Far-UV Spectrometer, The Interstellar Medium Absorption Profile

Spectrograph, Digitized Sky Survey, and The Röntgen SATellite Archive.

Virtual Observatories 

The Virtual Observatory (VO) concept represents a scientific and technological framework 

aimed at trying to manage the ongoing exponential growth in the volume, quality, 

and complexity of astronomical data gathered by all of the sources discussed previously. 

Two main challenges are faced: 

1. The effective inter-linking of large, geographically distributed data sets and digital 

sky archives in a homogeneous manner thereby allowing the optimal use of data 

mining algorithms to extract new science. 

2. The research and development of data mining and “knowledge discovery in

databases” (KDD) algorithms and techniques for the exploration and scientific

investigation of large digital sky surveys, including combined multi-wavelength

data sets.

8 http://archive.stsci.edu/mast.html

These problems have significant relevance beyond the field of astronomy as many 

aspects of society are struggling with information overload. 

The National Virtual Observatory 9 (NVO) is a project funded by the US National 

Science Foundation to research and explore the technologies necessary to create 

a VO. The central themes of this research are the formation and adoption of standards 

to make the sharing of astronomical data easier. An NVO standard that has been 

adopted worldwide in this regard is “VOTable”, a way to represent a table of data in 

XML with good meta-data about the semantic meaning of the data. Grid computing 

is seen as an important resource for the large-scale analysis of astronomical data. The 

NVO has also produced research prototypes demonstrating that interesting and efficient

research can be done by building upon just a few new protocols and standards

for data exchange and access. 
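To make the idea of a table carrying its own meta-data more concrete, the sketch below parses a small XML fragment written in the spirit of VOTable using only the Python standard library. It is a simplified illustration rather than a conforming VOTable document: the namespace declaration is omitted, and the table name, field names, meta-data values, and data rows are all invented for the example.

```python
import xml.etree.ElementTree as ET

# A simplified, namespace-free fragment modelled on the VOTable layout.
# Every name and value below is made up for illustration.
VOTABLE_LIKE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="toy_catalogue">
      <FIELD name="RA" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="Dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      <FIELD name="Teff" datatype="float" unit="K"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.5</TD><TD>41.2</TD><TD>30000</TD></TR>
          <TR><TD>185.1</TD><TD>-5.9</TD><TD>45000</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_table(xml_text):
    """Return (fields, rows): column meta-data and the numeric table cells."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    # Each FIELD element describes one column: its name and physical unit.
    fields = [(f.get("name"), f.get("unit")) for f in table.findall("FIELD")]
    # Each TR element is a row; each TD element is one cell of that row.
    rows = [
        [float(td.text) for td in tr.findall("TD")]
        for tr in table.findall("./DATA/TABLEDATA/TR")
    ]
    return fields, rows


fields, rows = read_table(VOTABLE_LIKE)
for row in rows:
    print({name: (value, unit) for (name, unit), value in zip(fields, row)})
```

The point of the layout is that the column descriptions travel with the data, so a generic tool can interpret a table it has never seen before.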

The AstroGrid 10 project is a UK government funded, open source project designed 

to create a working VO for UK and international astronomers. The goals of the AstroGrid

project are: 

• A working datagrid for key UK databases 

• High throughput data mining facilities for interrogating those databases 

• A uniform archive query and data-mining software interface 

• The ability to browse simultaneously multiple datasets 

• A set of tools for integrated on-line analysis of extracted data 

• A set of tools for on-line database analysis and exploration 

• A facility for users to upload code to run their own algorithms on the data mining

machines

• An exploration of techniques for open-ended resource discovery

9 http://www.us-vo.org/

10 http://www.astrogrid.org/

Many of these goals are common to other nations and other disciplines, and the 

AstroGrid project is working closely with other VO projects worldwide through the 

International Virtual Observatory Alliance (IVOA) – jointly formed with the NVO, 

and other world-wide VO efforts – to deliver these goals. 

1.3 Astronomical Spectra 

It is clear that much work lies ahead if astronomers are to keep up with the ever 

increasing amounts of data their telescopes are able to gather. As such, the project 

presented in this thesis focusses on one particular aspect of the data mining problem: 

methods to analyse digitised astronomical spectra in an automated fashion. 

The central idea of data mining is to be able to turn large quantities of unknown 

information into meaningful interpretations, and this is very much a non-trivial task in 

the context of astronomical spectra. Before large-scale statistics can be done to search 

for patterns, the spectra of an interesting type of object need to be selected from a 

set of unknown data. Then, the major analytical tasks are usually the classification 

and physical parameterisation of the spectra, after which pattern searching can be 

performed. 

The problems of searching, classification, and physical parameterisation all involve 

some kind of pattern matching in and of themselves. Searching, which is basically a very 

coarse initial classification, matches unknown spectra to a set of known examples of a 

search target, retaining only those spectra which are within some acceptable distance 

from the set of examples. Classification assigns a fine-grained category to an object 

based on how well it matches the spectral standards of the classification system used.


Physical parameterisation matches observations to grids of theoretical models in an 

attempt to find the best fit and, consequently, estimates for the main physical quantities 

of interest.

1.3.1 Types Of Objects And Their Spectra 

All objects in the night sky can be studied by spectroscopic analysis. Each object has 

a set of distinct features which can be found in its spectrum, reflecting the specific 

physical processes at work in or around the object. This section gives some examples 

of these objects and the spectra they produce. 

In Figure 1.1, the top plot shows the spectrum of a hot star. The overall shape 

of a stellar spectrum approximates the curve of a black body at the same effective 

temperature. This temperature can be estimated from the peak wavelength (Wien’s 

displacement law) or from the area under the spectrum (using the Stefan-Boltzmann 

law). The absorption lines in the spectrum reflect the various chemicals present in the 

star’s atmosphere, and tell of the specific physical conditions in that region of the star. 
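The two temperature estimates mentioned above can be sketched numerically. The example assumes an ideal black body, uses standard values of the physical constants, and works with the flux emitted per unit area at the stellar surface (not the flux received at the Earth). The function names and the 100 nm peak wavelength are illustrative choices.

```python
WIEN_B = 2.897771955e-3    # Wien displacement constant, m K
SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def temperature_from_peak(peak_wavelength_m):
    """Wien's displacement law: T = b / lambda_max."""
    return WIEN_B / peak_wavelength_m


def surface_flux(temperature_k):
    """Stefan-Boltzmann law: bolometric surface flux F = sigma * T^4."""
    return SIGMA_SB * temperature_k ** 4


def temperature_from_flux(flux_w_m2):
    """Invert the Stefan-Boltzmann law: T = (F / sigma)^(1/4)."""
    return (flux_w_m2 / SIGMA_SB) ** 0.25


# A black body whose spectrum peaks at 100 nm (in the far ultraviolet).
t_wien = temperature_from_peak(100e-9)
print(f"Temperature from peak wavelength: {t_wien:.0f} K")

# Round trip: the integrated surface flux of a 30,000 K black body
# returns the same temperature when the law is inverted.
t_flux = temperature_from_flux(surface_flux(30000.0))
print(f"Temperature from integrated flux: {t_flux:.0f} K")
```

For hot stars the peak lies in the ultraviolet, outside a typical optical spectrum, which is one reason why such estimates are only a first approximation.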

The bottom plot in Figure 1.1 is that of a galaxy spectrum. The overall spectrum 

of a galaxy is simply the combined spectrum of all the stars and other radiating matter 

in the galaxy. As galaxies differ in structure and relative composition of stellar type 

and gas, their spectra will also differ. 

Unlike stars, galaxies are not point sources, so their spectra must be obtained differently. 

As a galaxy can often be resolved as an extended object, it is possible to take a 

spectrum of different parts of the galaxy, providing information about its composition, 

the stellar birth rates, and rotational velocity for that particular region. 

Figure 1.1: A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from

the SDSS)

Figure 1.2: Example of a quasar (top) and carbon star (bottom) spectrum. (Taken

from the SDSS)

Figure 1.3: The emission spectrum of the Orion nebula (M42).

Quasars exhibit very bright emission features relative to a low intensity continuum

in their spectra, as can be seen in the top plot of Figure 1.2. In fact, it was only through

careful analysis of the spectra of quasars that astronomers realised they were not just

faint stars. The emission lines in quasar spectra are not where they are expected to be

seen if the object were a nearby star. The standard explanation is that the quasar is

at a vast distance and so appears to be receding from us due to the expansion of the 

Universe. This high recession velocity relative to the Earth causes the spectral lines to 

be redshifted to longer wavelengths. 
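The size of this shift is usually expressed as the redshift, z = (λ_observed − λ_rest) / λ_rest. A minimal sketch follows; the rest wavelength of Lyman-α is a standard laboratory value, while the observed wavelength is an invented number chosen to give a round result, and the function names are illustrative.

```python
LYMAN_ALPHA_REST = 1215.67  # rest wavelength of Lyman-alpha, Angstroms
H_ALPHA_REST = 6562.8       # rest wavelength of H-alpha, Angstroms


def redshift(observed, rest):
    """Redshift z from an observed and a rest (laboratory) wavelength."""
    return (observed - rest) / rest


def shifted_wavelength(rest, z):
    """Wavelength at which a line of given rest wavelength is observed."""
    return rest * (1.0 + z)


# Hypothetical quasar: Lyman-alpha observed at 3647.01 Angstroms.
z = redshift(3647.01, LYMAN_ALPHA_REST)
print(f"z = {z:.3f}")

# At the same redshift, H-alpha is moved well out of the optical window.
print(f"H-alpha observed at {shifted_wavelength(H_ALPHA_REST, z):.1f} Angstroms")
```

At such a redshift an ultraviolet line appears in the optical, which is why quasar emission lines are not found where they would be in the spectrum of a nearby star.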

Exotic stars, such as Wolf-Rayet stars or the carbon star in the bottom plot of 

Figure 1.2, are identified by the features present in their spectra. Carbon stars can 

have similar temperatures to G, K, and M-class stars (4,600 - 3,100 K) but have a 

much higher abundance of carbon than normal stars which appears in the spectrum 

as very strong molecular bands (C_2). As these stars have such low temperatures, they

appear red in colour, but the carbon molecules absorb light at blue wavelengths which 

makes the star appear even redder. Carbon stars are assigned a type C spectral class. 

Emission nebulae are clouds of high temperature gas. The atoms in the cloud are 

ionised by ultraviolet light from a nearby star and emit radiation as the electrons fall


back into atomic orbitals, so their spectra show strong emission lines, as can be seen 

in Figure 1.3. 

These nebulae usually appear to be red because the predominant emission line of 

hydrogen in the optical (Hα) happens to be red. Although other colours are produced 

by other atoms, hydrogen is by far the most abundant. Emission nebulae are usually 

the sites of recent and ongoing star formation. 

1.3.2 Automatic Methods of Analysis 

Despite the diversity in features present in the spectra of astronomical objects, their 

general character always remains the same, namely, flux intensities measured across 

some wavelength range. This permits an automated method of analysis developed for 

one type of object to be applied, in principle, to the spectra of another. 

Over the years, a small number of automatic pattern matching techniques have found 

wide-spread use in the field. One of the first, and simplest, is the cross-correlation function. 

This is a signal processing technique wherein two signals are convolved according

to the integral

$$c(z) = \int_{-\infty}^{\infty} T(x)\, G(z - x)\, dx, \tag{1.1}$$

which convolves two functions, T(x) and G(x), over an infinite range, z = [−∞, ∞],

yielding the resulting cross-correlation function, c(z).

Simkin (1974) demonstrated the use of the cross-correlation function for measuring 

the radial velocities of stars and galaxies. Tonry & Davis (1979) then applied the technique 

in a survey to measure galaxy redshifts. Kurtz (1982) used cross-correlation to 

classify low resolution (14 Å) stellar spectra onto the MK classification system (Morgan 

et al., 1978). Cross-correlation remains an important, basic tool that is widely used, 



mainly as a method for calculating radial velocities. 
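The use of cross-correlation to measure a shift between an observed spectrum and a template can be sketched on a discrete pixel grid. This is a toy illustration under stated assumptions, not the procedure of any of the works cited: the spectra are single synthetic Gaussian absorption lines, the shift is a whole number of pixels, and the function names are hypothetical. Note also that `numpy.correlate` slides one array over the other without reversing it (the usual correlation convention), whereas Equation 1.1 as printed has the G(z − x) form.

```python
import numpy as np


def gaussian_line(pixels, centre, depth=0.5, sigma=3.0):
    """Toy normalised spectrum: flat continuum with one Gaussian absorption line."""
    return 1.0 - depth * np.exp(-0.5 * ((pixels - centre) / sigma) ** 2)


def pixel_shift(observed, template):
    """Return the lag (in pixels) at which the cross-correlation peaks."""
    # Subtract the mean so the peak is driven by the line, not the continuum.
    obs = observed - observed.mean()
    tmp = template - template.mean()
    ccf = np.correlate(obs, tmp, mode="full")
    # In "full" mode the first output element corresponds to lag -(len(tmp) - 1).
    lags = np.arange(-(len(tmp) - 1), len(obs))
    return int(lags[np.argmax(ccf)])


pixels = np.arange(200)
template = gaussian_line(pixels, centre=100.0)
observed = gaussian_line(pixels, centre=107.0)  # same line, moved by 7 pixels

shift = pixel_shift(observed, template)
print(f"Measured shift: {shift} pixels")
```

If the spectra are sampled on a grid that is uniform in log wavelength, a pixel shift corresponds to a fixed velocity step, which is how such a lag would be turned into a radial velocity.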

Related to the cross-correlation function are minimum distance methods (MDM). 

Here, an observation is compared with a set of templates with the intention of finding a 

match which minimises some distance metric. Kurtz (1982), Lasala (1994), and Gulati 

et al. (1994a) used this technique to classify stellar spectra with very positive results. 

The application of minimum distance methods to the parameterisation of stellar spectra 

by fitting observations to grids of theoretical models is discussed in Chapter 3. 
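A minimum distance method can be sketched in a few lines. The example below uses the sum of squared flux differences as the distance metric, which is only one possible choice. The templates are synthetic single-line spectra, and their labels and line depths are invented for the illustration.

```python
import numpy as np

wavelength = np.linspace(4000.0, 5000.0, 501)  # Angstroms


def toy_spectrum(depth):
    """Synthetic normalised spectrum with one absorption line at 4340 Angstroms."""
    return 1.0 - depth * np.exp(-0.5 * ((wavelength - 4340.0) / 10.0) ** 2)


def best_template(observed, templates):
    """Return the label of the closest template and all computed distances."""
    distances = {
        label: float(np.sum((observed - flux) ** 2))
        for label, flux in templates.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical template set spanning a range of line strengths.
templates = {
    "weak": toy_spectrum(0.2),
    "medium": toy_spectrum(0.5),
    "strong": toy_spectrum(0.8),
}

# "Observation": a line of intermediate depth plus a small deterministic ripple
# standing in for noise.
observed = toy_spectrum(0.55) + 0.01 * np.sin(wavelength / 7.0)

label, distances = best_template(observed, templates)
print(f"Closest template: {label}")
```

With real data the squared differences would normally be weighted by the measurement uncertainties, giving a chi-squared statistic, and the templates would be drawn from a grid of standards or models.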

Artificial neural networks (ANNs) are statistical pattern matching algorithms which

have found wide application due to their powerful ability to “learn” highly non-linear 

function mappings by studying examples of such mappings. von Hippel et al. (1994) 

outline the use of ANNs for the classification of stellar spectra. Folkes et al. (1996) 

use ANNs to provide automatic classifications of low S/N galaxy spectra. Gulati et al. 

(1997a) show the use of ANNs in determining reddening estimates from low-dispersion 

ultraviolet spectra of O and B stars. Weaver (2000a) demonstrates an ANN-based 

technique for performing two-dimensional classification of the components of binary 

stars. Qin et al. (2003) use a form of ANN to perform automatic star-galaxy separation 

by spectra with a high success rate. The use of ANNs to provide classifications and 

physical parameterisations of stellar spectra is studied in Chapter 2. 

Principal Components Analysis (PCA) is a multivariate statistical technique which 

facilitates the discovery of linear correlations between observed variables. Early work

by Deeming (1964), Kurtz (1982), and Whitney (1983) examines the application of 

PCA to the unsupervised classification of stellar spectra. Since then, PCA has found a 

wide application in spectral analysis such as creating classification systems for galaxy 

spectra (Sodre et al., 1998; Galaz & de Lapparent, 1998; Connolly & Szalay, 1999), 

determination of galactic redshifts (Glazebrook et al., 1998), and investigating the 

polarisation properties of broad absorption line quasars (Lamy & Hutsemékers, 2004). 

The application of PCA to stellar spectra is examined in more detail in Chapter 4.
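As a minimal sketch of the idea, PCA can be carried out on a matrix of spectra (one spectrum per row) through a singular value decomposition of the mean-subtracted data. The data set here is synthetic and deliberately simple: twenty spectra that differ only in the depth of a single line, so one component should account for essentially all of the variance. The function name and return values are illustrative choices.

```python
import numpy as np


def pca(spectra, n_components):
    """PCA via SVD. Rows of `spectra` are observations, columns are pixels."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Rows of vt are the principal components (eigen-spectra).
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    # Projection of each centred spectrum onto the retained components.
    scores = centred @ components.T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return mean, components, scores, explained[:n_components]


# Synthetic data: one Gaussian line whose depth varies from spectrum to spectrum.
wavelength = np.linspace(4000.0, 5000.0, 200)
profile = np.exp(-0.5 * ((wavelength - 4340.0) / 15.0) ** 2)
depths = np.linspace(0.1, 0.9, 20)
spectra = 1.0 - np.outer(depths, profile)

mean, components, scores, explained = pca(spectra, n_components=2)
print(f"Fraction of variance in first component: {explained[0]:.4f}")

# Reconstruct every spectrum from the mean plus the first component only.
reconstruction = mean + scores[:, :1] @ components[:1]
print("Reconstruction matches data:", np.allclose(reconstruction, spectra))
```

Real spectra vary in several correlated ways at once, so more components are needed, but the same compression applies: each spectrum is summarised by a handful of scores rather than by hundreds of flux values.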


1.4 Hot Subdwarf Stars 

The automatic analysis tool set established in this thesis, although general in nature, 

has been applied to the analysis of a specific type of astronomical object in order to 

demonstrate the effectiveness of the tools, and how they might be used in a real-world 

scenario. 

The early type subluminous dwarfs (Greenstein & Sargent, 1974) are defined as stars

which populate a region located below the upper main sequence on the

Hertzsprung-Russell diagram, extending the horizontal branch to higher effective

temperatures. They are mostly considered to be low-mass (M_core ≈ 0.50−0.55 M_⊙),

core helium burning objects surrounded by a thin envelope of hydrogen. Visibly, they

are quite blue objects, (B − V) ≈ −0.3, (U − B) ≈ −1.0, and have been shown to

dominate the population of faint blue stars in the galaxy (m_B ≤ 16) (Green et al., 1986).

Regardless of their prior

evolution, hot subdwarfs are thought to be direct progenitors of white dwarfs, although 

only a small fraction (< 2%) of white dwarfs are formed through this route. 

1.4.1 Spectroscopy 

The hot subdwarfs fall into three broad subgroups based on spectroscopic criteria. 

sdB Strong Stark-broadened hydrogen lines, with weak He I and no Mg II absorption 

lines. 

sdOB/He-sdB Strong He I absorption with weak or absent hydrogen Balmer lines,

and He II. Carbon lines of varying strength.

sdO Strong He II and weak He I lines, with broad and shallow hydrogen Balmer lines 

superimposed with He II lines. 

Examples from each of these subgroups can be seen in Figure 1.4. 



[Figure 1.4 plot: normalised flux (continuum = 1) plus a constant offset against wavelength (4000 to 5200 Angstroms) for PG1220-056 (sdO3VII:He40), FEIGE 110 (sdO8VII:He6), PG1532+523 (sdB1VII:He4), and PG1544+488 (sdBC1VII:He39), with H, He I, He II, C II, and C III lines marked.]

Figure 1.4: Examples from each hot subdwarf spectrographic subgroup. Classifications

listed are those from Drilling et al. (2006).

Analyses of sdB spectra (e.g., Edelmann et al., 2003) show them to have effective 

temperatures in the range 20,000 ≤ T_eff/K ≤ 40,000, surface gravities in the range

5.0 ≤ log g (cgs) ≤ 6.0, and extremely helium-deficient atmospheres, n_He/n_H ≲ 0.01.

sdB stars are thought to be low-mass (M_core ≈ 0.50−0.55 M_⊙, Caloi 1976), core helium

burning objects, with a very thin hydrogen envelope (M_env ≲ 0.02 M_⊙, Heber 1986).

The helium deficiency of sdB stars is believed to be caused by gravitational settling, 

i.e., the settling of heavier elements due to gravity (Wesemael et al., 1982). However,

Heber (1991) found that some sdB stars show metals like carbon and silicon to be 

over-abundant in their atmospheres, believed to be due to radiative levitation being 

large for those elements. 

Analyses of sdO spectra performed by Dreizler et al. (1990) and Thejll et al. (1994) 

find that they have effective temperatures in the range 40,000 ≤ T_eff/K ≤ 80,000, with

the majority lying between 40,000 and 50,000 K. Surface gravities lie in the range 4.0 ≤

log g (cgs) ≤ 6.5, and the atmospheres of most sdO stars are helium-rich, n_He ≳ 0.50,

with additional enrichment of carbon and nitrogen.

[Figure 1.5 plot: three schematic panels, a) to c), each plotting luminosity L (faint to bright) against temperature T (blue/hot to red/cool).]

Figure 1.5: Schematic temperature-luminosity diagrams showing: a) the positions of

stars belonging to the main stellar groups; b) the normal sequence of stellar evolution

experienced by a star of a few solar masses; c) possible evolution of an sdB star in a

binary system. (Diagram courtesy of C.S. Jeffery).

Drilling (1996) and Jeffery et al. (1997) represent the first attempts to introduce a 

homogeneous classification system for hot subdwarfs. This past work has been extended 

and further refined by Drilling et al. (2006) to produce a three-dimensional classification 

system based on a spectral type, luminosity class, and a helium class. The standard 

stars of this system are used in Chapter 2 as the basis for training an artificial neural 

network to automatically classify hot subdwarf spectra. 

1.4.2 Stellar Evolution 

One of the most useful tools in stellar astronomy is the Hertzsprung-Russell (HR) diagram 

which plots absolute magnitude against spectral type. The relationship between 

these two parameters shows several important patterns, with the most significant being 

that the majority of stars lie within a band stretching from the region of bright, hot 

stars to the region of dim, cool stars. This band is called the main sequence of the HR 

diagram. The giant stars are seen as a large cluster occurring above the cooler end of

the main sequence, and the white dwarfs populate a sequence of dim, hot stars running 

almost parallel to the main sequence. Evidently, the HR diagram serves as a kind of 



atlas for the different types of stars, and stellar evolution is usually described in terms 

of how the underlying physics changes a star’s position on the HR diagram over time. 

The HR diagram can also be plotted as the relationship between colour and absolute 

magnitude, the version frequently used by observers. Theorists prefer to plot luminosity 

(or surface gravity) against effective temperature, as shown in the schematic diagram 

of Figure 1.5a. The log g–T_eff version of the HR diagram will be used later in this thesis

(Chapters 5 and 6). 
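The luminosity and surface gravity versions of the diagram are linked through the stellar radius: L = 4πR²σT_eff⁴ and g = GM/R², so that L = 4πσGM T_eff⁴/g once a mass is assumed. The sketch below evaluates this for parameters inside the sdB ranges quoted earlier in this chapter. The specific values (T_eff = 30,000 K, log g = 5.5, M = 0.5 M_⊙) are illustrative choices rather than measurements, the physical constants are standard cgs values, and the function names are hypothetical.

```python
import math

G_CGS = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
SIGMA_CGS = 5.670374e-5  # Stefan-Boltzmann constant, erg cm^-2 s^-1 K^-4
M_SUN_G = 1.989e33       # solar mass, g
R_SUN_CM = 6.957e10      # solar radius, cm
L_SUN_ERG_S = 3.828e33   # solar luminosity, erg s^-1


def radius_cm(mass_msun, log_g):
    """Stellar radius from g = G M / R^2, with log g in cgs units."""
    return math.sqrt(G_CGS * mass_msun * M_SUN_G / 10.0 ** log_g)


def luminosity_lsun(teff_k, log_g, mass_msun):
    """Luminosity in solar units from L = 4 pi R^2 sigma Teff^4."""
    radius = radius_cm(mass_msun, log_g)
    return 4.0 * math.pi * radius ** 2 * SIGMA_CGS * teff_k ** 4 / L_SUN_ERG_S


# Illustrative sdB-like parameters (an assumed mass is required).
radius_rsun = radius_cm(0.5, 5.5) / R_SUN_CM
lum = luminosity_lsun(30000.0, 5.5, 0.5)
print(f"R = {radius_rsun:.2f} R_sun, L = {lum:.0f} L_sun")
```

For these inputs the radius comes out at about a fifth of the solar radius and the luminosity at about 32 solar luminosities, which illustrates how a position in the log g–T_eff plane maps onto the luminosity axis of Figure 1.5a.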

Canonical stellar evolution theory (see Figure 1.5b) predicts that a low-mass, core 

hydrogen burning main sequence star will eventually exhaust all the hydrogen in its 

core, converting it, through nuclear fusion, into helium. 

At the point when core hydrogen fusion ceases, the core is not hot enough to begin 

helium fusion and starts to collapse because no energy is being generated to counteract 

the effect of gravity. The collapsing core heats up, with some of this heat being transferred 

into the hydrogen envelope surrounding the core. Eventually, this envelope can 

become hot enough to fuse hydrogen in a thin shell at the core boundary. 

The continued core collapse and hydrogen shell burning causes temperature and 

pressure in the shell to increase. The increasing shell temperature supplies sufficient 

pressure to the outer layers of the star, causing them to expand and cool. The star leaves 

the main sequence, and evolves to lower temperatures at nearly constant luminosity, 

eventually reaching the red giant branch. Mass can be lost from the outer layers due 

to stellar winds. 

The core collapse continues until the helium ceases to behave like an ideal gas, and 

becomes electron degenerate. Essentially, this means that the gas doesn’t expand very 

much as its temperature increases. The hydrogen burning shell adds helium to the 

core which continues to increase in temperature. The core finally becomes hot enough 

to fuse helium and commences this reaction in an explosive manner called the helium 

flash.


The degeneracy of the core is removed, and it expands and cools as helium burning 

continues. The hydrogen envelope also cools. The star contracts

again as a new state of equilibrium is reached, and settles on the horizontal branch. 

A star on the horizontal branch has two energy sources: a helium burning core, and 

a hydrogen burning shell. The star evolves at nearly constant luminosity, with the core 

converting helium into mostly carbon and oxygen. When the helium is exhausted, the 

core again begins to contract under gravity. Now, there is a hydrogen burning shell 

and a helium burning shell which cause the star to expand, evolving with increasing 

luminosity to the asymptotic giant branch. 

This stage of the star’s life is characterised by high mass loss due to stellar winds. 

The process of helium fusion is very sensitive to temperature, so the helium burning 

shell goes through a series of thermal pulses alternating with periods of quiescence. 

This is thought to enhance the efficiency of the stellar winds until the entire outer 

envelope of the star is lost. When the mass of the envelope is almost entirely depleted, 

the star begins to evolve across the HR diagram at constant luminosity. 

A significant fraction of material has been ejected from the outer regions of the star, 

and the expelled gas is ionised by the star (temperatures of such stars often exceed

50,000 K), producing a planetary nebula. The nebula then disperses into interstellar space.

The hydrogen and helium burning layers eventually extinguish, and the star becomes 

a white dwarf with a degenerate carbon–oxygen core. The core cools quickly and 

luminosity decreases, but it takes a long time for the thermal energy in the core to be 

radiated away completely. 

sdB Evolution 

Extended horizontal branch stars tend to differ from true horizontal branch stars in 

terms of the luminosity of the hydrogen burning shell. As noted above, the mass of 

this envelope is very small (M_env ≲ 0.02 M_⊙) for a subdwarf B star, meaning that its



luminosity is negligible. For a normal horizontal branch star, the luminosity of its 

hydrogen envelope equals or even exceeds that of the helium core. 

How the hot subdwarfs come to arrive on the extended horizontal branch is still 

under debate. A number of scenarios have been proposed to explain the evolution of 

sdB stars. 

In the single star scenario, enhanced mass loss on the red giant branch due to stellar 

winds may remove all of the hydrogen-rich envelope before core helium burning begins 

(D’Cruz et al., 1996). 

In the binary scenario (see Figure 1.5c), Mengel et al. (1976) suggest sdBs could

be formed from relatively wide binaries. Mass transfer through stable Roche lobe

overflow results in a depletion of the hydrogen-rich envelope prior to the helium core 

flash. If the sdB progenitor and its compact companion are in a close binary system, a 

common-envelope phase can result in the creation of a helium star. More recent work 

(Maxted et al., 2001) suggests ∼2/3 of sdBs are in close binary systems. 

sdO Evolution 

The atmospheric parameters of sdO stars show them to be less homogeneous than the

sdBs. Generally, they appear to fall into two subgroups on the log g–T_eff plane. One

group (“compact” sdOs) lies close to the theoretical post-extended horizontal branch 

evolutionary tracks, and therefore might have evolved from sdB stars. The other group 

have lower surface gravities (“luminous” sdOs), lying closer to the post-asymptotic 

giant branch tracks. These stars are found in the same region on the log g–T_eff plane

as the central stars of planetary nebulae. 

Various evolutionary scenarios have been proposed to explain the origin of sdO stars, 

and it is unlikely that a single scenario can come to explain both subgroups. 

Several theories exist for “compact” sdOs. The Post EHB scenario attempts to


explain the large number of sdOs found at the extreme end of the horizontal branch, 

along the helium burning main sequence, which suggests a close connection to sdB stars 

(Caloi, 1989; Dorman et al., 1993). But how does an sdB star become an sdO? It has 

been suggested that the hydrogen-rich envelope of an sdB can re-ignite during the post-extended

horizontal branch phase, causing the star to evolve towards the asymptotic 

giant branch. However, the luminosity of the star is not sufficient to let it ascend the 

asymptotic giant branch, so the star returns to the sdO region. Dreizler et al. (1990) 

propose an alternate theory wherein deep mixing of the star’s atmosphere by helium 

shell flashes could explain the helium enrichment seen in sdO stars. 

Other explanations for compact sdOs include the delayed helium flash scenario proposed 

by Sweigart (1997) which suggests that if mass loss during the red giant branch 

is too high, then the helium core never reaches ignition mass, and the star ends up as a 

helium white dwarf without going through a horizontal branch phase. Alternatively, if 

the ignition of helium is delayed but can still occur on the white dwarf cooling sequence, 

it will take the star into the region of the sdO stars. 

A third evolutionary scenario comes from binary white dwarf mergers as studied by 

Iben (1990). It was found that the evolution of close binary systems, leading to the 

merger of He+He and CO+He white dwarfs, could produce low-mass helium burning 

stars similar to sdOs. Strong support for this scenario comes from Napiwotzki et al. 

(2004) who found that almost all of the sdO stars in their sample were apparently 

single. 

For the “luminous” sdO stars, Heber & Hunger (1987) suggest that they are “born 

again post-asymptotic giant branch” stars. In this scenario (Iben et al., 1983), a post-asymptotic

giant branch star undergoes a late helium shell flash, sending it to the 

asymptotic giant branch for a second time. During this phase, the outer hydrogen 

envelope can be completely removed by stellar winds, leaving the star with the appearance 

of a luminous sdO star. Husfeld et al. (1989) suggest that a small number of sdO 

stars are also formed from normal post-asymptotic giant branch evolution. 



1.4.3 Why Study Them? 

The study of hot subdwarfs is important in several respects. As they exist in large 

numbers, and have been shown to be highly evolved stars, they are useful indicators for 

studying the structure and evolution of the galaxy. Brown et al. (1997) suggest that 

these stars are the main cause of the ultraviolet upturn phenomenon (UV excess) seen 

in elliptical galaxies and the bulges of spiral galaxies because they spend a long

time ($10^{8}$ yrs) on the extended horizontal branch at high temperatures. They are also

considered to be useful age indicators for elliptical galaxies (Brown et al., 2000). 

As described previously, the hot subdwarfs are interesting in their own right because 

their evolution does not seem to be explained by canonical stellar evolution theories. This

makes them important objects from an astrophysical point of view. 

1.4.4 Why Search For Them In The SDSS? 

The Sloan Digital Sky Survey project and the data it produces are a prime example of

where the future of astronomy is heading. 

The main observational goal of the survey is to collect photometric and spectroscopic 

data on galaxies and quasars. However, many quasars appear as very blue objects, so 

the SDSS will observe spectra for a lot of blue stars, such as white dwarfs and hot 

subdwarfs, because these objects cannot be differentiated at the photometric level. 

This makes the SDSS an unbiased, magnitude-limited survey containing potentially 

hundreds of moderate resolution (∼ 3.0Å), fully reduced hot subdwarf spectra which 

can be used to statistically identify new subgroups within an extracted sample. The 

large, homogeneous, publicly accessible data archives are therefore an excellent test site 

for the tool set developed in this thesis.


1.5 Summary 

The continual advancement of observational and information technology is driving astronomy 

forward as a data-rich discipline. A clear need has been identified for robust 

automatic methods to help analyse large databases of astronomical data, and extract 

from them useful science. 

This thesis focuses on automatic tools to search for and analyse astronomical spectra 

in large databases. Artificial neural networks (Chapter 2), $\chi^2$ minimisation (Chapter

3), and principal components analysis (Chapter 4) are the methods used to construct 

a generalisable tool kit for performing this task. 

The tools will be demonstrated by applying them to the problem of searching for 

and analysing the spectra of hot subdwarf stars from the archives of the Sloan Digital 

Sky Survey (Chapter 5). They will also be used to analyse other smaller data sets 

(Chapter 6). 

As the amount of data gathered by astronomers increases, much work is needed to 

improve the ways in which it can be analysed, and solve the problems that lie ahead. 

Some of the issues encountered during this project are discussed, finally, in Chapter 7. 


Chapter 2 

Classification - Artificial Neural 

Networks 

Artificial neural networks (ANNs) are a statistical machine learning algorithm best 

thought of as arbitrary function estimators. They are able to provide a non-linear 

parameterised mapping between some input vector, x, and an output vector, y. 

For example, in the case of stellar spectral classification, x is the feature vector 

containing the flux values of a spectrum over some wavelength range, and y is the classification 

assigned to x according to some classification standard. The mapping performed 

by the ANN is analogous to the process which leads an expert human classifier 

to assign classification y to spectrum x. This ability to replicate non-linear functions 

makes ANNs a powerful tool in astronomical data mining. 

In the context of machine learning, ANNs are part of a wider class of methods 

to approximate non-linear functions. Some of the mystery that commonly surrounds 

their use can be dispelled by relating several important issues to the simpler process of 

polynomial curve fitting. Here, the problem is to fit a polynomial to a set of M points 

by minimising some error function. The $n$th-order polynomial is given by

$$y(x) = w_0 + w_1 x + \cdots + w_n x^n = \sum_{i=0}^{n} w_i x^i . \tag{2.1}$$

If this is considered as a non-linear mapping which takes x as input and produces y 

as output, then the exact form of the function y(x) is determined by the values of the 

parameters $w_0, \ldots, w_n$, which are analogous to the weights in a neural network.

The weights can be determined by minimising an error function which compares

the desired output from the polynomial, $d(x_k)$, for each input value, $x_k$, with the polynomial’s

actual output, $y(x_k)$. A commonly used example is the sum-of-squares error

function,

$$E = \frac{1}{2} \sum_{k} \left( y(x_k) - d(x_k) \right)^2 . \tag{2.2}$$

The minimisation of an error function such as Equation 2.2, which involves target 

values for the polynomial outputs, is called supervised learning since for each input value 

the desired output is specified. This is also a common way to determine the weights of 

a neural network for a particular application (the back-propagation algorithm adjusts 

the weights by calculating the derivatives of the error function with respect to the 

weights). A second form of learning, called unsupervised learning, does not involve the 

use of target data. In the context of neural networks, this form of learning can be used 

to discover clusters or other patterns in a data set. 
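The supervised minimisation described above can be illustrated with a minimal sketch in which the weights of Equation 2.1 are adjusted by plain gradient descent on the error of Equation 2.2. The function names, the learning rate and the toy data are assumptions made for illustration; this is not the optimiser used by any particular neural network code.

```python
def poly(w, x):
    """Evaluate y(x) = sum_i w[i] * x**i (Equation 2.1)."""
    return sum(wi * x**i for i, wi in enumerate(w))


def sse(w, xs, ds):
    """Sum-of-squares error E = 1/2 sum_k (y(x_k) - d(x_k))^2 (Equation 2.2)."""
    return 0.5 * sum((poly(w, x) - d) ** 2 for x, d in zip(xs, ds))


def sse_gradient(w, xs, ds):
    """Derivatives dE/dw_i = sum_k (y(x_k) - d(x_k)) * x_k**i."""
    return [
        sum((poly(w, x) - d) * x**i for x, d in zip(xs, ds))
        for i in range(len(w))
    ]


def fit(xs, ds, order, rate=0.05, steps=5000):
    """Gradient descent on E, starting from zero weights (illustrative settings)."""
    w = [0.0] * (order + 1)
    for _ in range(steps):
        g = sse_gradient(w, xs, ds)
        w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w


# Toy training set: inputs x_k with targets d(x_k) = 1 + 2 x_k.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ds = [1.0 + 2.0 * x for x in xs]
w = fit(xs, ds, order=1)
print(w, sse(w, xs, ds))
```

The weights are updated using only the derivatives of the error with respect to each weight, which is the same information that back-propagation supplies for a neural network.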

If the polynomial of Equation 2.1 is being trained to model a particular input-output

mapping via supervised training, then the goal is to have a model which gives 

good predictions for new data, in other words one which exhibits good generalisation 

properties. One of the factors which influences a model’s ability to generalise is the 

number of free parameters it has (i.e., the number of degrees of freedom). If a first-order

polynomial is chosen to model a non-linear mapping, then it will generalise poorly

because a linear function is not flexible enough to match the underlying mapping

function very well. In other words, the model has a high bias, meaning that the

complexity of the polynomial is not sufficient to model the actual mapping function. 

The bias can be reduced by increasing the number of degrees of freedom, i.e., increasing 

the order of the polynomial. This gives it greater flexibility to model the non-linear 

mapping. However, if the order is increased too much, the polynomial’s approximation 

to the underlying function will actually get worse - the mapping may give an exact 

fit to the training data, but its ability to generalise is hampered by highly oscillatory 

behaviour between training points. Such a model is said to over-fit the training data, 

and has a high variance meaning that the model is sensitive to the training data (i.e., 

quantity, noise, distribution, etc.). 
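The trade-off between bias and variance can be made concrete with a small numerical sketch. It assumes a noise-free target function (the Runge function, chosen here purely for illustration), eleven equally spaced training points, and test points placed midway between them; polynomials of order 1, 4 and 10 are fitted by least squares using the normal equations. None of these choices comes from the thesis.

```python
def runge(x):
    """Smooth non-linear target standing in for the underlying mapping."""
    return 1.0 / (1.0 + 25.0 * x * x)


def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_poly(xs, ds, order):
    """Least-squares weights from the normal equations (V^T V) w = V^T d."""
    v = [[x**i for i in range(order + 1)] for x in xs]
    size = order + 1
    a = [[sum(row[i] * row[j] for row in v) for j in range(size)] for i in range(size)]
    b = [sum(row[i] * d for row, d in zip(v, ds)) for i in range(size)]
    return solve(a, b)


def rms_error(w, xs, ds):
    """RMS difference between the polynomial and the target values."""
    sq = [(sum(wi * x**i for i, wi in enumerate(w)) - d) ** 2 for x, d in zip(xs, ds)]
    return (sum(sq) / len(sq)) ** 0.5


train_x = [-1.0 + 0.2 * k for k in range(11)]   # training points
test_x = [-0.9 + 0.2 * k for k in range(10)]    # points between them
train_d = [runge(x) for x in train_x]
test_d = [runge(x) for x in test_x]

results = {}
for order in (1, 4, 10):
    w = fit_poly(train_x, train_d, order)
    results[order] = (rms_error(w, train_x, train_d), rms_error(w, test_x, test_d))
    print(order, results[order])
```

In this sketch the order-10 polynomial passes through every training point, so its training error is essentially zero, yet its error at the intermediate test points is larger than that of either lower-order fit because of the oscillations between training points.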

The point of best generalisation is determined by a trade-off between the model’s 

bias and variance, and occurs when the number of degrees of freedom in the model is 

relatively small compared to the size of the training data set. The quantity of training 

data is a significant factor in achieving good generalisation. As the quantity of training 

data increases, the model’s complexity can be increased, thereby reducing bias, while 

ensuring that the model is more heavily constrained, thereby also reducing variance. 

In the context of neural networks, the complexity of the model is determined by the 

number and structure of the internal weights. The weights are arranged in a network of 

layers, with more layers allowing the ANN to model essentially any non-linear function. 

However, as illustrated in the discussion of polynomial fitting, a neural network with 

too much complexity may succeed in “memorising” the training data by over-fitting

them, and therefore yield poor generalisation properties. A number of techniques

exist to combat over-fitting and regularise, or smooth, the mapping produced by neural

networks. Weight decay adds a penalty term to the error function that

penalises large values of the network’s internal weights; early stopping of the

training process also prevents the network weights from becoming too large; and adding

noise to the training data set makes it more difficult for the neural network to over-fit.

A more detailed review of basic neural network theory can be found in Bishop (1995). 
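As a sketch of the weight decay technique mentioned above, one commonly used form adds a quadratic penalty to the error function of Equation 2.2. The decay coefficient $\alpha$ is a symbol introduced here for illustration, and the expression is given as the generic textbook form rather than as the form implemented in any specific code.

```latex
\tilde{E} = E + \frac{\alpha}{2} \sum_{i} w_i^{2},
\qquad
\frac{\partial \tilde{E}}{\partial w_i} = \frac{\partial E}{\partial w_i} + \alpha\, w_i .
```

The extra gradient term $\alpha w_i$ pulls each weight towards zero at every update, so a weight only grows large if doing so reduces $E$ by more than the penalty it incurs.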



Previous work by others in the field of automatic stellar spectral analysis demonstrates 

that ANNs are well-suited to fast classification and parameterisation of large 

quantities of spectra from across the main sequence. See, for example, Gulati et al. 

(1994b), von Hippel et al. (1994), Storrie-Lombardi et al. (1994), Weaver & Torres- 

Dodgen (1995), Bailer-Jones (1996), Gulati et al. (1996), Bailer-Jones (1997), Gulati 

et al. (1997b), Weaver & Torres-Dodgen (1997), Bailer-Jones et al. (1997), Bailer-Jones 

et al. (1998), Singh et al. (1998), Rhee et al. (1999), Allende Prieto et al. (2000), Weaver 

(2000b), and Snider et al. (2001). 

Here, the feedforward multilayer back-propagation ANN code STATNET of Bailer- 

Jones (1996) is used to obtain classifications and acquire astrophysical parameters from 

a sample of hot subdwarf spectra. The values for the astrophysical parameters are 

compared with those obtained from a different computerised technique, that of $\chi^2$

minimisation as implemented in the code SFIT (see Chapter 3). 

2.1 Classifying Hot Subdwarfs 

The hot subdwarfs do not fall within the scope of the standard MK system (Morgan 

et al., 1978), therefore Drilling et al. (2006) have extended and refined the earlier work 

of Drilling (1996) and Jeffery et al. (1997) to construct a three-dimensional MK-like 

classification scale for hot subdwarfs. This scale is based upon a sample of spectra 

from a number of sources, covering the wavelength region 4050–4900 Å at a resolution

of 2.5 Å, and consists of a ‘spectral’ class, a ‘luminosity’ class, and a ‘helium’ class.

The classification scale uses a spectral type running from sdO1 to sdA (1 – 20), 

analogous to MK spectral classes. It introduces a helium class (0 – 40) based on H, HeI 

and HeII line strengths, and uses luminosity classes IV – VIII, where most subdwarfs 

have luminosity class ∼VII. The mapping between the Drilling et al. (2006) classes and 

those used elsewhere, e.g. the PG survey (Green et al., 1986), is illustrated in figure 16 

of Drilling et al. (2006).


2.1.1 The Training Sample 

A set of subdwarf spectra was taken from a collection compiled by Drilling et al. (2006) 

from data provided by Moehler et al. (1990a,b), Dreizler et al. (1990), and Theissen 

et al. (1993). It comprises a representative sample of 174 PG subdwarfs and blue 

horizontal branch stars, plus a few other stars not included in the PG catalog. Several 

observations have been supplied for many of the targets with the sample containing 

471 spectra in total at an approximate resolution of 2.5 Å. 

The spectra are not homogeneous. Due to the data being gathered by different 

observers using different equipment at different locations, etc., a number of issues affect 

the sample including: calibration anomalies, velocity shifting, different windows of 

wavelength coverage, inconsistent S/Ns and dispersion intervals, and so on. 

A pre-processing step was needed to correct these problems and establish a more 

homogeneous sample. The spectra were visually inspected to select the best samples for

each star. The resulting 359 spectra were corrected for large cosmic spikes and instrumental 

end-effects. A velocity shift correction was applied by cross-correlating each 

spectrum with a grid of theoretical spectra chosen to coarsely cover the approximate 

$T_{\mathrm{eff}}$, $\log g$, $\log(n_{\mathrm{He}}/n_{\mathrm{H}})$ range of the Drilling et al. (2006) classification scale. Finally,

the spectra were rebinned onto a common wavelength grid of 4050 – 4950 Å with a

dispersion of 1 Å pixel$^{-1}$.

It should be noted that the radial velocity correction described above already partly 

solves the parameterisation problem by choosing the best fitting model from the grid. 

As such, training the neural network to solve for this parameter simultaneously alongside 

the other astrophysical parameters may be a more convenient approach. However, 

this was not attempted here. 
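The final rebinning step of the pre-processing can be sketched as follows. Linear interpolation is used here as a simplifying assumption (a flux-conserving rebinning scheme may be preferable in practice), and the function names and the toy input spectrum are hypothetical. The common grid of 4050 – 4950 Å at 1 Å per pixel contains 901 points.

```python
def common_grid(start=4050.0, stop=4950.0, step=1.0):
    """Wavelength grid in Angstroms, inclusive of both end points."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def rebin_linear(wave, flux, new_wave):
    """Linearly interpolate flux(wave) onto new_wave.

    Both wavelength lists must be increasing. Points outside the observed
    range are returned as None rather than extrapolated.
    """
    out = []
    j = 0
    for w in new_wave:
        if w < wave[0] or w > wave[-1]:
            out.append(None)
            continue
        # Advance to the input interval [wave[j], wave[j + 1]] containing w.
        while j < len(wave) - 2 and wave[j + 1] < w:
            j += 1
        t = (w - wave[j]) / (wave[j + 1] - wave[j])
        out.append(flux[j] + t * (flux[j + 1] - flux[j]))
    return out


# Toy spectrum on an irregular instrument grid with a gentle continuum slope.
wave = [4040.0 + 1.7 * i for i in range(540)]
flux = [1.0 + 1.0e-4 * (w - 4500.0) for w in wave]

grid = common_grid()
rebinned = rebin_linear(wave, flux, grid)
print(len(grid), rebinned[0], rebinned[-1])
```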



[Figure panels: luminosity class (0 – IX) against spectral type (O – A); helium class (0 – 40) against spectral type (O – A); helium class (0 – 40) against luminosity class (0 – IX).]

Figure 2.1: The training sample shows clustering in certain regions of the classification 

space. For clarity, points have been offset by small random shifts in both coordinates.


2.1.2 Methodology 

As described at the beginning of the chapter, training an ANN to learn the Drilling 

et al. (2006) classification system involves iterating over the training set and minimising 

the sum-of-squares error function between the desired output and the network’s 

actual output (see Equation 2.2) with respect to the ANN’s internal parameters. The 

minimisation process continues until some criterion of convergence has been reached 

(e.g., when the weight updates have become very small). 

A typical strategy to assess network performance after training is to apply the network 

to an application set for which the “true” classifications are known. Unfortunately, 

no other suitable set of spectra previously classified onto the Drilling et al. (2006) scale 

were available for the study presented here. 

An alternative is to split the Drilling sample into two similarly sized sets, with one 

used for training, and the other to quantify performance. However, as the Drilling 

sample is small, and its distribution across the parameter space is limited (see Figure 

2.1), a concern is that there may not be enough data to constrain the model if the 

sample is split into two smaller subsets. 

On the other hand, if the two subset approach is changed slightly, there is a way to 

determine how well a given ANN model performs using only the data in the Drilling 

sample. A technique called N-fold cross-validation, or the leave-one-out method, permits 

the greatest number of samples to be used in training while still giving an idea of 

ANN performance over the whole sample set. 

The method proceeds by assuming a data set of size N. Each datum is left out in 

turn and the ANN is trained on the remaining N −1 samples. The ANN’s performance 

is then assessed by classifying the omitted datum. No random sampling is involved 

with this method, so repeating the procedure for a particular ANN model always gives 

the same result. 
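The leave-one-out procedure can be sketched generically as below. The `train` and `predict` callables are placeholders for whatever model is being assessed; a straight-line least-squares fit is used as a toy stand-in for an ANN so that the example runs quickly. All names and data are illustrative assumptions.

```python
def leave_one_out(inputs, targets, train, predict):
    """Predict each datum with a model trained on the other N - 1 data."""
    predictions = []
    for k in range(len(inputs)):
        train_x = inputs[:k] + inputs[k + 1:]
        train_y = targets[:k] + targets[k + 1:]
        model = train(train_x, train_y)
        predictions.append(predict(model, inputs[k]))
    return predictions


def train_line(xs, ys):
    """Toy model: least-squares straight line, returned as (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
loo = leave_one_out(xs, ys, train_line, predict_line)
print(loo)
```

Each of the N models is independent of the others, which is what makes the procedure straightforward to distribute over several machines.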



The leave-one-out method carries with it a large computational cost as each ANN 

model must be trained N times (in this case, N = 359). As several models are to 

be tested, the computational burden was alleviated by the construction of a small distributed 

cluster of 15 ordinary desktop workstations at Armagh Observatory using the 

Condor batch system (e.g., Livny & Raman, 1998). The cluster reduced the computation 

time for the leave-one-out procedure by a factor of ∼10 compared to using only a 

single workstation. 

To determine the optimal complexity of the ANN model, two different ANN architectures 

were studied, one with a single hidden layer of 10 nodes, and one with two 

hidden layers of 5 nodes each. These architectures are denoted

901:10:3 and 901:5:5:3, respectively.

This notation explains the structure of the neural network in terms of layers of 

processing nodes, and the number of nodes in each layer. For each network being 

tested, an input layer of 901 nodes corresponds to the 901 flux points in the preprocessed 

observational sample, and an output layer of three nodes corresponds to 

each parameter in the classification scale: spectral type, luminosity class, and helium 

class. 
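The architecture notation also gives a quick estimate of model complexity. The sketch below counts the weights implied by a given notation, assuming fully connected layers with one bias weight per non-input node; whether STATNET uses exactly this layout is an assumption, and the function names are hypothetical.

```python
def parse_architecture(spec):
    """Turn a string such as '901:10:3' into a list of layer sizes."""
    return [int(n) for n in spec.split(":")]


def count_weights(spec, bias=True):
    """Number of weights in a fully connected feedforward network.

    Each node in a layer is connected to every node in the previous layer,
    plus (optionally) one bias weight.
    """
    layers = parse_architecture(spec)
    extra = 1 if bias else 0
    return sum((n_in + extra) * n_out for n_in, n_out in zip(layers, layers[1:]))


for spec in ("901:10:3", "901:5:5:3"):
    print(spec, count_weights(spec))
```

Under these assumptions the two architectures have 9053 and 4558 weights respectively, so in both cases the number of free parameters is much larger than the 359 training spectra, which is one reason regularisation matters for this problem.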

For each model architecture, a committee of five ANNs was formed. The committee 

approach (see Bishop, 1995, sect. 9.6) trains a number of ANNs on the same data, and 

applies them in unison on a new datum. The results from each of the ANNs are then 

averaged together to provide a combined result. In STATNET, each network in the 

committee is initialised with different random values for the weights, so the committee 

approach seeks to achieve more robust results by averaging out ‘convergence noise’ due 

to the variance of the model causing the minimisation process to get caught in local 

minima, with the final set of weights therefore being different for each committee ANN. 
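The committee averaging can be sketched in a few lines. The five members below are hypothetical stand-ins that return fixed (spectral type, luminosity class, helium class) outputs; in practice each member would be a trained network initialised with different random weights.

```python
def committee_predict(members, x):
    """Average the output vectors of several trained networks for one input."""
    outputs = [member(x) for member in members]
    n_out = len(outputs[0])
    return [sum(o[i] for o in outputs) / len(outputs) for i in range(n_out)]


# Made-up outputs for one spectrum from five committee members.
fixed_outputs = [
    [10.0, 7.0, 20.0],
    [10.5, 6.5, 22.0],
    [9.5, 7.5, 18.0],
    [10.2, 7.1, 21.0],
    [9.8, 6.9, 19.0],
]
members = [lambda x, out=out: out for out in fixed_outputs]

combined = committee_predict(members, x=None)
print(combined)
```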

The leave-one-out method was carried out for five different training epochs for each 

architecture: 150, 300, 500, 700, and 1000 iterations of the optimisation procedure. 

This required about four days of continuous computation on the Condor cluster. The


approach of stopping the training procedure early is a method of regularising the ANN 

models. 

STATNET also implements a weight decay factor in the neural network’s sum-of-squares

error function, but this feature was not used here (or in the parameterisation

network described in the next section). Weight decay attempts to prevent the ANN 

model from over-fitting the training data by discriminating against network weights 

that become too large during training. Large network weights (which can occur if the 

network is trained for too long) increase the complexity of the mapping because they 

produce regions of high curvature in the input-output parameter space. 

As the classification training set was small, it was felt that weight decay should 

not be used here in order to preserve the structure and curvature of the input-output 

mapping. An alternative is early stopping which regularises the network by limiting 

the effective number of degrees of freedom. This number is supposed to start out small 

and then grow during the minimisation of the sum-of-squares error function, which 

corresponds to a steady increase in the complexity of the model. 

If the network error is measured against a validation sample, as is done here via the 

leave-one-out method, it is typically observed that this error shows a decrease at

first, followed by an increase as the network starts to over-fit. The network’s training 

procedure can be terminated close to the point of smallest error since this gives a 

network which is expected to have the best generalisation performance. 

STATNET possesses the capability to add weighting factors to each of the network’s

outputs so that certain outputs contribute more to the sum-of-squares error minimisation 

than others. These weighting factors (called ‘β’ parameters in STATNET) allow 

the user to control the level of modelling precision for each output variable. If this is 

limited by the noise in the data, $1/\sqrt{\beta}$ should be approximately equal to the standard

deviation of the noise in the output variable. 

STATNET includes a data scaling option which separately scales each input and 



output variable to have zero mean and unit standard deviation. With respect to the β

parameters, the variance scaling casts each β in terms of the scaled variables. Therefore,

$1/\sqrt{\beta}$ can be roughly interpreted as the fractional uncertainty in a particular output variable.

As an example, the default β value of 6.0 corresponds to a standard deviation of 0.4. 

Thus, if the data are variance scaled and roughly normally distributed, 95% of the data 

will lie in the range −2 to +2, so this standard deviation corresponds to approximately 

a 10% uncertainty. 

In terms of the Drilling et al. (2006) classification parameters, the expected accuracy 

in each parameter for a human classifier is ±2 spectral types, ±1 luminosity class, 

and ±2 helium classes. These correspond to uncertainties of 10%, 12.5%, and 5% 

respectively. Therefore, the STATNET β parameters were set to 6.0 for the spectral 

type output, 4.0 for luminosity class, and 25 for helium class. 
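The arithmetic linking a fractional uncertainty to a β value can be written out as a short sketch. It follows the reasoning given above: the variance-scaled data span roughly −2 to +2, i.e. a range of 4 scaled units, so a fractional uncertainty f corresponds to a scaled standard deviation of 4f and hence β = 1/(4f)². The function names are hypothetical.

```python
def beta_from_fraction(fraction, scaled_range=4.0):
    """Beta for which 1/sqrt(beta) equals `fraction` of the scaled data range."""
    sigma_scaled = fraction * scaled_range
    return 1.0 / sigma_scaled ** 2


def fraction_from_beta(beta, scaled_range=4.0):
    """Fractional uncertainty implied by a given beta."""
    return (1.0 / beta ** 0.5) / scaled_range


for name, fraction in (("spectral type", 0.10),
                       ("luminosity class", 0.125),
                       ("helium class", 0.05)):
    print(name, beta_from_fraction(fraction))

print(fraction_from_beta(6.0))
```

With these definitions, uncertainties of 12.5% and 5% give β = 4.0 and β = 25, and 10% gives β = 6.25, which is close to the default value of 6.0 adopted for spectral type.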

2.1.3 Results 

Tables 2.1 and 2.2 give the $\sigma_{\mathrm{rms}}$ and correlation coefficient values, $r$, comparing each

ANN architecture’s results with the classifications assigned by Drilling et al. (2006) as 

determined by the leave-one-out method. 

Architecture 901:10:3

                                         Training iterations
Statistic                  Parameter     150      300      500      700      1000
$\sigma_{\mathrm{rms}}$    SpT           2.1041   2.1967   2.2338   2.2947   2.3434
$\sigma_{\mathrm{rms}}$    LC            1.1771   1.1835   1.2199   1.2435   1.2627
$\sigma_{\mathrm{rms}}$    HeC           5.5604   4.5434   4.3255   4.3540   4.5109
$r$                        SpT           0.8710   0.8621   0.8586   0.8523   0.8473
$r$                        LC            0.8209   0.8201   0.8123   0.8061   0.8012
$r$                        HeC           0.9216   0.9483   0.9533   0.9527   0.9491

Table 2.1: Results of the leave-one-out procedure as applied to a committee of five 

901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations. 

The large $\sigma_{\mathrm{rms}}$ values for helium scale classifications, apparent in both tables, suggest

the ANNs are having difficulty generalising for this parameter. However, the high


Architecture 901:5:5:3

                                         Training iterations
Statistic                  Parameter     150      300      500      700      1000
$\sigma_{\mathrm{rms}}$    SpT           1.7446   1.8296   1.9593   2.0626   2.2202
$\sigma_{\mathrm{rms}}$    LC            1.0574   1.0766   1.1078   1.1536   1.2156
$\sigma_{\mathrm{rms}}$    HeC           6.2962   5.2257   4.3019   4.1405   4.2633
$r$                        SpT           0.9065   0.8983   0.8858   0.8759   0.8599
$r$                        LC            0.8507   0.8621   0.8389   0.8272   0.8116
$r$                        HeC           0.9007   0.9316   0.9528   0.9573   0.9547

Table 2.2: As Table 2.1, but for the committee of five 901:5:5:3 ANNs. 

correlation coefficients suggest a good learning response. There are several possible

reasons for the large helium-class errors. Firstly, it could be due to a problem with the neural network model

itself, either a regularisation issue (e.g., not using weight decay), or sub-optimal settings 

of the β parameters. Secondly, it is possible that the neural networks simply cannot do 

any better for this parameter, in which case the attention turns to the Drilling et al. 

(2006) classification scale itself and the observational sample on which this study is 

based. 

If the S/N of the observational sample is not sufficiently high for the ANNs

to generalise well for the helium scale, this could be affecting the bias and variance of

the models, making it difficult to ascertain the underlying mapping function. It is also

possible that the helium scale itself is too fine-grained. If the helium scale were scaled

down by a factor of ∼4, a corresponding four-fold reduction in the $\sigma_{\mathrm{rms}}$ errors would

be observed (the corresponding correlation coefficients would remain unchanged as this

statistic is not affected by scaling effects). This would bring them in line with those of

spectral type and luminosity class, i.e., $\sigma_{\mathrm{rms}}$ ∼ ±1 helium class.

Further investigation of this issue is required. 
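The scaling argument above can be checked numerically with a short sketch. The helium classes and network outputs below are made-up values chosen only so that the arithmetic is easy to follow; the function names are hypothetical.

```python
from math import sqrt


def rms_difference(a, b):
    """Root-mean-square difference between two equally long lists."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a, b):
    """Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def rescale(values, factor=4.0):
    """Compress a classification scale by the given factor."""
    return [v / factor for v in values]


true_he = [0.0, 8.0, 12.0, 20.0, 28.0, 36.0, 40.0]   # made-up helium classes
ann_he = [4.0, 4.0, 16.0, 16.0, 32.0, 32.0, 44.0]    # made-up network outputs

rms_full = rms_difference(ann_he, true_he)
rms_scaled = rms_difference(rescale(ann_he), rescale(true_he))
r_full = pearson_r(ann_he, true_he)
r_scaled = pearson_r(rescale(ann_he), rescale(true_he))
print(rms_full, rms_scaled, r_full, r_scaled)
```

Every output in this toy example differs from its target by 4 classes, so the RMS error is 4 on the full scale and 1 on the compressed scale, while the correlation coefficient is identical in both cases.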

It can be seen in Tables 2.1 and 2.2 that both architectures are able to learn the 

appropriate spectral features associated with spectral type and luminosity class within 

the first 250–300 epochs of the training procedure. After this point, further training

only serves to degrade performance with respect to these parameters which indicates 

that the models are starting to over-fit the training data. 



For the helium scale, both architectures yield optimal classifications after a few 

hundred more training epochs. The 901:10:3 architecture achieved best performance 

at around 500 iterations, and the 901:5:5:3 architecture reached its optimum at around 

700 iterations. A similar phenomenon was reported by Snider et al. (2001, sect. 5.1), 

although Willemsen et al. (2005) did not observe the same effect. 
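The per-parameter optima can be read directly from the tabulated leave-one-out errors. The sketch below copies the $\sigma_{\mathrm{rms}}$ values from Tables 2.1 and 2.2 and reports, for each architecture and parameter, the training length with the smallest error; the variable and function names are hypothetical.

```python
EPOCHS = [150, 300, 500, 700, 1000]

# sigma_rms values copied from Tables 2.1 and 2.2.
SIGMA_RMS = {
    "901:10:3": {
        "SpT": [2.1041, 2.1967, 2.2338, 2.2947, 2.3434],
        "LC": [1.1771, 1.1835, 1.2199, 1.2435, 1.2627],
        "HeC": [5.5604, 4.5434, 4.3255, 4.3540, 4.5109],
    },
    "901:5:5:3": {
        "SpT": [1.7446, 1.8296, 1.9593, 2.0626, 2.2202],
        "LC": [1.0574, 1.0766, 1.1078, 1.1536, 1.2156],
        "HeC": [6.2962, 5.2257, 4.3019, 4.1405, 4.2633],
    },
}


def best_epoch(errors):
    """Training length at which the validation error is smallest."""
    return EPOCHS[errors.index(min(errors))]


best = {
    arch: {param: best_epoch(errors) for param, errors in table.items()}
    for arch, table in SIGMA_RMS.items()
}
for arch, optima in best.items():
    print(arch, optima)
```

Among the training lengths tested, the smallest errors for spectral type and luminosity class occur at 150 iterations for both architectures, while helium class is best at 500 iterations (901:10:3) and 700 iterations (901:5:5:3).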

The optimal trade-off in accuracy between the classification parameters occurs at 

around 300 training epochs for the 901:10:3 architecture, and 500 epochs for the 

901:5:5:3 architecture. The results of these two ANNs are compared with the actual 

Drilling et al. (2006) classifications in Figure 2.2. 

2.2 Physical Parameters 

The ability of neural network models to obtain astrophysical parameters of hot subdwarf 

spectra was tested by generating a grid of synthetic spectra to be used as a training 

set, and extracting two application sets from the Drilling et al. (2006) sample. 

The first application set contains 60 stars which were used by Drilling et al. to 

calibrate their classification system against the physical parameters of $T_{\mathrm{eff}}$, $\log g$, and

$\log(n_{\mathrm{He}}/n_{\mathrm{H}})$. These 60 stars have been previously analysed by their original observers,

with astrophysical parameters being derived mostly by the method of fine analysis.

The second application set contains 133 stars from the Drilling et al. sample for 

which no astrophysical parameters have been listed in Drilling et al. (2006). 

Using the first application set, the neural network results for those stars can be 

compared against the results of the fine analyses performed by the original observers. 

However, the second application set has no measure of comparison. For that, the $\chi^2$

fitting code used at Armagh Observatory, SFIT, is used to derive a set of astrophysical

parameters based on a grid of synthetic spectra. SFIT is also applied to the first

application set to serve as a second comparison for the neural network results.


[Figure panels, left column for architecture 901:10:3 and right column for architecture 901:5:5:3: ANN spectral type against Drilling spectral type (O – A); ANN luminosity class against Drilling luminosity class (0 – IX); ANN helium class against Drilling helium class (0 – 40).]

Figure 2.2: Results of the leave-one-out procedure for both ANN architectures at the 

near-optimal training time of 300 iterations for the 901:10:3 architecture (left column), 

and 500 iterations for the 901:5:5:3 architecture (right column). Also plotted is the 

best-fit linear least squares line. 



The neural network training grid contains 2009 synthetic spectra generated using 

the line-blanketed LTE spectral synthesis code SPECTRUM (Jeffery et al., 2001). 

The grid covered the parameter space in $T_{\mathrm{eff}}$: 12000 – 50000 K, $\Delta T_{\mathrm{eff}} \sim 5000$ K; $\log g$:

3.5 – 6.0 dex, $\Delta \log g = 0.5$ dex; and $\log(n_{\mathrm{He}}/n_{\mathrm{H}})$: −3 to +3 dex, in 10 non-uniformly spaced

intervals.

In order to match this training grid to the Drilling et al. (2006) observations, each 

synthetic spectrum was first convolved with a Gaussian to lower its resolution to that 

of the observations (∼ 2.5 Å), and then re-binned onto the same wavelength grid as the 

observations (4050–4950 Å at a 1.0 Å dispersion). 
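The resolution-matching step can be sketched as a discrete Gaussian convolution. The sketch assumes that the synthetic spectrum is sampled on a uniform wavelength grid and is much sharper than the target resolution, so that the kernel FWHM can be set to the target value of 2.5 Å; the function names, the 0.1 Å sampling and the toy spectrum are illustrative assumptions.

```python
from math import exp, log, sqrt


def gaussian_kernel(fwhm, step, half_width_sigmas=4.0):
    """Normalised Gaussian kernel sampled every `step` Angstroms."""
    sigma = fwhm / (2.0 * sqrt(2.0 * log(2.0)))   # FWHM to standard deviation
    half = int(half_width_sigmas * sigma / step) + 1
    weights = [exp(-0.5 * ((i * step) / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(flux, kernel):
    """Convolve flux with a symmetric kernel, keeping the input length.

    Near the ends, where the kernel is truncated, the result is renormalised
    by the sum of the kernel weights actually used.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(flux)):
        acc = 0.0
        norm = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(flux):
                acc += k * flux[idx]
                norm += k
        out.append(acc / norm)
    return out


# Toy spectrum: flat continuum with one unresolved absorption line.
step = 0.1
flux = [1.0] * 201
flux[100] = 0.0

kernel = gaussian_kernel(fwhm=2.5, step=step)
smoothed = convolve_same(flux, kernel)
print(smoothed[100], sum(smoothed))
```

The convolution makes the line shallower and broader while leaving its total absorbed flux unchanged, which is the behaviour expected when a high-resolution model is degraded to instrumental resolution.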

Design limitations in the $\chi^2$ minimisation code used at Armagh Observatory, SFIT

(which are dealt with in Chapter 3), required a smaller grid of synthetic spectra to

be used. The grid covered the parameter space: $T_{\mathrm{eff}} = \{15, 20, 25, 30, 35, 40, 50\}$ kK,

$\log g = \{3, 4, 5, 6\}$, and $\log(n_{\mathrm{He}}/n_{\mathrm{H}}) = \{-3, -1, 0, +1, +3\}$. This grid is commensurate

with the dispersion and S/N present in the Drilling et al. (2006) sample.

A default instrumental profile of 1 Å (FWHM) was assumed during the fitting for

each application set, and all data points more than 5% above continuum were rejected.

All three intrinsic parameters, $T_{\mathrm{eff}}$, $\log g$, and $\log(n_{\mathrm{He}}/n_{\mathrm{H}})$, were free to vary in the $\chi^2$

optimisation.

Solutions for $v_{\mathrm{rad}}$ and $v \sin i$ were also obtained. The correction for $v_{\mathrm{rad}}$ used during

the pre-processing stage (Section 2.1.1) appeared to have left residual shifts of a few

km s$^{-1}$ and, in one case (possibly where Balmer lines were confused with He II lines),

of a couple of Ångströms. Overall, $\langle v_{\mathrm{rad}} \rangle = -1.9 \pm 22.3$ km s$^{-1}$, the mean being

satisfactorily close to the expectation value (0 km s$^{-1}$). The solution for $v \sin i$ allowed

SFIT to be tolerant of both the varying instrumental resolution present in the data, and

any rotational broadening present in the source. Formally, $\langle v \sin i \rangle = 59 \pm 39$ km s$^{-1}$.

A single normalisation procedure was applied to remove small trends in the background 

continuum. Nine “continuum” regions free of hydrogen and helium lines were


defined. After an initial optimisation step, the spectrum was divided by the initial fit. 

A second-order polynomial was fitted to this ratio using only the data in the continuum 

regions. An estimate of the true sample S/N was obtained from the RMS of the ratio 

around the polynomial fit in these same regions. The sample was then multiplied by 

the polynomial fit before a second optimisation step was applied. 

2.2.1 Methodology 

A control experiment was carried out to determine if neural network models trained 

on a set of synthetic spectra at infinite S/N are able to accurately parameterise other

synthetic spectra over a range of S/Ns. 

The training grid of synthetic spectra was randomly divided into two evenly sized 

training and application subsets. Several committees of different ANN architectures 

were trained on the training subset for a range of training epochs. The intention here

was to establish optimal model complexity for the task without using weight decay. 

The STATNET β parameters for each of the network output variables ($T_{\mathrm{eff}}$, $\log g$,

$\log(n_{\mathrm{He}}/n_{\mathrm{H}})$) were set to 6.0, estimating a 10% error in each parameter. This is commensurate

with the spacing of the grid points over the parameter space, and assumes, 

conservatively, that the neural network model will do at least as well as nearest neighbour 

matching to the synthetic spectra in the grid. Again, the Condor cluster allowed 

the different experiments to be carried out in parallel. 

The application subset was duplicated eight times. Each set was degraded to one of 

the following S/Ns by the addition of Gaussian noise: {∞, 1000, 500, 100, 50, 20, 10, 

5}. Each trained ANN committee was applied in turn to the noised application sets. 
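The degradation of a spectrum to a chosen S/N can be sketched as below. The sketch assumes that S/N is defined per pixel as the ratio of the flux to the standard deviation of the added Gaussian noise; this definition, the function names and the flat toy spectrum are assumptions made for illustration.

```python
import random


def add_noise(flux, snr, rng):
    """Return a copy of flux with per-pixel Gaussian noise of sigma = flux / snr.

    snr=None stands for infinite S/N and leaves the spectrum unchanged.
    """
    if snr is None:
        return list(flux)
    return [f + rng.gauss(0.0, abs(f) / snr) for f in flux]


def measured_snr(flux):
    """Mean divided by sample standard deviation, valid for a flat spectrum."""
    n = len(flux)
    mean = sum(flux) / n
    std = (sum((f - mean) ** 2 for f in flux) / (n - 1)) ** 0.5
    return mean / std


SNR_LEVELS = [None, 1000, 500, 100, 50, 20, 10, 5]

rng = random.Random(42)          # fixed seed so the example is repeatable
clean = [1.0] * 20000            # flat toy spectrum
noisy_sets = {snr: add_noise(clean, snr, rng) for snr in SNR_LEVELS}

print(measured_snr(noisy_sets[50]))
```

The same function could be used to build a noised training grid by applying it to every synthetic spectrum at each desired S/N and concatenating the results with the noise-free set.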

The experiments suggested that the optimal network architecture was a 901:10:10:3 

configuration, trained for 500 epochs for $T_{\mathrm{eff}}$ and $\log g$ parameterisations, and 1350

epochs for $\log(n_{\mathrm{He}}/n_{\mathrm{H}})$ parameterisations.

The results showed positive correlations between the actual parameters and the 



ANN’s results. However, the accuracy of the ANNs declined quickly as the S/N of 

the application set fell below 100. This observation is important because the spectra 

in the Drilling et al. (2006) sample are not of a consistent S/N. The majority of the 

sample has an S/N somewhere in the 50 – 100 range, so the neural network model 

should account for this. The results imply that an ANN trained on synthetic spectra 

of infinite S/N will not give the most accurate parameterisations of the observational 

sample. 

This result was also reported by Snider et al. (2001) and Willemsen et al. (2005). 

In the latter, Willemsen et al. reported on their attempts to improve the generalisation 

abilities of their neural network models by increasing the amount of weight decay 

taking place. They found that performance improved only when the weight decay 

term was chosen to be rather large, indicating that the problem lies in regularising the 

model, i.e., a neural network trained on high S/N spectra will over-fit the data unless 

“restrained”. 

The solution chosen in the study presented here was to make two copies of the

entire grid of 2009 theoretical models, one copy being degraded to an S/N of 100 and the

other to 50. The final training set for the optimal network architecture was then a

combination of all three grids, totalling 6027 synthetic spectra. This addition of noise

to the training grid serves as another mechanism of regularisation. Willemsen et al. 

(2005) employed a similar solution. The noise serves to ‘smear out’ each training point, 

making it difficult for the network to fit individual points precisely, and hence reducing 

over-fitting. 
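The construction of the noised training set can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes that a spectrum is degraded to a given S/N by adding Gaussian noise with a standard deviation of flux/(S/N) at each flux point, and the names `add_noise`, `build_training_set`, and `toy_grid` are hypothetical.

```python
import math
import random


def add_noise(flux, snr, rng):
    """Return a copy of flux degraded to the requested S/N.

    Assumption: the noise at each point is Gaussian with sigma = |flux| / snr.
    An infinite S/N returns the spectrum unchanged.
    """
    if math.isinf(snr):
        return list(flux)
    return [f + rng.gauss(0.0, abs(f) / snr) for f in flux]


def build_training_set(grid, snrs=(math.inf, 100.0, 50.0), seed=0):
    """Concatenate one copy of the whole grid for each S/N level."""
    rng = random.Random(seed)
    training = []
    for snr in snrs:
        for params, flux in grid:
            training.append((params, add_noise(flux, snr, rng)))
    return training


# Toy grid: 4 "models", each a flat continuum sampled at 901 flux points.
toy_grid = [((30000.0 + 1000.0 * k, 5.5, -1.0), [1.0] * 901) for k in range(4)]
training_set = build_training_set(toy_grid)
print(len(training_set))
```

Applied to the full grid of 2009 models, the same three S/N levels give the 6027 training spectra quoted above.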

Although the training set has increased in size, there is no reason to believe that the
optimal ANN configuration would change as a consequence. The fundamental structure
and physical parameters of the noised spectra are no different from those of the unnoised
spectra.


2.2.2 Results 

Application Set 1: 60 Calibration Stars 

The results of applying the two ANN models to the 60 calibration stars are given in the 

first column of Table 2.3, and the actual parameters obtained are listed in Appendix 

A. 

The correlation coefficients show a reasonable agreement between the ANN’s predicted 

T eff parameterisations and those of Drilling et al. (2006). However, the log(n He /n H ) 

and log g correlation coefficients are not quite as positive. Looking at the middle and 

last plots in Figure 2.3, it can be seen that the ANN’s results in these parameters 

(indicated by the blue crosses) are visibly more scattered than the T eff results given in 

the first plot. 

The typical errors quoted in the original fine analyses of these stars are σ Teff = ±2500 K
and σ log g = ±0.2 dex. The results in the first column of Table 2.3 are still

within ∼ 2σ of the fine analysis errors, which is significant (assuming, of course, that 

the method of fine analysis is more accurate than either of the methods used here). 

                              ANN/Drilling   χ 2 /Drilling   ANN/χ 2
σ rms   T eff (K)                4389.79        4338.85       3740.99
        log g                      0.4577         0.3754        0.4908
        log(n He /n H )            0.9796         0.4769        0.8382
r       T eff                      0.9207         0.9447        0.9131
        log g                      0.7844         0.8173        0.7525
        log(n He /n H )            0.8705         0.9649        0.8816

Table 2.3: Results of parameterising the 60 calibration stars.
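For reference, the two statistics reported in Table 2.3 can be computed as in the short sketch below. It assumes that σ rms is the root-mean-square difference between two sets of parameterisations and that r is the Pearson correlation coefficient; the function names are illustrative only.

```python
import math


def rms_difference(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy example: log g values from two hypothetical methods.
method_1 = [5.0, 5.5, 6.0, 4.5]
method_2 = [5.2, 5.4, 6.3, 4.4]
print(rms_difference(method_1, method_2), pearson_r(method_1, method_2))
```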

SFIT was applied to the 60 calibration stars and the results are listed in the second 

column of Table 2.3. The actual parameters obtained are listed in Appendix A. 

SFIT compares well with the neural network in T eff and log g, but gives slightly better 

performance in log(n He /n H ). 

[Figure 2.3 (three panels): ANN/χ 2 parameterisations plotted against the Drilling calibrations for T eff (kK), log g, and log(n He /n H ).]

Figure 2.3: Parameterisations of the 60 calibration stars. Results from each method
have been combined onto each plot. ANN results are indicated by blue crosses, and χ 2
minimiser results by red pluses.

A direct comparison between the neural network and SFIT’s results is given in the
third column of Table 2.3. The disagreement between the neural network models and

SFIT is of similar degree as the disagreement of each method with the Drilling et al. 

parameters. The σ rms values in column three of the table are still within twice the 

quoted errors for the fine analyses of the 60 calibration stars, which is a significant 

result (again, assuming that fine analysis is the more accurate method) and confirms 

that ANNs have the potential of being able to parameterise hot subdwarf spectra to a 

similar degree of accuracy as the more traditional method of χ 2 minimisation. 

The poor generalisation of the neural network in the log(n He /n H ) parameter is a 

significant issue, and requires further investigation. 

Application Set 2: 133 Unparameterised Stars 

The two ANN committees were applied to the remaining 133 unparameterised stars in 

the sample. These stars were also parameterised using SFIT. The parameters obtained 

from both methods are listed in Appendix A. 

A direct comparison between the two methods was made. The results are presented 

in Table 2.4, and Figure 2.4. For approximately twice as many stars, the σ rms values 

are only slightly worse than the values in the last column of Table 2.3. Tentatively 

speaking, the results could still be considered to support the view that ANNs have 

the potential of being able to parameterise hot subdwarf spectra to a similar degree of 

accuracy as χ 2 minimisers. 

As has been pointed out previously, the neural network models seem to be suffering 

from regularisation issues when training on synthetic spectra. With further investigation 

on this matter, a significant improvement in the neural network’s generalisation

performance could be obtained. 



[Figure 2.4 (three panels): χ 2 parameterisations plotted against ANN parameterisations for T eff (kK), log g, and log(n He /n H ).]

Figure 2.4: Parameterisations of the 133 unparameterised stars using the ANNs and
χ 2 minimiser. Also shown is the best-fit linear least squares line.


                              ANN/χ 2
σ rms   T eff (K)             5768.74
        log g                   0.6853
        log(n He /n H )         0.9926
r       T eff                   0.8850
        log g                   0.8003
        log(n He /n H )         0.8875

Table 2.4: A comparison between ANNs and χ 2 minimisation for parameterising the
133 unparameterised stars.

2.3 Summary 

Artificial neural networks are a fast and powerful method for automatically classifying

astronomical spectra. A feed-forward neural network configured in a 901:5:5:3 architecture, 

and trained for 500 epochs, was able to classify hot subdwarf spectra onto the 

Drilling et al. (2006) scale with global errors (σ rms ) of ∼ 2 sub-types for spectral type, 

∼ 1 sub-class for luminosity class, and ∼ 4 sub-classes for the helium class. This was 

the most accurate ANN discovered for the task. 

The use of ANNs for obtaining physical parameters from stellar spectra offers the 

possibility of having a fast method for deriving initial parameter estimates. However, 

establishing the optimal network architecture to accurately model the flux-space to 

physical parameter-space mapping function was found to be cumbersome with much 

experimentation required. It was also discovered that attempting to train the neural 

network model on infinite S/N synthetic spectra led to over-fitting due to insufficient 

regularisation. A solution was attempted by the addition of noise to the training set, 

but further investigation here is needed. 

χ 2 methods are therefore more desirable for parameterising astronomical spectra in 

a general data mining tool kit as they offer more flexibility and greater ease of use than 

ANNs. Of course, these qualities come with the price of slower speed, with χ 2 methods 

unable to compete with ANNs in this regard. This issue is discussed further in the next 

Chapter. However, if the regularisation issues with parameterising ANNs can be solved, 



their extremely fast application speed would instantly make them the preferred tool.

Chapter 3 

Parameterisation - χ 2 Fitting 

3.1 Analysing Stellar Spectra 

Deriving physical parameters (i.e., T eff , log g, abundances) for a star is done by a fine 

analysis of its spectrum. The traditional method of spectroscopic fine analysis is a long, 

iterative process requiring several months to complete. 

The method is based on measuring equivalent widths of spectroscopic lines. The 

astronomer must go through a spectrum and manually identify as many spectral lines 

as possible, and the ions to which they belong. Microturbulent and rotational velocities 

are first determined. Then, an initial grid of model atmospheres is calculated to cover 

the approximate T eff , log g, and composition of the star. 

Using these models, the theoretical equivalent widths are calculated for each of the 

identified ion lines in the star over the range of elemental abundances in the grid. These 

equivalent widths are combined to form curves of growth which can then be used to 

read off derived abundances for each of the measured ion line equivalent widths in the 

stellar spectrum. 

Temperature and surface gravity are determined by using the derived abundances 

of lines known to be sensitive to temperature (e.g., Fe) and gravity (e.g., H or He), and 



performing a process of comparison and line fitting with each of the models in the grid. 

The derived values of T eff and log g are then used, along with the measured equivalent 

widths, to calculate new abundances. A new grid of model atmospheres is computed 

with these parameters. The entire analysis process of determining curves of growth, deriving 

values of T eff , log g, and abundances, and recomputing the model grid is repeated 

until the derived parameters agree with those used in the models (i.e., convergence is 

achieved). 

An excellent description of this process, and demonstration of its application, can 

be found in Dudley (1992). 

Progress Towards Automation 

Given the iterative nature of the method of fine analysis, and the time required to 

conduct an analysis for a single star, attempts have been made to find automated 

procedures for accomplishing the same goal much more quickly. 

Hutchison (1971) presents an automatic procedure for detecting spectral features 

and determining accurate line frequencies, line depths, and equivalent widths for high-resolution
infrared spectra.

Morossi & Crivellari (1980) describe a method to obtain T eff and log g by comparing 

observations to a grid of models. Their method is based on a least-squares minimisation 

procedure which determines values for the parameters which optimise the fit between 

the theoretical models and observational data. 

Katz et al. (1998) use a χ 2 minimisation procedure to obtain values of T eff , log g, 

and [Fe/H] from ELODIE spectra by fitting observations to a library of 211 reference 

stars observed with the same instrument for which the atmospheric parameters are 

well-known. 

The method of χ 2 fitting has grown to be very much the de facto procedure of


automating the parameterisation of astronomical spectra. It is a specific case of the 

more general class of fitting procedures known as metric distance minimisation (or 

minimum distance methods), where, as the name suggests, results are determined by 

minimising some distance metric between the object under analysis and each member 

of a set of templates. The object is assigned the parameters of the template which gives 

the smallest distance. 

Let x = (x 1 ,x 2 ,... ,x N ) be the spectrum to parameterise, and s = (s 1 ,s 2 ,... ,s N ) 

be a template spectrum with known physical parameters. The distance metric to be 

minimised is of the form 

$$D = \frac{1}{N} \left[ \sum_{i=1}^{N} w_i \, |x_i - s_i|^p \right]^{1/p}, \tag{3.1}$$

where w i is a weight assigned to flux element s i of the template spectrum. Typically, 

s is only one template in a set of templates, S = {s 1 ,s 2 ,... ,s M }, and equation 3.1 is 

computed for all templates s j . Equation 3.1 becomes χ 2 fitting when p = 2 and
$w_i = \sigma_i^{-2}$, where $\sigma_i$ is the error in $x_i$.
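As an illustration, equation 3.1 and the nearest-template assignment can be written in a few lines. This is a sketch only: the function names and the toy templates are assumptions, and the weights follow the χ 2 case (p = 2, w i = σ i −2) described above.

```python
def distance(x, s, w, p=2):
    """Equation 3.1: D = (1/N) * (sum_i w_i * |x_i - s_i|**p) ** (1/p)."""
    n = len(x)
    total = sum(wi * abs(xi - si) ** p for xi, si, wi in zip(x, s, w))
    return total ** (1.0 / p) / n


def nearest_template(x, sigma, templates):
    """Return (parameters, D) of the template with the smallest distance.

    templates is a list of (parameters, flux) pairs; the weights are the
    chi-squared weights w_i = 1 / sigma_i**2.
    """
    w = [1.0 / (sg * sg) for sg in sigma]
    params, flux = min(templates, key=lambda t: distance(x, t[1], w))
    return params, distance(x, flux, w)


# Toy example: two templates described by (T_eff, log g), three flux points.
templates = [
    ((20000.0, 5.0), [1.0, 0.5, 1.0]),
    ((30000.0, 5.5), [1.0, 0.8, 1.0]),
]
observed = [1.0, 0.75, 1.0]
errors = [0.1, 0.1, 0.1]
best_params, best_d = nearest_template(observed, errors, templates)
print(best_params, best_d)
```

The observed spectrum is assigned the parameters of the second template, since its central flux point lies closer to 0.8 than to 0.5.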

For a straightforward nearest neighbour minimisation of the χ 2 metric over a grid of 

templates, an accurate result requires the grid to be finely spaced in each parameter of 

interest so that the effects of that parameter on the flux vector can be ascertained. This 

can create a large data requirement, and the computation time required to parameterise 

one spectrum can increase prohibitively as equation 3.1 must be evaluated for all the 

templates. 

One solution to this problem is to use some method of interpolation to “fill in the 

gaps” between templates in a discrete grid. As interpolation creates the illusion of 

continuity in the grid, it also opens the possibility of using search-based optimisation 

methods to locate the minimum of D in an efficient manner. 

Unfortunately, with χ 2 fitting, there is no escaping the so-called curse of dimensionality.
As the number of parameters to be determined increases, the number of
templates in the grid also increases exponentially.

χ 2 Fitting for Astronomical Data Mining 

The main disadvantage to using χ 2 fitting in the context of a data mining application 

is slowness. In contrast to artificial neural networks, no training procedure exists to 

extract information from the template grid, and the grid is required to be present and 

searched to minimise D for every new spectrum to be parameterised. 

One solution to the speed issue is to distribute the grid search over many computers 

in a parallel cluster. The grid of templates could be broken up into N sections, where 

N is the number of processing nodes in the cluster. Each node then receives its section 

of the grid, finds the local minimum of D for an observed spectrum, and reports this 

value back to a master processing node which then selects the global minimum from 

all the node results. 
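The partitioning scheme described above can be sketched as follows. The sketch uses threads in a single process as stand-ins for the processing nodes of a cluster, which is an assumption made purely for illustration; a real deployment would use a message-passing or job-distribution layer instead. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chi2(x, s, sigma):
    """Chi-squared distance between an observed and a template spectrum."""
    return sum(((xi - si) / sg) ** 2 for xi, si, sg in zip(x, s, sigma))


def local_minimum(section, x, sigma):
    """Work done by one node: best (chi2, parameters) within its section."""
    results = ((chi2(x, flux, sigma), params) for params, flux in section)
    return min(results, key=lambda r: r[0])


def parallel_search(grid, x, sigma, n_nodes):
    """Master node: split the grid, collect local minima, select the global one."""
    sections = [grid[k::n_nodes] for k in range(n_nodes)]
    sections = [sec for sec in sections if sec]  # drop empty sections
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        local = list(pool.map(lambda sec: local_minimum(sec, x, sigma), sections))
    return min(local, key=lambda r: r[0])


# Toy grid of 10 flat templates labelled by a single parameter k.
grid = [(k, [float(k)] * 3) for k in range(10)]
observed = [6.2, 6.2, 6.2]
errors = [1.0, 1.0, 1.0]
best_chi2, best_params = parallel_search(grid, observed, errors, n_nodes=4)
print(best_params, best_chi2)
```

Because each section is searched independently, the result is identical to that of a serial search over the whole grid.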

In a data mining context, it is likely that the template grid will cover a large region 

of the parameter space of interest in reasonable detail so as to account for the diversity 

of objects that will be encountered. Large template grids pose storage and accessing 

problems from within the χ 2 minimisation program because the main memory of the 

computer may not be capacious enough to store all the templates at once. 

The work of this chapter is concerned with taking a pre-existing χ 2 minimisation 

program used at the Armagh Observatory, and beginning the modifications necessary 

in order to use the program more efficiently in a data mining context. Parallelising the 

program is a relatively straightforward task; however, the problem of managing large

template grids is much more involved and needs to be tackled first.


3.2 SFIT 

SFIT (Jeffery et al., 2001) is a Fortran 90 implementation of the χ 2 minimisation 

method outlined in the previous section. Given a grid of theoretical model spectra, 

and an observed spectrum, SFIT finds the combination of physical parameters of the 

model which most closely matches the observed spectrum by minimising the χ 2 distance 

metric. 

The program considers several broadening processes which must be applied to the 

theoretical spectra before comparison with an observed spectrum. These include instrumental 

broadening I(∆λ), rotational broadening V(v sin i,β), acceleration broadening 

A(v), and projection broadening P(v − ¯v). 

Model grids are discrete in three dimensions: T eff , log g, and n atm , the fractional
atmospheric abundance of an element. A method of linear interpolation in tables is used
to estimate the model space between grid points. Fitting solutions can be obtained
in several parameters (T eff , log g, n atm , v sin i, and v rad ) for both single and composite

spectra. The χ 2 minimisation can be carried out using either the Nelder-Mead 

downhill simplex optimisation procedure, implemented as a variant of the AMOEBA 

algorithm of Press et al. (1986), or the Levenberg-Marquardt algorithm (Levenberg, 

1944; Marquardt, 1963). Nearest neighbour χ 2 fitting is also possible. 

The Amoeba algorithm minimises a function (in this case, the χ 2 difference between 

the observed spectrum and the models in the grid) by defining an initial simplex with 

N +1 vertices, where N is the number of dimensions in the function’s parameter space. 

The method then takes a series of steps, most of which just move the point of the 

simplex where the function to be minimised is largest through the opposite face of 

the simplex to a lower point. The simplex “moves” through the parameter space by 

contracting and expanding until the distance “moved” is smaller than some tolerance 

threshold, at which point the method is determined to have converged on a solution. 
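The steps described above can be illustrated with a compact downhill simplex routine. This is a simplified textbook variant written for this illustration, not the AMOEBA variant implemented in SFIT: it uses the standard reflection, expansion, contraction, and shrink steps, and it declares convergence when the spread of function values across the simplex falls below a tolerance (an assumption that differs slightly from the distance criterion described in the text).

```python
def nelder_mead(f, x0, step=1.0, tol=1e-10, max_iter=500):
    """Minimise f over N dimensions with a simplex of N + 1 vertices."""
    n = len(x0)
    # Initial simplex: the starting point plus one vertex offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order the vertices from best (lowest f) to worst (highest f).
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if abs(values[-1] - values[0]) < tol:
            break
        worst = simplex[-1]
        # Centroid of the face opposite the worst vertex.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line from the worst vertex through the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            expanded = along(2.0)
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            contracted = along(-0.5)
            f_contracted = f(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink every vertex halfway towards the best vertex.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (vi - b) for b, vi in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    k_best = min(range(n + 1), key=lambda k: values[k])
    return simplex[k_best], values[k_best]


# Toy error surface with its minimum at (3, -1).
solution, f_min = nelder_mead(lambda v: (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2, [0.0, 0.0])
print(solution, f_min)
```

No derivatives of the function are required, which is part of the reason the method is robust, if slow, when the error surface is an interpolation over a discrete grid.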

The Levenberg-Marquardt method is an iterative method specifically catering for



the minimisation of sum-of-squares error functions (i.e., the χ 2 function used in SFIT). 

The algorithm expands the error function around a point and examines the derivatives 

to search for a minimum by dynamically setting the step size according to the direction 

of the gradient. As the solution approaches the minimum, the step size decreases, and 

the algorithm usually converges quickly. The use of this method in SFIT requires the 

initial guess for the parameters to be reasonably close to the solution as Levenberg-Marquardt
can get trapped in local minima. Although slower, Amoeba is more robust

against this possibility. 

Both methods assume that the error function is either continuous, or can be evaluated 

for any point within the boundaries of the parameter space. Evaluation of the χ 2 

error function in SFIT depends upon a grid of models which is discrete. As mentioned 

in the previous section, an interpolation method is used to “fill in the gaps” of the 

grid, thereby creating the illusion of a continuous parameter space. As the Amoeba or 

Levenberg-Marquardt optimisers examine the properties of the error function throughout 

the parameter space, they are more often than not examining an interpolation of 

the model spectra. 

Once T eff , log g, and v sin i have been determined, SFIT can estimate the composition 

of a star by adjusting the abundances of the different atomic species which 

contribute to the absorption spectrum until the theoretical spectrum matches the observed 

spectrum. As the number of free parameters in such an analysis is so large (i.e., 

the abundances of H, C, N, O, Al, and so on, along with the microturbulent velocity, 

v t ), pre-computing multidimensional grids of theoretical spectra is infeasible. SFIT 

solves this problem by computing synthetic spectra as demanded by the χ 2 minimisation 

algorithm. 

SFIT is currently distributed with STERNE and SPECTRUM, the model atmosphere 

and spectral synthesis codes used at the Armagh Observatory. As part of 

this thesis, the source codes for all three programs, and their associated libraries, were 

ported from a simple build system based on GNU make to a more flexible build system


based on the GNU autotools (see Appendix E). 

3.2.1 Limitations of SFIT 

Analyses performed with SFIT are hindered by the program’s restrictions on the size 

of the model grid. Grids are limited to three dimensions (T eff , log g, and n atm ), and, at 

maximum, nine points in T eff , five in log g, and five in n atm . Models are permitted to 

have no more than five thousand wavelength points. 

These limits are due to design decisions made during SFIT’s initial construction,
which chose to store the model grid entirely in the computer’s main memory. The

restrictions on dimensionality and number of grid points are merely hard-coded numbers 

within the program, and therefore cannot be changed without recompiling the source 

codes. 

Storing the model grid in main memory, whilst providing fast access to the models,
also presents a problem in that main memory is finite, and typically orders of magnitude
smaller than the space available on secondary storage devices, such as hard disks. Despite
ever increasing main memory capacities, the implied upper limit on the number of
models and their detail will always be much smaller than if secondary storage were
used.

Another restriction SFIT places on the model grid is that it must be rectangular 

and complete, with no missing grid points. This is problematic because it may be 

difficult or impossible for model atmosphere simulations to converge for a given set of 

physical parameters. In such an instance, a make-shift solution is employed wherein a 

converged model close to the desired physical parameters is used to “plug the gap”. 

The rectangularity and completeness requirements are a result of SFIT’s interpolation 

scheme which generates approximations of models in the parameter space between 

discrete grid points by linear interpolation in tables. An irregular grid, or a missing 



grid point, prevents the interpolation scheme from operating correctly. 

3.2.2 Proposal to Remove SFIT’s Limitations

In summary, the limitations of SFIT’s treatment of model grids are 

1. Size limitations due to initial program design decisions and storage of grids in 

main memory. 

2. Interpolation scheme cannot handle irregular or incomplete grids. 

Modifying SFIT to be more useful in a data mining context requires removing these 

two limitations. 

The solution to the first limitation is obvious: correct the limiting initial design 

decisions, and store the model grids on secondary storage, i.e. hard disk, reading them 

into main memory on an individual basis only when needed. An indexing scheme is 

then required, one that can be held in main memory in place of the models, and quickly 

searched to determine which models are to be read in and their location on disk. 
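The idea of holding only an index in memory, with the model fluxes left on disk, can be sketched as below. The fixed-length binary record layout, the dictionary index, and the names `write_grid` and `read_model` are assumptions made for this illustration; they are not SFIT's file format.

```python
import os
import struct
import tempfile

N_FLUX = 5  # number of flux points per model in this toy example
RECORD = struct.Struct("<%dd" % N_FLUX)  # one fixed-length record per model


def write_grid(path, models):
    """Write model fluxes to a direct access file.

    Returns the in-memory index, mapping model parameters to record numbers.
    """
    index = {}
    with open(path, "wb") as handle:
        for record_number, (params, flux) in enumerate(models):
            handle.write(RECORD.pack(*flux))
            index[params] = record_number
    return index


def read_model(path, index, params):
    """Read a single model from disk using the in-memory index."""
    with open(path, "rb") as handle:
        handle.seek(index[params] * RECORD.size)
        return list(RECORD.unpack(handle.read(RECORD.size)))


# Toy grid keyed by (T_eff, log g, n_atm).
models = [
    ((20000.0, 5.0, 0.1), [1.0] * N_FLUX),
    ((30000.0, 5.5, 0.1), [2.0] * N_FLUX),
    ((40000.0, 6.0, 0.1), [3.0] * N_FLUX),
]
with tempfile.TemporaryDirectory() as tmp:
    grid_path = os.path.join(tmp, "grid.bin")
    index = write_grid(grid_path, models)
    loaded = read_model(grid_path, index, (30000.0, 5.5, 0.1))
print(index[(30000.0, 5.5, 0.1)], loaded)
```

Only the index grows with the number of models held in memory; each flux vector is read from disk when, and only when, the interpolation scheme asks for it.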

The nature of this index is dependent on the interpolation scheme chosen to correct 

the second SFIT limitation. Interpolation allows a complicated function to be 

approximated at an unknown point by using known surrounding points to construct a 

simpler, estimating function. Different interpolation schemes use the known surrounding 

points in different ways, so the design and function of the proposed model grid 

indexing scheme must be tailored accordingly. 

Many interpolation schemes exist in the literature. The ideal scheme for this application 

should be multidimensional (although this can be relaxed due to the curse of 

dimensionality), have low computation cost, and be able to operate over potentially 

randomly sampled functions. The interpolating function must also be continuous, and 

be based on known data points local to the interpolation point (as opposed to global


methods, in which the interpolated value is influenced by all of the available data). 

Two interpolation functions which stand out in terms of their simplicity, multidimensionality, 

and ability to handle incomplete grids are weighted average interpolation 

and simplex interpolation. 

Weighted Average Interpolation 

The most common weighted average method referred to in the literature is that of 

Shepard (1968) and its modifications, such as Renka (1988). 

Given an underlying function, f, with values f i at nodes (x i ,y i ) for i = 1,... ,N, 

the interpolating formula is of the form, 

$$F(x,y) = \frac{\sum_{k=1}^{N} W_k(x,y)\, f_k(x,y)}{\sum_{i=1}^{N} W_i(x,y)}. \tag{3.2}$$

The weighting function, W k , is defined by some inverse distance function,

$$W_k(x,y) = \frac{1}{d_k^2}, \tag{3.3}$$

where d k (x,y) denotes the Euclidean distance between (x,y) and (x k ,y k ). 
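A minimal sketch of equations 3.2 and 3.3 is given below. It assumes the basic Shepard form in which each f k is the constant value of the function at node k, and it returns the nodal value directly when the interpolation point coincides with a node, where the weight would otherwise diverge. The function name is illustrative.

```python
def shepard(x, y, nodes):
    """Inverse-distance weighted average with weights W_k = 1 / d_k**2.

    nodes is a list of (x_k, y_k, f_k) triples.
    """
    numerator = 0.0
    denominator = 0.0
    for xk, yk, fk in nodes:
        d2 = (x - xk) ** 2 + (y - yk) ** 2
        if d2 == 0.0:
            return fk  # the interpolant passes through the nodal values
        weight = 1.0 / d2
        numerator += weight * fk
        denominator += weight
    return numerator / denominator


nodes = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
print(shepard(1.0, 0.0, nodes))  # midway between the two nodes
print(shepard(0.5, 0.0, nodes))  # closer to the first node
```

In this form every node contributes to every interpolated value; restricting the sum to the nodes within some radius of the interpolation point is what makes the method local.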

A suitable indexing scheme for weighted average interpolation should allow fast 

searching of the node points to determine which are within a specified radius of the 

interpolation point (i.e., nearest neighbour searching). 

The field of computational geometry contains many algorithms and data structures 

for indexing and searching a set of N-dimensional points in a computationally efficient 

manner. One data structure that is very applicable to nearest neighbour searching 

problems is the k-D tree (Moore, 1991). Figure 3.1 demonstrates the k-D tree in two 

dimensions. 



[Figure 3.1: the points [4,9], [2,5], [8,7], and [3,2] shown as a partition of the x,y plane (left) and as the nodes of a k-D tree (right).]

Figure 3.1: Example of a k-D tree in two dimensions. On the left is the representation
of how the k-D tree on the right splits up the x,y plane. (Adapted from Moore 1991.)

This data structure is a binary tree which represents a series of partitions in k- 

dimensional space, organising a set of points into a collection of hyper-rectangular 

regions. Nearest neighbour searching can be carried out in $O(\log_2 N)$ time on average,

where N is the number of nodes in the tree. 
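A small k-D tree with a nearest neighbour search is sketched below, using the four points labelled in Figure 3.1. This is a generic textbook construction (median split, cycling through the axes) offered as an illustration; it is not taken from Moore (1991) or from any code written for this thesis.

```python
def build(points, depth=0):
    """Build a k-D tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build(points[:mid], depth + 1),
        "right": build(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(node, target, best=None):
    """Return the point stored in the tree that is closest to target."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best so far.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


points = [(4, 9), (2, 5), (8, 7), (3, 2)]
tree = build(points)
print(nearest(tree, (7, 6)))
```

The pruning test in the last step of `nearest` is what allows whole branches to be skipped, and is the source of the logarithmic average search time.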

All that remains is to determine how many nearest neighbours are needed, and the 

weighted average interpolation can be performed immediately. 

Simplex Interpolation 

A simplex, or N-simplex, is the N-D analogue of a triangle in 2-D and a tetrahedron 

in 3-D, as demonstrated in Figure 3.2. 

Simplex-based interpolation uses a weighted linear combination of the simplex vertices 

to approximate a function at a point located on or within the simplex boundary. 

These weights are computed as the barycentric coordinates of the interpolation point 

within the simplex. 

Figure 3.2: (a) 1-simplex, (b) 2-simplex, (c) 3-simplex. A 1-simplex is a line segment.
A 2-simplex is a triangle. A 3-simplex is a tetrahedron.

Given a collection of N-dimensional points, such as a grid of model spectra, a suitable
indexing scheme must allow the vertices of the enclosing N-simplex to be located
quickly.

As this is another nearest neighbour problem, the method of k-D trees could
be a viable solution. However, if the dimensionality of the grid is kept to three dimensions
or fewer, the field of computational geometry offers another approach.

Several algorithms exist which can take a cloud of two or three-dimensional points 

and generate a triangular or tetrahedral mesh. All that is then needed is a method to 

search the mesh for the triangle or tetrahedron that contains the interpolation point. 

Choosing the Solution 

Preliminary testing of both interpolation and indexing schemes was carried out to help 

determine which solution would be more viable. 

Constructing a suitable prototype of the weighted average/k-D tree solution was hindered 

by Fortran 90’s insufficient flexibility to support the implementation of advanced 

data structures. No suitable third-party libraries were available to speed development, 

and, as a result of time constraints, the pursuit of this solution had to be abandoned. 

On the other hand, if it is assumed that SFIT model grids are limited to three 

dimensions, then several freely available third-party libraries exist which can generate 

tetrahedral meshes from a cloud of random points. From a purely pragmatic standpoint,
this makes the simplex interpolation scheme very attractive. After the mesh has

been generated, the methods then required to search for the tetrahedron enclosing an 

interpolation point are simple geometric operations. 

Thus, the simplex interpolation method was chosen to solve SFIT’s grid management 

problems. The weighted-average/k-D tree solution is an interesting idea (which, 

unlike the simplex scheme, is not limited to three dimensions), and should be pursued 

in future work. 

3.3 Tetrahedralisation: Interpolation and Indexing 

In developing the simplex interpolation and corresponding grid indexing scheme, it 

was assumed that SFIT grids will always be three dimensional due to the curse of 

dimensionality. 

From this assumption, the tetrahedral mesh indexing scheme, described previously, 

can be constructed using third-party libraries. This affords a very pragmatic solution 

to the problem. 

3.3.1 Simplex Interpolation 

Barycentric coordinates express the location of any point within an N-simplex in terms
of a set of homogeneous coordinates that form a linear combination of the simplex
vertices. Given a tetrahedron defined by four arbitrary vertices, v 1 , v 2 , v 3 , and
v 4 , and some point p within this tetrahedron, p can be expressed as the weighted
combination of the four vertices

$$\mathbf{p} = \lambda_1 \mathbf{v}_1 + \lambda_2 \mathbf{v}_2 + \lambda_3 \mathbf{v}_3 + \lambda_4 \mathbf{v}_4, \tag{3.4}$$

where λ 1 , λ 2 , λ 3 , and λ 4 are the barycentric coordinates. These are subject to the
constraints that

$$0 \le \lambda_1, \lambda_2, \lambda_3, \lambda_4 \le 1, \tag{3.5}$$

and

$$\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1. \tag{3.6}$$

Calculating the barycentric coordinates of a point inside a given tetrahedron is 

accomplished by reformulating equation 3.4 as follows, 

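$$\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} =
\begin{bmatrix}
v_{1x} & v_{2x} & v_{3x} & v_{4x} \\
v_{1y} & v_{2y} & v_{3y} & v_{4y} \\
v_{1z} & v_{2z} & v_{3z} & v_{4z} \\
1 & 1 & 1 & 1
\end{bmatrix} \cdot
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \end{bmatrix}, \tag{3.7}$$

or, rewriting in matrix notation,

$$\mathbf{b} = \mathbf{A} \cdot \mathbf{x}, \tag{3.8}$$

where $\mathbf{b} = [\, p_x \;\; p_y \;\; p_z \;\; 1 \,]^T$, $\mathbf{x} = [\, \lambda_1 \;\; \lambda_2 \;\; \lambda_3 \;\; \lambda_4 \,]^T$, and

$$\mathbf{A} =
\begin{bmatrix}
v_{1x} & v_{2x} & v_{3x} & v_{4x} \\
v_{1y} & v_{2y} & v_{3y} & v_{4y} \\
v_{1z} & v_{2z} & v_{3z} & v_{4z} \\
1 & 1 & 1 & 1
\end{bmatrix}.$$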

Therefore, x can be found through the standard methods of solving equation 3.8. 

As will be useful later on, if the computed barycentric coordinates do not conform 



to the constraints discussed earlier, then the point of interest can be determined to lie 

outside the given tetrahedron. 
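The calculation can be sketched as follows. Equation 3.8 is solved here by Gaussian elimination with partial pivoting, which is one of several standard methods and is chosen only to keep the example self-contained; the function names and the small tolerance used in the containment test are assumptions.

```python
def solve(a, b):
    """Solve a . x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate tetrahedron")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def barycentric(p, vertices):
    """Barycentric coordinates of point p in the tetrahedron (equation 3.7)."""
    a = [[float(v[k]) for v in vertices] for k in range(3)] + [[1.0] * 4]
    b = [float(p[0]), float(p[1]), float(p[2]), 1.0]
    return solve(a, b)


def is_inside(lambdas, eps=1e-9):
    """True if the coordinates satisfy the constraint of equation 3.5."""
    return all(-eps <= lam <= 1.0 + eps for lam in lambdas)


def interpolate(lambdas, vertex_values):
    """Weighted linear combination of the values held at the four vertices."""
    return sum(lam * f for lam, f in zip(lambdas, vertex_values))


vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
lam_in = barycentric((0.25, 0.25, 0.25), vertices)
lam_out = barycentric((1.0, 1.0, 1.0), vertices)
print(lam_in, is_inside(lam_in), interpolate(lam_in, [10.0, 20.0, 30.0, 40.0]))
print(lam_out, is_inside(lam_out))
```

For model spectra, `vertex_values` would be replaced by the flux vectors of the four enclosing models, with the same weights applied to every wavelength point.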

3.3.2 Grid Index - Delaunay Triangulation 

The Delaunay triangulation (O’Rourke, 1998) is frequently used to generate meshes 

of N-simplices from a set of N-dimensional points because it has certain desirable 

properties, the most important of which is the following: inside the circum-hypersphere 

of any simplex, there are no other points of the set (see Figure 3.3). 

This property yields a resulting triangulation which is “natural” and provably optimal 

in many respects. It is known that the Delaunay triangulation exists and is unique 

for a set of points in general position, that is, no N + 1 points are on the same hyperplane 

and no N + 2 points are on the same hypersphere, for an N-dimensional set of 

points. 
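The empty circumcircle property can be checked directly in two dimensions with the standard in-circle determinant. The sketch below is illustrative only and uses plain floating-point arithmetic, so it lacks the robustness of the adaptive exact arithmetic mentioned later for TetGen; the function names are hypothetical.

```python
def orientation(a, b, c):
    """Positive if the triangle a, b, c is ordered counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle a, b, c."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    if orientation(a, b, c) < 0:
        det = -det  # the sign convention assumes counter-clockwise ordering
    return det > 0


def is_delaunay(triangles, points):
    """True if no point lies inside the circumcircle of any triangle."""
    return all(
        not in_circumcircle(tri[0], tri[1], tri[2], p)
        for tri in triangles
        for p in points
        if p not in tri
    )


points = [(0, 0), (4, 0), (2, 1), (2, -1)]
short_diagonal = [((0, 0), (2, -1), (2, 1)), ((4, 0), (2, 1), (2, -1))]
long_diagonal = [((0, 0), (4, 0), (2, 1)), ((0, 0), (2, -1), (4, 0))]
print(is_delaunay(short_diagonal, points), is_delaunay(long_diagonal, points))
```

Of the two possible triangulations of these four points, only the one that avoids the long, thin triangles satisfies the property.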

In the context of SFIT, the Delaunay tetrahedralisation of a model grid is generated 

by the third-party library TetGen 1 . 

TetGen is a portable C++ program implementing the Delaunay triangulation algorithm 

of Edelsbrunner & Shah (1992). This algorithm is simple and fast, and TetGen’s
implementation is numerically robust due to the use of adaptive exact arithmetic code

(Shewchuk, 1996). TetGen can be compiled as a set of library functions which can then 

be integrated into other applications, in this case, SFIT. 

A technical difficulty arises in that SFIT is a Fortran 90 program, but TetGen is 

written in C++. Unfortunately, the Fortran 90 standard does not provide for calling 

functions written in other programming languages, and it has been left up to the 

individual compiler implementors to include a solution. 

SFIT is currently based around the Intel Fortran compiler for Linux 2 , and it is relatively
straightforward to call out to C/C++ functions using the mechanisms provided
by this compiler.

1 http://tetgen.berlios.de

2 http://www.intel.com/cd/software/products/asmo-na/eng/compilers/flin/

Figure 3.3: In two dimensions, the Delaunay triangulation guarantees that no other
points lie in the circumcircle of any simplex.

To simplify the process of calling the TetGen library from Fortran, a small “glue” 

function was written in C. This function accepts a flattened array of three-dimensional 

model grid points, copies the data into the data structure used by TetGen, calls TetGen 

to perform the tetrahedralisation, then returns a flattened array of vertices for the 

generated tetrahedra, and a flattened array denoting the neighbouring tetrahedra for 

each generated tetrahedron. 

This process of calling TetGen to construct the new model grid indexing scheme fits 

in with SFIT’s normal grid generation procedure, as outlined in algorithm 1. 

Algorithm 1 Generating a Tetrahedralisation of a Model Grid

  for all models in the grid file list do
    read model parameters from header
    record each parameter in corresponding grid axis array
    write model fluxes to direct access grid file
    append model parameters and corresponding direct access file record numbers to a linked list
  end for
  rescale parameters in linked list to yield more optimal tetrahedra
  flatten the parameters in linked list to an array of 3D points
  pass array of points into TetGen {TetGen returns two arrays: a list of tetrahedra vertices, and a list of tetrahedra neighbours}
  write grid axis arrays to beginning of index file
  write linked list of model data to index file
  write list of tetrahedra vertices to index file
  write list of tetrahedra neighbours to index file

As noted in the pseudo-code, the parameters of the models are rescaled before being
passed to TetGen. This is to allow the generation of a mesh which is composed, more

optimally, of “fat” tetrahedra, avoiding degenerate tetrahedra or “slivers” which would 

cause numerical problems for the simplex interpolation scheme and the point location 

algorithm outlined in the next section. 

Such degenerate tetrahedra would arise because of a scale disparity between the model grid axes. For instance, the T eff axis contains effective temperatures measured in Kelvin and rescaled in magnitude by a division by 100.

On the other hand, the n atm axis typically contains fractional values 0 ≤ n atm ≤ 1. 

This disparity means that model grids are very compact in the n atm dimension, and 

comparatively widely spaced in the T eff dimension. 

Given the model grid axis arrays accumulated during the model grid creation process 

(which typically correspond to the dimensions of T eff , log g, and n atm ), each axis is 

rescaled in the following manner.


Let $A_i$ be the $i$th model grid axis comprising the list of $m$ monotonically increasing points $\{a_{i1}, a_{i2}, \ldots, a_{im}\}$. $A_i$ is rescaled according to the mapping function $f : A_i \mapsto R_i$ such that $f(a)$ for every $a \in A_i$ is defined as

$$f(a) = \frac{a - a_{i1}}{a_{i2} - a_{i1}} \times 100. \qquad (3.9)$$

This simple function translates $A_i$ to the origin, and rescales the points onto a more widely spaced grid. Assuming a constant distance between adjacent points $a_{ij}$, this mapping yields a list of $m$ monotonically increasing points $R_i = \{0, 100, 200, \ldots, (m - 1) \times 100\}$.
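As an illustration of Equation 3.9, the rescaling of one grid axis might be sketched in Python as follows. This is a minimal sketch and not SFIT's Fortran implementation; the function name `rescale`, the `spacing` argument, and the example axis values are assumptions introduced here for illustration.

```python
import numpy as np


def rescale(values, axis, spacing=100.0):
    """Map values onto the rescaled axis of Equation 3.9.

    `axis` holds the monotonically increasing grid points a_i1, a_i2, ...
    The first point is moved to the origin and the first interval is
    stretched to `spacing` (100 in the text).
    """
    axis = np.asarray(axis, dtype=float)
    if axis.size < 2 or axis[1] <= axis[0]:
        raise ValueError("axis needs at least two increasing points")
    values = np.asarray(values, dtype=float)
    return (values - axis[0]) / (axis[1] - axis[0]) * spacing


# Hypothetical axes: T_eff in units of 100 K, and a fractional abundance.
teff_axis = [140.0, 160.0, 180.0, 200.0]
natm_axis = [0.0, 0.25, 0.5, 0.75, 1.0]

print(rescale(teff_axis, teff_axis))  # evenly spaced grid starting at 0
print(rescale(natm_axis, natm_axis))  # same spacing despite the smaller range
print(rescale(170.0, teff_axis))      # an interpolation point on the same scale
```

The same function serves for both the grid axes and an interpolation point, since the point must be mapped with the first two grid values of the corresponding axis.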

3.3.3 Navigating the Index - Point Location 

The algorithm for locating the tetrahedron which encloses any given interpolation point 

is based on a randomised jump-and-walk methodology, inspired by the work of Mücke 

et al. (1996). 

The basic idea is simple. A “good starting point” is established by randomly sampling 

the set of tetrahedra. The distances between each tetrahedron’s centroid and the 

given interpolation point are calculated, and the tetrahedron closest to the interpolation 

point is selected. 

A line segment is then constructed using the chosen tetrahedron’s centroid and the 

interpolation point. The tetrahedron containing the interpolation point is located by 

“walking through” the tetrahedra which intersect this line. Figure 3.4 illustrates the concept in two dimensions.

More formally, given the tetrahedralisation D of a model grid containing n tetrahedra, 

and an interpolation point p (rescaled using Equation 3.9), the following procedure 

locates the tetrahedron of D, if any, which contains p: 

1. Select $m$ tetrahedra $T_1, \ldots, T_m$ at random from $D$, where $m = \lceil 2n^{1/3} \rceil$.

2. Determine the index $j \in \{1, \ldots, m\}$ of the tetrahedron minimising the Euclidean distance $d(\mathrm{centroid}(T_j), p)$. Set $T = T_j$.

3. Locate the tetrahedron containing $p$ (if it exists) by traversing all tetrahedra intersected by the line segment $L = (\mathrm{centroid}(T), p)$.

Figure 3.4: The line segment, L, is constructed using the centroid of the starting tetrahedron, T, and the interpolation point, p. The tetrahedra visited on the walk-through are coloured grey.

Step 3 is implemented in constant time per tetrahedron visited once the initial 

tetrahedron, intersected by L and incident on starting point T, is determined. This is 

due to the fact that TetGen conveniently returns an array which describes, for every 

tetrahedron in the mesh, which tetrahedra are its neighbours. 

The implementation of the walk-through mechanism is based on the fast ray-triangle intersection algorithm of Möller & Trumbore (1997). This algorithm is very straightforward.

A ray $R(t)$ with origin $O$ and normalised direction $D$ is defined as

$$R(t) = O + tD, \qquad (3.10)$$

and a triangle is defined by three vertices $V_0$, $V_1$, and $V_2$. A point, $T(u,v)$, on a triangle is given by

$$T(u,v) = (1 - u - v)V_0 + uV_1 + vV_2, \qquad (3.11)$$

where $(u,v)$ are the barycentric coordinates which must fulfill $u, v \geq 0$, and $u + v \leq 1$. Computing the intersection between the ray, $R(t)$, and the triangle, $T(u,v)$, is equivalent to $R(t) = T(u,v)$, which yields

$$O + tD = (1 - u - v)V_0 + uV_1 + vV_2. \qquad (3.12)$$

Rearranging the terms gives 

$$\begin{bmatrix} -D, & V_1 - V_0, & V_2 - V_0 \end{bmatrix} \begin{bmatrix} t \\ u \\ v \end{bmatrix} = O - V_0. \qquad (3.13)$$

The barycentric coordinates (u,v) and the distance, t, from the ray origin to the 

intersection point can be found by solving the linear system of equations above. If the 

barycentric coordinates meet the requirements stipulated earlier, then the ray intersects 

the triangle. 
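The intersection test can be sketched in Python as follows. The sketch solves the linear system of Equation 3.13 by Cramer's rule using cross products, in the manner of the published Möller & Trumbore algorithm; it is not SFIT's own code, and the function name `ray_triangle` and the tolerance `eps` are assumptions made for illustration.

```python
import numpy as np


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-12):
    """Return (t, u, v) if the ray O + tD meets the triangle, else None."""
    origin, direction = np.asarray(origin, float), np.asarray(direction, float)
    v0, v1, v2 = (np.asarray(v, float) for v in (v0, v1, v2))

    e1 = v1 - v0                      # V1 - V0
    e2 = v2 - v0                      # V2 - V0
    pvec = np.cross(direction, e2)
    det = np.dot(e1, pvec)            # determinant of the 3x3 system
    if abs(det) < eps:
        return None                   # ray is parallel to the triangle plane

    inv_det = 1.0 / det
    tvec = origin - v0                # right-hand side, O - V0
    u = np.dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return None

    qvec = np.cross(tvec, e1)
    v = np.dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None

    t = np.dot(e2, qvec) * inv_det    # distance along the ray
    return t, u, v


# Hypothetical example: a ray fired along +z at a triangle in the z = 0 plane.
hit = ray_triangle([0.25, 0.25, -1.0], [0.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(hit)
```

When the test is applied to the line segment L rather than an infinite ray, the caller would additionally require that t lies between zero and the length of the segment.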

From the starting point of the walk-through method, each triangular face of the 

tetrahedron is tested using this algorithm to determine if it is intersected by the line 



segment L. If an intersecting face is found, the walk-through moves to the tetrahedron 

opposite that face (in constant time). 

This new tetrahedron is first tested to see if it contains point p by way of the simplex 

interpolation method discussed in section 3.3.1. If the tetrahedron does not contain 

p, the ray-triangle intersection test is performed, and the walk-through moves to the 

neighbouring tetrahedron on the other side of the face intersected by the ray. If the 

tetrahedron does contain p, then the walk-through procedure can terminate successfully 

by returning the interpolation weights (i.e., the barycentric coordinates) obtained from 

the point-in-simplex test. 

It is possible that point p could lie outside the convex hull of the tetrahedralisation. 

The walk-through algorithm recognises this eventuality when the line segment L intersects 

the face of a tetrahedron which is a member of the convex hull and therefore has 

no neighbour listed in the array returned by TetGen. 

Rather than allowing the walk-through algorithm to spend time traversing the tetrahedralisation 

in order to discover that point p lies outside the convex hull, it is possible 

to test for this case immediately after forming the line segment L. 

In addition to generating the Delaunay tetrahedralisation of a model grid, TetGen is 

also able to return a list of those tetrahedron faces which comprise the convex hull. After 

forming the line segment L, each of these faces could then be tested for intersection. 

However, the choice of method is of little consequence because, if point p lies outside the convex hull, the simplex interpolation method dictates that SFIT can no longer proceed with a fitting run and must stop.

In summary, pseudo-code for the algorithms outlined in this section is given in algorithms 2, 3, and 4.


Algorithm 2 Locating a Point in a Tetrahedralisation

rescale point, p, onto axes of rescaled model grid
if no starting tetrahedron exists then
    find close starting tetrahedron by random selection
end if
walk through tetrahedralisation
if enclosing tetrahedron found then
    return barycentric coordinates of point p within the tetrahedron
else
    point lies outside the convex hull of the tetrahedralisation
    exit SFIT
end if

Algorithm 3 Finding Walk-Through Starting Point

select at random $m = \lceil 2n^{1/3} \rceil$ tetrahedra from the tetrahedralisation, where n is the total number of tetrahedra in the tetrahedralisation
compute the Euclidean distance from each selected tetrahedron’s centroid to the interpolation point
return the index of the closest tetrahedron
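The starting-point selection of Algorithm 3 might be sketched as below. This is an illustrative sketch only: the array layout (a vertex coordinate array plus an integer array of four vertex indices per tetrahedron), the function name `choose_start`, and the choice to sample without replacement are assumptions, not details taken from SFIT.

```python
import math

import numpy as np


def choose_start(vertices, tetrahedra, p, rng=None):
    """Pick the sampled tetrahedron whose centroid lies closest to p.

    vertices   : (k, 3) array of rescaled grid points
    tetrahedra : (n, 4) integer array of vertex indices
    p          : rescaled interpolation point
    """
    vertices = np.asarray(vertices, dtype=float)
    tetrahedra = np.asarray(tetrahedra, dtype=int)
    p = np.asarray(p, dtype=float)
    rng = np.random.default_rng() if rng is None else rng

    n = len(tetrahedra)
    # Sample size m = ceil(2 n^(1/3)), capped at n for very small meshes.
    m = min(n, math.ceil(2.0 * n ** (1.0 / 3.0)))
    sample = rng.choice(n, size=m, replace=False)

    # Centroid of each sampled tetrahedron is the mean of its four vertices.
    centroids = vertices[tetrahedra[sample]].mean(axis=1)
    distances = np.linalg.norm(centroids - p, axis=1)
    return int(sample[np.argmin(distances)])
```

Because only a sample of the tetrahedra is examined, the result is a good starting point for the walk-through rather than a guaranteed nearest tetrahedron.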

Algorithm 4 Walk-Through of Tetrahedralisation

construct the line segment, L, from given starting tetrahedron’s centroid to the interpolation point, p
current tetrahedron = given starting tetrahedron
loop
    if current tetrahedron contains the interpolation point then
        return the barycentric coordinates of its location
    else
        test each triangular face of the current tetrahedron for intersection with L
        current tetrahedron = neighbouring tetrahedron on other side of intersected face
        if current tetrahedron is null then
            interpolation point lies outside convex hull
            exit SFIT
        end if
    end if
end loop
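The overall shape of the walk-through loop can be illustrated with a short Python sketch. Note that, for brevity, it substitutes a simpler "visibility walk" (stepping through the face opposite the most negative barycentric coordinate) for the straight-line walk with ray-triangle tests described in the text, so it is not the algorithm used in SFIT. The array layout, in which `neighbours[i, j]` is the tetrahedron sharing the face opposite vertex `j` of tetrahedron `i` and -1 marks a convex hull face, is also an assumption.

```python
import numpy as np


def barycentric(verts, p):
    """Barycentric weights of p with respect to a (4, 3) array of vertices."""
    edges = (verts[1:] - verts[0]).T          # 3x3 matrix of edge vectors
    w = np.linalg.solve(edges, p - verts[0])
    return np.concatenate(([1.0 - w.sum()], w))


def walk(points, tets, neighbours, start, p, tol=1e-12):
    """Return (index, weights) of the enclosing tetrahedron, or (None, None)."""
    points = np.asarray(points, dtype=float)
    p = np.asarray(p, dtype=float)
    current = start
    for _ in range(len(tets) + 1):
        w = barycentric(points[tets[current]], p)
        if np.all(w >= -tol):
            return current, w                 # p lies inside this tetrahedron
        # Leave through the face opposite the most negative weight.
        current = neighbours[current][int(np.argmin(w))]
        if current < 0:
            return None, None                 # p lies outside the convex hull
    raise RuntimeError("walk did not terminate")


# Hypothetical mesh: two tetrahedra sharing the face formed by points 1, 2, 3.
points = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
tets = [[0, 1, 2, 3], [1, 2, 3, 4]]
neighbours = [[1, -1, -1, -1], [-1, -1, -1, 0]]

print(walk(points, tets, neighbours, 0, [0.6, 0.6, 0.6]))
```

The weights returned on success are the interpolation weights for the four models at the vertices of the enclosing tetrahedron.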



3.4 Testing the Modifications 

The new simplex interpolation and indexing scheme was tested against the previous 

SFIT grid storage and interpolation in tables method using two case studies. The 

first conducts an analysis of a spectrum from the extreme helium star BD+10 2179 

(Klemola, 1961) to allow a comparison of each of the different optimisation routines 

offered by SFIT over the two interpolation schemes. The second uses a coarse grid of 

theoretical models to parameterise a large number of other models to give an indication 

of the accuracy of the two interpolation schemes whilst keeping the optimisation method 

constant. 

Case Study 1: BD+10 2179 

The observed high-resolution echelle spectrum of BD+10 2179 used in this study covers the wavelength range 3760–5230 Å, at a dispersion of 0.1 Å pixel$^{-1}$. The spectrum has already been wavelength calibrated and normalised. Both versions of SFIT fit a window, 4054–4545 Å, of this spectrum to a grid of 48 theoretical models covering the parameter space as described in Table 3.1.

Parameter    Values
T eff (K)    14,000, 16,000, 18,000, 20,000
log g        2.00, 2.50, 3.00, 3.50
n He         0.9960, 0.9890, 0.9690

Table 3.1: Details of the model grid used in the comparison.

The grid also has a latent fourth parameter in carbon abundance. 

For each analysis, the same initial guesses for each parameter were used for the 

Amoeba and Levenberg-Marquardt optimisation methods. These have been chosen to 

be close to expected values of the final parameter. They, and the step sizes given to 

the Amoeba routine, are listed in Table 3.2.


Parameter      Initial Value    Amoeba Step Size
T eff (kK)     17.0             2.0
log g (dex)    2.5              0.5
n He           0.989            0.01
v sin i        27.5             10.0
v rad          137.4            10.0

Table 3.2: Initial parameters used for the Amoeba and Levenberg-Marquardt optimisation routines. The step sizes used for Amoeba are also given.

An analysis begins by fixing the helium abundance, and solving for T eff , log g, v sin i, 

and v rad . Then, the values for these parameters are fixed, and a solution is found for 

n He which, with the latent n C parameter, is effectively a first approximation for n C . 

Finally, the value of n He is fixed again, and the solutions for T eff and log g are checked. 

The results obtained by each optimisation method available in SFIT (Nelder-Mead 

simplex (Amoeba), Levenberg-Marquardt (LM), and nearest neighbour (NN) fitting) 

are presented in Tables 3.3 and 3.4 for both the original SFIT and the modified SFIT. 

Listed in parentheses for each parameter are the standard errors generated by SFIT. 

Unmodified SFIT

               Amoeba             LM                 NN
T eff (kK)     18.000 (±0.014)    18.087 (±0.016)    18.00 (±0.011)
log g          2.743 (±0.004)     2.747 (±0.004)     2.50 (±0.003)
n He           0.997 (±0.001)     0.994 (±0.001)     0.996 (±0.001)
v sin i        33.44 (±0.13)      36.9 (±0.15)       27.50 (±0.11)
v rad          136.23 (±0.0)      137.4 (±0.0)       137.4 (±0.0)
χ 2 Fit        9.00               9.90               9.96
Time (secs)    54.3               11.42              27.99

Table 3.3: Results of BD+10 2179 analysis with the unmodified version of SFIT

This is a satisfactory result which shows that, for the Amoeba and LM methods, the simplex interpolation and grid indexing scheme performs slightly better (in terms of the final χ 2 value) than the original linear interpolation in tables method. There is also a gain in terms of execution speed of the Amoeba method with the new simplex-based scheme.

Modified SFIT

               Amoeba             LM                 NN
T eff (kK)     18.150 (±0.005)    17.870 (±0.015)    18.00 (±0.012)
log g          2.836 (±0.004)     2.687 (±0.005)     2.50 (±0.003)
n He           0.993 (±0.001)     0.992 (±0.001)     0.996 (±0.001)
v sin i        33.46 (±0.13)      36.90 (±0.14)      27.50 (±0.12)
v rad          136.22 (±0.0)      137.4 (±0.0)       137.4 (±0.0)
χ 2 Fit        8.88               9.770              10.20
Time (secs)    20.649             14.01              4.71

Table 3.4: Results of BD+10 2179 analysis with the modified version of SFIT

It should be noted that the 6-fold increase in speed for nearest neighbour searching reported in Table 3.4 is due to a re-write of some SFIT internals to take advantage of the data structures used in the simplex interpolation scheme. The data structures allow a fast iteration over all the models in a grid, reading each in from disk as needed. This means that the χ 2 computation is being performed directly with the model itself, in contrast with the linear interpolation in tables scheme which actually tries to interpolate to the grid point instead of accessing the model directly. This is another design flaw in SFIT that the simplex-based scheme corrects. The difference in methodology also accounts for the slightly different χ 2 values for nearest neighbour fitting listed in Tables 3.3 and 3.4.

Case Study 2: Model-Based Analysis 

The grid of theoretical models used in this case study is given in Table 3.5. It coarsely 

covers almost the entire parameter space of models available in the Armagh Observatory 

archives. 


Parameter     Values
T eff (kK)    15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 50.0
log g         3.00, 4.00, 5.00, 6.00
n He          0.001, 0.1, 0.5, 0.9, 0.999

Table 3.5: The model grid used to obtain physical parameters of the set of test models.

The rationale of the experiment is to use this grid to parameterise a large set of models which fall within its boundaries, but are not themselves part of the grid. Keeping the optimisation method constant, the results of the parameterisations will give an indication of the relative accuracy of the two interpolation schemes.

1238 models were selected to be parameterised by each version of SFIT. Each model was convolved with a Gaussian of 1Å FWHM to degrade its resolution slightly, and then resampled onto a wavelength grid of 4050–4950 Å.

The optimisation method used was Nelder-Mead simplex, with initial parameters and step sizes as follows: T eff = 30kK, δT eff = 5.0kK; log g = 4.5, δ log g = 1.0; n He = 0.5, δn He = 0.1. Results are presented in Figures 3.5 to 3.7, and in Table 3.6.

Before discussing the results, the presence of some anomalies in the linear interpolation 

in tables parameterisations must be noted and dealt with. Figure 3.5 plots the 

parameterisation results for all of the 1238 models. At T eff ∼ 50,000K, the optimiser returns implausible values of log g for some models. The T eff parameterisations also appear to fail at the 50,000K grid boundary, as some models with log g ∼ 3.5 are assigned temperatures much larger than 50,000K.

The implementation of the linear interpolation in tables method used in SFIT does 

not take any steps to limit the optimisation routines to the boundaries of the grid, 

and actually allows some extrapolation to occur at the edges of the grid. However, 

it is unclear whether the anomalous T eff and log g values are due to the optimisation 

routine (in this case, Amoeba) extrapolating too far outside the grid space (i.e., there 

is a problem with the implementation of the interpolation routine), or if there is a 

problem with the models. 

If Figure 3.5 is replotted with axes closer to the grid boundaries, as in Figure 3.6, the 

best performance of the interpolation method appears to occur below T eff = 40,000K. 

Between 40,000K and 50,000K, the parameterisations are more randomly distributed, indicating a greater level of “confusion” from the interpolation routine. A cursory inspection of the models reveals no significant problems, so it could be hypothesised that the issue lies with the implementation. However, an inspection of Figure 3.7 shows a similar “confusion” from the simplex-based method.



Figure 3.5: Parameterisation results from the linear interpolation in tables method. Clearly visible are anomalous results arising from a suspected defect in the method’s implementation. [Two panels: log g versus T eff (K), and log( n He / n H ) versus T eff (K).]


Figure 3.6: Parameterisation results from the linear interpolation in tables method. Axes have been restricted to give a view of the grid boundaries described in Table 3.5. [Two panels plotted against T eff (K), one of which shows log g.]



Figure 3.7: Parameterisation results from the simplex-based interpolation scheme. In contrast with Figures 3.5 and 3.6, the simplex-based scheme clearly restricts the optimisers to the grid boundaries. [Two panels plotted against T eff (K), one of which shows log g.]


At T eff ≥ 40,000K, the helium-rich models are most likely confusing the optimiser 

because the HeII ion lines manifest at wavelengths close to those of the neutral hydrogen 

lines. This problem requires further investigation, but, to work around the issue, a 

comparison of parameterisation results for those models with T eff ≤ 40,000K, and 

log g ≤ 6.0 is also given in Table 3.6. These RMS metrics give a better indication of 

the relative performance of the two methods. 

σ rms
                  All Models                       T eff ≤ 40kK, log g ≤ 6.0
                  T eff (K)    log g    n He       T eff (K)    log g    n He
Simplex/Models    3592.74      0.329    0.102      2666.79      0.355    0.068
Linear/Models     4695.11      3.362    0.149      1905.47      0.349    0.056
Linear/Simplex    3455.88      3.376    0.150      1928.02      0.306    0.065

Table 3.6: RMS comparison of parameterisation results from each interpolation method with the original parameters of each model. Also given is the RMS difference between the methods, and a comparison between the results in the region of parameter space for which both schemes seem to give their best results (see Figures 3.6 and 3.7).
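The metric in Table 3.6 is not defined explicitly in this section. Assuming it is the root-mean-square of the differences between fitted and reference parameter values, it might be computed as in the following sketch. The function name `rms_difference`, the optional `mask` argument, and the sample values are hypothetical and are not taken from the experiment.

```python
import numpy as np


def rms_difference(fitted, reference, mask=None):
    """Root-mean-square difference between two sets of parameter values."""
    fitted = np.asarray(fitted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if mask is not None:
        # Restrict the comparison to a region of parameter space.
        fitted, reference = fitted[mask], reference[mask]
    return float(np.sqrt(np.mean((fitted - reference) ** 2)))


# Hypothetical T_eff values (K) for three models: true and fitted.
true_teff = np.array([22000.0, 31000.0, 45000.0])
fit_teff = np.array([22500.0, 30000.0, 52000.0])

print(rms_difference(fit_teff, true_teff))
# Same metric restricted to models with T_eff <= 40,000 K.
print(rms_difference(fit_teff, true_teff, mask=true_teff <= 40000.0))
```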

In the restricted region (T eff ≤ 40kK, log g ≤ 6.0), the linear interpolation in tables scheme yields slightly more accurate results than the simplex-based method for all three parameters. This is most likely due to the coarse grid spacing used in the experiment; a finer-grained grid would be expected to allow the simplex interpolation method to achieve more accuracy. Using a finer-grained grid with SFIT is now possible because the simplex-based grid management scheme removes all the grid size, shape, and completeness restrictions imposed by the old linear interpolation in tables method.

The speed difference between the two methods should also be emphasised. To parameterise all the models, SFIT took approximately 10 minutes with the simplex-based scheme, but over 90 minutes with the old methodology. This significant gain in speed, along with the other advantages offered by the new simplex-based scheme, outweighs the possible slight loss of accuracy indicated in Table 3.6.



3.5 Summary 

The χ 2 fitting code, SFIT, has been modified and extended to handle arbitrarily large 

grids of theoretical model spectra. This paves the way to making SFIT more amenable 

to parameterising very large quantities of stellar spectra in an astronomical data mining 

application. 

Two major problems were identified with the way SFIT manages grids of models. 

Grids were restricted in size due to hard-coded limits written into the program, and 

the interpolation scheme used to approximate the space between grid points could not 

handle irregular or incomplete grids. 

These problems were solved by developing a new grid management and interpolation 

scheme based on simplex interpolation and Delaunay triangulation. 

This new scheme was tested against the old version of SFIT by parameterising a well-studied spectrum, and a large quantity of theoretical models. The new version of SFIT was found to perform much faster than the old version, with a more accurate fit being reported for the individual spectrum, and slightly (but not significantly) worse results in the parameterisation of the models. This slight loss of accuracy is outweighed by the increase in overall speed, and the removal of several severe restrictions on the size, shape, and completeness of SFIT model grids.

Chapter 4 

Filtering - Principal Components 

Analysis 

Modern astronomical data sets often contain observations of many different types of 

objects, and are rarely typologically homogeneous (Chapter 1). Searching for particular 

types of objects in such large databases requires computer assistance. Query parameters 

can be used to narrow down the data set to objects of a particular colour range, 

redshift, morphology, or some other parameter combination of significance. However, 

this reduced data set will invariably still contain objects that the astronomer would 

like to discard. Manual inspection of the data is not time-efficient unless quantities are 

small. It is more expedient to have an automated, or semi-automated, tool that can be 

used to assist in filtering through the data. 

Filtering is essentially a coarse-grained classification problem. An unknown spectrum 

is compared with a collection of known, or template, spectra to determine if it 

belongs to that particular class of object. The well-known techniques of cross correlation 

(Tonry & Davis, 1979) and χ 2 minimisation (Chapter 3) are immediately applicable. 

However, in a data mining context, speed is of importance, and these techniques are 

slow. 



One way to construct a fast filtering method is to extract the defining features from 

a set of known spectra, and use them to summarise and represent that set. Instead of 

comparing an unknown spectrum with each template spectrum, it can then be weighed 

against the summarised form in a more computationally expedient manner. 

Principal Components Analysis (PCA; Murtagh & Heck, 1987) can be used to 

construct such a summary. It is a multivariate statistical technique which seeks to 

summarise the variance of an N-dimensional data set in a handful of independent 

parameters. These parameters capture the main sources of linear variation in the data 

set, and can be used to construct a fast test to determine if an unknown spectrum is 

similar to a collection of known spectra. 

Another advantage to using PCA as a filter is that the independent parameters produced 

are unique to each data set. This means that a PCA-based filter is generalisable, 

and can be used to construct a filter for any type of astronomical object. 

As a testament to its versatility, PCA has been applied on several occasions to the 

classification of astronomical spectra. Deeming (1964) applied it to the classification of 

G and K-type giants. Connolly et al. (1995) used PCA to classify galaxy spectra, and 

Francis et al. (1992) applied it to the classification of quasar spectra. Whereas these 

studies used PCA in an unsupervised manner, it is used here in a supervised fashion. 

A Filter for Hot Subdwarfs 

Chapter 1 outlines a general data mining toolkit for astronomical spectra, with a specific 

application to hot subdwarfs. As such, the apparatus of the PCA-based filter outlined 

here will be applied to the data set obtained from Drilling et al. (2006) to construct a 

filter for hot subdwarfs. 

The operation of this filter will then be applied to a set of real-world low-dispersion spectra obtained from the Sloan Digital Sky Survey in an attempt to data mine a collection of hot subdwarf candidates for further study.

Figure 4.1: Principal component analysis. $u_1$ is the first principal component and the axis onto which the projected positions of the data have their maximum sum. $u_2$ is the second principal component, and $u_1 \cdot u_2 = 0$. [Axes: X and Y.]

4.1 Constructing A PCA-Based Filter 

Principal components analysis transforms an N-dimensional data set onto a new set 

of optimally defined axes. These axes represent the directions of maximum variance 

between variables in the data set, and are called the Principal Components (PCs). The 

technique basically amounts to a rotation from the original axes to the new ones, and 

is therefore a linear transformation of the data. 

Figure 4.1 illustrates the concept with a two dimensional data set. The direction 

of maximum variance in the data is represented by u 1 . This new axis (the first PC of 

the data set) better describes the data than either x 1 or x 2 . The remaining variance 

in the data, once they have been projected onto the first PC, is described by u 2 , the 

second PC. Thus, u 1 and u 2 are a more optimally aligned directional basis set for this 

particular data set. 



The PCs are derived in decreasing order of importance, with the first PC describing 

most of the variance in the data, and subsequent PCs representing less and less 

information about the variance. In the case of a large N-dimensional data set, a successful 

derivation of the principal components means that the first few components can 

be used to give a compressed representation of the data without a significant loss of 

information. 

Lesser principal components will typically contain information on features in the 

data which are not very well correlated, such as noise or anomalies. By discarding 

these components, a compressed representation will preferentially remove undesired 

features, and features which do not vary over a sufficient fraction of the data set. 

4.1.1 Mathematics of PCA 

This presentation of PCA theory follows that of Bailer-Jones (1996) and Murtagh & 

Heck (1987). 

Let the vector $\mathbf{x} = (x_1, x_2, x_3, \ldots, x_N)$ be a stellar spectrum with $N$ flux bins. A spectrum can then be considered a point in $N$-dimensional space, with each axis representing one flux bin. $M$ such spectra can be described as the $(M \times N)$ matrix $X$, where $X^T = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M)$.

The first principal component is the normalised vector, $u$, which best fits the points in $X^T$. The criterion of goodness of fit of this axis to the point set is defined as the squared deviation of the points from the axis. Minimising the sum of squared distances between the points and the axis is equivalent to maximising the sum of squared projections onto the axis, i.e., maximising the variance of the points when projected onto this axis.

The sum of squared projections of the points in $X^T$ onto the new axis, $u$, is

$$(Xu)^T (Xu). \qquad (4.1)$$


In maximising this quadratic form, the constraint must be made that $u^T u = 1$, otherwise the projection can be maximised arbitrarily. Setting $S = X^T X$, and introducing the Lagrange multiplier, $\lambda$, the maximum is obtained by differentiating

$$u^T S u - \lambda (u^T u - 1), \qquad (4.2)$$

which gives,

$$2Su - 2\lambda u. \qquad (4.3)$$

Setting this equal to zero, the optimal value of $u$ is the solution of

$$Su = \lambda u. \qquad (4.4)$$

This is a standard eigenvector problem. The eigenvector of $S$, $u$, is the line of best fit, and the corresponding eigenvalue, $\lambda$, indicates the amount of variance described by this line.

Calculating the remaining axes proceeds in a similar manner. The second axis is found by again maximising $u^T S u$, but with the added constraint that the second axis be orthogonal to the first, i.e., $u_2^T u_1 = 0$. Introducing the Lagrange multipliers, $\lambda_2$ and $\mu$, the maximum is obtained by differentiating

$$u_2^T S u_2 - \lambda_2 (u_2^T u_2 - 1) - \mu (u_2^T u_1), \qquad (4.5)$$

giving,

$$2Su_2 - 2\lambda_2 u_2 - \mu u_1. \qquad (4.6)$$

Setting this equal to zero, and multiplying through by $u_1^T$, the first two terms vanish: $u_1^T u_2 = 0$ by construction, and, because $S$ is symmetric and $u_1$ satisfies equation 4.4, $u_1^T S u_2 = (S u_1)^T u_2 = \lambda u_1^T u_2 = 0$. This yields

$$\mu u_1^T u_1 = 0, \qquad (4.7)$$

which implies that $\mu = 0$. Therefore, equation 4.6 is of the same form as equation 4.4, meaning $\lambda_2$ is the second largest eigenvalue of $S$ and $u_2$ is the corresponding eigenvector.

Thus, the principal components of a set of N-dimensional points, X, are the eigenvectors 

of the matrix of sums of squares and cross products, S = X T X. There are N 

eigenvectors for an N-dimensional problem. 

The principal components form a directional basis set, meaning that PCA is best 

applied to data that are centred. Geometrically speaking, centring is equivalent to 

a shift in the origin of the co-ordinate system, and is performed by calculating and 

subtracting the mean from the row vectors of X. 

Let $\bar{x}_i$ be the average of element $x_i$ over all $M$ data points. Therefore, the $i$th element of the $p$th point is given by

$$\Delta x_{i,p} = x_{i,p} - \bar{x}_i. \qquad (4.8)$$

S now becomes the covariance matrix of the data points. The result of equation 4.4 

remains unchanged. Subtracting the mean also has the advantage that the dynamic 

range of S is reduced, increasing the numerical stability of the solution to the eigenvector 

problem. 
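The derivation above can be illustrated numerically with a short Python sketch: centre the data (Equation 4.8), form $S$, and solve the eigenvector problem of Equation 4.4. The use of NumPy's `numpy.linalg.eigh` is a choice made here for illustration and is not the routine used in this work; the function name `principal_components` and the toy data are likewise assumptions.

```python
import numpy as np


def principal_components(X):
    """Return (mean, eigenvalues, eigenvectors) for an (M, N) data matrix.

    Eigenvalues are sorted in decreasing order; column k of the returned
    eigenvector matrix is the k-th principal component.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    D = X - mean                          # centring, Equation 4.8
    S = D.T @ D                           # sums of squares and cross products
    eigvals, eigvecs = np.linalg.eigh(S)  # S is real and symmetric
    order = np.argsort(eigvals)[::-1]     # largest variance first
    return mean, eigvals[order], eigvecs[:, order]


# Toy data: five 2-D points lying exactly along the direction (0.6, 0.8).
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
X = np.outer(t, [0.6, 0.8]) + [5.0, 3.0]

mean, eigvals, eigvecs = principal_components(X)
print(eigvals)        # nearly all variance falls on the first component
print(eigvecs[:, 0])  # first PC, parallel to (0.6, 0.8) up to sign
```

The sign of each eigenvector is arbitrary, so a returned component may point in either direction along its axis.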

4.1.2 Building A Hot Subdwarf Filter 

By retaining only the most significant principal components of an N-dimensional data set, a quick test can determine if a new data point is in a similar region of N-dimensional space as the original data set. This is the principle upon which a filter can be built to help search for astronomical objects of a particular type from a large collection of unknown spectra.

Figure 4.2: Mean spectrum of the Drilling et al. (2006) sample. [Vertical axis: Normalised Flux; horizontal axis spans 4100 to 4900.]

As described at the start of the chapter, such a filter will now be developed using 

the collection of 177 standard hot subdwarf spectra obtained from Drilling et al. (2006) 

(see also Chapter 2). 

The first step is to construct the mean spectrum and subtract it from each spectrum in the set, thereby forming the matrix of difference spectra using equation 4.8. The mean spectrum is plotted in Figure 4.2.

The elements of the covariance matrix, $S$, are then calculated from

$$s_{i,j} = \sum_{p=1}^{M} \Delta x_{i,p} \, \Delta x_{j,p}. \qquad (4.9)$$



The use of the covariance matrix in the formulation of PCA assumes that the data 

do not need to be standardised, i.e., that all the variables are on the same scale. This 

assumption is valid here because the Drilling et al. (2006) spectra have all been continuum 

normalised, and the application of the filter will be to normalised spectra. 

If the variables were on different scales, e.g., if the Drilling et al. (2006) set of spectra 

were unnormalised and half had flux scales several orders of magnitude greater than 

the other, then the large differences between the variances of the variables would cause 

weaker variables to be ignored. Likewise, PCA can be sensitive to outliers in the data 

set which can greatly contribute to the variance. 

Scale dependences must be removed if PCA is to generate useful components. Common 

approaches to normalisation include standardising the variables to have unit variance, 

compressing them onto the scale 0-1, or taking logarithms. The results of the 

PCA will depend on the normalisation method used. 

In this application of PCA to stellar spectra, the covariance matrix, S, will always be real and symmetric. As such, equation 4.4 does not need to be solved as is because any real symmetric matrix is diagonalised by the matrix of its eigenvectors (see Golub & Van Loan 1989).

Any real and symmetric matrix can be reliably diagonalised using a technique such as Jacobi’s method. Here, a QR-based singular value decomposition (see Press et al. 1986) routine has been used to calculate the eigenvectors. The results of the PCA are presented in Figures 4.3 and 4.4 wherein the first ten principal components of the Drilling et al. (2006) spectra have been plotted.
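The connection between a singular value decomposition and the eigenvectors of S can be sketched in Python. Here NumPy's `numpy.linalg.svd` stands in for the routine of Press et al. (1986), so this is an illustration of the idea under that substitution and not the code used in this work; the function name `pca_svd` and the example matrix are assumptions.

```python
import numpy as np


def pca_svd(X):
    """Principal components of an (M, N) data matrix via SVD.

    Returns (eigenvalues, components): the eigenvalues of S = D^T D in
    decreasing order, and the principal components as rows.
    """
    X = np.asarray(X, dtype=float)
    D = X - X.mean(axis=0)                # difference spectra, Equation 4.8
    # D = U diag(s) V^T, so D^T D = V diag(s^2) V^T: the rows of V^T are the
    # eigenvectors of S and the squared singular values are its eigenvalues.
    _, s, vt = np.linalg.svd(D, full_matrices=False)
    return s ** 2, vt


# Hypothetical data: four "spectra" with three flux bins each.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 1.0],
              [3.0, 1.0, 2.0],
              [4.0, 3.0, 5.0]])

eigvals, components = pca_svd(X)
print(eigvals)
print(components[0])  # first principal component
```

Working from the decomposition of the difference spectra avoids forming S explicitly.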

The PCs are rotations in the data space of the original axes, therefore they resemble 

spectra, and have the same number of elements as the original spectra. It can be clearly 

seen that the first PC differentiates between hydrogen and helium lines. This reification 

makes sense as it is these features which vary most across the Drilling et al. (2006) data 

set. The second PC also clearly differentiates between HeI and HeII line series. For the


[Figure 4.3 plot: five panels, PC 0 to PC 4, each plotted over 4100 to 4900 with a vertical range of -0.15 to 0.15.]

Figure 4.3: First five PCs of the Drilling et al. (2006) sample.



[Figure 4.4 plot: five panels, PC 5 to PC 9, each plotted over 4100 to 4900 with a vertical range of -0.15 to 0.15.]

Figure 4.4: Second five PCs of the Drilling et al. (2006) sample.


[Figure 4.5 plot: Cumulative Percentage of Total Variance (94 to 100) against Principal Component (0 to 9).]

Figure 4.5: Cumulative variance of the first ten PCs of the Drilling et al. (2006)
sample.

remaining PCs, it becomes harder to attach any meaningful interpretation. 

The question remains as to how many principal components should be retained in 

order to form an adequate representation of the Drilling et al. (2006) standard stars. 

Figure 4.5 shows the cumulative percentage variance accounted for by the first ten 

principal components. 

The first principal component alone accounts for 94.66% of the total variance, which
is not surprising given the reification outlined previously. All ten PCs account for
99.83% of the variance; however, 99.30% is described by the first four PCs, which are
therefore adequate to give a compressed representation of the Drilling et al.
(2006) hot standards.
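The variance criterion can be sketched as below. The eigenvalues are hypothetical and are not the values behind Figure 4.5; the function names are likewise assumptions made for illustration.

```python
import numpy as np

def cumulative_variance_percent(variances):
    """Cumulative percentage of total variance for PCs in decreasing order."""
    variances = np.asarray(variances, dtype=float)
    return 100.0 * np.cumsum(variances) / variances.sum()

def components_needed(variances, target_percent):
    """Smallest number of leading PCs whose cumulative variance reaches the target."""
    cum = cumulative_variance_percent(variances)
    return int(np.searchsorted(cum, target_percent) + 1)

# Hypothetical eigenvalues, already sorted in decreasing order.
variances = [90.0, 5.0, 3.0, 1.0, 1.0]
cum = cumulative_variance_percent(variances)
k = components_needed(variances, 98.0)     # three PCs reach 98% here
```

Note that the percentages depend on how many eigenvalues are included in the total, so the full set should be supplied where possible.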

It should be noted that this selection criterion of maximal variance may unwisely 

discard the less significant PCs. Lahav et al. (1996) point out that, in the role of 



classification of galaxy spectra, the fractional variance on its own was not sufficient to 

determine how many PCs were needed for classification. The reason for this may be 

due to non-linearity in the data (a spectrum is not a linear combination of line features, 

and the lines do not separate into different principal components), the effect of noise 

on the deduction of the PCs, or the fact that classification requires more information 

than that given simply by the maximal variance. 

In the application of PCA here to the filtering of stellar spectra, only an adequate 

representation of a data set is sought through PCA, and not an adequate discrimination 

between classes within a data set. As such, the criterion of maximal variance remains 

valid. 

Now, let the matrix $E^T = (u_1, u_2, u_3, u_4)$ contain the first four principal components
of the Drilling et al. (2006) hot standards. To determine the similarity of some unknown
spectrum $y = (y_1, y_2, y_3, \ldots, y_N)$ to the Drilling et al. (2006) standards, first, the vector
$p$ is constructed, which holds the magnitudes of the projections of $y$ onto each of the four
principal components in $E$,

$$p = \Delta y \cdot E, \qquad (4.10)$$

where $\Delta y$ is the mean-subtracted difference spectrum of $y$ (i.e., $\Delta y = y - \bar{x}$, where
$\bar{x}$ is the mean spectrum of the Drilling et al. (2006) standards).

The reduced reconstruction of $y$, $y_r$, is then given by

$$y_r = \bar{x} + p \cdot E^T. \qquad (4.11)$$
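Equations 4.10 and 4.11 can be sketched in NumPy as below. The sketch assumes that E is stored with shape (N, k), so that its columns are the principal components; the function names and the toy values are illustrative only.

```python
import numpy as np

def project(y, mean, E):
    """Equation 4.10: project the mean-subtracted spectrum onto the PCs.

    E has shape (N bins, k components); its columns are the PCs (an assumed
    storage convention).
    """
    return (y - mean) @ E

def reconstruct(p, mean, E):
    """Equation 4.11: rebuild the spectrum from its k projections."""
    return mean + p @ E.T

# Toy example with N = 4 flux bins and k = 2 orthonormal components.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
mean = np.ones(4)
y = mean + np.array([2.0, 3.0, 0.5, 0.0])

p = project(y, mean, E)          # projections onto the two components
y_r = reconstruct(p, mean, E)    # the 0.5 in the third bin is not recovered
```

Any structure in y that lies outside the span of the retained components, such as the third-bin offset in this toy case, is lost in the reduced reconstruction.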

Figure 4.6 shows the results of projecting two hot subdwarf spectra onto the first 

four principal components of the Drilling et al. (2006) hot standards. 

At the top, spectrum A is a relatively good S/N observation of a cooler subdwarf.


[Figure 4.6 plot: two panels over 4100 to 4900, each showing the Original Spectrum and the Reduced Reconstruction. Panel A: R = 1.89970. Panel B: R = 6.22063.]

Figure 4.6: Illustration of projecting hot subdwarf spectra onto the first four PCs of
the Drilling et al. (2006) standards.

The original spectrum is plotted in red, and its reduced reconstruction in blue. 

Spectrum B shows a hotter subdwarf with a lower S/N observation. Again, the 

original spectrum is plotted in red, with the reduced reconstruction in blue. 

Spectrum A compares well with its reduced reconstruction, the latter showing very
little difference from the original. Spectrum B, however, is noisier, and its reduced reconstruction
matches the spectrum well except for the noise (here, the noise-filtering capabilities
of PCA can be observed).

Certainly, spectrum A, if encountered in a large set of unknown spectra, would be 

desirable to the astronomer, whereas spectrum B could be considered too noisy for any 

further analysis. Thus, when filtering through a large set of unknown spectra, those 

spectra which compare well with their reduced reconstructions will be of most interest 

to the hot subdwarf astronomer. 



A suitable quantitative measure for this comparison is the reconstruction error

$$R = 100 \times \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - y_{r,i})^2}, \qquad (4.12)$$

where $y_i$ is the $i$th flux bin of the original spectrum, $y$, and $y_{r,i}$ is the $i$th flux bin
of the reduced reconstruction of $y$, $y_r$. This error metric gives the RMS difference
in each flux bin between the original spectrum and its reconstruction. The factor of
100 is simply a scaling factor to make the final error values easier to work with (it is
anticipated that the majority of values for R will lie in the range 0 ≤ R ≤ 1).
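A direct transcription of equation 4.12 might look like the following sketch. The function name and the four-bin example are assumptions made for illustration.

```python
import numpy as np

def reconstruction_error(y, y_r):
    """Equation 4.12: 100 times the RMS difference per flux bin."""
    y = np.asarray(y, dtype=float)
    y_r = np.asarray(y_r, dtype=float)
    return 100.0 * np.sqrt(np.mean((y - y_r) ** 2))

# Hypothetical four-bin spectrum that differs from its reconstruction by 0.1
# in a single bin: RMS = sqrt(0.01 / 4) = 0.05, so R is approximately 5.
R = reconstruction_error([1.0, 0.9, 1.0, 0.8], [1.0, 0.9, 1.0, 0.9])
```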

The reconstruction errors for each spectrum in Figure 4.6 are shown in the top left 

region of each plot. 

How “well” a spectrum should compare in this manner with its reduced reconstruction
is a subjective measure dependent on the type of object an astronomer is filtering
for, and what further analysis is intended. In the hot subdwarf case, for classification
purposes, a spectrum such as B in Figure 4.6 may mark the largest
reconstruction error that is to be accepted. However, if the derivation of physical
parameters is the goal, then reconstruction errors close to that of spectrum A, and not
as high as that of B, may be desired.

As mentioned in the introduction to this chapter, PCA is a data-driven tool, with 

the principal components derived for one data set being unique to those data. As such, 

if, say, a galaxy spectrum is reconstructed using the PCs of the Drilling et al. (2006) 

standards, its reconstruction error will be very high as it won’t have many (if any) 

features in common with hot subdwarfs. The same is true for noisy, or incomplete 

spectra, making them easy to filter out.

4.2 Searching the SDSS for Hot Subdwarfs 

The PCA hot subdwarf filter was applied to a sample of 4610 spectra obtained from 

the Sloan Digital Sky Survey, Data Release 3 database. The selection criteria used to 

obtain the sample from the SDSS are outlined in the following SQL query, 

    SELECT s.plate, s.mjd, s.fiberid
    FROM BESTDR3..SpecPhotoAll as s
    WHERE s.specClass = dbo.fSpecClass('STAR')
      AND (s.primTarget & (dbo.fPrimTarget('TARGET_STAR_BHB')
           + dbo.fPrimTarget('TARGET_STAR_SUB_DWARF')) > 0)
      AND (s.objType = 2)

The criteria naively rely upon the classifications automatically assigned by the SDSS
spectrophotometric pipeline.

The SDSS supplies spectra in FITS format with each FITS file including a calibrated 

spectrum, a normalised spectrum, and all measured parameters (redshift, line fits, line 

indices, per-pixel resolution, etc.) stored in the FITS header. 

For convenience, the normalised spectra were extracted from the FITS files, and 

subsequently velocity corrected using the redshift stored in each FITS header. The 

spectra were then rebinned onto the common wavelength grid of 4050–4950 Å at a
dispersion of 1 Å pixel$^{-1}$ to match the Drilling et al. (2006) spectra.
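The velocity correction and rebinning steps can be sketched as below. The sketch assumes a simple division of the observed wavelengths by (1 + z) and linear interpolation via `numpy.interp`; the interpolation scheme actually used is not stated here, and the function names, fill value and synthetic spectrum are illustrative assumptions.

```python
import numpy as np

def to_rest_frame(wavelength, z):
    """Shift observed wavelengths to the rest frame for redshift z."""
    return np.asarray(wavelength, dtype=float) / (1.0 + z)

def rebin(wavelength, flux, grid, fill=1.0):
    """Linearly interpolate flux onto grid; bins without coverage get `fill`."""
    return np.interp(grid, wavelength, flux, left=fill, right=fill)

# Common grid: 4050-4950 A at 1 A per pixel (901 bins).
grid = np.arange(4050.0, 4951.0, 1.0)

# Hypothetical observed spectrum: a sloped continuum at redshift z = 1e-4.
z = 1.0e-4
rest = np.linspace(4000.0, 5000.0, 2001)
observed = rest * (1.0 + z)
flux = 1.0 + 1.0e-4 * (rest - 4500.0)

rebinned = rebin(to_rest_frame(observed, z), flux, grid)
```

Linear interpolation does not conserve flux when the input sampling is much finer than the output grid, so a flux-conserving rebinning may be preferable in practice.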

The PCA filter was applied using equations 4.10 and 4.11, outlined in the previous 

section, to construct the set of reduced reconstructions. The reconstruction errors were 

then calculated as per equation 4.12. 

The distribution of the reconstruction errors is displayed in Figure 4.7. 

The histogram shows that most of the spectra in the SDSS sample are concentrated
in the region R ≲ 4.0. The contents of the first three error bins (R ≲ 1.8) are
shown in Figures 4.8 and 4.9. Clearly, these eight spectra are of a good S/N, strong



[Figure 4.7 plot: Number of Spectra (0 to 300) against Reconstruction Error, R; the final bin is labelled 15.00.]

Figure 4.7: Histogram of reconstruction errors from the SDSS data sample.

subdwarf candidates, and well-suited to further analysis. 

As the reconstruction error increases, the S/N of the spectra starts to decrease.
Figure 4.10 shows four spectra sampled from the most populated error bin, R ∼ 3.0.
They are slightly noisier than those in Figures 4.8 and 4.9, yet the
reconstructions are still a close match, meaning they could still be suitable for further
analysis.

By around R ≈ 4.5, the reconstruction quality is becoming noticeably poorer, as
demonstrated in Figure 4.11. Here, the S/N is becoming progressively lower, and
objects with spectra quite unlike those of subdwarfs, such as white dwarfs, begin to
make an appearance in the succeeding error bins.

One interesting feature of note is the final error bin which contains all the SDSS 

spectra with reconstruction errors R > 15.0. It contains a large number of spectra in


[Figure 4.8 plot: four panels over 4100 to 4900, vertical range 0.0 to 1.5.
A: J234137.25+000123.2, R = 1.52992
B: J171531.67+271545.5, R = 1.66001
C: J155612.59+022152.9, R = 1.62708
D: J153701.88-011307.9, R = 1.74365]

Figure 4.8: Spectra in first three reconstruction error histogram bins (R ≲ 1.8).



[Figure 4.9 plot: four panels over 4100 to 4900, vertical range 0.0 to 1.5.
A: J152357.12+354009.4, R = 1.73210
B: J151722.09+603546.3, R = 1.79120
C: J125244.60-002512.9, R = 1.71810
D: J112015.43+650003.2, R = 1.76950]

Figure 4.9: Spectra in first three reconstruction error histogram bins (R ≲ 1.8).


comparison to the preceding error bins. Such high reconstruction errors are indicative
of spectra with features poorly matched to typical subdwarf spectra. Figure 4.13 shows
a sample of four spectra from this error bin.

The first three spectra are dominated by noise, with spectrum B exhibiting an
anomalous gap in the data at around 4110 Å. Spectrum D is incomplete, hence the
large reconstruction error.

The PCA filter is effective at separating out the very low S/N exemplars, and incomplete 

spectra as shown in Figure 4.13. However, it does not magically separate out 

subdwarf candidates. Invariably, they will be mixed in with stars that are very much 

spectroscopically similar to subdwarfs. An example of high S/N spectra that aren’t 

subdwarfs, and which are filtered out, is shown in Figure 4.12. 

The SDSS sample used here was predominantly composed of cooler BHB and main 

sequence stars, with some white dwarfs. Thus, any subdwarf candidates were difficult 

for the filter to extract from amidst the spectroscopically similar cooler stars. This 

problem was due to the search criteria used in the initial SQL query, but it can be 

rectified by altering the database query to select by photometric colour which would 

exclude most of the cooler stars and subdwarf-main sequence binaries. 

The reconstruction error calculation described in equation 4.12 provides a description 

of the mean difference between an original spectrum and its reconstruction. As 

such, it served to rank the SDSS spectra mostly according to noise content. This 

meant that objects such as white dwarfs started to be found ranked alongside lower 

S/N subdwarf candidates/BHB stars with reconstruction errors of around R ≈ 7.0. 

This is not necessarily a problem per se because, by about R ≈ 5.0, any subdwarfs 

to be found are going to be dominated by noise levels that may not be conducive to 

useful further analysis. 

Practically speaking, the PCA filter allows a value of R to be established beyond 

which any spectra can be safely discarded on the grounds that they are not of sufficient 


[Figure 4.10 plot: four panels over 4100 to 4900, vertical range 0.0 to 1.5.
A: J023624.84-072238.1, R = 2.97285
B: J001832.61+155540.1, R = 3.00098
C: J224640.34-090631.8, R = 2.89146
D: J145418.66-022346.1, R = 2.88735]

Figure 4.10: Sample of spectra from the eighth error bin (R ∼ 3.0).



[Figure 4.11 plot: four panels over 4100 to 4900, vertical range 0.0 to 1.5.
A: J001146.72+152147.5, R = 4.66907
B: J165401.98+294801.7, R = 4.50669
C: J074623.09+205546.7, R = 4.54534
D: J113044.42+612111.7, R = 4.50072]

Figure 4.11: Sample of spectra from the fourteenth error bin (R ∼ 4.5).



[Figure 4.12 plot: four panels over 4100 to 4900, vertical range 0.5 to 1.5.
Panels A to D have R = 6.48477, 6.94160, 6.99275 and 7.03080 respectively.
Objects shown: J085128.17+060551.2, J110651.79+625024.0, J092252.13+524446.4, J080051.56+223558.5.]

Figure 4.12: Sample of high S/N DA white dwarfs from the 22nd–24th error bins
(R ∼ 6.4–7.1).


[Figure 4.13 plot: four panels over 4100 to 4900, vertical range 0.0 to 1.5.
A: J075647.73+232913.6, R = 15.15518
B: J141509.80-021147.2, R = 20.11230
C: J140804.49+011320.1, R = 38.54495
D: J145616.92+024549.6, R = 66.91276]

Figure 4.13: Sample of spectra from the fifty-third error bin (R > 15.0).



S/N for whatever further analysis the astronomer has in mind. This will also safely 

discard objects whose spectra are not sufficiently similar to the objects of interest. 

For the spectra that remain, a visual inspection is still necessary to separate out 

candidates of interest from objects for which the reconstruction error calculation is 

not sensitive enough to mark for removal. In the obtained SDSS sample, spectra
with a reconstruction error of R < 5.0 were generally suitable for classification or
parameterisation. However, as mentioned previously, any real subdwarf candidates in
that sub-sample were mixed in with cooler BHB and main-sequence stars.
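The thresholding step can be sketched as below, using a handful of the reconstruction errors quoted in Figures 4.8, 4.10, 4.11 and 4.13. The function name is hypothetical, and the threshold of 5.0 simply mirrors the value discussed above.

```python
def filter_by_error(errors, threshold):
    """Return (name, R) pairs with R below threshold, sorted by increasing R."""
    kept = [(name, r) for name, r in errors.items() if r < threshold]
    return sorted(kept, key=lambda item: item[1])

# Reconstruction errors as quoted in the figure panels.
errors = {
    "J234137.25+000123.2": 1.52992,    # Figure 4.8, panel A
    "J023624.84-072238.1": 2.97285,    # Figure 4.10, panel A
    "J001146.72+152147.5": 4.66907,    # Figure 4.11, panel A
    "J075647.73+232913.6": 15.15518,   # Figure 4.13, panel A
    "J145616.92+024549.6": 66.91276,   # Figure 4.13, panel D
}

candidates = filter_by_error(errors, threshold=5.0)
```

The surviving spectra are only ranked by R; they still require visual inspection, since the error does not by itself distinguish subdwarfs from spectroscopically similar stars.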

4.3 Summary 

The concept of the PCA-based filtering tool presented here is certainly sound from the
point of view of necessity. In the construction of a filter for hot subdwarfs, and its application
to search for such stars in the SDSS, it was discovered that the SDSS-assigned spectral
classifications are not a useful criterion to include in an initial search.

The data set obtained was composed of a large quantity of blue horizontal branch 

stars. As they are spectroscopically very similar to hot subdwarfs, this made it difficult 

for the filter to provide a robust discrimination between the two object types. 

This point highlights the need to use appropriate and specific search criteria when 

extracting data from a very large survey database. In the case of hot subdwarfs and the 

SDSS, a photometric colour-based search would allow cooler BHB stars to be avoided. 

Still, the PCA filter is not completely automated, and cannot be treated as a black 

box. A user must be aware of the correct manner of operation: 

1. The set of training data from which a filter is to be constructed must be preprocessed 

into an homogeneous form. 

2. Application data must be pre-processed to have the same properties as the training
set (i.e., wavelength range, dispersion, etc.).

3. An acceptable reconstruction error threshold is a subjective decision that the 

user must make. It can only be determined through examination of the filtering 

results, and prior experience. 

4. A visual inspection of data below the acceptable error threshold is still required 

to ensure the correct extraction of candidate objects from undesired but spectroscopically 

similar objects. 

The diversity of real-world data makes decisive filtering a very hard problem, but 

the PCA filter presented here is able to reduce the search space by at least an order of 

magnitude, making the job of visual inspection a lot more tractable. 


Chapter 5 

Application I - SDSS Hot Subdwarfs

Having established a set of tools in Chapters 2 to 4 for data mining large sets of astronomical 

spectra, they are now applied in unison to extract and analyse hot subdwarf 

candidates from the Sloan Digital Sky Survey. 

Firstly, a set of search criteria based on SDSS photometric colours is devised to 

obtain a data set which excludes most of the horizontal branch stars encountered in 

the previous chapter. This data set is then filtered with the aid of the PCA filter, and 

pre-processed before being fed into the analysis pipeline for classification and parameterisation. 

5.1 Search Criteria And Data Sets 

After the work of Harris et al. (2003) and Kleinman et al. (2004) (based on the photometric 

simulations of Fan 1999), a search was made of the SDSS Data Release 3 

database using the following selection criteria of SDSS ugriz point spread function 

colour magnitudes, 



    FROM BESTDR3..SpecPhotoAll as s
    WHERE s.psfMag_u < 21
      AND (s.psfMag_u - s.psfMag_g) < 0.7
      AND (s.psfMag_g - s.psfMag_r) < -0.1
      AND s.specClass <> dbo.fSpecClass('QSO')

For completeness, the spectra chosen by the SDSS as their hot standards were also 

retrieved using a separate query, 

    FROM BESTDR3..SpecObj as s
    WHERE s.objType = dbo.fObjType('HOT_STD')

The total data quantities retrieved by these two queries are summarised in Table
5.1.

Data Set          Spectra Retrieved
Colour-Colour     6539
Hot Standards     1411
Total             7950 (6764 Unique)

Table 5.1: Summary of data quantities obtained from the SDSS DR3.

5.2 PCA Filtering 

The PCA filter from Chapter 4 was applied to the 6764 unique spectra obtained from
the SDSS. The SDSS-normalised spectrum was extracted from each of the downloaded
FITS files, and velocity corrected using the SDSS-derived redshift stored in
each file’s FITS header. The histogram of reconstruction errors is plotted in Figure
5.1.

The large quantity of spectra located at the error bin R ≈ 2.46 are blank – the 

normalised flux level is constant at 1.0 for all wavelengths. This is due to the rebinning


[Figure 5.1 plot: Number of Spectra (0 to 500) against reconstruction error; the final bin is labelled 35.00.]

Figure 5.1: Histogram of reconstruction errors for the colour-colour selected SDSS
sample.

routine’s default behaviour of assigning a flux value of 1.0 to those wavelengths where 

no flux information is available for interpolation. In this case, the spectra in question 

seem to originally cover a lower wavelength range than the chosen 4050–4950 Å 

range. 
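The origin of the blank spectra can be reproduced with a short sketch. It assumes linear interpolation with a fill value of 1.0 outside the covered range, which mimics the default behaviour described above; the function name, tolerance and synthetic spectrum are illustrative assumptions.

```python
import numpy as np

def is_blank(flux, fill=1.0, tol=1e-6):
    """True if every flux bin equals the fill value to within tol."""
    flux = np.asarray(flux, dtype=float)
    return bool(np.all(np.abs(flux - fill) <= tol))

grid = np.arange(4050.0, 4951.0, 1.0)

# Hypothetical spectrum covering only 3800-4000 A, entirely blueward of the grid.
wavelength = np.linspace(3800.0, 4000.0, 201)
flux = 0.8 + 0.1 * np.sin(wavelength / 10.0)

# With no overlap, every bin of the common grid receives the fill value.
rebinned = np.interp(grid, wavelength, flux, left=1.0, right=1.0)
```

Because every blank spectrum is identical after rebinning, they all yield the same reconstruction error, which is why they pile up in a single error bin. A check such as `is_blank` could remove them before the filter is applied.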

Otherwise, visual examination of the error bins reveals that all of the hot subdwarf
candidates of reasonable S/N have reconstruction errors of R ≤ 6.4.
They are mixed in with many white dwarf and blue horizontal branch spectra, which are
hard to separate out because they often show almost no spectral features that would allow
the PCA filter to distinguish them clearly from hot subdwarf candidates. At R > 6.4,
the error bins are almost entirely composed of various types of white dwarfs, with only
a few very low S/N hot subdwarf candidates.

Selecting all those spectra with reconstruction errors R ≤ 6.4 yields 817 samples, 

approximately 400 of which are the “blank” spectra discussed previously. Removing 



them left a final set of 400 spectra which were manually processed to select the hot 

subdwarf candidates from amidst the white dwarfs. This proceeded quickly as white 

dwarf spectra are quite distinct. 

A final data set of 282 hot subdwarf candidates was obtained. 

5.3 Analysis 

The SDSS-normalised spectra are created by fitting a pseudo-continuum using a median/mean
filter. A sliding window of length 300 pixels is created for stars, and a
set of reference lines is used to mask out major absorption features by excluding
pixels closer than 8 pixels to any reference line. The remaining pixels are ordered,
and the values between the 40th and 60th percentiles are averaged to give the pseudo-continuum.
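The pseudo-continuum procedure described above can be sketched as follows. This is a reading of the description rather than the SDSS pipeline code: the centred window, the truncation of the window at the ends of the spectrum, and all names are assumptions.

```python
import numpy as np

def pseudo_continuum(flux, line_pixels, window=300, exclusion=8, lo=40, hi=60):
    """Estimate a pseudo-continuum with a masked sliding percentile-mean filter."""
    flux = np.asarray(flux, dtype=float)
    n = flux.size
    pix = np.arange(n)
    usable = np.ones(n, dtype=bool)
    for centre in line_pixels:
        # Exclude pixels closer than `exclusion` pixels to a reference line.
        usable &= np.abs(pix - centre) >= exclusion
    continuum = np.full(n, np.nan)
    half = window // 2
    for i in range(n):
        start, stop = max(0, i - half), min(n, i + half + 1)
        values = np.sort(flux[start:stop][usable[start:stop]])
        if values.size == 0:
            continue                      # no unmasked pixels in this window
        a = int(np.floor(values.size * lo / 100.0))
        b = max(a + 1, int(np.ceil(values.size * hi / 100.0)))
        continuum[i] = values[a:b].mean() # mean of the 40th-60th percentile band
    return continuum

# Hypothetical flat spectrum at 2.0 with one absorption line near pixel 500.
flux = np.full(1000, 2.0)
flux[497:504] = 1.0
continuum = pseudo_continuum(flux, line_pixels=[500])
```

Where broad, blended lines fill much of the window, most of the unmasked pixels lie below the true continuum, so the percentile band is also depressed.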

However, this pseudo-continuum tends to underfit the real continuum for the higher-order
Balmer lines, with blending between the broad wings pulling the pseudo-continuum
down. Although the SDSS-normalised spectra are sufficient for the coarse filtering performed
by the PCA filter, the underfitting associated with the pseudo-continuum makes
them unsuitable for use in classification or parameterisation.

Instead, the SDSS-calibrated spectra of the 282 hot subdwarf candidates were renormalised
using an automated method based on cubic spline fitting, after having been
velocity corrected, again using the SDSS redshifts. Each spectrum was then resampled
onto the common wavelength grid of 4050–4950 Å at a sampling of 1 Å pixel$^{-1}$, ready
for analysis by the classification neural network and SFIT.

Physical parameters in $T_{\rm eff}$, $\log g$, and $\log(n_{\rm He}/n_{\rm H})$ were derived by fitting each
spectrum to a large grid of 2426 LTE model spectra generated using STERNE and
SPECTRUM. Details of the grid are summarised in Table 5.2.



$T_{\rm eff}$ (kK)   8.0, 9.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 22.0,
                 24.0, 25.0, 26.0, 28.0, 30.0, 32.0, 34.0, 35.0, 36.0, 38.0,
                 40.0, 45.0, 50.0
$\log g$         2.50, 3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00
$n_{\rm He}$     0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 0.999

Table 5.2: The model grid used to obtain physical parameters from the SDSS hot
subdwarf candidates.

5.4 Results 

The results of both classification and parameterisation are presented in Figures 5.2-5.8, 

and tabulated in Appendix B. 

5.4.1 Parameterisation 

A number of interesting features are present in the diagrams of Figure 5.2. Most
prominent in the $\log g$–$T_{\rm eff}$ plot is the low-density region centred at $T_{\rm eff}$ ≈ 22,500 K.
Figure 5.4 overlays Figure 5.2 with density estimate contours which better illustrate
the presence of the gap.

This low-density region appears to separate the blue horizontal branch stars from
the extended horizontal branch. However, it occurs at the same position as the zero-age
main sequence, so could it be the result of selection effects? The answer is probably no,
because an early B-type main sequence star with an apparent magnitude of $m_V = 15$,
similar to the stars in the hot subdwarf sample, and an absolute magnitude of $M_V = -2.4$,
would be located ∼ 30 kpc away, out of the plane of the Galaxy. The existence of
such a star at this position is unlikely.
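The quoted distance follows from the standard distance modulus, m − M = 5 log10(d) − 5 with d in parsecs. The sketch below reproduces the arithmetic; it neglects interstellar extinction, which is an assumption of the example, and the function name is illustrative.

```python
def distance_pc(apparent_mag, absolute_mag):
    """Distance in parsecs from the distance modulus m - M = 5 log10(d) - 5."""
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)

# Early B-type main sequence star with m_V = 15 and M_V = -2.4:
# m - M = 17.4, so d = 10**4.48 pc, or about 30 kpc.
d_kpc = distance_pc(15.0, -2.4) / 1000.0
```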

The same low-density region was also observed by Green et al. (2006) and Saffer
et al. (1994), and corresponds with the second gap identified in observations of blue
halo stars by Newell (1973).



[Figure 5.2 plot: two panels against Effective Temperature (K), running from 50000 to 10000. Upper panel: log g (2 to 7), with the ZAMS, ZAHB and He-MS marked. Lower panel: log(nHe/nH) (-4 to 3).]

Figure 5.2: Parameterisation results of the 282 SDSS hot subdwarf candidates. The
helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman
et al. (1993) are also plotted.


[Figure 5.3 plot: four panels over 4000 to 5000, each labelled with a classification and fitted parameters.
sdO4VII:He26: 50654, 6.001, -0.913
sdB1VI:He29: 34502, 5.581, -0.568
sdB3VI:He2: 25219, 5.303, -2.769
sdB7III:He2: 12653, 3.342, -3.004]

Figure 5.3: Four example fits from the 282 SDSS hot subdwarfs. The classification
and physical parameters ($T_{\rm eff}$ (K), $\log g$, $\log(n_{\rm He}/n_{\rm H})$) obtained for each star are printed
in the lower corners of each plot.

Heber et al. (1984) and Newell (1973) propose evolutionary explanations for this 

gap based on variations in hydrogen envelope mass along the horizontal branch, but 

this was before the discovery that possibly 2/3 of the sdB stars blueward of the gap 

are short-period binaries (Maxted et al., 2001) (and therefore products of the common 

envelope binary evolutionary channel). 

Monte Carlo simulations of single star evolution on the extended horizontal branch, 

carried out at St. Andrews (Jeffery & Jardine 1984, unpublished), did not reveal 

the existence of such a gap. It is therefore our hypothesis that the second gap of 



[Figure 5.4 plot: the two panels of Figure 5.2 (log g, and log(nHe/nH), against effective temperature from 50000 to 10000 K) overlaid with density contours; the ZAMS, ZAHB and He-MS are marked in the upper panel.]

Figure 5.4: The results of applying a kernel density estimate analysis to the data
from Figure 5.2. The low-density region at $T_{\rm eff}$ ≈ 22,500 K is prominent, along with another
possible low-density region at $T_{\rm eff}$ ≈ 41,000 K.


Newell (1973) reflects differing evolutionary scenarios for blue horizontal branch stars 

and extended horizontal branch stars, primarily that subdwarf B stars result from 

common-envelope binary evolution. 

In the single star evolution hypothesis, a strong stellar wind on the RGB is believed 

to occur, but which fails to remove the entire outer hydrogen envelope before the helium 

core flash takes place. After the helium flash, a star evolves to the horizontal branch. 

The distribution of stellar masses along the horizontal branch must be continuous because 

evolutionary models do not predict gaps if the factors affecting mass loss in single 

stars (e.g., metallicity, rotation rate, magnetic field strength, etc.) are not discrete. 

In the binary star evolution scenario, most of the hydrogen-rich envelope is removed 

(either by Roche Lobe overflow, or by a common envelope phase) at the tip of the RGB, 

meaning that evolution proceeds to the blue end of the horizontal branch. The distribution 

of post-common envelope binaries is not continuous because a partial removal 

of the hydrogen envelope does not occur. 

The second feature of interest in Figure 5.2 is the cluster of stars at $T_{\rm eff}$ ≈ 44,000 K,
$\log g$ = 5.7. The clump is also noticeable in the $\log(n_{\rm He}/n_{\rm H})$–$T_{\rm eff}$ plot in Figure 5.2 as
the group of extremely helium-rich stars at $\log(n_{\rm He}/n_{\rm H})$ ≈ 1.2. Heber et al. (2006), in
a spectral analysis of sdO stars selected from the Supernova Ia Progenitor Survey, the
Hamburg Quasar Survey, and the SDSS, show a similar clustering at the same location
on their $\log g$–$T_{\rm eff}$ diagram.

The $\log(n_{\rm He}/n_{\rm H})$–$T_{\rm eff}$ diagram in Figure 5.2 shows that the majority of the stars in
the sample have helium-deficient atmospheres (less than 0.5 times the solar abundance).
This has been attributed to diffusion and gravitational settling processes at work in
the extended horizontal branch stars (Wesemael et al., 1982).

For 28,000 K ≤ $T_{\rm eff}$ ≤ 40,000 K, a correlation between helium abundance and $T_{\rm eff}$
can be seen, with the helium abundance increasing with temperature. The same phenomenon
was reported by Edelmann et al. (2003) in their analysis of sdBs from the
Hamburg Quasar Survey, and Saffer et al. (1994) in a study of 92 field sdBs drawn
largely from the PG catalogue. Both studies also report the existence of two sequences
in the correlation, with a smaller fraction of stars having lower helium abundances at
the same temperatures than the bulk of the sdBs. There is evidence to suggest the
existence of these two sequences in Figure 5.2. Heber et al. (2006) also expand on this
phenomenon by showing that the “cooler” sdO stars in their sample adhere to two
distinct sequences, and extend the trend to higher $T_{\rm eff}$.

The band of stars evident at $\log(n_{\rm He}/n_{\rm H})$ = −3 corresponds to the boundary of the
model grid used in the analysis.

5.4.2 Classification 

The neural network classification results of the 282 hot subdwarf candidates are shown 

in Figure 5.5. Although the neural network gives real-value outputs for each classification 

parameter, these have been rounded to their closest value on the discrete Drilling 

et al. (2006) system to reflect how a human classifier would use the system. 
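The rounding step can be sketched as below. The class ranges are assumptions read off the axes of Figure 5.5 (helium class 0 to 40, luminosity class 0 and I to IX), the function name is illustrative, and the network outputs are hypothetical values.

```python
def round_to_scale(value, lowest, highest):
    """Round a real-valued network output to the nearest integer class in range."""
    return int(min(highest, max(lowest, round(value))))

# Luminosity class labels, assumed from the axis of Figure 5.5.
ROMAN = ["0", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

# Hypothetical real-valued outputs for one spectrum.
helium_class = round_to_scale(26.4, 0, 40)
luminosity_class = ROMAN[round_to_scale(6.7, 0, 9)]
```

Python's built-in `round` resolves exact halves to the nearest even integer, so an output of exactly 6.5 would map to class VI rather than VII.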

A correlation can be seen between luminosity class and spectral type, with luminosity
decreasing as spectral type progresses from O to A. As the physical analogues
to luminosity and spectral type are $\log g$ and $T_{\rm eff}$ respectively, this trend mirrors that
found in the $\log g$–$T_{\rm eff}$ plot of Figure 5.2.

From the plot of helium class against spectral type, it can be seen that the stars
in the sample are either helium poor or helium rich. There is a group of early-type
sdBs showing a higher helium class than the bulk of such stars at the same spectral
type. These are most likely the interesting subset of hot subdwarf stars known as He-sdBs
(Jeffery et al., 1996; Ahmad, 2004).

Figure 5.6 gives a comparison of the neural network classification results with the 

distribution of stars originally classified by Drilling et al. (2006) in their paper. The


[Figure 5.5 plot: two panels against spectral type (O, O5, B, B5, A). Upper panel: luminosity class (0, I to IX). Lower panel: Helium Class (0 to 40).]

Figure 5.5: Classification results of the 282 SDSS hot subdwarf candidates. Points
have been given small random offsets in each axis for clarity.



[Figure 5.6 plot: four panels against spectral type (O, O5, B, B5, A), showing luminosity class (0, I to IX) and Helium Class (0 to 40) for each of the two samples.]

Figure 5.6: A comparison of the ANN classifications of the 282 SDSS hot subdwarf
candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (right-most
plots). Points have been given small random offsets in each axis for clarity.


[Figure 5.7 plot: three panels. Effective temperature (10000 to 50000) against spectral type (O, O5, B, B5, A); log g (2 to 7) against luminosity class (0, I to IX); log(nHe/nH) (-3 to 3) against Helium Class (0 to 40).]

Figure 5.7: A calibration of the ANN classifications onto the Drilling et al. (2006)
system using the 282 SDSS hot subdwarf candidates.



trends in the two distributions are similar if one takes into account the differing sample 

sizes. 

One feature of interest in the luminosity class–spectral type plot of the Drilling 

et al. (2006) data is the group of high-luminosity B-type giant stars. These correspond 

with a group of MK stars used by Drilling et al. (2006) to interface their hot subdwarf 

classification system with the MK system. In the corresponding plot for the 282 SDSS 

hot subdwarfs studied here, no such low luminosity class B-type stars are contained in 

the sample. 

A third-order calibration of the Drilling et al. (2006) classification system is shown 

in Figure 5.7 (i.e., the Drilling et al. (2006) parameters are being correlated to their 

corresponding physical parameters using a sample of spectra that is not comprised of 

the original standard stars, and has not been classified by Drilling et al. or any other 

human trained to use the Drilling et al. (2006) scale). 

Although a linear correlation can be discerned between $T_{\rm eff}$ and spectral type, and between
$\log(n_{\rm He}/n_{\rm H})$ and helium class, the correlations are quite poor. This could be due to
systematic noise introduced during the renormalisation of the SDSS data, and may
also signify that the neural network is having difficulty interpolating in regions not
well represented by the original Drilling et al. (2006) training data (Figure 2.1 shows
two low-density regions around spectral types O5 and B5, which is where the most
“confusion” is seen in the correlation of Figure 5.7).

Despite the noise, the $\log(n_{\rm He}/n_{\rm H})$ vs. helium class plot still follows the trend of
Figure 14 of Drilling et al. (2006).

Between $\log g$ and luminosity class, no significant correlation can be seen. This is
due to the majority of subdwarfs residing in the luminosity classes VI and VII, and
between $\log g$ values of 5.0 and 6.0. The seemingly bi-modal distribution of this plot
corresponds to the separation between the lower-$T_{\rm eff}$, lower-$\log g$ BHB stars in the SDSS
sample, and the higher-$T_{\rm eff}$, higher-$\log g$ subdwarfs. It is impossible to constrain any


25 

20 

Stars Per Bin 

15 

10 

5 

0 

-600 -400 -200 0 200 400 600 

Redshift (Km s -1 ) 

Figure 5.8: The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates. 

linear fit to the distribution due to the under-representation of the lower-log g, higher 

luminosity class region. The concentration of points in luminosity classes VI and VII 

reflect a similar pattern observed in Figure 15 of Drilling et al. (2006). 

5.4.3 Radial Velocities 

As an interesting aside, the radial velocities of the 282 hot subdwarf candidates, as
measured by the SDSS, are plotted in Figure 5.8. The errors in the radial velocities are
of the order of 30 km s⁻¹. Several studies of the kinematical behaviour of hot subdwarfs
have been conducted in the past, e.g., Altmann et al. (2004), Maxted et al. (2001), de
Boer et al. (1997), Colin et al. (1994).

Altmann et al. (2004) point out that short-period sdB binaries could exhibit orbital
velocities in excess of 200 km s⁻¹, but with most being of the order of 50 km s⁻¹ or less.



Based on the parameterisation and classification results of the hot subdwarf sample 

studied here, it is clear that the majority of the sample are sdBs, and, consequently, 

possibly short-period binaries (see also Maxted et al. 2001). 

As the SDSS observes out of the galactic plane, most of the hot subdwarf candidates
will be either thick disk or halo objects, with greater radial velocities due to their orbits
not conforming with the local standard of rest (see Altmann et al. 2004). There are
a few objects in the hot subdwarf sample with velocities |cz| > 400 km s⁻¹. Although
these velocities are unverified and could be anomalous, they are greater than can
be accounted for by the previously outlined mechanisms. As such, they are of interest
for further study (e.g., Hirsch et al. 2005).

5.5 Sources of Error 

The results of this chapter are affected by a number of error sources. The issues of 

primary concern are systematic errors arising from the internal accuracy of the tools 

themselves, whether the training data for the tools are representative of the application 

domain, the assumptions used in generating the model spectra, and random errors in 

the application spectra along with systematic errors introduced during the observation 

and reduction stage. 

For each physical parameter it fits, SFIT produces a standard error based on the
curvature of the χ² function in the region of parameter space about the located
minimum. These errors give an indication of the internal accuracy of the fitting
method, with the χ² function giving an indication of the goodness-of-fit. At the
boundaries of the grid, where the curvature is difficult to estimate, or in regions of
low curvature, the standard errors may not be as useful a measure of SFIT’s internal
uncertainty.
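As a rough illustration of the curvature-based errors described above, the following sketch estimates a one-parameter standard error from the second derivative of χ² at its minimum. It assumes a locally quadratic χ² surface and the common Δχ² = 1 convention, for which σ = sqrt(2 / (d²χ²/dp²)). The function names, the finite-difference step, and the toy χ² surface are illustrative assumptions and are not taken from SFIT.

```python
def chi2_curvature(chi2, p_min, step):
    """Central finite-difference estimate of d2(chi2)/dp2 at p_min."""
    return (chi2(p_min + step) - 2.0 * chi2(p_min) + chi2(p_min - step)) / step ** 2


def standard_error(chi2, p_min, step):
    """One-parameter standard error, assuming chi2 is locally quadratic.

    For chi2(p) = chi2_min + ((p - p_min) / sigma)^2 the second derivative
    is 2 / sigma^2, so sigma = sqrt(2 / curvature).
    """
    curvature = chi2_curvature(chi2, p_min, step)
    if curvature <= 0.0:
        # Flat or ill-defined curvature (e.g. at a grid boundary):
        # no useful error estimate is available.
        return float("inf")
    return (2.0 / curvature) ** 0.5


# Hypothetical chi2 surface with its minimum at 30,000 K and sigma = 250 K.
def toy_chi2(teff):
    return 100.0 + ((teff - 30000.0) / 250.0) ** 2


sigma_teff = standard_error(toy_chi2, 30000.0, step=50.0)
```

Where the curvature is flat or negative, as may happen at a grid boundary, the sketch returns an infinite error rather than a misleading finite one.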

A major error source is the grid of theoretical models to which observations are fit. 

Here, models have been used which assume a stellar atmosphere that is plane-parallel,


and in local thermodynamic, radiative and hydrostatic equilibrium. Opacities are modelled
using opacity distribution functions, an approach that differs fundamentally from the
methods used in stellar atmosphere models that do not make the LTE assumption. It is
known that the LTE approximation is good up to 40,000 K, after which NLTE effects become
more significant. There is also the question of whether or not the inclusion of further
physical effects, such as magnetic fields, is an important issue.

Within SFIT itself, the assumption is made that changes in the physical parameters
of a model have a corresponding linear effect on the flux distribution. It is known from
theory that changes in the physical parameters have a nonlinear effect on the flux
distribution, but a trade-off must be made between accuracy and efficiency, especially
in a data mining context.

Other sources of error, such as from the SDSS observation and reduction pipeline 

or the hot subdwarf classification standards obtained from Drilling et al. (2006), are 

difficult to quantify. For the same reason, discussion of errors arising from models is 

a complicated topic and beyond the scope of this thesis. However, see, for example, 

Behara & Jeffery (2006) for an investigation of the influence of improving the opacities 

used in the models. 

Nevertheless, the robustness of the results presented in this chapter (and of the
conclusions drawn from them) is very important, and quantifying the influence of all
the possible error sources requires further investigation.

5.6 Analysis of PCA Filter Efficiency 

Figure 5.9 gives some examples of the BHB and white dwarf contaminants mentioned 

earlier in the chapter. In cases B, C, and D, the differences between the original 

spectrum and its reconstruction are not sufficient to produce a reconstruction error 

greater than the chosen threshold of 6.4. In case A, the BHB star, the reconstruction 

matches the original spectrum very closely, except for a slight difference in Hδ. Physical 



parameters obtained for this star using SFIT show that it is too cool to be a subdwarf
(T eff = 12,000 K, log g = 3.42, n He = 0.004).

The simple RMS error calculation of Equation 4.12 yields the scaled RMS difference 

between each flux point of the original spectrum and its PCA reconstruction. 

Clearly, then, for such small differences this error metric is not sensitive enough to
filter out the BHB and white dwarf contaminants. This limitation of the PCA filter
could be diminished by further developing the reconstruction error calculation to include
a weighting scheme that gives more significance to the spectral lines and features
commonly found in the objects under investigation. A disadvantage of this approach is
that the weighting scheme must be crafted and optimised manually to suit the quirks of
the PCA filter and spectral features of the target objects. A more robust error metric
that does not require user input is a topic for future work.
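The weighting idea suggested above can be sketched as follows. This is only an illustration: Equation 4.12 is not reproduced in this section, so a plain RMS is assumed for the unweighted case, and the window half-width, the boost factor, and the function names are hypothetical choices. The Hδ wavelength of about 4101.7 Å is used as the example line.

```python
import math


def rms_error(original, reconstruction, weights=None):
    """Weighted RMS difference between a spectrum and its reconstruction.

    With weights=None every flux point counts equally (plain RMS).
    """
    if len(original) != len(reconstruction):
        raise ValueError("spectra must share the same wavelength grid")
    if weights is None:
        weights = [1.0] * len(original)
    weighted_sq = sum(w * (o - r) ** 2
                      for w, o, r in zip(weights, original, reconstruction))
    return math.sqrt(weighted_sq / sum(weights))


def line_weights(wavelengths, line_centres, half_width, boost):
    """Weight `boost` within half_width of any listed line, 1.0 elsewhere."""
    return [boost if any(abs(w - c) <= half_width for c in line_centres) else 1.0
            for w in wavelengths]


# Toy four-pixel spectrum whose only mismatch lies in the H-delta pixel.
wavelengths = [4090.0, 4100.0, 4110.0, 4120.0]
original = [1.0, 0.6, 1.0, 1.0]
reconstruction = [1.0, 0.7, 1.0, 1.0]

plain = rms_error(original, reconstruction)
weights = line_weights(wavelengths, [4101.7], half_width=5.0, boost=4.0)
weighted = rms_error(original, reconstruction, weights)
```

In this toy case the weighted error is larger than the plain one, because the only mismatch falls inside the up-weighted line window; a contaminant that differs mainly in its line profiles would therefore be pushed closer to the rejection threshold.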

Quantitative Estimation of Filter Efficiency 

To give an estimate of the success (and failure) of the PCA filter as deployed in this 

chapter, the word “success” needs to be more clearly defined. 

Based on the results plotted in Figure 5.2, the assumption can be made that
most subdwarfs in the SDSS sample lie, with good probability, in a region T eff ≥
23,000 K, log g ≥ 4.7, as demonstrated in Figure 5.10.

For any chosen value of R for the reconstruction error threshold, stars with a
reconstruction error within R and parameters inside this region will be assumed to be
true positives, i.e., actual subdwarfs that the filter has successfully separated out.
False positives are those stars which are within the value of R but lie outside this
region, i.e., stars which the filter should have excluded but did not. True negatives
lie both outside the shaded region and beyond the threshold of R. Finally, false
negatives lie within the shaded region but are outside of the filter’s error threshold.
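The four definitions above can be written down compactly. The sketch below assumes the rectangular region T eff ≥ 23,000 K, log g ≥ 4.7 quoted in the text and treats a star as passing the filter when its reconstruction error is less than or equal to R; the function and constant names are illustrative.

```python
TEFF_MIN = 23000.0   # K, lower edge of the assumed subdwarf region
LOGG_MIN = 4.7       # lower edge of the assumed subdwarf region


def in_subdwarf_region(teff, logg):
    """True if the star lies inside the assumed subdwarf region."""
    return teff >= TEFF_MIN and logg >= LOGG_MIN


def categorise(recon_error, teff, logg, threshold):
    """Return 'TP', 'FP', 'TN' or 'FN' for one star at threshold R."""
    passed = recon_error <= threshold
    inside = in_subdwarf_region(teff, logg)
    if passed:
        return "TP" if inside else "FP"
    return "FN" if inside else "TN"


# A cool, low-gravity star with a hypothetical reconstruction error of 2.3
# passes an R = 6.4 filter but lies outside the region: a false positive.
example = categorise(2.3, 12000.0, 3.42, threshold=6.4)
```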


[Four panels, A to D, of normalised flux against wavelength (4100 to 4900 Å). Labels as extracted: J220403.45+122507.3; A 2.30372; J213301.41+122831.1; J135532.42+001124.0; B 2.78036; J101805.04+011123.5; C 3.34386; D 3.90771.]

Figure 5.9: Examples of white dwarf and BHB contaminants. A - BHB star with 

deep Balmer lines. B - DA white dwarf with strong, broad Balmer lines due to high 

surface gravity. C - DB white dwarf. D - Uncertain (some evidence of weak carbon 

absorption, so possibly a DQ white dwarf). 



[Plot: log g (2 to 7) against T eff (50,000 to 10,000 K), with the ZAMS, ZAHB, and He-MS marked.]

Figure 5.10: This gray-shaded region of the log g–T eff plane represents an area of good 

probability that the stars within it are subdwarfs. 

Using these definitions, the PCA filter’s efficiency can be quantitatively stated for 

any value of R. Of course, the assumption is that every star passing through the filter 

has values for T eff and log g. Estimates of these parameters for the SDSS sample were 

obtained by applying SFIT to the whole data set. 

The quantitative measures used are the percentage rate of true positives (which 

measures how successful the PCA filter is, according to the aforementioned definition 

of “success”), 

\[
\mathrm{TPRate} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \times 100\% \tag{5.1}
\]

where TP is the number of true positives and FN the number of false negatives, and 

also the rate of false positives (which measures how often the filter fails),


[Plot: TP Rate, FP Rate, and TP - FP curves; Percentage (%) from 0 to 100 against R from 0 to 50.]

Figure 5.11: TP rates (red) and FP rates (blue) of the PCA filter as a function of the 

reconstruction error threshold, R. The green curve is the difference between the TP 

and FP rates. 

\[
\mathrm{FPRate} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \times 100\% \tag{5.2}
\]

where FP is the number of false positives, and TN is the number of true negatives. 
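Equations 5.1 and 5.2 translate directly into code. The short sketch below is illustrative only; the function names are assumptions, and an empty denominator is mapped to 0% as a convenience rather than anything prescribed by the text.

```python
def tp_rate(tp, fn):
    """Equation 5.1: percentage of stars inside the region that pass the filter."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


def fp_rate(fp, tn):
    """Equation 5.2: percentage of stars outside the region that pass the filter."""
    total = fp + tn
    return 100.0 * fp / total if total else 0.0


# Hypothetical counts, purely for illustration.
example_tp_rate = tp_rate(240, 60)    # 240 of 300 stars in the region pass
example_fp_rate = fp_rate(50, 950)    # 50 of 1000 stars outside the region pass
```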

Figure 5.11 shows how the TP and FP rates vary as a function of R in the application 

to the SDSS data set. The rate of true positives increases rapidly until R ∼ 10 after 

which it begins to level off. The percentage of false positives increases slowly until 

R ∼ 5.5. From this point until R ∼ 13 the filter begins to produce false positives at 

the maximum rate before starting to level off. At R ∼ 28, the rate of false positives 

surpasses that of true positives meaning that the filter now fails more than it succeeds. 

[Plot: TP Rate, FP Rate, and TP - FP curves; Percentage (%) from 0 to 100 against R from 0 to 10.]

Figure 5.12: A closer examination of the TP and FP rates. The peak in the green
TP-FP curve occurs at R ∼ 7.0 and signifies the optimum value for R in the SDSS
sample.

An idea of the optimum value for R can be determined by plotting the difference
between the rates of true positives and false positives for each R. This is the green curve
in Figure 5.11. There is a noticeable and very definite peak. Figure 5.12 shows a close-up
view of the region of this peak, which occurs at R ∼ 7.0. At this error threshold,
the rate of true positives exceeds the rate of false positives by the greatest margin.
In other words, this is the optimum value of R for this particular application.
This compares favourably with the chosen reconstruction error threshold of R ≤ 6.4
reported in section 5.2.
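The search for the peak of the TP − FP curve can be sketched as a simple sweep over candidate thresholds. The sample below is a hypothetical toy data set, not the SDSS results, and the function names and the grid of R values are assumptions made for illustration.

```python
def rate_difference(stars, threshold):
    """TP rate minus FP rate (both in percent) at one threshold R.

    `stars` is a list of (reconstruction_error, is_subdwarf) pairs, where
    is_subdwarf marks stars inside the assumed log g-T eff region.
    """
    tp = sum(1 for err, sd in stars if sd and err <= threshold)
    fn = sum(1 for err, sd in stars if sd and err > threshold)
    fp = sum(1 for err, sd in stars if not sd and err <= threshold)
    tn = sum(1 for err, sd in stars if not sd and err > threshold)
    tpr = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    fpr = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    return tpr - fpr


def best_threshold(stars, candidates):
    """Candidate R with the largest TP-FP difference (first one on ties)."""
    return max(candidates, key=lambda r: rate_difference(stars, r))


# Hypothetical toy sample: subdwarfs reconstruct well, contaminants less so.
toy = [(2.0, True), (3.5, True), (5.0, True), (6.5, True),
       (7.0, False), (9.0, False), (12.0, False), (20.0, False)]
grid = [0.5 * k for k in range(1, 61)]    # R = 0.5, 1.0, ..., 30.0
r_opt = best_threshold(toy, grid)
```

As in the text, such a sweep is only possible once every star has at least a rough parameter estimate from which its membership of the subdwarf region can be decided.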

It should be pointed out that there does not seem to be a reliable method for
determining the optimal threshold value of R for a filter and data set, a priori, without
first establishing at least a rough estimate of physical or classification parameters. If
the PCA filter (which is fast in its operation) were paired with a parameterising neural
network or a fast nearest neighbour χ² fitting, then an estimate of the optimal PCA
error threshold could be obtained using the same method as above.


5.7 Summary 

The tools developed in Chapters 2 to 4 have been deployed on a real-world data set 

with some interesting outcomes. The hot subdwarf candidates extracted from the 

SDSS represent a completely homogeneous set, and their analysis evidences several 

unexplained phenomena: 

1. Existence of the second horizontal branch gap of Newell (1973) at T eff ≈ 22,500K. 

2. Two sdB n He –T eff sequences, also observed by Edelmann et al. (2003). 

3. A clustering of hot, helium rich sdO stars at T eff ≈ 44,000K, log g = 5.7, also 

observed by Heber et al. (2006). 

These results reiterate the challenge to provide evolutionary explanations for the variety 

of stars present on the extended horizontal branch, and the subsequent importance 

of continuing research into hot subdwarfs. 


Chapter 6 

Application II - Other Data Sets 

The work presented in this chapter details the application of the analysis pipeline to 

three smaller data sets obtained in collaboration with others in the field. This reflects 

the situation described in Chapter 1 regarding the heterogeneous data sets amassed

by various ground-based observatories. When data from these observatories are made 

available, robust tools will be needed to process them into a homogeneous form, and 

provide fast analyses. 

6.1 2MASS-Selected Sample 

A preliminary analysis of the 282 SDSS hot subdwarf candidates in the previous chapter 

was presented at the Second Meeting on Hot Subdwarfs and Related Objects in La 

Palma, June 2005. As a result of this conference, E. M. Green provided the author 

with a sample of high S/N, low-resolution spectra selected from 2MASS 1 photometry 

(see Green et al. 2006) to be classified and parameterised with the tools developed in 

this thesis. 

1 http://www.ipac.caltech.edu/2mass

A total of 83 2MASS-selected spectra were made available with an average S/N of about
133, but varying as high as 273 and as low as 70. The wavelength range covered is
3615–6900 Å at a resolution of R ≈ 922.

Spectra for two known stars, Balloon 090900004 and BD+48 2721, were also supplied
along with physical parameters (T eff , log g, log(n He /n H )) obtained using NLTE
model atmospheres (H+He, zero metal). The purpose of these stars is to provide a
temperature calibration for the hot and cool ends of the sdOB sequence, so that the
parameterisation results obtained with SFIT (and LTE model atmospheres) can be
compared with those derived from other model atmospheres.

All of the spectra were previously flux and wavelength calibrated. Normalisation
was carried out using a cubic spline fitting routine, and the spectra were then resampled
onto a common wavelength grid of 4050–4950 Å at a sampling of 1 Å pixel⁻¹. Radial
velocity shifts were corrected by cross-correlating each spectrum with a grid of 101
theoretical models coarsely varying over T eff , log g, and log(n He /n H ).
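A minimal sketch of a velocity correction by cross-correlation against a single model is given below. It is not the routine used for this sample: the brute-force scan over trial velocities, the line-depth overlap score, the non-relativistic Doppler factor, and all function names and default values are assumptions chosen for clarity. In practice the scan would be repeated for each model in the grid and the best-scoring match retained.

```python
import math

C_KMS = 299792.458  # speed of light, km/s


def interp(x, xs, ys):
    """Linear interpolation of the curve (xs, ys) at x; ends are held constant."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    lo, hi = 0, len(xs) - 1
    while hi - lo > 1:                  # bisection for the bracketing interval
        mid = (lo + hi) // 2
        if xs[mid] <= x:
            lo = mid
        else:
            hi = mid
    t = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + t * (ys[hi] - ys[lo])


def radial_velocity(wave, flux, model_wave, model_flux, v_max=600.0, v_step=10.0):
    """Trial velocity (km/s) that maximises the overlap of line depths.

    Both spectra are assumed to be normalised to a continuum of 1.
    """
    best_v, best_score = 0.0, -math.inf
    for k in range(int(round(2.0 * v_max / v_step)) + 1):
        v = -v_max + k * v_step
        factor = 1.0 + v / C_KMS        # non-relativistic Doppler factor
        score = sum((1.0 - f) * (1.0 - interp(w / factor, model_wave, model_flux))
                    for w, f in zip(wave, flux))
        if score > best_score:
            best_v, best_score = v, score
    return best_v


def to_rest_frame(wave, velocity):
    """Shift observed wavelengths back to the rest frame."""
    return [w / (1.0 + velocity / C_KMS) for w in wave]
```

The velocity step sets the resolution of the result; a finer step, or a parabolic fit to the peak of the score, would be needed to approach the precision of the data.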

During this pre-processing stage, it was discovered that two of the stars in the sample 

were white dwarfs, so they were excluded from any further analysis. Application of the 

PCA filter of Chapter 4 was deemed unnecessary given the small sample size. 

Analysis And Results 

Classification and parameterisation on the final 83 stars was carried out using the 

classification neural network of Chapter 2, and SFIT using the same grid of models as 

in Chapter 5 (Table 5.2). Results are plotted in Figures 6.1 and 6.2, and tabulated in 

Appendix C. 

The parameterisation results of the two calibration stars, Balloon 090900004 and
BD+48 2721, are given in Table 6.1. Small differences exist between the parameters for
both stars, with the hotter star, Balloon 090900004, showing a temperature difference
of ∼ 9700 K. This is not unexpected considering the inherent differences between the
LTE and NLTE approaches.

Identifier          Parameter         NLTE              LTE
BD+48 2721          T eff (K)         23017 (248)       22979 (240)
                    log g             5.035 (0.028)     5.267 (0.032)
                    log(n He /n H )   -2.135 (0.022)    -1.629 (0.018)
Balloon 090900004   T eff (K)         40897 (248)       31147 (278)
                    log g             5.369 (0.022)     4.757 (0.054)
                    log(n He /n H )   -2.842 (0.046)    -1.811 (0.056)

Table 6.1: Parameters of the two calibration stars as obtained by χ²-fitting to NLTE
(Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given
in parentheses.

The parameterisation results of Figure 6.1 show distributions with some similarity 

to those of the SDSS hot subdwarf candidates in Figure 5.2. The second gap of Newell 

(1973) seems to be present at T eff ≈ 23,000 K (however, it is uncertain whether Green’s sample

suffers from any selection effects). Some main sequence late-type B and A stars appear 

to be present in the sample. 

The log(n He /n H )–T eff results in Figure 6.1 show the atmospheric helium deficiency 

of the sdB stars, and the cluster of blue horizontal branch stars with normal helium 

abundances. The main sequence stars present in the sample can be seen again as the 

low temperature, hydrogen-rich data points. Not enough sdB stars are present in the 

sample to confirm any correlation between helium abundance and T eff , although such 

a correlation appears to be suggested by the results. 

The distribution of classifications in Figure 6.2 again shows some similarity to that 

of the SDSS hot subdwarf candidates in Figure 5.5. Not plotted in Figure 6.2 are the 

late-A and early-F spectral classifications assigned to some stars by the neural network. 

The parameterisation results suggest the existence of such stars in the sample, but it 

is of interest that the neural network would distinguish and assign them (unreliable) 

classes for which no samples were present in the training data. Figure 6.3 plots these 

stars. The deep and broad hydrogen Balmer lines correspond with the late-A and 

early-F spectral types. This would seem to demonstrate that the neural network has 

very good generalisation properties. 



[Two panels: log g (2 to 7) against T eff (50,000 to 10,000 K), with the ZAMS, ZAHB, and He-MS marked; and a second panel spanning 2 to −4 against the same T eff range.]


Figure 6.1: SFIT physical parameters for 2MASS-selected sample. The helium main 

sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. 

(1993) are also plotted.


[Two panels: luminosity class (0 to IX) against spectral type (O to A); and Helium Class (0 to 40) against spectral type (O to A).]


Figure 6.2: ANN classification for 2MASS-selected sample. Points have been given 

small random offsets in each axis for clarity. 



[Stacked spectra: Flux (continuum = 1) + const. against wavelength (4100 to 4900 Å). Labels as extracted, in order: J143155.30+172404.9, sdA7V:He3; J095854.23+360314.3, sdF8VI:He2; J114454.50+031550.2; J112832.64+603859.3; sdA7V:He4; sdF5V:He3; J111819.13+093144.4, sdA2V:He5; J083127.37+422201.7, sdA5VI:He2.]


Figure 6.3: The stars assigned late-A and early-F spectral types by the neural network.


6.2 SDSS sdB-He Stars of Harris et al. (2003) 

In collaboration with Ahmad (Ahmad et al., 2006) the classification neural network 

was used to classify a small set of “helium-rich” sdB-He stars obtained from the SDSS 

by Harris et al. (2003). Results of this analysis, along with helium abundances derived 

by Ahmad using SFIT and a grid of LTE model atmospheres, are presented in Table 

6.2. 

SDSS Identifier       n He    ANN Class
J094044.08+004759     0.16    sdB0VIII:He23
J113840.69-003531     0.01    sdB3V:He1
J124346.38+002534     0.05    sdB1V:He23
J125410.86-010408     0.01    sdB3III:He5
J131745.80+010450     0.01    sdB0VI:He3
J134545.24-000641     0.15    sdO9VII:He21
J134635.68-001804     0.09    sdA2IV:He0
J135707.35+010454     0.36    sdO6VII:He30
J141556.68-005814     0.21    sdB8VI:He14
J143917.64+010251     0.01    sdB6V:He3
J144514.93+000249     0.02    sdB1VII:He11
J152708.31+003308     0.45    sdO9VIII:He35
J152905.62+002137     0.06    sdO9VII:He10
J154238.43-003758     0.07    sdA2III:He2

Table 6.2: Classification results for the sdB-He stars of Harris et al. (2003). 

The aim of this work was to determine if the sdB-He stars of Harris et al. (2003) are 

similar to He-sdB stars (see Ahmad 2004) as this would increase the number of known 

helium-rich subdwarfs for further study. 

However, it is clear from the classification and parameterisation results obtained 

that most of the sdB-He stars show very little helium enrichment, with half of the 

stars in the sample having surface gravities too low to be subdwarfs (Ahmad, private 

communication). Out of the remaining subdwarfs, only a handful are helium rich (i.e. 

having n He ≥ 0.10, or He class > 20). 
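The helium-rich criterion quoted above (n He ≥ 0.10, or He class > 20) can be applied mechanically to the entries of Table 6.2. The sketch below parses the ANN class strings with a regular expression that is an assumption based on the strings shown in that table; it does not apply the surface gravity cut mentioned in the text, because those values are not tabulated here, so it selects candidates on helium content alone.

```python
import re

# Assumed layout of an ANN class string, e.g. "sdB0VIII:He23".
ANN_CLASS = re.compile(r"^sd([OBAF])(\d+)([IVX]+):He(\d+)$")


def parse_class(label):
    """Split a class string into spectral type, luminosity and helium class."""
    match = ANN_CLASS.match(label)
    if match is None:
        raise ValueError(f"unrecognised class string: {label!r}")
    letter, subtype, luminosity, helium = match.groups()
    return {"spectral_type": letter + subtype,
            "luminosity_class": luminosity,
            "helium_class": int(helium)}


def is_helium_rich(n_he, label):
    """Criterion from the text: n He >= 0.10, or He class > 20."""
    return n_he >= 0.10 or parse_class(label)["helium_class"] > 20


# Identifier, n He, and ANN class, transcribed from Table 6.2.
TABLE_6_2 = [
    ("J094044.08+004759", 0.16, "sdB0VIII:He23"),
    ("J113840.69-003531", 0.01, "sdB3V:He1"),
    ("J124346.38+002534", 0.05, "sdB1V:He23"),
    ("J125410.86-010408", 0.01, "sdB3III:He5"),
    ("J131745.80+010450", 0.01, "sdB0VI:He3"),
    ("J134545.24-000641", 0.15, "sdO9VII:He21"),
    ("J134635.68-001804", 0.09, "sdA2IV:He0"),
    ("J135707.35+010454", 0.36, "sdO6VII:He30"),
    ("J141556.68-005814", 0.21, "sdB8VI:He14"),
    ("J143917.64+010251", 0.01, "sdB6V:He3"),
    ("J144514.93+000249", 0.02, "sdB1VII:He11"),
    ("J152708.31+003308", 0.45, "sdO9VIII:He35"),
    ("J152905.62+002137", 0.06, "sdO9VII:He10"),
    ("J154238.43-003758", 0.07, "sdA2III:He2"),
]

helium_rich = [name for name, n_he, label in TABLE_6_2
               if is_helium_rich(n_he, label)]
```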



6.3 Ahmad & Jeffery (2003) He-sdBs 

Ahmad & Jeffery (2003) undertook the first systematic study of a set of helium-rich 

subdwarf B stars, obtaining observations and physical parameters for 17 targets. 

These stars have been previously classified by Drilling et al. (2006) using observations 

from different sources. As such, the re-classification of these stars by the neural network 

in Chapter 2, using the new observations of Ahmad & Jeffery (2003), presents an 

opportunity to verify the neural network’s performance. 

Ahmad & Jeffery (2003) observed the targets over a variety of wavelength ranges 

between 3900 and 5000 Å, with the spectra being bias corrected, flat-fielded, sky subtracted, 

and wavelength calibrated using standard procedures. All spectra were normalised 

by defining a smooth polynomial continuum from sections of local continuum, 

with care being taken to avoid the wings of broad absorption lines. 

Before passing the spectra to the neural network, they were rebinned onto the common 

wavelength grid of 4050–4950 Å at a sampling of 1 Å pixel−1 . Any wavelength bins 

in this grid for which no flux data were available in the original observations (i.e., in 

the case of a short spectrum) were automatically assigned a flux value of 1.0. 
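The rebinning and padding step can be sketched as follows. Linear interpolation is assumed here for simplicity; the text does not state which rebinning scheme was actually used, and the function names are illustrative.

```python
def common_grid(start=4050.0, stop=4950.0, step=1.0):
    """Wavelength grid from start to stop inclusive, in Angstroms."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def rebin(wave, flux, grid, fill=1.0):
    """Resample (wave, flux) onto grid by linear interpolation.

    Both wave and grid must be in ascending order. Grid points outside the
    observed range receive `fill`, i.e. the continuum of a normalised spectrum.
    """
    result = []
    j = 0
    for g in grid:
        if g < wave[0] or g > wave[-1]:
            result.append(fill)
            continue
        while wave[j + 1] < g:          # advance to the interval containing g
            j += 1
        t = (g - wave[j]) / (wave[j + 1] - wave[j])
        result.append(flux[j] + t * (flux[j + 1] - flux[j]))
    return result
```

Padding with 1.0 presents the network with featureless continuum in the unobserved bins, which means that any lines falling in those bins are treated as absent rather than unknown.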

The results are presented in Table 6.3, with a graphical comparison between the 

neural network classifications and those of Drilling et al. (2006) plotted in Figure 6.4. 

Although the sample is limited in distribution in the classification parameter space, 

a good agreement can be seen between the neural network and Drilling et al. (2006), 

providing confirmation of the work presented in Chapter 2. 
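The kind of comparison plotted in Figure 6.4 can be illustrated with an ordinary least-squares line and the RMS of its residuals. The sketch below regresses the ANN helium classes on the Drilling et al. (2006) helium classes, both transcribed from Table 6.3; the choice of helium class as the example and the function names are assumptions, and the fit shown in the figure may have been computed differently.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rms_residual(x, y, slope, intercept):
    """RMS of the residuals about the fitted line."""
    squared = [(yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


# Helium classes from Table 6.3, in table order.
drilling_he = [28, 39, 17, 13, 24, 38, 40, 39, 39, 39, 39, 37, 39, 37, 39, 37, 28]
ann_he = [29, 37, 20, 18, 28, 35, 41, 38, 37, 36, 37, 34, 40, 36, 35, 37, 27]

slope, intercept = fit_line(drilling_he, ann_he)
scatter = rms_residual(drilling_he, ann_he, slope, intercept)
```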

6.4 Summary 

The application of the analytical tools developed in previous chapters to a collection 

of small data sets from different sources highlights their versatility and usefulness.


[Three panels comparing two sets of classifications on matching axes: helium class (10 to 40), luminosity class (IV to IX), and spectral type (O5 to B5).]


Figure 6.4: Comparison of ANN classifications with those of Drilling et al. (2006) 

for the 17 He-sdBs of Ahmad & Jeffery (2003). Points have been given small random 

offsets in each axis for clarity. Also plotted is the best fit least squares regression line 

with error bars showing the RMS of the residuals. 



Identifier     Drilling Class      ANN Class
HS1000+471     sdBC0.2VII:He28     sdB0VII:He29
HS1844+637     sdB1VII:He39        sdB2VII:He37
LSIV-14 116    sdB0.2VII:He17      sdB0VIII:He20
PG0229+064     sdB3V:He13          sdB4V:He18
PG0240+046     sdBC0.2VII:He24     sdB2VII:He28
PG0902+057     sdB0VII:He38        sdO9VII:He35
PG1127+019     sdOC9VII:He40       sdO8VI:He41
PG1415+492     sdBC1VI:He39        sdB0VI:He38
PG1544+488     sdBC1VIII:He39      sdB0VII:He37
PG1554+408     sdB0.2VII:He39      sdB0VII:He36
PG1600+171     sdOC8.5VII:He39     sdO8VI:He37
PG1615+413     sdB1VII:He37        sdB2VII:He34
PG1658+273     sdOC9.5VII:He39     sdO8VII:He40
PG1715+273     sdB1VII:He37        sdO5VIII:He36
PG2258+155     sdB0.2VII:He39      sdB1VII:He35
PG2321+214     sdB0VII:He37        sdB2VII:He37
TON107         sdBC0.5VII:He28     sdB1VII:He27

Table 6.3: Classification results for the Ahmad & Jeffery (2003) He-sdBs. 

The results of the 2MASS-selected sample appear to confirm the findings of Green 

et al. (2006), and lend support to the results described in the previous chapter. Before 

the evolutionary details causing the observed distributions can be understood,
additional data, e.g., stellar masses, need to be gathered.

The application of the classification neural network to the helium-rich subdwarf B 

stars of Harris et al. (2003) highlights the need for a homogeneous classification scheme 

for hot subdwarfs.

Chapter 7 

Conclusions And Future Work 

This project set out to examine the problem of analysing large sets of astronomical 

spectra. Specifically, the intention was to establish a set of tools that can automatically 

extract and analyse the spectra of any type of object from a large database of unknown 

observations, and then apply these tools to a real survey database. 

Analysing large sets of astronomical spectra consists of three core problems: classification, 

physical parameterisation, and the extraction of particular types of objects 

from an unknown data set. 

In this project, classification was tackled by the highly versatile statistical machine 

learning method of artificial neural networks, which has seen widespread use in astronomy. 

Chapter 2 studied the use of ANNs to classify hot subdwarf spectra onto the 

system defined by Drilling et al. (2006). Global errors (σ rms ) on the classifications of 

∼ 2 subtypes for spectral type, ∼ 1 subclass for luminosity class, and ∼ 4 subclasses for 

the helium class were achieved. These errors are in line with the accuracies achieved 

by human classifiers. 

Physical parameters were obtained by fitting observations to grids of theoretical
models using a χ² minimisation procedure. SFIT, the χ² minimisation code used at the
Armagh Observatory, has been improved in Chapter 3 using concepts from the domain
of computational geometry to provide a new methodology for storing and accessing
arbitrarily large, three-dimensional grids of models, paving the way to extending the
code to operate in distributed parallel computing environments.

Locating the spectra of a particular type of object in a large set of unknown observations 

was accomplished using the multivariate statistical technique, Principal Components 

Analysis. Chapter 4 outlined the mechanics of the filter, and demonstrated how 

it was used to extract hot subdwarf spectra from a data set obtained from the SDSS. 

This solution provides a means to reduce unknown data sets to quantities suitable for 

closer visual inspection. 

Collectively, these tools were applied to the archives of the SDSS to extract and 

analyse the spectra of hot subdwarf stars. The PCA filter was able to reduce a set 

of almost 7000 unknown spectra to a collection of approximately 400 samples from 

which 282 hot subdwarf candidates were quickly extracted by visual inspection. The 

classification ANN successfully assigned classes to these stars based on the Drilling et al. 

(2006) system, and physical parameters were derived using SFIT and a grid of LTE 

model atmospheres. The results revealed several unexplained phenomena of extended 

horizontal branch stars, namely, 

1. Existence of the second horizontal branch gap of Newell (1973) at T eff ≈ 22,500K. 

2. Two sdB n He –T eff sequences, also observed by Edelmann et al. (2003). 

3. A clustering of hot, helium rich sdO stars at T eff ≈ 44,000K, log g = 5.7, also 

observed by Heber et al. (2006). 

These findings pose important questions for stellar evolution theory, and represent 

a successful demonstration of what this project set out to achieve.


Future Directions 

Working with the data from the SDSS highlighted a number of improvements that could 

be made to the tools themselves, but several important problems concerning spectral 

analysis and its large-scale application were also made apparent. 

Continuum Normalisation 

One of the most troubling was the normalisation of stellar continua. As noted in Chapter 

5, the SDSS uses a method based on median/mean filtering which tends to underfit 

the continuum in regions where the blending of lines becomes very strong. 

An automatic renormalisation method based on cubic spline fitting was employed 

in Chapter 5 in an attempt to gain a more precise fit to the continuum. This method 

used several sets of pre-programmed wavelength locations as control points for the cubic 

spline fit. The control points in each set were chosen manually by iterative refinement, 

and the different sets essentially conformed to a coarse temperature–abundance classification 

system because different control points were needed for hot, helium-rich stars 

and cooler, helium-poor stars. 
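The renormalisation scheme described above can be sketched as a natural cubic spline drawn through continuum levels measured at a set of control wavelengths. This is a minimal illustration and not the routine used in Chapter 5: the natural end conditions, the median taken in a small window around each control point, the window width, and the function names are all assumptions. One list of control wavelengths would be supplied per coarse class of star, and each control point must lie within the observed wavelength range.

```python
import bisect
import statistics


def spline_second_derivatives(x, y):
    """Second derivatives of the natural cubic spline through (x, y)."""
    n = len(x)
    m = [0.0] * n                    # natural spline: zero curvature at both ends
    c = [0.0] * n
    d = [0.0] * n
    for i in range(1, n - 1):        # forward sweep of the tridiagonal system
        h0, h1 = x[i] - x[i - 1], x[i + 1] - x[i]
        rhs = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0)
        denom = 2.0 * (h0 + h1) - h0 * c[i - 1]
        c[i] = h1 / denom
        d[i] = (rhs - h0 * d[i - 1]) / denom
    for i in range(n - 2, 0, -1):    # back substitution
        m[i] = d[i] - c[i] * m[i + 1]
    return m


def spline_value(x, y, m, xq):
    """Evaluate the spline at xq (end segments are extended beyond the knots)."""
    i = max(0, min(len(x) - 2, bisect.bisect_right(x, xq) - 1))
    h = x[i + 1] - x[i]
    a = (x[i + 1] - xq) / h
    b = (xq - x[i]) / h
    return (a * y[i] + b * y[i + 1]
            + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) * h * h / 6.0)


def normalise(wave, flux, control_points, half_window=5.0):
    """Divide flux by a spline continuum anchored at the control wavelengths.

    At least two control points, in ascending order, are required.
    """
    levels = [statistics.median(f for w, f in zip(wave, flux)
                                if abs(w - c) <= half_window)
              for c in control_points]
    m = spline_second_derivatives(control_points, levels)
    return [f / spline_value(control_points, levels, m, w)
            for w, f in zip(wave, flux)]
```

The dependence on hand-picked control points is visible in the interface: the quality of the result rests entirely on the `control_points` argument, which is why the approach does not generalise to arbitrary objects.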

Once the sets of control points were established, the method gave good results
for the final set of hot subdwarf candidates. However, this particular methodology
is poorly suited to a general data mining application because it is tied to one
particular type of object.

A more robust and general automatic algorithm is required. 

However, this is an extremely difficult problem because such an algorithm must take 

into account many factors: noise, regions where the spectral flux changes rapidly, cosmic 

spikes and other anomalies, and troublesome regions like that of the higher-order 

Balmer lines where the actual continuum runs above the flux information present. An 

acceptable solution will be very hard to come by. 



Data Management 

Another major problem encountered was the management of large data sets. The two 

main issues are storing sets of spectra in a meaningful and easily accessible manner, 

and keeping track of the changes to each spectrum over the course of time. 

Almost 7000 unique spectra were extracted from the SDSS in Chapter 5. Over the 

course of the analysis, the spectra were converted from FITS files to ASCII format, 

filtered, renormalised, velocity corrected, resampled, and collected together into the 

specific formats required by the classification and parameterisation codes. Eventually, 

this trail of data became cumbersome to manage and keep track of as it was replicated 

into different folders and different files across the computer’s file system. There was
also an unfortunate incident where a mistyped command accidentally deleted several
very important folders of data.

When the analysis of the 282 hot subdwarfs was complete, the results were stored 

in several ASCII-format files which had to be processed manually in order to correlate 

the classifications of the ANN with the parameters found by SFIT. This led to several 

such files in different folders with no attached information to say when the results were 

obtained, from what data set, using which models, and which ANN. 

Both of these issues highlight the need for a centralised database which can keep 

track of the changes made to the data as an analysis proceeds. Such an idea is already 

widely used in tools to help manage computer software projects (e.g., CVS 1 ). These 

tools record all the changes made to each individual source file, allowing the changes to 

be rolled back to any previous version should something go awry. Auditing analyses of
astronomical spectra in this manner would bring with it not only data integrity, but a
trail of operations conducted on the data which could be examined in detail later should
a suspect methodology need to be verified.

1 http://www.nongnu.org/cvs/

A centralised database would also allow structured metadata to be recorded concerning
the dates and times of analyses, the tools used and their version numbers,
the theoretical models used, the date they were generated, and the codes and atomic
data used to generate them, and so on. Such metadata would prove invaluable if, for
example, an analysis is revised at a later date.
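As a rough sketch of what such structured metadata might look like, the following uses Python's built-in sqlite3 module to record each processing step applied to a spectrum. The schema, table and column names, and the example tool names and version strings are all hypothetical; a production system would need a considerably richer design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS spectrum (
    id         INTEGER PRIMARY KEY,
    identifier TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS processing_step (
    id           INTEGER PRIMARY KEY,
    spectrum_id  INTEGER NOT NULL REFERENCES spectrum(id),
    operation    TEXT NOT NULL,
    tool         TEXT NOT NULL,
    tool_version TEXT NOT NULL,
    model_grid   TEXT,
    performed_at TEXT NOT NULL
);
"""


def open_database(path=":memory:"):
    """Open (or create) the provenance database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_step(conn, identifier, operation, tool, tool_version,
                performed_at, model_grid=None):
    """Append one processing step to the history of a spectrum."""
    conn.execute("INSERT OR IGNORE INTO spectrum (identifier) VALUES (?)",
                 (identifier,))
    (spectrum_id,) = conn.execute(
        "SELECT id FROM spectrum WHERE identifier = ?", (identifier,)).fetchone()
    conn.execute(
        "INSERT INTO processing_step "
        "(spectrum_id, operation, tool, tool_version, model_grid, performed_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (spectrum_id, operation, tool, tool_version, model_grid, performed_at))
    conn.commit()


def history(conn, identifier):
    """All recorded steps for one spectrum, oldest first."""
    rows = conn.execute(
        "SELECT p.operation, p.tool, p.tool_version, p.model_grid, p.performed_at "
        "FROM processing_step AS p JOIN spectrum AS s ON s.id = p.spectrum_id "
        "WHERE s.identifier = ? ORDER BY p.id", (identifier,))
    return rows.fetchall()
```

Because every step is an appended row rather than an overwritten file, the question of when a result was obtained, from which data, and with which models reduces to a single query.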

Finally, storing results alongside the data in a homogeneous database would greatly 

simplify tasks such as producing plots for publication, applying clustering algorithms 

to automatically look for patterns in the results, and cross-correlating the database 

with other databases accessible over the internet. 

Data Visualisation 

When dealing with large quantities of data, one extremely useful tool is interactive 

visualisation. Being able to graphically represent data in useful ways, and manipulate 

them by way of visualisation, facilitates the process of discovery and understanding. 

When analysing the SDSS data in Chapter 5, the final hot subdwarf sample was manually 

selected from the PCA filtering results. This stage would have proceeded much 

more quickly if a good visualisation tool had been in place. 

In this project, extensive use was made of Gnuplot 2 to visualise spectra. Although 

Gnuplot is an excellent plotting tool, it is not designed for interactive investigation of 

the data being plotted. As such, to visualise the SDSS data, Gnuplot was invoked from 

a script to produce thousands of plots that were subsequently displayed in a series of 

static web pages. Clearly, this is awkward, adding another layer of data management 

to complicate the problems mentioned previously. A better solution is desperately 

needed. 

2 http://www.gnuplot.info/ 
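The batch workflow described above (one Gnuplot script per spectrum, with the resulting images collected into static web pages) can be sketched as follows. This is not the script used in the project: the file names, page size and function names are assumptions, and each data file is assumed to hold wavelength and flux in its first two columns.

```python
from html import escape
from pathlib import Path


def plot_name(data_file):
    """Hypothetical naming rule: spectrum 'x.dat' is plotted to 'x.png'."""
    return Path(data_file).with_suffix(".png").name


def gnuplot_script(data_file, png_file, title):
    """Return Gnuplot commands that plot column 1 (wavelength) against column 2 (flux)."""
    return "\n".join([
        "set terminal png size 800,400",
        f"set output '{png_file}'",
        f"set title '{title}'",
        "set xlabel 'Wavelength'",
        "set ylabel 'Flux'",
        f"plot '{data_file}' using 1:2 with lines notitle",
    ]) + "\n"


def index_pages(png_files, per_page=50):
    """Split the list of images into static HTML pages of at most per_page plots."""
    pages = []
    for start in range(0, len(png_files), per_page):
        chunk = png_files[start:start + per_page]
        imgs = "\n".join(
            f'<img src="{escape(name)}" alt="{escape(name)}">' for name in chunk
        )
        pages.append(f"<html><body>\n{imgs}\n</body></html>\n")
    return pages
```

Each generated script can be written to disk and passed to the `gnuplot` command. The sketch makes the drawback plain: the plots are fixed at generation time, so any change of scale or selection means regenerating the whole set.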



Algorithm Development 

Working with the main analytical tools used in this project showed that they could 

be improved in several ways. The errors obtained for the classifications produced 

by the neural network in Chapter 2 are global estimates based on the leave-one-out 

cross-validation that was carried out. It would be far more useful if proper confidence 

intervals were available for each individual result produced by the ANN. Such confidence 

intervals can be obtained through the bootstrap statistical technique (e.g., Willemsen 

et al., 2005), or Bayesian methods (see Bishop, 1995, sect. 10.2). 
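The bootstrap idea can be sketched in a few lines. To keep the example self-contained, a straight-line least-squares fit stands in for the neural network; this substitution, the function names and the default settings are all assumptions made for illustration. The training set is resampled with replacement, the model is refitted to each resample, and the spread of the predictions for one new input gives an interval for that individual result.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_interval(xs, ys, x_new, n_boot=1000, level=0.68, seed=0):
    """Percentile bootstrap interval for the model prediction at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    while len(predictions) < n_boot:
        # Resample the training pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: a line cannot be fitted
        by = [ys[i] for i in idx]
        intercept, slope = fit_line(bx, by)
        predictions.append(intercept + slope * x_new)
    predictions.sort()
    lower = predictions[int((1.0 - level) / 2.0 * (n_boot - 1))]
    upper = predictions[int((1.0 + level) / 2.0 * (n_boot - 1))]
    return lower, upper
```

For a network, `fit_line` would be replaced by a full retraining, so the cost scales with the number of resamples; that cost is the main practical consideration in adopting this approach.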

The SFIT model grid indexing and searching methodology in Chapter 3 works well 

for two- and three-dimensional grids. Although it was stipulated in that chapter that 

higher dimensional grids are not likely to be used due to the curse of dimensionality, the 

use of four or possibly five-dimensional grids may not be out of the question as computer 

technology continues to improve. In theory, the Delaunay triangulation methodology in 

Chapter 3 could be extended to higher dimensional geometries, but a different approach 

(perhaps the k-D tree-based algorithm discussed in the chapter) may be more flexible 

and less complicated. 
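To show why a k-D tree generalises easily, the sketch below builds a tree over grid nodes of any dimension and finds the node nearest to a target point. It is a generic textbook construction, not the SFIT implementation, and it assumes the grid axes have already been scaled to comparable units (here, integer node indices).

```python
import itertools


def build(points, depth=0):
    """Build a k-D tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median splits the set in two
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (squared distance, point) of the grid node closest to target."""
    if node is None:
        return best
    point, axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, target))
    if best is None or d2 < best[0]:
        best = (d2, point)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Only search the far side if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


# A hypothetical four-dimensional grid with five nodes along each axis.
grid = list(itertools.product(range(5), repeat=4))
tree = build(grid)
print(nearest(tree, (1.2, 3.7, 0.1, 2.6))[1])   # prints (1, 4, 0, 3)
```

Nothing in the code depends on the number of dimensions, which is the flexibility referred to above; the price is that the tree only locates neighbouring nodes and does not by itself supply the enclosing simplex that the triangulation provides for interpolation.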

As it stands, SFIT, with the modifications of Chapter 3, is a flexible and robust 

tool for spectral parameterisation. The next step forward is to introduce parallel programming 

constructs to allow its use in a distributed computing environment, such 

as the computing cluster at the Armagh Observatory (see Appendix D), or the Grid. 

Programmatically speaking, this is not a very difficult task, but it does require some 

planning. 
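Because each spectrum is fitted independently of the others, the work divides naturally between processors. The sketch below illustrates this with a pool of workers from the Python standard library. A brute-force χ² comparison against a small dictionary of model spectra stands in for SFIT, which it does not reproduce; the function names and the unit error assumed on each flux point are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def chi_squared(observed, model, sigma=1.0):
    """Sum of squared, error-normalised residuals between two flux arrays."""
    return sum(((o - m) / sigma) ** 2 for o, m in zip(observed, model))


def best_model(observed, models):
    """Return (name, chi2) of the model spectrum with the lowest chi-squared."""
    return min(
        ((name, chi_squared(observed, flux)) for name, flux in models.items()),
        key=lambda pair: pair[1],
    )


def fit_all(spectra, models, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Fit every spectrum independently, sharing the work between workers."""
    names = list(spectra)
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(best_model,
                                (spectra[name] for name in names),
                                [models] * len(names)))
    return dict(zip(names, results))
```

Passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` would spread the fits over several processor cores of one machine. Moving to a cluster or the Grid would additionally require a job scheduler or message-passing layer, which is where the planning mentioned above comes in.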

The Principal Components Analysis filtering tool of Chapter 4 worked well for the 

application to hot subdwarf spectra. A visual selection process is still required on 

the final filtered data set because precise filtering is a hard problem. Nevertheless, 

future work could help improve the efficiency of the PCA filter, perhaps by devising 

a new reconstruction error calculation that is more sensitive to the finer details of

astronomical spectra. 

In the application of the hot subdwarf filter to the data sets obtained from the SDSS 

in Chapters 4 and 5, the filter might have performed better had more weight been given to 

differences in the cores and wings of spectral lines. This would burden the user with 

supplying some sort of line list giving the wavelengths and perhaps equivalent widths 

of spectral lines to which the error calculation should pay attention, but a little effort 

spent in preparation could save a lot of time when it comes to the visual inspection 

stage. 
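One possible form of such a line-weighted reconstruction error is sketched below. It is not the calculation used in Chapter 4: the Gaussian weighting profile, the `width` and `boost` parameters and the function names are assumptions chosen only to illustrate how a user-supplied line list could steer the error towards line cores and wings.

```python
import math


def line_weights(wavelengths, line_centres, width=5.0, boost=4.0):
    """Weight of 1 in the continuum, raised by a Gaussian bump around each line."""
    weights = []
    for lam in wavelengths:
        w = 1.0
        for centre in line_centres:
            w += boost * math.exp(-0.5 * ((lam - centre) / width) ** 2)
        weights.append(w)
    return weights


def weighted_error(flux, reconstruction, weights):
    """Weighted mean squared difference between a spectrum and its PCA reconstruction."""
    numerator = sum(w * (f - r) ** 2
                    for f, r, w in zip(flux, reconstruction, weights))
    return numerator / sum(weights)
```

With a line list of, for example, He II 4686 Å and Hβ 4861 Å, a residual of a given size at a line centre contributes five times as much to the error as the same residual in the continuum under these settings. Setting all weights to one recovers an ordinary mean squared error.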

The tools used in this project were chosen based on their previous successful applications 

to analysing astronomical spectra, but many other machine learning techniques 

have the potential to be employed (see Russell & Norvig, 2003). Algorithms such as 

the Kohonen self-organising map (Kohonen, 1990; Kohonen et al., 1996), and Bayesian 

probabilistic methods like those embodied in the AutoClass program 3 , can take an unknown 

dataset and automatically derive classes for that set based on the information 

present in the data. This makes them of particular interest for filtering and classification 

problems, and it would be a worthwhile project to investigate their ability in this 

regard. 
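To make the idea of automatically derived classes concrete, the following is a minimal one-dimensional self-organising map. It is a teaching sketch rather than the SOM_PAK package: the linear decay of the learning rate and neighbourhood radius, the default settings and the function names are all assumptions.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(nodes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], x)))


def train_som(data, n_nodes=2, n_iter=2000, lr0=0.5, seed=0):
    """Train a 1-D chain of nodes on unlabelled data and return the node weights."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = max(n_nodes / 2.0, 1.0)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays to zero
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks
        x = rng.choice(data)
        bmu = best_matching_unit(nodes, x)
        for j, node in enumerate(nodes):
            # Nodes near the winner on the chain are pulled towards the sample too.
            h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
            for k in range(len(node)):
                node[k] += lr * h * (x[k] - node[k])
    return nodes
```

After training, each input is assigned to the node returned by `best_matching_unit`, so the node index plays the role of a class label that was never supplied by the user.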

Afterword 

As noted in Chapter 1, improvements in observational and information technology 

mean that the amount of data being gathered in astronomy is always increasing. The 

specific result of this thesis is a set of tools which can be used to analyse the very large 

databases that will be generated by new survey projects such as SDSS-II and the GAIA 

space mission. 

3 http://ic.arc.nasa.gov/ic/projects/bayes-group/autoclass/ 

The ultimate future goal of the work presented in this thesis is, however, to continue 

the development of the computational framework of Jeffery (2003). This framework 

incorporates the tool set developed here into a much wider system to analyse and 

manage astronomical data, making use of distributed computing initiatives such as the 

Grid (see Figure 7.1). This system will help us set sail on the seas of astronomical data, 

charting our way into the unknown mysteries of the universe.

Figure 7.1: Schematic diagram showing how the work of this thesis fits in with the 

wider system envisaged by Jeffery (2003). 

[Figure 7.1 is a schematic; the box labels recovered from the diagram are: Training Data; Unknown Data Set (e.g. SDSS); Pre-Processing; PCA Filtering; Manual Selection; ANN Classification; χ² Model Fitting; Results Database; Results Exploration & Visualisation; Remote Astronomical Databases (e.g. Simbad); Distributed Computing Resources; Theoretical Models Database; Parameter Space Exploration; Model Generation; Remote Atomic Database; Third-Party Codes; Request New Data; R-Matrix II Calculation.]

Bibliography 

Ahmad, A. 2004, PhD thesis, The Queen’s University of Belfast 

Ahmad, A. & Jeffery, C. S. 2003, A&A, 402, 335 

Ahmad, A., Winter, C., & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, 

The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. 

R. H. Østensen, 159–162 

Allende Prieto, C., Rebolo, R., López, R. J. G., Serra-Ricart, M., Beers, T. C., Rossi, 

S., Bonifacio, P., & Molaro, P. 2000, AJ, 120, 1516 

Altmann, M., Edelmann, H., & de Boer, K. S. 2004, A&A, 414, 181 

Bailer-Jones, C. A. L. 1996, PhD thesis, University of Cambridge 

—. 1997, PASP, 109, 932 

Bailer-Jones, C. A. L., Irwin, M., Gilmore, G., & von Hippel, T. 1997, MNRAS, 292, 

157 

Bailer-Jones, C. A. L., Irwin, M., & von Hippel, T. 1998, MNRAS, 298, 361 

Behara, N. T. & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The 

Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. 

Østensen, 115–122 

Bishop, C. M. 1995, Neural Networks for Pattern Recognition (Oxford: Oxford University 

Press) 

Brown, T. M., Bowers, C. W., Kimble, R. A., & Ferguson, H. C. 2000, ApJ, 529, L89 

Brown, T. M., Ferguson, H. C., Davidsen, A. F., & Dorman, B. 1997, ApJ, 482, 685 

Caloi, V. 1976, A&A, 50, 471 

—. 1989, A&A, 221, 27 

Colin, J., de Boer, K. S., Dauphole, B., Ducourant, C., Dulou, M. R., Geffert, M., Le 

Campion, J.-F., Moehler, S., Odenkirchen, M., Schmidt, J. H. K., & Theissen, A. 

1994, A&A, 287, 38 


Colless, M., Dalton, G., Maddox, S., Sutherland, W., Norberg, P., Cole, S., Bland-Hawthorn, 

J., Bridges, T., Cannon, R., Collins, C., Couch, W., Cross, N., Deeley, 

K., De Propris, R., Driver, S. P., Efstathiou, G., Ellis, R. S., Frenk, C. S., Glazebrook, 

K., Jackson, C., Lahav, O., Lewis, I., Lumsden, S., Madgwick, D., Peacock, J. A., 

Peterson, B. A., Price, I., Seaborne, M., & Taylor, K. 2001, MNRAS, 328, 1039 

Connolly, A. J. & Szalay, A. S. 1999, AJ, 117, 2052 

Connolly, A. J., Szalay, A. S., Bershady, M. A., Kinney, A. L., & Calzetti, D. 1995, 

AJ, 110, 1071 

D’Cruz, N. L., Dorman, B., Rood, R. T., & O’Connell, R. W. 1996, ApJ, 466, 359 

de Boer, K. S., Aguilar Sanchez, Y., Altmann, M., Geffert, M., Odenkirchen, M., 

Schmidt, J. H. K., & Colin, J. 1997, A&A, 327, 577 

Deeming, T. J. 1964, MNRAS, 127, 493 

Djorgovski, S. G., Gal, R. R., Odewahn, S. C., de Carvalho, R. R., Brunner, R., Longo, 

G., & Scaramella, R. 1998, in Wide Field Surveys in Cosmology, 14th IAP meeting 

held May 26-30, 1998, Paris, ed. S. Colombi, Y. Mellier, & B. Raban (Editions 

Frontieres, ISBN 2-86332-241-9), 89 

Dorman, B., Rood, R. T., & O’Connell, R. W. 1993, ApJ, 419, 596 

Dreizler, S., Heber, U., Werner, K., Moehler, S., & de Boer, K. S. 1990, A&A, 235, 234 

Drilling, J. S. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient Stars, 461 

Drilling, J. S., Jeffery, C. S., Moehler, S., Heber, U., & Napiwotzki, R. 2006, in preparation 

Dudley, R. E. 1992, PhD thesis, The University of St. Andrews 

Edelmann, H., Heber, U., Hagen, H.-J., Lemke, M., Dreizler, S., Napiwotzki, R., & 

Engels, D. 2003, A&A, 400, 939 

Edelsbrunner, H. & Shah, N. R. 1992, in SCG ’92: Proceedings of the eighth annual 

symposium on Computational geometry (New York, NY, USA: ACM Press), 43–52 

Fan, X. 1999, AJ, 117, 2528 

Folkes, S. R., Lahav, O., & Maddox, S. J. 1996, MNRAS, 283, 651 

Francis, P. J., Hewett, P. C., Foltz, C. B., & Chaffee, F. H. 1992, ApJ, 398, 476 

Galaz, G. & de Lapparent, V. 1998, A&A, 332, 459 

Glazebrook, K., Offer, A. R., & Deeley, K. 1998, ApJ, 492, 98 

Golub, G. H. & Van Loan, C. F. 1989, Matrix Computations, 2nd edn. (Baltimore, 

Maryland 21218: The Johns Hopkins University Press)


Green, E. M., Fontaine, G., Hyde, E. A., Charpinet, S., & Chayer, P. 2006, in Baltic 

Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs 

and Related Objects, ed. R. H. Østensen, 167–174 

Green, R. F., Schmidt, M., & Liebert, J. 1986, ApJS, 61, 305 

Greenstein, J. L. & Sargent, A. I. 1974, ApJS, 28, 157 

Gulati, R., Gupta, R., & Singh, H. 1997a, PASP, 109, 843 

Gulati, R. K., Gupta, R., Gothoskar, P., & Khobragade, S. 1994a, ApJ, 426, 340 

—. 1994b, Vistas in Astronomy, 38, 293 

—. 1996, Bulletin of the Astronomical Society of India, 24, 21 

Gulati, R. K., Gupta, R., & Rao, N. K. 1997b, A&A, 322, 933 

Harris, H. C., Liebert, J., Kleinman, S. J., Nitta, A., Anderson, S. F., Knapp, G. R., 

Krzesiński, J., Schmidt, G., Strauss, M. A., Vanden Berk, D., Eisenstein, D., Hawley, 

S., Margon, B., Munn, J. A., Silvestri, N. M., Smith, J. A., Szkody, P., Collinge, 

M. J., Dahn, C. C., Fan, X., Hall, P. B., Schneider, D. P., Brinkmann, J., Burles, 

S., Gunn, J. E., Hennessy, G. S., Hindsley, R., Ivezić, Z., Kent, S., Lamb, D. Q., 

Lupton, R. H., Nichol, R. C., Pier, J. R., Schlegel, D. J., SubbaRao, M., Uomoto, 

A., Yanny, B., & York, D. G. 2003, AJ, 126, 1023 

Heber, U. 1986, A&A, 155, 33 

Heber, U. 1991, in IAU Symp. 145: Evolution of Stars: the Photospheric Abundance 

Connection, ed. G. Michaud & A. V. Tutukov, 363 

Heber, U., Hirsch, H., Ströer, A., O’Toole, S., Haas, S., & Dreizler, S. 2006, in Baltic 

Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs 

and Related Objects, ed. R. H. Østensen, 104–111 

Heber, U. & Hunger, K. 1987, in IAU Colloq. 95: Second Conference on Faint Blue 

Stars, ed. A. G. D. Philip, D. S. Hayes, & J. W. Liebert, 599–602 

Heber, U., Hunger, K., Jonas, G., & Kudritzki, R. P. 1984, A&A, 130, 119 

Hirsch, H. A., Heber, U., O’Toole, S. J., & Bresolin, F. 2005, A&A, 444, L61 

Husfeld, D., Butler, K., Heber, U., & Drilling, J. S. 1989, A&A, 222, 150 

Hutchison, R. B. 1971, AJ, 76, 711 

Iben, I., Kaler, J. B., Truran, J. W., & Renzini, A. 1983, ApJ, 264, 605 

Iben, I. J. 1990, ApJ, 353, 215 

Jeffery, C. S. 2003, in ASP Conf. Ser. 288: Stellar Atmosphere Modeling, ed. I. Hubeny, 

D. Mihalas, & K. Werner, 141 



Jeffery, C. S., Drilling, J. S., Harrison, P. M., Heber, U., & Moehler, S. 1997, A&AS, 

125, 501 

Jeffery, C. S., Heber, U., Hill, P. W., Dreizler, S., Drilling, J. S., Lawson, W. A., 

Leuenhagen, U., & Werner, K. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient 

Stars, ed. C. S. Jeffery & U. Heber, 471 

Jeffery, C. S., Woolf, V. M., & Pollacco, D. L. 2001, A&A, 376, 497 

Katz, D., Soubiran, C., Cayrel, R., Adda, M., & Cautain, R. 1998, A&A, 338, 151 

Kleinman, S. J., Harris, H. C., Eisenstein, D. J., Liebert, J., Nitta, A., Krzesiński, J., 

Munn, J. A., Dahn, C. C., Hawley, S. L., Pier, J. R., Schmidt, G., Silvestri, N. M., 

Smith, J. A., Szkody, P., Strauss, M. A., Knapp, G. R., Collinge, M. J., Mukadam, 

A. S., Koester, D., Uomoto, A., Schlegel, D. J., Anderson, S. F., Brinkmann, J., 

Lamb, D. Q., Schneider, D. P., & York, D. G. 2004, ApJ, 607, 426 

Klemola, A. R. 1961, ApJ, 134, 130 

Kohonen, T. 1990, in New Concepts in Computer Science: Proc. Symp. in Honour of 

Jean-Claude Simon (Paris, France: AFCET), 181–190 

Kohonen, T., Hynninen, J., Kangas, J., & Laaksonen, J. 1996, SOM_PAK: The 

Self-Organizing Map program package, Tech. Rep. A31, Laboratory of Computer and 

Information Science, Helsinki University of Technology 

Kurtz, M. J. 1982, Ph.D. Thesis 

Lahav, O., Naim, A., Sodré, L., & Storrie-Lombardi, M. C. 1996, MNRAS, 283, 207 

Lamy, H. & Hutsemékers, D. 2004, A&A, 427, 107 

Lasala, J. 1994, in ASP Conf. Ser. 60: The MK Process at 50 Years: A Powerful Tool 

for Astrophysical Insight, ed. C. J. Corbally, R. O. Gray, & R. F. Garrison, 312 

Levenberg, K. 1944, Questions of Applied Mathematics, 2, 164 

Livny, M. & Raman, R. 1998, in The Grid: Blueprint for a New Computing Infrastructure, 

ed. I. Foster & C. Kesselman (Morgan Kaufmann) 

Marquardt, D. W. 1963, Journal of the Society for Industrial and Applied Mathematics, 

11, 431 

Maxted, P. F. L., Heber, U., Marsh, T. R., & North, R. C. 2001, MNRAS, 326, 1391 

Mengel, J. G., Norris, J., & Gross, P. G. 1976, ApJ, 204, 488 

Moehler, S., de Boer, K. S., & Heber, U. 1990a, A&A, 239, 265 

Moehler, S., Richtler, T., de Boer, K. S., Dettmar, R. J., & Heber, U. 1990b, A&AS, 

86, 53 

Möller, T. & Trumbore, B. 1997, Journal of Graphics Tools, 2, 21, see: 

http://www.acm.org/jgt/papers/MollerTrumbore97/


Moore, A. 1991, A tutorial on kd-trees, Extract from PhD Thesis, available from 

http://www.cs.cmu.edu/~awm/papers.html 

Morgan, W. W., Abt, H. A., & Tapscott, J. W. 1978, Revised MK Spectral Atlas for 

stars earlier than the sun (Williams Bay: Yerkes Observatory, and Tucson: Kitt Peak 

National Observatory, 1978) 

Morossi, C. & Crivellari, L. 1980, A&AS, 41, 299 

Mücke, E. P., Saias, I., & Zhu, B. 1996, in SCG ’96: Proceedings of the twelfth annual 

symposium on Computational geometry (New York, NY, USA: ACM Press), 274–283 

Murtagh, F. & Heck, A. 1987, Multivariate Data Analysis (Dordrecht, Holland: D. 

Reidel Publishing Co.) 

Napiwotzki, R., Karl, C. A., Lisker, T., Heber, U., Christlieb, N., Reimers, D., Nelemans, 

G., & Homeier, D. 2004, Ap&SS, 291, 321 

Newell, E. B. 1973, ApJS, 26, 37 

O’Rourke, J. 1998, Computational Geometry in C, 2nd edn. (Cambridge (UK) and 

New York: Cambridge University Press) 

Paczyński, B. 1971, Acta Astronomica, 21, 1 

Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. 1986, Numerical 

Recipes: The Art of Scientific Computing, 1st edn. (Cambridge (UK) and New York: 

Cambridge University Press) 

Qin, D.-M., Guo, P., Hu, Z.-Y., & Zhao, Y.-H. 2003, Chinese Journal of Astronomy and 

Astrophysics, 3, 277 

Reid, I. N., Brewer, C., Brucato, R. J., McKinley, W. R., Maury, A., Mendenhall, 

D., Mould, J. R., Mueller, J., Neugebauer, G., Phinney, J., Sargent, W. L. W., 

Schombert, J., & Thicksten, R. 1991, PASP, 103, 661 

Renka, R. J. 1988, ACM Trans. Math. Softw., 14, 139 

Rhee, J., Beers, T. C., & Irwin, M. J. 1999, Bulletin of the American Astronomical 

Society, 31, 971 

Russell, S. & Norvig, P. 2003, Artificial Intelligence: A Modern Approach, 2nd edn. 

(Upper Saddle River, New Jersey 07458: Pearson Education Inc.) 

Saffer, R. A., Bergeron, P., Koester, D., & Liebert, J. 1994, ApJ, 432, 351 

Shepard, D. 1968, in Proceedings of the 1968 23rd ACM national conference (New 

York, NY, USA: ACM Press), 517–524 

Shewchuk, J. R. 1996, in SCG ’96: Proceedings of the twelfth annual symposium on 

Computational geometry (New York, NY, USA: ACM Press), 141–150 

Simkin, S. M. 1974, A&A, 31, 129 



Singh, H. P., Gulati, R. K., & Gupta, R. 1998, MNRAS, 295, 312 

Skrutskie, M. F., Cutri, R. M., Stiening, R., Weinberg, M. D., Schneider, S., Carpenter, 

J. M., Beichman, C., Capps, R., Chester, T., Elias, J., Huchra, J., Liebert, J., 

Lonsdale, C., Monet, D. G., Price, S., Seitzer, P., Jarrett, T., Kirkpatrick, J. D., 

Gizis, J. E., Howard, E., Evans, T., Fowler, J., Fullmer, L., Hurt, R., Light, R., 

Kopan, E. L., Marsh, K. A., McCallon, H. L., Tam, R., Van Dyk, S., & Wheelock, 

S. 2006, AJ, 131, 1163 

Snider, S., Allende Prieto, C., von Hippel, T., Beers, T. C., Sneden, C., Qu, Y., & 

Rossi, S. 2001, ApJ, 562, 528 

Sodre, L. J., Cuevas, H., & Capelato, H. V. 1998, in Wide Field Surveys in Cosmology, 

14th IAP meeting held May 26-30, 1998, Paris, ed. S. Colombi, Y. Mellier, & 

B. Raban (Editions Frontieres, ISBN 2-86332-241-9), 424 

Storrie-Lombardi, M. C., Irwin, M. J., von Hippel, T., & Storrie-Lombardi, L. J. 1994, 

Vistas in Astronomy, 38, 331 

Sweigart, A. V. 1997, ApJ, 474, L23 

Theissen, A., Moehler, S., Heber, U., & de Boer, K. S. 1993, A&A, 273, 524 

Thejll, P., Bauer, F., Saffer, R., Liebert, J., Kunze, D., & Shipman, H. L. 1994, ApJ, 

433, 819 

Tonry, J. & Davis, M. 1979, AJ, 84, 1511 

von Hippel, T., Storrie-Lombardi, L. J., Storrie-Lombardi, M. C., & Irwin, M. J. 1994, 

MNRAS, 269, 97 

Weaver, W. B. 2000a, Bulletin of the American Astronomical Society, 32, 1430 

—. 2000b, ApJ, 541, 298 

Weaver, W. B. & Torres-Dodgen, A. V. 1995, ApJ, 446, 300 

—. 1997, ApJ, 487, 847 

Weir, N., Fayyad, U. M., Djorgovski, S. G., & Roden, J. 1995, PASP, 107, 1243 

Wesemael, F., Winget, D. E., Cabot, W., van Horn, H. M., & Fontaine, G. 1982, ApJ, 

254, 221 

Whitney, C. A. 1983, A&AS, 51, 443 

Willemsen, P. G., Hilker, M., Kayser, A., & Bailer-Jones, C. A. L. 2005, A&A, 436, 

379 

York, D. G., Adelman, J., Anderson, J. E., Anderson, S. F., Annis, J., Bahcall, N. A., 

Bakken, J. A., Barkhouser, R., Bastian, S., Berman, E., Boroski, W. N., Bracker, S., 

Briegel, C., Briggs, J. W., Brinkmann, J., Brunner, R., Burles, S., Carey, L., Carr, 

M. A., Castander, F. J., Chen, B., Colestock, P. L., Connolly, A. J., Crocker, J. H.,


Csabai, I., Czarapata, P. C., Davis, J. E., Doi, M., Dombeck, T., Eisenstein, D., 

Ellman, N., Elms, B. R., Evans, M. L., Fan, X., Federwitz, G. R., Fiscelli, L., Friedman, 

S., Frieman, J. A., Fukugita, M., Gillespie, B., Gunn, J. E., Gurbani, V. K., 

de Haas, E., Haldeman, M., Harris, F. H., Hayes, J., Heckman, T. M., Hennessy, 

G. S., Hindsley, R. B., Holm, S., Holmgren, D. J., Huang, C.-h., Hull, C., Husby, D., 

Ichikawa, S.-I., Ichikawa, T., Ivezić, Ž., Kent, S., Kim, R. S. J., Kinney, E., Klaene, 

M., Kleinman, A. N., Kleinman, S., Knapp, G. R., Korienek, J., Kron, R. G., Kunszt, 

P. Z., Lamb, D. Q., Lee, B., Leger, R. F., Limmongkol, S., Lindenmeyer, C., 

Long, D. C., Loomis, C., Loveday, J., Lucinio, R., Lupton, R. H., MacKinnon, B., 

Mannery, E. J., Mantsch, P. M., Margon, B., McGehee, P., McKay, T. A., Meiksin, 

A., Merelli, A., Monet, D. G., Munn, J. A., Narayanan, V. K., Nash, T., Neilsen, 

E., Neswold, R., Newberg, H. J., Nichol, R. C., Nicinski, T., Nonino, M., Okada, N., 

Okamura, S., Ostriker, J. P., Owen, R., Pauls, A. G., Peoples, J., Peterson, R. L., 

Petravick, D., Pier, J. R., Pope, A., Pordes, R., Prosapio, A., Rechenmacher, R., 

Quinn, T. R., Richards, G. T., Richmond, M. W., Rivetta, C. H., Rockosi, C. M., 

Ruthmansdorfer, K., Sandford, D., Schlegel, D. J., Schneider, D. P., Sekiguchi, M., 

Sergey, G., Shimasaku, K., Siegmund, W. A., Smee, S., Smith, J. A., Snedden, S., 

Stone, R., Stoughton, C., Strauss, M. A., Stubbs, C., SubbaRao, M., Szalay, A. S., 

Szapudi, I., Szokoly, G. P., Thakar, A. R., Tremonti, C., Tucker, D. L., Uomoto, A., 

Vanden Berk, D., Vogeley, M. S., Waddell, P., Wang, S.-i., Watanabe, M., Weinberg, 

D. H., Yanny, B., & Yasuda, N. 2000, AJ, 120, 1579 


Appendices 


Appendix A 

Results for 192 Drilling et al. 

(2006) Hot Subdwarfs 

This table lists the parameterisation results for both the calibrated and uncalibrated 

stars obtained from Drilling et al. (2006). Results obtained from the parameterisation 

neural network and SFIT are given, with the internal errors of SFIT also listed. 


Table A.1: Parameterisation Results for 192 Drilling et al. (2006) Hot Subdwarfs 

Columns: Identifier, followed by seven SFIT columns and then three ANN columns. 

SFIT Results: T_eff (K), δT_eff, log g (cgs), δlog g, log(n_He/n_H), δlog(n_He/n_H), χ² 

ANN Results: T_eff (K), log g (cgs), log(n_He/n_H) 

BD-07 3477 27748 1362 5.420 0.120 -2.673 0.204 1.90E-01 26360.4620 5.4370 -2.4641 

BD+25 3941 28478 447 4.645 0.058 -1.422 0.034 1.89E+00 30794.9018 4.7807 -2.2476 

BD+28 4211 48135 1120 5.773 0.072 -1.121 0.040 3.72E-01 59518.9743 6.6219 -3.3647 

BD+40 4032 27895 713 4.083 0.069 -1.079 0.036 1.02E+00 27197.0607 3.7571 -0.9165 

Feige 110 40000 196 5.776 0.042 -2.020 0.136 8.78E-01 45638.2315 5.9658 -3.3709 

Feige 15 12000 162 4.500 0.053 -2.201 0.276 2.48E+00 13763.3319 3.9102 -1.4225 

Feige 38 29629 504 5.546 0.055 -2.483 0.132 2.71E-01 30014.7462 5.5768 -2.5487 

Feige 56 15571 279 3.608 0.048 -1.765 0.101 2.80E-01 17780.0476 3.7579 -2.3262 

Feige 98 11590 196 3.793 0.064 -2.541 0.302 7.42E-01 13083.8300 3.8179 -3.0230 

FHB 18 10819 133 4.179 0.037 -1.602 0.052 5.28E-01 12901.1081 4.2528 -2.7523 

FHB 23 11646 189 4.394 0.053 -2.586 0.334 4.27E-01 14271.8636 4.4317 -3.3414 

HD 144941 22000 348 3.835 0.055 0.963 0.008 1.31E+00 21681.7225 3.6821 1.6263 

HD 160641 30614 207 3.153 0.025 2.237 0.225 5.06E+00 31303.9914 2.8086 1.8212 

HD 17520 34793 723 3.804 0.067 -0.661 0.030 1.46E+00 35330.3819 4.0278 -0.9810 

HD 184279 25927 292 3.917 0.034 -0.400 0.016 2.06E+00 28409.0447 3.9086 -0.0132 

HD 192281 11337 194 4.012 0.044 -1.856 0.405 6.63E-01 11143.6105 3.6825 -2.2308 

HD 217086 37427 439 4.691 0.065 -1.237 0.045 8.02E-01 45007.2314 5.1943 -1.5241 

Hiltner 600 29739 564 4.717 0.063 -1.112 0.034 6.06E-01 27869.8814 4.7985 -0.9717 

HR 6092 22693 545 4.026 0.070 -1.477 0.039 4.36E+00 14259.7656 2.7584 -0.0691 

HR 6588 23305 466 3.682 0.063 -0.938 0.033 3.60E+00 16634.1650 2.5556 -0.1572 




HR 6719 25813 814 3.500 0.056 -0.597 0.026 2.68E+00 19240.4897 2.3722 -0.0260 

HR 7287 19088 391 3.598 0.058 -1.929 0.074 4.50E+00 12007.6482 2.1732 0.1142 

HR 8622 28951 1186 2.919 0.063 0.107 0.006 3.06E+00 29216.7530 2.8812 0.2961 

HS 0016+0044 29725 540 5.523 0.075 -2.939 0.377 6.74E-01 30591.2646 5.7893 -3.2113 

HS 1000+471 40766 137 5.659 0.036 0.521 0.013 6.30E+00 35501.0360 4.3467 -0.6256 

HS 1844+637 29045 177 3.633 0.026 0.976 0.008 8.46E+00 39133.7152 4.5950 3.4902 

HS 2253+0900 13534 281 3.863 0.060 -1.156 0.062 5.37E-01 13688.8067 3.9192 -1.8474 

HS 2301+0728 17658 528 4.378 0.065 -2.771 0.257 2.29E+00 10788.7216 2.7773 -3.8880 

HZ 15 20434 584 3.000 0.065 -0.630 0.025 2.37E+00 24751.3086 3.0214 -1.0013 

HZ 44 38507 224 5.381 0.040 0.088 0.003 1.58E+00 37595.6085 4.8985 -0.1747 

LSIV-14 37999 240 5.648 0.045 -0.573 0.017 3.63E+00 37185.7816 5.3742 -0.8631 

LSIV-6 28974 257 3.747 0.046 2.522 0.144 2.73E+00 26745.5011 2.7521 3.2416 

LSS 5121 30511 245 3.216 0.037 2.273 0.163 3.95E+00 32165.1540 3.2400 1.0310 

PG0001+275 35180 300 5.406 0.053 -3.000 0.434 4.31E+00 34205.9951 5.0684 -4.2459 

PG0004+133 26205 1118 4.828 0.110 -2.037 0.094 3.67E+00 32994.6787 5.2114 -1.8659 

PG0009+036 20214 629 4.496 0.084 -2.488 0.134 2.29E+00 23334.3874 4.8444 -2.5336 

PG0039+049 28606 391 4.668 0.068 -2.995 0.430 2.26E+00 20927.6745 3.7393 -3.3758 

PG0039+135 45029 212 5.408 0.081 0.514 0.030 3.51E+00 42789.5834 5.3039 -0.3035 

PG0057+155 32203 375 5.500 0.063 -1.785 0.053 1.02E+00 33110.1795 5.5769 -2.0610 

PG0101+039 27565 1344 5.357 0.114 -3.000 0.434 1.66E+00 27132.9594 5.1346 -2.8721 

PG0133+114 35999 61 6.000 0.034 -2.995 0.430 1.54E+00 30996.2889 5.0018 -3.0707 





PG0135+242 25308 323 3.375 0.057 2.452 0.123 3.30E+00 22720.6398 2.5661 2.3728 

PG0142+148 28738 565 5.022 0.080 -2.966 0.402 3.72E+00 33990.4595 5.0653 -3.3974 

PG0208+016 44506 160 5.926 0.041 1.218 0.043 4.17E+00 42334.8969 5.7844 0.7455 

PG0229+064 18305 192 3.991 0.056 -0.798 0.035 9.77E-01 23366.1468 4.5001 -1.1572 

PG0232+095 35000 435 4.861 0.030 -1.392 0.053 5.88E+00 26042.8910 3.8106 -2.5768 

PG0304+183 29953 710 5.182 0.077 -2.913 0.356 7.50E+00 25550.8542 4.7584 -3.9436 

PG0314+146 8541 37 1.235 0.028 -2.004 0.088 7.27E+00 4603.7402 1.2050 -3.8688 

PG0342+026 21878 819 4.731 0.073 -3.000 0.434 2.06E+00 30227.8532 5.3591 -3.3190 

PG0838+133 40055 139 4.500 0.031 0.990 0.013 3.52E+00 46872.5237 4.9328 4.0786 

PG0856+121 26869 620 5.600 0.067 -3.000 0.434 2.70E+00 27571.6464 5.7342 -3.3096 

PG0902+058 42352 111 6.000 0.041 1.912 0.106 3.85E+00 42180.4296 5.7766 1.9591 

PG0907+123 26482 641 5.075 0.071 -3.000 0.434 4.06E+00 24278.8101 4.9274 -3.3162 

PG0909+164 31880 562 4.847 0.086 -3.000 0.434 1.57E+00 35575.3699 4.6247 -3.7231 

PG0909+275 34009 427 4.795 0.057 -0.685 0.022 1.93E+00 36507.4485 5.1157 -1.2373 

PG0918+029 31029 310 5.500 0.066 -2.649 0.193 2.41E+00 24129.8781 4.4794 -2.2902 

PG0920+029 25992 1329 4.781 0.111 -3.000 0.434 1.48E+00 27712.3799 4.8930 -3.3238 

PG0921+161 32868 292 5.329 0.060 -1.640 0.057 2.59E+00 35284.4531 5.1844 -3.0325 

PG0921+311 42320 137 5.873 0.047 1.068 0.015 2.72E+00 41033.2928 5.5140 0.5536 

PG0934+145 16681 212 4.031 0.037 -0.899 0.034 2.22E+00 13314.2430 3.7870 -0.7299 

PG0954+049 13384 239 3.399 0.058 -1.526 0.087 1.31E+00 12465.6053 3.0812 -2.6841 

PG1000+375 32896 237 5.814 0.053 -1.761 0.050 1.31E+00 20945.0047 4.8115 -1.6003 








PG1017+431 32192 454 4.816 0.074 -2.991 0.425 5.45E-01 44773.4090 5.8294 -3.6347 

PG1018-047 30361 252 5.365 0.057 -3.000 0.434 2.23E+00 30145.1825 5.3885 -5.4213 

PG1047+003 32977 318 5.459 0.066 -2.130 0.117 4.07E-01 34846.5851 5.5663 -2.5655 

PG1049+013 32754 427 4.725 0.069 -2.610 0.177 8.38E-01 44195.8859 5.5828 -2.9385 

PG1050-065 34509 236 5.591 0.047 -1.212 0.035 2.91E+00 36241.7231 5.3524 -1.2791 

PG1118+061 28321 386 5.224 0.065 -3.000 0.434 3.84E-01 27695.5481 5.1031 -2.9965 

PG1127+019 40812 131 4.965 0.070 1.940 0.265 1.81E+00 39675.7493 4.6297 2.1534 

PG1136-003 30576 260 5.250 0.056 -3.000 0.434 4.32E-01 28147.6630 5.0249 -3.0350 

PG1154-070 28000 963 5.430 0.092 -2.200 0.069 3.92E-01 26478.4624 5.2617 -2.1939 

PG1220-056 49308 883 5.460 0.107 0.309 0.013 8.09E-01 52185.9827 6.0800 -1.0282 

PG1230+067 38843 191 4.926 0.056 1.013 0.013 2.36E+00 40314.1570 5.0314 2.8373 

PG1245-042 15232 266 3.803 0.056 -1.820 0.144 5.97E-01 15548.7313 3.8416 -1.4344 

PG1246-122 32573 361 4.001 0.044 -1.012 0.035 3.11E+00 33475.4066 3.7321 -2.2059 

PG1249+762 50000 276 5.623 0.087 0.368 0.028 1.06E+00 65439.9717 6.7602 0.0994 

PG1255+547 32774 330 5.500 0.059 -1.547 0.046 1.20E+00 30935.1727 5.6242 -2.1342 

PG1258-030 13075 278 3.656 0.048 -2.286 0.168 1.03E+00 12502.8404 3.3526 -2.0501 

PG1300+279 48677 632 5.955 0.053 -0.041 0.002 1.56E+00 46441.3479 6.5777 -0.9764 

PG1303-114 31245 354 5.502 0.068 -3.000 0.434 1.82E+00 30588.0996 5.4478 -5.0836 

PG1325+054 44232 208 5.915 0.033 0.326 0.011 2.18E+00 45019.5994 6.0397 1.9269 

PG1336-018 31271 245 5.567 0.049 -2.519 0.143 6.11E-01 37061.5891 5.9196 -3.4263 

PG1343-102 29958 707 5.424 0.087 -2.910 0.353 6.71E-01 31263.5069 5.3800 -3.4850 





PG1343+578 15335 301 3.396 0.062 -1.835 0.089 7.67E-01 14789.0812 3.2773 -2.8106 

PG1348+607 45000 223 5.386 0.076 0.000 0.000 2.17E+00 56671.0493 6.6681 2.0785 

PG1352-023 47661 1197 6.000 0.091 -1.723 0.092 2.46E+00 54114.0712 6.2103 -3.7261 

PG1355-064 49999 733 5.299 0.115 0.364 0.029 9.95E-01 55817.7736 5.5139 -1.3330 

PG1401+289 47629 247 5.753 0.087 0.184 0.014 9.22E-01 43645.2377 5.6074 0.1812 

PG1409-103 39399 768 5.008 0.089 -2.166 0.127 1.45E+00 50621.5726 5.7496 -4.4764 

PG1413+114 43416 117 6.000 0.039 1.261 0.063 3.54E+00 44247.9760 5.7971 3.0220 

PG1415+492 31467 283 4.135 0.047 2.512 0.141 2.26E+00 30378.3273 3.7367 2.7468 

PG1426-067 34262 380 5.368 0.066 -3.000 0.434 1.17E+00 34904.3068 5.1176 -2.8789 

PG1432+004 24561 1051 4.987 0.114 -2.308 0.088 9.80E-01 24852.4845 5.0079 -2.5775 

PG1433+239 35306 378 5.345 0.061 -2.970 0.405 3.89E+00 39665.7300 5.3344 -3.5664 

PG1441+407 49802 277 6.000 0.089 0.464 0.038 1.61E+00 46391.6786 5.5498 0.7923 

PG1448-052 32489 304 5.189 0.058 -3.000 0.434 7.68E-01 50271.3965 6.1452 -3.8949 

PG1449+652 30456 269 4.598 0.057 -3.000 0.434 1.23E+00 22392.8613 3.6074 -4.2284 

PG1451+492 17996 482 3.999 0.067 -1.919 0.108 8.85E-01 21898.4813 4.1696 -2.0822 

PG1453-081 16393 281 3.977 0.068 -1.196 0.095 1.02E+00 20030.3017 3.9852 -1.7400 

PG1453-085 12264 169 3.175 0.044 -2.777 0.260 6.93E-01 14759.0551 3.3091 -3.1086 

PG1458+423 29151 811 5.000 0.104 -2.995 0.430 6.94E-01 29104.1751 4.8454 -3.9124 

PG1506-052 38002 479 5.227 0.071 -2.226 0.073 2.32E+00 57963.7346 6.0973 -6.7294 

PG1510+635 12489 184 3.256 0.048 -2.906 0.350 2.53E+00 14137.8470 3.2197 -3.4824 

PG1518-098 47134 1376 5.000 0.098 0.276 0.029 1.04E+00 63938.6641 5.7489 2.8699 








PG1519+640 28801 669 5.000 0.099 -2.779 0.261 8.55E-01 29147.1760 4.9799 -3.1609 

PG1526+440 41199 139 5.826 0.037 0.738 0.023 3.90E+00 37868.3371 4.9471 -0.1320 

PG1532+523 30618 263 5.257 0.056 -2.935 0.374 1.03E+00 25389.6717 4.5495 -3.1041 

PG1534-018 45010 166 5.656 0.043 1.224 0.087 8.31E-01 43114.3697 5.3541 0.8758 

PG1536+690 50000 277 5.647 0.087 0.364 0.028 9.38E-01 55913.0919 6.1882 -0.0670 

PG1537-046 47494 676 5.249 0.108 0.174 0.008 1.17E+00 58302.0139 6.0350 2.8671 

PG1538+401 32260 320 5.446 0.064 -3.000 0.434 8.57E-01 34944.9166 5.5433 -2.9674 

PG1538+611 30528 307 5.416 0.063 -3.000 0.434 1.32E+00 29648.9555 5.0240 -4.0399 

PG1543+629 40002 185 5.536 0.043 -2.137 0.119 1.07E+00 51462.0631 5.9720 -3.9767 

PG1544+488 30992 273 4.202 0.046 2.522 0.144 1.50E+00 33677.6132 4.4517 3.5425 

PG1544+601 30000 518 5.543 0.054 -2.579 0.165 4.27E-01 28454.0634 5.5260 -2.6698 

PG1545+035 38820 419 5.000 0.083 -1.081 0.036 1.43E+00 47852.7894 5.5488 -4.0558 

PG1549+006 31939 559 5.288 0.070 -2.074 0.103 1.08E+00 29109.5245 5.0910 -1.8754 

PG1553-077 44904 210 5.707 0.045 0.256 0.014 1.56E+00 42387.5197 5.3536 1.0981 

PG1554+408 34356 255 4.320 0.056 2.522 0.289 3.41E+00 38902.7912 4.9593 2.1864 

PG1558-007 21419 753 4.922 0.081 -2.709 0.222 4.26E-01 26843.3693 5.3108 -2.7541 

PG1559+048 36325 228 5.600 0.049 -0.938 0.033 1.05E+00 36003.1956 5.3491 -0.9926 

PG1559+222 42434 138 6.000 0.047 2.129 0.117 2.79E+00 41856.4568 5.5616 1.0422 

PG1559+533 29420 633 5.500 0.090 -2.817 0.285 1.52E+00 21995.1194 5.0622 -1.9979 

PG1600+171 45883 363 5.999 0.092 0.943 0.026 3.97E+00 42817.5697 5.4414 1.6103 

PG1602+013 39961 202 5.592 0.042 -1.995 0.129 1.49E+00 53840.6098 6.1985 -3.8367 





PG1605+072 30000 824 4.779 0.088 -2.790 0.268 2.97E-01 31978.9966 4.9655 -2.6540 

PG1607+174 32181 335 4.674 0.046 -0.268 0.009 5.28E+00 31842.6743 4.4473 -0.4410 

PG1610+519 31595 380 4.537 0.070 -3.000 0.434 5.74E-01 38165.4762 4.8614 -4.8178 

PG1613+467 23590 1075 4.612 0.083 -2.541 0.151 9.68E-01 29674.0969 5.1593 -3.3097 

PG1615+413 28553 189 3.759 0.018 0.990 0.008 3.70E+00 36907.9108 4.9056 2.1208 

PG1618+563 33309 341 5.481 0.069 -1.513 0.042 2.17E+00 36466.0770 5.8061 -2.4338 

PG1619+525 31178 318 5.500 0.067 -2.371 0.102 4.17E-01 30888.0423 5.4437 -2.4573 

PG1624+085 43897 205 5.752 0.058 0.623 0.045 1.94E+00 41755.1142 5.5719 0.1620 

PG1627+006 23222 593 5.193 0.068 -2.899 0.344 1.01E+00 20611.6950 4.8277 -2.7701 

PG1627+017 22959 522 5.206 0.065 -2.700 0.218 3.61E-01 22424.9747 5.2100 -2.8087 

PG1629+466 35779 273 5.000 0.035 0.767 0.030 1.21E+00 37328.4040 5.1599 0.6611 

PG1640+645 34458 262 5.591 0.048 -1.590 0.051 4.84E-01 35698.5308 5.9373 -2.2544 

PG1644+404 28221 457 5.060 0.062 -3.000 0.434 7.48E-01 32326.3238 5.4986 -3.2872 

PG1645+610 28377 522 5.411 0.080 -2.391 0.107 8.12E-01 17587.7044 4.7845 -2.2044 

PG1646+607 47595 668 6.000 0.057 -0.032 0.001 2.31E+00 46365.3539 6.0318 0.4509 

PG1648+315 41835 130 6.000 0.046 1.040 0.014 4.08E+00 36749.1785 4.9233 0.0589 

PG1648+536 30145 630 5.001 0.093 -3.000 0.434 7.59E-01 31819.6176 5.0395 -4.0618 

PG1653+633 34667 268 5.790 0.055 -1.693 0.043 1.01E+00 38094.6200 6.2811 -2.4102 

PG1656+600 30481 252 6.000 0.056 -2.628 0.184 1.51E+00 33912.5833 6.0668 -3.3172 

PG1658+273 43059 113 6.000 0.038 1.261 0.063 3.25E+00 40736.4516 4.8451 1.2766 

PG1701+359 32615 88 5.918 0.021 -2.604 0.175 5.87E-01 32564.5119 5.5194 -3.9213 








PG1704+222 15926 334 2.561 0.053 -1.131 0.053 5.49E-01 19339.7888 2.9039 -1.5097 

PG1705+537 15025 341 3.437 0.062 -1.356 0.049 1.23E+00 19121.6772 3.7978 -1.7138 

PG1707+657 36119 114 5.993 0.033 -1.902 0.104 6.97E-01 32424.6910 5.4695 -1.9184 

PG1708+142 18154 459 3.480 0.080 -1.245 0.068 2.06E+00 20510.9488 3.6958 -1.5666 

PG1708+602 42477 775 5.256 0.074 -0.945 0.060 8.49E-01 52356.2200 5.9487 -3.2523 

PG1710+490 29497 546 5.102 0.069 -2.705 0.220 1.06E+00 30354.3920 5.2502 -3.0947 

PG1710+567 33908 486 5.110 0.033 -1.538 0.045 9.87E-01 30094.2953 4.8112 -2.7425 

PG1715+273 29074 227 3.797 0.016 1.070 0.010 4.32E+00 35385.5095 4.5349 3.0662 

PG1717+423 23293 615 4.654 0.074 -2.991 0.425 1.12E+00 26991.4606 4.9110 -3.2374 

PG1722+286 33802 292 5.795 0.058 -1.757 0.050 1.33E+00 31508.9630 5.7798 -1.6469 

PG1724+590 28706 605 5.000 0.094 -2.420 0.114 9.74E-01 27423.4056 4.9714 -3.2446 

PG1738+505 24113 746 4.922 0.081 -1.829 0.059 9.28E-01 28143.7622 5.3359 -1.8641 

PG1739+489 22474 539 4.569 0.066 -2.744 0.241 7.71E-01 28940.6842 5.0868 -3.4167 

PG1743+477 26873 1206 4.904 0.123 -2.214 0.071 1.82E+00 26957.5253 4.8522 -3.3545 

PG2059+013 33086 233 5.583 0.046 -1.607 0.053 2.64E+00 31942.9438 5.5146 -1.9470 

PG2111+023 14681 254 4.000 0.055 -1.274 0.073 3.69E+00 18712.6960 4.0693 -2.2958 

PG2120+062 32008 512 4.133 0.052 -0.950 0.046 7.43E-01 32664.6531 4.1185 -1.1104 

PG2148+095 30001 806 4.555 0.082 -3.000 0.434 5.46E-01 25977.6010 3.9359 -3.3697 

PG2151+100 35789 66 5.941 0.034 -2.677 0.206 7.84E-01 41296.8002 5.7262 -3.0819 

PG2158+082 49999 743 5.500 0.119 0.364 0.029 1.28E+00 53960.7462 5.8944 -1.2564 

PG2159+051 13496 252 3.244 0.050 -1.441 0.084 3.91E-01 13921.4240 3.3088 -1.7585 







PG2204+035 31535 302 5.917 0.059 -2.506 0.139 3.75E+00 23039.5221 5.2470 -2.1168 

PG2205+023 27156 646 5.622 0.069 -3.000 0.434 1.06E+00 24413.7057 5.4836 -3.0047 

PG2215+151 45566 318 5.841 0.068 1.951 0.620 4.56E+00 41433.7407 5.4859 -0.1429 

PG2218+020 21062 711 4.768 0.070 -2.707 0.221 1.34E+00 30254.5605 5.4740 -2.9763 

PG2219+094 19206 754 3.546 0.068 -1.457 0.112 1.39E+00 24813.0492 4.0618 -1.7815 

PG2229+099 16940 236 3.755 0.053 -0.958 0.031 1.41E+00 18337.4404 3.8720 -1.3692 

PG2258+155 34000 381 4.481 0.046 0.945 0.008 3.68E+00 37528.2798 5.2630 2.4797 

PG2259+134 31323 396 5.772 0.057 -1.975 0.082 1.35E+00 31934.3911 5.8685 -2.0224 

PG2301+259 18959 395 4.217 0.061 -1.904 0.070 6.67E-01 16716.7103 4.1106 -1.5974 

PG2314+076 30140 305 5.640 0.050 -3.000 0.434 3.38E+00 31564.2513 5.1605 -3.1646 

PG2317+046 33177 797 4.504 0.104 -3.000 0.434 1.54E+00 40658.1786 4.6877 -3.3510 

PG2318+239 16940 296 3.778 0.036 -1.392 0.075 1.47E+00 20890.7160 4.2557 -1.8445 

PG2321+214 38502 268 4.977 0.063 2.314 0.179 1.52E+00 39171.8164 5.0203 2.4143 

PG2331+038 29017 428 5.642 0.054 -2.401 0.109 3.59E+00 32370.5612 5.8341 -2.7235 

PG2337+070 29563 670 5.735 0.060 -1.997 0.086 7.85E-01 29133.8280 5.8738 -1.4688 

PG2339+199 30763 396 4.189 0.043 1.042 0.009 4.17E+00 33495.0283 4.4077 1.0705 

PG2345+241 17743 288 3.699 0.044 -0.954 0.046 6.76E-01 19581.9082 3.8835 -1.0702 

PG2349+002 28383 334 5.600 0.053 -3.000 0.434 2.00E+00 26081.2311 5.2581 -4.0008 

PG2351+198 14539 269 3.768 0.046 -1.380 0.094 1.64E+00 13733.0764 4.0071 -1.6889 

PG2352+181 47309 246 5.873 0.089 0.264 0.021 2.81E+00 44348.4299 5.7750 -0.4701 

PG2358+107 23768 966 4.978 0.105 -2.685 0.210 3.04E+00 24626.4068 4.9117 -3.6593 







PHL 1079 31862 389 5.589 0.055 -2.303 0.087 1.32E+00 31739.7364 5.6372 -2.0741 

PHL 4 40197 685 5.000 0.093 -1.254 0.062 6.59E-01 50528.5221 5.6359 -2.8566 

TON 107 39369 266 5.602 0.039 -0.076 0.003 4.42E+00 36793.9981 4.8072 -0.2664 

VZ1128 M3 34893 388 4.500 0.058 -0.968 0.044 8.72E-01 35857.4448 4.3289 -2.0402 



Appendix B 

Results for 282 SDSS DR3 Hot Subdwarf Candidates

This table lists the classification and parameterisation results for the SDSS hot subdwarf candidates of Chapter 5. Also listed for each star are its position and redshift as obtained by the SDSS. The internal errors of SFIT are given, along with the value of χ² for the best fit.
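For readers who want to work with Table B.1 programmatically, the following is a minimal sketch of how one row could be parsed. It assumes that each row holds exactly 13 whitespace-separated fields in the column order of the table header; the class name `SdssRow`, the function `parse_row`, and all field names are hypothetical and are not part of SFIT or of any SDSS tool.

```python
from dataclasses import dataclass


@dataclass
class SdssRow:
    """One row of Table B.1 (field names are illustrative assumptions)."""
    identifier: str      # SDSS identifier, e.g. J000607.88-010320.8
    ra: str              # right ascension, hh:mm:ss.ss
    dec: str             # declination, +dd:mm:ss.s
    cz: float            # redshift velocity in km/s
    cz_err: float        # error on cz in km/s
    classification: str  # classification string, kept verbatim
    teff: float          # effective temperature in K
    teff_err: float      # SFIT internal error on T_eff in K
    logg: float          # log surface gravity (cgs)
    logg_err: float      # SFIT internal error on log g
    log_he_h: float      # log(n_He/n_H)
    log_he_h_err: float  # SFIT internal error on log(n_He/n_H)
    chi2: float          # chi-squared of the best fit


def parse_row(line: str) -> SdssRow:
    """Split one table row on whitespace and convert the numeric fields."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, found {len(fields)}")
    identifier, ra, dec = fields[:3]
    cz, cz_err = (float(x) for x in fields[3:5])
    classification = fields[5]
    numbers = [float(x) for x in fields[6:]]
    return SdssRow(identifier, ra, dec, cz, cz_err, classification, *numbers)


# First row of Table B.1, copied verbatim.
row = parse_row(
    "J000607.88-010320.8 00:06:07.88 -01:03:20.8 -279.634 26.6456 "
    "sdO7VII:He39 42788 155 5.280 0.060 2.093 0.108 6.17E+00"
)
print(row.identifier, row.teff, row.logg, row.log_he_h, row.chi2)
```

The classification string is left unparsed because its internal structure is not described in this appendix. Rows with a different number of fields raise an error instead of being silently misaligned.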


Table B.1: Results for 282 SDSS Hot Subdwarf Candidates 

SDSS Identifier, R.A., Decl., cz (km s⁻¹), δcz (km s⁻¹), Classification, T_eff (K), δT_eff (K), log g (cgs), δlog g, log(n_He/n_H), δlog(n_He/n_H), χ²

J000607.88-010320.8 00:06:07.88 -01:03:20.8 -279.634 26.6456 sdO7VII:He39 42788 155 5.280 0.060 2.093 0.108 6.17E+00 

J001651.42-011329.3 00:16:51.42 -01:13:29.3 115.816 44.9755 sdO2VII:He26 47737 3077 3.726 0.083 0.364 0.044 2.59E+00 

J001837.14+152150.0 00:18:37.14 +15:21:50.0 -73.1296 25.5077 sdO9VII:He39 38579 94 4.555 0.026 2.522 0.433 6.46E+00 

J001930.36+135530.9 00:19:30.36 +13:55:30.9 -225.249 34.6605 sdB0VI:He12 29686 475 5.382 0.045 -1.120 0.023 6.17E+00 

J002323.99-002953.3 00:23:23.99 -00:29:53.3 -5.64728 33.4529 sdB3VI:He5 30771 241 5.746 0.042 -1.873 0.032 6.68E-01 

J002852.26+135446.5 00:28:52.26 +13:54:46.5 35.3602 31.5028 sdO9VI:He8 33187 387 4.944 0.062 -2.692 0.214 9.48E-01 

J004233.43+004717.6 00:42:33.43 +00:47:17.6 24.2894 32.9952 sdB4V:He1 29989 668 4.987 0.080 -3.000 0.434 2.73E+00 

J011506.17+140513.5 01:15:06.17 +14:05:13.5 -162.802 29.9172 sdB3IV:He7 32000 380 4.545 0.056 -2.991 0.425 7.56E+00 

J013847.59+141532.1 01:38:47.59 +14:15:32.1 -188.872 33.771 sdO9VIII:He7 27012 612 4.972 0.072 -2.212 0.071 7.17E+00 

J015026.10-094226.9 01:50:26.10 -09:42:26.9 -18.51 33.4439 sdB8VI:He10 33657 341 5.303 0.029 -1.402 0.022 1.58E+00 

J021617.11-095513.1 02:16:17.11 -09:55:13.1 -29.2216 33.6757 sdO8VI:He12 35372 180 5.561 0.034 -1.195 0.027 9.74E-01 

J023032.65-081439.5 02:30:32.65 -08:14:39.5 -194.8 28.2265 sdB6IV:He3 13683 176 3.755 0.037 -1.399 0.065 7.83E-01 

J031620.13+004222.9 03:16:20.13 +00:42:22.9 -23.1856 31.5085 sdB0V:He8 32906 81 5.692 0.016 -2.227 0.073 1.67E+00 

J031854.14+004135.0 03:18:54.14 +00:41:35.0 6.24576 32.428 sdB7IV:He2 20554 512 4.500 0.063 -2.047 0.097 1.82E+00 

J033358.21+002007.5 03:33:58.21 +00:20:07.5 117.167 34.2372 sdB2VI:He13 34164 234 5.270 0.036 -0.771 0.017 6.59E+00 

J073712.28+264224.7 07:37:12.28 +26:42:24.7 8.71841 31.8302 sdO9VI:He7 32799 75 5.788 0.019 -2.278 0.082 1.71E+00 

J073856.99+401942.1 07:38:56.99 +40:19:42.1 -202.597 22.922 sdO1VII:He34 50000 547 5.495 0.087 0.644 0.047 2.18E+00 

J074001.91+240127.4 07:40:01.91 +24:01:27.4 -130.711 32.684 sdA1IV:He0 30156 269 5.085 0.042 -2.991 0.425 7.72E+00 

J074458.10+324259.9 07:44:58.10 +32:42:59.9 50.6134 38.7134 sdO9VII:He24 38824 74 5.996 0.025 -0.512 0.010 8.41E+00 

J074534.16+372718.6 07:45:34.16 +37:27:18.6 29.853 33.6007 sdB0VII:He4 35000 155 5.394 0.033 -3.000 0.434 2.44E+00 

J074613.17+333307.6 07:46:13.17 +33:33:07.6 -13.6868 23.2616 sdO7VII:He39 44962 100 6.000 0.031 1.261 0.047 1.60E+00 

J074720.59+384910.7 07:47:20.59 +38:49:10.7 25.1426 29.7373 sdB7IV:He3 16696 250 3.821 0.047 -1.588 0.101 2.74E+00 



J074806.15+342927.7 07:48:06.15 +34:29:27.7 -54.5131 34.0109 sdB0VI:He6 33653 269 5.478 0.053 -1.697 0.043 1.38E+00 

J074811.34+435239.6 07:48:11.34 +43:52:39.6 26.4415 32.7577 sdB1VI:He8 25154 617 5.003 0.067 -1.463 0.025 1.29E+00 

J075236.78+441642.5 07:52:36.78 +44:16:42.5 -102.636 34.6248 sdB1V:He9 34243 241 5.496 0.019 -1.059 0.020 2.63E+00 

J075249.96+305935.2 07:52:49.96 +30:59:35.2 144.115 29.591 sdB4V:He34 39999 182 5.576 0.027 1.134 0.012 9.97E+00 

J080259.80+411438.0 08:02:59.80 +41:14:38.0 -26.9931 22.3976 sdO7VII:He39 44340 95 6.000 0.030 1.261 0.047 1.68E+00 

J080628.10+323059.4 08:06:28.10 +32:30:59.4 -38.8078 32.5904 sdB2V:He11 30983 240 5.650 0.035 -1.332 0.019 8.91E-01 


J080726.80+303501.8 08:07:26.80 +30:35:01.8 -65.0792 28.5258 sdB5IV:He4 13573 172 3.697 0.035 -1.552 0.062 8.53E-01 

J081342.92+275034.8 08:13:42.92 +27:50:34.8 -5.38343 32.9205 sdB3VI:He2 27634 681 5.500 0.070 -2.294 0.085 1.28E+00 

J081540.66+430524.5 08:15:40.66 +43:05:24.5 131.458 38.0934 sdB4V:He32 37418 195 5.536 0.033 -0.307 0.008 1.52E+01 

J081607.91+480349.7 08:16:07.91 +48:03:49.7 -26.1981 31.9453 sdB2VI:He5 24992 545 5.359 0.058 -2.293 0.085 1.18E+00 

J082751.06+410925.9 08:27:51.06 +41:09:25.9 -21.3762 30.7086 sdO5VII:He36 48823 522 5.503 0.084 0.397 0.023 1.30E+01 

J082802.04+404009.0 08:28:02.04 +40:40:09.0 -182.813 30.9737 sdO4VII:He11 36831 344 5.088 0.050 -2.117 0.114 3.48E+00 

J083006.17+475150.4 08:30:06.17 +47:51:50.4 -6.3029 32.2109 sdB0VII:He3 27934 1042 5.330 0.084 -2.892 0.339 7.32E-01 

J083241.96+483445.1 08:32:41.96 +48:34:45.1 26.2838 29.9143 sdB4V:He5 18422 288 4.241 0.036 -1.521 0.072 2.29E+00 

J083456.98+422053.2 08:34:56.98 +42:20:53.2 13.8063 28.1785 sdB6III:He5 10816 93 3.075 0.028 -1.789 0.187 3.02E+00 

J083842.71+053309.5 08:38:42.71 +05:33:09.5 68.4768 32.1614 sdO9VII:He5 30554 214 5.502 0.047 -2.281 0.083 9.13E-01 

J083935.91+030840.8 08:39:35.91 +03:08:40.8 21.5494 31.5714 sdB0VI:He10 35310 415 4.854 0.057 -2.592 0.170 7.74E-01 

J084122.67+063029.6 08:41:22.67 +06:30:29.6 -11.2503 31.7429 sdB0VI:He6 32357 79 5.620 0.015 -2.141 0.060 8.36E-01 

J084413.77+023229.3 08:44:13.77 +02:32:29.3 268.878 31.7126 sdB6IV:He0 13225 156 4.036 0.033 -2.156 0.124 3.87E+00 

J084556.16+542357.6 08:45:56.16 +54:23:57.6 6.03341 31.7933 sdB5IV:He11 30069 344 5.053 0.036 -1.170 0.026 1.02E+01 

J084727.88+024814.8 08:47:27.88 +02:48:14.8 136.219 28.589 sdB5IV:He2 13429 182 3.539 0.035 -2.100 0.164 1.47E+00 

J085422.40+013651.0 08:54:22.40 +01:36:51.0 -7.05121 26.6296 sdB0VI:He25 34383 203 5.287 0.037 -0.784 0.018 1.70E+00 



J085650.28+401730.9 08:56:50.28 +40:17:30.9 -31.3088 19.8957 sdO4VI:He11 27006 1513 3.878 0.098 -3.000 0.434 1.20E+00 

J085727.66+424215.4 08:57:27.66 +42:42:15.4 192.101 26.8266 sdB3VI:He35 38783 151 5.406 0.025 0.689 0.018 9.33E+00 

J085900.33+023313.1 08:59:00.33 +02:33:13.1 19.9144 33.5645 sdB0VI:He4 33042 250 5.451 0.050 -1.685 0.042 9.75E-01 

J090559.15+055442.1 09:05:59.15 +05:54:42.1 316.248 26.3361 sdA0III:He3 12411 140 3.377 0.044 -1.647 0.192 6.02E+00 

J091225.13+421922.5 09:12:25.13 +42:19:22.5 -82.0724 32.7113 sdB4V:He6 30073 349 5.150 0.044 -2.647 0.193 2.67E+00 

J091544.44+511338.8 09:15:44.44 +51:13:38.8 -19.1172 29.6024 sdB2V:He6 34140 265 5.386 0.051 -1.514 0.028 1.92E+00 

J092436.41+040135.7 09:24:36.41 +04:01:35.7 114.477 29.0217 sdB4IV:He4 15224 225 3.614 0.032 -1.715 0.135 2.78E+00 

J092520.70+470330.6 09:25:20.70 +47:03:30.6 38.5098 33.0974 sdB5VI:He4 29376 480 5.196 0.054 -2.459 0.125 3.43E+00 

J092634.88+473036.0 09:26:34.88 +47:30:36.0 -6.49944 26.6692 sdB3V:He4 17256 215 4.128 0.038 -1.783 0.053 8.97E+00 

J092830.55+561811.8 09:28:30.55 +56:18:11.8 -135.281 31.0954 sdO6VI:He18 41016 436 5.000 0.067 -0.737 0.030 1.50E+00 

J093059.63+025032.4 09:30:59.63 +02:50:32.4 1.1058 33.0569 sdB1VI:He7 30308 171 5.537 0.037 -2.122 0.058 1.11E+00 

J093215.32-002108.5 09:32:15.32 -00:21:08.5 178.664 28.1481 sdB4V:He5 14602 166 3.723 0.036 -2.053 0.147 2.31E+00 

J093245.91+081618.6 09:32:45.91 +08:16:18.6 83.619 33.7491 sdB4VI:He4 32561 239 5.360 0.042 -1.662 0.040 1.54E+00 

J093322.20+440322.7 09:33:22.20 +44:03:22.7 -49.4753 29.8445 sdB4V:He2 16544 229 4.143 0.026 -1.629 0.111 2.44E+00 

J093549.72+544101.0 09:35:49.72 +54:41:01.0 -99.729 34.2498 sdB0VI:He4 36231 258 5.497 0.054 -1.463 0.025 3.48E+00 

J094143.53+535833.4 09:41:43.53 +53:58:33.4 -41.6598 28.0576 sdB5IV:He2 12276 163 3.500 0.041 -2.066 0.152 1.12E+00 

J094346.62+531429.1 09:43:46.62 +53:14:29.1 -150.267 25.7962 sdB1V:He26 35458 147 5.043 0.030 -0.573 0.014 3.93E+00 

J094623.03+040456.1 09:46:23.03 +04:04:56.1 55.3774 34.0528 sdB0VI:He9 37074 124 6.000 0.032 -1.395 0.022 1.01E+00 

J094900.45+025702.9 09:49:00.45 +02:57:02.9 245.283 29.1801 sdB8IV:He0 14000 212 3.251 0.040 -2.259 0.158 1.88E+01 

J095101.29+034757.0 09:51:01.29 +03:47:57.0 136.159 32.8147 sdB1VI:He3 30001 560 5.417 0.056 -2.228 0.073 9.20E-01 

J095847.23+602147.4 09:58:47.23 +60:21:47.4 -298.171 27.6608 sdB5IV:He2 12650 125 3.631 0.031 -1.726 0.115 9.37E-01 

J100019.99-003413.3 10:00:19.99 -00:34:13.3 90.0055 48.3655 sdO3VIII:He15 45730 659 5.151 0.072 -0.716 0.043 4.93E+00 






J100317.05+025510.4 10:03:17.05 +02:55:10.4 203.522 29.2731 sdB3VI:He34 38522 164 5.908 0.032 0.045 0.001 8.16E+00 

J100740.10+454252.5 10:07:40.10 +45:42:52.5 2.52572 33.5207 sdF0IV:He3 30057 388 5.239 0.051 -2.987 0.421 4.46E+00 

J101025.64+045357.0 10:10:25.64 +04:53:57.0 94.7419 30.4631 sdB7II:He5 15489 177 4.171 0.034 -2.103 0.110 8.80E+00 

J101213.21+064030.7 10:12:13.21 +06:40:30.7 96.1234 36.4263 sdO4VII:He26 42824 249 5.928 0.056 -0.568 0.010 2.14E+00 

J101218.95+004413.4 10:12:18.95 +00:44:13.4 -16.2037 36.3286 sdB0VII:He9 35915 227 5.373 0.042 -1.194 0.027 3.83E+00 

J101242.22+484937.4 10:12:42.22 +48:49:37.4 30.3123 31.632 sdB2VI:He4 28000 423 5.109 0.042 -2.195 0.068 1.27E+00 


J101640.84-010900.6 10:16:40.84 -01:09:00.6 7.0939 31.8275 sdO9VI:He5 30235 296 5.000 0.054 -2.659 0.198 2.13E+00 

J102057.16+013751.4 10:20:57.16 +01:37:51.4 238.157 32.448 sdB1VI:He3 28380 289 5.066 0.045 -3.000 0.434 1.43E+00 

J102120.45+444636.9 10:21:20.45 +44:46:36.9 -95.3694 22.147 sdO8VII:He40 45002 100 5.828 0.030 1.261 0.047 4.02E+00 

J102320.37+462026.8 10:23:20.37 +46:20:26.8 38.9026 20.1206 sdO6VI:He35 45164 147 5.814 0.034 1.272 0.057 5.39E+00 

J103022.08+020524.3 10:30:22.08 +02:05:24.3 62.5766 28.0526 sdB7IV:He3 13334 236 3.487 0.045 -2.140 0.120 1.92E+00 

J103549.68+092551.9 10:35:49.68 +09:25:51.9 193.763 23.8682 sdO5VII:He39 45000 147 5.745 0.048 1.473 0.232 1.78E+00 

J103854.02+525847.8 10:38:54.02 +52:58:47.8 49.5557 30.5354 sdB2VI:He2 27999 411 5.289 0.045 -2.582 0.166 9.41E-01 

J104248.95+033355.4 10:42:48.95 +03:33:55.4 370.055 29.0912 sdB0VII:He9 34770 212 5.105 0.041 -2.269 0.081 3.92E+00 

J105608.43+034821.3 10:56:08.43 +03:48:21.3 6.83794 29.6862 sdA0III:He-0 14204 151 3.805 0.037 -1.997 0.129 4.82E+00 

J110053.56+034622.8 11:00:53.56 +03:46:22.8 278.131 37.5145 sdO0VIII:He18 46166 513 5.616 0.056 -1.181 0.039 3.28E+00 

J110215.46+024034.2 11:02:15.46 +02:40:34.2 25.8182 27.2368 sdO3VII:He36 50000 208 5.804 0.066 0.364 0.020 3.51E+00 

J110255.98+521858.2 11:02:55.98 +52:18:58.2 -179.727 20.5426 sdO6VII:He38 45355 380 5.549 0.045 0.588 0.030 3.77E+00 

J110256.32+010012.3 11:02:56.32 +01:00:12.3 301.18 31.4893 sdB4V:He9 15645 228 3.701 0.037 -1.483 0.053 1.35E+01 

J110302.37-010338.7 11:03:02.37 -01:03:38.7 139.329 33.2725 sdO9VII:He4 29649 500 5.096 0.052 -2.579 0.165 1.46E+00 

J110445.01+092530.9 11:04:45.01 +09:25:30.9 182.534 29.6543 sdO5VI:He8 30800 333 4.500 0.059 -2.692 0.214 1.38E+00 

J111438.57-004024.1 11:14:38.57 -00:40:24.1 110.919 35.1207 sdO1VII:He21 42076 569 5.272 0.055 -0.866 0.038 5.76E+00 



J111633.30+052507.9 11:16:33.30 +05:25:07.9 -72.9026 27.5999 sdO9VI:He29 34332 221 4.894 0.037 -0.496 0.010 6.62E+00 

J112056.23+093641.8 11:20:56.23 +09:36:41.8 143.332 30.1321 sdO5VII:He11 35000 357 4.714 0.055 -2.584 0.167 1.68E+00 

J112242.70+613758.5 11:22:42.70 +61:37:58.5 -64.5492 33.3237 sdB3VI:He6 30001 495 5.622 0.043 -2.053 0.049 8.95E-01 

J112504.73+671658.3 11:25:04.73 +67:16:58.3 -88.07 25.0661 sdO9VI:He12 34249 130 4.948 0.033 -1.800 0.082 1.91E+00 

J112719.00+660538.7 11:27:19.00 +66:05:38.7 -8.00635 34.4941 sdB5V:He6 30552 197 5.303 0.042 -2.661 0.199 7.43E+00 

J113312.13+010824.9 11:33:12.13 +01:08:24.9 535.441 27.4911 sdB5IV:He4 11978 183 3.412 0.047 -2.020 0.319 6.60E+00 

J113840.69-003531.8 11:38:40.69 -00:35:31.8 -81.2527 32.8276 sdF5VI:He3 30867 228 5.424 0.048 -2.209 0.070 7.48E-01 

J113935.45+614954.0 11:39:35.45 +61:49:54.0 -3.84451 33.6331 sdB0VI:He3 29767 646 5.000 0.079 -2.995 0.430 2.80E+00 

J114352.74+660723.4 11:43:52.74 +66:07:23.4 -131.878 35.8462 sdB3VII:He2 30474 218 5.389 0.046 -2.096 0.054 6.31E+00 

J114417.53-012914.1 11:44:17.53 -01:29:14.1 286.516 30.0023 sdB4V:He2 13292 160 3.583 0.032 -2.300 0.173 3.84E+00 

J114821.30+033625.8 11:48:21.30 +03:36:25.8 -4.57768 35.4397 sdB5V:He5 33168 178 5.686 0.036 -1.602 0.035 6.88E+00 

J115009.49+061042.1 11:50:09.49 +06:10:42.1 44.1678 35.0436 sdO3V:He19 40640 288 5.147 0.057 -0.844 0.030 8.63E+00 

J115101.04+541003.5 11:51:01.04 +54:10:03.5 -210.473 28.6527 sdO6VI:He8 32000 602 4.484 0.077 -2.241 0.076 2.78E+00 

J115115.19-015255.2 11:51:15.19 -01:52:55.2 108.73 21.8454 sdB6V:He12 16099 329 2.774 0.052 -2.995 0.430 1.35E+01 

J115654.09-032510.2 11:56:54.09 -03:25:10.2 48.0882 26.4144 sdO9V:He14 32698 279 4.499 0.043 -1.126 0.029 2.16E+00 

J115716.38+612410.8 11:57:16.38 +61:24:10.8 -160.892 34.0894 sdB3VII:He1 29999 521 5.224 0.057 -3.000 0.434 2.15E+00 

J120311.26+045419.6 12:03:11.26 +04:54:19.6 317.09 31.3655 sdB8IV:He0 12883 188 3.596 0.033 -2.212 0.142 1.39E+01 

J120626.55+663352.5 12:06:26.55 +66:33:52.5 -83.8421 22.4985 sdO7VII:He39 43748 87 5.875 0.032 2.007 0.132 2.07E+00 

J121123.37+611203.9 12:11:23.37 +61:12:03.9 -147.955 39.0234 sdB7V:He9 32971 192 5.833 0.041 -1.924 0.073 7.23E+00 

J121424.81+550226.3 12:14:24.81 +55:02:26.3 -69.0878 35.2988 sdO5VII:He12 41632 331 5.253 0.046 -1.631 0.056 2.43E+00 

J121625.83-014804.6 12:16:25.83 -01:48:04.6 67.9746 31.4899 sdB1V:He5 12307 116 3.392 0.041 -1.159 0.087 1.45E+01 

J121643.73+020835.9 12:16:43.73 +02:08:35.9 65.6063 30.4367 sdO8VI:He38 40933 127 5.595 0.021 0.065 0.001 1.07E+01 






J122057.48-012642.4 12:20:57.48 -01:26:42.4 -4.416 32.305 sdB9III:He1 18548 245 4.613 0.038 -1.862 0.063 1.10E+01 

J122444.98+583313.9 12:24:44.98 +58:33:13.9 -85.1992 29.6831 sdB2IV:He-0 11106 135 3.124 0.037 -2.291 0.170 1.74E+01 

J122637.12+575927.6 12:26:37.12 +57:59:27.6 -194.41 31.7309 sdB0VII:He3 30382 187 5.350 0.042 -2.718 0.227 2.14E+00 

J123808.66+053318.2 12:38:08.66 +05:33:18.2 25.9705 24.1427 sdO4VII:He38 44999 127 5.352 0.041 2.125 0.289 3.96E+00 

J123821.49-021211.5 12:38:21.49 -02:12:11.5 83.2689 33.3849 sdO1VIII:He29 48017 516 5.158 0.081 0.210 0.008 7.27E+00 

J124706.79-003925.9 12:47:06.79 -00:39:25.9 7.95478 36.143 sdO9VII:He4 36859 318 5.551 0.042 -3.000 0.434 9.50E-01 


J124728.16+562958.3 12:47:28.16 +56:29:58.3 -159.902 33.0009 sdO9VII:He1 24936 896 4.876 0.085 -2.823 0.289 3.60E+00 

J124819.08+035003.3 12:48:19.08 +03:50:03.3 -56.2564 26.8671 sdO6VIII:He16 49129 1057 5.500 0.069 -1.226 0.051 2.62E+00 

J125229.60-030129.6 12:52:29.60 -03:01:29.6 47.7554 33.4044 sdB0V:He6 30736 221 5.454 0.047 -2.345 0.096 1.46E+00 

J125248.84+521604.1 12:52:48.84 +52:16:04.1 -128.9 29.8305 sdB7III:He4 13184 155 3.605 0.032 -1.715 0.090 8.25E+00 

J125328.45+042044.0 12:53:28.45 +04:20:44.0 -1.7345 29.2904 sdB4IV:He3 12250 168 3.500 0.043 -2.231 0.148 2.41E+00 

J125408.32+014324.1 12:54:08.32 +01:43:24.1 2.43989 23.7974 sdO6VII:He40 45000 100 5.999 0.031 1.311 0.053 2.30E+00 

J125410.86-010408.4 12:54:10.86 -01:04:08.4 88.5476 33.4203 sdB3IV:He6 20858 320 4.632 0.039 -1.689 0.042 5.36E+00 

J125941.88-003928.8 12:59:41.88 -00:39:28.8 45.2168 31.1712 sdO5VIII:He15 34375 213 5.149 0.025 -1.570 0.048 2.75E+00 

J130025.53+004530.2 13:00:25.53 +00:45:30.2 69.5908 33.3195 sdO9VII:He16 38249 102 5.756 0.025 -0.845 0.021 1.90E+00 

J130059.21+005711.8 13:00:59.21 +00:57:11.8 -30.0593 28.2369 sdB0VII:He25 37650 234 5.212 0.034 -0.443 0.010 2.90E+00 

J131425.39+011153.4 13:14:25.39 +01:11:53.4 -116.094 28.1469 sdB5IV:He2 13063 134 3.663 0.032 -1.696 0.086 6.81E-01 

J131452.97+023740.3 13:14:52.97 +02:37:40.3 -67.101 30.4253 sdB9IV:He2 17290 316 4.000 0.047 -1.858 0.063 5.24E+00 

J131638.48+034818.5 13:16:38.48 +03:48:18.5 6.70663 27.1188 sdB1V:He9 32528 59 5.250 0.031 -1.376 0.021 2.06E+00 

J131658.35+641522.5 13:16:58.35 +64:15:22.5 -178.639 24.6498 sdB1VI:He14 34296 190 5.544 0.038 -1.427 0.023 2.10E+00 

J131745.80+010450.4 13:17:45.80 +01:04:50.4 60.094 30.6874 sdB1VI:He3 26694 850 4.929 0.089 -2.159 0.063 1.48E+00 

J131916.15-011405.0 13:19:16.15 -01:14:05.0 227.376 30.5863 sdB4VI:He3 16894 205 4.132 0.035 -1.737 0.047 1.39E+00 



J132503.17+043239.4 13:25:03.17 +04:32:39.4 -179.976 30.226 sdB9V:He3 12173 89 3.128 0.023 -0.750 0.047 1.16E+01 

J132556.94-032329.6 13:25:56.94 -03:23:29.6 66.5581 34.6956 sdB5VI:He10 41733 589 5.500 0.064 -1.997 0.129 5.42E+00 

J132619.95+035754.4 13:26:19.95 +03:57:54.4 -82.3389 32.8285 sdO9VI:He13 34266 169 5.312 0.011 -1.206 0.028 1.41E+00 

J133200.96+673325.8 13:32:00.96 +67:33:25.8 -101.751 33.9539 sdO9VII:He7 34458 233 5.474 0.048 -1.532 0.030 2.48E+00 

J133449.26+041014.9 13:34:49.26 +04:10:14.9 83.6733 27.9956 sdB0VII:He29 34610 210 4.778 0.033 -0.136 0.004 5.32E+00 

J133546.10+555429.8 13:35:46.10 +55:54:29.8 -21.676 31.3781 sdB4V:He3 16963 259 4.022 0.039 -1.767 0.051 9.94E+00 

J133757.40-005647.2 13:37:57.40 -00:56:47.2 -159.448 30.4751 sdB6III:He6 12195 126 3.450 0.037 -2.019 0.181 5.28E+00 

J134344.11+465825.3 13:43:44.11 +46:58:25.3 3.03651 30.0884 sdO5VIII:He10 29224 223 4.133 0.047 -3.000 0.434 7.49E+00 

J134545.24-000641.6 13:45:45.24 -00:06:41.6 -3.70993 37.7028 sdO8VII:He28 35903 207 5.268 0.035 -0.269 0.006 3.79E+00 

J134600.55+052034.3 13:46:00.55 +05:20:34.3 -39.3819 33.4071 sdB4V:He0 30255 205 5.216 0.041 -3.000 0.434 1.85E+00 

J134948.30-024639.3 13:49:48.30 -02:46:39.3 -149.397 31.2636 sdB4IV:He3 11972 150 3.635 0.044 -2.284 0.167 7.21E+00 

J135140.69+023429.2 13:51:40.69 +02:34:29.2 20.1812 31.8161 sdO9VI:He4 33545 283 5.190 0.048 -3.000 0.434 1.74E+00 

J135707.35+010454.4 13:57:07.35 +01:04:54.4 -164.908 27.8615 sdO7VIII:He29 31999 274 3.749 0.049 0.597 0.027 4.46E+00 

J135746.59+530758.7 13:57:46.59 +53:07:58.7 -88.3881 32.612 sdB2VI:He2 29997 447 5.104 0.044 -2.552 0.155 3.56E+00 

J140118.74-012024.8 14:01:18.74 -01:20:24.8 -138.416 33.4862 sdO6VII:He7 30768 263 5.000 0.053 -2.479 0.131 2.19E+00 

J140252.20+465918.5 14:02:52.20 +46:59:18.5 -178.233 32.0421 sdB5VI:He6 29111 465 5.131 0.052 -2.315 0.090 5.23E+00 

J140545.26+014419.1 14:05:45.26 +01:44:19.1 -66.9086 31.5385 sdO9VI:He6 28704 421 5.389 0.051 -1.645 0.038 1.07E+00 

J140715.42+033147.6 14:07:15.42 +03:31:47.6 31.4863 28.4411 sdO8VI:He35 49941 546 5.500 0.087 0.572 0.039 6.76E+00 

J140839.10+653124.4 14:08:39.10 +65:31:24.4 -170.614 33.9581 sdB0VI:He4 30105 320 5.122 0.045 -3.000 0.434 2.97E+00 

J141812.51-024427.0 14:18:12.51 -02:44:27.0 -183.21 30.1228 sdO1VIII:He32 50000 209 5.951 0.067 0.364 0.020 2.07E+00 

J142226.93-023100.5 14:22:26.93 -02:31:00.5 129.172 31.5639 sdO6V:He2 14553 198 3.828 0.040 -1.470 0.090 1.70E+01 

J142339.81+014947.3 14:23:39.81 +01:49:47.3 27.9633 33.792 sdB4V:He6 30705 324 5.509 0.049 -1.753 0.049 2.50E+00 






J142416.88-014335.0 14:24:16.88 -01:43:35.0 91.6777 27.5647 sdB3V:He3 14999 189 3.634 0.034 -2.168 0.128 9.53E-01 

J142459.58+031943.4 14:24:59.58 +03:19:43.4 37.3083 33.0024 sdO8VII:He6 34325 274 5.321 0.038 -1.414 0.023 1.74E+00 

J142551.30-013317.3 14:25:51.30 -01:33:17.3 -37.9939 32.3239 sdO2VII:He13 41093 307 5.375 0.046 -1.716 0.045 1.54E+00 

J142956.63+563144.0 14:29:56.63 +56:31:44.0 -91.5749 31.4605 sdB1VI:He3 28557 299 5.068 0.046 -2.598 0.172 1.03E+00 

J143006.23+510314.1 14:30:06.23 +51:03:14.1 -112.744 21.9973 sdO4VII:He37 45000 99 5.767 0.030 1.261 0.047 2.27E+00 

J143153.06-002824.3 14:31:53.06 -00:28:24.3 -31.9216 39.9306 sdO6VIII:He7 35581 232 5.499 0.043 -0.977 0.020 5.72E+00 


J143917.64+010250.8 14:39:17.64 +01:02:50.8 28.1612 30.2275 sdB1VI:He6 19939 368 4.134 0.043 -1.676 0.041 1.34E+00 

J143917.64+010251.1 14:39:17.64 +01:02:51.1 24.1876 30.1996 sdB5V:He3 19464 338 4.116 0.041 -1.709 0.044 2.06E+00 

J144024.72+022118.7 14:40:24.72 +02:21:18.7 -35.8459 36.2062 sdO6VI:He11 34898 91 5.332 0.021 -1.067 0.020 3.55E+00 

J144141.37+450651.4 14:41:41.37 +45:06:51.4 -154.864 30.9137 sdB2VI:He6 19982 484 4.500 0.059 -1.690 0.043 2.50E+00 

J144301.69+514410.3 14:43:01.69 +51:44:10.3 -171.018 29.5335 sdB8IV:He4 19181 308 4.358 0.050 -1.592 0.034 1.22E+00 

J144346.62+491733.7 14:43:46.62 +49:17:33.7 -80.1405 35.8834 sdO8VIII:He12 35572 224 5.421 0.048 -1.450 0.024 1.91E+00 

J144514.93+000249.0 14:45:14.93 +00:02:49.0 0.302733 25.936 sdB1VII:He12 34682 78 5.019 0.033 -1.399 0.033 3.37E+00 

J144709.20+511639.8 14:47:09.20 +51:16:39.8 -148.603 33.0281 sdO7VI:He34 45000 100 5.908 0.031 1.260 0.047 1.74E+01 

J144737.76+020942.6 14:47:37.76 +02:09:42.6 35.4762 34.5196 sdO5VII:He11 32773 127 5.563 0.016 -2.085 0.053 7.14E+00 

J145049.50+624940.9 14:50:49.50 +62:49:40.9 -159.082 31.7249 sdB0VI:He2 28051 771 4.739 0.071 -3.000 0.434 6.07E+00 

J145426.67+472004.4 14:54:26.67 +47:20:04.4 21.6808 32.273 sdB2VI:He5 29437 522 5.478 0.056 -2.180 0.066 9.68E-01 

J145606.42+500155.3 14:56:06.42 +50:01:55.3 -69.5387 32.8387 sdB5V:He8 30764 300 5.343 0.044 -1.598 0.034 1.78E+00 

J145657.73+495310.8 14:56:57.73 +49:53:10.8 -39.8157 24.9202 sdB1VI:He9 33478 223 5.348 0.046 -1.547 0.031 1.62E+00 

J145748.84+561323.5 14:57:48.84 +56:13:23.5 -202.486 30.5689 sdB3VI:He2 12512 109 3.669 0.029 -2.057 0.148 8.49E+00 

J150829.03+494051.0 15:08:29.03 +49:40:51.0 -136.073 34.0157 sdB0VII:He1 27313 913 5.012 0.087 -2.359 0.099 4.82E+00 

J151030.69-014345.9 15:10:30.69 -01:43:45.9 -152.78 22.8201 sdO7VII:He39 44820 94 5.982 0.033 1.866 0.191 2.06E+00 



J151042.06+040955.5 15:10:42.06 +04:09:55.5 -63.2658 33.3369 sdO9VI:He10 34792 231 5.512 0.046 -1.360 0.020 1.64E+00 

J151105.38+515956.4 15:11:05.38 +51:59:56.4 -151.756 33.1678 sdB3VI:He1 18406 281 4.286 0.047 -1.793 0.081 1.09E+01 

J151231.29+005317.7 15:12:31.29 +00:53:17.7 -47.1969 36.6202 sdB1VI:He29 36493 164 5.723 0.031 -0.486 0.010 7.25E+00 

J151306.72+011439.1 15:13:06.72 +01:14:39.1 -93.1191 35.0598 sdB3IV:He2 26002 436 5.037 0.045 -2.301 0.087 3.36E+00 

J151415.66-012925.2 15:14:15.66 -01:29:25.2 -121.987 24.7826 sdO5VII:He39 45000 164 5.687 0.044 0.560 0.029 2.33E+00 

J151617.94+412948.4 15:16:17.94 +41:29:48.4 -181.258 32.9988 sdB3VI:He2 28962 432 5.175 0.052 -2.447 0.122 3.69E+00 

J151743.47+514445.4 15:17:43.47 +51:44:45.4 -79.9052 32.279 sdO8VII:He10 34518 236 5.299 0.046 -1.494 0.027 2.61E+00 

J151808.48+041043.7 15:18:08.48 +04:10:43.7 -38.9928 26.0169 sdO9VI:He14 35439 221 5.426 0.048 -1.549 0.031 1.38E+00 

J151847.69+551154.2 15:18:47.69 +55:11:54.2 -34.4602 29.707 sdB5V:He3 17317 318 4.000 0.047 -1.605 0.035 2.56E+00 

J152332.82+353237.0 15:23:32.82 +35:32:37.0 11.3484 34.981 sdB3VI:He24 36703 193 5.500 0.035 -0.685 0.016 2.50E+00 

J152607.88+001640.8 15:26:07.88 +00:16:40.8 87.1892 31.4734 sdO5VI:He34 43975 320 5.050 0.042 -0.072 0.004 2.08E+00 

J152833.16+440009.7 15:28:33.16 +44:00:09.7 45.6215 28.5694 sdB7III:He2 12000 184 3.413 0.049 -2.108 0.223 2.66E+00 

J153056.33+024222.6 15:30:56.33 +02:42:22.6 -69.1411 21.3778 sdO6VII:He39 44761 109 6.000 0.029 1.163 0.019 1.62E+00 

J153204.36+324152.7 15:32:04.36 +32:41:52.7 -152.653 30.2446 sdB0VI:He38 39980 117 5.783 0.018 -0.194 0.004 1.69E+01 

J153217.24+454621.0 15:32:17.24 +45:46:21.0 -56.1277 31.2027 sdO8VII:He9 33385 349 5.063 0.024 -1.818 0.057 2.48E+00 

J153411.10+543345.3 15:34:11.10 +54:33:45.3 -89.259 33.0638 sdB0VI:He3 31907 321 5.106 0.048 -2.394 0.108 1.78E+00 

J153508.52+032456.3 15:35:08.52 +03:24:56.3 11.5925 26.1501 sdB4II:He10 13657 282 3.012 0.047 -1.400 0.065 1.69E+01 

J154043.10+435950.1 15:40:43.10 +43:59:50.1 -70.6263 24.0808 sdO7VII:He38 44843 123 5.904 0.033 1.218 0.050 2.40E+00 

J154338.69+001202.1 15:43:38.69 +00:12:02.1 -29.8052 32.9385 sdB2VI:He2 29654 499 5.247 0.057 -2.589 0.169 2.86E+00 

J154531.02+563944.7 15:45:31.02 +56:39:44.7 -104.244 30.7923 sdB3VI:He5 26509 389 4.933 0.053 -2.039 0.047 1.55E+00 

J154809.97-004931.4 15:48:09.97 -00:49:31.4 -28.7556 35.4187 sdO9VI:He4 32645 255 5.499 0.049 -2.978 0.413 1.42E+00 

J154830.67+003656.7 15:48:30.67 +00:36:56.7 -126.171 29.7408 sdB4IV:He4 11429 135 3.277 0.035 -0.911 0.052 9.04E-01 






J155628.35+011335.0 15:56:28.35 +01:13:35.0 67.4947 34.8931 sdB3V:He2 30988 255 5.425 0.049 -3.000 0.434 8.72E-01 

J155642.95+501537.5 15:56:42.95 +50:15:37.5 -173.848 29.5376 sdO2VII:He33 50000 208 5.852 0.066 0.364 0.020 1.69E+00 

J160241.13-001207.1 16:02:41.13 -00:12:07.1 -69.1264 35.5392 sdO7VII:He2 31639 283 5.111 0.046 -2.624 0.183 4.62E+00 

J160759.27+383746.4 16:07:59.27 +38:37:46.4 -49.3069 34.2912 sdO9VII:He14 34081 201 5.542 0.035 -0.941 0.022 1.90E+00 

J160810.18+425845.1 16:08:10.18 +42:58:45.1 -163.815 33.3498 sdB6V:He0 31066 258 5.316 0.047 -3.000 0.434 5.77E+00 

J161328.22+004703.2 16:13:28.22 +00:47:03.2 -149.526 32.9946 sdB7IV:He5 21059 530 4.379 0.062 -2.274 0.082 5.34E+00 


J161418.97+261628.8 16:14:18.97 +26:16:28.8 106.961 33.64 sdB5V:He5 28481 296 5.192 0.048 -2.471 0.128 4.44E+00 

J161627.11-002933.0 16:16:27.11 -00:29:33.0 -0.0543509 28.2373 sdO7VI:He39 45000 128 5.573 0.046 2.083 0.210 6.30E+00 

J161631.29-003853.3 16:16:31.29 -00:38:53.3 52.866 36.9722 sdB1VII:He6 33369 346 5.202 0.027 -1.568 0.032 2.41E+00 

J162250.09+002631.9 16:22:50.09 +00:26:31.9 -15.4032 35.4912 sdB2VI:He5 30731 222 5.410 0.047 -2.238 0.075 1.79E+00 

J162256.66+473051.1 16:22:56.66 +47:30:51.1 -68.6 32.3725 sdB2VI:He5 29637 557 5.500 0.059 -1.764 0.050 7.12E-01 

J162310.50+425831.2 16:23:10.50 +42:58:31.2 -8.93843 34.4132 sdB6IV:He0 35771 59 5.768 0.025 -2.504 0.139 6.83E+00 

J162359.61+375435.3 16:23:59.61 +37:54:35.3 -273.35 27.2163 sdB5V:He3 12662 107 3.617 0.032 -1.457 0.124 1.16E+00 

J162535.78+362039.3 16:25:35.78 +36:20:39.3 -218.657 34.2576 sdB5V:He3 30223 254 5.481 0.048 -2.371 0.102 8.39E+00 

J162616.71+380710.5 16:26:16.71 +38:07:10.5 -45.5658 21.99 sdO6VII:He40 44792 98 6.000 0.031 1.556 0.094 1.87E+00 

J162628.92+370448.6 16:26:28.92 +37:04:48.6 -51.0648 27.457 sdB6IV:He2 12281 171 3.500 0.043 -2.359 0.198 1.25E+00 

J162711.81-000950.9 16:27:11.81 -00:09:50.9 34.3853 29.4658 sdB3VII:He34 38699 179 5.927 0.027 0.121 0.003 8.53E+00 

J163148.85+372617.2 16:31:48.85 +37:26:17.2 -92.9749 28.006 sdB5III:He2 14660 186 3.539 0.035 -1.962 0.119 2.44E+00 

J163306.58+003216.3 16:33:06.58 +00:32:16.3 -61.1765 34.9432 sdB2VI:He1 31471 274 5.390 0.050 -3.000 0.434 1.45E+00 

J163446.48-005345.6 16:34:46.48 -00:53:45.6 52.9502 34.1242 sdB4V:He1 29977 519 5.293 0.060 -3.000 0.434 2.42E+00 

J163509.13+000235.0 16:35:09.13 +00:02:35.0 41.7581 34.7076 sdB6VI:He1 28046 733 5.097 0.059 -3.000 0.434 3.41E+00 

J163702.79-011351.7 16:37:02.79 -01:13:51.7 -20.6207 26.3892 sdO7VI:He39 45001 99 5.664 0.030 1.261 0.047 2.51E+00 



J163800.17+010259.7 16:38:00.17 +01:02:59.7 -108.217 29.5044 sdB6V:He3 17182 323 4.000 0.047 -2.192 0.135 1.50E+00 

J163815.97-001919.2 16:38:15.97 -00:19:19.2 -136.585 31.9708 sdB4V:He2 20734 493 4.617 0.045 -2.172 0.064 4.45E+00 

J163913.62+384957.1 16:39:13.62 +38:49:57.1 -66.4781 29.4706 sdB3V:He2 16016 225 3.671 0.037 -1.574 0.049 3.06E+00 

J163936.03+343230.6 16:39:36.03 +34:32:30.6 -228.974 26.0607 sdB2IV:He15 26684 424 4.531 0.046 -1.121 0.023 1.44E+00 

J164042.91+311734.6 16:40:42.91 +31:17:34.6 -369.398 41.0764 sdB1VII:He33 30989 270 3.876 0.033 0.945 0.008 1.04E+01 

J164122.33+334452.1 16:41:22.33 +33:44:52.1 -49.8183 33.4568 sdO9VI:He7 29235 394 5.530 0.043 -2.085 0.053 1.54E+00 

J164204.38+440303.3 16:42:04.38 +44:03:03.3 -336.658 30.8894 sdO9VII:He5 30160 380 4.958 0.058 -2.545 0.152 2.84E+00 

J164326.04+330113.2 16:43:26.04 +33:01:13.2 -66.2194 33.4823 sdB2VI:He3 29898 554 5.500 0.059 -2.254 0.078 1.22E+00 

J164419.45+452326.8 16:44:19.45 +45:23:26.8 -356.849 35.8627 sdB1VI:He5 32287 245 5.499 0.047 -2.080 0.052 2.67E+00 

J164444.94+312345.4 16:44:44.94 +31:23:45.4 -64.8034 31.1266 sdO8VII:He7 32067 275 5.483 0.051 -3.000 0.434 2.03E+00 

J165022.05+312749.7 16:50:22.05 +31:27:49.7 8.82859 29.3108 sdB2VI:He26 35685 174 5.204 0.032 0.118 0.003 3.37E+00 

J165404.27+303701.8 16:54:04.27 +30:37:01.8 134.707 31.3034 sdB1VI:He6 27306 638 5.502 0.067 -2.177 0.065 1.04E+00 

J165422.26+631534.3 16:54:22.26 +63:15:34.3 -20.424 35.2667 sdB2VI:He5 34568 196 5.672 0.037 -1.387 0.021 7.54E-01 

J165424.30+303941.3 16:54:24.30 +30:39:41.3 -249.1 28.7513 sdB7IV:He0 13840 313 3.500 0.055 -3.000 0.434 9.77E+00 

J165841.83+413115.6 16:58:41.83 +41:31:15.6 -40.2591 30.1816 sdB2VI:He8 32230 294 5.038 0.037 -1.819 0.057 1.59E+00 

J170045.67+604308.5 17:00:45.67 +60:43:08.5 -270.047 28.8328 sdO4VII:He36 48271 357 5.869 0.071 0.731 0.052 5.12E+00 

J170356.68+341505.0 17:03:56.68 +34:15:05.0 -83.8331 32.767 sdB3VI:He3 28252 453 5.000 0.069 -2.653 0.195 1.91E+00 

J170714.27+654025.6 17:07:14.27 +65:40:25.6 -92.8553 34.0747 sdB4VI:He7 34656 250 5.350 0.049 -1.539 0.030 1.04E+00 

J171424.17+614711.0 17:14:24.17 +61:47:11.0 -12.9255 33.6796 sdB0VII:He2 32113 394 4.999 0.065 -3.000 0.434 1.81E+00 

J171629.93+575121.2 17:16:29.93 +57:51:21.2 -312.791 35.8141 sdO5VIII:He21 34186 229 5.465 0.038 -0.798 0.019 6.08E+00 

J171722.10+580558.9 17:17:22.10 +58:05:58.9 -110.822 33.3324 sdO9VI:He9 34248 187 5.153 0.021 -1.578 0.049 1.84E+00 

J171813.87+595355.2 17:18:13.87 +59:53:55.2 -109.876 28.7311 sdB5IV:He2 13053 128 3.685 0.032 -2.096 0.163 2.32E+00 






J171929.52+273229.3 17:19:29.52 +27:32:29.3 -82.9454 31.6473 sdB1VI:He4 30755 350 4.961 0.061 -3.000 0.434 2.30E+00 

J171947.87+591604.3 17:19:47.87 +59:16:04.3 89.6508 29.0229 sdB6IV:He2 16126 217 3.854 0.048 -1.194 0.068 2.05E+00 

J172037.66+534009.4 17:20:37.66 +53:40:09.4 -72.6856 35.312 sdO5VI:He7 29304 477 5.200 0.054 -2.384 0.105 8.30E+00 

J172338.54+601444.1 17:23:38.54 +60:14:44.1 -41.3729 24.0531 sdO9VI:He12 34424 79 5.235 0.024 -1.353 0.029 2.21E+00 

J203729.93+001954.1 20:37:29.93 +00:19:54.1 -79.0277 25.4161 sdO8VII:He9 33681 230 5.173 0.046 -1.762 0.050 1.58E+00 

J203826.42+010953.5 20:38:26.42 +01:09:53.5 -113.742 33.1624 sdB4V:He3 29406 458 5.398 0.063 -2.442 0.120 2.16E+00 


J204546.82-054355.7 20:45:46.82 -05:43:55.7 -44.8684 28.8934 sdO9VI:He8 32441 54 5.167 0.031 -1.477 0.026 3.53E+00 

J204658.84-055100.1 20:46:58.84 -05:51:00.1 -57.0742 33.6601 sdB2VI:He3 33102 231 5.450 0.047 -1.545 0.030 1.77E+00 

J204726.94-060325.8 20:47:26.94 -06:03:25.8 -1.97117 33.4697 sdB2VI:He13 35395 177 5.648 0.037 -1.330 0.028 3.50E+00 

J205030.40-061957.9 20:50:30.40 -06:19:57.9 -489.474 28.5065 sdO5VI:He38 45663 558 5.545 0.047 0.388 0.023 8.00E+00 

J210454.89+110645.6 21:04:54.89 +11:06:45.6 -41.5566 31.6509 sdO5VIII:He8 35000 210 5.086 0.037 -3.000 0.434 2.34E+00 

J211045.16+000142.1 21:10:45.16 +00:01:42.1 -103.57 31.6623 sdO7VII:He10 33998 369 4.976 0.061 -2.257 0.079 1.96E+00 

J211104.97+091042.9 21:11:04.97 +09:10:42.9 157.456 32.7679 sdB4V:He3 29504 485 5.321 0.059 -2.820 0.287 4.77E+00 

J211318.37+001738.4 21:13:18.37 +00:17:38.4 -14.4957 21.8039 sdO7VII:He38 45000 100 5.905 0.031 1.475 0.078 1.70E+00 

J211338.31-000940.7 21:13:38.31 -00:09:40.7 -23.6116 37.4435 sdB0VII:He7 36649 409 5.500 0.053 -2.148 0.061 3.58E+00 

J211339.69+100640.4 21:13:39.69 +10:06:40.4 -65.7901 26.6768 sdO6VII:He9 32140 432 4.916 0.061 -2.154 0.062 2.72E+00 

J211425.02+005517.6 21:14:25.02 +00:55:17.6 14.6469 36.5492 sdO9VII:He17 36832 199 5.805 0.038 -1.058 0.020 5.22E+00 

J211651.96-003328.5 21:16:51.96 -00:33:28.5 11.2966 32.9385 sdB4V:He5 28210 447 5.495 0.064 -2.409 0.111 6.76E+00 

J211921.36+005749.8 21:19:21.36 +00:57:49.8 -49.1174 20.1741 sdO5VII:He38 44901 114 6.000 0.039 1.816 0.170 1.47E+00 

J213112.24+112936.2 21:31:12.24 +11:29:36.2 3.08591 34.7669 sdO8VII:He9 35450 179 5.595 0.036 -1.432 0.023 1.28E+00 

J213718.87+123303.3 21:37:18.87 +12:33:03.3 -106.859 28.2812 sdB6IV:He2 14718 176 3.601 0.032 -1.630 0.056 2.40E+00 

J213808.12+105741.8 21:38:08.12 +10:57:41.8 26.7674 26.7784 sdB2VI:He10 34133 180 5.095 0.017 -1.620 0.036 3.81E+00 



J215049.19+010338.4 21:50:49.19 +01:03:38.4 34.2033 36.462 sdB1V:He6 34388 84 5.351 0.023 -1.279 0.033 6.39E+00 

J215053.84+131650.6 21:50:53.84 +13:16:50.6 -108.905 33.9992 sdB1VI:He5 30233 248 5.423 0.047 -2.222 0.072 1.65E+00 

J215227.25+115726.7 21:52:27.25 +11:57:26.7 5.51462 32.5347 sdB3VI:He5 34586 272 5.273 0.041 -1.354 0.029 3.73E+00 

J215307.34-071948.4 21:53:07.34 -07:19:48.4 -13.1178 35.7679 sdB1VI:He7 32449 166 5.683 0.037 -1.996 0.086 1.58E+00 

J215631.56+121237.7 21:56:31.56 +12:12:37.7 -66.6073 23.9981 sdO7VII:He40 45581 220 5.891 0.035 1.303 0.052 3.56E+00 

J220403.45+122507.3 22:04:03.45 +12:25:07.3 -153.325 28.335 sdB7IV:He1 11740 170 3.500 0.049 -2.050 0.244 1.14E+00 

J220810.05+115913.9 22:08:10.05 +11:59:13.9 -69.4664 29.3261 sdB2VI:He7 26862 735 5.000 0.076 -2.150 0.061 3.14E+00 

J221816.78+121400.7 22:18:16.78 +12:14:00.7 -74.2256 32.2427 sdB7IV:He4 25415 605 5.000 0.066 -1.422 0.023 2.25E+00 

J222238.69+005125.0 22:22:38.69 +00:51:25.0 -114.061 30.8319 sdO9VII:He11 33853 422 4.627 0.058 -3.000 0.434 1.20E+00 

J222932.81-004822.5 22:29:32.81 -00:48:22.5 -18.7916 34.4566 sdO9VII:He8 34802 243 5.328 0.044 -1.360 0.030 1.69E+00 

J223008.26+132734.2 22:30:08.26 +13:27:34.2 -30.7785 29.1779 sdB5IV:He2 14433 235 3.762 0.036 -2.304 0.175 3.47E+00 

J223839.13+122517.9 22:38:39.13 +12:25:17.9 -96.1084 31.3337 sdB0IV:He6 30656 195 5.244 0.042 -3.000 0.434 5.02E+00 

J224105.19+141810.2 22:41:05.19 +14:18:10.2 -202.31 30.0845 sdB9III:He0 13763 258 3.501 0.052 -2.669 0.203 1.51E+01 

J231956.10-093937.6 23:19:56.10 -09:39:37.6 16.0919 28.428 sdO9VI:He16 35985 299 4.845 0.049 -0.774 0.023 3.23E+00 

J233914.00+134214.3 23:39:14.00 +13:42:14.3 -411.792 25.2673 sdO5VI:He39 45402 513 5.500 0.077 0.607 0.030 4.55E+00 

J234421.80-101142.8 23:44:21.80 -10:11:42.8 -72.6667 27.5984 sdB7IV:He2 11132 98 3.217 0.026 -0.954 0.054 7.84E-01 

J234853.52+151215.5 23:48:53.52 +15:12:15.5 -69.1169 32.1839 sdB0VI:He5 30651 167 5.556 0.035 -2.315 0.090 6.37E-01 

J235108.66+002623.0 23:51:08.66 +00:26:23.0 -195.75 28.5503 sdB0VII:He30 38778 177 5.407 0.039 -0.288 0.007 4.21E+00 


Appendix C 

Results for 83 2MASS-Selected 

Hot Subdwarf Candidates 

Parameters and classifications are listed in this table for the 2MASS-selected stars obtained from E.M. Green (Green et al., 2006). The internal errors of SFIT are given along with the value of χ² for the best fit.

Table C.1: Results for 83 2MASS-Selected Hot Subdwarf Candidates

Identifier (2MASX J-)  Classification  T_eff (K)  δT_eff (K)  log g (cgs)  δlog g (cgs)  log(n_He/n_H)  δlog(n_He/n_H)  χ²

Balloon 090900004 sdO7VII:He11 31147 278 4.757 0.054 -1.811 0.056 1.77E+00 

BD+48 2721 sdB2VI:He6 22979 240 5.267 0.032 -1.629 0.018 2.54E+00 

J011407.62+160800.6 sdB8VI:He5 10795 61 3.156 0.016 -0.368 0.007 1.16E+00 

J020656.17+143858.6 sdB1VII:He7 29873 484 5.850 0.046 -1.897 0.034 1.70E+00 

J021555.50+234314.3 sdB1VI:He8 32485 57 5.623 0.008 -0.758 0.010 2.60E+00 

J021619.04+275902.0 sdB1VI:He6 27594 292 5.719 0.034 -2.100 0.055 1.66E+00 

J021742.16+280329.5 sdO9VII:He8 32698 196 5.838 0.033 -1.341 0.019 1.34E+00 

J022512.51+234820.7 sdO6VII:He13 38384 119 6.000 0.030 -1.417 0.023 1.86E+00 

J030725.66+175248.0 sdB2V:He10 28000 352 5.095 0.030 -0.701 0.010 2.94E+00 

J041550.17+015421.0 sdB0VII:He9 32883 197 5.943 0.035 -1.390 0.011 1.38E+00 

J042034.85+012041.0 sdO6VII:He38 40547 120 5.117 0.035 1.301 0.017 3.44E+00 

J043037.82-010308.3 sdB5VI:He3 13447 91 3.640 0.028 -0.293 0.004 8.08E-01 

J074722.07+622545.2 sdB3VI:He12 27665 271 5.752 0.031 -0.696 0.006 8.00E+00 





J075407.66+651540.2 sdB4V:He5 11002 53 2.750 0.020 0.000 0.000 1.42E+00 

J075815.66+514348.0 sdB3V:He7 11175 56 2.706 0.020 0.000 0.000 1.67E+00 

J080245.68+474817.7 sdB5VII:He2 9783 93 3.797 0.034 -0.954 0.093 1.50E+00 

J082643.33+330859.2 sdB4V:He7 18872 171 4.356 0.034 -0.827 0.009 2.38E+00 

J082822.23+295131.3 sdB3V:He7 16877 137 4.122 0.022 -0.723 0.013 1.60E+00 

J083127.37+422201.7 sdA5VI:He2 10463 50 3.179 0.013 -0.395 0.015 1.39E+00 

J083320.34+202424.8 sdB4VI:He14 22956 173 5.216 0.026 -0.407 0.004 6.77E+00 

J083535.58+194412.6 sdB3VI:He6 27775 429 5.784 0.041 -1.846 0.030 1.64E+00 

J083734.74+672413.6 sdB0V:He17 30001 392 4.688 0.041 -0.578 0.009 4.39E+00 

J083909.92+182416.6 sdB4V:He5 10235 43 2.698 0.017 -0.079 0.002 1.78E+00 

J084447.93+404426.5 sdB4V:He6 12351 59 3.321 0.024 0.001 0.000 1.29E+00 

J084535.67+194150.3 sdB3VI:He7 22899 242 5.236 0.033 -1.309 0.009 1.97E+00 

J084937.68+234847.3 sdB3VI:He5 18128 162 4.653 0.029 -1.348 0.019 2.31E+00 

J085148.86+434402.5 sdB7VI:He4 10292 49 3.139 0.014 -0.368 0.007 1.55E+00 

J085649.27+170114.7 sdB1VI:He5 29527 276 5.747 0.035 -2.111 0.056 1.47E+00 

J090158.77+395931.3 sdB6VI:He2 11188 80 3.359 0.026 -0.257 0.006 1.13E+00 

J091206.53+091621.7 sdB2V:He10 27999 442 4.689 0.041 -0.672 0.012 1.86E+00 

J091706.65+541817.3 sdB5VI:He2 11016 71 3.157 0.019 -0.122 0.002 8.49E-01 

J091751.45+615630.1 sdB4V:He5 10286 32 2.662 0.014 0.159 0.002 2.01E+00 

J092116.62+023741.0 sdB3VI:He15 27144 227 5.295 0.030 -0.300 0.003 6.12E+00 

J092246.92+001741.0 sdB4V:He5 11427 69 2.797 0.019 -0.000 0.000 1.39E+00 

J093112.84+051040.4 sdB5VI:He3 10558 56 3.376 0.018 -0.278 0.005 1.11E+00 

J093150.58+031848.0 sdB6VI:He1 10097 62 3.303 0.027 -0.645 0.018 1.34E+00 

J093426.95+821304.3 sdB7IV:He7 9909 57 2.917 0.025 0.297 0.012 5.71E+00 

J093453.32+841851.5 sdO7VII:He37 42886 87 5.789 0.030 1.526 0.015 1.38E+00 

J093832.18+041343.9 sdB7V:He1 10290 71 3.306 0.026 -0.553 0.013 1.64E+00 

J093935.15+104321.9 sdB3V:He6 10868 53 2.793 0.020 0.000 0.000 1.43E+00 

J094047.71+185332.9 sdB4VI:He5 10589 44 2.823 0.014 0.308 0.005 1.64E+00 

J094105.31-004755.8 sdO4VII:He33 45003 121 5.312 0.043 0.000 0.000 2.53E+00 

J094107.57+375342.6 sdB3V:He7 15712 55 3.398 0.008 -0.597 0.008 1.31E+00 

J094353.47+783140.7 sdB2VI:He5 27999 382 5.662 0.036 -2.105 0.055 1.92E+00 

J094509.99+553450.2 sdB4V:He6 18717 165 4.251 0.031 -0.898 0.014 1.65E+00 

J094637.19+351755.8 sdB7VI:He3 11021 69 3.387 0.021 -0.140 0.005 1.01E+00 

J095219.06+441941.9 sdB4V:He7 13014 73 3.197 0.025 0.046 0.000 1.24E+00 

J095708.88+223055.6 sdB4V:He5 10910 45 2.714 0.012 0.272 0.004 1.45E+00 

J095854.23+360314.3 sdF8VI:He2 9618 68 3.384 0.022 -0.467 0.019 1.81E+00 

J095855.78-044413.9 sdB7IV:He6 9321 48 2.795 0.021 -0.368 0.012 3.84E+00 




J095859.91+082504.4 sdB6VII:He4 9806 74 3.560 0.013 -0.319 0.015 1.35E+00 

J100058.89+024804.4 sdB4V:He8 17115 148 4.229 0.026 -0.632 0.009 2.76E+00 

J100145.47+375733.2 sdB3VI:He3 9842 50 3.190 0.011 -0.336 0.016 1.21E+00 

J100509.89+384615.2 sdB6VII:He2 10547 70 3.793 0.030 -0.700 0.021 9.22E-01 

J100607.62+005326.2 sdB8V:He3 11614 83 3.096 0.017 -0.368 0.009 1.27E+00 

J100739.11+202546.7 sdB5VII:He5 10000 51 3.748 0.014 0.065 0.002 1.27E+00 

J104130.43+184209.8 sdB0VII:He8 32521 166 5.635 0.030 -1.401 0.022 1.15E+00 

J104653.08+515435.9 sdO8VII:He9 30750 262 4.799 0.053 -1.978 0.083 9.67E-01 

J104912.91+380014.9 sdB2V:He9 20087 213 4.130 0.030 -0.718 0.015 1.70E+00 

J111631.06+305838.7 sdB4V:He5 9474 52 2.754 0.021 -0.368 0.012 1.84E+00 

J111719.94+241207.1 sdB4V:He5 11157 73 2.806 0.023 -0.292 0.006 1.10E+00 

J111819.13+093144.4 sdA2V:He5 12109 52 3.182 0.022 -0.131 0.003 1.04E+00 

J112129.35+111917.0 sdB4V:He6 12729 67 3.204 0.024 0.056 0.000 1.14E+00 

J112832.64+603859.3 sdF5V:He3 10302 52 3.025 0.017 -0.307 0.005 1.53E+00 

J113435.70+664252.6 sdB4V:He6 12712 67 3.131 0.023 0.057 0.000 1.07E+00 

J113633.63+750653.7 sdO9VII:He7 35699 62 6.000 0.016 -1.672 0.041 1.43E+00 

J113837.54+250043.4 sdB5IV:He4 10255 43 2.594 0.016 -0.136 0.003 2.43E+00 

J114454.50+031550.2 sdA7V:He4 9764 53 3.118 0.021 -0.368 0.010 1.37E+00 

J122617.00+774312.4 sdB1VI:He6 28443 239 5.889 0.035 -1.996 0.043 2.38E+00 

J122745.99+113636.1 sdB6IV:He6 10356 40 2.666 0.017 0.030 0.000 3.30E+00 

J122843.58+282036.6 sdB4VI:He5 10230 46 2.844 0.018 -0.083 0.002 1.87E+00 

J123014.92+463720.0 sdB5V:He6 12513 44 3.084 0.018 0.154 0.003 1.15E+00 

J125049.06+743943.5 sdB2VI:He5 27913 436 5.654 0.038 -1.993 0.043 2.28E+00 

J131359.98+183131.3 sdB6VI:He3 10504 52 3.505 0.017 -0.083 0.003 1.15E+00 

J132546.78+400827.0 sdB3V:He8 11653 68 2.761 0.023 0.122 0.002 1.50E+00 

J132546.78+400827.0 sdB3V:He8 16440 74 4.149 0.014 -0.641 0.023 5.48E+00 

J132546.78+400827.0 sdB4V:He6 11653 68 2.761 0.023 0.122 0.002 1.50E+00 

J132546.78+400827.0 sdB4V:He6 16440 74 4.149 0.014 -0.641 0.023 5.48E+00 

J135515.91+533442.5 sdB3VI:He12 25842 231 5.254 0.030 -0.673 0.008 2.54E+00 

J135648.63+210510.1 sdB4V:He5 10516 42 2.609 0.012 0.135 0.002 1.45E+00 

J140123.40+742150.5 sdB4V:He7 16490 73 4.140 0.014 -0.689 0.026 1.77E+00 

J142127.88+712421.4 sdB2VI:He5 25982 319 5.847 0.037 -2.346 0.096 2.57E+00 

J143155.38+172404.9 sdA7V:He3 10112 55 2.881 0.019 -0.404 0.015 1.19E+00 

J145239.03+412618.1 sdB7V:He10 9479 31 2.852 0.021 0.439 0.005 5.63E+00 

J152653.06+794130.7 sdB0VI:He6 32936 67 5.770 0.015 -2.235 0.075 1.69E+00 
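Because Table C.1 is plain whitespace-separated text, subsets of it can be extracted with standard tools. The following is a minimal sketch, not part of the original analysis, assuming the rows have been saved one per line in the column order of the table header. The three sample rows are copied from the table, and the function name he_rich is purely illustrative. Since some identifiers contain a space (e.g. BD+48 2721), fields are counted from the end of each line.

```shell
#!/bin/sh
# he_rich: read Table C.1 rows on stdin and print identifier, classification
# and T_eff for stars with log(n_He/n_H) > 0, i.e. helium more abundant than
# hydrogen by number. Columns are counted from the right: chi^2 is $NF,
# log(n_He/n_H) is $(NF-2), T_eff is $(NF-6), classification is $(NF-7).
he_rich() {
    awk 'NF >= 9 && $(NF-2) + 0 > 0 {
        id = $1
        for (i = 2; i <= NF - 8; i++) id = id " " $i   # rejoin split identifiers
        print id, $(NF-7), $(NF-6)
    }'
}

printf '%s\n' \
    'BD+48 2721 sdB2VI:He6 22979 240 5.267 0.032 -1.629 0.018 2.54E+00' \
    'J042034.85+012041.0 sdO6VII:He38 40547 120 5.117 0.035 1.301 0.017 3.44E+00' \
    'J093453.32+841851.5 sdO7VII:He37 42886 87 5.789 0.030 1.526 0.015 1.38E+00' |
    he_rich
```

Of the three sample rows, only the two sdO stars have a positive helium abundance and are printed.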


Appendix D 

The Armagh Observatory Cluster 

Over the course of this project, Armagh Observatory, as part of the CosmoGrid 1 initiative, 

acquired a dedicated computing cluster which I helped to set up and administer. 

The software configuration used by the cluster at the time of writing is documented 

herein. 

D.1 Hardware Configuration 

The cluster presently consists of sixteen vertically mounted Blade nodes: one master 

node, and fifteen slave nodes. Each slave node contains: 

• Two Intel Xeon 3GHz processors each with 1MB cache 

• 2GB RAM 

• One 40GB Maxtor SATA UDMA/133 hard drive 

• One Broadcom BCM5721 1000Base-T PCI Express NIC 

1 http://www.cosmogrid.ie/

The master node has the same basic hardware configuration as the slaves except for:

• Two 240GB Maxtor SATA UDMA/133 hard drives 

• One CDRW/DVDR drive 

• One floppy disk drive 

• Two 1000Base-T network cards 

All of the nodes are interlinked by one 24 port gigabit ethernet switch, and are 

connected to one 16 port KVM unit. 

D.2 Software Configuration 

System Software 

The operating system used on all of the nodes is currently Red Hat Enterprise Linux 

AS release 3 (Taroon Update 3). 

The following software packages form the core of the cluster setup: 

• Condor 2 version 6.6.10 

• Intel Fortran Compiler Version 8.1 

• MPICH 1.2.4 

• Ganglia 3.0 

User Account Management 

User accounts are managed centrally on the master node by editing /etc/passwd and /etc/shadow using the standard account management tools. Once any changes to the user accounts have been made, /etc/passwd and /etc/shadow must be refreshed on all of the slave nodes by using the brcp and brsh commands.

2 http://www.cs.wisc.edu/condor/

Home Directories 

The central partition of user home directories is located on the master node, and is 

shared out to all the slave nodes using NFS. This creates a single storage domain for the 

cluster, allowing user jobs running on the slave nodes to read/write data from/to the 

user’s home directory, thus avoiding the need for any bothersome manual file transfer 

operations. 

Each user has a disk space quota of 10GB. 

Condor 

Condor is a specialised batch system for managing compute-intensive jobs. Like most 

batch systems, Condor provides a queuing mechanism, scheduling policy, priority scheme, 

and resource classifications. Users submit their compute jobs to Condor, Condor puts 

the jobs in a queue, runs them, and then informs the user as to the result. 

A Condor cluster is comprised of a single machine which serves as the central manager, 

and an arbitrary number of other machines that are part of the cluster. Conceptually, 

the cluster is a collection of resources (machines) and resource requests (jobs). 

The role of Condor is to match waiting requests with available resources. Every part 

of Condor sends periodic updates to the central manager, the centralised repository of 

information about the state of the cluster. Periodically, the central manager assesses 

the current state of the cluster and tries to match pending requests with appropriate 

resources. 

The basic Condor setup for the Armagh Observatory cluster nominates the master node as the central manager for the cluster, with the slave nodes functioning as dedicated computing resources. No jobs are permitted to run on the master.

Directory Layout And NFS Shares 

The Condor software is installed solely on the master in the directory /opt/condor-6.6.10. As the name of this directory is dependent on the version of Condor installed, a symbolic link called /opt/condor points to whatever directory contains the latest version. This symbolic link has been added to /etc/exports, and the Condor installation directory is shared out to all the slaves over NFS.
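The version-independent link can be repointed after an upgrade without touching any configuration that refers to /opt/condor. The following is a minimal sketch that works in a scratch directory standing in for /opt; the function name switch_release and the release number 6.6.11 are hypothetical and used only for illustration.

```shell
#!/bin/sh
# switch_release ROOT DIRNAME: make ROOT/condor point at ROOT/DIRNAME.
# -f replaces an existing link; -n stops ln from following the existing
# ROOT/condor link and creating the new link inside the old release directory.
switch_release() {
    ln -sfn "$2" "$1/condor"
}

# Demonstration in a scratch directory standing in for /opt.
demo=$(mktemp -d)
mkdir "$demo/condor-6.6.10" "$demo/condor-6.6.11"   # 6.6.11 is hypothetical
switch_release "$demo" condor-6.6.10
switch_release "$demo" condor-6.6.11
readlink "$demo/condor"
rm -rf "$demo"
```

Using a relative link target keeps the link valid when the parent directory is mounted at a different path on another node.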

Condor is set up to require that every node has a directory on its local filesystem to which the Condor daemons can write log information and create temporary work folders for user jobs. This directory is typically located at /home/condor; however, the central NFS share of home directories from the master does not allow a unique /home/condor for every node.

Instead, each slave node has a disk partition called /condorhome which contains the 

directory /condorhome/condor/ that can be used by the local Condor daemons. 

On the master node, /condorhome is a symbolic link pointing to the /home partition 

wherein a directory called condor exists. 

Boot Script 

To ensure the Condor daemons are loaded up when a node is first powered on, a boot 

script named condor is located in /etc/init.d on each node. This boot script is then 

sym-linked into the runlevel 3 startup scripts directory, /etc/rc3.d/, as the entry 

S98condor. 

The boot script listing is:


#! /bin/sh

export CONDOR_CONFIG=/opt/condor/etc/condor_config
MASTER=/opt/condor/sbin/condor_master
PS="/bin/ps auwx"

case $1 in
'start')
    if [ -x $MASTER ]; then
        echo "Starting up Condor"
        $MASTER
    else
        echo "$MASTER is not executable. Skipping Condor startup."
        exit 1
    fi
    ;;
'stop')
    pid=`$PS | grep condor_master | grep -v grep | awk '{print $2}'`
    if [ -n "$pid" ]; then
        # Send SIGQUIT to the condor_master, which initiates its fast
        # shutdown method. The master itself will start sending
        # SIGKILL to all its children if they're not gone in 20
        # seconds.
        echo "Shutting down Condor (fast-shutdown mode)"
        kill -QUIT $pid
    else
        echo "Condor not running"
    fi
    ;;
*)
    echo "Usage: condor {start|stop}"
    ;;
esac

User Path Setup 

The Condor user commands for submitting a job to the cluster, checking cluster status 

and job queues, etc., along with their associated manual pages, are located in the 

/opt/condor subtree. 

To give users easy access to the commands and man pages, the appropriate shell variables are modified on login by two system-wide shell profile files, condor.sh and condor.csh, located in /etc/profile.d. They also set up the environment for MPICH and Intel’s Fortran compiler.

For bash users, condor.sh effects this configuration: 

export CONDOR_CONFIG=/opt/condor/etc/condor_config

if [ -z "${PATH}" ]
then
    export PATH=/opt/condor/bin:/opt/mpich/bin
else
    export PATH=/opt/condor/bin:/opt/mpich/bin:$PATH
fi

if [ -z "${MANPATH}" ]
then
    export MANPATH=/opt/condor/man:/opt/mpich/man
else
    export MANPATH=/opt/condor/man:/opt/mpich/man:$MANPATH
fi

if [ `id -u` = 0 ]; then
    export PATH=$PATH:/opt/condor/sbin:/opt/mpich/sbin
fi

### Set up ifort and idb
. /opt/intel_fc_80/bin/ifortvars.sh
. /opt/intel_idb_80/bin/idbvars.sh

And condor.csh does the same for tcsh users: 

setenv CONDOR_CONFIG /opt/condor/etc/condor_config

if !($?PATH) then
    setenv PATH /opt/condor/bin:/opt/mpich/bin
else
    setenv PATH /opt/condor/bin:/opt/mpich/bin:$PATH
endif

if !($?MANPATH) then
    setenv MANPATH /opt/condor/man:/opt/mpich/man
else
    setenv MANPATH /opt/condor/man:/opt/mpich/man:$MANPATH
endif

### Set up ifort and idb
source /opt/intel_fc_80/bin/ifortvars.csh
source /opt/intel_idb_80/bin/idbvars.csh
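The empty-or-not test that condor.sh applies to PATH and MANPATH can also be expressed with the Bourne shell's ${VAR:+...} expansion. The following is a sketch of that alternative, not the script installed on the cluster; the helper name prepend_path is hypothetical.

```shell
#!/bin/sh
# prepend_path NEW CURRENT: print NEW followed by CURRENT, inserting the ":"
# separator only when CURRENT is non-empty. ${2:+:$2} expands to ":$2" if
# $2 is set and non-empty, and to nothing otherwise.
prepend_path() {
    printf '%s\n' "$1${2:+:$2}"
}

prepend_path /opt/condor/bin:/opt/mpich/bin ""
prepend_path /opt/condor/bin:/opt/mpich/bin /usr/bin:/bin
```

In a profile script this could be used as PATH=$(prepend_path /opt/condor/bin:/opt/mpich/bin "$PATH"). Avoiding a trailing colon matters because an empty PATH element is interpreted as the current directory.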


Condor Configuration Files 

/opt/condor/etc/condor_config is the global Condor configuration file containing 

settings for everything from basic cluster setup details, to network permissions, user 

policies, flocking, daemon controls, and so on. 

Most of the settings in this file can be left at their defaults. However, Part One of the file contains settings that must be customised for the particular Condor installation at a site. For the Observatory cluster, the settings for Part One are as follows:

CONDOR_HOST               = master
RELEASE_DIR               = /opt/condor
LOCAL_DIR                 = /condorhome/condor
LOCAL_CONFIG_FILE         = $(RELEASE_DIR)/etc/$(HOSTNAME).local
REQUIRE_LOCAL_CONFIG_FILE = TRUE
CONDOR_ADMIN              = root@master
MAIL                      = /usr/bin/mail
UID_DOMAIN                = arm.ac.uk
FILESYSTEM_DOMAIN         = $(FULL_HOSTNAME)

Other miscellaneous settings that have been changed are: 

### Only allow daemon read/write access to the 

### slave nodes connected on the LAN. 

HOSTALLOW_READ = 192.168.0.* 

HOSTALLOW_WRITE = 192.168.0.* 

### Fully qualified names are not used in /etc/hosts 

### so Condor likes this set. 

DEFAULT_DOMAIN_NAME = arm.ac.uk 

Each of the nodes in the cluster has its own Condor configuration file in /opt/condor/etc. The master node and the slave nodes are treated differently, with the master having its own specific settings, and the slaves all having the same settings.



The master’s configuration file, m44.local, contains the following: 

### The master never runs jobs 

START = FALSE 

### There are two NICs in the master. This tells 

### Condor to use the internal NIC. 

NETWORK_INTERFACE = 192.168.0.149 

COLLECTOR   = $(SBIN)/condor_collector
NEGOTIATOR  = $(SBIN)/condor_negotiator
DAEMON_LIST = MASTER, COLLECTOR, STARTD, NEGOTIATOR, SCHEDD

JAVA = /usr/bin/java 

### Turn off reporting of pool stats to the 

### Condor people 

CONDOR_DEVELOPERS_COLLECTOR = NONE 

CONDOR_DEVELOPERS = NONE 

### PRIORITY_HALFLIFE = 1 adjusts a user's Condor
### priority in real-time. Thus, when their job
### releases any resources, the user's priority
### returns to 0.5 very quickly.

PRIORITY_HALFLIFE = 1 

### Turn off any job preemption. No jobs will be 

### preempted for any reason. 

PREEMPTION_REQUIREMENTS = FALSE 

PREEMPTION_RANK = FALSE 

As each slave node has the same configuration, a time-saving device has been employed 

wherein any modifications to the slave setup are made in a template file called 

node.local.template. This file is then copied using a shell script to create all the 

nodeXX.local files for the slaves. 

At present, node.local.template contains: 

### Dedicated scheduler for running MPI jobs. 

DedicatedScheduler = "DedicatedScheduler@master" 

STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler 

START = TRUE 

SUSPEND = FALSE 

CONTINUE = TRUE 

PREEMPT = FALSE 

KILL = FALSE 

WANT_SUSPEND = FALSE 

WANT_VACATE = FALSE


RANK = Scheduler =?= $(DedicatedScheduler)

### Tell the daemons not to pay attention to any 

### console activity. Prevents their Condor status 

### changing to ’Owner’ if someone logs in to a 

### node to perform maintenance. 

VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = 0 

VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 0 

The shell script which performs the copying, refresh.sh, reads the node names from /etc/brshtab and then copies the template file to each node's configuration file using a for loop:

#!/bin/bash

NODES="`cat /etc/brshtab`"

for I in $NODES
do
    cp node.local.template $I.local
done

Condor User Policies 

Most users of the cluster tend to run large batches of relatively short jobs (of the order 

of < 1 hour per job), but some have submitted a small number of long-running jobs (of 

the order of several hours to several days). 

In general, each job submitted must be allowed to run to completion without being 

preempted, otherwise the job must start again at the beginning when it is reallocated 

to a slave node. For users who submit large batches of short jobs, such preemption is 

merely troubling. However, for users with long-running jobs, any interruption could 

mean the serious loss of several days of computation. 

To ensure fair use of cluster resources without job preemption, the PRIORITY_HALFLIFE Condor variable has been set to 1 (second) in the local configuration file for the master node. This allows Condor to adjust a user’s priority level almost as soon as their jobs start running. As their jobs begin to use cluster resources, Condor lowers the user’s priority. If a second user then submits a batch of jobs to the queue, that user’s priority will be higher than that of the first user. So, as each of the first user’s jobs finishes on a node, Condor allocates that node to a job belonging to the user with the highest priority. This gradually allows Condor to balance out the allocation of resources so that no one user can use all of the resources all of the time.
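The effect of the half-life can be illustrated numerically. The sketch below assumes the exponential-smoothing rule given in the Condor manual's description of user priorities, in which the priority value p is updated every δt seconds as p_new = β p_old + (1 − β) ρ with β = 0.5^(δt/h), where ρ is the number of nodes the user currently holds and h is PRIORITY_HALFLIFE. Numerically lower values are better, and 0.5 is the floor. The function name, the one-second update interval, and the ten-node workload are illustrative assumptions, not measurements from the cluster.

```shell
#!/bin/sh
# priority_after HALFLIFE NODES SECONDS: print the smoothed priority value of
# a user who starts at the floor value 0.5 and then holds NODES nodes for
# SECONDS seconds, updating once per second (an illustrative interval).
priority_after() {
    awk -v h="$1" -v rho="$2" -v n="$3" 'BEGIN {
        beta = exp(log(0.5) / h)         # 0.5^(dt/h) with dt = 1 s
        p = 0.5
        for (t = 0; t < n; t++) {
            p = beta * p + (1 - beta) * rho
            if (p < 0.5) p = 0.5         # the value never drops below the floor
        }
        printf "%.3f\n", p
    }'
}

echo "half-life 1 s:     $(priority_after 1 10 60)"
echo "half-life 86400 s: $(priority_after 86400 10 60)"
```

With a one-second half-life the value tracks the user's actual usage of ten nodes within the minute, whereas with Condor's default half-life of one day (86400 s) it has barely moved from 0.5, so a heavy user would retain a near-best priority for hours.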

To prevent Condor from preempting currently running jobs if someone with a higher 

user priority submits jobs to the queue, the configuration variables 

PREEMPTION_REQUIREMENTS and PREEMPTION_RANK have both been set to false in the 

master’s local configuration file. 

The user policy for the cluster will undoubtedly change over time. Refer to Section 3 of the Condor manual when revising it.

D.3 MPICH 1.2.4 RPM Spec File 

This spec file can be used to build RPM packages from a standard MPICH v1.2.4 tarball. 

The spec file ensures that the current installation of Intel’s Fortran compiler is 

used to build the F77 and F90 bindings, and it produces two RPMs: one standard RPM 

which contains the MPICH runtime libraries that should be installed on all the nodes, 

and a development RPM containing all the MPI compiler wrappers which should only 

be installed on the master node. 

Name: mpich 

License: Other License(s), see package 

Group: Development/Libraries/Parallel 

URL: ftp://ftp.mcs.anl.gov/pub/mpi/old/ 

Version: 1.2.4 

Release: 3 

Summary: A Portable Implementation of MPI 

Source: mpich-%{version}.tar.gz 

BuildRoot: %{_tmppath}/%{name}-%{version}-build 

Autoreqprov: on 

%define _mpich_root /opt/mpich 

%description 

MPICH is a freely available, portable implementation of 

MPI, the Standard for message-passing libraries.


%package devel 

Summary: A Portable Implementation of MPI 

Group: Development/Libraries/Parallel 

Autoreqprov: on 

Requires: mpich 

Provides: mpich-doc 

Obsoletes: mpich-doc 

%description devel 

MPICH is a freely available, portable implementation of 

MPI, the Standard for message-passing libraries. 

%prep 

%setup -q 

DIRS=$(find -type d) 

%build 

CFLAGS=$RPM_OPT_FLAGS; export CFLAGS; 

export F90="ifort" ; 

export FC="ifort" ; 

export CCFLAGS="-O2"; 

export FFLAGS="-O2"; 

export RSHCOMMAND="/opt/condor/sbin/rsh"; 

sh configure --with-arch=LINUX \ 

--with-device=ch_p4 \ 

--with-comm=ch_p4 \ 

--with-romio \ 

--with-mpe \ 

--libdir=$RPM_BUILD_ROOT%{_mpich_root}/%_lib \ 

--enable-sharedlib \ 

--enable-c++ \ 

--enable-f77 \ 

--enable-f90modules \ 

--disable-mpedbg \ 

--disable-devdebug \ 

--disable-debug \ 

-prefix=$RPM_BUILD_ROOT%{_mpich_root} \ 

-c++=/usr/bin/g++ \ 

-opt=-O2 \ 

-cc=/usr/bin/gcc \ 

-fc=/opt/intel_fc_80/bin/ifort \ 

-f90=/opt/intel_fc_80/bin/ifort \ 

-f90flags=-O2 \ 

-optcc=-O2 \ 

-mpe_opts=-O2 

make 

%install 

rm -rf $RPM_BUILD_ROOT 

make install PREFIX=$RPM_BUILD_ROOT%{_mpich_root} \ 

MPIINSTALL_OPTS="-manpath=$RPM_BUILD_ROOT/%{_mpich_root}/man" \ 

-libdir=$RPM_BUILD_ROOT/%{_mpich_root}/%_lib 

find $RPM_BUILD_ROOT%{_mpich_root} -type l -name "mpirun" | \ 

xargs rm -f 



grep -lr "$RPM_BUILD_ROOT" $RPM_BUILD_ROOT/%{_mpich_root}/ | \ 

xargs perl -pi -e "s@$RPM_BUILD_ROOT@@g" 

rm -f examples/perftest/config.cache \ 

examples/perftest/config.log \ 

examples/perftest/config.status \ 

examples/test/config.log \ 

examples/test/config.status 

# libs 

rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/lib* 

rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared/lib* 

[ -e lib/libmpich.a ] && cp -f lib/*.a $RPM_BUILD_ROOT%{_mpich_root}/%_lib 

[ -e lib/*.o ] && cp -f lib/*.o $RPM_BUILD_ROOT%{_mpich_root}/%_lib 

[ -e lib/*.s* ] && cp -f lib/*.s* $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared 

for i in libfmpich libmpich libpmpich; do 

echo Working on $i; 

cp -f lib/shared/$i.so.1.0 $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared 

( 

cd $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared; 

ln -sf $i.so.1.0 $i.so 

) 

done 

# docs 

rm -fr $RPM_BUILD_ROOT%{_mpich_root}/www 

export manpath="$manpath /opt/mpich/man" 

%clean 

#rm -rf $RPM_BUILD_ROOT 

%files 

%defattr(-,root,root,755) 

%doc COPYRIGHT 

%{_mpich_root}/sbin/* 

%{_mpich_root}/bin/mpirun* 

%{_mpich_root}/bin/mpiman 

%{_mpich_root}/bin/mpireconfig 

%{_mpich_root}/bin/mpireconfig.dat 

%{_mpich_root}/bin/tarch 

%{_mpich_root}/bin/tdevice 

%{_mpich_root}/bin/serv_p4 

%{_mpich_root}/%_lib/shared/*.so.* 

%{_mpich_root}/share/* 

%{_mpich_root}/man/mandesc 

%{_mpich_root}/man/man1/*.1* 

%files devel 

%defattr(-,root,root,755) 

%{_mpich_root}/doc/* 

%{_mpich_root}/examples/* 



%doc COPYRIGHT 

%{_mpich_root}/include/mpi2c++/*.h 

%{_mpich_root}/include/f90base/*.mod 

%{_mpich_root}/include/f90choice/*.mod 

%{_mpich_root}/include/*.h


%{_mpich_root}/%_lib/*.a 

%{_mpich_root}/%_lib/shared/*.so 

%{_mpich_root}/etc/* 

%{_mpich_root}/bin/mpicc 

%{_mpich_root}/bin/mpiCC 

%{_mpich_root}/bin/mpif77 

%{_mpich_root}/bin/mpif90 


Appendix E 

LTE-CODES 

LTE-CODES is a package of Fortran programs and supporting libraries for analysing the 

spectra of hot stars. The main components of the package are: 

STERNE computes plane-parallel, line-blanketed model atmospheres for hot stars, T_eff > 8000 K, in local thermal, radiative, and hydrostatic equilibrium. The code handles extremely H-deficient mixtures and composition stratification.

SPECTRUM computes synthetic spectra, line profiles, equivalent widths, and specific intensities, assuming LTE, from model atmospheres of hot stars, T_eff > 8000 K. It can handle atmospheres of arbitrary chemical composition.

SFIT is a general-purpose code designed to optimise theoretical stellar spectra to an observed spectrum. The code offers several different parameter optimisation methods, including Levenberg-Marquardt, Amoeba, and Genetic Algorithms. It has also been designed for both single and composite (binary) stellar spectra.

As part of this thesis, the old build system for these codes (which was based on a series of hand-coded Makefiles) was overhauled and ported to the GNU Autotools 1 system. GNU Autotools is a suite of tools that assists in making software projects easy to build across many platforms. It offers a flexible environment for automatically configuring and generating Makefiles according to the needs of the software project, and adapting them to suit the specifics of whatever operating system, compilers, and other system tools are at hand.

1 http://www.gnu.org/

E.1 Directory Layout 

The hierarchical layout of the LTE-CODES package is straightforward. The top-level directory 

branches into several subdirectories, the most important of which is src. Within 

src are two subdirectories, libraries and apps, which hold the source code for the 

supporting libraries and for the applications (STERNE, SPECTRUM, and SFIT). In summary: 

lte-codes-x.x 

| 

|-- config 

|-- include 

\-- src 

| 

|-- libraries 

| | 

| |--------------\ 

| | | 

| |-- at |-- prof 

| |-- bb |-- qub 

| |-- chr |-- rot 

| |-- dp |-- rtf 

| |-- mth |-- sdb 

| |-- mx |-- stn2 

| |-- nr |-- str 

| |-- nr_d |-- tap 

| |-- op |-- tap95 

| |-- opk2 |-- util 

| |-- phot \-- xfit 

| \-- phys 

| 

| 

\-- apps 

| 

|-- sfit2 

|-- spectrum 

\-- sterne

E.2 Build System Organisation 

The central components of the autotools-based build system are the configure.in 

file, which resides in the top-level directory, and the Makefile.am files, one of which 

is found in every directory. 

configure.in 

configure.in is essentially a Bourne shell script template which contains a number of calls to 

autoconf and automake macros in order to set up the build environment. The particular 

language used for the project can be selected, and specific details such as compiler 

commands and flags can be defined. The autoconf macros also allow the programmer 

to tell the build system to test the underlying operating system for the existence of 

particular tools, libraries, and files, and to modify the source files of the project as 

appropriate. 

configure.in is processed by autoconf to generate a configure script. When this 

script is executed, it traverses the build tree and generates all the necessary Makefiles 

in the correct manner. 

The contents of configure.in for LTE-CODES 1.4 are as follows: 

AC_INIT 

AC_CONFIG_AUX_DIR(config) 

AM_INIT_AUTOMAKE(lte-codes, 1.4, "http://www.arm.ac.uk/~csj") 

AC_SUBST(ac_aux_dir) 

# Checks for programs. 

AC_PROG_F77(ifort ifc) 

AC_PROG_LIBTOOL 

AC_PROG_MAKE_SET 

FFLAGS='-I$(top_srcdir)/include -I$(top_srcdir)/include/mod -cm -w -w90 -w95' 

AC_OUTPUT(Makefile \ 

src/Makefile \ 

src/libraries/Makefile \ 



src/libraries/at/Makefile \ 

src/libraries/bb/Makefile \ 

src/libraries/chr/Makefile \ 

src/libraries/dp/Makefile \ 

src/libraries/mth/Makefile \ 

src/libraries/mx/Makefile \ 

src/libraries/nr/Makefile \ 

src/libraries/nr_d/Makefile \ 

src/libraries/op/Makefile \ 

src/libraries/opk2/Makefile \ 

src/libraries/phot/Makefile \ 

src/libraries/phys/Makefile \ 

src/libraries/prof/Makefile \ 

src/libraries/qub/Makefile \ 

src/libraries/rot/Makefile \ 

src/libraries/rtf/Makefile \ 

src/libraries/sdb/Makefile \ 

src/libraries/stn2/Makefile \ 

src/libraries/str/Makefile \ 

src/libraries/tap/Makefile \ 

src/libraries/tap95/Makefile \ 

src/libraries/util/Makefile \ 

src/libraries/xfit/Makefile \ 

src/apps/Makefile \ 

src/apps/sfit2/Makefile \ 

src/apps/spectrum/Makefile \ 

src/apps/spectrum/data/Makefile \ 

src/apps/spectrum/models/Makefile \ 

src/apps/spectrum/scripts/Makefile \ 

src/apps/sterne/Makefile \ 

src/apps/sterne/scripts/Makefile \ 

src/apps/sterne/utils/Makefile) 
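
As a rough illustration of what the AC_PROG_F77(ifort ifc) line above asks for, the following shell sketch returns the first candidate compiler found on the PATH. The function name first_available is hypothetical and is not part of LTE-CODES or autoconf. The real macro does more than this (it also tests the compiler and sets the F77 output variable), so the sketch shows only the search order.

```shell
#!/bin/sh
# Hypothetical helper: print the first command from the argument list that
# exists on PATH, in the order given. Returns non-zero if none is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: prefer ifort, fall back to ifc, as in the configure.in above.
# first_available ifort ifc || echo "no Intel Fortran compiler found" >&2
```

Listing the newer compiler name first means it is chosen whenever both are installed.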

If any modifications are made to configure.in, autoconf must be re-run on it for the 

changes to take effect. A small shell script called bootstrap is provided for this 

purpose: it calls autoconf and the other autotools utilities to ensure the entire build 

system is updated correctly. bootstrap is defined as: 

#!/bin/sh 

libtoolize --force --copy 

aclocal -I config 

automake --add-missing --force-missing --gnu --copy 

autoconf 

In-depth documentation on autoconf can be found in the manual located at: 

http://www.gnu.org/software/autoconf/manual/index.html
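
One modification that is easy to get wrong is adding a new directory without naming its Makefile in the AC_OUTPUT list. The following is a minimal sketch of a check for that case, assuming the layout shown above (a configure.in in the top-level directory and one Makefile.am per directory). The function name unlisted_makefiles is hypothetical and is not part of the LTE-CODES distribution.

```shell
#!/bin/sh
# Hypothetical helper: print every Makefile that has a Makefile.am somewhere
# under the top-level directory $1 but is not named in $1/configure.in.
unlisted_makefiles() {
    root=${1:-.}
    # Split configure.in into one token per line, treating parentheses and
    # line-continuation backslashes as separators.
    listed=$(sed 's/[()\\]/ /g' "$root/configure.in" | tr -s ' \t' '\n\n')
    (cd "$root" && find . -name Makefile.am) |
        sed -e 's|^\./||' -e 's|\.am$||' | sort |
        while read -r mk; do
            # -x -F: the token must match the whole path literally.
            printf '%s\n' "$listed" | grep -q -x -F -- "$mk" || echo "$mk"
        done
}

# Example: run from the top-level directory before calling ./bootstrap.
# unlisted_makefiles .
```

An empty result means every Makefile.am in the tree has a corresponding entry in configure.in.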

Makefile.am 

Every Makefile.am is processed by automake to produce a Makefile.in file. This is 

subsequently used by the configure script to create a Makefile at every point in the 

build tree. Typically, each Makefile.am contains a number of variable assignments 

that describe which source files are to be compiled, whether the sources form a 

library or a binary, which subdirectories lie beneath the current directory, and so on. 
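
Because each Makefile.in is generated from its Makefile.am, a directory that has the latter but not the former has not yet been processed by automake. The sketch below lists such directories. The function name missing_makefile_in is hypothetical and is not part of LTE-CODES.

```shell
#!/bin/sh
# Hypothetical helper: print each directory under $1 that contains a
# Makefile.am but no generated Makefile.in.
missing_makefile_in() {
    root=${1:-.}
    find "$root" -name Makefile.am | sort | while read -r am; do
        dir=$(dirname "$am")
        [ -f "$dir/Makefile.in" ] || echo "$dir"
    done
}

# Example: any output here suggests automake (via bootstrap) should be re-run.
# missing_makefile_in .
```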

In the top-level directory, Makefile.am contains the following: 

include $(top_srcdir)/config/am_global_include.mk 

## Process this file with automake to produce Makefile.in 

SUBDIRS = src 

# Include bootstrap script and other folders in distribution 

EXTRA_DIST = bootstrap include test 

# Include files in config directory in distribution 

AUX_DIST = $(ac_aux_dir)/config.guess \ 

$(ac_aux_dir)/config.sub \ 

$(ac_aux_dir)/install-sh \ 

$(ac_aux_dir)/ltmain.sh \ 

$(ac_aux_dir)/missing \ 

$(ac_aux_dir)/mkinstalldirs \ 

$(ac_aux_dir)/am_global_include.mk 

MAINTAINERCLEANFILES = Makefile.in aclocal.m4 configure config-h.in $(AUX_DIST) 

## Make sure config directory and files it contains are correctly 

## added to distribution by 'make dist' 

dist-hook: 
	for file in $(AUX_DIST); do \
	  cp $$file $(distdir)/$$file; \
	done 

This file is fairly basic. The most significant entry is the SUBDIRS variable, which 

specifies which subdirectories must be traversed from here during the build. The rest 

of the assignments are mostly concerned with telling the build system about other files 

which are part of the project but do not need to be compiled. 

The Makefile.am for a program or a library looks like this (here, for SPECTRUM): 



include $(top_srcdir)/config/am_global_include.mk 

SUBDIRS = scripts data models 

bin_PROGRAMS = spectrum 

spectrum_SOURCES = Spectrum.f 

spectrum_LDADD = \ 

../../libraries/dp/libdp.a \ 

../../libraries/qub/libqub.a \ 

../../libraries/opk2/libopk2.a \ 

../../libraries/op/libop.a \ 

../../libraries/tap95/libtap95.a \ 

../../libraries/str/libstr.a \ 

../../libraries/chr/libchr.a \ 

../../libraries/rtf/librtf.a \ 

../../libraries/nr/libnr.a \ 

../../libraries/nr_d/libnr_d.a \ 

../../libraries/mth/libmth.a 

Here, the name of the final program is specified along with its source files and the 

libraries upon which it depends. The build system ensures that any such dependencies 

are built before any attempt is made to build the current program or library. 

Further documentation on automake can be found at 

http://www.gnu.org/software/automake/manual/automake.html 

E.3 Installation Instructions 

To install LTE-CODES from the source tarball as a non-root user to an arbitrary directory: 

1. Unpack the archive: tar -xvzf lte-codes-x.x.tar.gz 

2. cd lte-codes-x.x 

3. ./configure --prefix=/path/to/install 

4. make 

5. make install 

6. Set the shell environment variable LTECODES to point to the install location
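
The six steps above can be collected into a small shell function. This is a sketch only: the function names run and install_lte_codes and the DRY_RUN variable are hypothetical, and the version and install prefix are supplied by the caller rather than taken from the distribution. With DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# Hypothetical wrapper: echo the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper implementing steps 1 to 6.
#   $1: version string (the x.x in lte-codes-x.x.tar.gz)
#   $2: install prefix, writable by the current (non-root) user
install_lte_codes() {
    version=$1
    prefix=$2
    # Steps 1 to 5 run in a subshell so the caller's working directory
    # is left unchanged.
    (
        run tar -xvzf "lte-codes-$version.tar.gz" &&
        run cd "lte-codes-$version" &&
        run ./configure --prefix="$prefix" &&
        run make &&
        run make install
    ) || return 1
    # Step 6: point LTECODES at the install location.
    LTECODES=$prefix
    export LTECODES
}

# Example (prints the commands without running them):
# DRY_RUN=1; install_lte_codes x.x /path/to/install
```

LTECODES is exported only for the current shell session; to make it permanent, the same assignment can be added to the user's shell start-up file.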
