
On the Automatic Analysis of Stellar Spectra - Armagh Observatory


On the Automatic Analysis of Stellar Spectra

A thesis submitted for the degree of

Doctor of Philosophy

by

Christopher Winter, B.Eng.

Armagh Observatory

Armagh, Northern Ireland

&

Faculty of Science and Agriculture

Department of Pure and Applied Physics

The Queen’s University of Belfast

Belfast, Northern Ireland

March 2006


“Quia non erit impossibile apud Deum omne verbum”
(For no word shall be impossible with God.)


To Stacey

“Qui invenit mulierem invenit bonum et hauriet iucunditatem a Domino”
(He who finds a wife finds a good thing, and shall draw joy from the Lord.)


Acknowledgements

I would like to acknowledge and thank my supervisor, C.S. Jeffery, for his sound advice and direction over the course of this project, and the staff and students of the Armagh Observatory for their helpful support and assistance.

I am very grateful to J.S. Drilling, E.M. Green, and A. Ahmad, all of whom supplied spectroscopic data that was used in this project. In addition, my thanks go to C.A.L. Bailer-Jones for the use of his neural network code, STATNET.

This work was carried out as part of the CosmoGrid project, funded under the Programme for Research in Third Level Institutions (PRTLI) administered by the Irish Higher Education Authority under the National Development Plan and with partial support from the European Regional Development Fund.

This work also uses data from the Sloan Digital Sky Survey (SDSS) data archive. Funding for the creation and distribution of the SDSS Archive has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Aeronautics and Space Administration, the National Science Foundation, the U.S. Department of Energy, the Japanese Monbukagakusho, and the Max Planck Society. The SDSS Web site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium (ARC) for the Participating Institutions. The Participating Institutions are The University of Chicago, Fermilab, the Institute for Advanced Study, the Japan Participation Group, The Johns Hopkins University, the Korean Scientist Group, Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

Chris Winter

March, 2006



Abstract

This project investigates the problem of automatically searching for and analysing astronomical spectra from large data sets. The three core problems of (1) spectral classification, (2) physical parameterisation, and (3) searching are examined, and a generalisable set of tools is established based on the techniques of artificial neural networks (ANNs), χ² minimisation, and principal components analysis (PCA). These tools are then applied to the archives of the Sloan Digital Sky Survey (SDSS) to automatically search for and analyse the spectra of hot subdwarf stars.

Spectral classification is tackled using ANNs, a versatile statistical machine learning method. An ANN is trained to classify hot subdwarf spectra onto the classification system defined by Drilling et al. (2006), obtaining global errors (σ_rms) of ∼2 subtypes for spectral type, ∼1 subclass for luminosity class, and ∼4 subclasses for the helium class. These errors are in line with the accuracies achieved by human classifiers.
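The ANN notation used in the List of Tables (for example 901:10:3, read here as 901 inputs, one hidden layer of 10 nodes, and 3 outputs) and the committee of five networks can be illustrated with a minimal NumPy sketch. It is not STATNET and not the thesis's trained classifier: the weights are random and untrained, the sigmoid hidden layer and linear output layer are assumptions, the three outputs are assumed to stand for spectral type, luminosity class and helium class, and all function names are hypothetical. It only shows how a committee prediction is averaged and how a global σ_rms error would be computed against reference classifications.

```python
import numpy as np


def mlp_forward(x, weights, biases):
    """Forward pass through a fully connected feed-forward network.

    Hidden layers use a logistic (sigmoid) activation and the output layer is
    linear; both choices are assumptions made for this sketch.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))
    return weights[-1] @ a + biases[-1]


def random_network(layer_sizes, rng):
    """Hypothetical *untrained* network with small random weights."""
    weights = [rng.normal(0.0, 0.1, size=(n_out, n_in))
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
    return weights, biases


def committee_predict(x, committee):
    """Average the outputs of all networks in the committee."""
    return np.mean([mlp_forward(x, w, b) for w, b in committee], axis=0)


def sigma_rms(predicted, reference):
    """Global RMS difference between predicted and reference class values."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


rng = np.random.default_rng(0)
# A committee of five 901:10:3 networks (901 inputs, 10 hidden nodes, 3 outputs).
committee = [random_network([901, 10, 3], rng) for _ in range(5)]
spectrum = rng.normal(1.0, 0.05, size=901)  # stand-in for a normalised spectrum
outputs = committee_predict(spectrum, committee)
print(outputs.shape)  # (3,)
```

In a real application each network's weights would come from training, and σ_rms would be evaluated on spectra held out from training, as in the leave-one-out procedure summarised in Tables 2.1 and 2.2.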

Physical parameters are obtained by fitting observations to grids of theoretical models using a χ² minimisation procedure. A new methodology has been developed for managing and indexing large grids of theoretical models in the χ² minimisation code, SFIT. Concepts from the field of computational geometry are used to remove several limitations from this code and to pave the way for its use in a distributed parallel computing environment.
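The χ² fit can be sketched as a brute-force search over a model grid, assuming the observed and model spectra share a wavelength scale and each pixel has a known uncertainty σ_i, so that χ² = Σ_i ((O_i − M_i)/σ_i)². This is a simplification, not SFIT: according to the List of Tables, SFIT uses the Amoeba and Levenberg-Marquardt optimisers and interpolates between grid models rather than only testing grid points. The `toy_model` line profile, the grid values and the function names are hypothetical, chosen only to make the example self-contained.

```python
import numpy as np


def chi_squared(observed, model, sigma):
    """chi^2 = sum over pixels of ((observed - model) / sigma)^2."""
    return float(np.sum(((observed - model) / sigma) ** 2))


def best_grid_model(observed, sigma, grid_params, grid_fluxes):
    """Brute-force search for the grid model with the lowest chi^2.

    grid_params : (n_models, n_params), e.g. columns (T_eff, log g)
    grid_fluxes : (n_models, n_pixels), models on the observed wavelength scale
    """
    chi2 = np.array([chi_squared(observed, f, sigma) for f in grid_fluxes])
    i = int(np.argmin(chi2))
    return grid_params[i], chi2[i]


wave = np.linspace(4000.0, 5000.0, 200)  # wavelength in Angstrom


def toy_model(teff, logg):
    # Hypothetical stand-in for a model atmosphere (not real physics): one
    # absorption line whose depth grows with T_eff and width with log g.
    depth = 0.5 * teff / 40000.0
    width = 5.0 * logg
    return 1.0 - depth * np.exp(-0.5 * ((wave - 4471.0) / width) ** 2)


grid_params = np.array([(t, g) for t in (20000.0, 30000.0, 40000.0)
                        for g in (5.0, 5.5, 6.0)])
grid_fluxes = np.array([toy_model(t, g) for t, g in grid_params])
sigma = np.full(wave.size, 0.005)
rng = np.random.default_rng(1)
observed = toy_model(30000.0, 5.5) + rng.normal(0.0, 0.005, wave.size)

best, chi2_min = best_grid_model(observed, sigma, grid_params, grid_fluxes)
print(best, chi2_min)
```

An exhaustive search like this scales with the number of grid points, which is one reason an optimiser plus interpolation between models is preferable for large grids.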

Searching large, unknown data sets for the spectra of a particular type of object is accomplished using the multivariate statistical technique PCA. The mechanics of this tool are outlined, and its use is demonstrated by searching for hot subdwarf spectra in the SDSS. This solution provides a means of reducing unknown data sets to sizes suitable for visual inspection.
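A PCA-based filter of the kind described above can be sketched in NumPy: the principal components are computed from a training set of target spectra, each unknown spectrum is projected onto the leading components, and spectra that are poorly reconstructed are rejected. The thesis's exact definition of the reconstruction error R is not given in this section; an RMS residual is assumed here, so the numerical threshold does not correspond to the R values quoted in the List of Figures. The `best_threshold` helper mirrors the idea behind Figures 5.11 and 5.12 of choosing R where the TP rate minus the FP rate peaks. All function names and the toy data are hypothetical.

```python
import numpy as np


def fit_pca(training_spectra, n_components):
    """Mean spectrum and leading principal components of a training set.

    training_spectra : (n_spectra, n_pixels) array of normalised spectra.
    Returns (mean, components); components has shape (n_components, n_pixels).
    """
    training_spectra = np.asarray(training_spectra, dtype=float)
    mean = training_spectra.mean(axis=0)
    # Rows of vt are orthonormal eigenvectors of the sample covariance matrix,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(training_spectra - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(spectrum, mean, components):
    """RMS residual between a spectrum and its reconstruction from the PCs."""
    centred = np.asarray(spectrum, dtype=float) - mean
    coeffs = components @ centred            # projection onto each PC
    reconstructed = components.T @ coeffs    # back to pixel space
    return float(np.sqrt(np.mean((centred - reconstructed) ** 2)))


def pca_filter(spectra, mean, components, threshold):
    """Indices of spectra whose reconstruction error is below the threshold."""
    errors = np.array([reconstruction_error(s, mean, components) for s in spectra])
    return np.flatnonzero(errors < threshold), errors


def best_threshold(errors_true, errors_false, candidates):
    """Candidate threshold maximising (true-positive rate - false-positive rate)."""
    errors_true = np.asarray(errors_true)
    errors_false = np.asarray(errors_false)
    candidates = np.asarray(candidates, dtype=float)
    tp = np.array([np.mean(errors_true < r) for r in candidates])
    fp = np.array([np.mean(errors_false < r) for r in candidates])
    return float(candidates[int(np.argmax(tp - fp))])


# Toy data: "target" spectra are random mixtures of two basis shapes plus noise.
rng = np.random.default_rng(0)
pixels = np.linspace(0.0, 1.0, 300)
basis = np.vstack([np.sin(2 * np.pi * pixels), np.cos(2 * np.pi * pixels)])
training = rng.normal(size=(50, 2)) @ basis + rng.normal(0.0, 0.01, (50, 300))
mean, pcs = fit_pca(training, n_components=2)

similar = np.array([0.3, -0.7]) @ basis               # same family as training
different = np.exp(-((pixels - 0.5) / 0.05) ** 2)     # unrelated line profile
keep, errors = pca_filter([similar, different], mean, pcs, threshold=0.05)
print(keep, errors)
```

Only the spectra that pass the filter then need visual inspection or further analysis, which is how the data volume is reduced.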

A total of 282 spectra of hot subdwarf candidates are obtained from the SDSS and analysed. The results provide evidence for several unexplained phenomena among extended horizontal branch stars, namely: 1) the existence of the second horizontal branch gap of Newell (1973); 2) two sdB n_He–T_eff sequences; and 3) a clustering of hot, helium-rich stars at T_eff ≈ 44,000 K, log g = 5.7. These findings pose important questions for stellar evolution theory in the realm of the extended horizontal branch.



Contents

Acknowledgements
Abstract
List of Tables
List of Figures

1 Introduction
1.1 Astronomical Data Mining
1.2 Large Data Sets And Their Sources
1.3 Astronomical Spectra
1.3.1 Types Of Objects And Their Spectra
1.3.2 Automatic Methods of Analysis
1.4 Hot Subdwarf Stars
1.4.1 Spectroscopy
1.4.2 Stellar Evolution
1.4.3 Why Study Them?
1.4.4 Why Search For Them In The SDSS?
1.5 Summary

2 Classification - Artificial Neural Networks
2.1 Classifying Hot Subdwarfs
2.1.1 The Training Sample
2.1.2 Methodology
2.1.3 Results
2.2 Physical Parameters
2.2.1 Methodology
2.2.2 Results
2.3 Summary

3 Parameterisation - χ² Fitting
3.1 Analysing Stellar Spectra
3.2 SFIT
3.2.1 Limitations of SFIT
3.2.2 Proposal to Remove SFIT’s Limitations
3.3 Tetrahedralisation: Interpolation and Indexing
3.3.1 Simplex Interpolation
3.3.2 Grid Index - Delaunay Triangulation
3.3.3 Navigating the Index - Point Location
3.4 Testing the Modifications
3.5 Summary

4 Filtering - Principal Components Analysis
4.1 Constructing A PCA-Based Filter
4.1.1 Mathematics of PCA
4.1.2 Building A Hot Subdwarf Filter
4.2 Searching the SDSS for Hot Subdwarfs
4.3 Summary

5 Application I - SDSS Hot Subdwarfs
5.1 Search Criteria And Data Sets
5.2 PCA Filtering
5.3 Analysis
5.4 Results
5.4.1 Parameterisation
5.4.2 Classification
5.4.3 Radial Velocities
5.5 Sources of Error
5.6 Analysis of PCA Filter Efficiency
5.7 Summary

6 Application II - Other Data Sets
6.1 2MASS-Selected Sample
6.2 SDSS sdB-He Stars of Harris et al. (2003)
6.3 Ahmad & Jeffery (2003) He-sdBs
6.4 Summary

7 Conclusions And Future Work

Bibliography

Appendices
A Results for 192 Drilling et al. (2006) Hot Subdwarfs
B Results for 282 SDSS DR3 Hot Subdwarf Candidates
C Results for 83 2MASS-Selected Hot Subdwarf Candidates
D The Armagh Observatory Cluster
D.1 Hardware Configuration
D.2 Software Configuration
D.3 MPICH 1.2.4 RPM Spec File
E LTE-CODES
E.1 Directory Layout
E.2 Build System Organisation
E.3 Installation Instructions


List of Tables

2.1 Results of the leave-one-out procedure as applied to a committee of five 901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations.
2.2 As Table 2.1, but for the committee of five 901:5:5:3 ANNs.
2.3 Results of parameterising the 60 calibration stars.
2.4 A comparison between ANNs and χ² minimisation for parameterising the 133 unparameterised stars.
3.1 Details of the model grid used in the comparison.
3.2 Initial parameters used for the Amoeba and Levenberg-Marquardt optimisation routines. The step sizes used for Amoeba are also given.
3.3 Results of BD+10 2179 analysis with the unmodified version of SFIT.
3.4 Results of BD+10 2179 analysis with the modified version of SFIT.
3.5 The model grid used to obtain physical parameters of the set of test models.
3.6 RMS comparison of parameterisation results from each interpolation method with the original parameters of each model. Also given is the RMS difference between the methods, and a comparison between the results in the region of parameter space for which both schemes seem to give their best results (see Figures 3.6 and 3.7).
5.1 Summary of data quantities obtained from the SDSS DR3.
5.2 The model grid used to obtain physical parameters from the SDSS hot subdwarf candidates.
6.1 Parameters of the two calibration stars as obtained by χ²-fitting to NLTE (Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given in parentheses.
6.2 Classification results for the sdB-He stars of Harris et al. (2003).
6.3 Classification results for the Ahmad & Jeffery (2003) He-sdBs.
A.1 Parameterisation Results for 192 Drilling et al. (2006) Hot Subdwarfs
B.1 Results for 282 SDSS Hot Subdwarf Candidates
C.1 Results for 83 2MASS-Selected Hot Subdwarf Candidates


List of Figures

1.1 A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from the SDSS.)
1.2 Example of a quasar (top) and carbon star (bottom) spectrum. (Taken from the SDSS.)
1.3 The emission spectrum of the Orion nebula (M42).
1.4 Examples from each hot subdwarf spectrographic subgroup. Classifications listed are those from Drilling et al. (2006).
1.5 Schematic temperature-luminosity diagrams showing: a) the positions of stars belonging to the main stellar groups; b) the normal sequence of stellar evolution experienced by a star of a few solar masses; c) possible evolution of an sdB star in a binary system. (Diagram courtesy of C.S. Jeffery.)
2.1 The training sample shows clustering in certain regions of the classification space. For clarity, points have been offset by small random shifts in both coordinates.
2.2 Results of the leave-one-out procedure for both ANN architectures at the near-optimal training time of 300 iterations for the 901:10:3 architecture (left column), and 500 iterations for the 901:5:5:3 architecture (right column). Also plotted is the best-fit linear least squares line.
2.3 Parameterisations of the 60 calibration stars. Results from each method have been combined onto each plot. ANN results are indicated by blue crosses, and χ² minimiser results by red pluses.
2.4 Parameterisations of the 133 unparameterised stars using the ANNs and χ² minimiser. Also shown is the best-fit linear least squares line.
3.1 Example of a k-D tree in two dimensions. On the left is the representation of how the k-D tree on the right splits up the x,y plane. (Adapted from Moore 1991.)
3.2 A 1-simplex is a line segment. A 2-simplex is a triangle. A 3-simplex is a tetrahedron.
3.3 In two dimensions, the Delaunay triangulation guarantees that no other points lie inside the circumcircle of any simplex.
3.4 The line segment, L, is constructed using the centroid of the starting tetrahedron, T, and the interpolation point, p. The tetrahedra visited on the walk-through are coloured grey.
3.5 Parameterisation results from the linear interpolation in tables method. Clearly visible are anomalous results arising from a suspected defect in the method’s implementation.
3.6 Parameterisation results from the linear interpolation in tables method. Axes have been restricted to give a view of the grid boundaries described in Table 3.5.
3.7 Parameterisation results from the simplex-based interpolation scheme. In contrast with Figures 3.5 and 3.6, the simplex-based scheme clearly restricts the optimisers to the grid boundaries.
4.1 Principal component analysis. u₁ is the first principal component: the axis along which the projections of the (mean-subtracted) data have the largest sum of squares, i.e. the maximum variance. u₂ is the second principal component, and u₁ · u₂ = 0.
4.2 Mean spectrum of the Drilling et al. (2006) sample.
4.3 First five PCs of the Drilling et al. (2006) sample.
4.4 Second five PCs of the Drilling et al. (2006) sample.
4.5 Cumulative variance of the first ten PCs of the Drilling et al. (2006) sample.
4.6 Illustration of projecting hot subdwarf spectra onto the first four PCs of the Drilling et al. (2006) standards.
4.7 Histogram of reconstruction errors from the SDSS data sample.
4.8 Spectra in first three reconstruction error histogram bins (R ≲ 3.0).
4.9 Spectra in first three reconstruction error histogram bins (R ≲ 3.0).
4.10 Sample of spectra from the eighth error bin (R ∼ 3.0).
4.11 Sample of spectra from the fourteenth error bin (R ∼ 4.5).
4.12 Sample of high S/N DA white dwarfs from the 22nd–24th error bins (R ∼ 6.4–7.1).
4.13 Sample of spectra from the fifty-third error bin (R > 15.0).
5.1 Histogram of reconstruction errors for the colour-colour selected SDSS sample.
5.2 Parameterisation results of the 282 SDSS hot subdwarf candidates. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
5.3 Four example fits from the 282 SDSS hot subdwarfs. The classification and physical parameters (T_eff (K), log g, log(n_He/n_H)) obtained for each star are printed in the lower corners of each plot.
5.4 The results of applying a kernel density estimate analysis to the data from Figure 5.2. The low-density region at T_eff ≈ 22,500 K is prominent, along with another possible low-density region at T_eff ≈ 41,000 K.
5.5 Classification results of the 282 SDSS hot subdwarf candidates. Points have been given small random offsets in each axis for clarity.
5.6 A comparison of the ANN classifications of the 282 SDSS hot subdwarf candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (right-most plots). Points have been given small random offsets in each axis for clarity.
5.7 A calibration of the ANN classifications onto the Drilling et al. (2006) system using the 282 SDSS hot subdwarf candidates.
5.8 The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates.
5.9 Examples of white dwarf and BHB contaminants. A - BHB star with deep Balmer lines. B - DA white dwarf with strong, broad Balmer lines due to high surface gravity. C - DB white dwarf. D - Uncertain (some evidence of weak carbon absorption, so possibly a DQ white dwarf).
5.10 The grey-shaded region of the log g–T_eff plane marks where stars have a high probability of being subdwarfs.
5.11 True-positive (TP) rates (red) and false-positive (FP) rates (blue) of the PCA filter as a function of the reconstruction error threshold, R. The green curve is the difference between the TP and FP rates.
5.12 A closer examination of the TP and FP rates. The peak in the green TP-FP curve occurs at R ∼ 7.0 and marks the optimum value of R for the SDSS sample.
6.1 SFIT physical parameters for the 2MASS-selected sample. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
6.2 ANN classification for the 2MASS-selected sample. Points have been given small random offsets in each axis for clarity.
6.3 The stars assigned late-A and early-F spectral types by the neural network.
6.4 Comparison of ANN classifications with those of Drilling et al. (2006) for the 17 He-sdBs of Ahmad & Jeffery (2003). Points have been given small random offsets in each axis for clarity. Also plotted is the best-fit least squares regression line, with error bars showing the RMS of the residuals.
7.1 Schematic diagram showing how the work of this thesis fits in with the wider system envisaged by Jeffery (2003).


Chapter 1

Introduction

The spectroscopy of light from astronomical objects is one of the most important methods for understanding the physics at work in the universe. Many fundamental parameters of those objects can be determined by analysing their spectra, including temperature, chemical composition, and motion, along with other clues about their origin and evolution.

Advances in information technology over the past 35 years, and their subsequent influence on observational methods, have made it possible to carry out spectroscopic studies of unprecedented numbers of objects over a short period of time. Modern astronomy is now largely concerned with handling very large quantities of data, and with the problems of managing and analysing them.

This project develops a collection of tools to assist astronomers in data mining large sets of astronomical spectra. The tools are general in nature, and can be used to search for and automatically study the spectra of potentially any type of astronomical object. Together, the tools form a semi-automatic pipeline allowing a fast progression from large quantities of unknown spectra to useful scientific results.

In the past, studies of automatic methods of spectral analysis have mainly centred on the problem of object classification. This makes sense from the point of view of a survey mission, because it is desirable to know what types of objects have been observed, with particular interest being paid to those objects that do not fall into any known category.

However, the individual astronomer studying a particular type of object is not always interested in large-scale classification. Such an astronomer needs a way to search a data set exclusively for the samples that are most like the object of interest. Once located, those samples are likely to be numerous enough to require further automatic assistance in their analysis.

The techniques needed to help solve this problem already exist in the field, but they have not yet been brought together and adapted to form a useful, coherent system. As a result, the scientific insights contained in large data sets remain mostly untapped. The work in this project appears to be the first attempt to rectify this. Three major algorithms are employed to construct a general data mining tool set.

1. Principal Components Analysis is applied in a supervised classification role to create a filter that can help search for a specific type of object in an unknown data set.

2. Artificial Neural Networks have been shown to be a robust and versatile tool for many tasks in astronomy. They are used here to provide spectral classifications.

3. χ² minimisation is used to derive physical parameters for spectra by fitting them to grids of theoretical models.
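As a rough illustration of item 3, the simplest form of χ² minimisation against a grid evaluates χ² = Σᵢ [(Fᵢ,obs − Fᵢ,model)/σᵢ]² for every model in the grid and keeps the smallest value. The sketch below does exactly that and nothing more. It is not SFIT, whose internals are not described here; the function names, the (Teff, log g) parameter tuples and the toy fluxes are hypothetical, and a real fitting code would normally also interpolate between grid points.

```python
import numpy as np


def chi_squared(observed, sigma, model):
    """Sum of squared residuals, each weighted by its flux uncertainty."""
    return float(np.sum(((observed - model) / sigma) ** 2))


def best_grid_fit(observed, sigma, grid_params, grid_fluxes):
    """Return (parameters, chi2) of the grid model with the smallest chi-squared.

    observed, sigma: flux and 1-sigma error arrays of shape (n_pixels,)
    grid_params:     one parameter tuple per model, e.g. (Teff, log g) (hypothetical)
    grid_fluxes:     array of shape (n_models, n_pixels) on the same wavelength grid
    """
    chi2 = np.array([chi_squared(observed, sigma, model) for model in grid_fluxes])
    best = int(np.argmin(chi2))
    return grid_params[best], chi2[best]


# Toy example: three flat "model spectra" and a noisy observation near the middle one.
grid_params = [(20000, 5.0), (30000, 5.5), (40000, 6.0)]
grid_fluxes = np.array([[1.0] * 4, [2.0] * 4, [3.0] * 4])
observed = np.array([2.1, 1.9, 2.0, 2.0])
sigma = np.full(4, 0.1)
params, chi2 = best_grid_fit(observed, sigma, grid_params, grid_fluxes)
print(params, round(chi2, 6))  # (30000, 5.5) 2.0
```

Because every grid point is visited, the cost grows linearly with grid size; this is one reason larger grids call for smarter search and interpolation.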

Additional minor tools to facilitate data processing, management, and visualisation are also prototyped.

Furthermore, a new methodology has been developed to extend the functionality of the χ² minimisation code SFIT, used at the Armagh Observatory. The code is modified using concepts from the field of computational geometry to allow the use of arbitrarily large, three-dimensional grids of theoretical models. This removes several severe limitations from the program and prepares it for further modification to permit its use in a distributed computational environment.
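The paragraph above does not say which computational-geometry technique is used. One common approach for irregular three-dimensional grids, offered here only as an assumption, is to divide the grid points into tetrahedra (for example by Delaunay triangulation) and to interpolate a model at an arbitrary parameter point from the four vertices of the tetrahedron that contains it, using barycentric weights. The sketch shows only that interpolation step inside one given tetrahedron; the function names and the example axes are hypothetical.

```python
import numpy as np


def barycentric_weights(point, vertices):
    """Barycentric coordinates of `point` with respect to a tetrahedron.

    vertices: array of shape (4, 3) holding the parameter-space positions of
              four grid models (axes are illustrative, e.g. Teff, log g, He abundance).
    Returns four weights that sum to 1; all are >= 0 when the point is inside.
    """
    vertices = np.asarray(vertices, dtype=float)
    v0 = vertices[0]
    edges = (vertices[1:] - v0).T  # 3x3 matrix whose columns are the edge vectors
    w = np.linalg.solve(edges, np.asarray(point, dtype=float) - v0)
    return np.concatenate(([1.0 - w.sum()], w))


def interpolate_model(point, vertices, vertex_fluxes, tol=1e-12):
    """Linearly interpolate the four vertex spectra at `point`."""
    weights = barycentric_weights(point, vertices)
    if np.any(weights < -tol):
        raise ValueError("point lies outside this tetrahedron")
    return weights @ np.asarray(vertex_fluxes, dtype=float)


tetra = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
fluxes = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # toy two-pixel "spectra"
print(interpolate_model([0.25, 0.25, 0.25], tetra, fluxes))  # [1.5 3. ]
```

In practice the axes would need rescaling to comparable ranges before triangulating, since Teff spans thousands of kelvin while log g spans a few dex. SciPy's `scipy.spatial.Delaunay`, with its `find_simplex` method, is one existing tool for locating the enclosing tetrahedron.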

The specific outcome of this project is a set of general tools which can be used to study the spectra of any astronomical object, together with a “real-world” demonstration in which these tools are used to search for and analyse the spectra of hot subdwarf stars from the archives of the Sloan Digital Sky Survey. The results reveal several unexplained phenomena of extended horizontal branch stars that pose important questions for the theory of stellar evolution.

The work undertaken in this project is a step towards the larger computational framework of Jeffery (2003), which outlines a wider system incorporating the management of atomic data, the dynamic generation and storage of grids of theoretical models, parameter space visualisation, and automated analysis. The use of distributed computational resources, such as the Grid, is also envisaged.

1.1 Astronomical Data Mining

The term “data mining” refers to the use of a broad set of techniques and algorithms for extracting useful patterns and models from very large data sets. Typically, the goal is to discover either something hitherto unknown about a phenomenon that only becomes apparent when it is studied en masse, or else a new phenomenon that only becomes apparent when observations are gathered in large enough quantities over a sufficiently wide range.

Traditionally, in astronomy, much effort was invested in gathering observations of one particular object, such as a star, in an attempt to understand that object in detail. Given the universality of physics, the insights gained are usually applicable to other objects of the same type, allowing a wider understanding to be achieved.


However, advances in technology, such as large-area mosaic CCDs and multi-object fibre-fed spectrographs, mean that modern telescopes can gather observations of thousands of objects in a single night. This opens up the possibility of discovering new facts about particular types of object by studying their properties in large numbers, and also the possibility of discovering completely new objects.

Unfortunately, this abundance of data brings with it a set of new problems. Managing all of the information requires knowledge of data formats, storage mechanisms, and techniques for indexing, searching, and analysing it. Indeed, modern astronomy is fast becoming a cross-disciplinary endeavour, providing a rich area for exploring many aspects of computer science and statistics in the context of real-world applications.

Data Types

Astronomical data are inherently heterogeneous in both format and content, with observations now being gathered across all regions of the electromagnetic spectrum. Broadly speaking, astronomical data can be classified into five domains.

• Imaging data are the fundamental component of astronomical observations, capturing a two-dimensional picture of the universe within a narrow wavelength region at a particular point in time.

• Catalogues of objects are constructed by analysing imaging data and recording many different parameters for each object, such as brightness and colour, morphological information, and coordinates.

• Spectroscopy provides detailed physical quantification of objects, including temperature, chemical composition, and kinematical information.

• Studies of objects in the time domain provide valuable insight into the nature of the universe by identifying moving objects, variable sources (e.g., pulsating stars), or transient objects such as supernovae and gamma-ray bursts.

• Finally, theoretical simulations of astronomical objects are an important source of data. Comparing theoretical models with observational data is the central mechanism in understanding how these objects formed and have evolved.

Each of these data domains carries its own particular problems to be solved in a data management and mining context. Imaging data and catalogue construction require robust, automatic techniques to identify sources distinct from background-level noise, to differentiate between different types of object (e.g., stars, galaxies, and comets), and finally to index these data so that they can be searched quickly on spatial criteria.

Spectroscopy and time-domain data require more involved algorithms for the automated reduction and calibration of observations, algorithms which often have to be tailored for a specific instrument and telescope setup. The automatic analysis of spectroscopic data typically seeks to place an object within a predefined categorical system by comparing the object with the set of standards which define the system. The physical properties of an object that are manifest in its spectrum are determined by computing accurate theoretical models and comparing them with the observations. Any results then need to be stored and indexed with the observations in a manner that allows for re-analysis as improved observations and theoretical models become available.
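The comparison with standards is described above only in general terms. As a minimal sketch, assuming all spectra have been resampled onto a common wavelength grid, one could divide each spectrum by its mean flux and assign the label of the standard at the smallest Euclidean distance. This is a deliberately simple stand-in, not the neural-network approach adopted in this project, and the labels and fluxes below are hypothetical.

```python
import numpy as np


def normalise(flux):
    """Divide by the mean flux so that only the spectral shape is compared."""
    flux = np.asarray(flux, dtype=float)
    return flux / flux.mean()


def classify(spectrum, standards):
    """Return the label of the nearest standard and all the distances.

    standards: dict mapping a class label to a flux array sampled on the
               same wavelength grid as `spectrum`.
    """
    target = normalise(spectrum)
    distances = {label: float(np.linalg.norm(target - normalise(flux)))
                 for label, flux in standards.items()}
    best = min(distances, key=distances.get)
    return best, distances


standards = {"flat": [1, 1, 1, 1], "rising": [1, 2, 3, 4]}  # toy standards
label, _ = classify([2.0, 4.0, 6.0, 8.5], standards)
print(label)  # rising
```

Normalising by the mean removes differences in overall brightness, so the decision depends on spectral shape rather than on how bright the object happens to be.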

Numerical simulations that generate theoretical models always need powerful and plentiful computational resources if more detail and precision are to be attained. As models will always have a shorter shelf-life than observations, appropriate meta-data needs to be recorded and stored with the models so that a historical record can be kept as the underlying physics improves. This meta-data is also needed to help automate the parameterisation of observations, by providing a means to explore grids of models and to ascertain when new models need to be generated to cover a required part of the parameter space.


1.2 Large Data Sets And Their Sources

Large observational data sets in astronomy come from three main sources: specific surveys, general-purpose observatories, and space missions. In recent years, Virtual Observatory projects have been investigating ways to combine the various databases generated by these sources, mapping out the computational infrastructures and tools needed to explore large data volumes.

Specific Surveys

Digital sky surveys generate very large quantities of homogeneous data over multiple wavelengths. For this reason, they are the main drivers behind the study of data mining methods in astronomy.

The Digitized Palomar Observatory Sky Survey [1] (DPOSS; Djorgovski et al., 1998) is a digital survey of the entire Northern sky in three visible-light bands, based on the photographic sky atlas POSS-II, the second Palomar Observatory Sky Survey (Reid et al., 1991). A set of three photographic plates (one in each filter), each covering 36 square degrees, was taken at each of 894 pointings spaced by 5 degrees, covering the Northern sky. The plates were then digitised at the Space Telescope Science Institute (STScI), producing about 1 gigabyte per plate and about 3 terabytes of data in total.
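The figures quoted above can be cross-checked with simple arithmetic. The sketch assumes decimal units (1 TB = 1000 GB) and treats each pointing as owning a 5° × 5° cell of sky, which is only an approximation. Under those assumptions, 894 × 3 = 2682 plates at about 1 GB each give roughly 2.7 TB, consistent with “about 3 terabytes”, and 894 cells of 25 square degrees cover about 22,350 square degrees, somewhat more than the 20,626 square degrees of a hemisphere. If each 36-square-degree plate is taken to be a 6° × 6° square (also an assumption), adjacent plates on a 5° spacing overlap by about a degree.

```python
import math

# Figures quoted in the text for DPOSS
pointings = 894
plates_per_pointing = 3      # one plate per filter
gb_per_plate = 1.0           # "about 1 gigabyte per plate"
spacing_deg = 5.0            # spacing between plate centres

plates = pointings * plates_per_pointing
total_tb = plates * gb_per_plate / 1000.0      # decimal units: 1 TB = 1000 GB

# Whole sky = 4*pi steradians = 4*pi*(180/pi)^2 square degrees
whole_sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2
hemisphere_sq_deg = whole_sky_sq_deg / 2
# Approximation: each pointing "owns" a spacing x spacing cell of sky
tiled_sq_deg = pointings * spacing_deg ** 2

print(plates, round(total_tb, 2))                      # 2682 2.68
print(round(hemisphere_sq_deg), round(tiled_sq_deg))   # 20626 22350
```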

Specially developed data mining software called SKICAT (Weir et al., 1995) was used to perform object classification and measure around 40 parameters for each object, storing this information in a database which will eventually be released to the community as the Palomar-Norris Sky Catalog.

The Two Micron All-Sky Survey [2] (2MASS; Skrutskie et al., 2006) is a near-infrared (J, H, and Ks) all-sky survey. The project is a collaboration between the University of Massachusetts, which constructed the observatory facilities and operated the survey, and the Infrared Processing and Analysis Center at Caltech, which is responsible for all data processing and archive issues. The survey began in the spring of 1997 and completed survey-quality operations in 2000, with the final catalogue being released in March 2003.

[1] http://dposs.caltech.edu/

[2] http://www.ipac.caltech.edu/2mass/

The survey includes over 12 terabytes of imaging data, with the final catalogue containing over one million resolved galaxies and more than three hundred million stars and other unresolved sources to a limiting magnitude of Ks < 14.3. 2MASS provides the following data products for the entire sky:

• A digital atlas of the sky comprising approximately 4 million 8′ × 16′ images, having about 4″ spatial resolution in each of the three wavelength bands,

• A point source catalog containing accurate positions and fluxes for ∼300 million stars and other unresolved objects,

• An extended source catalog containing positions and total magnitudes for more than one million galaxies and other nebulae.

The 2dF Galaxy Redshift Survey [3] (2dFGRS; Colless et al., 2001) is a major spectroscopic survey taking full advantage of the unique capabilities of the 2dF facility built by the Anglo-Australian Observatory [4]. The 2dFGRS obtained spectra for 245,591 objects, mainly galaxies, brighter than a nominal extinction-corrected magnitude limit of bJ = 19.45. Reliable redshifts were obtained for 221,414 galaxies. The galaxies cover an area of approximately 1,500 square degrees selected from the extended APM Galaxy Survey of the South Galactic cap.

The final release dataset comprises the following elements:

• source catalogues for the full survey, containing data for 382,323 objects, together with related material,

• spectroscopic catalogues for 245,591 objects, containing the spectroscopic parameters such as redshifts and spectral types.

[3] http://www.mso.anu.edu.au/2dFGRS/

[4] http://www.aao.gov.au/2df/

The Sloan Digital Sky Survey [5] (SDSS; York et al., 2000) is a project to survey a 10,000 square degree area (about 1/4 of the entire sky) of the North Galactic hemisphere over a five-year period. The estimated 100 million catalogued sources from this survey will then be used as the foundation for the largest ever spectroscopic survey of galaxies, quasars, and stars.

A dedicated 2.5 m telescope is specially designed to take wide-field (3° × 3°) images using a 5 × 6 mosaic of 2048 × 2048 CCDs in five wavelength bands, operating in scanning mode. Spectroscopic targets are then observed using two spectrographs, each with 320 fibres feeding in light from the focal plane. A total of four 2048 × 2048 CCDs (one for each channel of each spectrograph) collect the spectra.

The total raw data will exceed 40 terabytes, and a processed subset about 1 terabyte in size will consist of 1 million spectra, positions, and image parameters for over 100 million objects, plus a mini-image centred on each object in every colour.

The data will be made available to the public at specific milestone releases, and upon completion of the survey.
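The instrument description implies a few derived numbers, worked through below as plain arithmetic. The two-channels-per-spectrograph figure is inferred from the four spectrograph CCDs. The pointing count is only a lower bound: it assumes every fibre delivers a usable target spectrum, whereas in practice some fibres are reserved for sky and calibration, so more pointings would be needed. The variable names are hypothetical.

```python
# Figures quoted in the text for the SDSS camera and spectrographs
mosaic_rows, mosaic_cols = 5, 6
ccd_side_px = 2048                # each CCD is 2048 x 2048 pixels
spectrographs = 2
fibres_per_spectrograph = 320
channels_per_spectrograph = 2     # inferred: four CCDs, one per channel per spectrograph
target_spectra = 1_000_000        # "1 million spectra"

imaging_ccds = mosaic_rows * mosaic_cols
imaging_pixels = imaging_ccds * ccd_side_px ** 2
fibres_per_pointing = spectrographs * fibres_per_spectrograph
spectro_ccds = spectrographs * channels_per_spectrograph

# Lower bound: assumes every fibre yields one usable target spectrum.
min_pointings = -(-target_spectra // fibres_per_pointing)  # ceiling division

print(imaging_ccds, imaging_pixels)       # 30 125829120
print(fibres_per_pointing, spectro_ccds)  # 640 4
print(min_pointings)                      # 1563
```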

General-Purpose Observatories<br />

Traditional ground-based observatories have been saving data for a significant time, primarily as backups for their users, accumulating large quantities of valuable but heterogeneous data. Unfortunately, lack of funding, together with this inherent heterogeneity, makes it difficult to archive the data in a form that is available and easily accessible to the wider astronomical community. However, some notable exceptions do exist.

[5] http://www.sdss.org/

The National Optical Astronomy Observatory [6] (NOAO) is a US organisation that manages ground-based national astronomical observatories, including the Kitt Peak National Observatory, the Cerro Tololo Inter-American Observatory, and the National Solar Observatory.

The NOAO has been archiving all data from its telescopes in a program called “Save-the-Bits”, which, prior to the introduction of survey-grade instrumentation, generated around half a terabyte and over 250,000 images a year. With the introduction of survey instruments and related programs, the rate of data accumulation has increased, and NOAO now manages over 10 terabytes of data.

The European Southern Observatory [7] (ESO) operates a number of telescopes (including the four 8 m class VLT telescopes) at two observatories in the southern hemisphere: the La Silla Observatory and the Paranal Observatory. As with many other ground-based observatories, ESO has been archiving data for some time, with storage approaching a steady rate of approximately 20 terabytes per year from all of its telescopes. This figure will eventually increase to several hundred terabytes with the completion of the remaining planned facilities, including the VST, a dedicated survey telescope similar in nature to the telescope built for the SDSS project.

Space Missions<br />

Although ground-based observatories are aided by the advancement of technology and continue to make important discoveries, they will always be encumbered by the restrictions imposed by the Earth’s atmosphere. Thus, space missions, although extremely expensive, are critical components in the study of the universe, and all of the data they produce are very valuable and therefore archived.

[6] http://www.noao.edu/

[7] http://www.eso.org/

The Multimission Archive at the Space Telescope Science Institute [8] (MAST) archives a variety of astronomical data gathered from space missions, with the primary emphasis on the optical, ultraviolet, and near-infrared parts of the spectrum. MAST provides a cross-correlation tool allowing users to search all archived data for observations which contain sources from either archived or user-supplied catalogue data. In addition, MAST provides query capabilities for individual missions.
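At its core, cross-correlating a catalogue against archived observations means matching sky positions within a tolerance. The sketch below is a hypothetical, brute-force illustration of that idea, not MAST's implementation. It computes great-circle separations with the haversine formula and pairs every catalogue source with every observation closer than a chosen radius. The identifiers, coordinates and the 2″ radius are made up for the example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(catalogue, observations, radius_arcsec):
    """Return (source_id, obs_id) pairs whose positions agree within radius_arcsec.

    catalogue, observations: dicts mapping an identifier to (ra_deg, dec_deg).
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for src_id, (ra, dec) in catalogue.items():
        for obs_id, (obs_ra, obs_dec) in observations.items():
            if angular_separation_deg(ra, dec, obs_ra, obs_dec) <= radius_deg:
                matches.append((src_id, obs_id))
    return matches


catalogue = {"src1": (150.0, 2.0), "src2": (10.0, -30.0)}
observations = {"obsA": (150.0005, 2.0), "obsB": (200.0, 45.0)}
print(cross_match(catalogue, observations, radius_arcsec=2.0))  # [('src1', 'obsA')]
```

The double loop costs one distance evaluation per source-observation pair. Large archives instead index positions spatially (hierarchical sky pixelisations such as HTM or HEALPix are common choices) so that only nearby candidates are compared.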

The dominant holding of MAST is the Hubble Space Telescope data archive, but its total holdings, currently exceeding ten terabytes, include (or provide links to) archival data for the following missions and projects: the Hubble Data Archive, the Galaxy Evolution Explorer, the Far Ultraviolet Spectroscopic Explorer, the International Ultraviolet Explorer Final Archive, the Extreme Ultraviolet Explorer, the Hopkins Ultraviolet Telescope Archive, the Ultraviolet Imaging Telescope Archive, the Wisconsin Ultraviolet Photopolarimeter Experiment Archive, the Copernicus UV Satellite Archive, the Berkeley Extreme and Far-UV Spectrometer, the Interstellar Medium Absorption Profile Spectrograph, the Digitized Sky Survey, and the Röntgen SATellite Archive.

Virtual Observatories<br />

The Virtual Observatory (VO) concept represents a scientific and technological framework aimed at managing the ongoing exponential growth in the volume, quality, and complexity of astronomical data gathered by all of the sources discussed previously. Two main challenges are faced:

1. The effective inter-linking of large, geographically distributed data sets and digital sky archives in a homogeneous manner, thereby allowing the optimal use of data mining algorithms to extract new science.

2. The research and development of data mining and “knowledge discovery in databases” (KDD) algorithms and techniques for the exploration and scientific investigation of large digital sky surveys, including combined multi-wavelength data sets.

[8] http://archive.stsci.edu/mast.html

These problems have significant relevance beyond the field of astronomy, as many aspects of society are struggling with information overload.

The National Virtual Observatory [9] (NVO) is a project funded by the US National Science Foundation to research and explore the technologies necessary to create a VO. The central themes of this research are the formation and adoption of standards that make the sharing of astronomical data easier. An NVO standard that has been adopted worldwide in this regard is “VOTable”, a way to represent a table of data in XML together with rich meta-data describing the semantic meaning of the data. Grid computing is seen as an important resource for the large-scale analysis of astronomical data. The NVO has also produced research prototypes demonstrating that interesting and efficient research can be done by building upon just a few new protocols and standards for data exchange and access.

The AstroGrid [10] project is a UK government-funded, open-source project designed to create a working VO for UK and international astronomers. The goals of the AstroGrid project are:

• A working datagrid for key UK databases

• High throughput data mining facilities for interrogating those databases

• A uniform archive query and data-mining software interface

• The ability to browse simultaneously multiple datasets

• A set of tools for integrated on-line analysis of extracted data

• A set of tools for on-line database analysis and exploration

• A facility for users to upload code to run their own algorithms on the data mining machines

• An exploration of techniques for open-ended resource discovery

[9] http://www.us-vo.org/

[10] http://www.astrogrid.org/

Many of these goals are common to other nations and other disciplines, and the AstroGrid project is working closely with other VO projects worldwide through the International Virtual Observatory Alliance (IVOA), jointly formed with the NVO and other worldwide VO efforts, to deliver these goals.

1.3 Astronomical Spectra

It is clear that much work lies ahead if astronomers are to keep up with the ever-increasing amounts of data their telescopes are able to gather. Accordingly, the project presented in this thesis focusses on one particular aspect of the data mining problem: methods to analyse digitised astronomical spectra in an automated fashion.

The central idea of data mining is to turn large quantities of unknown information into meaningful interpretations, and this is very much a non-trivial task in the context of astronomical spectra. Before large-scale statistics can be applied to search for patterns, the spectra of an interesting type of object need to be selected from a set of unknown data. The major analytical tasks are then usually the classification and physical parameterisation of the spectra, after which pattern searching can be performed.

The problems of searching, classification, and physical parameterisation all involve some kind of pattern matching in and of themselves. Searching, which is basically a very coarse initial classification, matches unknown spectra to a set of known examples of a search target, retaining only those spectra which are within some acceptable distance from the set of examples. Classification assigns a fine-grained category to an object based on how well it matches the spectral standards of the classification system used. Physical parameterisation matches observations to grids of theoretical models in an attempt to find the best fit and, consequently, estimates for the main physical quantities of interest.
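Searching as described here needs a notion of distance from a set of examples. One possible measure, sketched below as an assumption rather than as the filter developed in this thesis, is the distance between a spectrum and its projection onto the subspace spanned by the leading principal components of the example set. Spectra that the examples' components reconstruct well are retained, and the rest are discarded. All spectra are assumed to share a common wavelength grid and normalisation; the function names, the number of components and the threshold are hypothetical.

```python
import numpy as np


def fit_pca_filter(examples, n_components):
    """Mean and leading principal components of known examples of the target.

    examples: array of shape (n_spectra, n_pixels) on a common wavelength grid.
    """
    examples = np.asarray(examples, dtype=float)
    mean = examples.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(examples - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_distance(spectra, mean, components):
    """Euclidean distance from each spectrum to its projection on the PCA subspace."""
    centred = np.asarray(spectra, dtype=float) - mean
    projected = centred @ components.T @ components
    return np.linalg.norm(centred - projected, axis=1)


def search(unknown, mean, components, threshold):
    """Indices of the unknown spectra lying within `threshold` of the subspace."""
    distances = reconstruction_distance(unknown, mean, components)
    return np.flatnonzero(distances <= threshold)


# Toy data: 20 five-pixel "examples" that vary only in their first two pixels.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 20))
examples = np.column_stack([a, b, np.ones(20), np.ones(20), np.ones(20)])
mean, components = fit_pca_filter(examples, n_components=2)

unknown = np.array([[3.0, -2.0, 1.0, 1.0, 1.0],   # same kind of variation: kept
                    [0.0, 0.0, 1.0, 4.0, 1.0]])   # extra flux in the fourth pixel: rejected
print(search(unknown, mean, components, threshold=1.0))  # [0]
```

The threshold sets the trade-off between completeness and contamination of the retained sample; in practice it would be tuned on spectra whose types are already known.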

1.3.1 Types Of Objects And Their Spectra

All objects in the night sky can be studied by spectroscopic analysis. Each type of object has a set of distinctive features in its spectrum, reflecting the specific physical processes at work in or around the object. This section gives some examples of these objects and the spectra they produce.

In Figure 1.1, the top plot shows the spectrum of a hot star. The overall shape of a stellar spectrum approximates the curve of a black body at the same effective temperature. This temperature can be estimated from the peak wavelength (using Wien’s displacement law) or from the area under the spectrum (using the Stefan-Boltzmann law). The absorption lines in the spectrum reflect the various chemical elements present in the star’s atmosphere, and reveal the specific physical conditions in that region of the star.
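Both estimates in the preceding paragraph reduce to one-line formulas, sketched below with CODATA 2018 constants: Wien's law T = b/λ_max and, for a black body, the Stefan-Boltzmann law F = σT⁴ inverted to give T = (F/σ)^(1/4). Three caveats apply. First, the Stefan-Boltzmann form needs the flux at the stellar surface, whereas the area under an observed spectrum is the flux received at the Earth, smaller by a factor (R/d)² for a star of radius R at distance d; the sketch simply takes the surface flux as given. Second, for a hot star the peak falls in the ultraviolet (a 30,000 K black body peaks near 97 nm), outside an optical spectrum. Third, real stellar spectra only approximate black bodies, so both results are rough. The function names are hypothetical.

```python
# CODATA 2018 values
WIEN_B = 2.897771955e-3    # Wien displacement constant, m K
SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def temperature_from_peak(peak_wavelength_m):
    """Wien's displacement law: T = b / lambda_max (wavelength in metres)."""
    return WIEN_B / peak_wavelength_m


def temperature_from_surface_flux(flux_w_m2):
    """Invert F = sigma T^4 for a black body: T = (F / sigma) ** 0.25."""
    return (flux_w_m2 / SIGMA_SB) ** 0.25


print(round(temperature_from_peak(500e-9)))                  # 5796
print(round(temperature_from_surface_flux(SIGMA_SB * 1e16)))  # 10000
```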

The bottom plot in Figure 1.1 shows a galaxy spectrum. The overall spectrum of a galaxy is simply the combined spectrum of all the stars and other radiating matter in the galaxy. As galaxies differ in structure and in the relative proportions of stellar types and gas, their spectra also differ.
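A deliberately crude toy version of this idea, sketched below, treats a galaxy as a weighted sum of black bodies at a few temperatures. The temperatures and weights are hypothetical. Real stellar population synthesis uses model stellar spectra together with gas emission and dust, none of which is modelled here. The point is only that the composite is a linear superposition, so its shape depends on the mix of components.

```python
import numpy as np

# SI constants (exact values under the 2019 SI definitions)
H = 6.62607015e-34  # Planck constant, J s
C = 2.99792458e8    # speed of light, m s^-1
K_B = 1.380649e-23  # Boltzmann constant, J K^-1


def planck_lambda(wavelength_m, temperature_k):
    """Black-body spectral radiance B_lambda(T), in W m^-3 sr^-1."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * H * C**2 / wavelength_m**5 / np.expm1(x)


def composite_spectrum(wavelength_m, populations):
    """Weighted sum of black bodies; populations = [(temperature_k, weight), ...]."""
    return sum(weight * planck_lambda(wavelength_m, temp) for temp, weight in populations)


wavelength = np.linspace(100e-9, 3000e-9, 29001)  # 0.1 nm steps from 100 to 3000 nm
galaxy = composite_spectrum(wavelength, [(4000.0, 10.0), (10000.0, 0.1)])
print(f"peak near {wavelength[np.argmax(galaxy)] * 1e9:.0f} nm")
```

Changing the weights shifts the peak and the overall slope, which is the sense in which differences in stellar content show up as differences in galaxy spectra.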

Unlike stars, galaxies are not point sources, so their spectra must be obtained differently. As a galaxy can often be resolved as an extended object, it is possible to take spectra of different parts of the galaxy, providing information about the composition, star formation rate, and rotational velocity of each region.

Quasars exhibit very bright emission features relative to a low-intensity continuum in their spectra, as can be seen in the top plot of Figure 1.2. In fact, it was only through careful analysis of the spectra of quasars that astronomers realised they were not just faint stars. The emission lines in quasar spectra do not appear at the wavelengths expected if the object were a nearby star. The standard explanation is that a quasar lies at a vast distance and so appears to be receding from us because of the expansion of the Universe. This high recession velocity relative to the Earth shifts the spectral lines to longer wavelengths (redshift).

Figure 1.1: A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from the SDSS)

Figure 1.2: Example of a quasar (top) and carbon star (bottom) spectrum. (Taken from the SDSS)

Figure 1.3: The emission spectrum of the Orion nebula (M42).
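The redshift z is defined by λ_obs = λ_rest(1 + z), so a single identified line is enough to measure it. The sketch below assumes approximate air rest wavelengths for a few hydrogen Balmer lines and averages the per-line estimates. The line positions and function names are hypothetical. The velocity v ≈ cz is included only as the low-redshift Doppler approximation; it is not valid for the large redshifts of many quasars.

```python
from statistics import mean

C_KM_S = 299_792.458  # speed of light [km/s]

# Approximate rest (air) wavelengths in Angstrom of some hydrogen Balmer lines
REST_ANGSTROM = {"H-alpha": 6562.8, "H-beta": 4861.3, "H-gamma": 4340.5}


def redshift(observed, rest):
    """Redshift z from lambda_obs = lambda_rest * (1 + z)."""
    return observed / rest - 1.0


def mean_redshift(observed_lines):
    """Average z over identified lines, given {line name: observed wavelength}."""
    return mean(redshift(obs, REST_ANGSTROM[name])
                for name, obs in observed_lines.items())


# Hypothetical measured line positions in Angstrom
z = mean_redshift({"H-beta": 5347.43, "H-gamma": 4774.55})
v_approx = C_KM_S * z  # Doppler approximation, meaningful only for z << 1
```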

Exotic stars, such as Wolf-Rayet stars or the carbon star in the bottom plot of Figure 1.2, are identified by the features present in their spectra. Carbon stars can have temperatures (4,600 to 3,100 K) similar to those of G-, K-, and M-class stars, but they have a much higher abundance of carbon than normal stars, which appears in the spectrum as very strong molecular bands ($\mathrm{C}_2$). Because these stars have such low temperatures, they appear red, and the carbon molecules absorb light at blue wavelengths, which makes the star appear even redder. Carbon stars are assigned spectral class C.

Emission nebulae are clouds of high-temperature gas. The atoms in the cloud are ionised by ultraviolet light from a nearby star and emit radiation as the electrons recombine and fall back to lower atomic energy levels, so their spectra show strong emission lines, as can be seen in Figure 1.3.

These nebulae usually appear red because the predominant optical emission line of hydrogen (Hα) lies in the red. Although other atoms produce other colours, hydrogen is by far the most abundant element. Emission nebulae are usually the sites of recent and ongoing star formation.
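The Rydberg formula, 1/λ = R_H (1/n₁² − 1/n₂²), shows why Hα falls in the red. The sketch below applies it to the n = 3 → 2 transition of hydrogen, assuming the reduced-mass-corrected Rydberg constant R_H ≈ 1.09678 × 10⁷ m⁻¹. The function name is hypothetical. The formula gives vacuum wavelengths, so the commonly quoted air wavelength of Hα (about 656.3 nm) is slightly shorter.

```python
R_H = 1.09678e7  # Rydberg constant for hydrogen [m^-1], reduced-mass corrected


def hydrogen_line_nm(n_lower, n_upper):
    """Vacuum wavelength [nm] of the hydrogen n_upper -> n_lower transition."""
    if not 0 < n_lower < n_upper:
        raise ValueError("require 0 < n_lower < n_upper")
    inverse_wavelength = R_H * (1.0 / n_lower**2 - 1.0 / n_upper**2)  # [m^-1]
    return 1e9 / inverse_wavelength


h_alpha = hydrogen_line_nm(2, 3)  # Balmer alpha, in the red
h_beta = hydrogen_line_nm(2, 4)   # Balmer beta, in the blue-green
```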

1.3.2 Automatic Methods of Analysis

Despite the diversity of features present in the spectra of astronomical objects, their general character is always the same: flux intensities measured across some wavelength range. This permits an automated method of analysis developed for one type of object to be applied, in principle, to the spectra of another.

Over the years, a small number of automatic pattern-matching techniques have found widespread use in the field. One of the first, and simplest, is the cross-correlation function. This is a signal processing technique in which one signal is shifted relative to another by a lag z and the product of the two is integrated,

$$ c(z) = \int_{-\infty}^{\infty} T(x)\,G(x + z)\,\mathrm{d}x, \qquad (1.1) $$

giving the cross-correlation function, c(z), of the two functions T(x) and G(x) at every lag z, with the integral taken over all x from −∞ to ∞.

Simkin (1974) demonstrated the use of the cross-correlation function for measuring the radial velocities of stars and galaxies. Tonry & Davis (1979) then applied the technique in a survey to measure galaxy redshifts. Kurtz (1982) used cross-correlation to classify low-resolution (14 Å) stellar spectra on the MK classification system (Morgan et al., 1978). Cross-correlation remains an important, basic tool that is widely used, mainly as a method for calculating radial velocities.
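The sketch below shows how cross-correlation yields a radial velocity. It assumes that the template and the observed spectrum are continuum-subtracted and sampled on the same uniform grid in ln λ. On such a grid a Doppler shift is a constant pixel offset, with 1 + v/c = exp(kΔ) for a lag of k pixels of width Δ. The discrete correlation used is c(k) = Σₓ T(x) G(x + k). Only the integer-pixel peak is located, and the function names and synthetic lines are hypothetical. Practical codes also fit the peak to reach sub-pixel precision.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light [km/s]


def cross_correlate(template, spectrum, max_lag):
    """Discrete cross-correlation c(k) = sum_x T(x) G(x + k), |k| <= max_lag.

    Both arrays must be continuum-subtracted and share one pixel grid.
    """
    n = len(template)
    lags = np.arange(-max_lag, max_lag + 1)
    c = np.empty(lags.size)
    for i, k in enumerate(lags):
        lo, hi = max(0, -k), min(n, n - k)  # keep 0 <= x + k < n
        c[i] = np.dot(template[lo:hi], spectrum[lo + k:hi + k])
    return lags, c


def radial_velocity(template, spectrum, dlnlam, max_lag=50):
    """Velocity [km/s] from the correlation peak on a uniform ln(lambda) grid.

    A lag of k pixels corresponds to 1 + v/c = exp(k * dlnlam).
    """
    lags, c = cross_correlate(template, spectrum, max_lag)
    best_lag = lags[np.argmax(c)]
    return C_KM_S * np.expm1(best_lag * dlnlam)


# Synthetic check: two Gaussian absorption lines on a ln(lambda) grid
dlnlam = 1e-4
lnlam = np.log(4000.0) + dlnlam * np.arange(2000)
template = -sum(np.exp(-0.5 * ((lnlam - np.log(c0)) / 2e-4) ** 2)
                for c0 in (4340.5, 4861.3))
spectrum = np.roll(template, 7)  # features moved 7 pixels to the red
v = radial_velocity(template, spectrum, dlnlam)
```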

Related to the cross-correlation function are minimum distance methods (MDM). Here, an observation is compared with a set of templates with the aim of finding the match that minimises some distance metric. Kurtz (1982), Lasala (1994), and Gulati et al. (1994a) used this technique to classify stellar spectra with good results. The application of minimum distance methods to the parameterisation of stellar spectra by fitting observations to grids of theoretical models is discussed in Chapter 3.
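A minimum distance classifier can be sketched in a few lines. The version below assumes that the observation and every template are sampled on a common wavelength grid with known per-pixel flux errors σ. It uses the χ² statistic, Σ[(O − T)/σ]², as the distance metric. This is one common choice, not necessarily the metric used in the works cited above. The template labels and synthetic data are hypothetical.

```python
import numpy as np


def chi_square(observed, model, sigma):
    """Chi-square distance between an observed spectrum and one template."""
    return float(np.sum(((observed - model) / sigma) ** 2))


def minimum_distance_match(observed, sigma, templates):
    """Return (label, chi2) of the template closest to the observation.

    templates: mapping of label -> flux array on the same grid as `observed`.
    """
    distances = {label: chi_square(observed, flux, sigma)
                 for label, flux in templates.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical example: three templates and a noisy copy of one of them
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
templates = {
    "A": 1.0 - 0.5 * np.exp(-((x - 0.3) / 0.02) ** 2),
    "B": 1.0 - 0.5 * np.exp(-((x - 0.5) / 0.02) ** 2),
    "C": 1.0 - 0.5 * np.exp(-((x - 0.7) / 0.02) ** 2),
}
sigma = np.full_like(x, 0.01)
observed = templates["B"] + rng.normal(0.0, 0.01, x.size)
label, chi2 = minimum_distance_match(observed, sigma, templates)
```

Fitting to a grid of theoretical models follows the same pattern. The templates become model spectra indexed by their physical parameters, and the best-matching grid point gives the parameter estimate.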

Artificial neural networks (ANNs) are statistical pattern-matching algorithms that have found wide application because of their ability to “learn” highly non-linear function mappings from examples of such mappings. von Hippel et al. (1994) outline the use of ANNs for the classification of stellar spectra. Folkes et al. (1996) use ANNs to provide automatic classifications of low S/N galaxy spectra. Gulati et al. (1997a) show the use of ANNs in determining reddening estimates from low-dispersion ultraviolet spectra of O and B stars. Weaver (2000a) demonstrates an ANN-based technique for performing two-dimensional classification of the components of binary stars. Qin et al. (2003) use a form of ANN to perform automatic star-galaxy separation from spectra with a high success rate. The use of ANNs to provide classifications and physical parameterisations of stellar spectra is studied in Chapter 2.
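The sketch below makes the idea of a learned non-linear mapping concrete. It implements only the forward pass of a small feed-forward network with one hidden layer, in which a flux vector is mapped through a tanh hidden layer to softmax class probabilities. The layer sizes, random initial weights, and function names are illustrative assumptions. In practice the weights are fitted to labelled training spectra, for example by back-propagation. This is not the network code used in this thesis.

```python
import numpy as np


def init_network(n_inputs, n_hidden, n_outputs, seed=0):
    """Randomly initialised weights for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_hidden, n_inputs)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_outputs, n_hidden)),
        "b2": np.zeros(n_outputs),
    }


def forward(net, x):
    """Map a flux vector x to a vector of class probabilities y."""
    hidden = np.tanh(net["W1"] @ x + net["b1"])  # non-linear hidden layer
    logits = net["W2"] @ hidden + net["b2"]
    exp = np.exp(logits - logits.max())          # numerically stable softmax
    return exp / exp.sum()


# Hypothetical sizes: 500 flux bins, 10 hidden units, 3 output classes
net = init_network(500, 10, 3)
flux = np.ones(500)
y = forward(net, flux)
```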

Principal Components Analysis (PCA) is a multivariate statistical technique that facilitates the discovery of linear correlations between observed variables. Early work by Deeming (1964), Kurtz (1982), and Whitney (1983) examines the application of PCA to the unsupervised classification of stellar spectra. Since then, PCA has found wide application in spectral analysis, such as creating classification systems for galaxy spectra (Sodre et al., 1998; Galaz & de Lapparent, 1998; Connolly & Szalay, 1999), determining galactic redshifts (Glazebrook et al., 1998), and investigating the polarisation properties of broad absorption line quasars (Lamy & Hutsemékers, 2004). The application of PCA to stellar spectra is examined in more detail in Chapter 4.
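Below is a minimal sketch of PCA applied to a set of spectra, assuming that all spectra share one wavelength grid and are stored as the rows of a NumPy array. The mean spectrum is subtracted, and the principal components (often called eigenspectra) are the right singular vectors of the centred data matrix. This is equivalent to diagonalising its covariance matrix. The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def pca_spectra(spectra, n_components):
    """PCA of a set of spectra (one spectrum per row) via the SVD.

    Returns the mean spectrum, the leading principal components (rows),
    the projection of each spectrum onto them, and the fraction of the
    total variance carried by each retained component.
    """
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    coefficients = centred @ components.T
    variance_fraction = (s**2 / np.sum(s**2))[:n_components]
    return mean, components, coefficients, variance_fraction


# Synthetic data: flat continuum plus one absorption line of varying depth
rng = np.random.default_rng(1)
wave = np.linspace(4000.0, 5000.0, 300)
line = np.exp(-0.5 * ((wave - 4471.0) / 5.0) ** 2)
depths = rng.uniform(0.1, 0.6, 50)
spectra = 1.0 - depths[:, None] * line + rng.normal(0.0, 0.001, (50, 300))
mean, comps, coeffs, frac = pca_spectra(spectra, 3)
```

Here the single physical parameter that varies (line depth) is captured almost entirely by the first component. The remaining components mostly describe noise, which is why truncating to a few components can compress spectra with little loss.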



1.4 Hot Subdwarf Stars

The automatic analysis tool set established in this thesis, although general in nature, has been applied to a specific type of astronomical object in order to demonstrate the effectiveness of the tools and how they might be used in a real-world scenario.

The early-type subluminous dwarfs (Greenstein & Sargent, 1974) are defined as stars that populate a region below the upper main sequence on the Hertzsprung-Russell diagram, extending the horizontal branch to higher effective temperatures. They are mostly considered to be low-mass ($M_{\rm core} \approx 0.50$ to $0.55\,M_\odot$), core-helium-burning objects surrounded by a thin envelope of hydrogen. In colour, they are very blue objects, with (B − V) ≈ −0.3 and (U − B) ≈ −1.0, and they have been shown to dominate the population of faint blue stars in the Galaxy ($m_B \le 16$; Green et al., 1986). Regardless of their prior evolution, hot subdwarfs are thought to be direct progenitors of white dwarfs, although only a small fraction (< 2%) of white dwarfs form through this route.

1.4.1 Spectroscopy

The hot subdwarfs fall into three broad subgroups based on spectroscopic criteria.

sdB: Strong Stark-broadened hydrogen lines, with weak He I and no Mg II absorption lines.

sdOB/He-sdB: Strong He I absorption, with weak or absent hydrogen Balmer and He II lines. Carbon lines of varying strength.

sdO: Strong He II and weak He I lines, with broad and shallow hydrogen Balmer lines on which He II lines are superimposed.

Examples from each of these subgroups can be seen in Figure 1.4.

Figure 1.4: Examples from each hot subdwarf spectroscopic subgroup, plotted as normalised flux (continuum = 1) plus a constant offset against wavelength (4000 to 5200 Å), with H, He I, He II, C II, and C III lines marked. The stars shown are PG1220-056 (sdO3VII:He40), FEIGE 110 (sdO8VII:He6), PG1532+523 (sdB1VII:He4), and PG1544+488 (sdBC1VII:He39). Classifications listed are those from Drilling et al. (2006).

Analyses of sdB spectra (e.g., Edelmann et al., 2003) show them to have effective temperatures in the range $20\,000 \le T_{\rm eff}/{\rm K} \le 40\,000$, surface gravities in the range $5.0 \le \log g\,({\rm cgs}) \le 6.0$, and extremely helium-deficient atmospheres, $n_{\rm He}/n_{\rm H} \lesssim 0.01$. sdB stars are thought to be low-mass ($M_{\rm core} \approx 0.50$ to $0.55\,M_\odot$; Caloi, 1976), core-helium-burning objects with a very thin hydrogen envelope ($M_{\rm env} \lesssim 0.02\,M_\odot$; Heber, 1986). The helium deficiency of sdB stars is believed to be caused by gravitational settling, i.e., the sinking of heavier elements under gravity (Wesemael et al., 1982). However, Heber (1991) found that some sdB stars have over-abundances of metals such as carbon and silicon in their atmospheres, believed to result from strong radiative levitation of those elements.

Analyses of sdO spectra by Dreizler et al. (1990) and Thejll et al. (1994) find that they have effective temperatures in the range $40\,000 \le T_{\rm eff}/{\rm K} \le 80\,000$, with the majority lying between 40,000 and 50,000 K. Surface gravities lie in the range $4.0 \le \log g\,({\rm cgs}) \le 6.5$, and the atmospheres of most sdO stars are helium-rich, $n_{\rm He} \gtrsim 0.50$, with additional enrichment of carbon and nitrogen.

Figure 1.5: Schematic temperature-luminosity diagrams (luminosity L against temperature T) showing: a) the positions of stars belonging to the main stellar groups; b) the normal sequence of stellar evolution experienced by a star of a few solar masses; c) possible evolution of an sdB star in a binary system. (Diagram courtesy of C.S. Jeffery.)

Drilling (1996) and Jeffery et al. (1997) represent the first attempts to introduce a homogeneous classification system for hot subdwarfs. This work has been extended and further refined by Drilling et al. (2006) to produce a three-dimensional classification system based on spectral type, luminosity class, and helium class. The standard stars of this system are used in Chapter 2 as the basis for training an artificial neural network to classify hot subdwarf spectra automatically.

1.4.2 Stellar Evolution

One of the most useful tools in stellar astronomy is the Hertzsprung-Russell (HR) diagram, which plots absolute magnitude against spectral type. The relationship between these two parameters shows several important patterns, the most significant being that the majority of stars lie within a band stretching from the region of bright, hot stars to the region of dim, cool stars. This band is called the main sequence of the HR diagram. The giant stars form a large cluster above the cooler end of the main sequence, and the white dwarfs populate a sequence of dim, hot stars running almost parallel to the main sequence. The HR diagram therefore serves as a kind of atlas of the different types of stars, and stellar evolution is usually described in terms of how the underlying physics changes a star’s position on the HR diagram over time.

The HR diagram can also be plotted as the relationship between colour and absolute magnitude, the version frequently used by observers. Theorists prefer to plot luminosity (or surface gravity) against effective temperature, as shown in the schematic diagram of Figure 1.5a. The $\log g$–$T_{\rm eff}$ version of the HR diagram is used later in this thesis (Chapters 5 and 6).
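Two standard relations show why surface gravity can take the place of luminosity. For a spherical star of mass M and radius R, the Stefan-Boltzmann law gives the luminosity and Newton's law gives the surface gravity. Eliminating R gives:

```latex
\begin{align}
  L &= 4\pi R^{2}\,\sigma T_{\rm eff}^{4}, \qquad g = \frac{G M}{R^{2}}
  \quad\Longrightarrow\quad L = \frac{4\pi\sigma G M\, T_{\rm eff}^{4}}{g}, \\
  \log g &= \log\left(4\pi\sigma G M\right) + 4\log T_{\rm eff} - \log L .
\end{align}
```

At fixed mass, lines of constant luminosity are therefore straight lines of slope 4 in the $\log g$–$\log T_{\rm eff}$ plane, and at a given temperature a more luminous star has a lower surface gravity. The fixed-mass assumption is a simplification, reasonable only when comparing stars of similar mass.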

Canonical stellar evolution theory (see Figure 1.5b) predicts that a low-mass, core-hydrogen-burning main-sequence star will eventually exhaust all the hydrogen in its core, having converted it into helium through nuclear fusion.

When core hydrogen fusion ceases, the core is not hot enough to begin helium fusion, and it starts to contract because no energy is being generated to counteract gravity. The contracting core heats up, and some of this heat is transferred into the hydrogen envelope surrounding the core. Eventually, this envelope becomes hot enough to fuse hydrogen in a thin shell at the core boundary.

The continued core contraction and hydrogen shell burning cause the temperature and pressure in the shell to increase. The rising shell temperature supplies enough pressure to the outer layers of the star to make them expand and cool. The star leaves the main sequence and evolves to lower temperatures at nearly constant luminosity, eventually reaching the red giant branch. Mass can be lost from the outer layers through stellar winds.

The core contraction continues until the helium ceases to behave like an ideal gas and becomes electron degenerate. Essentially, this means that the gas hardly expands as its temperature increases. The hydrogen-burning shell adds helium to the core, which continues to increase in temperature. The core finally becomes hot enough to fuse helium, and it ignites this reaction explosively in an event called the helium flash.

The degeneracy of the core is removed, and the core expands and cools as helium burning continues. The hydrogen envelope also cools. The star contracts again as a new state of equilibrium is reached, and it settles on the horizontal branch.

A star on the horizontal branch has two energy sources: a helium-burning core and a hydrogen-burning shell. The star evolves at nearly constant luminosity, with the core converting helium mostly into carbon and oxygen. When the helium is exhausted, the core again begins to contract under gravity. There are now both a hydrogen-burning shell and a helium-burning shell, which cause the star to expand and to evolve with increasing luminosity to the asymptotic giant branch.

This stage of the star’s life is characterised by high mass loss through stellar winds. Helium fusion is very sensitive to temperature, so the helium-burning shell goes through a series of thermal pulses alternating with periods of quiescence. These pulses are thought to enhance the efficiency of the stellar winds until the entire outer envelope of the star is lost. When the envelope is almost entirely depleted, the star begins to evolve across the HR diagram at constant luminosity.

By this point a significant fraction of material has been ejected from the outer regions of the star. The expelled gas is ionised by the star, whose temperature often exceeds 50,000 K, forming a planetary nebula. The planetary nebula eventually disperses into interstellar space.

The hydrogen- and helium-burning layers eventually extinguish, and the star becomes a white dwarf with a degenerate carbon–oxygen core. The core cools quickly and the luminosity decreases, but it takes a long time for the thermal energy in the core to be radiated away completely.

sdB Evolution<br />

Extended horizontal branch stars differ from true horizontal branch stars in the luminosity of the hydrogen-burning shell. As noted above, the mass of this envelope is very small ($M_{\rm env} \lesssim 0.02\,M_\odot$) for a subdwarf B star, so its luminosity is negligible. For a normal horizontal branch star, the luminosity of the hydrogen envelope equals or even exceeds that of the helium core.

How the hot subdwarfs arrive on the extended horizontal branch is still under debate, and a number of scenarios have been proposed to explain the evolution of sdB stars.

In the single-star scenario, enhanced mass loss through stellar winds on the red giant branch may remove almost all of the hydrogen-rich envelope before core helium burning begins (D’Cruz et al., 1996).

In the binary scenario (see Figure 1.5c), Mengel et al. (1976) suggest that sdBs could be formed from relatively wide binaries, in which mass transfer through stable Roche-lobe overflow depletes the hydrogen-rich envelope before the helium core flash. If the sdB progenitor and its compact companion are in a close binary system, a common-envelope phase can result in the creation of a helium star. More recent work (Maxted et al., 2001) suggests that about two-thirds of sdBs are in close binary systems.

sdO Evolution<br />

The atmospheric parameters of sdO stars show them to be less homogeneous than the sdBs. Generally, they appear to fall into two subgroups on the $\log g$–$T_{\rm eff}$ plane. One group (“compact” sdOs) lies close to the theoretical post-extended horizontal branch evolutionary tracks and therefore might have evolved from sdB stars. The other group (“luminous” sdOs) has lower surface gravities and lies closer to the post-asymptotic giant branch tracks. These stars are found in the same region of the $\log g$–$T_{\rm eff}$ plane as the central stars of planetary nebulae.

Various evolutionary scenarios have been proposed to explain the origin of sdO stars, and it is unlikely that a single scenario can explain both subgroups.

Several theories exist for “compact” sdOs. The post-extended horizontal branch (post-EHB) scenario attempts to explain the large number of sdOs found at the extreme end of the horizontal branch, along the helium-burning main sequence, which suggests a close connection to sdB stars (Caloi, 1989; Dorman et al., 1993). But how does an sdB star become an sdO? It has been suggested that the hydrogen-rich envelope of an sdB can re-ignite during the post-extended horizontal branch phase, causing the star to evolve towards the asymptotic giant branch. However, the star is not luminous enough to ascend the asymptotic giant branch, so it returns to the sdO region. Dreizler et al. (1990) propose an alternative theory in which deep mixing of the star’s atmosphere by helium shell flashes could explain the helium enrichment seen in sdO stars.

Other explanations for compact sdOs include the delayed helium flash scenario proposed by Sweigart (1997). If mass loss on the red giant branch is too high, the helium core never reaches the ignition mass, and the star ends up as a helium white dwarf without going through a horizontal branch phase. If, however, helium ignition is delayed but still occurs on the white dwarf cooling sequence, it takes the star into the region of the sdO stars.

A third evolutionary scenario comes from the binary white dwarf mergers studied by Iben (1990), who found that the evolution of close binary systems, leading to the merger of He+He and CO+He white dwarfs, could produce low-mass helium-burning stars similar to sdOs. Strong support for this scenario comes from Napiwotzki et al. (2004), who found that almost all of the sdO stars in their sample were apparently single.

For the “luminous” sdO stars, Heber & Hunger (1987) suggest that they are “born again post-asymptotic giant branch” stars. In this scenario (Iben et al., 1983), a post-asymptotic giant branch star undergoes a late helium shell flash that sends it to the asymptotic giant branch for a second time. During this phase, the outer hydrogen envelope can be completely removed by stellar winds, leaving the star with the appearance of a luminous sdO star. Husfeld et al. (1989) suggest that a small number of sdO stars also form through normal post-asymptotic giant branch evolution.


1.4.3 Why Study Them?

The study of hot subdwarfs is important in several respects. Because they exist in large numbers and have been shown to be highly evolved stars, they are useful indicators for studying the structure and evolution of the Galaxy. Brown et al. (1997) suggest that these stars are the main cause of the ultraviolet upturn phenomenon (UV excess) seen in elliptical galaxies and the bulges of spiral galaxies, because they spend a long time ($10^{8}$ yr) at high temperatures on the extended horizontal branch. They are also considered to be useful age indicators for elliptical galaxies (Brown et al., 2000).

As described previously, the hot subdwarfs are interesting in their own right because their evolution does not seem to be explained by canonical stellar evolution theories. This makes them important objects from an astrophysical point of view.

1.4.4 Why Search For Them In The SDSS?

The Sloan Digital Sky Survey (SDSS) project and the data it produces are a prime example of where the future of astronomy is heading.

The main observational goal of the survey is to collect photometric and spectroscopic data on galaxies and quasars. However, many quasars appear as very blue objects. Blue stars such as white dwarfs and hot subdwarfs cannot be distinguished from quasars at the photometric level, so the SDSS also obtains spectra of many of these stars. This makes the SDSS an unbiased, magnitude-limited survey containing potentially hundreds of moderate-resolution (∼3.0 Å), fully reduced hot subdwarf spectra, which can be used to identify new subgroups statistically within an extracted sample. Its large, homogeneous, publicly accessible data archives are therefore an excellent test site for the tool set developed in this thesis.



1.5 Summary

The continual advancement of observational and information technology is driving astronomy forward as a data-rich discipline. There is a clear need for robust automatic methods to help analyse large databases of astronomical data and to extract useful science from them.

This thesis focuses on automatic tools to search for and analyse astronomical spectra in large databases. Artificial neural networks (Chapter 2), $\chi^2$ minimisation (Chapter 3), and principal components analysis (Chapter 4) are the methods used to construct a generalisable tool kit for this task.

The tools will be demonstrated by applying them to the problem of searching for and analysing the spectra of hot subdwarf stars in the archives of the Sloan Digital Sky Survey (Chapter 5). They will also be used to analyse other, smaller data sets (Chapter 6).

As the amount of data gathered by astronomers increases, much work is needed to improve the ways in which it can be analysed and to solve the problems that lie ahead. Finally, some of the issues encountered during this project are discussed in Chapter 7.



Chapter 2

Classification - Artificial Neural Networks

Artificial neural networks (ANNs) are statistical machine learning algorithms best thought of as arbitrary function estimators. They provide a non-linear, parameterised mapping between an input vector, $\mathbf{x}$, and an output vector, $\mathbf{y}$. For example, in stellar spectral classification, $\mathbf{x}$ is the feature vector containing the flux values of a spectrum over some wavelength range, and $\mathbf{y}$ is the classification assigned to $\mathbf{x}$ according to some classification standard. The mapping performed by the ANN is analogous to the process that leads an expert human classifier to assign classification $\mathbf{y}$ to spectrum $\mathbf{x}$. This ability to replicate non-linear functions makes ANNs a powerful tool in astronomical data mining.

In the context of machine learning, ANNs belong to a wider class of methods for approximating non-linear functions. Some of the mystery that commonly surrounds their use can be dispelled by relating several important issues to the simpler process of polynomial curve fitting. Here, the problem is to fit a polynomial to a set of $M$ points by minimising some error function. The $n$th-order polynomial is given by


$$y(x) = w_0 + w_1 x + \cdots + w_n x^n = \sum_{i=0}^{n} w_i x^i. \qquad (2.1)$$

If this is considered as a non-linear mapping that takes $x$ as input and produces $y$ as output, then the exact form of the function $y(x)$ is determined by the values of the parameters $w_0, \ldots, w_n$, which are analogous to the weights in a neural network.

The weights can be determined by minimising an error function that compares, for each input value $x_k$, the desired output $d(x_k)$ with the polynomial's actual output $y(x_k)$. A commonly used choice is the sum-of-squares error function,

$$E = \frac{1}{2} \sum_k \left( y(x_k) - d(x_k) \right)^2. \qquad (2.2)$$

The minimisation of an error function such as Equation 2.2, which involves target values for the polynomial outputs, is called supervised learning, since the desired output is specified for each input value. This is also a common way to determine the weights of a neural network for a particular application (the back-propagation algorithm calculates the derivatives of the error function with respect to the weights, and these derivatives are then used to adjust the weights). A second form of learning, called unsupervised learning, does not involve the use of target data. In the context of neural networks, this form of learning can be used to discover clusters or other patterns in a data set.
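As an illustration of supervised learning in its simplest setting, the Python sketch below fits the polynomial of Equation 2.1 by gradient descent on the error of Equation 2.2, using the derivatives $\partial E/\partial w_i$ in the same way that back-propagation uses error derivatives for a network. It is a minimal sketch assuming NumPy; the function names, the step-size rule and the stopping tolerance are illustrative choices, not taken from STATNET.

```python
import numpy as np


def design_matrix(x, n):
    """Columns x**0, x**1, ..., x**n, so that X @ w evaluates Equation 2.1."""
    return np.vander(x, n + 1, increasing=True)


def sum_of_squares(w, x, d):
    """Equation 2.2: E = 1/2 * sum_k (y(x_k) - d(x_k))**2."""
    residual = design_matrix(x, len(w) - 1) @ w - d
    return 0.5 * np.sum(residual ** 2)


def fit_by_gradient_descent(x, d, n, tol=1e-12, max_iter=200_000):
    """Supervised learning of the weights w_0..w_n by stepping down dE/dw."""
    X = design_matrix(x, n)
    lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size below 2/lambda_max keeps descent stable
    w = np.zeros(n + 1)
    for _ in range(max_iter):
        grad = X.T @ (X @ w - d)           # dE/dw_i = sum_k (y(x_k) - d(x_k)) * x_k**i
        step = lr * grad
        w -= step
        if np.max(np.abs(step)) < tol:     # converged: weight updates have become very small
            break
    return w


# Usage: recover a known quadratic from noise-free targets.
x = np.linspace(0.0, 1.0, 20)
d = 1.0 + 2.0 * x - 3.0 * x ** 2
w = fit_by_gradient_descent(x, d, n=2)
w_exact = np.linalg.lstsq(design_matrix(x, 2), d, rcond=None)[0]
print(w, w_exact, sum_of_squares(w, x, d))
```

The closed-form least-squares solution is computed only as a cross-check. For a polynomial the error surface is quadratic with a single minimum, whereas a network's error surface generally has many local minima, which is why iterative minimisation is used there.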

If the polynomial of Equation 2.1 is being trained to model a particular input-output mapping via supervised training, then the goal is a model that gives good predictions for new data, in other words one that exhibits good generalisation properties. One of the factors that influences a model's ability to generalise is the number of free parameters it has (i.e., the number of degrees of freedom). If a first-order polynomial is chosen to model a non-linear mapping, then it will generalise poorly, because a linear function is not flexible enough to match the underlying mapping function very well. In other words, the model has a high bias: the complexity of the polynomial is not sufficient to model the actual mapping function.

The bias can be reduced by increasing the number of degrees of freedom, i.e., increasing the order of the polynomial, which gives it greater flexibility to model the non-linear mapping. However, if the order is increased too much, the polynomial's approximation to the underlying function actually gets worse: the mapping may give an exact fit to the training data, but its ability to generalise is hampered by highly oscillatory behaviour between training points. Such a model is said to over-fit the training data, and has a high variance, meaning that the model is sensitive to the particulars of the training data (e.g., its quantity, noise, and distribution).

The point of best generalisation is determined by a trade-off between the model's bias and variance, and occurs when the number of degrees of freedom in the model is relatively small compared to the size of the training data set. The quantity of training data is therefore a significant factor in achieving good generalisation: as the quantity of training data increases, the model's complexity can be increased, thereby reducing bias, while the larger data set constrains the model more heavily, thereby also reducing variance.
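The trade-off can be seen numerically in a small Python sketch, assuming NumPy. The "true" mapping $\sin(2\pi x)$, the noise level, the ten training points and the chosen polynomial orders are hypothetical illustration values, not data from this thesis. Each call reseeds the generator, so every order is fitted to the same training data. A first-order fit has high bias, while a ninth-order fit passes through all ten training points, so its near-zero training error says nothing about its error on new data.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def truth(x):
    """Hypothetical underlying mapping that the polynomial is trying to learn."""
    return np.sin(2 * np.pi * x)


def train_and_test_rms(order, m_train=10, m_test=200, noise=0.2, seed=0):
    """Fit a polynomial of the given order to noisy samples; return (train rms, test rms)."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, m_train)
    d_train = truth(x_train) + rng.normal(0.0, noise, m_train)
    x_test = rng.uniform(0.0, 1.0, m_test)          # new data the model has not seen
    d_test = truth(x_test) + rng.normal(0.0, noise, m_test)

    w = P.polyfit(x_train, d_train, order)           # least-squares weights w_0..w_n
    rms = lambda x, d: np.sqrt(np.mean((P.polyval(x, w) - d) ** 2))
    return rms(x_train, d_train), rms(x_test, d_test)


for order in (1, 3, 9):
    train_rms, test_rms = train_and_test_rms(order)
    print(f"order {order}: train rms = {train_rms:.3f}, test rms = {test_rms:.3f}")
```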

In the context of neural networks, the complexity of the model is determined by the number and structure of the internal weights. The weights are arranged in a network of layers, and a network with sufficient hidden nodes can model essentially any non-linear function. However, as illustrated in the discussion of polynomial fitting, a neural network with too much complexity may succeed in “memorising” the training data by over-fitting them, and will therefore have poor generalisation properties. A number of techniques exist to combat over-fitting and regularise, or smooth, the mapping produced by neural networks. Weight decay adds a penalty term to the error function that discourages large values of the network's internal weights; early stopping of the training process also prevents the network weights from becoming too large; and adding noise to the training data set makes it more difficult for the neural network to over-fit.

A more detailed review of basic neural network theory can be found in Bishop (1995).
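For a model that is linear in its weights, such as the polynomial of Equation 2.1, the effect of weight decay can be shown in closed form. The Python sketch below assumes NumPy and the common quadratic penalty $\frac{\nu}{2}\sum_i w_i^2$ added to Equation 2.2; the penalty form, the symbol $\nu$ and the data are illustrative assumptions, and a neural network would need iterative minimisation instead. Setting the gradient of the penalised error to zero gives $(X^\top X + \nu I)\,\mathbf{w} = X^\top \mathbf{d}$, which is ridge regression.

```python
import numpy as np


def fit_with_weight_decay(x, d, order, nu):
    """Minimise E + (nu / 2) * sum_i w_i**2 for the polynomial of Equation 2.1.

    Because the model is linear in w, the minimiser solves (X^T X + nu I) w = X^T d.
    """
    X = np.vander(x, order + 1, increasing=True)
    A = X.T @ X + nu * np.eye(order + 1)
    return np.linalg.solve(A, X.T @ d)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
d = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)   # hypothetical noisy targets

for nu in (1e-6, 1e-3, 1.0):
    w = fit_with_weight_decay(x, d, order=9, nu=nu)
    print(f"nu = {nu:g}: |w| = {np.linalg.norm(w):.2f}")
```

Larger $\nu$ shrinks the weights and smooths the fitted curve, at the cost of increased bias.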


Previous work by others in the field of automatic stellar spectral analysis demonstrates that ANNs are well suited to fast classification and parameterisation of large quantities of spectra from across the main sequence. See, for example, Gulati et al. (1994b), von Hippel et al. (1994), Storrie-Lombardi et al. (1994), Weaver & Torres-Dodgen (1995), Bailer-Jones (1996), Gulati et al. (1996), Bailer-Jones (1997), Gulati et al. (1997b), Weaver & Torres-Dodgen (1997), Bailer-Jones et al. (1997), Bailer-Jones et al. (1998), Singh et al. (1998), Rhee et al. (1999), Allende Prieto et al. (2000), Weaver (2000b), and Snider et al. (2001).

Here, the feedforward multilayer back-propagation ANN code STATNET of Bailer-Jones (1996) is used to obtain classifications and derive astrophysical parameters from a sample of hot subdwarf spectra. The values of the astrophysical parameters are compared with those obtained from a different computerised technique, $\chi^2$ minimisation, as implemented in the code SFIT (see Chapter 3).

2.1 Classifying Hot Subdwarfs

The hot subdwarfs do not fall within the scope of the standard MK system (Morgan et al., 1978); therefore, Drilling et al. (2006) have extended and refined the earlier work of Drilling (1996) and Jeffery et al. (1997) to construct a three-dimensional, MK-like classification scale for hot subdwarfs. This scale is based on a sample of spectra from a number of sources, covering the wavelength region 4050–4900 Å at a resolution of 2.5 Å, and consists of a ‘spectral’ class, a ‘luminosity’ class, and a ‘helium’ class.

The classification scale uses a spectral type running from sdO1 to sdA (numerically 1–20), analogous to MK spectral classes. It introduces a helium class (0–40) based on H, He I and He II line strengths, and uses luminosity classes IV–VIII, where most subdwarfs have luminosity class ∼VII. The mapping between the Drilling et al. (2006) classes and those used elsewhere, e.g. in the PG survey (Green et al., 1986), is illustrated in Figure 16 of Drilling et al. (2006).



2.1.1 The Training Sample

A set of subdwarf spectra was taken from a collection compiled by Drilling et al. (2006) from data provided by Moehler et al. (1990a,b), Dreizler et al. (1990), and Theissen et al. (1993). It comprises a representative sample of 174 PG subdwarfs and blue horizontal branch stars, plus a few other stars not included in the PG catalogue. Several observations were supplied for many of the targets, giving 471 spectra in total at an approximate resolution of 2.5 Å.

The spectra are not homogeneous. Because the data were gathered by different observers using different equipment at different locations, a number of issues affect the sample, including calibration anomalies, velocity shifts, different windows of wavelength coverage, and inconsistent S/N ratios and dispersion intervals.

A pre-processing step was needed to correct these problems and establish a more homogeneous sample. The spectra were visually inspected to select the best samples for each star. The resulting 359 spectra were corrected for large cosmic-ray spikes and instrumental end-effects. A velocity shift correction was applied by cross-correlating each spectrum with a grid of theoretical spectra chosen to coarsely cover the approximate $T_{\rm eff}$, $\log g$, $\log(n_{\rm He}/n_{\rm H})$ range of the Drilling et al. (2006) classification scale. Finally, the spectra were rebinned onto a common wavelength grid of 4050–4950 Å with a dispersion of 1 Å pixel$^{-1}$.
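The velocity correction and rebinning steps can be sketched in Python, assuming NumPy. This is a hypothetical illustration, not the code used here: the velocity is found by a brute-force search over trial velocities (maximising the correlation coefficient against each template in turn) rather than by computing a full cross-correlation function, the Doppler shift is non-relativistic, and rebinning uses linear interpolation, which does not conserve flux. The 1 Å grid from 4050 to 4950 Å has 901 points.

```python
import numpy as np

C_KMS = 299_792.458                             # speed of light (km/s)
COMMON_GRID = np.linspace(4050.0, 4950.0, 901)  # 4050-4950 Å at 1 Å per pixel


def to_rest_frame(wave_obs, v_kms):
    """Remove a radial velocity v (km/s) from observed wavelengths (non-relativistic)."""
    return wave_obs / (1.0 + v_kms / C_KMS)


def rebin(wave, flux):
    """Linearly interpolate a spectrum onto the common wavelength grid."""
    return np.interp(COMMON_GRID, wave, flux)


def best_velocity(wave, flux, templates, trial_v):
    """Return (template index, velocity) giving the highest correlation with any template."""
    best = (-np.inf, None, None)
    for j, (t_wave, t_flux) in enumerate(templates):
        t_rebinned = rebin(t_wave, t_flux)
        for v in trial_v:
            shifted = rebin(to_rest_frame(wave, v), flux)
            score = np.corrcoef(shifted, t_rebinned)[0, 1]
            if score > best[0]:
                best = (score, j, v)
    return best[1], best[2]


def preprocess(wave, flux, templates, trial_v=np.arange(-500, 501, 10)):
    """Velocity-correct a spectrum and rebin it onto the 901-point common grid."""
    j, v = best_velocity(wave, flux, templates, trial_v)
    return rebin(to_rest_frame(wave, v), flux), j, v
```

In practice `templates` would hold the coarse grid of theoretical spectra, and the best-matching template index is returned alongside the velocity, which is the partial parameterisation noted below.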

It should be noted that the radial velocity correction described above already partly solves the parameterisation problem, because it chooses the best-fitting model from the grid. Training the neural network to solve for the radial velocity simultaneously with the other astrophysical parameters may therefore be a more convenient approach. However, this was not attempted here.


[Figure 2.1: three panels showing the training sample in luminosity class (0–IX) versus spectral type (O, O5, B, B5, A), helium class (0–40) versus spectral type, and helium class versus luminosity class.]

Figure 2.1: The training sample shows clustering in certain regions of the classification space. For clarity, points have been offset by small random shifts in both coordinates.



2.1.2 Methodology

As described at the beginning of the chapter, training an ANN to learn the Drilling et al. (2006) classification system involves iterating over the training set and minimising the sum-of-squares error function between the desired output and the network's actual output (see Equation 2.2) with respect to the ANN's internal parameters. The minimisation process continues until some convergence criterion has been reached (e.g., when the weight updates have become very small).

A typical strategy to assess network performance after training is to apply the network to an application set for which the “true” classifications are known. Unfortunately, no other suitable set of spectra previously classified on the Drilling et al. (2006) scale was available for the study presented here.

An alternative is to split the Drilling sample into two similarly sized sets, one used for training and the other to quantify performance. However, because the Drilling sample is small and its distribution across the parameter space is limited (see Figure 2.1), there may not be enough data to constrain the model if the sample is split into two smaller subsets.

On the other hand, if the two-subset approach is modified slightly, there is a way to determine how well a given ANN model performs using only the data in the Drilling sample. A technique called $N$-fold cross-validation, or the leave-one-out method, permits the greatest number of samples to be used in training while still giving an estimate of ANN performance over the whole sample set.

Given a data set of size $N$, each datum is left out in turn and the ANN is trained on the remaining $N-1$ samples. The ANN's performance is then assessed by classifying the omitted datum. No random sampling of the data is involved in this method, so repeating the procedure for a particular ANN model always uses the same training and test partitions, and gives the same result provided that the network training itself is repeatable.
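The leave-one-out procedure, together with the $\sigma_{\rm rms}$ and correlation coefficient $r$ statistics reported in Section 2.1.3, can be sketched in Python, assuming NumPy. To keep the example short and fast, a linear least-squares model stands in for the ANN committee; the `fit` and `predict` callables are hypothetical hooks where a network would be trained and applied, and the data are synthetic.

```python
import numpy as np


def leave_one_out(X, Y, fit, predict):
    """Hold out each of the N samples once, training on the remaining N - 1."""
    n = len(X)
    predictions = np.empty(Y.shape, dtype=float)
    for i in range(n):
        keep = np.arange(n) != i                 # every sample except the i-th
        model = fit(X[keep], Y[keep])
        predictions[i] = predict(model, X[i:i + 1])[0]
    return predictions


def sigma_rms(pred, true):
    """Root-mean-square difference for each output column."""
    return np.sqrt(np.mean((pred - true) ** 2, axis=0))


def correlation(pred, true):
    """Pearson correlation coefficient r for each output column."""
    return np.array([np.corrcoef(pred[:, j], true[:, j])[0, 1] for j in range(true.shape[1])])


# Stand-in model (not an ANN): linear least squares with an intercept term.
def fit_linear(X, Y):
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, Y, rcond=None)[0]


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                                   # hypothetical features
Y = X @ rng.normal(size=(5, 3)) + rng.normal(0, 0.1, (30, 3))  # three hypothetical outputs
pred = leave_one_out(X, Y, fit_linear, predict_linear)
print(sigma_rms(pred, Y), correlation(pred, Y))
```

The loop makes the cost explicit: the model is trained $N$ times, once per held-out sample.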


The leave-one-out method carries a large computational cost, as each ANN model must be trained $N$ times (in this case, $N = 359$). As several models were to be tested, the computational burden was alleviated by constructing a small distributed cluster of 15 ordinary desktop workstations at Armagh Observatory using the Condor batch system (e.g., Livny & Raman, 1998). The cluster reduced the computation time for the leave-one-out procedure by a factor of ∼10 compared with using a single workstation.

To determine the optimal complexity of the ANN model, two different ANN architectures were studied: one with a single hidden layer of 10 nodes, and one with two hidden layers of 5 nodes each. The notation used to refer to these architectures is 901:10:3 and 901:5:5:3, respectively.

This notation describes the structure of the neural network in terms of layers of processing nodes and the number of nodes in each layer. For each network being tested, an input layer of 901 nodes corresponds to the 901 flux points in the pre-processed observational sample, and an output layer of three nodes corresponds to the three parameters of the classification scale: spectral type, luminosity class, and helium class.
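The notation also makes it easy to count the adjustable weights, one measure of the model complexity discussed at the start of this chapter. The Python sketch below assumes fully connected layers with one bias weight per hidden and output node; this is a common convention and an assumption here, not a statement about STATNET's internal layout.

```python
def layer_sizes(spec):
    """Parse architecture notation such as '901:10:3' into [901, 10, 3]."""
    return [int(n) for n in spec.split(":")]


def count_weights(spec):
    """Weights (including biases) in a fully connected feedforward network."""
    sizes = layer_sizes(spec)
    # Each node receives one weight per node in the previous layer, plus a bias.
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


for spec in ("901:10:3", "901:5:5:3"):
    print(spec, count_weights(spec))
```

Under this convention the 901:10:3 network has 9053 weights and the 901:5:5:3 network has 4558. In both cases the input layer dominates the count, and there are many more weights than the 359 training spectra, which underlines the importance of regularisation.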

For each model architecture, a committee of five ANNs was formed. The committee approach (see Bishop, 1995, sect. 9.6) trains a number of ANNs on the same data and applies them together to a new datum; the results from the individual ANNs are then averaged to provide a combined result. In STATNET, each network in the committee is initialised with different random values for the weights, so the final set of weights differs for each committee ANN. The committee approach therefore seeks more robust results by averaging out this ‘convergence noise’, which arises from the variance of the model and from the minimisation process becoming caught in different local minima.
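A committee can be sketched in Python with NumPy using a tiny one-hidden-layer network trained by batch gradient descent (back-propagation) on the sum-of-squares error. This is a hypothetical stand-in for STATNET, not its algorithm: the tanh hidden units, linear outputs, initialisation scale, learning rate, epoch count and data are all illustrative assumptions. Committee members differ only in the random seed used to initialise their weights, and the committee output is the mean of the members' outputs.

```python
import numpy as np


def train_mlp(X, Y, n_hidden, seed, epochs=500, lr=0.05):
    """One-hidden-layer tanh network trained by batch gradient descent."""
    rng = np.random.default_rng(seed)            # different seed -> different initial weights
    W1 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden-layer activations
        out = H @ W2 + b2                        # linear output layer
        g_out = (out - Y) / len(X)               # dE/d(out), error averaged over samples
        g_W2, g_b2 = H.T @ g_out, g_out.sum(axis=0)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)    # back-propagate through tanh
        g_W1, g_b1 = X.T @ g_H, g_H.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return W1, b1, W2, b2


def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


def committee_predict(X_train, Y_train, X_new, n_members=5, n_hidden=10):
    """Average the outputs of networks that differ only in their initial weights."""
    members = [train_mlp(X_train, Y_train, n_hidden, seed=s) for s in range(n_members)]
    return np.mean([predict_mlp(m, X_new) for m in members], axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                     # hypothetical inputs
Y = np.sin(X[:, :1]) + 0.5 * X[:, 1:2]           # one hypothetical output
X_new = rng.normal(size=(10, 3))
print(committee_predict(X, Y, X_new))
```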

The leave-one-out method was carried out for five different numbers of training epochs for each architecture: 150, 300, 500, 700, and 1000 iterations of the optimisation procedure. This required about four days of continuous computation on the Condor cluster. Stopping the training procedure early in this way is a method of regularising the ANN models.

STATNET also implements a weight decay factor in the neural network's sum-of-squares error function, but this feature was not used here (or in the parameterisation network described in the next section). Weight decay attempts to prevent the ANN model from over-fitting the training data by penalising network weights that become too large during training. Large network weights (which can occur if the network is trained for too long) increase the complexity of the mapping because they produce regions of high curvature in the input-output parameter space.

As the classification training set was small, it was felt that weight decay should not be used here, in order to preserve the structure and curvature of the input-output mapping. An alternative is early stopping, which regularises the network by limiting the effective number of degrees of freedom. This number is expected to start out small and then grow during the minimisation of the sum-of-squares error function, which corresponds to a steady increase in the complexity of the model.

If the network error is measured against a validation sample, as is done here via the leave-one-out method, this error is typically observed to decrease at first and then increase as the network starts to over-fit. The network's training procedure can therefore be terminated close to the point of smallest validation error, since this gives a network that is expected to have the best generalisation performance.
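A generic early-stopping loop can be written in a few lines of Python. It is a sketch under stated assumptions: `step` (one training epoch) and `val_error` (the error on a validation sample) are hypothetical callables supplied by the user, `step` must return a new state rather than modify the old one in place so that the best state can be kept, and the `patience` rule (stop after a fixed number of epochs without improvement) is one common choice. In this chapter, fixed epoch counts were compared instead.

```python
def train_with_early_stopping(state, step, val_error, max_epochs, patience=50):
    """Run `step` repeatedly and return (best validation error, epoch, state).

    Training stops once `patience` consecutive epochs have passed without
    improving on the smallest validation error seen so far.
    """
    best_err, best_epoch, best_state = val_error(state), 0, state
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        state = step(state)
        err = val_error(state)
        if err < best_err:
            best_err, best_epoch, best_state = err, epoch, state
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_err, best_epoch, best_state


# Toy usage: a validation error that falls until epoch 30 and then rises (over-fitting).
result = train_with_early_stopping(0, step=lambda s: s + 1,
                                   val_error=lambda s: (s - 30) ** 2, max_epochs=1000)
print(result)  # -> (0, 30, 30)
```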

STATNET possesses the capability to add weighting factors to each of the network's outputs, so that certain outputs contribute more to the sum-of-squares error minimisation than others. These weighting factors (called ‘β’ parameters in STATNET) allow the user to control the level of modelling precision for each output variable. If this precision is limited by the noise in the data, $1/\sqrt{\beta}$ should be approximately equal to the standard deviation of the noise in the output variable.

STATNET includes a data scaling option that separately scales each input and output variable to have zero mean and unit standard deviation. With respect to the β parameters, the variance scaling casts each β in terms of the scaled variables. Therefore, $1/\sqrt{\beta}$ can be related directly to the fractional uncertainty in a particular output variable. As an example, the default β value of 6.0 corresponds to a standard deviation of $1/\sqrt{6} \approx 0.4$. If the data are variance scaled and roughly normally distributed, 95% of the data will lie in the range −2 to +2, a width of 4, so this standard deviation corresponds to approximately a 10% uncertainty.

In terms of the Drilling et al. (2006) classification parameters, the expected accuracy of a human classifier is ±2 spectral types, ±1 luminosity class, and ±2 helium classes. These correspond to uncertainties of 10%, 12.5%, and 5%, respectively. Therefore, the STATNET β parameters were set to 6.0 for the spectral type output, 4.0 for luminosity class, and 25 for helium class.
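The conversion from a fractional uncertainty $f$ to a β value follows from the reasoning above: a variance-scaled output spans a width of about 4, so $\sigma \approx 4f$ and $\beta = 1/\sigma^2$. A short Python sketch (assuming NumPy) reproduces the three settings; the exact value for 10% is 6.25, which corresponds to the STATNET default of 6.0 after rounding. The β-weighted error shown is the usual form $\frac{1}{2}\sum_j \beta_j (y_j - d_j)^2$ and is an assumption about how the weights enter, not a transcription of STATNET.

```python
import numpy as np


def beta_for_fraction(frac):
    """beta for a fractional uncertainty `frac` of a variance-scaled output.

    About 95% of unit-normal values lie in -2..+2 (a width of 4), so the
    corresponding standard deviation is sigma = 4 * frac, and beta = 1 / sigma**2.
    """
    sigma = 4.0 * frac
    return 1.0 / sigma ** 2


def variance_scale(values):
    """Scale each column to zero mean and unit standard deviation."""
    mean, std = values.mean(axis=0), values.std(axis=0)
    return (values - mean) / std, mean, std


def weighted_sum_of_squares(pred, target, beta):
    """Sum-of-squares error with a separate weight beta_j for each output column."""
    return 0.5 * np.sum(beta * (pred - target) ** 2)


# Human-classifier accuracies: ±2 of 20 spectral types, 12.5% for luminosity class,
# ±2 of 40 helium classes.
for name, frac in (("SpT", 2 / 20), ("LC", 0.125), ("HeC", 2 / 40)):
    print(name, round(beta_for_fraction(frac), 2))
```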

2.1.3 Results

Tables 2.1 and 2.2 give the $\sigma_{\rm rms}$ and correlation coefficient, $r$, values obtained by comparing each ANN architecture's results, as determined by the leave-one-out method, with the classifications assigned by Drilling et al. (2006).

| Statistic | Parameter | 150 | 300 | 500 | 700 | 1000 |
|---|---|---|---|---|---|---|
| $\sigma_{\rm rms}$ | SpT | 2.1041 | 2.1967 | 2.2338 | 2.2947 | 2.3434 |
| $\sigma_{\rm rms}$ | LC | 1.1771 | 1.1835 | 1.2199 | 1.2435 | 1.2627 |
| $\sigma_{\rm rms}$ | HeC | 5.5604 | 4.5434 | 4.3255 | 4.3540 | 4.5109 |
| $r$ | SpT | 0.8710 | 0.8621 | 0.8586 | 0.8523 | 0.8473 |
| $r$ | LC | 0.8209 | 0.8201 | 0.8123 | 0.8061 | 0.8012 |
| $r$ | HeC | 0.9216 | 0.9483 | 0.9533 | 0.9527 | 0.9491 |

Table 2.1: Results of the leave-one-out procedure as applied to a committee of five 901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations.

| Statistic | Parameter | 150 | 300 | 500 | 700 | 1000 |
|---|---|---|---|---|---|---|
| $\sigma_{\rm rms}$ | SpT | 1.7446 | 1.8296 | 1.9593 | 2.0626 | 2.2202 |
| $\sigma_{\rm rms}$ | LC | 1.0574 | 1.0766 | 1.1078 | 1.1536 | 1.2156 |
| $\sigma_{\rm rms}$ | HeC | 6.2962 | 5.2257 | 4.3019 | 4.1405 | 4.2633 |
| $r$ | SpT | 0.9065 | 0.8983 | 0.8858 | 0.8759 | 0.8599 |
| $r$ | LC | 0.8507 | 0.8621 | 0.8389 | 0.8272 | 0.8116 |
| $r$ | HeC | 0.9007 | 0.9316 | 0.9528 | 0.9573 | 0.9547 |

Table 2.2: As Table 2.1, but for the committee of five 901:5:5:3 ANNs.

The large $\sigma_{\rm rms}$ values for the helium-scale classifications, apparent in both tables, suggest that the ANNs are having difficulty generalising for this parameter. However, the high correlation coefficients suggest a good learning response. There are several possible reasons for this discrepancy. Firstly, it could be due to a problem with the neural network model itself, either a regularisation issue (e.g., not using weight decay) or sub-optimal settings of the β parameters. Secondly, it is possible that the neural networks simply cannot do any better for this parameter, in which case attention turns to the Drilling et al. (2006) classification scale itself and the observational sample on which this study is based.

If the S/N of the observational sample is not high enough for the ANNs to generalise well for the helium scale, this could be affecting the bias and variance of the models, making it difficult to ascertain the underlying mapping function. It is also possible that the helium scale itself is too fine-grained. If the helium scale were scaled down by a factor of ∼4, a corresponding four-fold reduction in the $\sigma_{\rm rms}$ errors would be observed (the correlation coefficients would remain unchanged, as this statistic is not affected by scaling). This would bring them in line with those of spectral type and luminosity class, i.e., $\sigma_{\rm rms} \sim \pm 1$ helium class.

Further investigation of this issue is required.
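The scaling argument is easy to check numerically. In the Python sketch below (assuming NumPy), the "true" and "predicted" helium classes are synthetic values invented for illustration, with an error of roughly the size seen in the tables; dividing both by 4 divides $\sigma_{\rm rms}$ by exactly 4 and leaves $r$ unchanged. A real coarser scale would also round classes to whole numbers, which this pure rescaling ignores.

```python
import numpy as np


def sigma_rms(pred, true):
    return np.sqrt(np.mean((pred - true) ** 2))


def pearson_r(pred, true):
    return np.corrcoef(pred, true)[0, 1]


rng = np.random.default_rng(1)
true_hec = rng.uniform(0.0, 40.0, 359)            # synthetic helium classes, 0-40 scale
pred_hec = true_hec + rng.normal(0.0, 4.3, 359)   # synthetic ANN outputs

# The same classifications expressed on a scale four times coarser (0-10).
true_coarse, pred_coarse = true_hec / 4.0, pred_hec / 4.0

print(sigma_rms(pred_hec, true_hec), sigma_rms(pred_coarse, true_coarse))
print(pearson_r(pred_hec, true_hec), pearson_r(pred_coarse, true_coarse))
```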

It can be seen in Tables 2.1 and 2.2 that both architectures are able to learn the appropriate spectral features associated with spectral type and luminosity class within the first 250–300 epochs of the training procedure. After this point, further training only serves to degrade performance with respect to these parameters, which indicates that the models are starting to over-fit the training data.


For the helium scale, both architectures yield optimal classifications only after a few hundred more training epochs: the 901:10:3 architecture achieved its best performance at around 500 iterations, and the 901:5:5:3 architecture reached its optimum at around 700 iterations. A similar phenomenon was reported by Snider et al. (2001, sect. 5.1), although Willemsen et al. (2005) did not observe the same effect.

The optimal trade-off in accuracy between the classification parameters occurs at around 300 training epochs for the 901:10:3 architecture, and 500 epochs for the 901:5:5:3 architecture. The results of these two ANNs are compared with the actual Drilling et al. (2006) classifications in Figure 2.2.
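Choosing a stopping epoch for each classification parameter, and a single compromise epoch for all three, can be sketched as follows. The error curves are hypothetical, and the compromise rule (minimising the sum of each parameter's error divided by its own best value) is an assumption; the text does not state which criterion defined the "optimal trade-off".

```python
# Hypothetical sigma_rms values recorded at a few training epochs.
epochs = [100, 200, 300, 400, 500, 600, 700]
errors = {
    "spectral_type": [3.0, 2.2, 2.0, 2.1, 2.3, 2.5, 2.8],
    "luminosity":    [1.6, 1.1, 1.0, 1.05, 1.1, 1.2, 1.3],
    "helium":        [6.0, 5.0, 4.6, 4.3, 4.1, 4.2, 4.4],
}

def best_epoch(curve):
    """Epoch at which a single error curve reaches its minimum."""
    i = min(range(len(curve)), key=curve.__getitem__)
    return epochs[i]

def compromise_epoch(curves):
    """Epoch minimising the summed errors, each normalised by its own
    minimum (an assumed trade-off criterion)."""
    def cost(i):
        return sum(c[i] / min(c) for c in curves.values())
    return epochs[min(range(len(epochs)), key=cost)]

for name, curve in errors.items():
    print(name, best_epoch(curve))
print("compromise", compromise_epoch(errors))
```

With these made-up curves the spectral-type and luminosity errors bottom out early while the helium error keeps improving, so the compromise lands before the helium optimum, mirroring the behaviour described above.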

2.2 Physical Parameters<br />

The ability of neural network models to obtain astrophysical parameters of hot subdwarf spectra was tested by generating a grid of synthetic spectra to be used as a training set, and extracting two application sets from the Drilling et al. (2006) sample.

The first application set contains 60 stars which were used by Drilling et al. to calibrate their classification system against the physical parameters $T_{\rm eff}$, $\log g$, and $\log(n_{\rm He}/n_{\rm H})$. These 60 stars had previously been analysed by their original observers, with astrophysical parameters derived mostly by the method of fine analysis.

The second application set contains 133 stars from the Drilling et al. sample for which no astrophysical parameters are listed in Drilling et al. (2006).

Using the first application set, the neural network results for those stars can be compared against the results of the fine analyses performed by the original observers. The second application set, however, has no such reference values. For that set, the $\chi^2$ fitting code used at Armagh Observatory, SFIT, is used to derive a set of astrophysical parameters based on a grid of synthetic spectra. SFIT is also applied to the first application set to serve as a second comparison for the neural network results.



[Figure 2.2: two columns of panels, for architectures 901:10:3 (left) and 901:5:5:3 (right), plotting ANN spectral type (O–A), ANN luminosity class (0–IX), and ANN helium class (0–40) against the corresponding Drilling et al. classifications.]

Figure 2.2: Results of the leave-one-out procedure for both ANN architectures at the near-optimal training time of 300 iterations for the 901:10:3 architecture (left column), and 500 iterations for the 901:5:5:3 architecture (right column). Also plotted is the best-fit linear least-squares line.


The neural network training grid contains 2009 synthetic spectra generated using the line-blanketed LTE spectral synthesis code SPECTRUM (Jeffery et al., 2001). The grid covered the parameter space $T_{\rm eff}$: 12000–50000 K, $\Delta T_{\rm eff} \sim 5000$ K; $\log g$: 3.5–6.0 dex, $\Delta\log g = 0.5$ dex; and $\log(n_{\rm He}/n_{\rm H})$: −3 to +3 dex, in 10 non-uniformly spaced intervals.

In order to match this training grid to the Drilling et al. (2006) observations, each synthetic spectrum was first convolved with a Gaussian to lower its resolution to that of the observations (∼2.5 Å), and then re-binned onto the same wavelength grid as the observations (4050–4950 Å at a 1.0 Å dispersion).
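A minimal numpy sketch of this matching step is given below. It assumes that the ∼2.5 Å figure is the FWHM of the Gaussian kernel, that the synthetic spectrum is uniformly sampled in wavelength, and that linear interpolation is an acceptable stand-in for re-binning (a flux-conserving scheme may have been used in practice). The function name `degrade_to_observed` is hypothetical. The target grid of 4050–4950 Å at 1.0 Å spacing has 901 points, the same as the number of network input nodes.

```python
import numpy as np

def degrade_to_observed(wave, flux, fwhm=2.5, start=4050.0, stop=4950.0, step=1.0):
    """Convolve a uniformly sampled synthetic spectrum with a Gaussian of the
    given FWHM (Angstrom) and resample it onto the observational grid."""
    dw = wave[1] - wave[0]                          # input sampling (Angstrom/pixel)
    sigma_pix = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / dw
    half = int(np.ceil(4.0 * sigma_pix))            # truncate kernel at +/- 4 sigma
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pix) ** 2)
    kernel /= kernel.sum()                          # preserve the continuum level
    smoothed = np.convolve(flux, kernel, mode="same")
    new_wave = np.arange(start, stop + step / 2, step)
    return new_wave, np.interp(new_wave, wave, smoothed)

# Hypothetical synthetic spectrum: flat continuum with one narrow absorption line.
wave = np.arange(4000.0, 5000.0, 0.05)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 4471.0) / 0.2) ** 2)
new_wave, new_flux = degrade_to_observed(wave, flux)
print(len(new_wave), new_flux.min())
```

The synthetic wavelength range is deliberately wider than the target range so that edge effects of the convolution fall outside 4050–4950 Å.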

Design limitations in SFIT, the $\chi^2$ minimisation code used at Armagh Observatory (these are dealt with in Chapter 3), required a smaller grid of synthetic spectra to be used. This grid covered the parameter space $T_{\rm eff} = \{15, 20, 25, 30, 35, 40, 50\}$ kK, $\log g = \{3, 4, 5, 6\}$, and $\log(n_{\rm He}/n_{\rm H}) = \{-3, -1, 0, +1, +3\}$. This grid is commensurate with the dispersion and S/N present in the Drilling et al. (2006) sample.
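For scale, the SFIT grid contains $7 \times 4 \times 5 = 140$ model spectra, compared with 2009 in the ANN training grid. A short sketch enumerating its nodes (the variable names are illustrative only):

```python
from itertools import product

teff_kk = [15, 20, 25, 30, 35, 40, 50]   # T_eff in kK
logg = [3, 4, 5, 6]
log_he = [-3, -1, 0, 1, 3]               # log(nHe/nH)

# Every (T_eff, log g, log(nHe/nH)) node of the SFIT grid.
nodes = list(product(teff_kk, logg, log_he))
print(len(nodes))  # 140
```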

A default instrumental profile of 1 Å (FWHM) was assumed during the fitting for each application set, and all data points more than 5% above the continuum were rejected. All three intrinsic parameters, $T_{\rm eff}$, $\log g$, and $\log(n_{\rm He}/n_{\rm H})$, were free to vary in the $\chi^2$ optimisation.

Solutions for $v_{\rm rad}$ and $v \sin i$ were also obtained. The correction for $v_{\rm rad}$ applied during the pre-processing stage (Section 2.1.1) appeared to have left residual shifts of a few km s$^{-1}$ and, in one case (possibly where Balmer lines were confused with He II lines), a shift of a couple of Ångströms. Overall, $\langle v_{\rm rad}\rangle = -1.9 \pm 22.3$ km s$^{-1}$, the mean being satisfactorily close to the expectation value of 0 km s$^{-1}$. Solving for $v \sin i$ allowed SFIT to tolerate both the varying instrumental resolution present in the data and any rotational broadening present in the source. Formally, $\langle v \sin i\rangle = 59 \pm 39$ km s$^{-1}$.
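To put the residual shifts in context, a wavelength offset $\Delta\lambda$ at wavelength $\lambda$ corresponds, non-relativistically, to a radial velocity $v \approx c\,\Delta\lambda/\lambda$. The sketch below shows that a shift of a couple of Ångströms near the middle of the 4050–4950 Å range amounts to well over 100 km s$^{-1}$, far larger than the few km s$^{-1}$ residuals seen in the other stars. The 4500 Å reference wavelength is an illustrative choice.

```python
C_KMS = 299792.458  # speed of light in km/s

def shift_to_velocity(delta_lambda, wavelength):
    """Non-relativistic radial velocity (km/s) for a wavelength shift (Angstrom)."""
    return C_KMS * delta_lambda / wavelength

def velocity_to_shift(velocity, wavelength):
    """Wavelength shift (Angstrom) produced by a radial velocity (km/s)."""
    return wavelength * velocity / C_KMS

print(shift_to_velocity(2.0, 4500.0))   # ~133 km/s
print(velocity_to_shift(5.0, 4500.0))   # ~0.075 Angstrom
```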

A single normalisation procedure was applied to remove small trends in the background continuum. Nine "continuum" regions free of hydrogen and helium lines were defined. After an initial optimisation step, the spectrum was divided by the initial fit. A second-order polynomial was fitted to this ratio using only the data in the continuum regions. An estimate of the true sample S/N was obtained from the RMS of the ratio around the polynomial fit in these same regions. The sample was then multiplied by the polynomial fit before a second optimisation step was applied.
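The renormalisation step can be sketched in numpy as below. This is a sketch under stated assumptions, not SFIT's implementation: the function name `renormalise` and the continuum windows are hypothetical, the wavelength axis is centred before fitting to keep the polynomial well conditioned, and the polynomial correction is applied to the model spectrum, which is an assumption about how the correction enters the second optimisation step.

```python
import numpy as np

def renormalise(wave, obs, model, continuum_regions):
    """Fit a second-order polynomial to obs/model in line-free windows and
    estimate the S/N from the scatter about that polynomial."""
    mask = np.zeros(wave.shape, dtype=bool)
    for lo, hi in continuum_regions:              # select continuum pixels only
        mask |= (wave >= lo) & (wave <= hi)
    x = wave - wave.mean()                        # centred axis for a stable fit
    ratio = obs / model                           # spectrum divided by initial fit
    coeffs = np.polyfit(x[mask], ratio[mask], 2)
    poly = np.polyval(coeffs, x)
    resid = ratio[mask] - poly[mask]
    snr = 1.0 / np.sqrt(np.mean(resid ** 2))      # ratio ~ 1, so S/N ~ 1/RMS
    return model * poly, snr, poly

# Hypothetical data: a tilted continuum and 2% noise (true S/N = 50).
wave = np.linspace(4050.0, 4950.0, 901)
model = 1.0 - 0.3 * np.exp(-0.5 * ((wave - 4471.0) / 3.0) ** 2)
tilt = 1.0 + 1e-4 * (wave - 4500.0)
rng = np.random.default_rng(0)
obs = model * tilt * (1.0 + rng.normal(0.0, 0.02, wave.size))
regions = [(4100.0, 4200.0), (4550.0, 4650.0), (4800.0, 4900.0)]
new_model, snr, poly = renormalise(wave, obs, model, regions)
print(round(snr, 1))
```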

2.2.1 Methodology<br />

A control experiment was carried out to determine whether neural network models trained on a set of synthetic spectra at infinite S/N are able to accurately parameterise other synthetic spectra over a range of S/Ns.

The training grid of synthetic spectra was randomly divided into two equally sized subsets, one for training and one for application. Several committees of different ANN architectures were trained on the training subset for a range of training epochs. The intention was to establish the optimal model complexity for the task without using weight decay.

The STATNET $\beta$ parameters for each of the network output variables ($T_{\rm eff}$, $\log g$, $\log(n_{\rm He}/n_{\rm H})$) were set to 6.0, corresponding to an estimated 10% error in each parameter. This is commensurate with the spacing of the grid points over the parameter space, and assumes, conservatively, that the neural network model will do at least as well as nearest-neighbour matching to the synthetic spectra in the grid. Again, the Condor cluster allowed the different experiments to be carried out in parallel.

Eight copies of the application subset were made, and each was degraded by the addition of Gaussian noise to one of the following S/Ns: {∞, 1000, 500, 100, 50, 20, 10, 5} (the S/N = ∞ copy having no noise added). Each trained ANN committee was applied in turn to the noised application sets.
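A minimal sketch of producing the eight noised application sets, assuming per-pixel Gaussian noise with standard deviation equal to the flux divided by the target S/N (the exact noise model used is not stated). The function name and the stand-in spectra are hypothetical.

```python
import numpy as np

SNR_LEVELS = [np.inf, 1000, 500, 100, 50, 20, 10, 5]

def add_noise(flux, snr, rng):
    """Add Gaussian noise with per-pixel sigma = flux / snr;
    an infinite S/N returns an unchanged copy."""
    if np.isinf(snr):
        return flux.copy()
    return flux + rng.normal(0.0, 1.0, size=flux.shape) * flux / snr

rng = np.random.default_rng(42)
application = np.ones((10, 901))   # hypothetical: 10 continuum-normalised spectra
noised_sets = {snr: add_noise(application, snr, rng) for snr in SNR_LEVELS}
print({snr: round(float(np.std(s - 1.0)), 4) for snr, s in noised_sets.items()})
```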

The experiments suggested that the optimal network architecture was a 901:10:10:3 configuration, trained for 500 epochs for $T_{\rm eff}$ and $\log g$ parameterisations, and for 1350 epochs for $\log(n_{\rm He}/n_{\rm H})$ parameterisations.
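In this notation, 901:10:10:3 denotes 901 input fluxes (presumably one per 1 Å pixel from 4050 to 4950 Å), two hidden layers of 10 nodes each, and three outputs ($T_{\rm eff}$, $\log g$, $\log(n_{\rm He}/n_{\rm H})$). A minimal numpy forward pass is sketched below; the tanh hidden activations, linear outputs, and random weights are assumptions for illustration and are not taken from STATNET.

```python
import numpy as np

def init_network(sizes, rng):
    """Random weights and zero biases for a fully connected feed-forward net."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Propagate a batch of spectra (rows of x) through the network."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:    # hidden layers: tanh; output layer: linear
            x = np.tanh(x)
    return x

def n_parameters(sizes):
    """Total number of weights plus biases."""
    return sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))

sizes = [901, 10, 10, 3]
params = init_network(sizes, np.random.default_rng(0))
outputs = forward(params, np.ones((4, 901)))   # 4 hypothetical spectra
print(outputs.shape, n_parameters(sizes))       # (4, 3) 9163
```

Note that this network has more free parameters (9163) than there are spectra in the unaugmented training grid (2009), which is one reason regularisation is a concern.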

The results showed positive correlations between the actual parameters and the ANNs' results. However, the accuracy of the ANNs declined quickly as the S/N of the application set fell below 100. This observation is important because the spectra in the Drilling et al. (2006) sample are not of a consistent S/N: the majority of the sample has an S/N somewhere in the 50–100 range, so the neural network model should account for this. The results imply that an ANN trained on synthetic spectra of infinite S/N will not give the most accurate parameterisations of the observational sample.

This result was also reported by Snider et al. (2001) and Willemsen et al. (2005). In the latter, Willemsen et al. reported on their attempts to improve the generalisation abilities of their neural network models by increasing the amount of weight decay. They found that performance improved only when the weight decay term was chosen to be rather large, indicating that the problem lies in regularising the model, i.e., a neural network trained on high-S/N spectra will over-fit the data unless "restrained".

The solution chosen in the study presented here was to make two copies of the entire grid of 2009 theoretical models, one degraded to an S/N of 100 and the other to an S/N of 50. The final training set for the optimal network architecture was then a combination of all three grids, totalling 6027 synthetic spectra. This addition of noise to the training grid serves as another mechanism of regularisation; Willemsen et al. (2005) employed a similar solution. The noise serves to 'smear out' each training point, making it difficult for the network to fit individual points precisely, and hence reducing over-fitting.
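The augmented training set can be assembled as sketched below, assuming the same per-pixel Gaussian noise model (sigma = flux / S/N) as a hypothetical stand-in for the one actually used. The physical-parameter labels are simply repeated, since adding noise does not change them, and the total of $3 \times 2009 = 6027$ spectra follows directly.

```python
import numpy as np

def add_noise(flux, snr, rng):
    """Gaussian noise with per-pixel sigma = flux / snr."""
    return flux + rng.normal(0.0, 1.0, size=flux.shape) * flux / snr

def augment(grid_flux, grid_params, snrs=(100, 50), seed=0):
    """Stack the noise-free grid with one noised copy per S/N level."""
    rng = np.random.default_rng(seed)
    fluxes = [grid_flux] + [add_noise(grid_flux, s, rng) for s in snrs]
    params = [grid_params] * (1 + len(snrs))    # labels are unchanged by noise
    return np.vstack(fluxes), np.vstack(params)

grid_flux = np.ones((2009, 901))      # hypothetical stand-in for the 2009 models
grid_params = np.zeros((2009, 3))     # T_eff, log g, log(nHe/nH) labels
X, y = augment(grid_flux, grid_params)
print(X.shape, y.shape)               # (6027, 901) (6027, 3)
```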

Although the training set was enlarged in this way, there is no reason to believe that the optimal ANN configuration would change as a consequence: the fundamental structure and physical parameters of the noised spectra are no different from those of the unnoised spectra.



2.2.2 Results

Application Set 1: 60 Calibration Stars

The results of applying the two ANN models to the 60 calibration stars are given in the first column of Table 2.3, and the actual parameters obtained are listed in Appendix A.

The correlation coefficients show reasonable agreement between the ANNs' predicted $T_{\rm eff}$ parameterisations and those of Drilling et al. (2006). However, the $\log(n_{\rm He}/n_{\rm H})$ and $\log g$ correlation coefficients are lower. Looking at the middle and last plots in Figure 2.3, it can be seen that the ANNs' results in these parameters (indicated by the blue crosses) are visibly more scattered than the $T_{\rm eff}$ results shown in the first plot.

The typical errors quoted in the original fine analyses of these stars are $\sigma_{T_{\rm eff}} = \pm2500$ K and $\sigma_{\log g} = \pm0.2$ dex. The $\sigma_{\rm rms}$ values in the first column of Table 2.3 are within ∼2σ of these fine-analysis errors, which is encouraging (assuming, of course, that the method of fine analysis is more accurate than either of the methods used here).

| Statistic | Parameter | ANN/Drilling | $\chi^2$/Drilling | ANN/$\chi^2$ |
|---|---|---|---|---|
| $\sigma_{\rm rms}$ | $T_{\rm eff}$ (K) | 4389.79 | 4338.85 | 3740.99 |
| | $\log g$ (dex) | 0.4577 | 0.3754 | 0.4908 |
| | $\log(n_{\rm He}/n_{\rm H})$ (dex) | 0.9796 | 0.4769 | 0.8382 |
| $r$ | $T_{\rm eff}$ | 0.9207 | 0.9447 | 0.9131 |
| | $\log g$ | 0.7844 | 0.8173 | 0.7525 |
| | $\log(n_{\rm He}/n_{\rm H})$ | 0.8705 | 0.9649 | 0.8816 |

Table 2.3: Results of parameterising the 60 calibration stars.
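Expressing the $\sigma_{\rm rms}$ entries of Table 2.3 in units of the typical fine-analysis errors ($\pm2500$ K in $T_{\rm eff}$, $\pm0.2$ dex in $\log g$) makes the "∼2σ" statements explicit. The arithmetic below gives $T_{\rm eff}$ ratios of about 1.5 to 1.8 and $\log g$ ratios of about 1.9 to 2.5, so the $\log g$ entries in the first and third columns sit slightly above twice the quoted error.

```python
# sigma_rms values from Table 2.3 (T_eff in K, log g in dex).
sigma_rms = {
    "ANN/Drilling":  {"Teff": 4389.79, "logg": 0.4577},
    "chi2/Drilling": {"Teff": 4338.85, "logg": 0.3754},
    "ANN/chi2":      {"Teff": 3740.99, "logg": 0.4908},
}
fine_analysis_err = {"Teff": 2500.0, "logg": 0.2}   # typical quoted errors

# Each sigma_rms expressed as a multiple of the fine-analysis error.
ratios = {
    col: {p: v / fine_analysis_err[p] for p, v in vals.items()}
    for col, vals in sigma_rms.items()
}
for col, r in ratios.items():
    print(col, {p: round(v, 2) for p, v in r.items()})
```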

SFIT was applied to the 60 calibration stars and the results are listed in the second column of Table 2.3. The actual parameters obtained are listed in Appendix A. SFIT compares well with the neural network in $T_{\rm eff}$ and $\log g$, but performs markedly better in $\log(n_{\rm He}/n_{\rm H})$, where its $\sigma_{\rm rms}$ is roughly half that of the ANN (0.48 dex versus 0.98 dex).

A direct comparison between the neural network and SFIT results is given in the third column of Table 2.3. The disagreement between the neural network models and SFIT is of a similar degree to the disagreement of each method with the Drilling et al. parameters. The $\sigma_{\rm rms}$ values in column three of the table are still close to twice the quoted errors for the fine analyses of the 60 calibration stars, which is an encouraging result (again, assuming that fine analysis is the more accurate method) and confirms that ANNs have the potential to parameterise hot subdwarf spectra to a similar degree of accuracy as the more traditional method of $\chi^2$ minimisation.

[Figure 2.3: three panels plotting the ANN and $\chi^2$ parameterisations of $T_{\rm eff}$ (kK), $\log g$, and $\log(n_{\rm He}/n_{\rm H})$ against the corresponding Drilling et al. calibrations.]

Figure 2.3: Parameterisations of the 60 calibration stars. Results from both methods have been combined onto each plot: ANN results are indicated by blue crosses, and $\chi^2$ minimiser results by red pluses.

The poor generalisation of the neural network in the $\log(n_{\rm He}/n_{\rm H})$ parameter is a significant issue, and requires further investigation.

Application Set 2: 133 Unparameterised Stars

The two ANN committees were applied to the remaining 133 unparameterised stars in the sample. These stars were also parameterised using SFIT. The parameters obtained from both methods are listed in Appendix A.

A direct comparison between the two methods was made; the results are presented in Table 2.4 and Figure 2.4. For approximately twice as many stars, the $\sigma_{\rm rms}$ values are somewhat worse than those in the last column of Table 2.3 (e.g., 5769 K versus 3741 K in $T_{\rm eff}$). Tentatively, the results could still be considered to support the view that ANNs have the potential to parameterise hot subdwarf spectra to a similar degree of accuracy as $\chi^2$ minimisers.

As has been pointed out previously, the neural network models seem to suffer from regularisation issues when trained on synthetic spectra. With further investigation of this matter, a significant improvement in the neural networks' generalisation performance could be obtained.


[Figure 2.4: three panels plotting the $\chi^2$ parameterisations of $T_{\rm eff}$ (kK), $\log g$, and $\log(n_{\rm He}/n_{\rm H})$ against the corresponding ANN parameterisations.]

Figure 2.4: Parameterisations of the 133 unparameterised stars using the ANNs and the $\chi^2$ minimiser. Also shown is the best-fit linear least-squares line.



| Statistic | Parameter | ANN/$\chi^2$ |
|---|---|---|
| $\sigma_{\rm rms}$ | $T_{\rm eff}$ (K) | 5768.74 |
| | $\log g$ (dex) | 0.6853 |
| | $\log(n_{\rm He}/n_{\rm H})$ (dex) | 0.9926 |
| $r$ | $T_{\rm eff}$ | 0.8850 |
| | $\log g$ | 0.8003 |
| | $\log(n_{\rm He}/n_{\rm H})$ | 0.8875 |

Table 2.4: A comparison between ANNs and $\chi^2$ minimisation for parameterising the 133 unparameterised stars.

2.3 Summary<br />

Artificial neural networks are a fast and powerful method for automatically classifying astronomical spectra. A feed-forward neural network in a 901:5:5:3 architecture, trained for 500 epochs, was able to classify hot subdwarf spectra onto the Drilling et al. (2006) scale with global errors ($\sigma_{\rm rms}$) of ∼2 sub-types for spectral type, ∼1 sub-class for luminosity class, and ∼4 sub-classes for helium class. This was the most accurate ANN found for the task.

The use of ANNs for obtaining physical parameters from stellar spectra offers the possibility of a fast method for deriving initial parameter estimates. However, establishing the optimal network architecture to accurately model the mapping from flux space to physical-parameter space was found to be cumbersome, requiring much experimentation. It was also found that training the neural network model on infinite-S/N synthetic spectra led to over-fitting due to insufficient regularisation. Adding noise to the training set was attempted as a solution, but further investigation is needed.

$\chi^2$ methods are therefore more desirable for parameterising astronomical spectra in a general data mining toolkit, as they offer more flexibility and greater ease of use than ANNs. Of course, these qualities come at the price of speed: $\chi^2$ methods are unable to compete with ANNs in this regard. This issue is discussed further in the next chapter. However, if the regularisation issues with parameterising ANNs can be solved, their extremely fast application speed would instantly make them the preferred tool.


Chapter 3

Parameterisation - $\chi^2$ Fitting

3.1 Analysing Stellar Spectra

Deriving physical parameters (i.e., $T_{\rm eff}$, $\log g$, abundances) for a star is done by a fine analysis of its spectrum. The traditional method of spectroscopic fine analysis is a long, iterative process requiring several months to complete.

The method is based on measuring the equivalent widths of spectral lines. The astronomer must go through a spectrum and manually identify as many spectral lines as possible, together with the ions to which they belong. Microturbulent and rotational velocities are determined first. Then, an initial grid of model atmospheres is calculated to cover the approximate $T_{\rm eff}$, $\log g$, and composition of the star.

Using these models, the theoretical equivalent widths are calculated for each of the identified ion lines in the star over the range of elemental abundances in the grid. These equivalent widths are combined to form curves of growth, which can then be used to read off derived abundances for each of the measured ion-line equivalent widths in the stellar spectrum.
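Reading an abundance off a curve of growth amounts to inverting a monotonic relation between abundance and equivalent width. A minimal sketch, assuming the theoretical equivalent widths have already been computed at a handful of trial abundances (the values below are invented; in a real analysis they come from the model atmospheres) and using linear interpolation in $\log W$:

```python
import numpy as np

# Hypothetical curve of growth for one line: trial abundances (dex) and the
# theoretical equivalent widths (milli-Angstrom) computed for each.
log_abundance = np.array([6.0, 6.5, 7.0, 7.5, 8.0])
ew_theory = np.array([12.0, 30.0, 60.0, 85.0, 100.0])

def abundance_from_ew(ew_measured):
    """Interpolate the curve of growth (in log W) to find the abundance
    that reproduces a measured equivalent width."""
    return np.interp(np.log10(ew_measured), np.log10(ew_theory), log_abundance)

print(abundance_from_ew(60.0))   # 7.0
print(abundance_from_ew(45.0))   # between 6.5 and 7.0
```

Interpolating in $\log W$ rather than $W$ is a design choice that tends to follow the shape of a curve of growth more closely; it is not taken from the text.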

Temperature and surface gravity are determined by using the derived abundances of lines known to be sensitive to temperature (e.g., Fe) and gravity (e.g., H or He), and performing a process of comparison and line fitting with each of the models in the grid. The derived values of $T_{\rm eff}$ and $\log g$ are then used, along with the measured equivalent widths, to calculate new abundances. A new grid of model atmospheres is computed with these parameters. The entire analysis process of determining curves of growth, deriving values of $T_{\rm eff}$, $\log g$, and abundances, and recomputing the model grid is repeated until the derived parameters agree with those used in the models (i.e., convergence is achieved).

An excellent description of this process, and a demonstration of its application, can be found in Dudley (1992).

Progress Towards Automation<br />

Given the iterative nature of the method of fine analysis, and the time required to conduct an analysis for a single star, attempts have been made to find automated procedures that accomplish the same goal much more quickly.

Hutchison (1971) presents an automatic procedure for detecting spectral features and determining accurate line frequencies, line depths, and equivalent widths for high-resolution infrared spectra.

Morossi & Crivellari (1980) describe a method to obtain $T_{\rm eff}$ and $\log g$ by comparing observations to a grid of models. Their method is based on a least-squares minimisation procedure which determines the parameter values that optimise the fit between the theoretical models and the observational data.

Katz et al. (1998) use a $\chi^2$ minimisation procedure to obtain values of $T_{\rm eff}$, $\log g$, and [Fe/H] from ELODIE spectra by fitting observations to a library of 211 reference stars, observed with the same instrument, whose atmospheric parameters are well known.

The method of $\chi^2$ fitting has become very much the de facto procedure for automating the parameterisation of astronomical spectra. It is a specific case of the more general class of fitting procedures known as metric distance minimisation (or minimum distance methods), where, as the name suggests, results are determined by minimising some distance metric between the object under analysis and each member of a set of templates. The object is assigned the parameters of the template that gives the smallest distance.

Let $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ be the spectrum to parameterise, and $\mathbf{s} = (s_1, s_2, \ldots, s_N)$ be a template spectrum with known physical parameters. The distance metric to be minimised is of the form

$$D = \frac{1}{N}\left[\sum_{i=1}^{N} w_i \, |x_i - s_i|^p\right]^{1/p}, \tag{3.1}$$

where $w_i$ is a weight assigned to flux element $s_i$ of the template spectrum. Typically, $\mathbf{s}$ is only one template in a set of templates, $S = \{\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \ldots, \mathbf{s}^{(M)}\}$, and equation 3.1 is computed for every template $\mathbf{s}^{(j)}$. Equation 3.1 becomes $\chi^2$ fitting when $p = 2$ and $w_i = \sigma_i^{-2}$, where $\sigma_i$ is the error in $x_i$.
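With $p = 2$ and $w_i = \sigma_i^{-2}$, equation 3.1 reduces to $D = \sqrt{\chi^2}/N$, which is monotonic in $\chi^2$, so minimising $D$ and minimising $\chi^2$ select the same template. A small sketch confirming this numerically; the flux and error values are invented for illustration.

```python
import math

def distance(x, s, w, p):
    """Metric distance of equation 3.1 between spectrum x and template s."""
    n = len(x)
    total = sum(wi * abs(xi - si) ** p for xi, si, wi in zip(x, s, w))
    return total ** (1.0 / p) / n

def chi_squared(x, s, sigma):
    """Standard chi-squared between spectrum x and template s."""
    return sum(((xi - si) / e) ** 2 for xi, si, e in zip(x, s, sigma))

x = [1.00, 0.90, 0.70, 0.95]          # hypothetical observed fluxes
s = [1.00, 0.85, 0.75, 0.97]          # hypothetical template fluxes
sigma = [0.02, 0.02, 0.03, 0.02]      # errors on x
w = [e ** -2 for e in sigma]          # chi-squared weights

d = distance(x, s, w, p=2)
print(d, math.sqrt(chi_squared(x, s, sigma)) / len(x))  # equal up to rounding
```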

For a straightforward nearest-neighbour minimisation of the $\chi^2$ metric over a grid of templates, an accurate result requires the grid to be finely spaced in each parameter of interest, so that the effect of that parameter on the flux vector can be resolved. This can create a large data requirement, and the computation time required to parameterise one spectrum can increase prohibitively, as equation 3.1 must be evaluated for all the templates.
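A nearest-neighbour $\chi^2$ search over a discrete grid can be sketched in a few lines of numpy. The one-parameter grid below, in which a single line deepens with $T_{\rm eff}$, is a hypothetical toy model used only to exercise the search; it is not a physical model.

```python
import numpy as np

def nearest_template(obs, sigma, templates, params):
    """Return the parameters and chi-squared of the best-matching template."""
    chi2 = np.sum(((templates - obs) / sigma) ** 2, axis=1)  # one value per template
    best = int(np.argmin(chi2))
    return params[best], float(chi2[best])

# Hypothetical one-parameter grid: line depth grows with T_eff.
wave = np.linspace(4050.0, 4950.0, 901)
teff_grid = np.arange(15000.0, 50001.0, 5000.0)

def toy_spectrum(teff):
    depth = 0.6 * (teff - 10000.0) / 40000.0
    return 1.0 - depth * np.exp(-0.5 * ((wave - 4471.0) / 3.0) ** 2)

templates = np.array([toy_spectrum(t) for t in teff_grid])
rng = np.random.default_rng(1)
obs = toy_spectrum(35000.0) + rng.normal(0.0, 0.01, wave.size)
best_teff, best_chi2 = nearest_template(obs, 0.01, templates, teff_grid)
print(best_teff, best_chi2)
```

Every template is compared with every observed spectrum, which is exactly why the cost grows with the grid size.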

One solution to this problem is to use some method of interpolation to "fill in the gaps" between templates in a discrete grid. As interpolation creates the illusion of continuity in the grid, it also opens the possibility of using search-based optimisation methods to locate the minimum of $D$ efficiently.
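The idea can be sketched for a single parameter: interpolate linearly in flux between the two grid templates that bracket a trial $T_{\rm eff}$, then locate the $\chi^2$ minimum with a golden-section search. Both the linear interpolation and the golden-section optimiser are illustrative choices, and the toy grid is hypothetical; neither is a description of SFIT's internals.

```python
import math
import numpy as np

wave = np.linspace(4050.0, 4950.0, 901)
teff_grid = np.arange(15000.0, 50001.0, 5000.0)

def toy_spectrum(teff):
    depth = 0.6 * (teff - 10000.0) / 40000.0      # hypothetical line depth
    return 1.0 - depth * np.exp(-0.5 * ((wave - 4471.0) / 3.0) ** 2)

templates = np.array([toy_spectrum(t) for t in teff_grid])

def interpolated_template(teff):
    """Linear interpolation in flux between the two bracketing templates."""
    j = int(np.clip(np.searchsorted(teff_grid, teff) - 1, 0, len(teff_grid) - 2))
    f = (teff - teff_grid[j]) / (teff_grid[j + 1] - teff_grid[j])
    return (1.0 - f) * templates[j] + f * templates[j + 1]

def chi2(teff, obs, sigma):
    return float(np.sum(((obs - interpolated_template(teff)) / sigma) ** 2))

def golden_section(fun, lo, hi, tol=1.0):
    """Minimise a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

obs = toy_spectrum(32500.0)                       # lies between grid points
best = golden_section(lambda t: chi2(t, obs, 0.01), teff_grid[0], teff_grid[-1])
print(round(best))
```

The search recovers a value between grid nodes with far fewer $\chi^2$ evaluations than a fine brute-force grid would need.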

Unfortunately, with $\chi^2$ fitting there is no escaping the so-called curse of dimensionality: as the number of parameters to be determined increases, the number of templates in the grid increases exponentially.

$\chi^2$ Fitting for Astronomical Data Mining

The main disadvantage of using $\chi^2$ fitting in a data mining application is its slowness. In contrast to artificial neural networks, there is no training procedure that extracts information from the template grid, so the grid must be present and searched to minimise $D$ for every new spectrum to be parameterised.

One solution to the speed issue is to distribute the grid search over many computers in a parallel cluster. The grid of templates could be broken up into $N$ sections, where $N$ is the number of processing nodes in the cluster. Each node then receives its section of the grid, finds the local minimum of $D$ for an observed spectrum, and reports this value back to a master processing node, which then selects the global minimum from all the node results.
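The scatter/gather scheme above can be sketched with Python's `concurrent.futures`. Here worker threads merely stand in for cluster nodes (a real cluster would need a message-passing layer such as MPI), and the function names are hypothetical. Each worker returns its local minimum as a `(chi2, global_index)` pair, and the "master" reduces these with `min`.

```python
from concurrent.futures import ThreadPoolExecutor


def chi2(x, s, sigma):
    """Chi-squared distance between observed fluxes x and template s."""
    return sum(((xi - si) / e) ** 2 for xi, si, e in zip(x, s, sigma))


def local_minimum(job):
    """Work done by one 'node': best (chi2, global index) within its section."""
    x, sigma, section, offset = job
    return min((chi2(x, s, sigma), offset + j) for j, s in enumerate(section))


def distributed_search(x, sigma, templates, n_nodes):
    """Split the grid into at most n_nodes sections, search each, then reduce."""
    size = -(-len(templates) // n_nodes)          # ceiling division
    jobs = [(x, sigma, templates[k:k + size], k)
            for k in range(0, len(templates), size)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        local_results = list(pool.map(local_minimum, jobs))
    return min(local_results)                     # master picks the global minimum


templates = [[float(v)] for v in range(10)]
result = distributed_search([6.2], [1.0], templates, n_nodes=3)
```

Because each section reports its best `(chi2, index)` pair, the reduction gives the same answer as a serial search over the whole grid.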

In a data mining context, the template grid is likely to cover a large region of the parameter space of interest in reasonable detail, so as to account for the diversity of objects that will be encountered. Large template grids pose storage and access problems for the $\chi^2$ minimisation program, because the computer's main memory may not be large enough to hold all the templates at once.

The work of this chapter is concerned with taking a pre-existing $\chi^2$ minimisation program used at the Armagh Observatory and beginning the modifications necessary to use the program more efficiently in a data mining context. Parallelising the program is a relatively straightforward task; however, the problem of managing large template grids is much more involved and needs to be tackled first.



3.2 SFIT

SFIT (Jeffery et al., 2001) is a Fortran 90 implementation of the $\chi^2$ minimisation method outlined in the previous section. Given a grid of theoretical model spectra and an observed spectrum, SFIT finds the combination of physical parameters of the model which most closely matches the observed spectrum by minimising the $\chi^2$ distance metric.

The program considers several broadening processes which must be applied to the theoretical spectra before comparison with an observed spectrum. These include instrumental broadening $I(\Delta\lambda)$, rotational broadening $V(v \sin i, \beta)$, acceleration broadening $A(v)$, and projection broadening $P(v - \bar{v})$.

Model grids are discrete in three dimensions: $T_{\mathrm{eff}}$, $\log g$, and $n_{\mathrm{atm}}$, the fractional atmospheric abundance of an element. A method of linear interpolation in tables is used to estimate the model space between grid points. Fitting solutions can be obtained in several parameters ($T_{\mathrm{eff}}$, $\log g$, $n_{\mathrm{atm}}$, $v \sin i$, and $v_{\mathrm{rad}}$) for both single and composite spectra. The $\chi^2$ minimisation can be carried out using either the Nelder-Mead downhill simplex optimisation procedure, implemented as a variant of the AMOEBA algorithm of Press et al. (1986), or the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963). Nearest-neighbour $\chi^2$ fitting is also possible.

The Amoeba algorithm minimises a function (in this case, the $\chi^2$ difference between the observed spectrum and the models in the grid) by defining an initial simplex with $N+1$ vertices, where $N$ is the number of dimensions in the function’s parameter space. The method then takes a series of steps, most of which simply move the vertex of the simplex at which the function is largest through the opposite face of the simplex to a lower point (a reflection). The simplex “moves” through the parameter space by reflecting, contracting, and expanding until the distance “moved” is smaller than some tolerance threshold, at which point the method is deemed to have converged on a solution.
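A compact Nelder-Mead minimiser in the spirit of the downhill simplex described above is sketched below. It is a simplified illustration, not SFIT's AMOEBA variant. It uses only inside contraction, and it stops when the spread of function values across the simplex falls below `tol`, a common criterion assumed here in place of SFIT's distance threshold. All names are hypothetical.

```python
def nelder_mead(f, x0, step=1.0, tol=1e-10, max_iter=1000):
    """Minimise f starting from x0 using a simplified downhill simplex."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each axis (N + 1 vertices).
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst (highest f).
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        # Centroid of the face opposite the worst vertex.
        c = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        # Reflect the worst vertex through that face.
        xr = [2.0 * c[i] - worst[i] for i in range(n)]
        fr = f(xr)
        if fr < values[0]:
            # Very good move: try expanding further in the same direction.
            xe = [3.0 * c[i] - 2.0 * worst[i] for i in range(n)]
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            # Contract the worst vertex halfway towards the centroid.
            xc = [0.5 * (c[i] + worst[i]) for i in range(n)]
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex halfway towards the best one.
                best = simplex[0]
                simplex = [best] + [[0.5 * (best[i] + v[i]) for i in range(n)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    k = min(range(n + 1), key=lambda k: values[k])
    return simplex[k], values[k]


def bowl(p):
    """Toy error surface with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


x_best, f_best = nelder_mead(bowl, [0.0, 0.0])
```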

The Levenberg-Marquardt method is an iterative method designed specifically for the minimisation of sum-of-squares error functions, such as the $\chi^2$ function used in SFIT. The algorithm linearises the model around the current point and uses the derivatives of the error function to choose each step. It dynamically adjusts a damping parameter so that steps resemble gradient descent far from the minimum and Gauss-Newton steps close to it. As the solution approaches the minimum, the step size decreases, and the algorithm usually converges quickly. Using this method in SFIT requires the initial guess for the parameters to be reasonably close to the solution, because Levenberg-Marquardt can become trapped in local minima. Although slower, Amoeba is more robust against this possibility.
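The damping behaviour can be seen in a small Levenberg-Marquardt sketch that fits a hypothetical two-parameter model $y = a\,e^{-bx}$ to noiseless data. It uses the common Marquardt scaling $J^TJ + \lambda\,\mathrm{diag}(J^TJ)$. After a successful step it divides $\lambda$ by 10, and after a failed one it multiplies $\lambda$ by 10. These are textbook choices assumed here, not SFIT's settings.

```python
import math


def levenberg_marquardt(xs, ys, a, b, lam=1e-3, n_iter=200):
    """Fit y = a * exp(-b * x) by least squares; returns (a, b, chi2)."""

    def chi2(a, b):
        return sum((y - a * math.exp(-b * x)) ** 2 for x, y in zip(xs, ys))

    current = chi2(a, b)
    for _ in range(n_iter):
        # Accumulate J^T J (2x2) and J^T r from the analytic derivatives.
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(-b * x)
            grad = (e, -a * x * e)               # d(model)/da, d(model)/db
            r = y - a * e                        # residual
            for i in range(2):
                jtr[i] += grad[i] * r
                for k in range(2):
                    jtj[i][k] += grad[i] * grad[k]
        # Solve (J^T J + lam * diag(J^T J)) delta = J^T r by Cramer's rule.
        m00 = jtj[0][0] * (1.0 + lam)
        m11 = jtj[1][1] * (1.0 + lam)
        m01 = jtj[0][1]
        det = m00 * m11 - m01 * m01
        da = (jtr[0] * m11 - m01 * jtr[1]) / det
        db = (m00 * jtr[1] - m01 * jtr[0]) / det
        trial = chi2(a + da, b + db)
        if trial < current:
            # Step accepted: reduce damping (closer to Gauss-Newton).
            a, b, current = a + da, b + db, trial
            lam /= 10.0
        else:
            # Step rejected: raise damping (shorter, more gradient-like step).
            lam *= 10.0
            if lam > 1e12:
                break                            # no further progress possible
    return a, b, current


xs = [0.5 * i for i in range(10)]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
a_fit, b_fit, chi2_fit = levenberg_marquardt(xs, ys, a=1.0, b=1.0)
```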

Both methods assume that the error function is either continuous or can be evaluated for any point within the boundaries of the parameter space. Evaluation of the $\chi^2$ error function in SFIT depends upon a grid of models which is discrete. As mentioned in the previous section, an interpolation method is used to “fill in the gaps” of the grid, thereby creating the illusion of a continuous parameter space. As the Amoeba or Levenberg-Marquardt optimisers examine the properties of the error function throughout the parameter space, they are, more often than not, examining an interpolation of the model spectra.

Once $T_{\mathrm{eff}}$, $\log g$, and $v \sin i$ have been determined, SFIT can estimate the composition of a star by adjusting the abundances of the different atomic species which contribute to the absorption spectrum until the theoretical spectrum matches the observed spectrum. As the number of free parameters in such an analysis is so large (i.e., the abundances of H, C, N, O, Al, and so on, along with the microturbulent velocity, $v_t$), pre-computing multidimensional grids of theoretical spectra is infeasible. SFIT solves this problem by computing synthetic spectra as demanded by the $\chi^2$ minimisation algorithm.

SFIT is currently distributed with STERNE and SPECTRUM, the model atmosphere and spectral synthesis codes used at the Armagh Observatory. As part of this thesis, the source codes for all three programs, and their associated libraries, were ported from a simple build system based on GNU make to a more flexible build system based on the GNU autotools (see Appendix E).

3.2.1 Limitations of SFIT

Analyses performed with SFIT are hindered by the program’s restrictions on the size of the model grid. Grids are limited to three dimensions ($T_{\mathrm{eff}}$, $\log g$, and $n_{\mathrm{atm}}$) and, at maximum, nine points in $T_{\mathrm{eff}}$, five in $\log g$, and five in $n_{\mathrm{atm}}$. Models are permitted to have no more than five thousand wavelength points.
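For scale, the largest grid these limits allow can be sized with a quick calculation. The figure of 8 bytes per flux value assumes double precision, which the text does not state, and the wavelength arrays are ignored:

$$
9 \times 5 \times 5 = 225\ \text{models}, \qquad
225 \times 5000 = 1.125 \times 10^{6}\ \text{flux values}, \qquad
1.125 \times 10^{6} \times 8\ \text{bytes} = 9 \times 10^{6}\ \text{bytes} \approx 9\ \text{MB}.
$$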

These limits are due to design decisions made during SFIT’s initial construction, which chose to store the model grid entirely in the computer’s main memory. The restrictions on dimensionality and number of grid points are merely hard-coded numbers within the program, and therefore cannot be changed without recompiling the source code.

Storing the model grid in main memory, whilst providing fast access to the models, also presents a problem: main memory is finite, and typically orders of magnitude smaller than the space available on secondary storage devices such as hard disks. Despite ever-increasing main memory capacities, the implied upper limit on the number of models and their detail will always be much smaller than if secondary storage were used.

Another restriction SFIT places on the model grid is that it must be rectangular and complete, with no missing grid points. This is problematic because it may be difficult or impossible for model atmosphere simulations to converge for a given set of physical parameters. In such an instance, a makeshift solution is employed wherein a converged model close to the desired physical parameters is used to “plug the gap”.

The rectangularity and completeness requirements are a result of SFIT’s interpolation scheme, which generates approximations of models in the parameter space between discrete grid points by linear interpolation in tables. An irregular grid, or a missing grid point, prevents the interpolation scheme from operating correctly.
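To see why a missing model breaks this scheme, consider a minimal trilinear interpolation sketch on a rectangular $(T_{\mathrm{eff}}, \log g, n_{\mathrm{atm}})$ grid. It assumes that SFIT's “linear interpolation in tables” behaves like standard trilinear interpolation. For brevity it interpolates a scalar, where SFIT would interpolate a flux vector, and the names are hypothetical. All eight corners of the enclosing cell are required, so a gap in the table raises `KeyError`.

```python
from bisect import bisect_right


def bracket(axis, v):
    """Return (i, t) with axis[i] <= v <= axis[i + 1] and fractional position t."""
    if not axis[0] <= v <= axis[-1]:
        raise ValueError("point lies outside the grid")
    i = min(bisect_right(axis, v) - 1, len(axis) - 2)
    return i, (v - axis[i]) / (axis[i + 1] - axis[i])


def trilinear(axes, table, point):
    """Interpolate table[(i, j, k)] at point = (teff, logg, n_atm)."""
    (i, tx), (j, ty), (k, tz) = (bracket(a, v) for a, v in zip(axes, point))
    total = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((tx if di else 1.0 - tx)
                          * (ty if dj else 1.0 - ty)
                          * (tz if dk else 1.0 - tz))
                # Every one of the 8 corners must exist: no gaps allowed.
                total += weight * table[(i + di, j + dj, k + dk)]
    return total


# A complete 3 x 2 x 2 grid holding a linear test function.
axes = ([200.0, 220.0, 240.0], [3.0, 4.0], [0.0, 1.0])
table = {(i, j, k): t + 10.0 * g + 100.0 * n
         for i, t in enumerate(axes[0])
         for j, g in enumerate(axes[1])
         for k, n in enumerate(axes[2])}
value = trilinear(axes, table, (210.0, 3.5, 0.25))
```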

3.2.2 Proposal to Remove SFIT’s Limitations

In summary, the limitations of SFIT’s treatment of model grids are:

1. Size limitations due to initial program design decisions and storage of grids in main memory.
2. The interpolation scheme cannot handle irregular or incomplete grids.

Modifying SFIT to be more useful in a data mining context requires removing these two limitations.

The solution to the first limitation is obvious: correct the limiting initial design decisions, and store the model grids on secondary storage (i.e., hard disk), reading them into main memory on an individual basis only when needed. An indexing scheme is then required, one that can be held in main memory in place of the models and quickly searched to determine which models are to be read in and their location on disk.
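A minimal sketch of this proposal follows. Fluxes are written as fixed-length binary records, loosely mimicking a Fortran direct-access file, while only a small index mapping model parameters to record numbers stays in memory. The class name, the file layout (little-endian 64-bit floats), and the dictionary index are illustrative assumptions, not SFIT's actual format.

```python
import struct


class DiskGrid:
    """Model fluxes kept on disk as fixed-length records; only an index is in memory."""

    def __init__(self, path, n_wave):
        self.path = path
        self.fmt = "<%dd" % n_wave                 # one record = n_wave float64 values
        self.record_size = struct.calcsize(self.fmt)
        self.index = {}                            # (teff, logg, n_atm) -> record number
        self.n_records = 0
        open(path, "wb").close()                   # start with an empty grid file

    def add(self, params, fluxes):
        """Append one model's fluxes and remember which record holds them."""
        with open(self.path, "ab") as fh:
            fh.write(struct.pack(self.fmt, *fluxes))
        self.index[tuple(params)] = self.n_records
        self.n_records += 1

    def read(self, params):
        """Find the record number in memory, then read only that record from disk."""
        record = self.index[tuple(params)]
        with open(self.path, "rb") as fh:
            fh.seek(record * self.record_size)
            return list(struct.unpack(self.fmt, fh.read(self.record_size)))
```

Only `index` grows in memory as models are added; the flux arrays stay on disk until a particular model is requested.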

The nature of this index is dependent on the interpolation scheme chosen to correct the second SFIT limitation. Interpolation allows a complicated function to be approximated at an unknown point by using known surrounding points to construct a simpler, estimating function. Different interpolation schemes use the known surrounding points in different ways, so the design and function of the proposed model grid indexing scheme must be tailored accordingly.

Many interpolation schemes exist in the literature. The ideal scheme for this application should be multidimensional (although this requirement can be relaxed, since the curse of dimensionality keeps practical grids to few dimensions), have low computational cost, and be able to operate over potentially randomly sampled functions. The interpolating function must also be continuous, and be based on known data points local to the interpolation point (as opposed to global methods, in which the interpolated value is influenced by all of the available data).

Two interpolation functions which stand out in terms of their simplicity, multidimensionality, and ability to handle incomplete grids are weighted average interpolation and simplex interpolation.

Weighted Average Interpolation

The most common weighted average method referred to in the literature is that of Shepard (1968) and its modifications, such as Renka (1988).

Given an underlying function, $f$, with values $f_i$ at nodes $(x_i, y_i)$ for $i = 1, \ldots, N$, the interpolating formula is of the form

$$
F(x, y) = \frac{\sum_{k=1}^{N} W_k(x, y)\, f_k}{\sum_{i=1}^{N} W_i(x, y)}. \tag{3.2}
$$

The weighting function, $W_k$, is defined by some inverse distance function,

$$
W_k(x, y) = \frac{1}{d_k^2}, \tag{3.3}
$$

where $d_k(x, y)$ denotes the Euclidean distance between $(x, y)$ and $(x_k, y_k)$.
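Equations 3.2 and 3.3 translate directly into code. The sketch below is a plain inverse-distance weighted average, not Renka's modified method. Returning the node value when the interpolation point coincides with a node (where $W_k$ is infinite) is an implementation choice assumed here, and the function name is hypothetical.

```python
import math


def shepard(nodes, values, x, y, power=2):
    """F(x, y) = sum_k W_k f_k / sum_i W_i with W_k = 1 / d_k**power (eqs. 3.2-3.3)."""
    num = den = 0.0
    for (xk, yk), fk in zip(nodes, values):
        d = math.hypot(x - xk, y - yk)
        if d == 0.0:
            return fk                  # exactly on a node: W_k would be infinite
        w = 1.0 / d ** power
        num += w * fk
        den += w
    return num / den


nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 1.0, 1.0, 2.0]
centre = shepard(nodes, values, 0.5, 0.5)
```

Because the weights are positive and normalised, $F$ always lies between the smallest and largest node values.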

A suitable indexing scheme for weighted average interpolation should allow fast searching of the node points to determine which are within a specified radius of the interpolation point (i.e., nearest-neighbour searching).

The field of computational geometry contains many algorithms and data structures for indexing and searching a set of $N$-dimensional points in a computationally efficient manner. One data structure that is very applicable to nearest-neighbour searching problems is the k-D tree (Moore, 1991). Figure 3.1 demonstrates the k-D tree in two dimensions.


Figure 3.1: Example of a k-D tree in two dimensions (node points [4,9], [2,5], [8,7], and [3,2]). On the left is the representation of how the k-D tree on the right splits up the x,y plane. (Adapted from Moore 1991.)

This data structure is a binary tree which represents a series of partitions in $k$-dimensional space, organising a set of points into a collection of hyper-rectangular regions. Nearest-neighbour searching can be carried out in $O(\log_2 N)$ time on average, where $N$ is the number of nodes in the tree.
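The following sketch builds a k-D tree by inserting points one at a time, cycling the splitting axis with depth. It answers nearest-neighbour queries by descending the tree and pruning any subtree whose splitting plane lies farther away than the best match found so far. Inserting the points (4,9), (2,5), (8,7), and (3,2), the labels that appear in Figure 3.1, in that order gives root (4,9), children (2,5) and (8,7), and (3,2) below (2,5). The class and function names are hypothetical, and no balancing is attempted.

```python
import math


class Node:
    """One k-D tree node: a point and the axis along which it splits space."""

    def __init__(self, point, axis):
        self.point = point
        self.axis = axis
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert point below root, splitting on axis depth % k."""
    if root is None:
        return Node(point, depth % len(point))
    if point[root.axis] < root.point[root.axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def nearest(node, target, best=None):
    """Return (distance, point) for the point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


root = None
for p in [(4, 9), (2, 5), (8, 7), (3, 2)]:
    root = insert(root, p)
```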

All that remains is to determine how many nearest neighbours are needed, and the weighted average interpolation can be performed immediately.

Simplex Interpolation

A simplex, or $N$-simplex, is the $N$-dimensional analogue of a triangle in 2-D and a tetrahedron in 3-D, as demonstrated in Figure 3.2.

Simplex-based interpolation uses a weighted linear combination of the simplex vertices to approximate a function at a point located on or within the simplex boundary. These weights are computed as the barycentric coordinates of the interpolation point within the simplex.

Given a collection of $N$-dimensional points, such as a grid of model spectra, a suitable indexing scheme must allow the vertices of the enclosing $N$-simplex to be located quickly.

Figure 3.2: (a) 1-simplex, (b) 2-simplex, (c) 3-simplex. A 1-simplex is a line segment. A 2-simplex is a triangle. A 3-simplex is a tetrahedron.

As this is again a nearest-neighbour problem, the method of k-D trees could be a viable solution. However, if the dimensionality of the grid is kept to three dimensions or fewer, the field of computational geometry offers another approach.

Several algorithms exist which can take a cloud of two- or three-dimensional points and generate a triangular or tetrahedral mesh. All that is then needed is a method to search the mesh for the triangle or tetrahedron that contains the interpolation point.

Choosing the Solution

Preliminary testing of both interpolation and indexing schemes was carried out to help determine which solution would be more viable.

Constructing a suitable prototype of the weighted average/k-D tree solution was hindered by Fortran 90’s insufficient flexibility to support the implementation of advanced data structures. No suitable third-party libraries were available to speed development and, as a result of time constraints, the pursuit of this solution had to be abandoned.

On the other hand, if it is assumed that SFIT model grids are limited to three dimensions, then several freely available third-party libraries exist which can generate tetrahedral meshes from a cloud of random points. From a purely pragmatic standpoint, this makes the simplex interpolation scheme very attractive. After the mesh has been generated, the methods then required to search for the tetrahedron enclosing an interpolation point are simple geometric operations.

Thus, the simplex interpolation method was chosen to solve SFIT’s grid management problems. The weighted-average/k-D tree solution is an interesting idea (which, unlike the simplex scheme, is not limited to three dimensions), and should be pursued in future work.

3.3 Tetrahedralisation: Interpolation and Indexing

In developing the simplex interpolation and corresponding grid indexing scheme, it was assumed that SFIT grids will always be three-dimensional due to the curse of dimensionality.

From this assumption, the tetrahedral mesh indexing scheme described previously can be constructed using third-party libraries. This affords a very pragmatic solution to the problem.

3.3.1 Simplex Interpolation

Barycentric coordinates express the location of any point within an $N$-simplex in terms of a set of homogeneous coordinates that form a linear combination of the simplex vertices. Given a tetrahedron defined by four arbitrary vertices, $v_1$, $v_2$, $v_3$, and $v_4$, and some point $p$ within this tetrahedron, $p$ can be expressed as the weighted combination of the four vertices

$$
p = \lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3 + \lambda_4 v_4, \tag{3.4}
$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the barycentric coordinates. These are subject to the constraints that

$$
0 \le \lambda_1, \lambda_2, \lambda_3, \lambda_4 \le 1, \tag{3.5}
$$

and

$$
\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1. \tag{3.6}
$$

Calculating the barycentric coordinates of a point inside a given tetrahedron is accomplished by reformulating equation 3.4 as follows,

$$
\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix}
=
\begin{bmatrix}
v_{1x} & v_{2x} & v_{3x} & v_{4x} \\
v_{1y} & v_{2y} & v_{3y} & v_{4y} \\
v_{1z} & v_{2z} & v_{3z} & v_{4z} \\
1 & 1 & 1 & 1
\end{bmatrix}
\cdot
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \end{bmatrix},
\tag{3.7}
$$

or, rewriting in matrix notation,

$$
b = A \cdot x, \tag{3.8}
$$

where $b = [\,p_x \;\; p_y \;\; p_z \;\; 1\,]^T$, $x = [\,\lambda_1 \;\; \lambda_2 \;\; \lambda_3 \;\; \lambda_4\,]^T$, and

$$
A = \begin{bmatrix}
v_{1x} & v_{2x} & v_{3x} & v_{4x} \\
v_{1y} & v_{2y} & v_{3y} & v_{4y} \\
v_{1z} & v_{2z} & v_{3z} & v_{4z} \\
1 & 1 & 1 & 1
\end{bmatrix}.
$$

The final row of ones in $A$ enforces the constraint of equation 3.6. Therefore, $x$ can be found through the standard methods of solving equation 3.8.

As will be useful later on, if the computed barycentric coordinates do not conform to the constraints discussed earlier, then the point of interest can be determined to lie outside the given tetrahedron. Since any solution of equation 3.8 automatically satisfies equation 3.6, this happens exactly when some $\lambda_i < 0$.
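Equations 3.7 and 3.8 amount to a 4 × 4 linear solve. In the sketch below, plain Gaussian elimination with partial pivoting stands in for the “standard methods”; any linear solver would do. The containment test applies the sign check on the $\lambda_i$, and the interpolated quantity is a scalar for brevity, where SFIT would combine four flux vectors. The function names and the degeneracy threshold are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: degenerate tetrahedron")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def barycentric(p, v):
    """Barycentric coordinates of p in the tetrahedron v[0..3] (equation 3.7)."""
    A = [[v[j][axis] for j in range(4)] for axis in range(3)] + [[1.0] * 4]
    return solve(A, [p[0], p[1], p[2], 1.0])


def contains(lam, eps=1e-12):
    """Equation 3.6 holds by construction, so only negative weights signal 'outside'."""
    return all(l >= -eps for l in lam)


def interpolate(lam, vertex_values):
    """Linear combination of vertex values using the barycentric weights."""
    return sum(l * f for l, f in zip(lam, vertex_values))


tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
lam = barycentric((0.1, 0.2, 0.3), tet)
```

For this unit tetrahedron the weights are simply $(1 - x - y - z,\ x,\ y,\ z)$, which makes the result easy to check by hand.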

3.3.2 Grid Index - Delaunay Triangulation

The Delaunay triangulation (O’Rourke, 1998) is frequently used to generate meshes of $N$-simplices from a set of $N$-dimensional points because it has certain desirable properties, the most important of which is the following: inside the circum-hypersphere of any simplex, there are no other points of the set (see Figure 3.3).

This property yields a triangulation which is “natural” and provably optimal in many respects (in two dimensions, for example, it maximises the minimum angle over all triangulations of the point set). It is known that the Delaunay triangulation exists and is unique for a set of points in general position, that is, for an $N$-dimensional set of points, no $N+1$ points lie on the same hyperplane and no $N+2$ points lie on the same hypersphere.
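In two dimensions, the empty-circumcircle property can be checked with the classic in-circle determinant. The sketch below uses ordinary floating-point arithmetic. Unlike the adaptive exact predicates that TetGen relies on, it can therefore misjudge nearly cocircular points. The brute-force `is_delaunay` check is only meant for tiny illustrative meshes, and all names are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    if orient(a, b, c) < 0:              # the determinant test needs CCW order
        b, c = c, b
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def is_delaunay(points, triangles):
    """Brute force: no point may lie inside any triangle's circumcircle."""
    for t in triangles:
        a, b, c = (points[i] for i in t)
        for k, d in enumerate(points):
            if k not in t and in_circumcircle(a, b, c, d):
                return False
    return True


points = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
long_diagonal = [(0, 1, 2), (0, 3, 1)]     # triangles share the edge (0,0)-(4,0)
short_diagonal = [(0, 3, 2), (3, 1, 2)]    # triangles share the edge (2,-1)-(2,1)
```

For this four-point set only the short-diagonal triangulation passes, which is the “natural” choice the Delaunay property selects.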

In the context of SFIT, the Delaunay tetrahedralisation of a model grid is generated by the third-party library TetGen¹.

TetGen is a portable C++ program implementing the Delaunay triangulation algorithm of Edelsbrunner & Shah (1992). This algorithm is simple and fast, and TetGen’s implementation is numerically robust owing to its use of adaptive exact arithmetic code (Shewchuk, 1996). TetGen can be compiled as a set of library functions which can then be integrated into other applications, in this case SFIT.

A technical difficulty arises in that SFIT is a Fortran 90 program, but TetGen is written in C++. Unfortunately, the Fortran 90 standard does not provide for calling functions written in other programming languages, and it has been left up to the individual compiler implementors to include a solution.

SFIT is currently based around the Intel Fortran compiler for Linux², and it is relatively straightforward to call out to C/C++ functions using the mechanisms provided by this compiler.

¹ http://tetgen.berlios.de
² http://www.intel.com/cd/software/products/asmo-na/eng/compilers/flin/

Figure 3.3: In two dimensions, the Delaunay triangulation guarantees that no other points lie in the circumcircle of any simplex.

To simplify the process of calling the TetGen library from Fortran, a small “glue” function was written in C. This function accepts a flattened array of three-dimensional model grid points, copies the data into the data structure used by TetGen, and calls TetGen to perform the tetrahedralisation. It then returns a flattened array of vertices for the generated tetrahedra, and a flattened array denoting the neighbouring tetrahedra of each generated tetrahedron.

This process of calling TetGen to construct the new model grid indexing scheme fits in with SFIT’s normal grid generation procedure, as outlined in Algorithm 1.

Algorithm 1 Generating a Tetrahedralisation of a Model Grid

    for all models in the grid file list do
        read model parameters from header
        record each parameter in corresponding grid axis array
        write model fluxes to direct access grid file
        append model parameters and corresponding direct access file record numbers to a linked list
    end for
    rescale parameters in linked list to yield more optimal tetrahedra
    flatten the parameters in linked list to an array of 3D points
    pass array of points into TetGen {TetGen returns two arrays: a list of tetrahedra vertices, and a list of tetrahedra neighbours}
    write grid axis arrays to beginning of index file
    write linked list of model data to index file
    write list of tetrahedra vertices to index file
    write list of tetrahedra neighbours to index file

As noted in the pseudo-code, the parameters of the models are rescaled before being passed to TetGen. This allows the generation of a mesh which is composed, more optimally, of “fat” tetrahedra, avoiding degenerate tetrahedra or “slivers” which would cause numerical problems for the simplex interpolation scheme and the point location algorithm outlined in the next section.

Such degenerate tetrahedra would arise because of a scale disparity between the model grid axes. For instance, the $T_{\mathrm{eff}}$ axis contains effective temperatures measured in kelvin and divided by 100 to reduce their magnitude. On the other hand, the $n_{\mathrm{atm}}$ axis typically contains fractional values $0 \le n_{\mathrm{atm}} \le 1$. This disparity means that model grids are very compact in the $n_{\mathrm{atm}}$ dimension, and comparatively widely spaced in the $T_{\mathrm{eff}}$ dimension.

Given the model grid axis arrays accumulated during the model grid creation process (which typically correspond to the dimensions of $T_{\mathrm{eff}}$, $\log g$, and $n_{\mathrm{atm}}$), each axis is rescaled in the following manner.



Let $A_i$ be the $i$th model grid axis, comprising the $m$ monotonically increasing points $\{a_{i1}, a_{i2}, \ldots, a_{im}\}$. $A_i$ is rescaled by the mapping $f : A_i \to R_i$, where for every $a \in A_i$

$$f(a) = \frac{a - a_{i1}}{a_{i2} - a_{i1}} \times 100. \tag{3.9}$$

This simple function translates $A_i$ to the origin and rescales the points onto a more widely spaced grid. If the spacing between consecutive points $a_{ij}$ is constant, the mapping yields the $m$ monotonically increasing points $R_i = \{0, 100, 200, \ldots, (m-1) \times 100\}$.
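Equation 3.9 can be illustrated with a short Python sketch. The function names are hypothetical, and this is not SFIT's code. The example axes are taken from Table 3.1, with $T_{\mathrm{eff}}$ divided by 100 as described above and the $n_{\mathrm{He}}$ values put in increasing order. Because $f$ scales by the first spacing only, an irregularly spaced axis stays irregular after rescaling.

```python
def rescale_axis(axis):
    """Map a monotonically increasing grid axis A_i onto R_i using Eq. (3.9):
    f(a) = (a - a_i1) / (a_i2 - a_i1) * 100."""
    if len(axis) < 2:
        raise ValueError("an axis needs at least two grid points")
    origin, spacing = axis[0], axis[1] - axis[0]
    if spacing <= 0:
        raise ValueError("axis points must be monotonically increasing")
    return [(a - origin) / spacing * 100.0 for a in axis]


def rescale_point(point, axes):
    """Apply the same per-axis mapping to an interpolation point p."""
    return [(x - ax[0]) / (ax[1] - ax[0]) * 100.0 for x, ax in zip(point, axes)]


# Axes from Table 3.1 (T_eff / 100, log g, n_He in increasing order).
teff = [140.0, 160.0, 180.0, 200.0]
logg = [2.00, 2.50, 3.00, 3.50]
n_he = [0.9690, 0.9890, 0.9960]

print(rescale_axis(teff))  # [0.0, 100.0, 200.0, 300.0]
print(rescale_axis(logg))  # [0.0, 100.0, 200.0, 300.0]
print(rescale_axis(n_he))  # roughly [0.0, 100.0, 135.0]: irregular spacing survives
print(rescale_point([170.0, 2.75, 0.9790], [teff, logg, n_he]))
```

After rescaling, the first grid step on every axis spans 100 units. This removes the disparity between the $T_{\mathrm{eff}}$ and $n_{\mathrm{atm}}$ dimensions before the points are passed to TetGen.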

3.3.3 Navigating the Index - Point Location

The algorithm for locating the tetrahedron that encloses a given interpolation point is based on a randomised jump-and-walk methodology, inspired by the work of Mücke et al. (1996).

The basic idea is simple. A “good starting point” is established by randomly sampling the set of tetrahedra. The distance between each sampled tetrahedron’s centroid and the given interpolation point is calculated, and the sampled tetrahedron closest to the interpolation point is selected.

A line segment is then constructed from the chosen tetrahedron’s centroid to the interpolation point. The tetrahedron containing the interpolation point is located by “walking through” the tetrahedra that this line segment intersects. Figure 3.4 illustrates the concept in two dimensions.

More formally, given the tetrahedralisation $D$ of a model grid containing $n$ tetrahedra, and an interpolation point $p$ (rescaled using Equation 3.9), the following procedure locates the tetrahedron of $D$, if any, that contains $p$:

1. Select $m$ tetrahedra $T_1, \ldots, T_m$ at random from $D$, where $m = \lceil 2n^{1/3} \rceil$.

<strong>On</strong> <strong>the</strong> <strong>Automatic</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Stellar</strong> <strong>Spectra</strong>


68 Chapter 3 - <strong>On</strong> <strong>the</strong> <strong>Automatic</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Stellar</strong> <strong>Spectra</strong><br />

2. Determine the index $j \in \{1, \ldots, m\}$ of the tetrahedron minimising the Euclidean distance $d(\mathrm{centroid}(T_j), p)$, and set $T = T_j$.
3. Locate the tetrahedron containing $p$ (if it exists) by traversing all tetrahedra intersected by the line segment $L = (\mathrm{centroid}(T), p)$.

Figure 3.4: The line segment, $L$, is constructed from the centroid of the starting tetrahedron, $T$, and the interpolation point, $p$. The tetrahedra visited on the walk-through are coloured grey.
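Steps 1 and 2 (Algorithm 3) can be sketched in Python as follows. This is a minimal sketch, not SFIT's code. It assumes the mesh is held as a list of four-vertex index tuples into a list of rescaled 3-D points, and the names `centroid` and `find_start` are hypothetical. The `min(n, ...)` guard is an addition: it covers very small meshes, where $m$ would exceed $n$.

```python
import math
import random


def centroid(tet, points):
    """Centroid of a tetrahedron stored as four vertex indices into `points`."""
    return tuple(sum(points[v][k] for v in tet) / 4.0 for k in range(3))


def find_start(tets, points, p, rng=random):
    """Steps 1-2: sample m = ceil(2 n^(1/3)) tetrahedra at random and return
    the index of the one whose centroid is closest to p."""
    n = len(tets)
    m = min(n, math.ceil(2.0 * n ** (1.0 / 3.0)))  # cannot sample more than n
    candidates = rng.sample(range(n), m)
    return min(candidates, key=lambda j: math.dist(centroid(tets[j], points), p))


# Two tetrahedra sharing the face (1, 2, 3); with n = 2, m = 2, so both are sampled.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
tets = [(0, 1, 2, 3), (4, 1, 2, 3)]
print(find_start(tets, points, (0.9, 0.9, 0.9), random.Random(1)))  # 1
```

Because $m$ grows only as $n^{1/3}$, a mesh of a million tetrahedra needs only about 200 centroid distances to choose a starting point.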

Once the first tetrahedron intersected by $L$ beyond the starting tetrahedron $T$ (necessarily a neighbour of $T$) has been determined, step 3 proceeds in constant time per tetrahedron visited. This is because TetGen returns an array that records the neighbours of every tetrahedron in the mesh.

The walk-through mechanism is implemented using the fast ray-triangle intersection algorithm of Möller & Trumbore (1997), which is very straightforward.

A ray $R(t)$ with origin $O$ and normalised direction $D$ is defined as

$$R(t) = O + tD, \tag{3.10}$$

and a triangle is defined by three vertices $V_0$, $V_1$, and $V_2$. A point, $T(u,v)$, on a triangle is given by

$$T(u,v) = (1 - u - v)V_0 + uV_1 + vV_2, \tag{3.11}$$

where $(u,v)$ are the barycentric coordinates, which must fulfil $u, v \ge 0$ and $u + v \le 1$. Computing the intersection between the ray $R(t)$ and the triangle $T(u,v)$ is equivalent to solving $R(t) = T(u,v)$, which yields

$$O + tD = (1 - u - v)V_0 + uV_1 + vV_2. \tag{3.12}$$

Rearranging the terms gives

$$\begin{bmatrix} -D & V_1 - V_0 & V_2 - V_0 \end{bmatrix} \begin{bmatrix} t \\ u \\ v \end{bmatrix} = O - V_0. \tag{3.13}$$

The barycentric coordinates $(u,v)$ and the distance $t$ from the ray origin to the intersection point are found by solving this linear system. If the barycentric coordinates satisfy the conditions stated above (and $t \ge 0$, so that the intersection lies on the ray), the ray intersects the triangle.
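Equation 3.13 is a 3×3 linear system. Möller & Trumbore solve it with Cramer's rule, using scalar triple products so that no matrix is formed explicitly. The Python sketch below shows that standard form, which accepts triangles of either orientation; the function name `ray_triangle` is hypothetical. One assumption differs from the text's normalised $D$: for the walk-through, passing the unnormalised direction $p - \mathrm{centroid}(T)$ makes $0 \le t \le 1$ correspond exactly to the segment $L$.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-12):
    """Solve Eq. (3.13) for (t, u, v) by Cramer's rule.

    Returns (t, u, v) if the line hits the triangle, otherwise None.
    The caller decides which t values are acceptable (t >= 0 for a ray,
    0 <= t <= 1 for a segment with unnormalised direction).
    """
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    pvec = cross(direction, e2)
    det = dot(e1, pvec)            # determinant of [-D, V1 - V0, V2 - V0]
    if abs(det) < eps:
        return None                # direction (nearly) parallel to the triangle
    inv_det = 1.0 / det
    tvec = sub(origin, v0)         # right-hand side O - V0
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, e1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, qvec) * inv_det
    return t, u, v


# Segment from (0.2, 0.2, -1) to (0.2, 0.2, 1) against the unit right triangle in z = 0.
print(ray_triangle((0.2, 0.2, -1.0), (0.0, 0.0, 2.0), (0, 0, 0), (1, 0, 0), (0, 1, 0)))
# t = 0.5, u = 0.2, v = 0.2: the hit is at the midpoint of the segment
```

The early returns after computing $u$ and $v$ reject most misses before $t$ is computed. This is what makes the test cheap enough to apply to every face the walk examines.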

Starting from the initial tetrahedron of the walk-through, each triangular face of the current tetrahedron is tested with this algorithm to determine whether it is intersected by the line segment $L$. If an intersecting face is found, the walk-through moves to the tetrahedron on the opposite side of that face (in constant time).

This new tetrahedron is first tested to see whether it contains the point $p$, using the simplex interpolation method discussed in Section 3.3.1. If the tetrahedron does not contain $p$, the ray-triangle intersection test is performed again. The walk-through then moves to the neighbouring tetrahedron on the other side of the face intersected by the segment. If the tetrahedron does contain $p$, the walk-through terminates successfully and returns the interpolation weights (i.e., the barycentric coordinates) obtained from the point-in-simplex test.

The point $p$ may lie outside the convex hull of the tetrahedralisation. The walk-through algorithm recognises this case when the line segment $L$ intersects a face that belongs to the convex hull. Such a face has no neighbour listed in the array returned by TetGen.

Rather than letting the walk-through algorithm spend time traversing the tetrahedralisation only to discover that $p$ lies outside the convex hull, it is possible to test for this case immediately after forming the line segment $L$.

In addition to generating the Delaunay tetrahedralisation of a model grid, TetGen can also return a list of the tetrahedron faces that make up the convex hull. After forming $L$, each of these faces could be tested for intersection with $L$. Because the hull is convex and $L$ starts inside it (at the centroid of a tetrahedron), $L$ must cross one of these faces if $p$ lies outside the hull.

In practice, it makes little difference which method is used. If $p$ lies outside the convex hull, the simplex interpolation method dictates that SFIT cannot proceed with the fitting run and must stop.

In summary, pseudo-code for the algorithms outlined in this section is given in Algorithms 2, 3, and 4.



Algorithm 2 Locating a Point in a Tetrahedralisation
rescale point, p, onto axes of rescaled model grid
if no starting tetrahedron exists then
    find close starting tetrahedron by random selection
end if
walk through tetrahedralisation
if enclosing tetrahedron found then
    return barycentric coordinates of point p within the tetrahedron
else
    point lies outside the convex hull of the tetrahedralisation
    exit SFIT
end if

Algorithm 3 Finding Walk-Through Starting Point
select at random $m = \lceil 2n^{1/3} \rceil$ tetrahedra from the tetrahedralisation, where n is the total number of tetrahedra in the tetrahedralisation
compute the Euclidean distance from each selected tetrahedron’s centroid to the interpolation point
return the index of the closest tetrahedron

Algorithm 4 Walk-Through of Tetrahedralisation
construct the line segment, L, from the given starting tetrahedron’s centroid to the interpolation point, p
current tetrahedron = given starting tetrahedron
loop
    if current tetrahedron contains the interpolation point then
        return the barycentric coordinates of its location
    else
        test each triangular face of the current tetrahedron for intersection with L
        current tetrahedron = neighbouring tetrahedron on other side of intersected face
        if current tetrahedron is null then
            interpolation point lies outside convex hull
            exit SFIT
        end if
    end if
end loop
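Algorithm 4 can be sketched in self-contained Python as below. This is an illustrative sketch, not the SFIT implementation, and all helper names are hypothetical. The point-in-tetrahedron test uses barycentric coordinates computed with Cramer's rule. The neighbour array is assumed to follow a TetGen-like convention: entry k of a tetrahedron's neighbour list is the tetrahedron sharing the face opposite its k-th vertex, and -1 marks a convex-hull face. Check this convention against the TetGen version in use. The sketch differs from Algorithm 4 in three ways:

- It returns `None` instead of exiting SFIT.
- It skips the face through which the walk entered.
- It does not handle the degenerate case in which L passes exactly through an edge or vertex.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def barycentric(p, a, b, c, d):
    """Weights (w0, w1, w2, w3) with p = w0*a + w1*b + w2*c + w3*d and sum 1."""
    e1, e2, e3, r = sub(b, a), sub(c, a), sub(d, a), sub(p, a)
    det = dot(e1, cross(e2, e3))  # zero for a degenerate tetrahedron ("sliver")
    w1 = dot(r, cross(e2, e3)) / det
    w2 = dot(e1, cross(r, e3)) / det
    w3 = dot(e1, cross(e2, r)) / det
    return (1.0 - w1 - w2 - w3, w1, w2, w3)


def segment_hits_face(o, d, v0, v1, v2, eps=1e-12):
    """Moller-Trumbore test with the unnormalised direction d = p - o, so
    that points of the segment L correspond to 0 <= t <= 1."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(d, e2)
    det = dot(e1, pvec)
    if abs(det) < eps:
        return False  # segment parallel to the face
    tvec = sub(o, v0)
    qvec = cross(tvec, e1)
    u = dot(tvec, pvec) / det
    v = dot(d, qvec) / det
    t = dot(e2, qvec) / det
    return u >= 0.0 and v >= 0.0 and u + v <= 1.0 and 0.0 <= t <= 1.0


def walk(tets, neighbours, points, start, p, tol=1e-12):
    """Walk from tetrahedron `start` towards p along L = (centroid(start), p).
    Returns (index, weights) for the enclosing tetrahedron, or None."""
    c = tuple(sum(x) / 4.0 for x in zip(*(points[v] for v in tets[start])))
    d = sub(p, c)
    current, previous = start, None
    for _ in range(len(tets)):  # a straight walk enters each tetrahedron at most once
        verts = [points[v] for v in tets[current]]
        w = barycentric(p, *verts)
        if min(w) >= -tol:
            return current, w  # p is inside: the weights are the interpolation weights
        step = None
        for k in range(4):  # face k is the face opposite local vertex k
            if neighbours[current][k] == previous:
                continue  # skip the face the walk entered through
            face = [verts[j] for j in range(4) if j != k]
            if segment_hits_face(c, d, *face):
                step = neighbours[current][k]
                break
        if step is None or step == -1:
            return None  # hull face crossed (or degenerate case): p treated as outside
        previous, current = current, step
    return None


points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
tets = [(0, 1, 2, 3), (4, 1, 2, 3)]  # two tetrahedra sharing face (1, 2, 3)
neighbours = [(1, -1, -1, -1), (0, -1, -1, -1)]
print(walk(tets, neighbours, points, 0, (0.5, 0.5, 0.5)))  # tetrahedron 1, all weights 0.25
print(walk(tets, neighbours, points, 0, (2.0, 0.1, 0.3)))  # None: outside the hull
```

The returned weights are the barycentric coordinates of $p$. As the text notes, these are the interpolation weights applied to the four models at the vertices of the enclosing tetrahedron.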


3.4 Testing the Modifications

The new simplex interpolation and indexing scheme was tested against SFIT's previous grid storage and interpolation in tables method using two case studies. The first analyses a spectrum of the extreme helium star BD+10 2179 (Klemola, 1961), so that each of the optimisation routines offered by SFIT can be compared under the two interpolation schemes. The second uses a coarse grid of theoretical models to parameterise a large number of other models. With the optimisation method held constant, this gives an indication of the accuracy of the two interpolation schemes.

Case Study 1: BD+10 2179

The observed high-resolution echelle spectrum of BD+10 2179 used in this study covers the wavelength range 3760–5230 Å at a dispersion of 0.1 Å pixel$^{-1}$. The spectrum had already been wavelength calibrated and normalised. Both versions of SFIT fit a window of this spectrum, 4054–4545 Å, to a grid of 48 theoretical models covering the parameter space described in Table 3.1.

Parameter                 Values
$T_{\mathrm{eff}}$ (K)    14,000, 16,000, 18,000, 20,000
$\log g$                  2.00, 2.50, 3.00, 3.50
$n_{\mathrm{He}}$         0.9960, 0.9890, 0.9690

Table 3.1: Details of the model grid used in the comparison.

The grid also has a latent fourth parameter, the carbon abundance.

For each analysis, the same initial guesses were used for the Amoeba and Levenberg-Marquardt optimisation methods. These were chosen to be close to the expected final parameter values. They are listed in Table 3.2, together with the step sizes given to the Amoeba routine.


3.4 Testing <strong>the</strong> Modifications 73<br />

Parameter                  Initial Value    Amoeba Step Size
$T_{\mathrm{eff}}$ (kK)    17.0             2.0
$\log g$ (dex)             2.5              0.5
$n_{\mathrm{He}}$          0.989            0.01
$v \sin i$                 27.5             10.0
$v_{\mathrm{rad}}$         137.4            10.0

Table 3.2: Initial parameters used for the Amoeba and Levenberg-Marquardt optimisation routines. The step sizes used for Amoeba are also given.

An analysis begins by fixing the helium abundance and solving for $T_{\mathrm{eff}}$, $\log g$, $v \sin i$, and $v_{\mathrm{rad}}$. These parameters are then fixed, and a solution is found for $n_{\mathrm{He}}$. Through the latent $n_{\mathrm{C}}$ parameter, this solution is effectively a first approximation for $n_{\mathrm{C}}$. Finally, $n_{\mathrm{He}}$ is fixed again, and the solutions for $T_{\mathrm{eff}}$ and $\log g$ are checked.

SFIT offers three optimisation methods: Nelder-Mead simplex (Amoeba), Levenberg-Marquardt (LM), and nearest-neighbour (NN) fitting. The results obtained with each are presented in Table 3.3 for the original SFIT and in Table 3.4 for the modified SFIT. The standard errors generated by SFIT are listed in parentheses for each parameter.

Unmodified SFIT
                          Amoeba             LM                 NN
$T_{\mathrm{eff}}$ (kK)   18.000 (±0.014)    18.087 (±0.016)    18.00 (±0.011)
$\log g$                  2.743 (±0.004)     2.747 (±0.004)     2.50 (±0.003)
$n_{\mathrm{He}}$         0.997 (±0.001)     0.994 (±0.001)     0.996 (±0.001)
$v \sin i$                33.44 (±0.13)      36.9 (±0.15)       27.50 (±0.11)
$v_{\mathrm{rad}}$        136.23 (±0.0)      137.4 (±0.0)       137.4 (±0.0)
$\chi^2$ fit              9.00               9.90               9.96
Time (s)                  54.3               11.42              27.99

Table 3.3: Results of BD+10 2179 analysis with the unmodified version of SFIT.

These results are satisfactory. For the Amoeba and LM methods, the simplex interpolation and grid indexing scheme performs slightly better (in terms of the final $\chi^2$ value) than the original linear interpolation in tables method. The Amoeba method also runs considerably faster with the new simplex-based scheme (20.649 s compared with 54.3 s), although the LM run is slightly slower (14.01 s compared with 11.42 s).

The roughly six-fold increase in speed for nearest-neighbour fitting reported in Table 3.4 is due to a rewrite of some SFIT internals. The rewrite takes advantage of the data structures used in the simplex interpolation scheme, which allow fast iteration over all the models in a grid, reading each from disk as needed. The $\chi^2$ computation is therefore performed directly on each model. By contrast, the linear interpolation in tables scheme interpolates to the grid point instead of accessing the model directly. This is another design flaw in SFIT that the simplex-based scheme corrects. The difference in methodology also accounts for the slightly different $\chi^2$ values for nearest-neighbour fitting listed in Tables 3.3 and 3.4.

Modified SFIT
                          Amoeba             LM                 NN
$T_{\mathrm{eff}}$ (kK)   18.150 (±0.005)    17.870 (±0.015)    18.00 (±0.012)
$\log g$                  2.836 (±0.004)     2.687 (±0.005)     2.50 (±0.003)
$n_{\mathrm{He}}$         0.993 (±0.001)     0.992 (±0.001)     0.996 (±0.001)
$v \sin i$                33.46 (±0.13)      36.90 (±0.14)      27.50 (±0.12)
$v_{\mathrm{rad}}$        136.22 (±0.0)      137.4 (±0.0)       137.4 (±0.0)
$\chi^2$ fit              8.88               9.770              10.20
Time (s)                  20.649             14.01              4.71

Table 3.4: Results of BD+10 2179 analysis with the modified version of SFIT.

Case Study 2: Model-Based Analysis

The grid of theoretical models used in this case study is given in Table 3.5. It coarsely covers almost the entire parameter space of models available in the Armagh Observatory archives.

Parameter                  Values
$T_{\mathrm{eff}}$ (kK)    15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 50.0
$\log g$                   3.00, 4.00, 5.00, 6.00
$n_{\mathrm{He}}$          0.001, 0.1, 0.5, 0.9, 0.999

Table 3.5: The model grid used to obtain physical parameters of the set of test models.

The experiment uses this grid to parameterise a large set of models that fall within its boundaries but are not themselves part of the grid. With the optimisation method held constant, the parameterisation results give an indication of the relative accuracy of the two interpolation schemes.

A total of 1238 models were selected to be parameterised by each version of SFIT. Each model was convolved with a Gaussian of 1 Å FWHM to degrade its resolution slightly. It was then resampled onto a wavelength grid covering 4050–4950 Å.
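The degradation step can be sketched with NumPy as below. This is a sketch under stated assumptions rather than the procedure actually used. It assumes a uniformly sampled input wavelength grid, truncates the kernel at ±4σ, and pads the spectrum ends with their edge values. The 1 Å output step is hypothetical, because the output sampling is not stated. The one fixed relation is that a Gaussian's FWHM equals $2\sqrt{2\ln 2}\,\sigma \approx 2.355\sigma$.

```python
import numpy as np


def degrade_and_resample(wave, flux, new_wave, fwhm=1.0):
    """Convolve `flux` with a Gaussian of the given FWHM (same units as `wave`)
    and linearly interpolate the result onto `new_wave`.

    Assumes `wave` is increasing and uniformly sampled.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM = 2 sqrt(2 ln 2) sigma
    step = np.median(np.diff(wave))
    half = int(np.ceil(4.0 * sigma / step))  # truncate the kernel at +/- 4 sigma
    x = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()  # unit area: continuum level and line strengths preserved
    padded = np.pad(flux, half, mode="edge")  # repeat end values to avoid edge dips
    smoothed = np.convolve(padded, kernel, mode="valid")  # same length as flux
    return np.interp(new_wave, wave, smoothed)


# Synthetic model: flat continuum with one narrow absorption line (illustrative only).
wave = np.arange(4000.0, 5000.0, 0.05)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 4471.5) / 0.1) ** 2)
new_wave = np.arange(4050.0, 4950.5, 1.0)  # hypothetical 1 A output sampling
degraded = degrade_and_resample(wave, flux, new_wave)
```

Normalising the kernel to unit sum keeps the continuum at 1 and conserves each line's equivalent width away from the spectrum ends. The line depth decreases as the profile broadens.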

The optimisation method used was Nelder-Mead simplex, with the following initial parameters and step sizes: $T_{\mathrm{eff}} = 30$ kK, $\delta T_{\mathrm{eff}} = 5.0$ kK; $\log g = 4.5$, $\delta \log g = 1.0$; $n_{\mathrm{He}} = 0.5$, $\delta n_{\mathrm{He}} = 0.1$. Results are presented in Figures 3.5 to 3.7 and in Table 3.6.

Before the results are discussed, some anomalies in the parameterisations obtained with the linear interpolation in tables method must be noted and dealt with. Figure 3.5 plots the parameterisation results for all 1238 models. At $T_{\mathrm{eff}} \sim$ 50,000 K, the optimiser returns implausible values of $\log g$ for some models. The $T_{\mathrm{eff}}$ parameterisations also appear to go wrong at the 50,000 K grid boundary, where some models with $\log g \sim 3.5$ are assigned temperatures much larger than 50,000 K.

The implementation of the linear interpolation in tables method in SFIT takes no steps to confine the optimisation routines to the boundaries of the grid, and in fact allows some extrapolation at the grid edges. However, the cause of the anomalous $T_{\mathrm{eff}}$ and $\log g$ values is unclear. They may arise because the optimisation routine (in this case, Amoeba) extrapolates too far outside the grid space (i.e., the implementation of the interpolation routine is at fault), or because there is a problem with the models.

If Figure 3.5 is replotted with axes closer to the grid boundaries, as in Figure 3.6, the interpolation method appears to perform best below $T_{\mathrm{eff}}$ = 40,000 K. Between 40,000 K and 50,000 K, the parameterisations are more randomly distributed, indicating a greater level of “confusion” in the interpolation routine. A cursory inspection of the models reveals no significant problems, which suggests that the fault lies in the implementation. However, Figure 3.7 shows a similar “confusion” from the simplex-based method.


[Figure 3.5: two panels plotting the recovered $\log g$ (axis range −10 to 80) and $\log(n_{\mathrm{He}}/n_{\mathrm{H}})$ (−5 to 3) against $T_{\mathrm{eff}}$ (10,000 to 80,000 K).]
Figure 3.5: Parameterisation results from the linear interpolation in tables method. Clearly visible are anomalous results arising from a suspected defect in the method’s implementation.



[Figure 3.6: two panels plotting the recovered $\log g$ (axis range 2 to 7) and $\log(n_{\mathrm{He}}/n_{\mathrm{H}})$ (−4 to 2) against $T_{\mathrm{eff}}$ (10,000 to 50,000 K).]
Figure 3.6: Parameterisation results from the linear interpolation in tables method. Axes have been restricted to give a view of the grid boundaries described in Table 3.5.


[Figure 3.7: two panels plotting the recovered $\log g$ (axis range 2 to 7) and $\log(n_{\mathrm{He}}/n_{\mathrm{H}})$ (−4 to 2) against $T_{\mathrm{eff}}$ (10,000 to 50,000 K).]
Figure 3.7: Parameterisation results from the simplex-based interpolation scheme. In contrast with Figures 3.5 and 3.6, the simplex-based scheme clearly restricts the optimisers to the grid boundaries.



At $T_{\mathrm{eff}} \ge$ 40,000 K, the helium-rich models are most likely confusing the optimiser because He II lines appear at wavelengths close to those of the neutral hydrogen lines. This problem requires further investigation. As a workaround, Table 3.6 also compares the parameterisation results for those models with $T_{\mathrm{eff}} \le$ 40,000 K and $\log g \le 6.0$. These RMS metrics give a better indication of the relative performance of the two methods.

$\sigma_{\mathrm{rms}}$   All models                                          $T_{\mathrm{eff}} \le 40$ kK, $\log g \le 6.0$
                   $T_{\mathrm{eff}}$ (K)   $\log g$   $n_{\mathrm{He}}$      $T_{\mathrm{eff}}$ (K)   $\log g$   $n_{\mathrm{He}}$
Simplex/Models     3592.74                  0.329      0.102                  2666.79                  0.355      0.068
Linear/Models      4695.11                  3.362      0.149                  1905.47                  0.349      0.056
Linear/Simplex     3455.88                  3.376      0.150                  1928.02                  0.306      0.065

Table 3.6: RMS comparison of the parameterisation results from each interpolation method with the original parameters of each model. The RMS difference between the two methods is also given. The right-hand columns restrict the comparison to the region of parameter space in which both schemes seem to give their best results (see Figures 3.6 and 3.7).
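Table 3.6 does not spell out its formula. The sketch below therefore assumes the conventional definition $\sigma_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_k (a_k - b_k)^2}$ over the $N$ paired values, whether model against recovered or one scheme against the other. It also assumes that the restriction to $T_{\mathrm{eff}} \le$ 40 kK and $\log g \le 6.0$ applies to the original model parameters. The records and field names are hypothetical.

```python
import math


def rms_difference(a, b):
    """sigma_rms = sqrt(mean((a_k - b_k)^2)) over paired parameter values."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def restricted(rows, teff_max=40000.0, logg_max=6.0):
    """Keep rows whose original model T_eff and log g lie in the restricted region."""
    return [r for r in rows if r["teff"] <= teff_max and r["logg"] <= logg_max]


# Hypothetical records: original model T_eff and the value recovered by one scheme.
rows = [
    {"teff": 30000.0, "logg": 4.5, "teff_fit": 31000.0},
    {"teff": 45000.0, "logg": 5.0, "teff_fit": 52000.0},
    {"teff": 20000.0, "logg": 3.5, "teff_fit": 19000.0},
]
inside = restricted(rows)
print(rms_difference([r["teff"] for r in inside], [r["teff_fit"] for r in inside]))  # 1000.0
```

Because squared differences are averaged, a few badly wrong fits, such as the $\log g$ outliers in Figure 3.5, can dominate $\sigma_{\mathrm{rms}}$. This is why the restricted columns are more informative.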

For models with $T_{\mathrm{eff}} \le$ 40 kK and $\log g \le 6.0$, the linear interpolation in tables scheme yields slightly more accurate results than the simplex-based method for all three parameters. This is most likely due to the coarse grid spacing used in the experiment; a finer-grained grid should allow the simplex interpolation method to achieve greater accuracy. Using a finer-grained grid with SFIT is now possible. The simplex-based grid management scheme removes all the grid size, shape, and completeness restrictions imposed by the old linear interpolation in tables method.

The speed difference between the two methods should also be emphasised. To parameterise all the models, SFIT took approximately 10 minutes with the simplex-based scheme, but over 90 minutes with the old methodology. This more than nine-fold gain in speed, along with the other advantages offered by the new simplex-based scheme, outweighs the possible slight loss of accuracy indicated in Table 3.6.


3.5 Summary

The $\chi^2$ fitting code, SFIT, has been modified and extended to handle arbitrarily large grids of theoretical model spectra. This paves the way to making SFIT more amenable to parameterising very large quantities of stellar spectra in an astronomical data mining application.

Two major problems were identified with the way SFIT manages grids of models. Grid size was restricted by hard-coded limits written into the program. In addition, the interpolation scheme used to approximate the space between grid points could not handle irregular or incomplete grids.

These problems were solved by developing a new grid management and interpolation scheme based on simplex interpolation and Delaunay triangulation.

This new scheme was tested against the old version of SFIT by parameterising a well-studied spectrum and a large number of theoretical models. The new version of SFIT was found to be much faster than the old version. It gave a more accurate fit for the individual spectrum, and slightly (but not significantly) worse results in the parameterisation of the models. This slight loss of accuracy is outweighed by two gains: the increase in overall speed, and the removal of several severe restrictions on the size, shape, and completeness of SFIT model grids.


Chapter 4

Filtering - Principal Components Analysis

Modern astronomical data sets often contain observations of many different types of objects, and are rarely typologically homogeneous (Chapter 1). Searching for particular types of objects in such large databases requires computer assistance. Query parameters can be used to narrow down the data set to objects of a particular colour range, redshift, morphology, or some other significant combination of parameters. However, this reduced data set will invariably still contain objects that the astronomer would like to discard. Manual inspection of the data is time-efficient only when the quantities are small. It is more expedient to have an automated, or semi-automated, tool to assist in filtering through the data.

Filtering is essentially a coarse-grained classification problem. An unknown spectrum is compared with a collection of known, or template, spectra to determine whether it belongs to that particular class of object. The well-known techniques of cross correlation (Tonry & Davis, 1979) and $\chi^2$ minimisation (Chapter 3) are immediately applicable. However, speed is important in a data mining context, and these techniques are slow.


One way to construct a fast filtering method is to extract the defining features from a set of known spectra and use them to summarise and represent that set. An unknown spectrum can then be compared with this summarised form instead of with each template spectrum, which is computationally cheaper.

Principal Components Analysis (PCA; Murtagh & Heck, 1987) can be used to construct such a summary. It is a multivariate statistical technique that seeks to summarise the variance of an $N$-dimensional data set in a handful of mutually uncorrelated parameters. These parameters capture the main sources of linear variation in the data set, and can be used to construct a fast test of whether an unknown spectrum is similar to a collection of known spectra.

Another advantage of using PCA as a filter is that the parameters produced are unique to each data set. A PCA-based filter is therefore generalisable: the same procedure can be used to construct a filter for any type of astronomical object.

As a testament to its versatility, PCA has been applied on several occasions to the classification of astronomical spectra. Deeming (1964) applied it to the classification of G- and K-type giants, Connolly et al. (1995) used PCA to classify galaxy spectra, and Francis et al. (1992) applied it to the classification of quasar spectra. Whereas these studies used PCA in an unsupervised manner, it is used here in a supervised fashion: the components are derived from a set of spectra already known to belong to the class of interest.

A Filter for Hot Subdwarfs

Chapter 1 outlines a general data mining toolkit for astronomical spectra, with a specific application to hot subdwarfs. Accordingly, the PCA-based filter outlined here will be applied to the data set obtained from Drilling et al. (2006) to construct a filter for hot subdwarfs.

This filter will then be applied to a set of real-world, low-dispersion spectra obtained from the Sloan Digital Sky Survey (SDSS) in an attempt to data mine a collection of hot subdwarf candidates for further study.

Figure 4.1: Principal component analysis of a two-dimensional data set (axes X and Y). $\mathbf{u}_1$ is the first principal component: the axis onto which the projected positions of the data have their maximum sum of squares (i.e., maximum variance). $\mathbf{u}_2$ is the second principal component, and $\mathbf{u}_1 \cdot \mathbf{u}_2 = 0$.

4.1 Constructing A PCA-Based Filter<br />

Principal components analysis transforms an $N$-dimensional data set onto a new set of optimally defined axes. These axes represent the directions of maximum variance in the data set, and are called the principal components (PCs). The technique amounts to a rotation from the original axes to the new ones, and is therefore a linear transformation of the data.

Figure 4.1 illustrates the concept with a two-dimensional data set. The direction of maximum variance in the data is represented by $\mathbf{u}_1$. This new axis (the first PC of the data set) describes the data better than either of the original axes, $x_1$ or $x_2$. The variance that remains once the data have been projected onto the first PC is described by $\mathbf{u}_2$, the second PC. Thus, $\mathbf{u}_1$ and $\mathbf{u}_2$ form a better-aligned directional basis for this particular data set.
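The two-dimensional case of Figure 4.1 can be worked through directly. The following Python sketch uses made-up points. It centres the data, forms the 2 × 2 matrix of sums of squares and cross products, and applies the closed-form eigen-solution available for a symmetric 2 × 2 matrix [[a, b], [b, c]]. The first PC then lies at angle θ = ½ atan2(2b, a − c) to the first original axis. The function name is illustrative only.

```python
import math


def principal_axes_2d(points):
    """Principal axes of 2-D points: returns (u1, u2, lam1, lam2).

    The points are centred, the 2x2 matrix S = X^T X is formed, and the
    closed-form eigen-solution for a symmetric 2x2 matrix [[a, b], [b, c]]
    is used: the first PC lies at angle 0.5 * atan2(2b, a - c)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    dx = [x - mean_x for x, _ in points]
    dy = [y - mean_y for _, y in points]
    a = sum(v * v for v in dx)                  # S[0][0]
    c = sum(v * v for v in dy)                  # S[1][1]
    b = sum(u * v for u, v in zip(dx, dy))      # S[0][1] == S[1][0]
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    u1 = (math.cos(theta), math.sin(theta))     # direction of maximum variance
    u2 = (-math.sin(theta), math.cos(theta))    # orthogonal second PC
    half_trace = 0.5 * (a + c)
    root = math.hypot(0.5 * (a - c), b)
    # Eigenvalues = sums of squared projections onto u1 and u2.
    return u1, u2, half_trace + root, half_trace - root


# Made-up points scattered about the line y = x (cf. Figure 4.1).
points = [(-2, -2), (-1, -1), (1, 1), (2, 2), (0.5, -0.5), (-0.5, 0.5)]
u1, u2, lam1, lam2 = principal_axes_2d(points)
```

For these six points the first PC points along (1, 1)/√2 and carries a sum of squared projections of 20. The orthogonal second PC carries only 1.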


The PCs are derived in decreasing order of importance, with the first PC describing most of the variance in the data and subsequent PCs describing progressively less. For a large $N$-dimensional data set, a successful derivation of the principal components means that the first few components can be used to give a compressed representation of the data without a significant loss of information.

Lesser principal components will typically contain information on features in the data that are not well correlated across the data set, such as noise or anomalies. By discarding these components, a compressed representation will preferentially remove such undesired features, together with features that do not vary over a sufficient fraction of the data set.

4.1.1 Mathematics of PCA

This presentation of PCA theory follows that of Bailer-Jones (1996) and Murtagh & Heck (1987).

Let the vector $\mathbf{x} = (x_1, x_2, x_3, \ldots, x_N)$ be a stellar spectrum with $N$ flux bins. A spectrum can then be considered a point in $N$-dimensional space, with each axis representing one flux bin. $M$ such spectra can be collected as the rows of the $(M \times N)$ matrix $X$, i.e., $X^T = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M)$.

The first principal component is the normalised vector, $\mathbf{u}$, which best fits the points in $X^T$. The criterion of goodness of fit of this axis to the point set is the sum of squared deviations of the points from the axis. For each point, the squared distance from the axis plus the squared projection onto it equals the (fixed) squared length of the point vector. Minimising the sum of squared distances between the points and the axis is therefore equivalent to maximising the sum of squared projections onto the axis, i.e., maximising the variance of the points when projected onto this axis.

The sum of squared projections of the points in $X^T$ onto the new axis, $\mathbf{u}$, is

$$(X\mathbf{u})^T (X\mathbf{u}). \tag{4.1}$$



In maximising this quadratic form, the constraint $\mathbf{u}^T\mathbf{u} = 1$ must be imposed, otherwise the projection can be made arbitrarily large. Setting $S = X^T X$ and introducing the Lagrange multiplier, $\lambda$, the maximum is obtained by differentiating

$$\mathbf{u}^T S \mathbf{u} - \lambda(\mathbf{u}^T\mathbf{u} - 1) \tag{4.2}$$

with respect to $\mathbf{u}$, which gives

$$2S\mathbf{u} - 2\lambda\mathbf{u}. \tag{4.3}$$

Setting this equal to zero, the optimal value of $\mathbf{u}$ is the solution of

$$S\mathbf{u} = \lambda\mathbf{u}. \tag{4.4}$$

This is a standard eigenvector problem. The eigenvector of $S$ with the largest eigenvalue is the line of best fit. Its eigenvalue, $\lambda$, indicates the amount of variance described by this line, since $\mathbf{u}^T S \mathbf{u} = \lambda\mathbf{u}^T\mathbf{u} = \lambda$.
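Equation 4.4 can be illustrated numerically with the Python sketch below. It uses power iteration, a simple technique that is not the method used in this work (the SVD routine described in Section 4.1.2 was used here). Repeated multiplication by $S$ amplifies the component of a trial vector along the eigenvector with the largest eigenvalue, and renormalising enforces $\mathbf{u}^T\mathbf{u} = 1$. The function names, iteration limit and tolerance are illustrative assumptions, and $S$ is assumed to be symmetric positive semi-definite.

```python
import math


def mat_vec(S, v):
    """Product of a square matrix S (list of rows) with a vector v."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def leading_eigenpair(S, iterations=1000, tol=1e-14):
    """Largest eigenvalue and unit eigenvector of a symmetric positive
    semi-definite matrix S, by power iteration.

    Each step applies S (amplifying the component along the leading
    eigenvector) and renormalises, which enforces u^T u = 1. The eigenvalue
    estimate is the Rayleigh quotient u^T S u (cf. equation 4.4)."""
    n = len(S)
    u = [1.0 / math.sqrt(n)] * n        # arbitrary unit starting vector
    lam = 0.0
    for _ in range(iterations):
        w = mat_vec(S, u)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                 # start vector lies in the null space
            return 0.0, u
        u = [x / norm for x in w]
        lam_new = sum(a * b for a, b in zip(u, mat_vec(S, u)))
        if abs(lam_new - lam) <= tol * max(1.0, abs(lam_new)):
            return lam_new, u
        lam = lam_new
    return lam, u


lam, u = leading_eigenpair([[4.0, 1.0], [1.0, 3.0]])
```

For the matrix [[4, 1], [1, 3]] the iteration converges to λ = (7 + √5)/2 ≈ 4.618. Power iteration cannot recover the leading eigenvector if the starting vector happens to be orthogonal to it. That is one reason dedicated eigensolvers are preferred in practice.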

Calculating the remaining axes proceeds in a similar manner. The second axis is found by again maximising $\mathbf{u}^T S \mathbf{u}$, but with the added constraint that the second axis be orthogonal to the first, i.e., $\mathbf{u}_2^T\mathbf{u}_1 = 0$. Introducing the Lagrange multipliers $\lambda_2$ and $\mu$, the maximum is obtained by differentiating

$$\mathbf{u}_2^T S \mathbf{u}_2 - \lambda_2(\mathbf{u}_2^T\mathbf{u}_2 - 1) - \mu(\mathbf{u}_2^T\mathbf{u}_1) \tag{4.5}$$

with respect to $\mathbf{u}_2$, giving

$$2S\mathbf{u}_2 - 2\lambda_2\mathbf{u}_2 - \mu\mathbf{u}_1. \tag{4.6}$$


Setting this equal to zero and multiplying through by $\mathbf{u}_1^T$, the first two terms vanish. This is because $\mathbf{u}_1^T S \mathbf{u}_2 = (S\mathbf{u}_1)^T\mathbf{u}_2 = \lambda_1\mathbf{u}_1^T\mathbf{u}_2 = 0$, by the symmetry of $S$ (where $\lambda_1$ is the eigenvalue belonging to $\mathbf{u}_1$), and $\mathbf{u}_1^T\mathbf{u}_2 = 0$ by construction. This leaves

$$\mu\,\mathbf{u}_1^T\mathbf{u}_1 = 0, \tag{4.7}$$

which, since $\mathbf{u}_1^T\mathbf{u}_1 = 1$, implies that $\mu = 0$. Therefore, equation 4.6 reduces to the same form as equation 4.4, meaning that $\lambda_2$ and $\mathbf{u}_2$ are the second largest eigenvalue and corresponding eigenvector of $S$.

Thus, the principal components of a set of $N$-dimensional points, $X$, are the eigenvectors of the matrix of sums of squares and cross products, $S = X^T X$. There are $N$ eigenvectors for an $N$-dimensional problem.
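The orthogonality argument above suggests a simple way to obtain the second PC numerically. First remove the first component from $S$ (Hotelling deflation, $S' = S - \lambda_1\mathbf{u}_1\mathbf{u}_1^T$), then find the leading eigenvector of what remains. The Python sketch below is illustrative only and is not the SVD-based procedure used in this work. It repeats a compact power iteration so that it is self-contained, and its helper names are hypothetical.

```python
import math


def _dominant(S, iterations=3000):
    """Largest eigenpair of a symmetric PSD matrix by plain power iteration."""
    n = len(S)
    v = [1.0 + 1e-3 * i for i in range(n)]   # generic (non-special) start
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in S]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Sv = [sum(a * b for a, b in zip(row, v)) for row in S]
    return sum(a * b for a, b in zip(v, Sv)), v


def first_two_components(S):
    """Return (lam1, u1, lam2, u2) using Hotelling deflation.

    Subtracting lam1 * u1 u1^T removes the first PC from S, so the leading
    eigenvector of the deflated matrix is the direction of maximum
    remaining variance orthogonal to u1 (equations 4.5-4.7)."""
    lam1, u1 = _dominant(S)
    n = len(S)
    deflated = [[S[i][j] - lam1 * u1[i] * u1[j] for j in range(n)]
                for i in range(n)]
    lam2, u2 = _dominant(deflated)
    return lam1, u1, lam2, u2


S = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
lam1, u1, lam2, u2 = first_two_components(S)
```

For this example matrix the two eigenvalues returned are (7 ± √5)/2, and the two vectors are orthogonal to machine precision.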

The principal components form a directional basis set, so PCA is best applied to data that are centred. Geometrically speaking, centring is equivalent to a shift in the origin of the co-ordinate system. It is performed by calculating the mean spectrum and subtracting it from each row vector of $X$.

Let $\bar{x}_i$ be the average of element $x_i$ over all $M$ data points. The $i$th element of the $p$th (centred) point is then given by

$$\Delta x_{i,p} = x_{i,p} - \bar{x}_i. \tag{4.8}$$

$S$ now becomes the covariance matrix of the data points, up to a constant factor of $M$ that does not affect its eigenvectors. The form of equation 4.4 remains unchanged. Subtracting the mean also reduces the dynamic range of $S$, increasing the numerical stability of the solution to the eigenvector problem.
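Equation 4.8 amounts to the following Python sketch. It assumes, for illustration, that spectra are held as plain lists of fluxes on a common wavelength grid, and it returns the mean spectrum together with the difference spectra. The function name is hypothetical.

```python
def centre(spectra):
    """Mean spectrum and difference spectra (equation 4.8).

    `spectra` holds M spectra, each a list of N fluxes on a common
    wavelength grid (the rows of X). Returns (mean, diffs), where
    diffs[p][i] = spectra[p][i] - mean[i]."""
    m = len(spectra)
    n = len(spectra[0])
    if any(len(s) != n for s in spectra):
        raise ValueError("all spectra must have the same number of flux bins")
    mean = [sum(s[i] for s in spectra) / m for i in range(n)]
    diffs = [[s[i] - mean[i] for i in range(n)] for s in spectra]
    return mean, diffs


# Two made-up three-bin spectra differing only in line depth.
mean, diffs = centre([[1.0, 0.8, 1.0], [1.0, 0.6, 1.0]])
```

After centring, every flux bin of the difference spectra sums to zero over the sample, which places the origin at the mean spectrum.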

4.1.2 Building A Hot Subdwarf Filter<br />

By retaining only the most significant principal components of an $N$-dimensional data set, a quick test can determine whether a new data point lies in a similar region of $N$-dimensional space to the original data set. This is the principle upon which a filter can be built to help search a large collection of unknown spectra for astronomical objects of a particular type.

Figure 4.2: Mean spectrum of the Drilling et al. (2006) sample (normalised flux against wavelength in Å).

As described at the start of the chapter, such a filter will now be developed using the collection of 177 standard hot subdwarf spectra obtained from Drilling et al. (2006) (see also Chapter 2).

The first step is to construct the mean spectrum and subtract it from each spectrum in the set, thereby forming the matrix of difference spectra using equation 4.8. The mean spectrum is plotted in Figure 4.2.

The elements of the covariance matrix, $S$, are then calculated from

$$s_{i,j} = \sum_{p=1}^{M} \Delta x_{i,p}\,\Delta x_{j,p}. \tag{4.9}$$
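Equation 4.9 can be sketched in Python as follows. The sketch assumes that the difference spectra are plain lists, and the function name is hypothetical. Only the upper triangle is accumulated, since $S$ is symmetric, and no 1/M normalisation is applied, matching $S = X^T X$.

```python
def scatter_matrix(diffs):
    """S with elements s_ij = sum over p of dx[p][i] * dx[p][j] (eq. 4.9).

    `diffs` holds the M mean-subtracted spectra, so S = X^T X is N x N.
    Dividing by M (or M - 1) would give the conventional sample covariance;
    that constant changes neither the eigenvectors nor the fraction of the
    variance each eigenvector explains."""
    n = len(diffs[0])
    S = [[0.0] * n for _ in range(n)]
    for row in diffs:
        for i in range(n):
            for j in range(i, n):            # accumulate the upper triangle
                S[i][j] += row[i] * row[j]
    for i in range(n):
        for j in range(i):
            S[i][j] = S[j][i]                # mirror: S is symmetric
    return S


S = scatter_matrix([[0.0, 0.1, -0.2], [0.0, -0.1, 0.2]])
```

In pure Python the cost is of order $M N^2$ operations, which is slow for spectra with hundreds of flux bins. An array library would normally be used instead.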


The use of the covariance matrix in the formulation of PCA assumes that the data do not need to be standardised, i.e., that all the variables are on the same scale. This assumption is valid here because the Drilling et al. (2006) spectra have all been continuum normalised, and the filter will be applied only to normalised spectra.

If the variables were on different scales, the large differences between their variances would cause the weaker variables to be ignored. This would happen, for example, if the Drilling et al. (2006) spectra were unnormalised and half had flux scales several orders of magnitude greater than the other half. Likewise, PCA can be sensitive to outliers in the data set, which can contribute greatly to the variance.

Scale dependences must be removed if PCA is to generate useful components. Common approaches to normalisation include standardising the variables to have unit variance, compressing them onto the range 0–1, or taking logarithms. The results of the PCA will depend on the normalisation method used.
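The three normalisations just listed can be sketched in Python as follows. Each function acts on the values of a single variable, for example one flux bin across all spectra. The function names are hypothetical, and none of these transformations was needed for the continuum-normalised spectra used here.

```python
import math


def standardise(values):
    """Zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - mean) / sd for v in values]


def to_unit_range(values):
    """Compress linearly onto the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


def log_scale(values):
    """Base-10 logarithm; defined only for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithm requires strictly positive values")
    return [math.log10(v) for v in values]
```

As the text notes, each choice leads to a different covariance matrix and hence to different principal components.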

In this application of PCA to stellar spectra, the covariance matrix, $S$, will always be real and symmetric. As such, equation 4.4 does not need to be solved directly. Any real symmetric matrix is diagonalised by the orthogonal matrix of its eigenvectors (see Golub & Van Loan 1989), so diagonalising $S$ yields the principal components.

Any real symmetric matrix can be reliably diagonalised using a technique such as Jacobi's method. Here, a QR-based singular value decomposition routine (see Press et al. 1986) has been used to calculate the eigenvectors. The results are presented in Figures 4.3 and 4.4, in which the first ten principal components of the Drilling et al. (2006) spectra have been plotted.
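For illustration, the cyclic Jacobi method mentioned above can be sketched in pure Python. This is not the QR-based SVD routine actually used in this work. It follows the standard Jacobi rotation formulae: each rotation zeroes one off-diagonal element, and the accumulated rotations converge to the eigenvectors. The function name, the tolerance (a relative bound on the squared off-diagonal elements) and the sweep limit are illustrative choices.

```python
import math


def jacobi_eigen(S, tol=1e-22, max_sweeps=50):
    """Eigenvalues and eigenvectors of a real symmetric matrix by cyclic
    Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue, where
    vectors[k] is the unit eigenvector (PC) belonging to values[k]."""
    n = len(S)
    A = [list(map(float, row)) for row in S]                 # working copy
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        total = sum(x * x for row in A for x in row)
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * total:                               # (near) diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Choose the rotation that makes the new A[p][q] zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                           # A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                           # A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                           # V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: A[k][k], reverse=True)
    values = [A[k][k] for k in order]
    vectors = [[V[i][k] for i in range(n)] for k in order]  # columns of V
    return values, vectors
```

The eigenvectors are returned sorted by decreasing eigenvalue, i.e., in PC order, so the first entry corresponds to the first PC.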

The PCs are rotations of the original axes in the data space; therefore they resemble spectra and have the same number of elements as the original spectra. It can clearly be seen that the first PC differentiates between hydrogen and helium lines. This reification makes sense, as it is these features which vary most across the Drilling et al. (2006) data set. The second PC also clearly differentiates between the He I and He II line series. For the remaining PCs, it becomes harder to attach any meaningful interpretation.

Figure 4.3: First five PCs of the Drilling et al. (2006) sample (panels PC 0 to PC 4, where PC 0 is the first PC; wavelength range 4100–4900 Å).

Figure 4.4: Second five PCs of the Drilling et al. (2006) sample (panels PC 5 to PC 9; wavelength range 4100–4900 Å).

Figure 4.5: Cumulative variance of the first ten PCs of the Drilling et al. (2006) sample (cumulative percentage of total variance against principal component number, 0–9).

The question remains as to how many principal components should be retained to form an adequate representation of the Drilling et al. (2006) standard stars. Figure 4.5 shows the cumulative percentage of the total variance accounted for by the first ten principal components.

The first principal component alone accounts for 94.66% of the total variance, which is not surprising given the reification outlined previously. All ten PCs together account for 99.83% of the variance. However, 99.30% is described by the first four PCs, and the remaining six add only 0.53%, so these four are sufficient to give a compressed representation of the Drilling et al. (2006) hot standards.
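The selection rule above can be written as a short Python sketch. The cumulative percentage is accumulated from eigenvalues sorted in decreasing order, and the smallest number of PCs reaching a chosen percentage is returned. The eigenvalues in the example are made up for illustration and are not those of the Drilling et al. (2006) sample. The total variance is assumed to be the sum of all the eigenvalues supplied, and the function names are hypothetical.

```python
def cumulative_variance(eigenvalues):
    """Cumulative percentage of the total variance (cf. Figure 4.5).

    `eigenvalues` must be sorted in decreasing order and should include
    all N eigenvalues, so that their sum is the total variance."""
    total = sum(eigenvalues)
    running, out = 0.0, []
    for lam in eigenvalues:
        running += lam
        out.append(100.0 * running / total)
    return out


def components_needed(eigenvalues, threshold_percent):
    """Smallest number of leading PCs whose cumulative variance reaches
    threshold_percent."""
    for k, pct in enumerate(cumulative_variance(eigenvalues), start=1):
        if pct >= threshold_percent:
            return k
    return len(eigenvalues)


made_up = [50, 30, 15, 4, 1]   # illustrative eigenvalues only
```

With these made-up eigenvalues the cumulative percentages are 50, 80, 95, 99 and 100, so a 99% threshold retains four components.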

It should be noted that this selection criterion of maximal variance may unwisely discard less significant PCs that nevertheless carry useful information. Lahav et al. (1996) point out that, in the classification of galaxy spectra, the fractional variance on its own was not sufficient to determine how many PCs were needed for classification. This may be due to one of three causes:

- non-linearity in the data (a spectrum is not a linear combination of line features, and the lines do not separate into different principal components);
- the effect of noise on the derivation of the PCs;
- the fact that classification requires more information than that given simply by the maximal variance.

In the application of PCA here to the filtering of stellar spectra, only an adequate representation of a data set is sought, not an adequate discrimination between classes within it. As such, the criterion of maximal variance remains valid.

Now, let the $(N \times 4)$ matrix $E = (\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4)$ contain the first four principal components of the Drilling et al. (2006) hot standards as its columns. Consider some unknown spectrum $\mathbf{y} = (y_1, y_2, y_3, \ldots, y_N)$. To determine its similarity to the Drilling et al. (2006) standards, first construct the vector $\mathbf{p}$, whose elements are the projections of the mean-subtracted spectrum onto each of the four principal components in $E$:

$$\mathbf{p} = \Delta\mathbf{y} \cdot E, \tag{4.10}$$

where $\Delta\mathbf{y}$ is the mean-subtracted difference spectrum of $\mathbf{y}$, treated as a row vector. That is, $\Delta\mathbf{y} = \mathbf{y} - \bar{\mathbf{x}}$, where $\bar{\mathbf{x}}$ is the mean spectrum of the Drilling et al. (2006) sample.

The reduced reconstruction of $\mathbf{y}$, $\mathbf{y}_r$, is then given by

$$\mathbf{y}_r = \bar{\mathbf{x}} + \mathbf{p} \cdot E^T. \tag{4.11}$$
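Equations 4.10 and 4.11 can be sketched in Python as follows. The retained PCs are passed as a list of unit vectors (the columns of $E$). Each element of $\mathbf{p}$ is then a dot product, and the reconstruction is the mean spectrum plus the weighted sum of the PCs. The plain-list representation and the function names are illustrative assumptions.

```python
def project(y, mean, components):
    """p = (y - mean) . E  (equation 4.10).

    `components` lists the retained PCs, each a unit vector of length N
    (the columns of E); p[k] is the projection of the difference spectrum
    onto the k-th retained PC."""
    dy = [yi - mi for yi, mi in zip(y, mean)]
    return [sum(d * u for d, u in zip(dy, comp)) for comp in components]


def reconstruct(p, mean, components):
    """y_r = mean + p . E^T  (equation 4.11): the mean spectrum plus the
    weighted sum of the retained PCs."""
    y_r = list(mean)
    for weight, comp in zip(p, components):
        for i, u in enumerate(comp):
            y_r[i] += weight * u
    return y_r


# Toy example: three flux bins, two retained (orthonormal) components.
mean = [1.0, 1.0, 1.0]
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [0.9, 0.8, 0.95]
p = project(y, mean, components)
y_r = reconstruct(p, mean, components)
```

In the toy example the departure in the third bin lies outside the retained components, so it is absent from $\mathbf{y}_r$. The same mechanism removes most of the noise from a real spectrum.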

Figure 4.6 shows the results of projecting two hot subdwarf spectra onto the first four principal components of the Drilling et al. (2006) hot standards.

At the top, spectrum A is a relatively high-S/N observation of a cooler subdwarf. The original spectrum is plotted in red, and its reduced reconstruction in blue.

Spectrum B shows a hotter subdwarf observed at lower S/N. Again, the original spectrum is plotted in red, with the reduced reconstruction in blue.

Figure 4.6: Illustration of projecting hot subdwarf spectra onto the first four PCs of the Drilling et al. (2006) standards. Each panel shows the original spectrum and its reduced reconstruction over 4100–4900 Å (panel labels: A, 1.89970; B, 6.22063).

Spectrum A compares well with its reduced reconstruction, the latter showing very little difference from the original. Spectrum B, however, is noisier, and its reduced reconstruction matches the spectrum well except for the noise (here, the noise-filtering capabilities of PCA can be observed).

Certainly, spectrum A, if encountered in a large set of unknown spectra, would be desirable to the astronomer, whereas spectrum B could be considered too noisy for any further analysis. Thus, when filtering through a large set of unknown spectra, those spectra which compare well with their reduced reconstructions will be of most interest to the hot subdwarf astronomer.


A suitable quantitative measure for this comparison is the reconstruction error

$$R = 100 \times \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(y_i - y_{r,i}\right)^2}, \tag{4.12}$$

where $y_i$ is the $i$th flux bin of the original spectrum, $\mathbf{y}$, and $y_{r,i}$ is the $i$th flux bin of the reduced reconstruction of $\mathbf{y}$, $\mathbf{y}_r$. This error metric gives the RMS difference per flux bin between the original spectrum and its reconstruction. The factor of 100 is simply a scaling factor to make the final error values easier to work with (it is anticipated that the majority of values for $R$ will lie in the range $0 \le R \le 1$).
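Equation 4.12 translates directly into Python. The sketch below uses a hypothetical function name and takes spectra as equal-length lists of normalised fluxes; it returns $R$.

```python
import math


def reconstruction_error(y, y_r):
    """R = 100 * sqrt( (1/N) * sum_i (y_i - y_r,i)^2 )  (equation 4.12)."""
    if len(y) != len(y_r) or not y:
        raise ValueError("spectra must be non-empty and of equal length")
    n = len(y)
    return 100.0 * math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_r)) / n)
```

For example, a four-bin spectrum (1.00, 0.98, 1.02, 1.00) reconstructed as a flat continuum of 1.0 gives R = 100 × √(0.0008/4) ≈ 1.41.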

The reconstruction errors for each spectrum in Figure 4.6 are shown in the top left region of each plot.

How “well” a spectrum should compare with its reduced reconstruction in this manner is a subjective measure. It depends on the type of object an astronomer is filtering for and on the further analysis intended. In the hot subdwarf case, for classification purposes, a spectrum such as B in Figure 4.6 may mark the upper limit of the reconstruction errors that are to be accepted. However, if the derivation of physical parameters is the goal, then reconstruction errors close to that of spectrum A, and well below that of B, may be required.
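Once a threshold has been chosen, the filter itself reduces to a comparison, as in the following Python sketch. The user supplies the dictionary mapping identifiers to $R$ values and chooses the threshold. The example uses the two values shown in Figure 4.6, and the function name is hypothetical.

```python
def select_candidates(errors, threshold):
    """Identifiers whose reconstruction error R does not exceed
    `threshold`, best matches (lowest R) first.

    `errors` maps a spectrum identifier to its R value; the threshold is
    the user's choice (stricter for parameter determination than for
    classification), not a value prescribed by the method."""
    passed = [(r, name) for name, r in errors.items() if r <= threshold]
    return [name for r, name in sorted(passed)]


errors = {"A": 1.89970, "B": 6.22063}   # values shown in Figure 4.6
```

A threshold of 2.0 keeps only spectrum A. A looser threshold of 7.0 also admits spectrum B.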

As mentioned in the introduction to this chapter, PCA is a data-driven tool, and the principal components derived for one data set are unique to those data. Suppose, say, that a galaxy spectrum is reconstructed using the PCs of the Drilling et al. (2006) standards. Its reconstruction error will be very high, as it will have few (if any) features in common with hot subdwarfs. The same is true for noisy or incomplete spectra, making them easy to filter out.



4.2 Searching the SDSS for Hot Subdwarfs

The PCA hot subdwarf filter was applied to a sample of 4610 spectra obtained from the Sloan Digital Sky Survey Data Release 3 (DR3) database. The selection criteria used to obtain the sample from the SDSS are given in the following SQL query:

    -- Spectra classified as stars whose primary target flags mark them
    -- as BHB or hot subdwarf candidates (SDSS DR3, BESTDR3 database)
    SELECT s.plate, s.mjd, s.fiberid
    FROM BESTDR3..SpecPhotoAll AS s
    WHERE s.specClass = dbo.fSpecClass('STAR')
      AND (s.primTarget & (dbo.fPrimTarget('TARGET_STAR_BHB')
                         + dbo.fPrimTarget('TARGET_STAR_SUB_DWARF')) > 0)
      AND (s.objType = 2)

These criteria naively rely upon the classifications automatically assigned by the SDSS spectrophotometric pipeline.

The SDSS supplies spectra in FITS format. Each FITS file includes a calibrated spectrum and a normalised spectrum, and all measured parameters (redshift, line fits, line indices, per-pixel resolution, etc.) are stored in the FITS header.

For convenience, the normalised spectra were extracted from the FITS files and subsequently velocity corrected using the redshift stored in each FITS header. The spectra were then rebinned onto the common wavelength grid of 4050–4950 Å at a dispersion of 1 Å pixel⁻¹ to match the Drilling et al. (2006) spectra.
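The velocity correction and rebinning steps can be sketched in Python as below. The interpolation scheme used in this work is not stated here, so linear interpolation is assumed purely for illustration; a flux-conserving rebinning would be an alternative. The correction divides observed wavelengths by $(1 + z)$. Reading the FITS files is omitted, and the function names are hypothetical.

```python
def to_rest_frame(wavelengths, z):
    """Shift observed wavelengths to the rest frame: obs / (1 + z)."""
    return [w / (1.0 + z) for w in wavelengths]


def rebin_linear(wavelengths, flux, start=4050.0, stop=4950.0, step=1.0):
    """Linearly interpolate `flux` onto a regular grid start..stop (inclusive).

    Assumes `wavelengths` is strictly increasing and covers the whole grid;
    otherwise a ValueError is raised rather than extrapolating."""
    n_out = int(round((stop - start) / step)) + 1
    grid = [start + k * step for k in range(n_out)]
    if grid[0] < wavelengths[0] or grid[-1] > wavelengths[-1]:
        raise ValueError("input spectrum does not cover the output grid")
    out = []
    j = 0
    for g in grid:
        while wavelengths[j + 1] < g:          # find the bracketing pair
            j += 1
        w0, w1 = wavelengths[j], wavelengths[j + 1]
        f0, f1 = flux[j], flux[j + 1]
        out.append(f0 + (f1 - f0) * (g - w0) / (w1 - w0))
    return grid, out


# Toy two-point spectrum at z = 0, rebinned onto the 4050-4950 A grid.
rest = to_rest_frame([4000.0, 5000.0], 0.0)
grid, rebinned = rebin_linear(rest, [1.0, 2.0])
```

A grid from 4050 to 4950 Å inclusive at 1 Å spacing has 901 points.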

The PCA filter was applied using equations 4.10 and 4.11, outlined in the previous section, to construct the set of reduced reconstructions. The reconstruction errors were then calculated using equation 4.12.

The distribution of the reconstruction errors is displayed in Figure 4.7.

The histogram shows that most of the spectra in the SDSS sample are concentrated in the region $R \lesssim 4.0$. The contents of the first three error bins ($R \lesssim 1.8$) are shown in Figures 4.8 and 4.9. Clearly, these eight spectra are good-S/N, strong subdwarf candidates, well suited to further analysis.

Figure 4.7: Histogram of reconstruction errors from the SDSS data sample (number of spectra against reconstruction error, $R$; the final bin collects all spectra with $R > 15.0$).

As the reconstruction error increases, the S/N of the spectra starts to decrease. Figure 4.10 shows four spectra sampled from the most populated error bin, $R \sim 3.0$. They are slightly noisier than those in Figures 4.8 and 4.9, yet the reconstructions are still a close match, meaning they could still be suitable for further analysis.

By around $R \approx 4.5$, the reconstruction quality is becoming noticeably poorer, as demonstrated in Figure 4.11. Here, the S/N is progressively lower, and in the succeeding error bins objects with spectra quite unlike those of subdwarfs, such as white dwarfs, begin to appear.

One interesting feature of note is the final error bin, which contains all the SDSS spectra with reconstruction errors $R > 15.0$. It contains a large number of spectra in comparison with the preceding error bins. Such high reconstruction errors are indicative of spectra with features poorly matched to typical subdwarf spectra. Figure 4.13 shows a sample of four spectra from this error bin.

Figure 4.8: Spectra in the first three reconstruction error histogram bins ($R \lesssim 1.8$). Panels: A, J234137.25+000123.2 (1.52992); B, J171531.67+271545.5 (1.66001); C, J155612.59+022152.9 (1.62708); D, J153701.88-011307.9 (1.74365).

Figure 4.9: Spectra in the first three reconstruction error histogram bins ($R \lesssim 1.8$). Panels: A, J152357.12+354009.4 (1.73210); B, J151722.09+603546.3 (1.79120); C, J125244.60-002512.9 (1.71810); D, J112015.43+650003.2 (1.76950).

The first three spectra are dominated by noise, with spectrum B exhibiting an anomalous gap in the data at around 4110 Å. Spectrum D is incomplete, hence the large reconstruction error.

The PCA filter is effective at separating out the very low S/N exemplars and the incomplete spectra, as shown in Figure 4.13. However, it does not by itself isolate subdwarf candidates: they will invariably be mixed in with stars that are spectroscopically very similar to subdwarfs. An example of high S/N spectra that are not subdwarfs, and which are filtered out, is shown in Figure 4.12.

The SDSS sample used here was composed predominantly of cooler BHB and main-sequence stars, with some white dwarfs. Any subdwarf candidates were therefore difficult for the filter to extract from amidst the spectroscopically similar cooler stars. This problem arose from the search criteria used in the initial SQL query, but it can be rectified by altering the database query to select by photometric colour, which would exclude most of the cooler stars and subdwarf–main-sequence binaries.

The reconstruction error calculation described in equation 4.12 measures the mean difference between an original spectrum and its reconstruction. As such, it ranked the SDSS spectra mostly according to noise content. Consequently, objects such as white dwarfs began to appear alongside lower-S/N subdwarf candidates and BHB stars at reconstruction errors of around R ≈ 7.0. This is not necessarily a problem per se because, by about R ≈ 5.0, any subdwarfs still to be found are dominated by noise levels that are unlikely to permit useful further analysis.
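The ranking by reconstruction error can be illustrated with a minimal Python/NumPy sketch. It assumes that the PCA basis is obtained from a singular value decomposition of the mean-subtracted training spectra, and it takes R to be the mean absolute residual between a spectrum and its reconstruction. Equation 4.12 is not reproduced in this section, so the exact form of R, and any scaling behind the quoted values, is an assumption represented by the hypothetical `scale` argument. The function names are likewise illustrative.

```python
import numpy as np


def fit_pca(training_spectra, n_components):
    """Return the mean spectrum and the leading principal components.

    training_spectra: array (n_spectra, n_pixels), all on the same
    wavelength grid and normalised in the same way.
    """
    mean = training_spectra.mean(axis=0)
    centred = training_spectra - mean
    # Rows of vt are orthonormal principal components, ordered by variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(spectrum, mean, components, scale=1.0):
    """Mean absolute difference between a spectrum and its PCA reconstruction.

    `scale` is a hypothetical stand-in for whatever normalisation
    equation 4.12 applies; it is not taken from the thesis.
    """
    coeffs = components @ (spectrum - mean)      # project onto the components
    reconstruction = mean + coeffs @ components  # rebuild from the projection
    return scale * np.mean(np.abs(spectrum - reconstruction))
```

Noise is spread over every pixel and is largely orthogonal to the few leading components, so it survives in the residual. This is why a statistic of this kind tracks noise content so closely.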

Figure 4.10: Sample of spectra from the eighth error bin (R ∼ 3.0). Panels (4100–4900 Å): A, J023624.84-072238.1, R = 2.97285; B, J001832.61+155540.1, R = 3.00098; C, J224640.34-090631.8, R = 2.89146; D, J145418.66-022346.1, R = 2.88735.

Figure 4.11: Sample of spectra from the fourteenth error bin (R ∼ 4.5). Panels (4100–4900 Å): A, J001146.72+152147.5, R = 4.66907; B, J165401.98+294801.7, R = 4.50669; C, J074623.09+205546.7, R = 4.54534; D, J113044.42+612111.7, R = 4.50072.

Figure 4.12: Sample of high S/N DA white dwarfs from the 22nd–24th error bins (R ∼ 6.4–7.1). Panel reconstruction errors: A 6.48477, B 6.94160, C 6.99275, D 7.03080; objects shown: J085128.17+060551.2, J110651.79+625024.0, J092252.13+524446.4, J080051.56+223558.5 (4100–4900 Å).

Figure 4.13: Sample of spectra from the fifty-third error bin (R > 15.0). Panels (4100–4900 Å): A, J075647.73+232913.6, R = 15.15518; B, J141509.80-021147.2, R = 20.11230; C, J140804.49+011320.1, R = 38.54495; D, J145616.92+024549.6, R = 66.91276.

Practically speaking, the PCA filter allows a value of R to be established beyond which spectra can be safely discarded on the grounds that they are not of sufficient S/N for whatever further analysis the astronomer has in mind. The same cut also discards objects whose spectra are not sufficiently similar to the objects of interest.

For the spectra that remain, visual inspection is still necessary to separate the candidates of interest from objects that the reconstruction error calculation is not sensitive enough to mark for removal. In the SDSS sample obtained, spectra with a reconstruction error of R < 5.0 were generally suitable for classification or parameterisation. However, as mentioned previously, any real subdwarf candidates in that sub-sample were mixed in with cooler BHB and main-sequence stars.
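Once a threshold has been chosen, the cut itself is simple. The Python sketch below uses a hypothetical function name. It keeps the spectra below a user-supplied R and orders them by increasing R, so that visual inspection can begin with the best-reconstructed spectra. The value 5.0 mentioned afterwards suited this SDSS sample and is not a general recommendation.

```python
import numpy as np


def split_by_threshold(errors, threshold):
    """Split spectra into kept (R < threshold) and discarded (R >= threshold).

    Returns index arrays. Kept indices are sorted by increasing R so the
    best-reconstructed spectra come first for visual inspection.
    """
    errors = np.asarray(errors, dtype=float)
    order = np.argsort(errors)
    kept = order[errors[order] < threshold]
    discarded = np.flatnonzero(errors >= threshold)
    return kept, discarded
```

For example, `kept, discarded = split_by_threshold(errors, 5.0)`, where `errors` holds one R value per spectrum.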

4.3 Summary

The concept of the PCA-based filtering tool presented here is sound, and such a tool is clearly needed. In constructing a filter for hot subdwarfs and applying it to search for such stars in the SDSS, it was found that the SDSS-assigned spectral classifications are not a useful criterion to include in an initial search.

The data set obtained contained a large number of blue horizontal branch stars. Because these are spectroscopically very similar to hot subdwarfs, the filter could not provide a robust discrimination between the two object types.

This highlights the need for appropriate and specific search criteria when extracting data from a very large survey database. In the case of hot subdwarfs and the SDSS, a search based on photometric colours would allow cooler BHB stars to be avoided.

Still, the PCA filter is not completely automated and cannot be treated as a black box. A user must be aware of the correct manner of operation:

1. The set of training data from which a filter is to be constructed must be pre-processed into a homogeneous form.

2. Application data must be pre-processed to have the same properties as the training set (i.e., wavelength range, dispersion, etc.).

3. An acceptable reconstruction error threshold is a subjective decision that the user must make. It can only be determined by examining the filtering results and from prior experience.

4. A visual inspection of data below the acceptable error threshold is still required to ensure that candidate objects are correctly separated from undesired but spectroscopically similar objects.

The diversity of real-world data makes decisive filtering a very hard problem, but the PCA filter presented here can reduce the search space by at least an order of magnitude, making the job of visual inspection much more tractable.



Chapter 5

Application I - SDSS Hot Subdwarfs

With a set of tools for data mining large sets of astronomical spectra established in Chapters 2 to 4, those tools are now applied in unison to extract and analyse hot subdwarf candidates from the Sloan Digital Sky Survey (SDSS).

First, a set of search criteria based on SDSS photometric colours is devised to obtain a data set that excludes most of the horizontal branch stars encountered in the previous chapter. This data set is then filtered with the aid of the PCA filter and pre-processed before being fed into the analysis pipeline for classification and parameterisation.

5.1 Search Criteria and Data Sets

Following the work of Harris et al. (2003) and Kleinman et al. (2004), based on the photometric simulations of Fan (1999), a search was made of the SDSS Data Release 3 (DR3) database using the following selection criteria on SDSS ugriz point spread function (PSF) magnitudes:


SELECT s.plate, s.mjd, s.fiberid                -- identifiers of each spectrum
FROM BESTDR3..SpecPhotoAll AS s
WHERE s.psfMag_u < 21                           -- brighter than u = 21
  AND (s.psfMag_u - s.psfMag_g) < 0.7           -- blue in u - g
  AND (s.psfMag_g - s.psfMag_r) < -0.1          -- blue in g - r
  AND s.specClass <> dbo.fSpecClass('QSO')      -- exclude spectra classed as quasars
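The colour cuts in the WHERE clause can also be applied outside the database, for instance to re-check a downloaded photometry table. The Python sketch below is illustrative only, and its function name is hypothetical. The quasar exclusion is left to the query because it depends on the SDSS spectroscopic classification rather than on photometry.

```python
def passes_colour_cut(psf_u, psf_g, psf_r):
    """Apply the DR3 colour-colour selection to one object's PSF magnitudes.

    Mirrors the photometric part of the WHERE clause above; the specClass
    test is not reproduced because it needs the spectroscopic class.
    """
    return (psf_u < 21.0
            and (psf_u - psf_g) < 0.7
            and (psf_g - psf_r) < -0.1)
```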

For completeness, the spectra chosen by the SDSS as its hot standards were also retrieved using a separate query:

SELECT s.plate, s.mjd, s.fiberid
FROM BESTDR3..SpecObj AS s
WHERE s.objType = dbo.fObjType('HOT_STD')      -- SDSS hot standard stars

The total data quantities retrieved by these two queries are summarised in Table 5.1.

Data Set          Spectra Retrieved
Colour-Colour     6539
Hot Standards     1411
Total             7950 (6764 unique)

Table 5.1: Summary of data quantities obtained from the SDSS DR3.
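Table 5.1 implies that 7950 − 6764 = 1186 retrieved entries were duplicates. Both queries return plate, MJD and fibre number, which together identify an SDSS spectrum, so one way to form the unique set is to merge on that triple. This Python sketch assumes that is how the duplicates were removed. The function name is hypothetical.

```python
def unique_spectra(*query_results):
    """Merge (plate, mjd, fiberid) tuples from several queries, dropping repeats.

    A spectrum returned by more than one query is kept once, and the order
    of first appearance is preserved.
    """
    seen = set()
    merged = []
    for rows in query_results:
        for key in rows:
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```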

5.2 PCA Filtering

The PCA filter from Chapter 4 was applied to the 6764 unique spectra obtained from the SDSS. The SDSS-normalised spectrum was extracted from each of the downloaded FITS files and velocity corrected using the SDSS-derived redshift stored in each file's FITS header. The histogram of reconstruction errors is plotted in Figure 5.1.

Figure 5.1: Histogram of reconstruction errors for the colour-colour selected SDSS sample. (Axes: number of spectra, 0–500, against reconstruction error R, up to 35.00.)

The large number of spectra in the error bin at R ≈ 2.46 are blank: the normalised flux level is constant at 1.0 at all wavelengths. This is due to the rebinning routine's default behaviour of assigning a flux value of 1.0 to wavelengths where no flux information is available for interpolation. In this case, the spectra in question seem originally to cover a lower wavelength range than the chosen 4050–4950 Å range.
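The origin of the blank spectra can be reproduced with a small sketch. It assumes a linear-interpolation rebinning routine that writes a fill value of 1.0 wherever the target grid falls outside the observed wavelength range. The routine actually used is not specified here. The sketch also adds a hypothetical `is_blank` check that could flag such spectra before filtering.

```python
import numpy as np


def resample(wave, flux, grid, fill=1.0):
    """Linearly interpolate onto `grid`; grid points outside the data get `fill`.

    A fill value of 1.0 mimics the default behaviour described above.
    """
    return np.interp(grid, wave, flux, left=fill, right=fill)


def is_blank(flux, level=1.0, tol=1e-6):
    """True if every pixel sits at the fill level, i.e. no real data survived."""
    return bool(np.all(np.abs(np.asarray(flux) - level) < tol))
```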

Otherwise, visual examination of the error bins reveals that all of the hot subdwarf candidates of reasonable S/N lie at reconstruction errors R ≤ 6.4. They are mixed in with many white dwarf and blue horizontal branch spectra that are hard to separate out, because these often show almost no spectral features that would allow the PCA filter to distinguish them clearly from hot subdwarf candidates. At R > 6.4, the error bins consist almost entirely of various types of white dwarf, with only a few very low S/N hot subdwarf candidates.

Selecting all spectra with reconstruction errors R ≤ 6.4 yields 817 samples, approximately 400 of which are the “blank” spectra discussed previously. Removing them left a final set of 400 spectra, which were processed manually to select the hot subdwarf candidates from amidst the white dwarfs. This proceeded quickly, as white dwarf spectra are quite distinct.

A final data set of 282 hot subdwarf candidates was obtained.

5.3 Analysis

The SDSS-normalised spectra are created by fitting a pseudo-continuum using a median/mean filter. A sliding window of length 300 pixels is used for stars, and a set of reference lines is used to mask out major absorption features by excluding pixels closer than 8 pixels to any reference line. The remaining pixels are sorted, and the values between the 40th and 60th percentiles are averaged to give the pseudo-continuum.
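The pseudo-continuum recipe can be sketched in a few lines of NumPy. This is a simplified illustration, not the SDSS pipeline code. The window is centred on each pixel and simply truncated at the spectrum edges. The reference-line positions are supplied by the caller, because the SDSS line list is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def pseudo_continuum(flux, line_pixels, window=300, mask_radius=8,
                     lo_pct=40.0, hi_pct=60.0):
    """Sliding-window pseudo-continuum in the style described above.

    flux:        1-D array of fluxes on a uniform pixel grid.
    line_pixels: pixel positions of the reference absorption lines.
    For each pixel, take a window of roughly `window` pixels centred on it,
    drop pixels closer than `mask_radius` to any reference line, sort the
    rest, and average those between the lo_pct and hi_pct percentiles.
    """
    flux = np.asarray(flux, dtype=float)
    n = flux.size
    idx = np.arange(n)
    near_line = np.zeros(n, dtype=bool)
    for p in line_pixels:
        near_line |= np.abs(idx - p) < mask_radius
    half = window // 2
    continuum = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        vals = np.sort(flux[lo:hi][~near_line[lo:hi]])
        if vals.size == 0:  # window fully masked: no estimate possible
            continuum[i] = np.nan
            continue
        a = int(np.floor(vals.size * lo_pct / 100.0))
        b = max(a + 1, int(np.ceil(vals.size * hi_pct / 100.0)))
        continuum[i] = vals[a:b].mean()
    return continuum
```

If most unmasked pixels in a window lie in overlapping line wings, as happens among the higher-order Balmer lines, the 40th–60th percentile band sits below the true continuum. This is the underfitting described below.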

However, this pseudo-continuum tends to underfit the real continuum around the higher-order Balmer lines, where blending between the broad line wings pulls the pseudo-continuum down. Although the SDSS-normalised spectra are sufficient for the coarse filtering performed by the PCA filter, the underfitting associated with the pseudo-continuum makes them unsuitable for use in classification or parameterisation.

Instead, the SDSS-calibrated spectra of the 282 hot subdwarf candidates were renormalised using an automated method based on cubic spline fitting, after first being velocity corrected, again using the SDSS redshifts. Each spectrum was then resampled onto the common wavelength grid of 4050–4950 Å at a sampling of 1 Å pixel⁻¹, ready for analysis by the classification neural network and SFIT.
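The velocity correction and resampling steps might look as follows in Python. The sketch assumes the rest-frame shift λ_rest = λ_obs/(1 + z) with the SDSS redshift z, and linear interpolation onto the 4050–4950 Å grid at 1 Å per pixel. The cubic-spline renormalisation is not shown, and the function names are hypothetical.

```python
import numpy as np


def to_rest_frame(wave_obs, z):
    """Shift observed wavelengths to the rest frame: lambda_rest = lambda_obs / (1 + z)."""
    return np.asarray(wave_obs, dtype=float) / (1.0 + z)


def onto_common_grid(wave, flux, start=4050.0, stop=4950.0, step=1.0):
    """Linearly interpolate onto the common grid (4050-4950 A at 1 A per pixel)."""
    grid = np.arange(start, stop + step / 2, step)  # includes `stop` itself
    return grid, np.interp(grid, wave, flux)
```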

Physical parameters T_eff, log g and log(n_He/n_H) were derived by fitting each spectrum to a large grid of 2426 LTE model spectra generated using STERNE and SPECTRUM. Details of the grid are summarised in Table 5.2.


Parameter     Values
T_eff (kK)    8.0, 9.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 22.0, 24.0, 25.0, 26.0, 28.0, 30.0, 32.0, 34.0, 35.0, 36.0, 38.0, 40.0, 45.0, 50.0
log g         2.50, 3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00
n_He          0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 0.999

Table 5.2: The model grid used to obtain physical parameters from the SDSS hot subdwarf candidates.
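As an illustration of fitting a spectrum against a grid like that of Table 5.2, the simplest approach is a brute-force χ² search over every model. SFIT's own minimisation scheme is not described in this section, so the Python sketch below should not be read as SFIT's algorithm. It assumes the models have already been resampled onto the same wavelength grid as the observed spectrum. The function name is hypothetical.

```python
import numpy as np


def best_grid_model(flux, sigma, model_fluxes, model_params):
    """Brute-force chi-square search over a grid of model spectra.

    flux, sigma:   observed normalised flux and its uncertainty, shape (n_pixels,)
    model_fluxes:  array (n_models, n_pixels) on the same wavelength grid
    model_params:  array (n_models, 3) of (T_eff, log g, n_He) per model
    Returns the parameters of the lowest-chi-square model and its chi-square.
    """
    resid = (model_fluxes - flux) / sigma  # broadcasts over all models at once
    chi2 = np.sum(resid ** 2, axis=1)
    best = int(np.argmin(chi2))
    return model_params[best], chi2[best]
```

With 2426 models on a 901-pixel grid, the model array holds about 2.2 million values, which is small enough to search exhaustively.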

5.4 Results

The results of both classification and parameterisation are presented in Figures 5.2–5.8 and tabulated in Appendix B.

5.4.1 Parameterisation

A number of interesting features are present in the diagrams of Figure 5.2. Most prominent in the log g–T_eff plot is the low-density region centred at T_eff ≈ 22,500 K. Figure 5.4 overlays Figure 5.2 with density estimate contours, which better illustrate the presence of this gap.
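Figure 5.4 is based on a kernel density estimate. The kernel and bandwidth used there are not stated in this section. The one-dimensional Gaussian KDE along T_eff below, written in NumPy with a hypothetical function name, only illustrates how such an estimate exposes a gap. For the two-dimensional case, `scipy.stats.gaussian_kde` is one readily available implementation.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)[:, None]
    z = (np.asarray(grid, dtype=float)[None, :] - samples) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=0)  # average over samples -> normalised density
```

Evaluated on a fine T_eff grid, a local minimum between two peaks of the estimate marks a low-density region. The bandwidth controls how narrow a gap can be resolved.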

This low-density region appears to separate the blue horizontal branch stars from the extended horizontal branch. However, it occurs at the same position as the zero-age main sequence, so could it be the result of selection effects? The answer is probably no. An early B-type main-sequence star with an apparent magnitude of m_V = 15 (similar to the stars in the hot subdwarf sample) and an absolute magnitude of M_V = −2.4 would be located ∼30 kpc away, well out of the plane of the Galaxy. The existence of such a star at this position is unlikely.
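The ∼30 kpc figure follows from the distance modulus, here neglecting interstellar extinction (an assumption; any extinction would reduce the inferred distance):

```latex
m_V - M_V = 5\log_{10}\!\left(\frac{d}{10\,\mathrm{pc}}\right)
\;\Longrightarrow\;
d = 10^{(m_V - M_V + 5)/5}\,\mathrm{pc}
  = 10^{(15 + 2.4 + 5)/5}\,\mathrm{pc}
  = 10^{4.48}\,\mathrm{pc}
  \approx 3.0\times10^{4}\,\mathrm{pc} \approx 30\,\mathrm{kpc}
```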

The same low-density region was also observed by Green et al. (2006) and Saffer et al. (1994), and corresponds to the second gap identified by Newell (1973) in observations of blue halo stars.

Figure 5.2: Parameterisation results of the 282 SDSS hot subdwarf candidates. The helium main sequence of Paczyński (1971) and the post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted. (First panel: log g, 2–7, against effective temperature, 50000–10000 K, with the ZAMS, ZAHB and He-MS marked; second panel: log(n_He/n_H), −4 to 3, against effective temperature.)


Figure 5.3: Four example fits from the 282 SDSS hot subdwarfs. The classification and physical parameters (T_eff (K), log g, log(n_He/n_H)) obtained for each star are printed in the lower corners of each plot. (Panels against wavelength, 4000–5000 Å: sdO4VII:He26, 50654, 6.001, −0.913; sdB1VI:He29, 34502, 5.581, −0.568; sdB3VI:He2, 25219, 5.303, −2.769; sdB7III:He2, 12653, 3.342, −3.004.)

Heber et al. (1984) and Newell (1973) propose evolutionary explanations for this gap based on variations in hydrogen envelope mass along the horizontal branch. However, these explanations predate the discovery that possibly two-thirds of the sdB stars blueward of the gap are short-period binaries (Maxted et al., 2001), and are therefore products of the common envelope binary evolution channel.

Figure 5.4: The results of applying a kernel density estimate analysis to the data from Figure 5.2. The low-density region at T_eff ≈ 22,500 K is prominent, along with another possible low-density region at T_eff ≈ 41,000 K. (Same axes as Figure 5.2: log g and log(n_He/n_H) against effective temperature, 50000–10000 K.)

Monte Carlo simulations of single star evolution on the extended horizontal branch, carried out at St. Andrews (Jeffery & Jardine 1984, unpublished), did not reveal such a gap. It is therefore our hypothesis that the second gap of Newell (1973) reflects differing evolutionary scenarios for blue horizontal branch stars and extended horizontal branch stars, primarily that subdwarf B stars result from common-envelope binary evolution.

In the single star evolution hypothesis, a strong stellar wind on the RGB is believed to occur, but it fails to remove the entire outer hydrogen envelope before the helium core flash takes place. After the helium flash, the star evolves to the horizontal branch. The distribution of stellar masses along the horizontal branch should then be continuous: provided the factors affecting mass loss in single stars (e.g., metallicity, rotation rate, magnetic field strength) are not discrete, evolutionary models predict no gaps.

In the binary star evolution scenario, most of the hydrogen-rich envelope is removed at the tip of the RGB, either by Roche lobe overflow or by a common envelope phase, so that evolution proceeds to the blue end of the horizontal branch. The distribution of post-common envelope binaries is not continuous, because partial removal of the hydrogen envelope does not occur.

The second feature of interest in Figure 5.2 is the cluster of stars at T_eff ≈ 44,000 K, log g = 5.7. The clump is also noticeable in the log(n_He/n_H)–T_eff plot in Figure 5.2 as the group of extremely helium-rich stars at log(n_He/n_H) ≈ 1.2. Heber et al. (2006), in a spectral analysis of sdO stars selected from the Supernova Ia Progenitor Survey, the Hamburg Quasar Survey and the SDSS, show a similar clustering at the same location in their log g–T_eff diagram.

The log(n_He/n_H)–T_eff diagram in Figure 5.2 shows that the majority of the stars in the sample have helium-deficient atmospheres (less than 0.5 times the solar abundance). This has been attributed to diffusion and gravitational settling processes at work in extended horizontal branch stars (Wesemael et al., 1982).

For 28,000 K ≤ T_eff ≤ 40,000 K, a correlation between helium abundance and T_eff can be seen, with the helium abundance increasing with temperature. The same phenomenon was reported by Edelmann et al. (2003) in their analysis of sdBs from the Hamburg Quasar Survey, and by Saffer et al. (1994) in a study of 92 field sdBs drawn largely from the PG catalogue. Both studies also report two sequences in the correlation, with a smaller fraction of stars having lower helium abundances than the bulk of the sdBs at the same temperatures. There is evidence for these two sequences in Figure 5.2. Heber et al. (2006) expand on this phenomenon by showing that the “cooler” sdO stars in their sample follow two distinct sequences, extending the trend to higher T_eff.

The band of stars evident at log(n_He/n_H) = −3 corresponds to the boundary of the model grid used in the analysis.
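The position of that boundary can be checked against Table 5.2. Assume that the tabulated n_He is the helium number fraction, with n_H = 1 − n_He and metals neglected. The grid limits 0.001 and 0.999 then correspond to log(n_He/n_H) ≈ −3.0 and +3.0. A small Python check, with a hypothetical function name:

```python
import math


def log_he_to_h(n_he):
    """log10(n_He / n_H) for a helium number fraction n_He, assuming n_H = 1 - n_He."""
    if not 0.0 < n_he < 1.0:
        raise ValueError("n_He must lie strictly between 0 and 1")
    return math.log10(n_he / (1.0 - n_he))


# Convert every n_He value of the model grid in Table 5.2.
grid_n_he = [0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 0.999]
for n in grid_n_he:
    print(f"{n:<6} -> {log_he_to_h(n):+.3f}")
```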

5.4.2 Classification

The neural network classification results for the 282 hot subdwarf candidates are shown in Figure 5.5. Although the neural network gives real-valued outputs for each classification parameter, these have been rounded to their closest value on the discrete Drilling et al. (2006) system to reflect how a human classifier would use the system.
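The rounding step amounts to snapping each real-valued output to the nearest permitted class value. In the Python sketch below, the numeric coding of the classes (0–9 for luminosity classes 0–IX) is an assumption for illustration, as is the function name. Ties go to the first candidate listed.

```python
LUMINOSITY_LABELS = ("0", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX")


def nearest_class(value, allowed):
    """Return the allowed class value closest to a real-valued network output.

    Ties are resolved in favour of the value that appears first in `allowed`.
    """
    return min(allowed, key=lambda c: abs(c - value))


# A luminosity-class output of 5.62 snaps to class 6, i.e. "VI".
print(LUMINOSITY_LABELS[nearest_class(5.62, range(10))])
```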

A correlation can be seen between luminosity class and spectral type, with luminosity class decreasing as spectral type progresses from O to A. The physical analogues of luminosity class and spectral type are log g and T_eff respectively, so this trend mirrors that found in the log g–T_eff plot of Figure 5.2.

The plot of helium class against spectral type shows that the stars in the sample are either helium-poor or helium-rich. A group of early-type sdBs shows a higher helium class than the bulk of such stars at the same spectral type. These are most likely the interesting subset of hot subdwarf stars known as He-sdBs (Jeffery et al., 1996; Ahmad, 2004).

Figure 5.5: Classification results for the 282 SDSS hot subdwarf candidates. Points have been given small random offsets in each axis for clarity. (Panels: luminosity class, 0–IX, against spectral type, O–A; helium class, 0–40, against spectral type.)

Figure 5.6: A comparison of the ANN classifications of the 282 SDSS hot subdwarf candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (right-most plots). Points have been given small random offsets in each axis for clarity. (Each side shows luminosity class, 0–IX, and helium class, 0–40, against spectral type, O–A.)

Figure 5.7: A calibration of the ANN classifications onto the Drilling et al. (2006) system using the 282 SDSS hot subdwarf candidates. (Panels: effective temperature, 10000–50000 K, against spectral type; log g, 2–7, against luminosity class; log(n_He/n_H), −3 to 3, against helium class, 0–40.)

Figure 5.6 compares the neural network classification results with the distribution of stars originally classified by Drilling et al. (2006) in their paper. The trends in the two distributions are similar once the differing sample sizes are taken into account.

One feature of interest in the luminosity class–spectral type plot of the Drilling et al. (2006) data is the group of high-luminosity B-type giant stars. These correspond to a group of MK stars used by Drilling et al. (2006) to link their hot subdwarf classification system to the MK system. In the corresponding plot for the 282 SDSS hot subdwarfs studied here, no such B-type stars of low luminosity class (i.e., high luminosity) are present in the sample.

A third-order calibration of the Drilling et al. (2006) classification system is shown in Figure 5.7. That is, the Drilling et al. (2006) parameters are correlated with their corresponding physical parameters using a sample of spectra that does not consist of the original standard stars, and that has not been classified by Drilling et al. or by any other human trained to use the Drilling et al. (2006) scale.

Although a linear correlation can be discerned between T_eff and spectral type, and between log(n_He/n_H) and helium class, the correlations are quite poor. This could be due to systematic noise introduced during the renormalisation of the SDSS data. It may also signify that the neural network has difficulty interpolating in regions not well represented by the original Drilling et al. (2006) training data: Figure 2.1 shows two low-density regions around spectral types O5 and B5, which is where the most “confusion” is seen in the correlation of Figure 5.7.

Despite the noise, the log(n_He/n_H) vs. helium class plot still follows the trend of Figure 14 of Drilling et al. (2006).
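How "quite poor" a linear correlation is can be quantified with a least-squares line and Pearson's correlation coefficient r. The sketch below assumes a hypothetical numerical coding of the spectral-type axis (O = 0, O5 = 5, B = 10, B5 = 15, A = 20), which is not necessarily the coding behind Figure 5.7; the example temperatures are made up for illustration.

```python
import numpy as np

# Hypothetical numeric coding of the spectral-type axis (illustration only).
SPECTRAL_TYPE_CODE = {"O": 0.0, "O5": 5.0, "B": 10.0, "B5": 15.0, "A": 20.0}

def linear_calibration(x, y):
    """Fit y = slope * x + intercept by least squares and return Pearson's r.

    |r| near 1 indicates a tight linear relation; values well below 1
    correspond to the poor correlations described in the text.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest power first
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Made-up spectral types and effective temperatures, for illustration only.
types = ["O", "O5", "B", "B5"]
teff = [52000.0, 41000.0, 36000.0, 27000.0]
codes = [SPECTRAL_TYPE_CODE[t] for t in types]
slope, intercept, r = linear_calibration(codes, teff)
print(f"T_eff = {slope:.0f} * type + {intercept:.0f}, r = {r:.3f}")
```

The same function applies unchanged to log(n_He/n_H) against helium class, since the helium class is already numeric.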

Between log g and luminosity class, no significant correlation can be seen. This is because the majority of subdwarfs reside in luminosity classes VI and VII, with log g values between 5.0 and 6.0. The seemingly bimodal distribution in this plot corresponds to the separation between the lower-T_eff, lower-log g BHB stars in the SDSS sample and the higher-T_eff, higher-log g subdwarfs. It is impossible to constrain a linear fit to the distribution because the lower-log g, higher-luminosity region is under-represented. The concentration of points in luminosity classes VI and VII reflects a similar pattern observed in Figure 15 of Drilling et al. (2006).

[Figure 5.8: histogram of stars per bin (0–25) against redshift, −600 to +600 km s⁻¹.]

Figure 5.8: The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates.

5.4.3 Radial Velocities

As an interesting aside, the radial velocities of the 282 hot subdwarf candidates, as measured by the SDSS, are plotted in Figure 5.8. The errors in the radial velocities are of the order of 30 km s⁻¹. Several studies of the kinematic behaviour of hot subdwarfs have been conducted in the past, e.g., Altmann et al. (2004), Maxted et al. (2001), de Boer et al. (1997) and Colin et al. (1994).

Altmann et al. (2004) point out that short-period sdB binaries could exhibit orbital velocities in excess of 200 km s⁻¹, although most are of the order of 50 km s⁻¹ or less.

Based on the parameterisation and classification results of the hot subdwarf sample studied here, it is clear that the majority of the sample are sdBs and, consequently, possibly short-period binaries (see also Maxted et al. 2001).

As the SDSS observes out of the Galactic plane, most of the hot subdwarf candidates are likely to be either thick-disk or halo objects, with greater radial velocities because their orbits do not conform to the local standard of rest (see Altmann et al. 2004). There are a few objects in the hot subdwarf sample with velocities |cz| > 400 km s⁻¹. Although these velocities are unverified and could be anomalous, they are greater than can be accounted for by the mechanisms outlined above. As such, they are of interest for further study (e.g., Hirsch et al. 2005).

5.5 Sources of Error

The results of this chapter are affected by a number of error sources. The issues of primary concern are systematic errors arising from the internal accuracy of the tools themselves, whether the training data for the tools are representative of the application domain, the assumptions used in generating the model spectra, random errors in the application spectra, and systematic errors introduced during the observation and reduction stages.

For the physical parameters derived using SFIT, SFIT produces standard errors for each parameter it fits, based on the curvature of the χ² function in the region of parameter space about the located minimum. These errors give an indication of the internal accuracy of the fitting method, while the χ² value itself gives an indication of the goodness of fit. At the boundaries of the grid, where the curvature is difficult to estimate, or in regions of low curvature, the standard errors may be a less useful measure of SFIT's internal uncertainty.
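The link between χ² curvature and a standard error can be made concrete. Near a minimum, χ² ≈ χ²_min + (p − p₀)²/s², so the second derivative is 2/s² and s = √(2/χ″). The sketch below estimates χ″ by central finite differences for one parameter with the others held fixed; this is an assumed, simplified scheme (it ignores parameter correlations) and is not SFIT's actual code.

```python
import numpy as np

def chi2(params, model, x, y, sigma):
    """Chi-squared of model(x, *params) against data y with 1-sigma errors."""
    resid = (y - model(x, *params)) / sigma
    return float(np.sum(resid ** 2))

def curvature_std_error(params, i, model, x, y, sigma, h=1e-3):
    """Standard error of parameter i from the chi^2 curvature at a minimum.

    Uses s = sqrt(2 / d2), where d2 is the second derivative of chi^2 with
    respect to parameter i. Other parameters are held fixed, so parameter
    correlations are ignored (a simplification).
    """
    p = np.asarray(params, dtype=float)
    step = np.zeros_like(p)
    step[i] = h
    d2 = (chi2(p + step, model, x, y, sigma)
          - 2.0 * chi2(p, model, x, y, sigma)
          + chi2(p - step, model, x, y, sigma)) / h ** 2
    if d2 <= 0.0:
        # Flat or non-convex chi^2 (e.g. at a grid edge): no meaningful error.
        return float("inf")
    return float(np.sqrt(2.0 / d2))

# Usage: a straight line evaluated at its true parameters, 0.1 flux errors.
def line(x, a, b):
    return a * x + b

x = np.linspace(0.0, 1.0, 50)
y = line(x, 2.0, 1.0)
sigma = np.full_like(x, 0.1)
print(curvature_std_error([2.0, 1.0], 0, line, x, y, sigma))
```

The `inf` branch mirrors the caveat in the text: where the χ² surface is flat or poorly sampled, the curvature-based error stops being informative.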

A major error source is the grid of theoretical models to which observations are fitted. The models used here assume a stellar atmosphere that is plane-parallel and in local thermodynamic equilibrium (LTE), radiative equilibrium and hydrostatic equilibrium. Opacities are modelled using opacity distribution functions, an approach that differs fundamentally from the methods used in stellar atmospheres that do not make the LTE assumption. It is known that the LTE approximation is good up to about 40,000 K, above which NLTE effects become more significant. There is also the question of whether additional physical effects, such as magnetic fields, need to be included.

Within SFIT itself, the assumption is made that changes in the physical parameters of a model have a corresponding linear effect on the flux distribution. It is known from theory that changes in the physical parameters have a nonlinear effect on the flux distribution, but a trade-off must be made between accuracy and efficiency, especially in a data mining context.
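The linearity assumption can be illustrated with a Gauss–Newton step: the model flux is approximated by its first-order Taylor expansion in the parameters, and the resulting linear least-squares problem is solved for a parameter update. This is a generic sketch of that technique, not SFIT's implementation; the finite-difference Jacobian, the function names and the toy exponential model are all assumptions made for illustration.

```python
import numpy as np

def jacobian(model, x, p, h=1e-6):
    """Forward-difference Jacobian d model / d p, shape (n_points, n_params)."""
    p = np.asarray(p, dtype=float)
    f0 = model(x, *p)
    J = np.empty((f0.size, p.size))
    for k in range(p.size):
        dp = np.zeros_like(p)
        dp[k] = h * max(1.0, abs(p[k]))
        J[:, k] = (model(x, *(p + dp)) - f0) / dp[k]
    return J

def linearised_step(model, x, y, sigma, p):
    """One Gauss-Newton update assuming the flux is linear in the parameters.

    Solves the weighted least-squares problem J dp ~ (y - model) for dp.
    """
    p = np.asarray(p, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    r = (y - model(x, *p)) / sigma
    A = jacobian(model, x, p) / sigma[:, None]
    dp, *_ = np.linalg.lstsq(A, r, rcond=None)
    return p + dp

def fit(model, x, y, sigma, p0, n_iter=20):
    """Repeat linearised steps; the nonlinearity is handled only by iterating."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        p = linearised_step(model, x, y, sigma, p)
    return p

# Usage: a toy 'flux' model that is deliberately nonlinear in k.
def decay(x, k):
    return np.exp(-k * x)

x = np.linspace(0.0, 2.0, 40)
print(fit(decay, x, decay(x, 1.3), np.ones_like(x), [1.0]))
```

For a model that really is linear in its parameters, a single step lands on the solution; for a nonlinear model each step costs only one Jacobian, which is the accuracy/efficiency trade-off described in the text.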

Other sources of error, such as those from the SDSS observation and reduction pipeline or from the hot subdwarf classification standards obtained from Drilling et al. (2006), are difficult to quantify. For the same reason, a discussion of errors arising from the models is a complicated topic and beyond the scope of this thesis. However, see, for example, Behara & Jeffery (2006) for an investigation of the influence of improving the opacities used in the models.

Nevertheless, the robustness of the results presented in this chapter (and of the conclusions drawn from them) is very important, and quantifying the influence of all the possible error sources requires further investigation.

5.6 Analysis of PCA Filter Efficiency

Figure 5.9 gives some examples of the BHB and white dwarf contaminants mentioned earlier in the chapter. In cases B, C and D, the differences between the original spectrum and its reconstruction are not sufficient to produce a reconstruction error greater than the chosen threshold of 6.4. In case A, the BHB star, the reconstruction matches the original spectrum very closely, except for a slight difference in Hδ. Physical parameters obtained for this star using SFIT show that it is too cool to be a subdwarf (T_eff = 12,000 K, log g = 3.42, n_He = 0.004).

The simple RMS error calculation of Equation 4.12 yields the scaled RMS difference between each flux point of the original spectrum and its PCA reconstruction. For differences as small as these, this error metric is clearly not sensitive enough to filter out the BHB and white dwarf contaminants. This limitation of the PCA filter could be diminished by further developing the reconstruction error calculation to include a weighting scheme that gives more significance to the spectral lines and features commonly found in the objects under investigation. A disadvantage of this approach is that the weighting scheme must be crafted and optimised manually to suit the quirks of the PCA filter and the spectral features of the target objects. A more robust error metric that does not require user input is a topic for future work.
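The weighting idea can be sketched as follows. The code builds a PCA basis by SVD, reconstructs a spectrum from its leading components, and computes a plain or weighted RMS difference. The scaling used in Equation 4.12 is not reproduced here, and the line list, window half-width and boost factor in `line_weights` are hypothetical choices of exactly the kind the text says would need manual tuning.

```python
import numpy as np

def fit_pca(train, n_components):
    """Mean spectrum and leading principal components (as rows) of training spectra."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruct(spectrum, mean, components):
    """Project a spectrum onto the components and rebuild it."""
    coeffs = components @ (spectrum - mean)
    return mean + coeffs @ components

def reconstruction_error(spectrum, mean, components, weights=None):
    """(Weighted) RMS difference between a spectrum and its PCA reconstruction.

    weights=None gives a plain, unweighted RMS.
    """
    diff = spectrum - reconstruct(spectrum, mean, components)
    w = np.ones_like(diff) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * diff ** 2) / np.sum(w)))

def line_weights(wave, centres, half_width=15.0, boost=5.0):
    """Hypothetical scheme: up-weight pixels within half_width (A) of each line."""
    w = np.ones_like(wave, dtype=float)
    for c in centres:
        w[np.abs(wave - c) <= half_width] = boost
    return w

# Usage with synthetic spectra on a 4050-4950 A grid at 1 A per pixel.
wave = np.arange(4050.0, 4951.0, 1.0)
# H-delta, H-gamma, He I 4471, He II 4686, H-beta (rest wavelengths in A).
weights = line_weights(wave, [4101.7, 4340.5, 4471.5, 4685.7, 4861.3])
rng = np.random.default_rng(1)
train = 1.0 + 0.01 * rng.normal(size=(100, wave.size))
mean, comps = fit_pca(train, 10)
print(reconstruction_error(train[0], mean, comps),
      reconstruction_error(train[0], mean, comps, weights))
```

With the weights, a residual confined to the Balmer and helium lines raises the error more than the same residual in the continuum, which is the extra sensitivity the text asks for.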

Quantitative Estimation of Filter Efficiency

To estimate the success (and failure) of the PCA filter as deployed in this chapter, the word “success” needs to be defined more clearly.

Based on the results plotted in Figure 5.2, it can be assumed that most subdwarfs in the SDSS sample lie, with good probability, in the region T_eff ≥ 23,000 K, log g ≥ 4.7, as shown in Figure 5.10.

For any chosen value of the reconstruction error threshold R, stars with a reconstruction error within R and parameters inside this region are assumed to be true positives, i.e., actual subdwarfs that the filter has successfully separated out. False positives are stars that are within R but lie outside this region, i.e., stars that the filter should have excluded but did not. True negatives lie both outside the shaded region and beyond the threshold R. Finally, false negatives lie within the shaded region but are outside the filter's error threshold.
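The four definitions above translate directly into boolean masks. In this sketch a star "passes" the filter when its reconstruction error is at most R, and the region limits are the T_eff ≥ 23,000 K, log g ≥ 4.7 cuts stated in the text; the function names and the example stars are hypothetical.

```python
import numpy as np

TEFF_MIN = 23000.0  # K; lower T_eff edge of the assumed subdwarf region
LOGG_MIN = 4.7      # lower log g edge of the same region

def in_subdwarf_region(teff, logg):
    """True for stars inside the shaded region of the log g-T_eff plane."""
    return (np.asarray(teff) >= TEFF_MIN) & (np.asarray(logg) >= LOGG_MIN)

def confusion_counts(recon_error, teff, logg, R):
    """Counts (TP, FP, TN, FN) for reconstruction-error threshold R.

    A star passes the filter when its reconstruction error is <= R.
    """
    passed = np.asarray(recon_error) <= R
    region = in_subdwarf_region(teff, logg)
    tp = int(np.sum(passed & region))    # passed and inside region
    fp = int(np.sum(passed & ~region))   # passed but outside region
    tn = int(np.sum(~passed & ~region))  # rejected and outside region
    fn = int(np.sum(~passed & region))   # rejected but inside region
    return tp, fp, tn, fn

# Hypothetical stars (errors, T_eff in K, log g), for illustration only.
print(confusion_counts([1.0, 2.0, 8.0, 9.0],
                       [30000.0, 12000.0, 30000.0, 12000.0],
                       [5.5, 3.4, 5.5, 3.4], R=6.4))  # (1, 1, 1, 1)
```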


[Figure 5.9 panels: normalised flux vs. wavelength (about 4100–4900 Å) for four contaminants (J220403.45+122507.3, J213301.41+122831.1, J135532.42+001124.0, J101805.04+011123.5), with reconstruction errors A = 2.30372, B = 2.78036, C = 3.34386, D = 3.90771.]

Figure 5.9: Examples of white dwarf and BHB contaminants. A: BHB star with deep Balmer lines. B: DA white dwarf with strong, broad Balmer lines due to high surface gravity. C: DB white dwarf. D: Uncertain (some evidence of weak carbon absorption, so possibly a DQ white dwarf).

[Figure 5.10: log g (2–7) vs. effective temperature (50,000–10,000 K), showing the ZAMS, ZAHB and He-MS, with the assumed subdwarf region shaded.]

Figure 5.10: The gray-shaded region of the log g–T_eff plane represents an area in which stars have a good probability of being subdwarfs.

Using these definitions, the PCA filter's efficiency can be stated quantitatively for any value of R. This assumes, of course, that every star passing through the filter has values for T_eff and log g. Estimates of these parameters for the SDSS sample were obtained by applying SFIT to the whole data set.

The quantitative measures used are the percentage rate of true positives (which measures how successful the PCA filter is, according to the aforementioned definition of “success”),

$$\mathrm{TPRate} = \frac{TP}{TP + FN} \times 100\% \qquad (5.1)$$

where TP is the number of true positives and FN the number of false negatives, and also the rate of false positives (which measures how often the filter fails),

$$\mathrm{FPRate} = \frac{FP}{FP + TN} \times 100\% \qquad (5.2)$$

where FP is the number of false positives, and TN is the number of true negatives.

[Figure 5.11: percentage (0–100) vs. reconstruction error threshold R (0–50), with curves for the TP rate, the FP rate and TP − FP.]

Figure 5.11: TP rates (red) and FP rates (blue) of the PCA filter as a function of the reconstruction error threshold, R. The green curve is the difference between the TP and FP rates.
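Equations 5.1 and 5.2 can be written as two small functions; the counts in the usage lines are hypothetical. Returning NaN when a denominator is zero is an added design choice, so that a threshold with no stars on one side does not raise an error.

```python
def tp_rate(tp, fn):
    """Equation 5.1: percentage of stars in the subdwarf region that pass the filter."""
    total = tp + fn
    return 100.0 * tp / total if total else float("nan")

def fp_rate(fp, tn):
    """Equation 5.2: percentage of stars outside the region that pass the filter."""
    total = fp + tn
    return 100.0 * fp / total if total else float("nan")

# Hypothetical counts, for illustration only.
print(tp_rate(tp=30, fn=10))  # 75.0
print(fp_rate(fp=5, tn=45))   # 10.0
```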

Figure 5.11 shows how the TP and FP rates vary as a function of R in the application to the SDSS data set. The rate of true positives increases rapidly until R ∼ 10, after which it begins to level off. The percentage of false positives increases slowly until R ∼ 5.5. From this point until R ∼ 13, the filter produces false positives at its maximum rate before the FP rate starts to level off. At R ∼ 28, the rate of false positives surpasses that of true positives, meaning that the filter now fails more often than it succeeds.

An idea of the optimum value for R can be obtained by plotting the difference between the rates of true positives and false positives for each R. This is the green curve in Figure 5.11, which shows a distinct peak. Figure 5.12 shows a close-up view of the region of this peak, which occurs at R ∼ 7.0. At this error threshold, the difference between the TP and FP rates of the PCA filter is at its largest; in other words, this is the optimum value of R for this particular application. This compares favourably with the chosen reconstruction error threshold of R ≤ 6.4 reported in Section 5.2.

[Figure 5.12: percentage (0–100) vs. reconstruction error threshold R (0–10), with curves for the TP rate, the FP rate and TP − FP.]

Figure 5.12: A closer examination of the TP and FP rates. The peak in the green TP−FP curve occurs at R ∼ 7.0 and signifies the optimum value for R in the SDSS sample.
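The threshold search can be sketched as a sweep over trial values of R, computing both rates at each value and keeping the R where TP rate − FP rate peaks. In ROC-curve terminology this difference is Youden's J statistic. The function names, the grid of trial thresholds and the example errors are hypothetical; `is_subdwarf` stands for membership of the T_eff/log g region defined above.

```python
import numpy as np

def sweep_threshold(recon_error, is_subdwarf, thresholds):
    """TP and FP rates (Equations 5.1 and 5.2) for each trial threshold R.

    A star passes the filter when its reconstruction error is <= R.
    """
    err = np.asarray(recon_error, dtype=float)
    truth = np.asarray(is_subdwarf, dtype=bool)
    n_pos, n_neg = truth.sum(), (~truth).sum()  # TP + FN and FP + TN
    tpr = np.array([100.0 * np.sum((err <= R) & truth) / n_pos for R in thresholds])
    fpr = np.array([100.0 * np.sum((err <= R) & ~truth) / n_neg for R in thresholds])
    return tpr, fpr

def optimal_threshold(recon_error, is_subdwarf, thresholds):
    """R at which TP rate - FP rate peaks (first occurrence if tied)."""
    thresholds = np.asarray(thresholds, dtype=float)
    tpr, fpr = sweep_threshold(recon_error, is_subdwarf, thresholds)
    return float(thresholds[np.argmax(tpr - fpr)])

# Hypothetical reconstruction errors and region membership.
errors = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
truth = [True, True, True, False, False, False]
print(optimal_threshold(errors, truth, np.arange(0.0, 15.0, 0.5)))  # 3.0
```

The resolution of the result is limited by the spacing of the trial thresholds, so a finer grid around the peak gives a correspondingly finer estimate of R.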

It should be pointed out that there does not seem to be a reliable method for determining the optimal threshold value of R for a filter and data set a priori, without first establishing at least a rough estimate of the physical or classification parameters. If the PCA filter (which is fast in operation) were paired with a parameterising neural network or a fast nearest-neighbour χ² fit, an estimate of the optimal PCA error threshold could be obtained using the same method as above.
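A nearest-neighbour χ² fit of the kind mentioned above can be sketched in a few lines: compute χ² between the observed spectrum and every model in a precomputed grid on the same wavelength sampling, and keep the best one. The array layout and the toy grid are assumptions for illustration. Parameters found this way could supply the rough T_eff and log g values needed to label stars as inside or outside the subdwarf region before sweeping R.

```python
import numpy as np

def nearest_neighbour_fit(flux, sigma, grid_flux, grid_params):
    """Grid model with the smallest chi^2 against an observed spectrum.

    flux, sigma: observed flux and 1-sigma errors, shape (n_pix,).
    grid_flux:   model fluxes on the same wavelength grid, shape (n_models, n_pix).
    grid_params: model parameters, e.g. columns (T_eff, log g), shape (n_models, n_par).
    Returns (best parameters, minimum chi^2).
    """
    chi2 = np.sum(((grid_flux - flux) / sigma) ** 2, axis=1)
    best = int(np.argmin(chi2))
    return grid_params[best], float(chi2[best])

# Usage with a toy three-model grid (random 'spectra', for illustration only).
rng = np.random.default_rng(0)
grid_flux = 1.0 + 0.1 * rng.normal(size=(3, 50))
grid_params = np.array([[20000.0, 5.0], [30000.0, 5.5], [40000.0, 6.0]])
observed = grid_flux[2] + 0.01 * rng.normal(size=50)
print(nearest_neighbour_fit(observed, np.full(50, 0.01), grid_flux, grid_params))
```

Because each fit is a single vectorised pass over the grid, the cost grows only linearly with grid size, which keeps it in the same speed class as the PCA filter.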



5.7 Summary

The tools developed in Chapters 2 to 4 have been deployed on a real-world data set with some interesting outcomes. The hot subdwarf candidates extracted from the SDSS represent a completely homogeneous set, and their analysis reveals several unexplained phenomena:

1. Existence of the second horizontal branch gap of Newell (1973) at T_eff ≈ 22,500 K.
2. Two sdB n_He–T_eff sequences, also observed by Edelmann et al. (2003).
3. A clustering of hot, helium-rich sdO stars at T_eff ≈ 44,000 K, log g = 5.7, also observed by Heber et al. (2006).

These results reiterate the challenge of providing evolutionary explanations for the variety of stars present on the extended horizontal branch, and hence the importance of continuing research into hot subdwarfs.


Chapter 6

Application II - Other Data Sets

The work presented in this chapter details the application of the analysis pipeline to three smaller data sets obtained in collaboration with others in the field. This reflects the situation described in Chapter 1 regarding the heterogeneous data sets amassed by various ground-based observatories. When data from these observatories are made available, robust tools will be needed to process them into a homogeneous form and to provide fast analyses.

6.1 2MASS-Selected Sample

A preliminary analysis of the 282 SDSS hot subdwarf candidates of the previous chapter was presented at the Second Meeting on Hot Subdwarfs and Related Objects in La Palma, June 2005. As a result of this conference, E. M. Green provided the author with a sample of high-S/N, low-resolution spectra selected from 2MASS¹ photometry (see Green et al. 2006) to be classified and parameterised with the tools developed in this thesis.

83 2MASS-selected spectra were made available, with an average S/N of about 133 but ranging from 70 to 273. The wavelength range covered is 3615–6900 Å at a resolving power of R ≈ 922.

¹ http://www.ipac.caltech.edu/2mass

Spectra for two known stars, Balloon 090900004 and BD+48 2721, were also supplied, along with physical parameters (T_eff, log g, log(n_He/n_H)) obtained using NLTE model atmospheres (H+He, zero metals). The purpose of these stars is to provide a temperature calibration for the hot and cool ends of the sdOB sequence, so that the parameterisation results obtained with SFIT (and LTE model atmospheres) can be compared with those derived from other model atmospheres.

All of the spectra had previously been flux and wavelength calibrated. Normalisation was carried out using a cubic spline fitting routine, and the spectra were then resampled onto a common wavelength grid of 4050–4950 Å at a sampling of 1 Å pixel⁻¹. Radial velocities were corrected for by cross-correlating each spectrum with a grid of 101 theoretical models varying coarsely over T_eff, log g and log(n_He/n_H).
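The resampling and radial-velocity steps might look like the following sketch. Linear interpolation, a normalised cross-correlation of line depths over a grid of trial velocities, and choosing the velocity from the template with the highest correlation peak are all assumptions; the thesis does not specify these details. Here the velocity correction is applied before the final resampling onto the 4050–4950 Å grid.

```python
import numpy as np

C_KM_S = 299792.458                            # speed of light, km/s
COMMON_GRID = np.arange(4050.0, 4951.0, 1.0)  # 4050-4950 A at 1 A per pixel

def resample(wave, flux, new_wave=COMMON_GRID):
    """Linearly interpolate a normalised spectrum onto the common grid."""
    return np.interp(new_wave, wave, flux)

def ccf_velocity(wave, flux, tmpl_wave, tmpl_flux, v_grid):
    """Trial velocity (km/s) maximising the normalised cross-correlation.

    Works on line depth (1 - flux) so that absorption lines dominate.
    Returns (best velocity, peak correlation).
    """
    obs = 1.0 - flux
    obs = obs - obs.mean()
    scores = np.empty(len(v_grid))
    for j, v in enumerate(v_grid):
        # Doppler-shift the template: lambda_obs = lambda_rest * (1 + v / c).
        t = 1.0 - np.interp(wave, tmpl_wave * (1.0 + v / C_KM_S), tmpl_flux)
        t = t - t.mean()
        scores[j] = obs @ t / (np.linalg.norm(obs) * np.linalg.norm(t))
    best = int(np.argmax(scores))
    return float(v_grid[best]), float(scores[best])

def correct_velocity(wave, flux, templates, v_grid):
    """Rest-frame wavelengths using the best-matching template's velocity.

    templates: iterable of (tmpl_wave, tmpl_flux) pairs, e.g. a coarse model grid.
    Returns (rest-frame wavelengths, adopted velocity in km/s).
    """
    v, _ = max((ccf_velocity(wave, flux, tw, tf, v_grid) for tw, tf in templates),
               key=lambda res: res[1])
    return wave / (1.0 + v / C_KM_S), v
```

A typical call would be `rest_wave, v = correct_velocity(wave, flux, templates, np.arange(-600.0, 601.0, 5.0))` followed by `resample(rest_wave, flux)`; the velocity resolution is set by the trial-grid step.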

During this pre-processing stage, two of the stars in the sample were found to be white dwarfs, so they were excluded from any further analysis. Application of the PCA filter of Chapter 4 was deemed unnecessary given the small sample size.

Analysis and Results

Classification and parameterisation of the final 83 stars were carried out using the classification neural network of Chapter 2, and SFIT with the same grid of models as in Chapter 5 (Table 5.2). Results are plotted in Figures 6.1 and 6.2, and tabulated in Appendix C.

The parameterisation results of the two calibration stars, Balloon 090900004 and BD+48 2721, are given in Table 6.1. The NLTE and LTE parameters differ for both stars; the temperatures of the cooler star, BD+48 2721, agree closely, whereas the hotter star, Balloon 090900004, shows a temperature difference of ∼9700 K. This is not unexpected considering the inherent differences between the LTE and NLTE approaches.

| Identifier | Parameter | NLTE | LTE |
|---|---|---|---|
| BD+48 2721 | T_eff (K) | 23017 (248) | 22979 (240) |
| | log g | 5.035 (0.028) | 5.267 (0.032) |
| | log(n_He/n_H) | -2.135 (0.022) | -1.629 (0.018) |
| Balloon 090900004 | T_eff (K) | 40897 (248) | 31147 (278) |
| | log g | 5.369 (0.022) | 4.757 (0.054) |
| | log(n_He/n_H) | -2.842 (0.046) | -1.811 (0.056) |

Table 6.1: Parameters of the two calibration stars as obtained by χ²-fitting to NLTE (Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given in parentheses.

The parameterisation results of Figure 6.1 show distributions with some similarity to those of the SDSS hot subdwarf candidates in Figure 5.2. The second gap of Newell (1973) seems to be present at T_eff ≈ 23,000 K (however, it is unclear whether Green's sample suffers from any selection effects). Some late-type B and A main-sequence stars appear to be present in the sample.

The log(n_He/n_H)–T_eff results in Figure 6.1 show the atmospheric helium deficiency of the sdB stars and the cluster of blue horizontal branch stars with normal helium abundances. The main-sequence stars in the sample can again be seen as the low-temperature, hydrogen-rich data points. Not enough sdB stars are present in the sample to confirm any correlation between helium abundance and T_eff, although the results appear to suggest such a correlation.

The distribution of classifications in Figure 6.2 again shows some similarity to that of the SDSS hot subdwarf candidates in Figure 5.5. Not plotted in Figure 6.2 are the late-A and early-F spectral classifications assigned to some stars by the neural network. The parameterisation results suggest that such stars exist in the sample, but it is of interest that the neural network distinguishes them and assigns them (unreliable) classes for which no samples were present in the training data. Figure 6.3 plots these stars. Their deep and broad hydrogen Balmer lines are consistent with the late-A and early-F spectral types. This would seem to demonstrate that the neural network has very good generalisation properties.


134 Chapter 6 - <strong>On</strong> <strong>the</strong> <strong>Automatic</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Stellar</strong> <strong>Spectra</strong><br />

[Figure 6.1 panels: log g against effective temperature (K), with the ZAMS, ZAHB and He-MS marked; log(n_He/n_H) against effective temperature (K).]

Figure 6.1: SFIT physical parameters for the 2MASS-selected sample. The helium main sequence of Paczyński (1971) and the post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.



[Figure 6.2 panels: luminosity class (0 to IX) against spectral type (O to A); helium class (0 to 40) against spectral type (O to A).]

Figure 6.2: ANN classification for the 2MASS-selected sample. Points have been given small random offsets in each axis for clarity.


[Figure 6.3: flux (continuum = 1, plus a constant offset) against wavelength (4100–4900 Å) for J143155.30+172404.9 (sdA7V:He3), J095854.23+360314.3 (sdF8VI:He2), J114454.50+031550.2 and J112832.64+603859.3 (classes sdA7V:He4 and sdF5V:He3), J111819.13+093144.4 (sdA2V:He5) and J083127.37+422201.7 (sdA5VI:He2).]

Figure 6.3: The stars assigned late-A and early-F spectral types by the neural network.



6.2 SDSS sdB-He Stars of Harris et al. (2003)

In collaboration with Ahmad (Ahmad et al., 2006), the classification neural network was used to classify a small set of “helium-rich” sdB-He stars obtained from the SDSS by Harris et al. (2003). Table 6.2 presents the results of this analysis, along with helium abundances that Ahmad derived using SFIT and a grid of LTE model atmospheres.

SDSS Identifier      n_He   ANN Class
J094044.08+004759    0.16   sdB0VIII:He23
J113840.69-003531    0.01   sdB3V:He1
J124346.38+002534    0.05   sdB1V:He23
J125410.86-010408    0.01   sdB3III:He5
J131745.80+010450    0.01   sdB0VI:He3
J134545.24-000641    0.15   sdO9VII:He21
J134635.68-001804    0.09   sdA2IV:He0
J135707.35+010454    0.36   sdO6VII:He30
J141556.68-005814    0.21   sdB8VI:He14
J143917.64+010251    0.01   sdB6V:He3
J144514.93+000249    0.02   sdB1VII:He11
J152708.31+003308    0.45   sdO9VIII:He35
J152905.62+002137    0.06   sdO9VII:He10
J154238.43-003758    0.07   sdA2III:He2

Table 6.2: Classification results for the sdB-He stars of Harris et al. (2003).

The aim of this work was to determine whether the sdB-He stars of Harris et al. (2003) are similar to He-sdB stars (see Ahmad 2004). If they are, the number of known helium-rich subdwarfs available for further study would increase.

However, the classification and parameterisation results show clearly that most of the sdB-He stars have very little helium enrichment. Half of the stars in the sample have surface gravities too low for them to be subdwarfs (Ahmad, private communication). Of the remaining subdwarfs, only a handful are helium-rich (i.e. have n_He ≥ 0.10, or a helium class > 20).
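The helium-richness criterion can be applied directly to the values in Table 6.2. The Python sketch below does this. It assumes that n_He is the helium number fraction with n_H ≈ 1 − n_He (metals neglected). Under that assumption the threshold can also be expressed as log(n_He/n_H), the quantity plotted in Figure 6.1. The helper names are illustrative only. No surface-gravity cut is applied, because the gravities are not listed in the table.

```python
import math
import re

# (SDSS identifier, n_He, ANN class) transcribed from Table 6.2.
TABLE_6_2 = [
    ("J094044.08+004759", 0.16, "sdB0VIII:He23"),
    ("J113840.69-003531", 0.01, "sdB3V:He1"),
    ("J124346.38+002534", 0.05, "sdB1V:He23"),
    ("J125410.86-010408", 0.01, "sdB3III:He5"),
    ("J131745.80+010450", 0.01, "sdB0VI:He3"),
    ("J134545.24-000641", 0.15, "sdO9VII:He21"),
    ("J134635.68-001804", 0.09, "sdA2IV:He0"),
    ("J135707.35+010454", 0.36, "sdO6VII:He30"),
    ("J141556.68-005814", 0.21, "sdB8VI:He14"),
    ("J143917.64+010251", 0.01, "sdB6V:He3"),
    ("J144514.93+000249", 0.02, "sdB1VII:He11"),
    ("J152708.31+003308", 0.45, "sdO9VIII:He35"),
    ("J152905.62+002137", 0.06, "sdO9VII:He10"),
    ("J154238.43-003758", 0.07, "sdA2III:He2"),
]


def helium_class(ann_class):
    """Numeric helium class from a label such as 'sdB0VIII:He23'."""
    match = re.search(r":He(\d+)$", ann_class)
    if match is None:
        raise ValueError(f"no helium class in {ann_class!r}")
    return int(match.group(1))


def log_he_to_h(n_he):
    """log10(n_He/n_H), assuming n_H = 1 - n_He (metals neglected)."""
    return math.log10(n_he / (1.0 - n_he))


def is_helium_rich(n_he, ann_class):
    """Criterion from the text: n_He >= 0.10 or helium class > 20."""
    return n_he >= 0.10 or helium_class(ann_class) > 20


helium_rich = [sid for sid, n_he, cls in TABLE_6_2 if is_helium_rich(n_he, cls)]
print(helium_rich)
print(f"n_He = 0.10 corresponds to log(n_He/n_H) = {log_he_to_h(0.10):.2f}")
```

Applied to Table 6.2 without any gravity cut, the criterion selects six of the fourteen stars. J124346.38+002534 qualifies on helium class alone and J141556.68-005814 on n_He alone, which shows that the two indicators do not always agree.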


6.3 Ahmad & Jeffery (2003) He-sdBs

Ahmad & Jeffery (2003) undertook the first systematic study of a set of helium-rich subdwarf B stars, obtaining observations and physical parameters for 17 targets. Drilling et al. (2006) had previously classified these stars using observations from different sources. Re-classifying them with the neural network of Chapter 2, using the new observations of Ahmad & Jeffery (2003), therefore provides an opportunity to verify the network’s performance.

Ahmad & Jeffery (2003) observed the targets over a variety of wavelength ranges between 3900 and 5000 Å. The spectra were bias-corrected, flat-fielded, sky-subtracted and wavelength-calibrated using standard procedures. All spectra were normalised by defining a smooth polynomial continuum from sections of local continuum, taking care to avoid the wings of broad absorption lines.

Before the spectra were passed to the neural network, they were rebinned onto the common wavelength grid of 4050–4950 Å at a sampling of 1 Å per pixel. Some wavelength bins in this grid had no flux data in the original observations (i.e., where a spectrum was too short to cover the full grid). These bins were automatically assigned a flux value of 1.0, the normalised continuum level.
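The rebinning step can be sketched with NumPy’s linear interpolation. This is a minimal illustration that assumes linear interpolation between the original flux samples, since the text does not say which scheme was used. Bins outside the observed range receive the continuum value 1.0, as described above. Gaps inside the observed range are interpolated across, which may differ from the original procedure. The function name `rebin_to_common_grid` is hypothetical.

```python
import numpy as np

# Common grid used for classification: 4050-4950 Å at 1 Å per pixel (901 bins).
COMMON_GRID = np.arange(4050.0, 4951.0, 1.0)


def rebin_to_common_grid(wavelength, flux, grid=COMMON_GRID, fill=1.0):
    """Resample a continuum-normalised spectrum onto `grid`.

    Linear interpolation is assumed. Grid points outside the observed
    wavelength range are set to `fill`, the normalised continuum level.
    Gaps inside the observed range are interpolated across, not filled.
    """
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)
    order = np.argsort(wavelength)  # np.interp needs increasing x values
    return np.interp(grid, wavelength[order], flux[order], left=fill, right=fill)


# A short spectrum covering only 4300-4700 Å.
short_wl = np.arange(4300.0, 4700.5, 0.5)
short_flux = np.full_like(short_wl, 0.9)
rebinned = rebin_to_common_grid(short_wl, short_flux)
```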

Table 6.3 presents the results, and Figure 6.4 compares the neural network classifications graphically with those of Drilling et al. (2006). The sample covers only a limited region of the classification parameter space. Within that region, however, the neural network and Drilling et al. (2006) agree well, which confirms the work presented in Chapter 2.
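The regression line and RMS error bars in Figure 6.4 can be outlined with an ordinary least-squares fit and the RMS of its residuals. The sketch below shows one way such a fit might be computed (here with `numpy.polyfit`). It is not the code used for the figure, and the class values in the example are synthetic.

```python
import numpy as np


def fit_with_rms(reference, predicted):
    """Least-squares line predicted ~ slope * reference + intercept.

    Returns (slope, intercept, rms), where rms is the root-mean-square
    of the residuals about the fitted line.
    """
    x = np.asarray(reference, dtype=float)
    y = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest power first
    residuals = y - (slope * x + intercept)
    rms = float(np.sqrt(np.mean(residuals ** 2)))
    return float(slope), float(intercept), rms


# Synthetic check: an exact linear relation leaves no residual scatter.
slope, intercept, rms = fit_with_rms([10, 20, 30, 40], [12, 22, 32, 42])
```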

6.4 Summary

The application of the analytical tools developed in previous chapters to a collection of small data sets from different sources highlights their versatility and usefulness.



[Figure 6.4 panels: ANN helium class against Drilling helium class (10–40); ANN luminosity class against Drilling luminosity class (IV–IX); ANN spectral type against Drilling spectral type (O5–B5).]

Figure 6.4: Comparison of ANN classifications with those of Drilling et al. (2006) for the 17 He-sdBs of Ahmad & Jeffery (2003). Points have been given small random offsets in each axis for clarity. Also plotted is the best-fit least-squares regression line, with error bars showing the RMS of the residuals.


Identifier    Drilling Class     ANN Class
HS1000+471    sdBC0.2VII:He28    sdB0VII:He29
HS1844+637    sdB1VII:He39       sdB2VII:He37
LSIV-14 116   sdB0.2VII:He17     sdB0VIII:He20
PG0229+064    sdB3V:He13         sdB4V:He18
PG0240+046    sdBC0.2VII:He24    sdB2VII:He28
PG0902+057    sdB0VII:He38       sdO9VII:He35
PG1127+019    sdOC9VII:He40      sdO8VI:He41
PG1415+492    sdBC1VI:He39       sdB0VI:He38
PG1544+488    sdBC1VIII:He39     sdB0VII:He37
PG1554+408    sdB0.2VII:He39     sdB0VII:He36
PG1600+171    sdOC8.5VII:He39    sdO8VI:He37
PG1615+413    sdB1VII:He37       sdB2VII:He34
PG1658+273    sdOC9.5VII:He39    sdO8VII:He40
PG1715+273    sdB1VII:He37       sdO5VIII:He36
PG2258+155    sdB0.2VII:He39     sdB1VII:He35
PG2321+214    sdB0VII:He37       sdB2VII:He37
TON107        sdBC0.5VII:He28    sdB1VII:He27

Table 6.3: Classification results for the Ahmad & Jeffery (2003) He-sdBs.

The results for the 2MASS-selected sample appear to confirm the findings of Green et al. (2006) and lend support to the results described in the previous chapter. Additional data, e.g. stellar masses, need to be gathered before the evolutionary processes behind the observed distributions can be understood.

The application of the classification neural network to the helium-rich subdwarf B stars of Harris et al. (2003) highlights the need for a homogeneous classification scheme for hot subdwarfs.


Chapter 7

Conclusions and Future Work

This project set out to examine the problem of analysing large sets of astronomical spectra. Specifically, the intention was to establish a set of tools that can automatically extract and analyse the spectra of any type of object from a large database of unknown observations, and then to apply these tools to a real survey database.

Analysing large sets of astronomical spectra involves three core problems: classification, physical parameterisation, and the extraction of particular types of object from an unknown data set.

In this project, classification was tackled with artificial neural networks, a highly versatile statistical machine learning method that has seen widespread use in astronomy. Chapter 2 studied the use of ANNs to classify hot subdwarf spectra on the system defined by Drilling et al. (2006). The global errors (σ_rms) achieved were ∼2 subtypes in spectral type, ∼1 subclass in luminosity class, and ∼4 subclasses in helium class. These errors are in line with the accuracies achieved by human classifiers.
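The formula below is offered for reference only. It assumes that σ_rms here denotes the usual root-mean-square difference between the network’s classes and the reference classes over the N spectra used in the cross-validation, evaluated separately for spectral type, luminosity class and helium class. The symbols C and N are introduced only for this illustration.

```latex
\sigma_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_i^{\mathrm{ANN}} - C_i^{\mathrm{ref}}\right)^{2}}
```

Here C_i^ANN is the class assigned by the network to spectrum i, and C_i^ref is its reference class on the Drilling et al. (2006) system.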

Physical parameters were obtained by fitting observations to grids of theoretical models using a χ² minimisation procedure. SFIT is the χ² minimisation code used at the Armagh Observatory. Chapter 3 improved it using concepts from computational geometry, providing a new methodology for storing and accessing arbitrarily large three-dimensional grids of models. This paves the way for extending the code to operate in distributed parallel computing environments.
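The basic χ² grid search behind this kind of fitting can be sketched as follows. This is a simplified illustration, not SFIT. It assumes the model spectra are already sampled on the observed wavelength grid and that a per-pixel noise estimate is available. It returns only the best-fitting grid point, without interpolating between grid points, and none of the Chapter 3 grid machinery is reproduced. The parameter tuples and the single-line `toy_model` are hypothetical.

```python
import numpy as np


def chi_squared(observed, model, sigma):
    """Standard chi^2 between observed and model fluxes with errors sigma."""
    return float(np.sum(((observed - model) / sigma) ** 2))


def best_grid_point(observed, sigma, grid):
    """Return (parameters, chi2) for the grid model with the smallest chi^2.

    `grid` maps a parameter tuple, e.g. (Teff, log g), to a model flux
    array sampled on the same wavelengths as `observed`.
    """
    best_params, best_chi2 = None, np.inf
    for params, model in grid.items():
        chi2 = chi_squared(observed, model, sigma)
        if chi2 < best_chi2:
            best_params, best_chi2 = params, chi2
    return best_params, best_chi2


# Toy usage with a hypothetical single-line model (not a stellar atmosphere).
wavelength = np.linspace(4000.0, 5000.0, 501)


def toy_model(teff, logg):
    depth = 0.2 + 0.01 * (40000.0 - teff) / 1000.0  # deeper line when cooler
    width = 2.0 + 3.0 * (logg - 5.0)                 # broader line at higher log g
    return 1.0 - depth * np.exp(-0.5 * ((wavelength - 4471.0) / width) ** 2)


grid = {(t, g): toy_model(t, g)
        for t in (30000.0, 35000.0, 40000.0) for g in (5.0, 5.5, 6.0)}
observed = toy_model(35000.0, 5.5)
sigma = np.full_like(wavelength, 0.01)
params, chi2 = best_grid_point(observed, sigma, grid)
```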

Spectra of a particular type of object were located in a large set of unknown observations using the multivariate statistical technique of Principal Components Analysis. Chapter 4 outlined the mechanics of the filter and demonstrated how it was used to extract hot subdwarf spectra from a data set obtained from the SDSS. This solution provides a means of reducing unknown data sets to a size suitable for closer visual inspection.
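The idea behind a PCA filter of this kind can be illustrated briefly. Principal components are derived from a set of spectra of the target class. An unknown spectrum then passes the filter when its reconstruction from the leading components leaves only a small residual. This minimal NumPy sketch rests on assumed choices: a mean-subtracted SVD, k components, an RMS reconstruction error and a fixed threshold. It is not the filter of Chapter 4, and the toy single-line templates are hypothetical.

```python
import numpy as np


def fit_pca(templates, k):
    """Mean spectrum and first k principal components of `templates` (one spectrum per row)."""
    mean = templates.mean(axis=0)
    # Rows of vt are orthonormal principal directions, largest variance first.
    _, _, vt = np.linalg.svd(templates - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(spectrum, mean, components):
    """RMS difference between a spectrum and its projection onto the components."""
    coeffs = components @ (spectrum - mean)
    reconstruction = mean + coeffs @ components
    return float(np.sqrt(np.mean((spectrum - reconstruction) ** 2)))


def passes_filter(spectrum, mean, components, threshold):
    """True if the spectrum is well described by the template components."""
    return reconstruction_error(spectrum, mean, components) < threshold


# Toy templates: one absorption line at 4471 Å with varying depth.
wl = np.linspace(4000.0, 5000.0, 400)


def line(centre, depth):
    return 1.0 - depth * np.exp(-0.5 * ((wl - centre) / 5.0) ** 2)


templates = np.array([line(4471.0, d) for d in np.linspace(0.1, 0.5, 20)])
mean, comps = fit_pca(templates, k=1)  # one component suffices for this toy set
similar = line(4471.0, 0.33)
different = line(4861.0, 0.33)
```

A spectrum resembling the templates is reconstructed almost exactly, while one with a line at a different wavelength leaves a large residual.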

Collectively, these tools were applied to the archives of the SDSS to extract and analyse the spectra of hot subdwarf stars. The PCA filter reduced a set of almost 7000 unknown spectra to a collection of approximately 400 samples. From these, 282 hot subdwarf candidates were quickly extracted by visual inspection. The classification ANN successfully assigned classes to these stars on the Drilling et al. (2006) system, and physical parameters were derived using SFIT and a grid of LTE model atmospheres. The results revealed several unexplained phenomena of extended horizontal branch stars, namely:

1. The existence of the second horizontal branch gap of Newell (1973) at T_eff ≈ 22,500 K.

2. Two sdB n_He–T_eff sequences, also observed by Edelmann et al. (2003).

3. A clustering of hot, helium-rich sdO stars at T_eff ≈ 44,000 K, log g = 5.7, also observed by Heber et al. (2006).

These findings pose important questions for stellar evolution theory, and represent a successful demonstration of what this project set out to achieve.



Future Directions

Working with the data from the SDSS highlighted a number of improvements that could be made to the tools themselves. It also made apparent several important problems concerning spectral analysis and its large-scale application.

Continuum Normalisation

One of the most troubling problems was the normalisation of stellar continua. As noted in Chapter 5, the SDSS uses a method based on median/mean filtering, which tends to underfit the continuum in regions where line blending becomes very strong.

Chapter 5 employed an automatic renormalisation method based on cubic spline fitting in an attempt to obtain a more precise fit to the continuum. This method used several sets of pre-programmed wavelength locations as control points for the cubic spline fit. The control points in each set were chosen manually by iterative refinement. The different sets essentially amounted to a coarse temperature–abundance classification system, because hot, helium-rich stars and cooler, helium-poor stars needed different control points.
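A continuum fit of this kind involves three steps. First, the flux at each pre-chosen control wavelength is estimated, here as the median within a ±5 Å window. Second, a cubic spline is passed through those points. Third, the spectrum is divided by the spline. The sketch below implements a natural cubic spline directly in NumPy so that it is self-contained. The window half-width, the natural boundary condition and the control wavelengths are all assumptions, not the values used in Chapter 5.

```python
import numpy as np


def natural_cubic_spline(xk, yk, x):
    """Evaluate the natural cubic spline through knots (xk, yk) at points x.

    xk must be strictly increasing with at least two knots.
    """
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(xk) - 1
    h = np.diff(xk)
    # Linear system for the second derivatives m; natural ends: m[0] = m[n] = 0.
    a = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    a[0, 0] = a[n, n] = 1.0
    for i in range(1, n):
        a[i, i - 1] = h[i - 1]
        a[i, i] = 2.0 * (h[i - 1] + h[i])
        a[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    m = np.linalg.solve(a, rhs)
    # Interval index for each x; points beyond the end knots use the end intervals.
    i = np.clip(np.searchsorted(xk, x) - 1, 0, n - 1)
    left, right = x - xk[i], xk[i + 1] - x
    return ((m[i] * right ** 3 + m[i + 1] * left ** 3) / (6.0 * h[i])
            + (yk[i] / h[i] - m[i] * h[i] / 6.0) * right
            + (yk[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * left)


def normalise_continuum(wavelength, flux, control_points, half_width=5.0):
    """Divide flux by a spline through median fluxes around the control points."""
    knot_flux = [np.median(flux[np.abs(wavelength - c) <= half_width])
                 for c in control_points]
    continuum = natural_cubic_spline(control_points, knot_flux, wavelength)
    return flux / continuum, continuum


# Toy spectrum: sloping continuum times one absorption line at 4471 Å.
wl = np.arange(4000.0, 5001.0, 1.0)
true_continuum = 2.0 + 0.001 * (wl - 4000.0)
flux = true_continuum * (1.0 - 0.5 * np.exp(-0.5 * ((wl - 4471.0) / 3.0) ** 2))
controls = [4050.0, 4200.0, 4350.0, 4600.0, 4800.0, 4950.0]  # hypothetical control points
normalised, continuum = normalise_continuum(wl, flux, controls)
```

The weakness discussed in the text is visible here. The result is only as good as the choice of control points, which must avoid line wings, and that choice depends on the type of star.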

Once the sets of control points were established, the method gave good results for the final set of hot subdwarf candidates. However, this methodology is poorly suited to a general data mining application, because it is tied to one particular type of object.

A more robust and general automatic algorithm is required.

However, this is an extremely difficult problem, because such an algorithm must take many factors into account. These include noise, regions where the spectral flux changes rapidly, cosmic-ray spikes and other anomalies. Another is troublesome regions such as that of the higher-order Balmer lines, where the true continuum lies above all of the observed flux. An acceptable solution will be very hard to come by.


Data Management

Another major problem encountered was the management of large data sets. The two main issues are storing sets of spectra in a meaningful and easily accessible manner, and keeping track of the changes made to each spectrum over time.

Almost 7000 unique spectra were extracted from the SDSS in Chapter 5. Over the course of the analysis, the spectra were converted from FITS files to ASCII format, filtered, renormalised, velocity-corrected, resampled, and collected together into the specific formats required by the classification and parameterisation codes. This trail of data was replicated into different folders and files across the computer’s file system, and it eventually became cumbersome to manage and keep track of. There was also an unfortunate incident in which a mistyped command accidentally deleted several very important folders of data.

When the analysis of the 282 hot subdwarfs was complete, the results were stored in several ASCII-format files. These had to be processed manually in order to correlate the classifications of the ANN with the parameters found by SFIT. The result was several such files in different folders, with no attached information to say when the results were obtained, from what data set, using which models, and with which ANN.

Both of these issues highlight the need for a centralised database that can keep track of the changes made to the data as an analysis proceeds. The same idea is already widely used in tools that help manage computer software projects (e.g., CVS¹). These tools record all the changes made to each individual source file, allowing the changes to be rolled back to any previous version should something go awry. Auditing analyses of astronomical spectra in this manner would bring data integrity. It would also provide a trail of the operations conducted on the data, which could be analysed in detail later should a methodology need to be checked for errors.

A centralised database would also allow structured metadata to be recorded. These would cover the dates and times of analyses, the tools used and their version numbers, the theoretical models used, the date they were generated, the codes and atomic data used to generate them, and so on. Such metadata would prove invaluable if, for example, an analysis is revised at a later date.

¹ http://www.nongnu.org/cvs/
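As an illustration of the audit trail described above, the sketch below uses Python’s built-in `sqlite3` module. It records every operation applied to a spectrum together with the tool, version and parameters, so that the processing history can be queried later. The table layout, column names, tool names and version strings are hypothetical placeholders. A working system (or an existing version-control tool such as CVS) would need considerably more, for example storage of the data themselves or content hashes.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS operations (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    spectrum   TEXT NOT NULL,   -- e.g. an SDSS identifier
    operation  TEXT NOT NULL,   -- e.g. 'renormalise', 'classify'
    tool       TEXT NOT NULL,
    version    TEXT NOT NULL,
    parameters TEXT NOT NULL,   -- JSON-encoded settings
    performed  TEXT NOT NULL    -- ISO 8601 UTC timestamp
);
"""


def open_log(path=":memory:"):
    """Open (or create) the audit-trail database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, spectrum, operation, tool, version, **parameters):
    """Append one processing step to the audit trail."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO operations (spectrum, operation, tool, version, parameters, performed)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (spectrum, operation, tool, version,
             json.dumps(parameters, sort_keys=True),
             datetime.now(timezone.utc).isoformat()),
        )


def history(conn, spectrum):
    """All recorded steps for one spectrum, oldest first."""
    rows = conn.execute(
        "SELECT operation, tool, version, parameters FROM operations"
        " WHERE spectrum = ? ORDER BY id", (spectrum,))
    return [(op, tool, ver, json.loads(params)) for op, tool, ver, params in rows]


conn = open_log()  # in-memory database for the example
star = "J094044.08+004759"
record(conn, star, "renormalise", "spline-continuum", "0.1", control_set="hot")
record(conn, star, "classify", "ann-classifier", "0.1", network="net-A")
steps = history(conn, star)
for step in steps:
    print(step)
```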

Finally, storing results alongside the data in a homogeneous database would greatly simplify several tasks. These include producing plots for publication, applying clustering algorithms to search automatically for patterns in the results, and cross-correlating the database with other databases accessible over the internet.

Data Visualisation

When dealing with large quantities of data, one extremely useful tool is interactive visualisation. Representing data graphically in useful ways, and manipulating them through that visualisation, facilitates the process of discovery and understanding. When the SDSS data were analysed in Chapter 5, the final hot subdwarf sample was selected manually from the PCA filtering results. This stage would have proceeded much more quickly with a good visualisation tool in place.

In this project, extensive use was made of Gnuplot² to visualise spectra. Gnuplot is an excellent plotting tool, but it is not designed for interactive investigation of the data being plotted. To visualise the SDSS data, Gnuplot was therefore invoked from a script to produce thousands of plots, which were then displayed in a series of static web pages. This is clearly awkward, and it adds another layer of data management to the problems mentioned previously. A better solution is badly needed.

² http://www.gnuplot.info/


Algorithm Development

Working with the main analytical tools used in this project showed that they could be improved in several ways. The errors quoted for the neural network classifications in Chapter 2 are global estimates based on the leave-one-out cross-validation that was carried out. Proper confidence intervals for each individual result produced by the ANN would be far more useful. Such confidence intervals can be obtained through the bootstrap statistical technique (e.g., Willemsen et al., 2005) or through Bayesian methods (see Bishop, 1995, sect. 10.2).
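The bootstrap idea can be illustrated with a short sketch. It resamples a training set with replacement, refits a model on each resample, and takes percentiles of the resulting predictions for one input as an approximate confidence interval. For brevity, a straight-line least-squares fit stands in for the neural network. Retraining an ANN on each resample would follow the same pattern at much greater cost. The function name, the number of resamples and the 68% level are assumptions.

```python
import numpy as np


def bootstrap_interval(x_train, y_train, x_new, n_boot=1000, level=68.0, seed=0):
    """Percentile-bootstrap estimate and interval for a model prediction at x_new.

    A straight-line least-squares fit stands in for the trained model; each
    replicate refits it to a training set resampled with replacement.
    Returns (median, low, high).
    """
    rng = np.random.default_rng(seed)
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    n = len(x_train)
    predictions = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n training pairs with replacement
        slope, intercept = np.polyfit(x_train[idx], y_train[idx], 1)
        predictions[b] = slope * x_new + intercept
    tail = (100.0 - level) / 2.0
    low, high = np.percentile(predictions, [tail, 100.0 - tail])
    return float(np.median(predictions)), float(low), float(high)


# Synthetic training data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)
estimate, low, high = bootstrap_interval(x, y, x_new=5.0)
```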

The SFIT model grid indexing and searching methodology of Chapter 3 works well for two- and three-dimensional grids. That chapter stated that higher-dimensional grids are unlikely to be used because of the curse of dimensionality. However, four- or possibly five-dimensional grids may not be out of the question as computer technology continues to improve. In theory, the Delaunay triangulation methodology of Chapter 3 could be extended to higher-dimensional geometries. A different approach (perhaps the k-D tree-based algorithm discussed in that chapter) may, however, be more flexible and less complicated.
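To make the k-D tree alternative concrete, here is a minimal k-d tree for nearest-neighbour lookup of model grid points in any number of dimensions. It is a textbook construction (median split, cycling through the axes), not the algorithm discussed in Chapter 3. It assumes the parameter axes have been scaled to comparable units before distances are compared. The random four-dimensional grid is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class KDNode:
    point: np.ndarray
    axis: int
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points, depth=0):
    """Build a k-d tree from an (n, k) array by splitting at the median."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]  # cycle through the parameter axes
    points = points[np.argsort(points[:, axis])]
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build_kdtree(points[:mid], depth + 1),
                  build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return the stored point closest (Euclidean distance) to `target`."""
    if node is None:
        return best
    if best is None or np.sum((node.point - target) ** 2) < np.sum((best - target) ** 2):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # The far side can only hold a closer point if the splitting plane is closer.
    if diff ** 2 < np.sum((best - target) ** 2):
        best = nearest(far, target, best)
    return best


# Hypothetical 4-D grid with axes already scaled to [0, 1].
rng = np.random.default_rng(42)
grid = rng.random((500, 4))
query = np.array([0.3, 0.7, 0.1, 0.5])
match = nearest(build_kdtree(grid), query)
```

Unlike a triangulation, the same code handles any number of parameter axes. Its efficiency still degrades as the dimensionality grows.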

As it stands, SFIT, with the modifications of Chapter 3, is a flexible and robust tool for spectral parameterisation. The next step is to introduce parallel programming constructs so that it can be used in a distributed computing environment. Examples are the computing cluster at the Armagh Observatory (see Appendix D) or the Grid. Programmatically, this is not a very difficult task, but it does require some planning.

The Principal Components Analysis filtering tool of Chapter 4 worked well when applied to hot subdwarf spectra. A visual selection process is still required on the final filtered data set, because precise filtering is a hard problem. Nevertheless, future work could improve the efficiency of the PCA filter, perhaps by devising a new reconstruction error calculation that is more sensitive to the finer details of astronomical spectra.

In Chapters 4 and 5, the hot subdwarf filter was applied to data sets obtained from the SDSS. There, the filter could have worked better if more weight had been given to differences in the cores and wings of spectral lines. The user would then have to supply some sort of line list, giving the wavelengths (and perhaps equivalent widths) of the spectral lines to which the error calculation should pay attention. However, a little effort spent in preparation could save a lot of time at the visual inspection stage.
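One possible form of the weighted error suggested here is sketched below. A weight vector is enhanced within a window around each wavelength in a user-supplied line list. That vector is then used in the reconstruction-error sum, so that discrepancies in line cores and wings count for more than those in the continuum. The line list, window width and weight value are illustrative assumptions only.

```python
import numpy as np


def line_weights(wavelength, line_list, half_width=10.0, line_weight=10.0):
    """Per-pixel weights: 1 in the continuum, `line_weight` within half_width of a listed line."""
    weights = np.ones_like(wavelength, dtype=float)
    for centre in line_list:
        weights[np.abs(wavelength - centre) <= half_width] = line_weight
    return weights


def weighted_reconstruction_error(spectrum, reconstruction, weights):
    """Weighted RMS difference between a spectrum and its reconstruction."""
    residual = spectrum - reconstruction
    return float(np.sqrt(np.sum(weights * residual ** 2) / np.sum(weights)))


# Example: a weak He I 4471 line missing from an otherwise flat reconstruction.
wl = np.arange(4000.0, 5001.0, 1.0)
reconstruction = np.ones_like(wl)
spectrum = 1.0 - 0.05 * np.exp(-0.5 * ((wl - 4471.0) / 2.0) ** 2)
lines = [4026.0, 4471.0, 4686.0, 4861.0]  # illustrative He I, He II and H-beta wavelengths
w = line_weights(wl, lines)
plain = weighted_reconstruction_error(spectrum, reconstruction, np.ones_like(wl))
weighted = weighted_reconstruction_error(spectrum, reconstruction, w)
```

With uniform weights the missing line is diluted by about a thousand continuum pixels. The line-weighted error makes the same discrepancy noticeably larger.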

The tools used in this project were chosen for their previous successful applications to the analysis of astronomical spectra, but many other machine learning techniques could potentially be employed (see Russell & Norvig, 2003). Some algorithms can take an unknown data set and automatically derive classes for it from the information present in the data. Examples are the Kohonen self-organising map (Kohonen, 1990; Kohonen et al., 1996) and Bayesian probabilistic methods such as those embodied in the AutoClass program³. This makes them of particular interest for filtering and classification problems, and investigating their abilities in this regard would be a worthwhile project.
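For orientation, a very small self-organising map is sketched below. A one-dimensional chain of prototypes is initialised along the first principal axis, which is a common choice. The prototypes are then pulled towards randomly drawn training samples, with a Gaussian neighbourhood and a learning rate that both shrink over time. Finally, each sample is assigned to its best-matching unit. This is a generic version of Kohonen’s algorithm with arbitrarily chosen schedules, not a tested tool for spectral classification, and the two toy groups are synthetic.

```python
import numpy as np


def train_som(data, n_units=2, n_iter=2000, lr0=0.5, seed=0):
    """Train a one-dimensional self-organising map on the rows of `data`.

    Prototypes are initialised along the first principal axis; the learning
    rate decays linearly and the Gaussian neighbourhood width shrinks
    exponentially. Returns an (n_units, n_features) array of prototypes.
    """
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    spread = s[0] / np.sqrt(len(data))  # standard deviation along the first axis
    prototypes = mean + np.outer(np.linspace(-1.0, 1.0, n_units), spread * vt[0])
    positions = np.arange(n_units)
    sigma0 = max(n_units / 2.0, 1.0)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (0.01 / sigma0) ** frac
        x = data[rng.integers(len(data))]
        bmu = np.argmin(np.sum((prototypes - x) ** 2, axis=1))  # best-matching unit
        h = np.exp(-0.5 * ((positions - bmu) / sigma) ** 2)
        prototypes += lr * h[:, None] * (x - prototypes)
    return prototypes


def assign(data, prototypes):
    """Index of the best-matching prototype for each row of `data`."""
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


# Two toy groups of 50-pixel "spectra" around different mean levels.
rng = np.random.default_rng(3)
group_a = 0.0 + 0.01 * rng.normal(size=(30, 50))
group_b = 1.0 + 0.01 * rng.normal(size=(30, 50))
data = np.vstack([group_a, group_b])
labels = assign(data, train_som(data))
```

No class labels are supplied. The two groups end up on different units purely from the structure of the data, which is the property that makes such methods attractive for filtering unknown data sets.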

Afterword

As noted in Chapter 1, improvements in observational and information technology mean that the amount of data being gathered in astronomy is always increasing. The specific result of this thesis is a set of tools that can be used to analyse the very large databases that new survey projects such as SDSS-II and the GAIA space mission will generate.

The ultimate future goal of the work presented in this thesis, however, is to continue the development of the computational framework of Jeffery (2003). This framework incorporates the tool set developed here into a much wider system to analyse and manage astronomical data, making use of distributed computing initiatives such as the Grid (see Figure 7.1). This system will help us set sail on the seas of astronomical data, charting our way into the unknown mysteries of the universe.

³ http://ic.arc.nasa.gov/ic/projects/bayes-group/autoclass/



Figure 7.1: Schematic diagram showing how the work of this thesis fits in with the wider system envisaged by Jeffery (2003).

[Diagram boxes: training data; unknown data set (e.g. SDSS); pre-processing; PCA filtering; manual selection; ANN classification; χ² model fitting; results database; results exploration & visualisation; remote astronomical databases (e.g. Simbad); distributed computing resources; theoretical models database; parameter space exploration; model generation; third-party codes; remote atomic database; R-Matrix II calculation; request new data.]



Bibliography

Ahmad, A. 2004, PhD thesis, The Queen’s University of Belfast

Ahmad, A. & Jeffery, C. S. 2003, A&A, 402, 335

Ahmad, A., Winter, C., & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 159–162

Allende Prieto, C., Rebolo, R., López, R. J. G., Serra-Ricart, M., Beers, T. C., Rossi, S., Bonifacio, P., & Molaro, P. 2000, AJ, 120, 1516

Altmann, M., Edelmann, H., & de Boer, K. S. 2004, A&A, 414, 181

Bailer-Jones, C. A. L. 1996, PhD thesis, University of Cambridge

—. 1997, PASP, 109, 932

Bailer-Jones, C. A. L., Irwin, M., Gilmore, G., & von Hippel, T. 1997, MNRAS, 292, 157

Bailer-Jones, C. A. L., Irwin, M., & von Hippel, T. 1998, MNRAS, 298, 361

Behara, N. T. & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 115–122

Bishop, C. M. 1995, Neural Networks for Pattern Recognition (Oxford: Oxford University Press)

Brown, T. M., Bowers, C. W., Kimble, R. A., & Ferguson, H. C. 2000, ApJ, 529, L89

Brown, T. M., Ferguson, H. C., Davidsen, A. F., & Dorman, B. 1997, ApJ, 482, 685

Caloi, V. 1976, A&A, 50, 471

—. 1989, A&A, 221, 27

Colin, J., de Boer, K. S., Dauphole, B., Ducourant, C., Dulou, M. R., Geffert, M., Le Campion, J.-F., Moehler, S., Odenkirchen, M., Schmidt, J. H. K., & Theissen, A. 1994, A&A, 287, 38


Colless, M., Dalton, G., Maddox, S., Sutherland, W., Norberg, P., Cole, S., Bland-Hawthorn, J., Bridges, T., Cannon, R., Collins, C., Couch, W., Cross, N., Deeley, K., De Propris, R., Driver, S. P., Efstathiou, G., Ellis, R. S., Frenk, C. S., Glazebrook, K., Jackson, C., Lahav, O., Lewis, I., Lumsden, S., Madgwick, D., Peacock, J. A., Peterson, B. A., Price, I., Seaborne, M., & Taylor, K. 2001, MNRAS, 328, 1039

Connolly, A. J. & Szalay, A. S. 1999, AJ, 117, 2052

Connolly, A. J., Szalay, A. S., Bershady, M. A., Kinney, A. L., & Calzetti, D. 1995, AJ, 110, 1071

D’Cruz, N. L., Dorman, B., Rood, R. T., & O’Connell, R. W. 1996, ApJ, 466, 359

de Boer, K. S., Aguilar Sanchez, Y., Altmann, M., Geffert, M., Odenkirchen, M., Schmidt, J. H. K., & Colin, J. 1997, A&A, 327, 577

Deeming, T. J. 1964, MNRAS, 127, 493

Djorgovski, S. G., Gal, R. R., Odewahn, S. C., de Carvalho, R. R., Brunner, R., Longo, G., & Scaramella, R. 1998, in Wide Field Surveys in Cosmology, 14th IAP meeting held May 26-30, 1998, Paris. Publisher: Editions Frontieres. ISBN: 2-86332-241-9, p. 89., ed. S. Colombi, Y. Mellier, & B. Raban, 89–+

Dorman, B., Rood, R. T., & O’Connell, R. W. 1993, ApJ, 419, 596

Dreizler, S., Heber, U., Werner, K., Moehler, S., & de Boer, K. S. 1990, A&A, 235, 234

Drilling, J. S. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient Stars, 461

Drilling, J. S., Jeffery, C. S., Moehler, S., Heber, U., & Napiwotzki, R. 2006, in preparation

Dudley, R. E. 1992, PhD thesis, The University of St. Andrews

Edelmann, H., Heber, U., Hagen, H.-J., Lemke, M., Dreizler, S., Napiwotzki, R., & Engels, D. 2003, A&A, 400, 939

Edelsbrunner, H. & Shah, N. R. 1992, in SCG ’92: Proceedings of the eighth annual symposium on Computational geometry (New York, NY, USA: ACM Press), 43–52

Fan, X. 1999, AJ, 117, 2528

Folkes, S. R., Lahav, O., & Maddox, S. J. 1996, MNRAS, 283, 651

Francis, P. J., Hewett, P. C., Foltz, C. B., & Chaffee, F. H. 1992, ApJ, 398, 476

Galaz, G. & de Lapparent, V. 1998, A&A, 332, 459

Glazebrook, K., Offer, A. R., & Deeley, K. 1998, ApJ, 492, 98

Golub, G. H. & Van Loan, C. F. 1989, Matrix Computations, 2nd edn. (Baltimore, Maryland 21218: The Johns Hopkins University Press)



Green, E. M., Fontaine, G., Hyde, E. A., Charpinet, S., & Chayer, P. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 167–174

Green, R. F., Schmidt, M., & Liebert, J. 1986, ApJS, 61, 305

Greenstein, J. L. & Sargent, A. I. 1974, ApJS, 28, 157

Gulati, R., Gupta, R., & Singh, H. 1997a, PASP, 109, 843

Gulati, R. K., Gupta, R., Gothoskar, P., & Khobragade, S. 1994a, ApJ, 426, 340

—. 1994b, Vistas in Astronomy, 38, 293

—. 1996, Bulletin of the Astronomical Society of India, 24, 21

Gulati, R. K., Gupta, R., & Rao, N. K. 1997b, A&A, 322, 933

Harris, H. C., Liebert, J., Kleinman, S. J., Nitta, A., Anderson, S. F., Knapp, G. R., Krzesiński, J., Schmidt, G., Strauss, M. A., Vanden Berk, D., Eisenstein, D., Hawley, S., Margon, B., Munn, J. A., Silvestri, N. M., Smith, J. A., Szkody, P., Collinge, M. J., Dahn, C. C., Fan, X., Hall, P. B., Schneider, D. P., Brinkmann, J., Burles, S., Gunn, J. E., Hennessy, G. S., Hindsley, R., Ivezić, Z., Kent, S., Lamb, D. Q., Lupton, R. H., Nichol, R. C., Pier, J. R., Schlegel, D. J., SubbaRao, M., Uomoto, A., Yanny, B., & York, D. G. 2003, AJ, 126, 1023

Heber, U. 1986, A&A, 155, 33

Heber, U. 1991, in IAU Symp. 145: Evolution of Stars: the Photospheric Abundance Connection, ed. G. Michaud & A. V. Tutukov, 363–+

Heber, U., Hirsch, H., Ströer, A., O’Toole, S., Haas, S., & Dreizler, S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 104–111

Heber, U. & Hunger, K. 1987, in IAU Colloq. 95: Second Conference on Faint Blue Stars, ed. A. G. D. Philip, D. S. Hayes, & J. W. Liebert, 599–602

Heber, U., Hunger, K., Jonas, G., & Kudritzki, R. P. 1984, A&A, 130, 119

Hirsch, H. A., Heber, U., O’Toole, S. J., & Bresolin, F. 2005, A&A, 444, L61

Husfeld, D., Butler, K., Heber, U., & Drilling, J. S. 1989, A&A, 222, 150

Hutchison, R. B. 1971, AJ, 76, 711

Iben, I., Kaler, J. B., Truran, J. W., & Renzini, A. 1983, ApJ, 264, 605

Iben, I. J. 1990, ApJ, 353, 215

Jeffery, C. S. 2003, in ASP Conf. Ser. 288: Stellar Atmosphere Modeling, ed. I. Hubeny, D. Mihalas, & K. Werner, 141–+


Jeffery, C. S., Drilling, J. S., Harrison, P. M., Heber, U., & Moehler, S. 1997, A&AS, 125, 501

Jeffery, C. S., Heber, U., Hill, P. W., Dreizler, S., Drilling, J. S., Lawson, W. A., Leuenhagen, U., & Werner, K. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient Stars, ed. C. S. Jeffery & U. Heber, 471–+

Jeffery, C. S., Woolf, V. M., & Pollacco, D. L. 2001, A&A, 376, 497

Katz, D., Soubiran, C., Cayrel, R., Adda, M., & Cautain, R. 1998, A&A, 338, 151

Kleinman, S. J., Harris, H. C., Eisenstein, D. J., Liebert, J., Nitta, A., Krzesiński, J., Munn, J. A., Dahn, C. C., Hawley, S. L., Pier, J. R., Schmidt, G., Silvestri, N. M., Smith, J. A., Szkody, P., Strauss, M. A., Knapp, G. R., Collinge, M. J., Mukadam, A. S., Koester, D., Uomoto, A., Schlegel, D. J., Anderson, S. F., Brinkmann, J., Lamb, D. Q., Schneider, D. P., & York, D. G. 2004, ApJ, 607, 426

Klemola, A. R. 1961, ApJ, 134, 130

Kohonen, T. 1990, in New Concepts in Computer Science: Proc. Symp. in Honour of Jean-Claude Simon (Paris, France: AFCET), 181–190

Kohonen, T., Hynninen, J., Kangas, J., & Laaksonen, J. 1996, SOM PAK: The Self-Organizing Map program package, Tech. Rep. A31, Laboratory of Computer and Information Science, Helsinki University of Technology

Kurtz, M. J. 1982, PhD thesis

Lahav, O., Naim, A., Sodré, L., & Storrie-Lombardi, M. C. 1996, MNRAS, 283, 207

Lamy, H. & Hutsemékers, D. 2004, A&A, 427, 107

Lasala, J. 1994, in ASP Conf. Ser. 60: The MK Process at 50 Years: A Powerful Tool for Astrophysical Insight, ed. C. J. Corbally, R. O. Gray, & R. F. Garrison, 312–+

Levenberg, K. 1944, Quarterly of Applied Mathematics, 2, 164

Livny, M. & Raman, R. 1998, in The Grid: Blueprint for a New Computing Infrastructure, ed. I. Foster & C. Kesselman (Morgan Kaufmann)

Marquardt, D. W. 1963, Journal of the Society for Industrial and Applied Mathematics, 11, 431

Maxted, P. F. L., Heber, U., Marsh, T. R., & North, R. C. 2001, MNRAS, 326, 1391

Mengel, J. G., Norris, J., & Gross, P. G. 1976, ApJ, 204, 488

Moehler, S., de Boer, K. S., & Heber, U. 1990a, A&A, 239, 265

Moehler, S., Richtler, T., de Boer, K. S., Dettmar, R. J., & Heber, U. 1990b, A&AS, 86, 53

Möller, T. & Trumbore, B. 1997, Journal of Graphics Tools, 2, 21, see: http://www.acm.org/jgt/papers/MollerTrumbore97/



Moore, A. 1991, A tutorial on kd-trees, Extract from PhD Thesis, available from http://www.cs.cmu.edu/~awm/papers.html

Morgan, W. W., Abt, H. A., & Tapscott, J. W. 1978, Revised MK Spectral Atlas for stars earlier than the sun (Williams Bay: Yerkes Observatory, and Tucson: Kitt Peak National Observatory, 1978)

Morossi, C. & Crivellari, L. 1980, A&AS, 41, 299

Mücke, E. P., Saias, I., & Zhu, B. 1996, in SCG ’96: Proceedings of the twelfth annual symposium on Computational geometry (New York, NY, USA: ACM Press), 274–283

Murtagh, F. & Heck, A. 1987, Multivariate Data Analysis (Dordrecht, Holland: D. Reidel Publishing Co.)

Napiwotzki, R., Karl, C. A., Lisker, T., Heber, U., Christlieb, N., Reimers, D., Nelemans, G., & Homeier, D. 2004, Ap&SS, 291, 321

Newell, E. B. 1973, ApJS, 26, 37

O’Rourke, J. 1998, Computational Geometry in C, 2nd edn. (Cambridge (UK) and New York: Cambridge University Press)

Paczyński, B. 1971, Acta Astronomica, 21, 1

Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. 1986, Numerical Recipes: The Art of Scientific Computing, 1st edn. (Cambridge (UK) and New York: Cambridge University Press)

Qin, D.-M., Guo, P., Hu, Z.-Y., & Zhao, Y.-H. 2003, Chinese Journal of Astronomy and Astrophysics, 3, 277

Reid, I. N., Brewer, C., Brucato, R. J., McKinley, W. R., Maury, A., Mendenhall, D., Mould, J. R., Mueller, J., Neugebauer, G., Phinney, J., Sargent, W. L. W., Schombert, J., & Thicksten, R. 1991, PASP, 103, 661

Renka, R. J. 1988, ACM Trans. Math. Softw., 14, 139

Rhee, J., Beers, T. C., & Irwin, M. J. 1999, Bulletin of the American Astronomical Society, 31, 971

Russell, S. & Norvig, P. 2003, Artificial Intelligence: A Modern Approach, 2nd edn. (Upper Saddle River, New Jersey 07458: Pearson Education Inc.)

Saffer, R. A., Bergeron, P., Koester, D., & Liebert, J. 1994, ApJ, 432, 351

Shepard, D. 1968, in Proceedings of the 1968 23rd ACM national conference (New York, NY, USA: ACM Press), 517–524

Shewchuk, J. R. 1996, in SCG ’96: Proceedings of the twelfth annual symposium on Computational geometry (New York, NY, USA: ACM Press), 141–150

Simkin, S. M. 1974, A&A, 31, 129


Singh, H. P., Gulati, R. K., & Gupta, R. 1998, MNRAS, 295, 312

Skrutskie, M. F., Cutri, R. M., Stiening, R., Weinberg, M. D., Schneider, S., Carpenter, J. M., Beichman, C., Capps, R., Chester, T., Elias, J., Huchra, J., Liebert, J., Lonsdale, C., Monet, D. G., Price, S., Seitzer, P., Jarrett, T., Kirkpatrick, J. D., Gizis, J. E., Howard, E., Evans, T., Fowler, J., Fullmer, L., Hurt, R., Light, R., Kopan, E. L., Marsh, K. A., McCallon, H. L., Tam, R., Van Dyk, S., & Wheelock, S. 2006, AJ, 131, 1163

Snider, S., Allende Prieto, C., von Hippel, T., Beers, T. C., Sneden, C., Qu, Y., & Rossi, S. 2001, ApJ, 562, 528

Sodre, L. J., Cuevas, H., & Capelato, H. V. 1998, in Wide Field Surveys in Cosmology, 14th IAP meeting held May 26-30, 1998, Paris. Publisher: Editions Frontieres. ISBN: 2-86332-241-9, p. 424., ed. S. Colombi, Y. Mellier, & B. Raban, 424–+

Storrie-Lombardi, M. C., Irwin, M. J., von Hippel, T., & Storrie-Lombardi, L. J. 1994, Vistas in Astronomy, 38, 331

Sweigart, A. V. 1997, ApJ, 474, L23+

Theissen, A., Moehler, S., Heber, U., & de Boer, K. S. 1993, A&A, 273, 524

Thejll, P., Bauer, F., Saffer, R., Liebert, J., Kunze, D., & Shipman, H. L. 1994, ApJ, 433, 819

Tonry, J. & Davis, M. 1979, AJ, 84, 1511

von Hippel, T., Storrie-Lombardi, L. J., Storrie-Lombardi, M. C., & Irwin, M. J. 1994, MNRAS, 269, 97

Weaver, W. B. 2000a, Bulletin of the American Astronomical Society, 32, 1430

—. 2000b, ApJ, 541, 298

Weaver, W. B. & Torres-Dodgen, A. V. 1995, ApJ, 446, 300

—. 1997, ApJ, 487, 847

Weir, N., Fayyad, U. M., Djorgovski, S. G., & Roden, J. 1995, PASP, 107, 1243

Wesemael, F., Winget, D. E., Cabot, W., van Horn, H. M., & Fontaine, G. 1982, ApJ, 254, 221

Whitney, C. A. 1983, A&AS, 51, 443

Willemsen, P. G., Hilker, M., Kayser, A., & Bailer-Jones, C. A. L. 2005, A&A, 436, 379

York, D. G., Adelman, J., Anderson, J. E., Anderson, S. F., Annis, J., Bahcall, N. A., Bakken, J. A., Barkhouser, R., Bastian, S., Berman, E., Boroski, W. N., Bracker, S., Briegel, C., Briggs, J. W., Brinkmann, J., Brunner, R., Burles, S., Carey, L., Carr, M. A., Castander, F. J., Chen, B., Colestock, P. L., Connolly, A. J., Crocker, J. H., Csabai, I., Czarapata, P. C., Davis, J. E., Doi, M., Dombeck, T., Eisenstein, D., Ellman, N., Elms, B. R., Evans, M. L., Fan, X., Federwitz, G. R., Fiscelli, L., Friedman, S., Frieman, J. A., Fukugita, M., Gillespie, B., Gunn, J. E., Gurbani, V. K., de Haas, E., Haldeman, M., Harris, F. H., Hayes, J., Heckman, T. M., Hennessy, G. S., Hindsley, R. B., Holm, S., Holmgren, D. J., Huang, C.-h., Hull, C., Husby, D., Ichikawa, S.-I., Ichikawa, T., Ivezić, Ž., Kent, S., Kim, R. S. J., Kinney, E., Klaene, M., Kleinman, A. N., Kleinman, S., Knapp, G. R., Korienek, J., Kron, R. G., Kunszt, P. Z., Lamb, D. Q., Lee, B., Leger, R. F., Limmongkol, S., Lindenmeyer, C., Long, D. C., Loomis, C., Loveday, J., Lucinio, R., Lupton, R. H., MacKinnon, B., Mannery, E. J., Mantsch, P. M., Margon, B., McGehee, P., McKay, T. A., Meiksin, A., Merelli, A., Monet, D. G., Munn, J. A., Narayanan, V. K., Nash, T., Neilsen, E., Neswold, R., Newberg, H. J., Nichol, R. C., Nicinski, T., Nonino, M., Okada, N., Okamura, S., Ostriker, J. P., Owen, R., Pauls, A. G., Peoples, J., Peterson, R. L., Petravick, D., Pier, J. R., Pope, A., Pordes, R., Prosapio, A., Rechenmacher, R., Quinn, T. R., Richards, G. T., Richmond, M. W., Rivetta, C. H., Rockosi, C. M., Ruthmansdorfer, K., Sandford, D., Schlegel, D. J., Schneider, D. P., Sekiguchi, M., Sergey, G., Shimasaku, K., Siegmund, W. A., Smee, S., Smith, J. A., Snedden, S., Stone, R., Stoughton, C., Strauss, M. A., Stubbs, C., SubbaRao, M., Szalay, A. S., Szapudi, I., Szokoly, G. P., Thakar, A. R., Tremonti, C., Tucker, D. L., Uomoto, A., Vanden Berk, D., Vogeley, M. S., Waddell, P., Wang, S.-i., Watanabe, M., Weinberg, D. H., Yanny, B., & Yasuda, N. 2000, AJ, 120, 1579



Appendices


Appendix A

Results for 192 Drilling et al. (2006) Hot Subdwarfs

Table A.1 lists the parameterisation results for both the calibrated and the uncalibrated stars from Drilling et al. (2006). For each star it gives the effective temperature (T_eff), surface gravity (log g) and helium abundance (log(n_He/n_H)) obtained with SFIT, together with the SFIT internal errors and the χ² of the fit, followed by the same three parameters as estimated by the parameterisation neural network.
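Because each row of Table A.1 consists of an identifier (which may itself contain spaces, e.g. "BD-07 3477") followed by ten numeric columns, the table can be read programmatically to compare the two methods star by star. The Python sketch below is illustrative only: the `Row`, `parse_row` and `ann_minus_sfit` names, and the rule that the last ten whitespace-separated fields are the numeric columns, are assumptions rather than part of the original analysis. The three sample rows are copied verbatim from the table.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Row:
    """One row of Table A.1 (hypothetical container for illustration)."""
    identifier: str
    teff_sfit: float   # K
    dteff: float       # SFIT internal error, K
    logg_sfit: float   # cgs
    dlogg: float
    he_sfit: float     # log(n_He/n_H)
    dhe: float
    chi2: float
    teff_ann: float    # K
    logg_ann: float    # cgs
    he_ann: float      # log(n_He/n_H)


def parse_row(line: str) -> Row:
    """Split a row into an identifier (which may contain spaces) and
    the ten numeric columns, assumed to be the last ten fields."""
    fields = line.split()
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    numbers = [float(x) for x in fields[-10:]]
    return Row(" ".join(fields[:-10]), *numbers)


def ann_minus_sfit(row: Row) -> tuple[float, float, float]:
    """ANN minus SFIT differences in T_eff (K), log g and log(n_He/n_H)."""
    return (row.teff_ann - row.teff_sfit,
            row.logg_ann - row.logg_sfit,
            row.he_ann - row.he_sfit)


# Three rows copied from Table A.1.
SAMPLE = """\
BD-07 3477 27748 1362 5.420 0.120 -2.673 0.204 1.90E-01 26360.4620 5.4370 -2.4641
Feige 38 29629 504 5.546 0.055 -2.483 0.132 2.71E-01 30014.7462 5.5768 -2.5487
HZ 44 38507 224 5.381 0.040 0.088 0.003 1.58E+00 37595.6085 4.8985 -0.1747
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
for r in rows:
    dt, dg, dhe = ann_minus_sfit(r)
    # Scale the T_eff difference by the SFIT internal error for that star.
    print(f"{r.identifier:12s} dTeff={dt:+9.1f} K ({dt / r.dteff:+.2f} x dTeff) "
          f"dlogg={dg:+.3f} dHe={dhe:+.3f}")
print("mean ANN - SFIT T_eff:", round(mean(ann_minus_sfit(r)[0] for r in rows), 1), "K")
```

Since δT_eff is SFIT's internal error, the scaled difference gives a sense of scale only; it is not a full significance test of the disagreement between the two methods.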



Table A.1: Parameterisation results for 192 Drilling et al. (2006) hot subdwarfs

Columns: Identifier; SFIT results: T_eff (K), δT_eff (K), log g (cgs), δlog g, log(n_He/n_H), δlog(n_He/n_H), χ²; ANN results: T_eff (K), log g (cgs), log(n_He/n_H)

BD-07 3477 27748 1362 5.420 0.120 -2.673 0.204 1.90E-01 26360.4620 5.4370 -2.4641<br />

BD+25 3941 28478 447 4.645 0.058 -1.422 0.034 1.89E+00 30794.9018 4.7807 -2.2476<br />

BD+28 4211 48135 1120 5.773 0.072 -1.121 0.040 3.72E-01 59518.9743 6.6219 -3.3647<br />

BD+40 4032 27895 713 4.083 0.069 -1.079 0.036 1.02E+00 27197.0607 3.7571 -0.9165<br />

Feige 110 40000 196 5.776 0.042 -2.020 0.136 8.78E-01 45638.2315 5.9658 -3.3709<br />

Feige 15 12000 162 4.500 0.053 -2.201 0.276 2.48E+00 13763.3319 3.9102 -1.4225<br />

Feige 38 29629 504 5.546 0.055 -2.483 0.132 2.71E-01 30014.7462 5.5768 -2.5487<br />

Feige 56 15571 279 3.608 0.048 -1.765 0.101 2.80E-01 17780.0476 3.7579 -2.3262<br />

Feige 98 11590 196 3.793 0.064 -2.541 0.302 7.42E-01 13083.8300 3.8179 -3.0230<br />

FHB 18 10819 133 4.179 0.037 -1.602 0.052 5.28E-01 12901.1081 4.2528 -2.7523<br />

FHB 23 11646 189 4.394 0.053 -2.586 0.334 4.27E-01 14271.8636 4.4317 -3.3414<br />

HD 144941 22000 348 3.835 0.055 0.963 0.008 1.31E+00 21681.7225 3.6821 1.6263<br />

HD 160641 30614 207 3.153 0.025 2.237 0.225 5.06E+00 31303.9914 2.8086 1.8212<br />

HD 17520 34793 723 3.804 0.067 -0.661 0.030 1.46E+00 35330.3819 4.0278 -0.9810<br />

HD 184279 25927 292 3.917 0.034 -0.400 0.016 2.06E+00 28409.0447 3.9086 -0.0132<br />

HD 192281 11337 194 4.012 0.044 -1.856 0.405 6.63E-01 11143.6105 3.6825 -2.2308<br />

HD 217086 37427 439 4.691 0.065 -1.237 0.045 8.02E-01 45007.2314 5.1943 -1.5241<br />

Hiltner 600 29739 564 4.717 0.063 -1.112 0.034 6.06E-01 27869.8814 4.7985 -0.9717<br />

HR 6092 22693 545 4.026 0.070 -1.477 0.039 4.36E+00 14259.7656 2.7584 -0.0691<br />

HR 6588 23305 466 3.682 0.063 -0.938 0.033 3.60E+00 16634.1650 2.5556 -0.1572<br />


HR 6719 25813 814 3.500 0.056 -0.597 0.026 2.68E+00 19240.4897 2.3722 -0.0260<br />

HR 7287 19088 391 3.598 0.058 -1.929 0.074 4.50E+00 12007.6482 2.1732 0.1142<br />

HR 8622 28951 1186 2.919 0.063 0.107 0.006 3.06E+00 29216.7530 2.8812 0.2961<br />

HS 0016+0044 29725 540 5.523 0.075 -2.939 0.377 6.74E-01 30591.2646 5.7893 -3.2113<br />

HS 1000+471 40766 137 5.659 0.036 0.521 0.013 6.30E+00 35501.0360 4.3467 -0.6256<br />

HS 1844+637 29045 177 3.633 0.026 0.976 0.008 8.46E+00 39133.7152 4.5950 3.4902<br />

HS 2253+0900 13534 281 3.863 0.060 -1.156 0.062 5.37E-01 13688.8067 3.9192 -1.8474<br />

HS 2301+0728 17658 528 4.378 0.065 -2.771 0.257 2.29E+00 10788.7216 2.7773 -3.8880<br />

HZ 15 20434 584 3.000 0.065 -0.630 0.025 2.37E+00 24751.3086 3.0214 -1.0013<br />

HZ 44 38507 224 5.381 0.040 0.088 0.003 1.58E+00 37595.6085 4.8985 -0.1747<br />

LSIV-14 37999 240 5.648 0.045 -0.573 0.017 3.63E+00 37185.7816 5.3742 -0.8631<br />

LSIV-6 28974 257 3.747 0.046 2.522 0.144 2.73E+00 26745.5011 2.7521 3.2416<br />

LSS 5121 30511 245 3.216 0.037 2.273 0.163 3.95E+00 32165.1540 3.2400 1.0310<br />

PG0001+275 35180 300 5.406 0.053 -3.000 0.434 4.31E+00 34205.9951 5.0684 -4.2459<br />

PG0004+133 26205 1118 4.828 0.110 -2.037 0.094 3.67E+00 32994.6787 5.2114 -1.8659<br />

PG0009+036 20214 629 4.496 0.084 -2.488 0.134 2.29E+00 23334.3874 4.8444 -2.5336<br />

PG0039+049 28606 391 4.668 0.068 -2.995 0.430 2.26E+00 20927.6745 3.7393 -3.3758<br />

PG0039+135 45029 212 5.408 0.081 0.514 0.030 3.51E+00 42789.5834 5.3039 -0.3035<br />

PG0057+155 32203 375 5.500 0.063 -1.785 0.053 1.02E+00 33110.1795 5.5769 -2.0610<br />

PG0101+039 27565 1344 5.357 0.114 -3.000 0.434 1.66E+00 27132.9594 5.1346 -2.8721<br />

PG0133+114 35999 61 6.000 0.034 -2.995 0.430 1.54E+00 30996.2889 5.0018 -3.0707<br />


PG0135+242 25308 323 3.375 0.057 2.452 0.123 3.30E+00 22720.6398 2.5661 2.3728<br />

PG0142+148 28738 565 5.022 0.080 -2.966 0.402 3.72E+00 33990.4595 5.0653 -3.3974<br />

PG0208+016 44506 160 5.926 0.041 1.218 0.043 4.17E+00 42334.8969 5.7844 0.7455<br />

PG0229+064 18305 192 3.991 0.056 -0.798 0.035 9.77E-01 23366.1468 4.5001 -1.1572<br />

PG0232+095 35000 435 4.861 0.030 -1.392 0.053 5.88E+00 26042.8910 3.8106 -2.5768<br />

PG0304+183 29953 710 5.182 0.077 -2.913 0.356 7.50E+00 25550.8542 4.7584 -3.9436<br />

PG0314+146 8541 37 1.235 0.028 -2.004 0.088 7.27E+00 4603.7402 1.2050 -3.8688<br />

PG0342+026 21878 819 4.731 0.073 -3.000 0.434 2.06E+00 30227.8532 5.3591 -3.3190<br />

PG0838+133 40055 139 4.500 0.031 0.990 0.013 3.52E+00 46872.5237 4.9328 4.0786<br />

PG0856+121 26869 620 5.600 0.067 -3.000 0.434 2.70E+00 27571.6464 5.7342 -3.3096<br />

PG0902+058 42352 111 6.000 0.041 1.912 0.106 3.85E+00 42180.4296 5.7766 1.9591<br />

PG0907+123 26482 641 5.075 0.071 -3.000 0.434 4.06E+00 24278.8101 4.9274 -3.3162<br />

PG0909+164 31880 562 4.847 0.086 -3.000 0.434 1.57E+00 35575.3699 4.6247 -3.7231<br />

PG0909+275 34009 427 4.795 0.057 -0.685 0.022 1.93E+00 36507.4485 5.1157 -1.2373<br />

PG0918+029 31029 310 5.500 0.066 -2.649 0.193 2.41E+00 24129.8781 4.4794 -2.2902<br />

PG0920+029 25992 1329 4.781 0.111 -3.000 0.434 1.48E+00 27712.3799 4.8930 -3.3238<br />

PG0921+161 32868 292 5.329 0.060 -1.640 0.057 2.59E+00 35284.4531 5.1844 -3.0325<br />

PG0921+311 42320 137 5.873 0.047 1.068 0.015 2.72E+00 41033.2928 5.5140 0.5536<br />

PG0934+145 16681 212 4.031 0.037 -0.899 0.034 2.22E+00 13314.2430 3.7870 -0.7299<br />

PG0954+049 13384 239 3.399 0.058 -1.526 0.087 1.31E+00 12465.6053 3.0812 -2.6841<br />

PG1000+375 32896 237 5.814 0.053 -1.761 0.050 1.31E+00 20945.0047 4.8115 -1.6003<br />


PG1017+431 32192 454 4.816 0.074 -2.991 0.425 5.45E-01 44773.4090 5.8294 -3.6347<br />

PG1018-047 30361 252 5.365 0.057 -3.000 0.434 2.23E+00 30145.1825 5.3885 -5.4213<br />

PG1047+003 32977 318 5.459 0.066 -2.130 0.117 4.07E-01 34846.5851 5.5663 -2.5655<br />

PG1049+013 32754 427 4.725 0.069 -2.610 0.177 8.38E-01 44195.8859 5.5828 -2.9385<br />

PG1050-065 34509 236 5.591 0.047 -1.212 0.035 2.91E+00 36241.7231 5.3524 -1.2791<br />

PG1118+061 28321 386 5.224 0.065 -3.000 0.434 3.84E-01 27695.5481 5.1031 -2.9965<br />

PG1127+019 40812 131 4.965 0.070 1.940 0.265 1.81E+00 39675.7493 4.6297 2.1534<br />

PG1136-003 30576 260 5.250 0.056 -3.000 0.434 4.32E-01 28147.6630 5.0249 -3.0350<br />

PG1154-070 28000 963 5.430 0.092 -2.200 0.069 3.92E-01 26478.4624 5.2617 -2.1939<br />

PG1220-056 49308 883 5.460 0.107 0.309 0.013 8.09E-01 52185.9827 6.0800 -1.0282<br />

PG1230+067 38843 191 4.926 0.056 1.013 0.013 2.36E+00 40314.1570 5.0314 2.8373<br />

PG1245-042 15232 266 3.803 0.056 -1.820 0.144 5.97E-01 15548.7313 3.8416 -1.4344<br />

PG1246-122 32573 361 4.001 0.044 -1.012 0.035 3.11E+00 33475.4066 3.7321 -2.2059<br />

PG1249+762 50000 276 5.623 0.087 0.368 0.028 1.06E+00 65439.9717 6.7602 0.0994<br />

PG1255+547 32774 330 5.500 0.059 -1.547 0.046 1.20E+00 30935.1727 5.6242 -2.1342<br />

PG1258-030 13075 278 3.656 0.048 -2.286 0.168 1.03E+00 12502.8404 3.3526 -2.0501<br />

PG1300+279 48677 632 5.955 0.053 -0.041 0.002 1.56E+00 46441.3479 6.5777 -0.9764<br />

PG1303-114 31245 354 5.502 0.068 -3.000 0.434 1.82E+00 30588.0996 5.4478 -5.0836<br />

PG1325+054 44232 208 5.915 0.033 0.326 0.011 2.18E+00 45019.5994 6.0397 1.9269<br />

PG1336-018 31271 245 5.567 0.049 -2.519 0.143 6.11E-01 37061.5891 5.9196 -3.4263<br />

PG1343-102 29958 707 5.424 0.087 -2.910 0.353 6.71E-01 31263.5069 5.3800 -3.4850<br />


PG1343+578 15335 301 3.396 0.062 -1.835 0.089 7.67E-01 14789.0812 3.2773 -2.8106<br />

PG1348+607 45000 223 5.386 0.076 0.000 0.000 2.17E+00 56671.0493 6.6681 2.0785<br />

PG1352-023 47661 1197 6.000 0.091 -1.723 0.092 2.46E+00 54114.0712 6.2103 -3.7261<br />

PG1355-064 49999 733 5.299 0.115 0.364 0.029 9.95E-01 55817.7736 5.5139 -1.3330<br />

PG1401+289 47629 247 5.753 0.087 0.184 0.014 9.22E-01 43645.2377 5.6074 0.1812<br />

PG1409-103 39399 768 5.008 0.089 -2.166 0.127 1.45E+00 50621.5726 5.7496 -4.4764<br />

PG1413+114 43416 117 6.000 0.039 1.261 0.063 3.54E+00 44247.9760 5.7971 3.0220<br />

PG1415+492 31467 283 4.135 0.047 2.512 0.141 2.26E+00 30378.3273 3.7367 2.7468<br />

PG1426-067 34262 380 5.368 0.066 -3.000 0.434 1.17E+00 34904.3068 5.1176 -2.8789<br />

PG1432+004 24561 1051 4.987 0.114 -2.308 0.088 9.80E-01 24852.4845 5.0079 -2.5775<br />

PG1433+239 35306 378 5.345 0.061 -2.970 0.405 3.89E+00 39665.7300 5.3344 -3.5664<br />

PG1441+407 49802 277 6.000 0.089 0.464 0.038 1.61E+00 46391.6786 5.5498 0.7923<br />

PG1448-052 32489 304 5.189 0.058 -3.000 0.434 7.68E-01 50271.3965 6.1452 -3.8949<br />

PG1449+652 30456 269 4.598 0.057 -3.000 0.434 1.23E+00 22392.8613 3.6074 -4.2284<br />

PG1451+492 17996 482 3.999 0.067 -1.919 0.108 8.85E-01 21898.4813 4.1696 -2.0822<br />

PG1453-081 16393 281 3.977 0.068 -1.196 0.095 1.02E+00 20030.3017 3.9852 -1.7400<br />

PG1453-085 12264 169 3.175 0.044 -2.777 0.260 6.93E-01 14759.0551 3.3091 -3.1086<br />

PG1458+423 29151 811 5.000 0.104 -2.995 0.430 6.94E-01 29104.1751 4.8454 -3.9124<br />

PG1506-052 38002 479 5.227 0.071 -2.226 0.073 2.32E+00 57963.7346 6.0973 -6.7294<br />

PG1510+635 12489 184 3.256 0.048 -2.906 0.350 2.53E+00 14137.8470 3.2197 -3.4824<br />

PG1518-098 47134 1376 5.000 0.098 0.276 0.029 1.04E+00 63938.6641 5.7489 2.8699<br />


PG1519+640 28801 669 5.000 0.099 -2.779 0.261 8.55E-01 29147.1760 4.9799 -3.1609<br />

PG1526+440 41199 139 5.826 0.037 0.738 0.023 3.90E+00 37868.3371 4.9471 -0.1320<br />

PG1532+523 30618 263 5.257 0.056 -2.935 0.374 1.03E+00 25389.6717 4.5495 -3.1041<br />

PG1534-018 45010 166 5.656 0.043 1.224 0.087 8.31E-01 43114.3697 5.3541 0.8758<br />

PG1536+690 50000 277 5.647 0.087 0.364 0.028 9.38E-01 55913.0919 6.1882 -0.0670<br />

PG1537-046 47494 676 5.249 0.108 0.174 0.008 1.17E+00 58302.0139 6.0350 2.8671<br />

PG1538+401 32260 320 5.446 0.064 -3.000 0.434 8.57E-01 34944.9166 5.5433 -2.9674<br />

PG1538+611 30528 307 5.416 0.063 -3.000 0.434 1.32E+00 29648.9555 5.0240 -4.0399<br />

PG1543+629 40002 185 5.536 0.043 -2.137 0.119 1.07E+00 51462.0631 5.9720 -3.9767<br />

PG1544+488 30992 273 4.202 0.046 2.522 0.144 1.50E+00 33677.6132 4.4517 3.5425<br />

PG1544+601 30000 518 5.543 0.054 -2.579 0.165 4.27E-01 28454.0634 5.5260 -2.6698<br />

PG1545+035 38820 419 5.000 0.083 -1.081 0.036 1.43E+00 47852.7894 5.5488 -4.0558<br />

PG1549+006 31939 559 5.288 0.070 -2.074 0.103 1.08E+00 29109.5245 5.0910 -1.8754<br />

PG1553-077 44904 210 5.707 0.045 0.256 0.014 1.56E+00 42387.5197 5.3536 1.0981<br />

PG1554+408 34356 255 4.320 0.056 2.522 0.289 3.41E+00 38902.7912 4.9593 2.1864<br />

PG1558-007 21419 753 4.922 0.081 -2.709 0.222 4.26E-01 26843.3693 5.3108 -2.7541<br />

PG1559+048 36325 228 5.600 0.049 -0.938 0.033 1.05E+00 36003.1956 5.3491 -0.9926<br />

PG1559+222 42434 138 6.000 0.047 2.129 0.117 2.79E+00 41856.4568 5.5616 1.0422<br />

PG1559+533 29420 633 5.500 0.090 -2.817 0.285 1.52E+00 21995.1194 5.0622 -1.9979<br />

PG1600+171 45883 363 5.999 0.092 0.943 0.026 3.97E+00 42817.5697 5.4414 1.6103<br />

PG1602+013 39961 202 5.592 0.042 -1.995 0.129 1.49E+00 53840.6098 6.1985 -3.8367<br />


PG1605+072 30000 824 4.779 0.088 -2.790 0.268 2.97E-01 31978.9966 4.9655 -2.6540<br />

PG1607+174 32181 335 4.674 0.046 -0.268 0.009 5.28E+00 31842.6743 4.4473 -0.4410<br />

PG1610+519 31595 380 4.537 0.070 -3.000 0.434 5.74E-01 38165.4762 4.8614 -4.8178<br />

PG1613+467 23590 1075 4.612 0.083 -2.541 0.151 9.68E-01 29674.0969 5.1593 -3.3097<br />

PG1615+413 28553 189 3.759 0.018 0.990 0.008 3.70E+00 36907.9108 4.9056 2.1208<br />

PG1618+563 33309 341 5.481 0.069 -1.513 0.042 2.17E+00 36466.0770 5.8061 -2.4338<br />

PG1619+525 31178 318 5.500 0.067 -2.371 0.102 4.17E-01 30888.0423 5.4437 -2.4573<br />

PG1624+085 43897 205 5.752 0.058 0.623 0.045 1.94E+00 41755.1142 5.5719 0.1620<br />

PG1627+006 23222 593 5.193 0.068 -2.899 0.344 1.01E+00 20611.6950 4.8277 -2.7701<br />

PG1627+017 22959 522 5.206 0.065 -2.700 0.218 3.61E-01 22424.9747 5.2100 -2.8087<br />

PG1629+466 35779 273 5.000 0.035 0.767 0.030 1.21E+00 37328.4040 5.1599 0.6611<br />

PG1640+645 34458 262 5.591 0.048 -1.590 0.051 4.84E-01 35698.5308 5.9373 -2.2544<br />

PG1644+404 28221 457 5.060 0.062 -3.000 0.434 7.48E-01 32326.3238 5.4986 -3.2872<br />

PG1645+610 28377 522 5.411 0.080 -2.391 0.107 8.12E-01 17587.7044 4.7845 -2.2044<br />

PG1646+607 47595 668 6.000 0.057 -0.032 0.001 2.31E+00 46365.3539 6.0318 0.4509<br />

PG1648+315 41835 130 6.000 0.046 1.040 0.014 4.08E+00 36749.1785 4.9233 0.0589<br />

PG1648+536 30145 630 5.001 0.093 -3.000 0.434 7.59E-01 31819.6176 5.0395 -4.0618<br />

PG1653+633 34667 268 5.790 0.055 -1.693 0.043 1.01E+00 38094.6200 6.2811 -2.4102<br />

PG1656+600 30481 252 6.000 0.056 -2.628 0.184 1.51E+00 33912.5833 6.0668 -3.3172<br />

PG1658+273 43059 113 6.000 0.038 1.261 0.063 3.25E+00 40736.4516 4.8451 1.2766<br />

PG1701+359 32615 88 5.918 0.021 -2.604 0.175 5.87E-01 32564.5119 5.5194 -3.9213<br />

continued on next page<br />

170 Chapter A - <strong>On</strong> <strong>the</strong> <strong>Automatic</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Stellar</strong> <strong>Spectra</strong>


Table A.1: continued<br />

Columns: Identifier; SFIT Results: Teff (K), δTeff (K), log g (cgs), δlog g, log(nHe/nH), δlog(nHe/nH), χ²; ANN Results: Teff (K), log g (cgs), log(nHe/nH)<br />

PG1704+222 15926 334 2.561 0.053 -1.131 0.053 5.49E-01 19339.7888 2.9039 -1.5097<br />

PG1705+537 15025 341 3.437 0.062 -1.356 0.049 1.23E+00 19121.6772 3.7978 -1.7138<br />

PG1707+657 36119 114 5.993 0.033 -1.902 0.104 6.97E-01 32424.6910 5.4695 -1.9184<br />

PG1708+142 18154 459 3.480 0.080 -1.245 0.068 2.06E+00 20510.9488 3.6958 -1.5666<br />

PG1708+602 42477 775 5.256 0.074 -0.945 0.060 8.49E-01 52356.2200 5.9487 -3.2523<br />

PG1710+490 29497 546 5.102 0.069 -2.705 0.220 1.06E+00 30354.3920 5.2502 -3.0947<br />

PG1710+567 33908 486 5.110 0.033 -1.538 0.045 9.87E-01 30094.2953 4.8112 -2.7425<br />

PG1715+273 29074 227 3.797 0.016 1.070 0.010 4.32E+00 35385.5095 4.5349 3.0662<br />

PG1717+423 23293 615 4.654 0.074 -2.991 0.425 1.12E+00 26991.4606 4.9110 -3.2374<br />

PG1722+286 33802 292 5.795 0.058 -1.757 0.050 1.33E+00 31508.9630 5.7798 -1.6469<br />

PG1724+590 28706 605 5.000 0.094 -2.420 0.114 9.74E-01 27423.4056 4.9714 -3.2446<br />

PG1738+505 24113 746 4.922 0.081 -1.829 0.059 9.28E-01 28143.7622 5.3359 -1.8641<br />

PG1739+489 22474 539 4.569 0.066 -2.744 0.241 7.71E-01 28940.6842 5.0868 -3.4167<br />

PG1743+477 26873 1206 4.904 0.123 -2.214 0.071 1.82E+00 26957.5253 4.8522 -3.3545<br />

PG2059+013 33086 233 5.583 0.046 -1.607 0.053 2.64E+00 31942.9438 5.5146 -1.9470<br />

PG2111+023 14681 254 4.000 0.055 -1.274 0.073 3.69E+00 18712.6960 4.0693 -2.2958<br />

PG2120+062 32008 512 4.133 0.052 -0.950 0.046 7.43E-01 32664.6531 4.1185 -1.1104<br />

PG2148+095 30001 806 4.555 0.082 -3.000 0.434 5.46E-01 25977.6010 3.9359 -3.3697<br />

PG2151+100 35789 66 5.941 0.034 -2.677 0.206 7.84E-01 41296.8002 5.7262 -3.0819<br />

PG2158+082 49999 743 5.500 0.119 0.364 0.029 1.28E+00 53960.7462 5.8944 -1.2564<br />

PG2159+051 13496 252 3.244 0.050 -1.441 0.084 3.91E-01 13921.4240 3.3088 -1.7585<br />


PG2204+035 31535 302 5.917 0.059 -2.506 0.139 3.75E+00 23039.5221 5.2470 -2.1168<br />

PG2205+023 27156 646 5.622 0.069 -3.000 0.434 1.06E+00 24413.7057 5.4836 -3.0047<br />

PG2215+151 45566 318 5.841 0.068 1.951 0.620 4.56E+00 41433.7407 5.4859 -0.1429<br />

PG2218+020 21062 711 4.768 0.070 -2.707 0.221 1.34E+00 30254.5605 5.4740 -2.9763<br />

PG2219+094 19206 754 3.546 0.068 -1.457 0.112 1.39E+00 24813.0492 4.0618 -1.7815<br />

PG2229+099 16940 236 3.755 0.053 -0.958 0.031 1.41E+00 18337.4404 3.8720 -1.3692<br />

PG2258+155 34000 381 4.481 0.046 0.945 0.008 3.68E+00 37528.2798 5.2630 2.4797<br />

PG2259+134 31323 396 5.772 0.057 -1.975 0.082 1.35E+00 31934.3911 5.8685 -2.0224<br />

PG2301+259 18959 395 4.217 0.061 -1.904 0.070 6.67E-01 16716.7103 4.1106 -1.5974<br />

PG2314+076 30140 305 5.640 0.050 -3.000 0.434 3.38E+00 31564.2513 5.1605 -3.1646<br />

PG2317+046 33177 797 4.504 0.104 -3.000 0.434 1.54E+00 40658.1786 4.6877 -3.3510<br />

PG2318+239 16940 296 3.778 0.036 -1.392 0.075 1.47E+00 20890.7160 4.2557 -1.8445<br />

PG2321+214 38502 268 4.977 0.063 2.314 0.179 1.52E+00 39171.8164 5.0203 2.4143<br />

PG2331+038 29017 428 5.642 0.054 -2.401 0.109 3.59E+00 32370.5612 5.8341 -2.7235<br />

PG2337+070 29563 670 5.735 0.060 -1.997 0.086 7.85E-01 29133.8280 5.8738 -1.4688<br />

PG2339+199 30763 396 4.189 0.043 1.042 0.009 4.17E+00 33495.0283 4.4077 1.0705<br />

PG2345+241 17743 288 3.699 0.044 -0.954 0.046 6.76E-01 19581.9082 3.8835 -1.0702<br />

PG2349+002 28383 334 5.600 0.053 -3.000 0.434 2.00E+00 26081.2311 5.2581 -4.0008<br />

PG2351+198 14539 269 3.768 0.046 -1.380 0.094 1.64E+00 13733.0764 4.0071 -1.6889<br />

PG2352+181 47309 246 5.873 0.089 0.264 0.021 2.81E+00 44348.4299 5.7750 -0.4701<br />

PG2358+107 23768 966 4.978 0.105 -2.685 0.210 3.04E+00 24626.4068 4.9117 -3.6593<br />


PHL 1079 31862 389 5.589 0.055 -2.303 0.087 1.32E+00 31739.7364 5.6372 -2.0741<br />

PHL 4 40197 685 5.000 0.093 -1.254 0.062 6.59E-01 50528.5221 5.6359 -2.8566<br />

TON 107 39369 266 5.602 0.039 -0.076 0.003 4.42E+00 36793.9981 4.8072 -0.2664<br />

VZ1128 M3 34893 388 4.500 0.058 -0.968 0.044 8.72E-01 35857.4448 4.3289 -2.0402<br />
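Table A.1 places the SFIT fit for each star beside the ANN estimate, so the two methods can be compared star by star. The Python below is a minimal sketch, not the comparison procedure used in this thesis. It parses rows in the layout above and computes ANN minus SFIT differences for Teff, log g and log(nHe/nH). The helper names (`parse_row`, `residuals`) are hypothetical, and the four sample rows are copied from the end of Table A.1 with the HTML line breaks removed. Because identifiers such as "PHL 1079" contain spaces, the ten numeric columns are read from the right-hand end of each line.

```python
from statistics import mean, stdev

# Four rows copied from the end of Table A.1 (trailing <br /> removed).
ROWS = """\
PHL 1079 31862 389 5.589 0.055 -2.303 0.087 1.32E+00 31739.7364 5.6372 -2.0741
PHL 4 40197 685 5.000 0.093 -1.254 0.062 6.59E-01 50528.5221 5.6359 -2.8566
TON 107 39369 266 5.602 0.039 -0.076 0.003 4.42E+00 36793.9981 4.8072 -0.2664
VZ1128 M3 34893 388 4.500 0.058 -0.968 0.044 8.72E-01 35857.4448 4.3289 -2.0402
"""

SFIT_KEYS = ("teff", "d_teff", "logg", "d_logg", "he", "d_he", "chi2")
ANN_KEYS = ("teff", "logg", "he")
N_NUMERIC = len(SFIT_KEYS) + len(ANN_KEYS)  # 10 numeric columns per row


def parse_row(line):
    """Split one Table A.1 row into (identifier, sfit, ann).

    Identifiers may contain spaces (e.g. "PHL 1079"), so the numeric
    columns are taken from the right-hand end of the line.
    """
    tokens = line.split()
    ident = " ".join(tokens[:-N_NUMERIC])
    values = [float(t) for t in tokens[-N_NUMERIC:]]
    sfit = dict(zip(SFIT_KEYS, values[:len(SFIT_KEYS)]))
    ann = dict(zip(ANN_KEYS, values[len(SFIT_KEYS):]))
    return ident, sfit, ann


def residuals(text):
    """Return {identifier: {param: ANN - SFIT}} for Teff, log g, log(nHe/nH)."""
    out = {}
    for line in text.strip().splitlines():
        ident, sfit, ann = parse_row(line)
        out[ident] = {k: ann[k] - sfit[k] for k in ANN_KEYS}
    return out


if __name__ == "__main__":
    res = residuals(ROWS)
    for ident, d in res.items():
        print(f"{ident:10s} dTeff={d['teff']:+9.1f} K  "
              f"dlogg={d['logg']:+.3f}  dHe={d['he']:+.3f}")
    dteff = [d["teff"] for d in res.values()]
    print(f"mean dTeff = {mean(dteff):+.0f} K, sample sd = {stdev(dteff):.0f} K")
```

Reading the numbers from the right, rather than splitting at a fixed column index, lets multi-word identifiers such as "VZ1128 M3" parse without special cases.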



Appendix B<br />

Results for 282 SDSS DR3 Hot Subdwarf Candidates<br />

Table B.1 lists the classification and parameterisation results for the SDSS hot subdwarf candidates of Chapter 5. For each star it also gives the position and the redshift (tabulated as cz) obtained by the SDSS. The internal errors of SFIT are given, along with the value of χ² for the best fit.<br />



Table B.1: Results for 282 SDSS Hot Subdwarf Candidates<br />

Columns: SDSS Identifier; R.A.; Decl.; cz (km s⁻¹); δcz (km s⁻¹); Classification; Teff (K); δTeff (K); log g (cgs); δlog g; log(nHe/nH); δlog(nHe/nH); χ²<br />

J000607.88-010320.8 00:06:07.88 -01:03:20.8 -279.634 26.6456 sdO7VII:He39 42788 155 5.280 0.060 2.093 0.108 6.17E+00<br />

J001651.42-011329.3 00:16:51.42 -01:13:29.3 115.816 44.9755 sdO2VII:He26 47737 3077 3.726 0.083 0.364 0.044 2.59E+00<br />

J001837.14+152150.0 00:18:37.14 +15:21:50.0 -73.1296 25.5077 sdO9VII:He39 38579 94 4.555 0.026 2.522 0.433 6.46E+00<br />

J001930.36+135530.9 00:19:30.36 +13:55:30.9 -225.249 34.6605 sdB0VI:He12 29686 475 5.382 0.045 -1.120 0.023 6.17E+00<br />

J002323.99-002953.3 00:23:23.99 -00:29:53.3 -5.64728 33.4529 sdB3VI:He5 30771 241 5.746 0.042 -1.873 0.032 6.68E-01<br />

J002852.26+135446.5 00:28:52.26 +13:54:46.5 35.3602 31.5028 sdO9VI:He8 33187 387 4.944 0.062 -2.692 0.214 9.48E-01<br />

J004233.43+004717.6 00:42:33.43 +00:47:17.6 24.2894 32.9952 sdB4V:He1 29989 668 4.987 0.080 -3.000 0.434 2.73E+00<br />

J011506.17+140513.5 01:15:06.17 +14:05:13.5 -162.802 29.9172 sdB3IV:He7 32000 380 4.545 0.056 -2.991 0.425 7.56E+00<br />

J013847.59+141532.1 01:38:47.59 +14:15:32.1 -188.872 33.771 sdO9VIII:He7 27012 612 4.972 0.072 -2.212 0.071 7.17E+00<br />

J015026.10-094226.9 01:50:26.10 -09:42:26.9 -18.51 33.4439 sdB8VI:He10 33657 341 5.303 0.029 -1.402 0.022 1.58E+00<br />

J021617.11-095513.1 02:16:17.11 -09:55:13.1 -29.2216 33.6757 sdO8VI:He12 35372 180 5.561 0.034 -1.195 0.027 9.74E-01<br />

J023032.65-081439.5 02:30:32.65 -08:14:39.5 -194.8 28.2265 sdB6IV:He3 13683 176 3.755 0.037 -1.399 0.065 7.83E-01<br />

J031620.13+004222.9 03:16:20.13 +00:42:22.9 -23.1856 31.5085 sdB0V:He8 32906 81 5.692 0.016 -2.227 0.073 1.67E+00<br />

J031854.14+004135.0 03:18:54.14 +00:41:35.0 6.24576 32.428 sdB7IV:He2 20554 512 4.500 0.063 -2.047 0.097 1.82E+00<br />

J033358.21+002007.5 03:33:58.21 +00:20:07.5 117.167 34.2372 sdB2VI:He13 34164 234 5.270 0.036 -0.771 0.017 6.59E+00<br />

J073712.28+264224.7 07:37:12.28 +26:42:24.7 8.71841 31.8302 sdO9VI:He7 32799 75 5.788 0.019 -2.278 0.082 1.71E+00<br />

J073856.99+401942.1 07:38:56.99 +40:19:42.1 -202.597 22.922 sdO1VII:He34 50000 547 5.495 0.087 0.644 0.047 2.18E+00<br />

J074001.91+240127.4 07:40:01.91 +24:01:27.4 -130.711 32.684 sdA1IV:He0 30156 269 5.085 0.042 -2.991 0.425 7.72E+00<br />

J074458.10+324259.9 07:44:58.10 +32:42:59.9 50.6134 38.7134 sdO9VII:He24 38824 74 5.996 0.025 -0.512 0.010 8.41E+00<br />

J074534.16+372718.6 07:45:34.16 +37:27:18.6 29.853 33.6007 sdB0VII:He4 35000 155 5.394 0.033 -3.000 0.434 2.44E+00<br />

J074613.17+333307.6 07:46:13.17 +33:33:07.6 -13.6868 23.2616 sdO7VII:He39 44962 100 6.000 0.031 1.261 0.047 1.60E+00<br />

J074720.59+384910.7 07:47:20.59 +38:49:10.7 25.1426 29.7373 sdB7IV:He3 16696 250 3.821 0.047 -1.588 0.101 2.74E+00<br />


J074806.15+342927.7 07:48:06.15 +34:29:27.7 -54.5131 34.0109 sdB0VI:He6 33653 269 5.478 0.053 -1.697 0.043 1.38E+00<br />

J074811.34+435239.6 07:48:11.34 +43:52:39.6 26.4415 32.7577 sdB1VI:He8 25154 617 5.003 0.067 -1.463 0.025 1.29E+00<br />

J075236.78+441642.5 07:52:36.78 +44:16:42.5 -102.636 34.6248 sdB1V:He9 34243 241 5.496 0.019 -1.059 0.020 2.63E+00<br />

J075249.96+305935.2 07:52:49.96 +30:59:35.2 144.115 29.591 sdB4V:He34 39999 182 5.576 0.027 1.134 0.012 9.97E+00<br />

J080259.80+411438.0 08:02:59.80 +41:14:38.0 -26.9931 22.3976 sdO7VII:He39 44340 95 6.000 0.030 1.261 0.047 1.68E+00<br />

J080628.10+323059.4 08:06:28.10 +32:30:59.4 -38.8078 32.5904 sdB2V:He11 30983 240 5.650 0.035 -1.332 0.019 8.91E-01<br />


J080726.80+303501.8 08:07:26.80 +30:35:01.8 -65.0792 28.5258 sdB5IV:He4 13573 172 3.697 0.035 -1.552 0.062 8.53E-01<br />

J081342.92+275034.8 08:13:42.92 +27:50:34.8 -5.38343 32.9205 sdB3VI:He2 27634 681 5.500 0.070 -2.294 0.085 1.28E+00<br />

J081540.66+430524.5 08:15:40.66 +43:05:24.5 131.458 38.0934 sdB4V:He32 37418 195 5.536 0.033 -0.307 0.008 1.52E+01<br />

J081607.91+480349.7 08:16:07.91 +48:03:49.7 -26.1981 31.9453 sdB2VI:He5 24992 545 5.359 0.058 -2.293 0.085 1.18E+00<br />

J082751.06+410925.9 08:27:51.06 +41:09:25.9 -21.3762 30.7086 sdO5VII:He36 48823 522 5.503 0.084 0.397 0.023 1.30E+01<br />

J082802.04+404009.0 08:28:02.04 +40:40:09.0 -182.813 30.9737 sdO4VII:He11 36831 344 5.088 0.050 -2.117 0.114 3.48E+00<br />

J083006.17+475150.4 08:30:06.17 +47:51:50.4 -6.3029 32.2109 sdB0VII:He3 27934 1042 5.330 0.084 -2.892 0.339 7.32E-01<br />

J083241.96+483445.1 08:32:41.96 +48:34:45.1 26.2838 29.9143 sdB4V:He5 18422 288 4.241 0.036 -1.521 0.072 2.29E+00<br />

J083456.98+422053.2 08:34:56.98 +42:20:53.2 13.8063 28.1785 sdB6III:He5 10816 93 3.075 0.028 -1.789 0.187 3.02E+00<br />

J083842.71+053309.5 08:38:42.71 +05:33:09.5 68.4768 32.1614 sdO9VII:He5 30554 214 5.502 0.047 -2.281 0.083 9.13E-01<br />

J083935.91+030840.8 08:39:35.91 +03:08:40.8 21.5494 31.5714 sdB0VI:He10 35310 415 4.854 0.057 -2.592 0.170 7.74E-01<br />

J084122.67+063029.6 08:41:22.67 +06:30:29.6 -11.2503 31.7429 sdB0VI:He6 32357 79 5.620 0.015 -2.141 0.060 8.36E-01<br />

J084413.77+023229.3 08:44:13.77 +02:32:29.3 268.878 31.7126 sdB6IV:He0 13225 156 4.036 0.033 -2.156 0.124 3.87E+00<br />

J084556.16+542357.6 08:45:56.16 +54:23:57.6 6.03341 31.7933 sdB5IV:He11 30069 344 5.053 0.036 -1.170 0.026 1.02E+01<br />

J084727.88+024814.8 08:47:27.88 +02:48:14.8 136.219 28.589 sdB5IV:He2 13429 182 3.539 0.035 -2.100 0.164 1.47E+00<br />

J085422.40+013651.0 08:54:22.40 +01:36:51.0 -7.05121 26.6296 sdB0VI:He25 34383 203 5.287 0.037 -0.784 0.018 1.70E+00<br />


J085650.28+401730.9 08:56:50.28 +40:17:30.9 -31.3088 19.8957 sdO4VI:He11 27006 1513 3.878 0.098 -3.000 0.434 1.20E+00<br />

J085727.66+424215.4 08:57:27.66 +42:42:15.4 192.101 26.8266 sdB3VI:He35 38783 151 5.406 0.025 0.689 0.018 9.33E+00<br />

J085900.33+023313.1 08:59:00.33 +02:33:13.1 19.9144 33.5645 sdB0VI:He4 33042 250 5.451 0.050 -1.685 0.042 9.75E-01<br />

J090559.15+055442.1 09:05:59.15 +05:54:42.1 316.248 26.3361 sdA0III:He3 12411 140 3.377 0.044 -1.647 0.192 6.02E+00<br />

J091225.13+421922.5 09:12:25.13 +42:19:22.5 -82.0724 32.7113 sdB4V:He6 30073 349 5.150 0.044 -2.647 0.193 2.67E+00<br />

J091544.44+511338.8 09:15:44.44 +51:13:38.8 -19.1172 29.6024 sdB2V:He6 34140 265 5.386 0.051 -1.514 0.028 1.92E+00<br />

J092436.41+040135.7 09:24:36.41 +04:01:35.7 114.477 29.0217 sdB4IV:He4 15224 225 3.614 0.032 -1.715 0.135 2.78E+00<br />

J092520.70+470330.6 09:25:20.70 +47:03:30.6 38.5098 33.0974 sdB5VI:He4 29376 480 5.196 0.054 -2.459 0.125 3.43E+00<br />

J092634.88+473036.0 09:26:34.88 +47:30:36.0 -6.49944 26.6692 sdB3V:He4 17256 215 4.128 0.038 -1.783 0.053 8.97E+00<br />

J092830.55+561811.8 09:28:30.55 +56:18:11.8 -135.281 31.0954 sdO6VI:He18 41016 436 5.000 0.067 -0.737 0.030 1.50E+00<br />

J093059.63+025032.4 09:30:59.63 +02:50:32.4 1.1058 33.0569 sdB1VI:He7 30308 171 5.537 0.037 -2.122 0.058 1.11E+00<br />

J093215.32-002108.5 09:32:15.32 -00:21:08.5 178.664 28.1481 sdB4V:He5 14602 166 3.723 0.036 -2.053 0.147 2.31E+00<br />

J093245.91+081618.6 09:32:45.91 +08:16:18.6 83.619 33.7491 sdB4VI:He4 32561 239 5.360 0.042 -1.662 0.040 1.54E+00<br />

J093322.20+440322.7 09:33:22.20 +44:03:22.7 -49.4753 29.8445 sdB4V:He2 16544 229 4.143 0.026 -1.629 0.111 2.44E+00<br />

J093549.72+544101.0 09:35:49.72 +54:41:01.0 -99.729 34.2498 sdB0VI:He4 36231 258 5.497 0.054 -1.463 0.025 3.48E+00<br />

J094143.53+535833.4 09:41:43.53 +53:58:33.4 -41.6598 28.0576 sdB5IV:He2 12276 163 3.500 0.041 -2.066 0.152 1.12E+00<br />

J094346.62+531429.1 09:43:46.62 +53:14:29.1 -150.267 25.7962 sdB1V:He26 35458 147 5.043 0.030 -0.573 0.014 3.93E+00<br />

J094623.03+040456.1 09:46:23.03 +04:04:56.1 55.3774 34.0528 sdB0VI:He9 37074 124 6.000 0.032 -1.395 0.022 1.01E+00<br />

J094900.45+025702.9 09:49:00.45 +02:57:02.9 245.283 29.1801 sdB8IV:He0 14000 212 3.251 0.040 -2.259 0.158 1.88E+01<br />

J095101.29+034757.0 09:51:01.29 +03:47:57.0 136.159 32.8147 sdB1VI:He3 30001 560 5.417 0.056 -2.228 0.073 9.20E-01<br />

J095847.23+602147.4 09:58:47.23 +60:21:47.4 -298.171 27.6608 sdB5IV:He2 12650 125 3.631 0.031 -1.726 0.115 9.37E-01<br />

J100019.99-003413.3 10:00:19.99 -00:34:13.3 90.0055 48.3655 sdO3VIII:He15 45730 659 5.151 0.072 -0.716 0.043 4.93E+00<br />


J100317.05+025510.4 10:03:17.05 +02:55:10.4 203.522 29.2731 sdB3VI:He34 38522 164 5.908 0.032 0.045 0.001 8.16E+00<br />

J100740.10+454252.5 10:07:40.10 +45:42:52.5 2.52572 33.5207 sdF0IV:He3 30057 388 5.239 0.051 -2.987 0.421 4.46E+00<br />

J101025.64+045357.0 10:10:25.64 +04:53:57.0 94.7419 30.4631 sdB7II:He5 15489 177 4.171 0.034 -2.103 0.110 8.80E+00<br />

J101213.21+064030.7 10:12:13.21 +06:40:30.7 96.1234 36.4263 sdO4VII:He26 42824 249 5.928 0.056 -0.568 0.010 2.14E+00<br />

J101218.95+004413.4 10:12:18.95 +00:44:13.4 -16.2037 36.3286 sdB0VII:He9 35915 227 5.373 0.042 -1.194 0.027 3.83E+00<br />

J101242.22+484937.4 10:12:42.22 +48:49:37.4 30.3123 31.632 sdB2VI:He4 28000 423 5.109 0.042 -2.195 0.068 1.27E+00<br />


J101640.84-010900.6 10:16:40.84 -01:09:00.6 7.0939 31.8275 sdO9VI:He5 30235 296 5.000 0.054 -2.659 0.198 2.13E+00<br />

J102057.16+013751.4 10:20:57.16 +01:37:51.4 238.157 32.448 sdB1VI:He3 28380 289 5.066 0.045 -3.000 0.434 1.43E+00<br />

J102120.45+444636.9 10:21:20.45 +44:46:36.9 -95.3694 22.147 sdO8VII:He40 45002 100 5.828 0.030 1.261 0.047 4.02E+00<br />

J102320.37+462026.8 10:23:20.37 +46:20:26.8 38.9026 20.1206 sdO6VI:He35 45164 147 5.814 0.034 1.272 0.057 5.39E+00<br />

J103022.08+020524.3 10:30:22.08 +02:05:24.3 62.5766 28.0526 sdB7IV:He3 13334 236 3.487 0.045 -2.140 0.120 1.92E+00<br />

J103549.68+092551.9 10:35:49.68 +09:25:51.9 193.763 23.8682 sdO5VII:He39 45000 147 5.745 0.048 1.473 0.232 1.78E+00<br />

J103854.02+525847.8 10:38:54.02 +52:58:47.8 49.5557 30.5354 sdB2VI:He2 27999 411 5.289 0.045 -2.582 0.166 9.41E-01<br />

J104248.95+033355.4 10:42:48.95 +03:33:55.4 370.055 29.0912 sdB0VII:He9 34770 212 5.105 0.041 -2.269 0.081 3.92E+00<br />

J105608.43+034821.3 10:56:08.43 +03:48:21.3 6.83794 29.6862 sdA0III:He-0 14204 151 3.805 0.037 -1.997 0.129 4.82E+00<br />

J110053.56+034622.8 11:00:53.56 +03:46:22.8 278.131 37.5145 sdO0VIII:He18 46166 513 5.616 0.056 -1.181 0.039 3.28E+00<br />

J110215.46+024034.2 11:02:15.46 +02:40:34.2 25.8182 27.2368 sdO3VII:He36 50000 208 5.804 0.066 0.364 0.020 3.51E+00<br />

J110255.98+521858.2 11:02:55.98 +52:18:58.2 -179.727 20.5426 sdO6VII:He38 45355 380 5.549 0.045 0.588 0.030 3.77E+00<br />

J110256.32+010012.3 11:02:56.32 +01:00:12.3 301.18 31.4893 sdB4V:He9 15645 228 3.701 0.037 -1.483 0.053 1.35E+01<br />

J110302.37-010338.7 11:03:02.37 -01:03:38.7 139.329 33.2725 sdO9VII:He4 29649 500 5.096 0.052 -2.579 0.165 1.46E+00<br />

J110445.01+092530.9 11:04:45.01 +09:25:30.9 182.534 29.6543 sdO5VI:He8 30800 333 4.500 0.059 -2.692 0.214 1.38E+00<br />

J111438.57-004024.1 11:14:38.57 -00:40:24.1 110.919 35.1207 sdO1VII:He21 42076 569 5.272 0.055 -0.866 0.038 5.76E+00<br />


J111633.30+052507.9 11:16:33.30 +05:25:07.9 -72.9026 27.5999 sdO9VI:He29 34332 221 4.894 0.037 -0.496 0.010 6.62E+00<br />

J112056.23+093641.8 11:20:56.23 +09:36:41.8 143.332 30.1321 sdO5VII:He11 35000 357 4.714 0.055 -2.584 0.167 1.68E+00<br />

J112242.70+613758.5 11:22:42.70 +61:37:58.5 -64.5492 33.3237 sdB3VI:He6 30001 495 5.622 0.043 -2.053 0.049 8.95E-01<br />

J112504.73+671658.3 11:25:04.73 +67:16:58.3 -88.07 25.0661 sdO9VI:He12 34249 130 4.948 0.033 -1.800 0.082 1.91E+00<br />

J112719.00+660538.7 11:27:19.00 +66:05:38.7 -8.00635 34.4941 sdB5V:He6 30552 197 5.303 0.042 -2.661 0.199 7.43E+00<br />

J113312.13+010824.9 11:33:12.13 +01:08:24.9 535.441 27.4911 sdB5IV:He4 11978 183 3.412 0.047 -2.020 0.319 6.60E+00<br />

J113840.69-003531.8 11:38:40.69 -00:35:31.8 -81.2527 32.8276 sdF5VI:He3 30867 228 5.424 0.048 -2.209 0.070 7.48E-01<br />

J113935.45+614954.0 11:39:35.45 +61:49:54.0 -3.84451 33.6331 sdB0VI:He3 29767 646 5.000 0.079 -2.995 0.430 2.80E+00<br />

J114352.74+660723.4 11:43:52.74 +66:07:23.4 -131.878 35.8462 sdB3VII:He2 30474 218 5.389 0.046 -2.096 0.054 6.31E+00<br />

J114417.53-012914.1 11:44:17.53 -01:29:14.1 286.516 30.0023 sdB4V:He2 13292 160 3.583 0.032 -2.300 0.173 3.84E+00<br />

J114821.30+033625.8 11:48:21.30 +03:36:25.8 -4.57768 35.4397 sdB5V:He5 33168 178 5.686 0.036 -1.602 0.035 6.88E+00<br />

J115009.49+061042.1 11:50:09.49 +06:10:42.1 44.1678 35.0436 sdO3V:He19 40640 288 5.147 0.057 -0.844 0.030 8.63E+00<br />

J115101.04+541003.5 11:51:01.04 +54:10:03.5 -210.473 28.6527 sdO6VI:He8 32000 602 4.484 0.077 -2.241 0.076 2.78E+00<br />

J115115.19-015255.2 11:51:15.19 -01:52:55.2 108.73 21.8454 sdB6V:He12 16099 329 2.774 0.052 -2.995 0.430 1.35E+01<br />

J115654.09-032510.2 11:56:54.09 -03:25:10.2 48.0882 26.4144 sdO9V:He14 32698 279 4.499 0.043 -1.126 0.029 2.16E+00<br />

J115716.38+612410.8 11:57:16.38 +61:24:10.8 -160.892 34.0894 sdB3VII:He1 29999 521 5.224 0.057 -3.000 0.434 2.15E+00<br />

J120311.26+045419.6 12:03:11.26 +04:54:19.6 317.09 31.3655 sdB8IV:He0 12883 188 3.596 0.033 -2.212 0.142 1.39E+01<br />

J120626.55+663352.5 12:06:26.55 +66:33:52.5 -83.8421 22.4985 sdO7VII:He39 43748 87 5.875 0.032 2.007 0.132 2.07E+00<br />

J121123.37+611203.9 12:11:23.37 +61:12:03.9 -147.955 39.0234 sdB7V:He9 32971 192 5.833 0.041 -1.924 0.073 7.23E+00<br />

J121424.81+550226.3 12:14:24.81 +55:02:26.3 -69.0878 35.2988 sdO5VII:He12 41632 331 5.253 0.046 -1.631 0.056 2.43E+00<br />

J121625.83-014804.6 12:16:25.83 -01:48:04.6 67.9746 31.4899 sdB1V:He5 12307 116 3.392 0.041 -1.159 0.087 1.45E+01<br />

J121643.73+020835.9 12:16:43.73 +02:08:35.9 65.6063 30.4367 sdO8VI:He38 40933 127 5.595 0.021 0.065 0.001 1.07E+01<br />


J122057.48-012642.4 12:20:57.48 -01:26:42.4 -4.416 32.305 sdB9III:He1 18548 245 4.613 0.038 -1.862 0.063 1.10E+01<br />

J122444.98+583313.9 12:24:44.98 +58:33:13.9 -85.1992 29.6831 sdB2IV:He-0 11106 135 3.124 0.037 -2.291 0.170 1.74E+01<br />

J122637.12+575927.6 12:26:37.12 +57:59:27.6 -194.41 31.7309 sdB0VII:He3 30382 187 5.350 0.042 -2.718 0.227 2.14E+00<br />

J123808.66+053318.2 12:38:08.66 +05:33:18.2 25.9705 24.1427 sdO4VII:He38 44999 127 5.352 0.041 2.125 0.289 3.96E+00<br />

J123821.49-021211.5 12:38:21.49 -02:12:11.5 83.2689 33.3849 sdO1VIII:He29 48017 516 5.158 0.081 0.210 0.008 7.27E+00<br />

J124706.79-003925.9 12:47:06.79 -00:39:25.9 7.95478 36.143 sdO9VII:He4 36859 318 5.551 0.042 -3.000 0.434 9.50E-01<br />


J124728.16+562958.3 12:47:28.16 +56:29:58.3 -159.902 33.0009 sdO9VII:He1 24936 896 4.876 0.085 -2.823 0.289 3.60E+00<br />

J124819.08+035003.3 12:48:19.08 +03:50:03.3 -56.2564 26.8671 sdO6VIII:He16 49129 1057 5.500 0.069 -1.226 0.051 2.62E+00<br />

J125229.60-030129.6 12:52:29.60 -03:01:29.6 47.7554 33.4044 sdB0V:He6 30736 221 5.454 0.047 -2.345 0.096 1.46E+00<br />

J125248.84+521604.1 12:52:48.84 +52:16:04.1 -128.9 29.8305 sdB7III:He4 13184 155 3.605 0.032 -1.715 0.090 8.25E+00<br />

J125328.45+042044.0 12:53:28.45 +04:20:44.0 -1.7345 29.2904 sdB4IV:He3 12250 168 3.500 0.043 -2.231 0.148 2.41E+00<br />

J125408.32+014324.1 12:54:08.32 +01:43:24.1 2.43989 23.7974 sdO6VII:He40 45000 100 5.999 0.031 1.311 0.053 2.30E+00<br />

J125410.86-010408.4 12:54:10.86 -01:04:08.4 88.5476 33.4203 sdB3IV:He6 20858 320 4.632 0.039 -1.689 0.042 5.36E+00<br />

J125941.88-003928.8 12:59:41.88 -00:39:28.8 45.2168 31.1712 sdO5VIII:He15 34375 213 5.149 0.025 -1.570 0.048 2.75E+00<br />

J130025.53+004530.2 13:00:25.53 +00:45:30.2 69.5908 33.3195 sdO9VII:He16 38249 102 5.756 0.025 -0.845 0.021 1.90E+00<br />

J130059.21+005711.8 13:00:59.21 +00:57:11.8 -30.0593 28.2369 sdB0VII:He25 37650 234 5.212 0.034 -0.443 0.010 2.90E+00<br />

J131425.39+011153.4 13:14:25.39 +01:11:53.4 -116.094 28.1469 sdB5IV:He2 13063 134 3.663 0.032 -1.696 0.086 6.81E-01<br />

J131452.97+023740.3 13:14:52.97 +02:37:40.3 -67.101 30.4253 sdB9IV:He2 17290 316 4.000 0.047 -1.858 0.063 5.24E+00<br />

J131638.48+034818.5 13:16:38.48 +03:48:18.5 6.70663 27.1188 sdB1V:He9 32528 59 5.250 0.031 -1.376 0.021 2.06E+00<br />

J131658.35+641522.5 13:16:58.35 +64:15:22.5 -178.639 24.6498 sdB1VI:He14 34296 190 5.544 0.038 -1.427 0.023 2.10E+00<br />

J131745.80+010450.4 13:17:45.80 +01:04:50.4 60.094 30.6874 sdB1VI:He3 26694 850 4.929 0.089 -2.159 0.063 1.48E+00<br />

J131916.15-011405.0 13:19:16.15 -01:14:05.0 227.376 30.5863 sdB4VI:He3 16894 205 4.132 0.035 -1.737 0.047 1.39E+00<br />


J132503.17+043239.4 13:25:03.17 +04:32:39.4 -179.976 30.226 sdB9V:He3 12173 89 3.128 0.023 -0.750 0.047 1.16E+01<br />

J132556.94-032329.6 13:25:56.94 -03:23:29.6 66.5581 34.6956 sdB5VI:He10 41733 589 5.500 0.064 -1.997 0.129 5.42E+00<br />

J132619.95+035754.4 13:26:19.95 +03:57:54.4 -82.3389 32.8285 sdO9VI:He13 34266 169 5.312 0.011 -1.206 0.028 1.41E+00<br />

J133200.96+673325.8 13:32:00.96 +67:33:25.8 -101.751 33.9539 sdO9VII:He7 34458 233 5.474 0.048 -1.532 0.030 2.48E+00<br />

J133449.26+041014.9 13:34:49.26 +04:10:14.9 83.6733 27.9956 sdB0VII:He29 34610 210 4.778 0.033 -0.136 0.004 5.32E+00<br />

J133546.10+555429.8 13:35:46.10 +55:54:29.8 -21.676 31.3781 sdB4V:He3 16963 259 4.022 0.039 -1.767 0.051 9.94E+00<br />

J133757.40-005647.2 13:37:57.40 -00:56:47.2 -159.448 30.4751 sdB6III:He6 12195 126 3.450 0.037 -2.019 0.181 5.28E+00<br />

J134344.11+465825.3 13:43:44.11 +46:58:25.3 3.03651 30.0884 sdO5VIII:He10 29224 223 4.133 0.047 -3.000 0.434 7.49E+00<br />

J134545.24-000641.6 13:45:45.24 -00:06:41.6 -3.70993 37.7028 sdO8VII:He28 35903 207 5.268 0.035 -0.269 0.006 3.79E+00<br />

J134600.55+052034.3 13:46:00.55 +05:20:34.3 -39.3819 33.4071 sdB4V:He0 30255 205 5.216 0.041 -3.000 0.434 1.85E+00<br />

J134948.30-024639.3 13:49:48.30 -02:46:39.3 -149.397 31.2636 sdB4IV:He3 11972 150 3.635 0.044 -2.284 0.167 7.21E+00<br />

J135140.69+023429.2 13:51:40.69 +02:34:29.2 20.1812 31.8161 sdO9VI:He4 33545 283 5.190 0.048 -3.000 0.434 1.74E+00<br />

J135707.35+010454.4 13:57:07.35 +01:04:54.4 -164.908 27.8615 sdO7VIII:He29 31999 274 3.749 0.049 0.597 0.027 4.46E+00<br />

J135746.59+530758.7 13:57:46.59 +53:07:58.7 -88.3881 32.612 sdB2VI:He2 29997 447 5.104 0.044 -2.552 0.155 3.56E+00<br />

J140118.74-012024.8 14:01:18.74 -01:20:24.8 -138.416 33.4862 sdO6VII:He7 30768 263 5.000 0.053 -2.479 0.131 2.19E+00<br />

J140252.20+465918.5 14:02:52.20 +46:59:18.5 -178.233 32.0421 sdB5VI:He6 29111 465 5.131 0.052 -2.315 0.090 5.23E+00<br />

J140545.26+014419.1 14:05:45.26 +01:44:19.1 -66.9086 31.5385 sdO9VI:He6 28704 421 5.389 0.051 -1.645 0.038 1.07E+00<br />

J140715.42+033147.6 14:07:15.42 +03:31:47.6 31.4863 28.4411 sdO8VI:He35 49941 546 5.500 0.087 0.572 0.039 6.76E+00<br />

J140839.10+653124.4 14:08:39.10 +65:31:24.4 -170.614 33.9581 sdB0VI:He4 30105 320 5.122 0.045 -3.000 0.434 2.97E+00<br />

J141812.51-024427.0 14:18:12.51 -02:44:27.0 -183.21 30.1228 sdO1VIII:He32 50000 209 5.951 0.067 0.364 0.020 2.07E+00<br />

J142226.93-023100.5 14:22:26.93 -02:31:00.5 129.172 31.5639 sdO6V:He2 14553 198 3.828 0.040 -1.470 0.090 1.70E+01<br />

J142339.81+014947.3 14:23:39.81 +01:49:47.3 27.9633 33.792 sdB4V:He6 30705 324 5.509 0.049 -1.753 0.049 2.50E+00<br />


J142416.88-014335.0 14:24:16.88 -01:43:35.0 91.6777 27.5647 sdB3V:He3 14999 189 3.634 0.034 -2.168 0.128 9.53E-01<br />

J142459.58+031943.4 14:24:59.58 +03:19:43.4 37.3083 33.0024 sdO8VII:He6 34325 274 5.321 0.038 -1.414 0.023 1.74E+00<br />

J142551.30-013317.3 14:25:51.30 -01:33:17.3 -37.9939 32.3239 sdO2VII:He13 41093 307 5.375 0.046 -1.716 0.045 1.54E+00<br />

J142956.63+563144.0 14:29:56.63 +56:31:44.0 -91.5749 31.4605 sdB1VI:He3 28557 299 5.068 0.046 -2.598 0.172 1.03E+00<br />

J143006.23+510314.1 14:30:06.23 +51:03:14.1 -112.744 21.9973 sdO4VII:He37 45000 99 5.767 0.030 1.261 0.047 2.27E+00<br />

J143153.06-002824.3 14:31:53.06 -00:28:24.3 -31.9216 39.9306 sdO6VIII:He7 35581 232 5.499 0.043 -0.977 0.020 5.72E+00<br />


J143917.64+010250.8 14:39:17.64 +01:02:50.8 28.1612 30.2275 sdB1VI:He6 19939 368 4.134 0.043 -1.676 0.041 1.34E+00<br />

J143917.64+010251.1 14:39:17.64 +01:02:51.1 24.1876 30.1996 sdB5V:He3 19464 338 4.116 0.041 -1.709 0.044 2.06E+00<br />

J144024.72+022118.7 14:40:24.72 +02:21:18.7 -35.8459 36.2062 sdO6VI:He11 34898 91 5.332 0.021 -1.067 0.020 3.55E+00<br />

J144141.37+450651.4 14:41:41.37 +45:06:51.4 -154.864 30.9137 sdB2VI:He6 19982 484 4.500 0.059 -1.690 0.043 2.50E+00<br />

J144301.69+514410.3 14:43:01.69 +51:44:10.3 -171.018 29.5335 sdB8IV:He4 19181 308 4.358 0.050 -1.592 0.034 1.22E+00<br />

J144346.62+491733.7 14:43:46.62 +49:17:33.7 -80.1405 35.8834 sdO8VIII:He12 35572 224 5.421 0.048 -1.450 0.024 1.91E+00<br />

J144514.93+000249.0 14:45:14.93 +00:02:49.0 0.302733 25.936 sdB1VII:He12 34682 78 5.019 0.033 -1.399 0.033 3.37E+00<br />

J144709.20+511639.8 14:47:09.20 +51:16:39.8 -148.603 33.0281 sdO7VI:He34 45000 100 5.908 0.031 1.260 0.047 1.74E+01<br />

J144737.76+020942.6 14:47:37.76 +02:09:42.6 35.4762 34.5196 sdO5VII:He11 32773 127 5.563 0.016 -2.085 0.053 7.14E+00<br />

J145049.50+624940.9 14:50:49.50 +62:49:40.9 -159.082 31.7249 sdB0VI:He2 28051 771 4.739 0.071 -3.000 0.434 6.07E+00<br />

J145426.67+472004.4 14:54:26.67 +47:20:04.4 21.6808 32.273 sdB2VI:He5 29437 522 5.478 0.056 -2.180 0.066 9.68E-01<br />

J145606.42+500155.3 14:56:06.42 +50:01:55.3 -69.5387 32.8387 sdB5V:He8 30764 300 5.343 0.044 -1.598 0.034 1.78E+00<br />

J145657.73+495310.8 14:56:57.73 +49:53:10.8 -39.8157 24.9202 sdB1VI:He9 33478 223 5.348 0.046 -1.547 0.031 1.62E+00<br />

J145748.84+561323.5 14:57:48.84 +56:13:23.5 -202.486 30.5689 sdB3VI:He2 12512 109 3.669 0.029 -2.057 0.148 8.49E+00<br />

J150829.03+494051.0 15:08:29.03 +49:40:51.0 -136.073 34.0157 sdB0VII:He1 27313 913 5.012 0.087 -2.359 0.099 4.82E+00<br />

J151030.69-014345.9 15:10:30.69 -01:43:45.9 -152.78 22.8201 sdO7VII:He39 44820 94 5.982 0.033 1.866 0.191 2.06E+00<br />


J151042.06+040955.5 15:10:42.06 +04:09:55.5 -63.2658 33.3369 sdO9VI:He10 34792 231 5.512 0.046 -1.360 0.020 1.64E+00<br />

J151105.38+515956.4 15:11:05.38 +51:59:56.4 -151.756 33.1678 sdB3VI:He1 18406 281 4.286 0.047 -1.793 0.081 1.09E+01<br />

J151231.29+005317.7 15:12:31.29 +00:53:17.7 -47.1969 36.6202 sdB1VI:He29 36493 164 5.723 0.031 -0.486 0.010 7.25E+00<br />

J151306.72+011439.1 15:13:06.72 +01:14:39.1 -93.1191 35.0598 sdB3IV:He2 26002 436 5.037 0.045 -2.301 0.087 3.36E+00<br />

J151415.66-012925.2 15:14:15.66 -01:29:25.2 -121.987 24.7826 sdO5VII:He39 45000 164 5.687 0.044 0.560 0.029 2.33E+00<br />

J151617.94+412948.4 15:16:17.94 +41:29:48.4 -181.258 32.9988 sdB3VI:He2 28962 432 5.175 0.052 -2.447 0.122 3.69E+00<br />

J151743.47+514445.4 15:17:43.47 +51:44:45.4 -79.9052 32.279 sdO8VII:He10 34518 236 5.299 0.046 -1.494 0.027 2.61E+00<br />

J151808.48+041043.7 15:18:08.48 +04:10:43.7 -38.9928 26.0169 sdO9VI:He14 35439 221 5.426 0.048 -1.549 0.031 1.38E+00<br />

J151847.69+551154.2 15:18:47.69 +55:11:54.2 -34.4602 29.707 sdB5V:He3 17317 318 4.000 0.047 -1.605 0.035 2.56E+00<br />

J152332.82+353237.0 15:23:32.82 +35:32:37.0 11.3484 34.981 sdB3VI:He24 36703 193 5.500 0.035 -0.685 0.016 2.50E+00<br />

J152607.88+001640.8 15:26:07.88 +00:16:40.8 87.1892 31.4734 sdO5VI:He34 43975 320 5.050 0.042 -0.072 0.004 2.08E+00<br />

J152833.16+440009.7 15:28:33.16 +44:00:09.7 45.6215 28.5694 sdB7III:He2 12000 184 3.413 0.049 -2.108 0.223 2.66E+00<br />

J153056.33+024222.6 15:30:56.33 +02:42:22.6 -69.1411 21.3778 sdO6VII:He39 44761 109 6.000 0.029 1.163 0.019 1.62E+00<br />

J153204.36+324152.7 15:32:04.36 +32:41:52.7 -152.653 30.2446 sdB0VI:He38 39980 117 5.783 0.018 -0.194 0.004 1.69E+01<br />

J153217.24+454621.0 15:32:17.24 +45:46:21.0 -56.1277 31.2027 sdO8VII:He9 33385 349 5.063 0.024 -1.818 0.057 2.48E+00<br />

J153411.10+543345.3 15:34:11.10 +54:33:45.3 -89.259 33.0638 sdB0VI:He3 31907 321 5.106 0.048 -2.394 0.108 1.78E+00<br />

J153508.52+032456.3 15:35:08.52 +03:24:56.3 11.5925 26.1501 sdB4II:He10 13657 282 3.012 0.047 -1.400 0.065 1.69E+01<br />

J154043.10+435950.1 15:40:43.10 +43:59:50.1 -70.6263 24.0808 sdO7VII:He38 44843 123 5.904 0.033 1.218 0.050 2.40E+00<br />

J154338.69+001202.1 15:43:38.69 +00:12:02.1 -29.8052 32.9385 sdB2VI:He2 29654 499 5.247 0.057 -2.589 0.169 2.86E+00<br />

J154531.02+563944.7 15:45:31.02 +56:39:44.7 -104.244 30.7923 sdB3VI:He5 26509 389 4.933 0.053 -2.039 0.047 1.55E+00<br />

J154809.97-004931.4 15:48:09.97 -00:49:31.4 -28.7556 35.4187 sdO9VI:He4 32645 255 5.499 0.049 -2.978 0.413 1.42E+00<br />

J154830.67+003656.7 15:48:30.67 +00:36:56.7 -126.171 29.7408 sdB4IV:He4 11429 135 3.277 0.035 -0.911 0.052 9.04E-01<br />

J155628.35+011335.0 15:56:28.35 +01:13:35.0 67.4947 34.8931 sdB3V:He2 30988 255 5.425 0.049 -3.000 0.434 8.72E-01<br />

J155642.95+501537.5 15:56:42.95 +50:15:37.5 -173.848 29.5376 sdO2VII:He33 50000 208 5.852 0.066 0.364 0.020 1.69E+00<br />

J160241.13-001207.1 16:02:41.13 -00:12:07.1 -69.1264 35.5392 sdO7VII:He2 31639 283 5.111 0.046 -2.624 0.183 4.62E+00<br />

J160759.27+383746.4 16:07:59.27 +38:37:46.4 -49.3069 34.2912 sdO9VII:He14 34081 201 5.542 0.035 -0.941 0.022 1.90E+00<br />

J160810.18+425845.1 16:08:10.18 +42:58:45.1 -163.815 33.3498 sdB6V:He0 31066 258 5.316 0.047 -3.000 0.434 5.77E+00<br />

J161328.22+004703.2 16:13:28.22 +00:47:03.2 -149.526 32.9946 sdB7IV:He5 21059 530 4.379 0.062 -2.274 0.082 5.34E+00<br />

J161418.97+261628.8 16:14:18.97 +26:16:28.8 106.961 33.64 sdB5V:He5 28481 296 5.192 0.048 -2.471 0.128 4.44E+00<br />

J161627.11-002933.0 16:16:27.11 -00:29:33.0 -0.0543509 28.2373 sdO7VI:He39 45000 128 5.573 0.046 2.083 0.210 6.30E+00<br />

J161631.29-003853.3 16:16:31.29 -00:38:53.3 52.866 36.9722 sdB1VII:He6 33369 346 5.202 0.027 -1.568 0.032 2.41E+00<br />

J162250.09+002631.9 16:22:50.09 +00:26:31.9 -15.4032 35.4912 sdB2VI:He5 30731 222 5.410 0.047 -2.238 0.075 1.79E+00<br />

J162256.66+473051.1 16:22:56.66 +47:30:51.1 -68.6 32.3725 sdB2VI:He5 29637 557 5.500 0.059 -1.764 0.050 7.12E-01<br />

J162310.50+425831.2 16:23:10.50 +42:58:31.2 -8.93843 34.4132 sdB6IV:He0 35771 59 5.768 0.025 -2.504 0.139 6.83E+00<br />

J162359.61+375435.3 16:23:59.61 +37:54:35.3 -273.35 27.2163 sdB5V:He3 12662 107 3.617 0.032 -1.457 0.124 1.16E+00<br />

J162535.78+362039.3 16:25:35.78 +36:20:39.3 -218.657 34.2576 sdB5V:He3 30223 254 5.481 0.048 -2.371 0.102 8.39E+00<br />

J162616.71+380710.5 16:26:16.71 +38:07:10.5 -45.5658 21.99 sdO6VII:He40 44792 98 6.000 0.031 1.556 0.094 1.87E+00<br />

J162628.92+370448.6 16:26:28.92 +37:04:48.6 -51.0648 27.457 sdB6IV:He2 12281 171 3.500 0.043 -2.359 0.198 1.25E+00<br />

J162711.81-000950.9 16:27:11.81 -00:09:50.9 34.3853 29.4658 sdB3VII:He34 38699 179 5.927 0.027 0.121 0.003 8.53E+00<br />

J163148.85+372617.2 16:31:48.85 +37:26:17.2 -92.9749 28.006 sdB5III:He2 14660 186 3.539 0.035 -1.962 0.119 2.44E+00<br />

J163306.58+003216.3 16:33:06.58 +00:32:16.3 -61.1765 34.9432 sdB2VI:He1 31471 274 5.390 0.050 -3.000 0.434 1.45E+00<br />

J163446.48-005345.6 16:34:46.48 -00:53:45.6 52.9502 34.1242 sdB4V:He1 29977 519 5.293 0.060 -3.000 0.434 2.42E+00<br />

J163509.13+000235.0 16:35:09.13 +00:02:35.0 41.7581 34.7076 sdB6VI:He1 28046 733 5.097 0.059 -3.000 0.434 3.41E+00<br />

J163702.79-011351.7 16:37:02.79 -01:13:51.7 -20.6207 26.3892 sdO7VI:He39 45001 99 5.664 0.030 1.261 0.047 2.51E+00<br />

J163800.17+010259.7 16:38:00.17 +01:02:59.7 -108.217 29.5044 sdB6V:He3 17182 323 4.000 0.047 -2.192 0.135 1.50E+00<br />

J163815.97-001919.2 16:38:15.97 -00:19:19.2 -136.585 31.9708 sdB4V:He2 20734 493 4.617 0.045 -2.172 0.064 4.45E+00<br />

J163913.62+384957.1 16:39:13.62 +38:49:57.1 -66.4781 29.4706 sdB3V:He2 16016 225 3.671 0.037 -1.574 0.049 3.06E+00<br />

J163936.03+343230.6 16:39:36.03 +34:32:30.6 -228.974 26.0607 sdB2IV:He15 26684 424 4.531 0.046 -1.121 0.023 1.44E+00<br />

J164042.91+311734.6 16:40:42.91 +31:17:34.6 -369.398 41.0764 sdB1VII:He33 30989 270 3.876 0.033 0.945 0.008 1.04E+01<br />

J164122.33+334452.1 16:41:22.33 +33:44:52.1 -49.8183 33.4568 sdO9VI:He7 29235 394 5.530 0.043 -2.085 0.053 1.54E+00<br />

J164204.38+440303.3 16:42:04.38 +44:03:03.3 -336.658 30.8894 sdO9VII:He5 30160 380 4.958 0.058 -2.545 0.152 2.84E+00<br />

J164326.04+330113.2 16:43:26.04 +33:01:13.2 -66.2194 33.4823 sdB2VI:He3 29898 554 5.500 0.059 -2.254 0.078 1.22E+00<br />

J164419.45+452326.8 16:44:19.45 +45:23:26.8 -356.849 35.8627 sdB1VI:He5 32287 245 5.499 0.047 -2.080 0.052 2.67E+00<br />

J164444.94+312345.4 16:44:44.94 +31:23:45.4 -64.8034 31.1266 sdO8VII:He7 32067 275 5.483 0.051 -3.000 0.434 2.03E+00<br />

J165022.05+312749.7 16:50:22.05 +31:27:49.7 8.82859 29.3108 sdB2VI:He26 35685 174 5.204 0.032 0.118 0.003 3.37E+00<br />

J165404.27+303701.8 16:54:04.27 +30:37:01.8 134.707 31.3034 sdB1VI:He6 27306 638 5.502 0.067 -2.177 0.065 1.04E+00<br />

J165422.26+631534.3 16:54:22.26 +63:15:34.3 -20.424 35.2667 sdB2VI:He5 34568 196 5.672 0.037 -1.387 0.021 7.54E-01<br />

J165424.30+303941.3 16:54:24.30 +30:39:41.3 -249.1 28.7513 sdB7IV:He0 13840 313 3.500 0.055 -3.000 0.434 9.77E+00<br />

J165841.83+413115.6 16:58:41.83 +41:31:15.6 -40.2591 30.1816 sdB2VI:He8 32230 294 5.038 0.037 -1.819 0.057 1.59E+00<br />

J170045.67+604308.5 17:00:45.67 +60:43:08.5 -270.047 28.8328 sdO4VII:He36 48271 357 5.869 0.071 0.731 0.052 5.12E+00<br />

J170356.68+341505.0 17:03:56.68 +34:15:05.0 -83.8331 32.767 sdB3VI:He3 28252 453 5.000 0.069 -2.653 0.195 1.91E+00<br />

J170714.27+654025.6 17:07:14.27 +65:40:25.6 -92.8553 34.0747 sdB4VI:He7 34656 250 5.350 0.049 -1.539 0.030 1.04E+00<br />

J171424.17+614711.0 17:14:24.17 +61:47:11.0 -12.9255 33.6796 sdB0VII:He2 32113 394 4.999 0.065 -3.000 0.434 1.81E+00<br />

J171629.93+575121.2 17:16:29.93 +57:51:21.2 -312.791 35.8141 sdO5VIII:He21 34186 229 5.465 0.038 -0.798 0.019 6.08E+00<br />

J171722.10+580558.9 17:17:22.10 +58:05:58.9 -110.822 33.3324 sdO9VI:He9 34248 187 5.153 0.021 -1.578 0.049 1.84E+00<br />

J171813.87+595355.2 17:18:13.87 +59:53:55.2 -109.876 28.7311 sdB5IV:He2 13053 128 3.685 0.032 -2.096 0.163 2.32E+00<br />

J171929.52+273229.3 17:19:29.52 +27:32:29.3 -82.9454 31.6473 sdB1VI:He4 30755 350 4.961 0.061 -3.000 0.434 2.30E+00<br />

J171947.87+591604.3 17:19:47.87 +59:16:04.3 89.6508 29.0229 sdB6IV:He2 16126 217 3.854 0.048 -1.194 0.068 2.05E+00<br />

J172037.66+534009.4 17:20:37.66 +53:40:09.4 -72.6856 35.312 sdO5VI:He7 29304 477 5.200 0.054 -2.384 0.105 8.30E+00<br />

J172338.54+601444.1 17:23:38.54 +60:14:44.1 -41.3729 24.0531 sdO9VI:He12 34424 79 5.235 0.024 -1.353 0.029 2.21E+00<br />

J203729.93+001954.1 20:37:29.93 +00:19:54.1 -79.0277 25.4161 sdO8VII:He9 33681 230 5.173 0.046 -1.762 0.050 1.58E+00<br />

J203826.42+010953.5 20:38:26.42 +01:09:53.5 -113.742 33.1624 sdB4V:He3 29406 458 5.398 0.063 -2.442 0.120 2.16E+00<br />

J204546.82-054355.7 20:45:46.82 -05:43:55.7 -44.8684 28.8934 sdO9VI:He8 32441 54 5.167 0.031 -1.477 0.026 3.53E+00<br />

J204658.84-055100.1 20:46:58.84 -05:51:00.1 -57.0742 33.6601 sdB2VI:He3 33102 231 5.450 0.047 -1.545 0.030 1.77E+00<br />

J204726.94-060325.8 20:47:26.94 -06:03:25.8 -1.97117 33.4697 sdB2VI:He13 35395 177 5.648 0.037 -1.330 0.028 3.50E+00<br />

J205030.40-061957.9 20:50:30.40 -06:19:57.9 -489.474 28.5065 sdO5VI:He38 45663 558 5.545 0.047 0.388 0.023 8.00E+00<br />

J210454.89+110645.6 21:04:54.89 +11:06:45.6 -41.5566 31.6509 sdO5VIII:He8 35000 210 5.086 0.037 -3.000 0.434 2.34E+00<br />

J211045.16+000142.1 21:10:45.16 +00:01:42.1 -103.57 31.6623 sdO7VII:He10 33998 369 4.976 0.061 -2.257 0.079 1.96E+00<br />

J211104.97+091042.9 21:11:04.97 +09:10:42.9 157.456 32.7679 sdB4V:He3 29504 485 5.321 0.059 -2.820 0.287 4.77E+00<br />

J211318.37+001738.4 21:13:18.37 +00:17:38.4 -14.4957 21.8039 sdO7VII:He38 45000 100 5.905 0.031 1.475 0.078 1.70E+00<br />

J211338.31-000940.7 21:13:38.31 -00:09:40.7 -23.6116 37.4435 sdB0VII:He7 36649 409 5.500 0.053 -2.148 0.061 3.58E+00<br />

J211339.69+100640.4 21:13:39.69 +10:06:40.4 -65.7901 26.6768 sdO6VII:He9 32140 432 4.916 0.061 -2.154 0.062 2.72E+00<br />

J211425.02+005517.6 21:14:25.02 +00:55:17.6 14.6469 36.5492 sdO9VII:He17 36832 199 5.805 0.038 -1.058 0.020 5.22E+00<br />

J211651.96-003328.5 21:16:51.96 -00:33:28.5 11.2966 32.9385 sdB4V:He5 28210 447 5.495 0.064 -2.409 0.111 6.76E+00<br />

J211921.36+005749.8 21:19:21.36 +00:57:49.8 -49.1174 20.1741 sdO5VII:He38 44901 114 6.000 0.039 1.816 0.170 1.47E+00<br />

J213112.24+112936.2 21:31:12.24 +11:29:36.2 3.08591 34.7669 sdO8VII:He9 35450 179 5.595 0.036 -1.432 0.023 1.28E+00<br />

J213718.87+123303.3 21:37:18.87 +12:33:03.3 -106.859 28.2812 sdB6IV:He2 14718 176 3.601 0.032 -1.630 0.056 2.40E+00<br />

J213808.12+105741.8 21:38:08.12 +10:57:41.8 26.7674 26.7784 sdB2VI:He10 34133 180 5.095 0.017 -1.620 0.036 3.81E+00<br />

J215049.19+010338.4 21:50:49.19 +01:03:38.4 34.2033 36.462 sdB1V:He6 34388 84 5.351 0.023 -1.279 0.033 6.39E+00<br />

J215053.84+131650.6 21:50:53.84 +13:16:50.6 -108.905 33.9992 sdB1VI:He5 30233 248 5.423 0.047 -2.222 0.072 1.65E+00<br />

J215227.25+115726.7 21:52:27.25 +11:57:26.7 5.51462 32.5347 sdB3VI:He5 34586 272 5.273 0.041 -1.354 0.029 3.73E+00<br />

J215307.34-071948.4 21:53:07.34 -07:19:48.4 -13.1178 35.7679 sdB1VI:He7 32449 166 5.683 0.037 -1.996 0.086 1.58E+00<br />

J215631.56+121237.7 21:56:31.56 +12:12:37.7 -66.6073 23.9981 sdO7VII:He40 45581 220 5.891 0.035 1.303 0.052 3.56E+00<br />

J220403.45+122507.3 22:04:03.45 +12:25:07.3 -153.325 28.335 sdB7IV:He1 11740 170 3.500 0.049 -2.050 0.244 1.14E+00<br />

J220810.05+115913.9 22:08:10.05 +11:59:13.9 -69.4664 29.3261 sdB2VI:He7 26862 735 5.000 0.076 -2.150 0.061 3.14E+00<br />

J221816.78+121400.7 22:18:16.78 +12:14:00.7 -74.2256 32.2427 sdB7IV:He4 25415 605 5.000 0.066 -1.422 0.023 2.25E+00<br />

J222238.69+005125.0 22:22:38.69 +00:51:25.0 -114.061 30.8319 sdO9VII:He11 33853 422 4.627 0.058 -3.000 0.434 1.20E+00<br />

J222932.81-004822.5 22:29:32.81 -00:48:22.5 -18.7916 34.4566 sdO9VII:He8 34802 243 5.328 0.044 -1.360 0.030 1.69E+00<br />

J223008.26+132734.2 22:30:08.26 +13:27:34.2 -30.7785 29.1779 sdB5IV:He2 14433 235 3.762 0.036 -2.304 0.175 3.47E+00<br />

J223839.13+122517.9 22:38:39.13 +12:25:17.9 -96.1084 31.3337 sdB0IV:He6 30656 195 5.244 0.042 -3.000 0.434 5.02E+00<br />

J224105.19+141810.2 22:41:05.19 +14:18:10.2 -202.31 30.0845 sdB9III:He0 13763 258 3.501 0.052 -2.669 0.203 1.51E+01<br />

J231956.10-093937.6 23:19:56.10 -09:39:37.6 16.0919 28.428 sdO9VI:He16 35985 299 4.845 0.049 -0.774 0.023 3.23E+00<br />

J233914.00+134214.3 23:39:14.00 +13:42:14.3 -411.792 25.2673 sdO5VI:He39 45402 513 5.500 0.077 0.607 0.030 4.55E+00<br />

J234421.80-101142.8 23:44:21.80 -10:11:42.8 -72.6667 27.5984 sdB7IV:He2 11132 98 3.217 0.026 -0.954 0.054 7.84E-01<br />

J234853.52+151215.5 23:48:53.52 +15:12:15.5 -69.1169 32.1839 sdB0VI:He5 30651 167 5.556 0.035 -2.315 0.090 6.37E-01<br />

J235108.66+002623.0 23:51:08.66 +00:26:23.0 -195.75 28.5503 sdB0VII:He30 38778 177 5.407 0.039 -0.288 0.007 4.21E+00<br />

Appendix C

Results for 83 2MASS-Selected Hot Subdwarf Candidates

This table lists the parameters and classifications derived for the 2MASS-selected stars supplied by E.M. Green (Green et al., 2006), together with the internal errors from SFIT and the χ² value of the best fit.

Table C.1: Results for 83 2MASS-Selected Hot Subdwarf Candidates

Identifier (2MASX J-) Classification Teff (K) δTeff (K) log g (cgs) δlog g log(nHe/nH) δlog(nHe/nH) χ²

Balloon 090900004 sdO7VII:He11 31147 278 4.757 0.054 -1.811 0.056 1.77E+00<br />

BD+48 2721 sdB2VI:He6 22979 240 5.267 0.032 -1.629 0.018 2.54E+00<br />

J011407.62+160800.6 sdB8VI:He5 10795 61 3.156 0.016 -0.368 0.007 1.16E+00<br />

J020656.17+143858.6 sdB1VII:He7 29873 484 5.850 0.046 -1.897 0.034 1.70E+00<br />

J021555.50+234314.3 sdB1VI:He8 32485 57 5.623 0.008 -0.758 0.010 2.60E+00<br />

J021619.04+275902.0 sdB1VI:He6 27594 292 5.719 0.034 -2.100 0.055 1.66E+00<br />

J021742.16+280329.5 sdO9VII:He8 32698 196 5.838 0.033 -1.341 0.019 1.34E+00<br />

J022512.51+234820.7 sdO6VII:He13 38384 119 6.000 0.030 -1.417 0.023 1.86E+00<br />

J030725.66+175248.0 sdB2V:He10 28000 352 5.095 0.030 -0.701 0.010 2.94E+00<br />

J041550.17+015421.0 sdB0VII:He9 32883 197 5.943 0.035 -1.390 0.011 1.38E+00<br />

J042034.85+012041.0 sdO6VII:He38 40547 120 5.117 0.035 1.301 0.017 3.44E+00<br />

J043037.82-010308.3 sdB5VI:He3 13447 91 3.640 0.028 -0.293 0.004 8.08E-01<br />

J074722.07+622545.2 sdB3VI:He12 27665 271 5.752 0.031 -0.696 0.006 8.00E+00<br />

J075407.66+651540.2 sdB4V:He5 11002 53 2.750 0.020 0.000 0.000 1.42E+00<br />

J075815.66+514348.0 sdB3V:He7 11175 56 2.706 0.020 0.000 0.000 1.67E+00<br />

J080245.68+474817.7 sdB5VII:He2 9783 93 3.797 0.034 -0.954 0.093 1.50E+00<br />

J082643.33+330859.2 sdB4V:He7 18872 171 4.356 0.034 -0.827 0.009 2.38E+00<br />

J082822.23+295131.3 sdB3V:He7 16877 137 4.122 0.022 -0.723 0.013 1.60E+00<br />

J083127.37+422201.7 sdA5VI:He2 10463 50 3.179 0.013 -0.395 0.015 1.39E+00<br />

J083320.34+202424.8 sdB4VI:He14 22956 173 5.216 0.026 -0.407 0.004 6.77E+00<br />

J083535.58+194412.6 sdB3VI:He6 27775 429 5.784 0.041 -1.846 0.030 1.64E+00<br />

J083734.74+672413.6 sdB0V:He17 30001 392 4.688 0.041 -0.578 0.009 4.39E+00<br />

J083909.92+182416.6 sdB4V:He5 10235 43 2.698 0.017 -0.079 0.002 1.78E+00<br />

J084447.93+404426.5 sdB4V:He6 12351 59 3.321 0.024 0.001 0.000 1.29E+00<br />

J084535.67+194150.3 sdB3VI:He7 22899 242 5.236 0.033 -1.309 0.009 1.97E+00<br />

J084937.68+234847.3 sdB3VI:He5 18128 162 4.653 0.029 -1.348 0.019 2.31E+00<br />

J085148.86+434402.5 sdB7VI:He4 10292 49 3.139 0.014 -0.368 0.007 1.55E+00<br />

J085649.27+170114.7 sdB1VI:He5 29527 276 5.747 0.035 -2.111 0.056 1.47E+00<br />

J090158.77+395931.3 sdB6VI:He2 11188 80 3.359 0.026 -0.257 0.006 1.13E+00<br />

J091206.53+091621.7 sdB2V:He10 27999 442 4.689 0.041 -0.672 0.012 1.86E+00<br />

J091706.65+541817.3 sdB5VI:He2 11016 71 3.157 0.019 -0.122 0.002 8.49E-01<br />

J091751.45+615630.1 sdB4V:He5 10286 32 2.662 0.014 0.159 0.002 2.01E+00<br />

J092116.62+023741.0 sdB3VI:He15 27144 227 5.295 0.030 -0.300 0.003 6.12E+00<br />

J092246.92+001741.0 sdB4V:He5 11427 69 2.797 0.019 -0.000 0.000 1.39E+00<br />

J093112.84+051040.4 sdB5VI:He3 10558 56 3.376 0.018 -0.278 0.005 1.11E+00<br />

J093150.58+031848.0 sdB6VI:He1 10097 62 3.303 0.027 -0.645 0.018 1.34E+00<br />

J093426.95+821304.3 sdB7IV:He7 9909 57 2.917 0.025 0.297 0.012 5.71E+00<br />

J093453.32+841851.5 sdO7VII:He37 42886 87 5.789 0.030 1.526 0.015 1.38E+00<br />

J093832.18+041343.9 sdB7V:He1 10290 71 3.306 0.026 -0.553 0.013 1.64E+00<br />

J093935.15+104321.9 sdB3V:He6 10868 53 2.793 0.020 0.000 0.000 1.43E+00<br />

J094047.71+185332.9 sdB4VI:He5 10589 44 2.823 0.014 0.308 0.005 1.64E+00<br />

J094105.31-004755.8 sdO4VII:He33 45003 121 5.312 0.043 0.000 0.000 2.53E+00<br />

J094107.57+375342.6 sdB3V:He7 15712 55 3.398 0.008 -0.597 0.008 1.31E+00<br />

J094353.47+783140.7 sdB2VI:He5 27999 382 5.662 0.036 -2.105 0.055 1.92E+00<br />

J094509.99+553450.2 sdB4V:He6 18717 165 4.251 0.031 -0.898 0.014 1.65E+00<br />

J094637.19+351755.8 sdB7VI:He3 11021 69 3.387 0.021 -0.140 0.005 1.01E+00<br />

J095219.06+441941.9 sdB4V:He7 13014 73 3.197 0.025 0.046 0.000 1.24E+00<br />

J095708.88+223055.6 sdB4V:He5 10910 45 2.714 0.012 0.272 0.004 1.45E+00<br />

J095854.23+360314.3 sdF8VI:He2 9618 68 3.384 0.022 -0.467 0.019 1.81E+00<br />

J095855.78-044413.9 sdB7IV:He6 9321 48 2.795 0.021 -0.368 0.012 3.84E+00<br />

J095859.91+082504.4 sdB6VII:He4 9806 74 3.560 0.013 -0.319 0.015 1.35E+00<br />

J100058.89+024804.4 sdB4V:He8 17115 148 4.229 0.026 -0.632 0.009 2.76E+00<br />

J100145.47+375733.2 sdB3VI:He3 9842 50 3.190 0.011 -0.336 0.016 1.21E+00<br />

J100509.89+384615.2 sdB6VII:He2 10547 70 3.793 0.030 -0.700 0.021 9.22E-01<br />

J100607.62+005326.2 sdB8V:He3 11614 83 3.096 0.017 -0.368 0.009 1.27E+00<br />

J100739.11+202546.7 sdB5VII:He5 10000 51 3.748 0.014 0.065 0.002 1.27E+00<br />

J104130.43+184209.8 sdB0VII:He8 32521 166 5.635 0.030 -1.401 0.022 1.15E+00<br />

J104653.08+515435.9 sdO8VII:He9 30750 262 4.799 0.053 -1.978 0.083 9.67E-01<br />

J104912.91+380014.9 sdB2V:He9 20087 213 4.130 0.030 -0.718 0.015 1.70E+00<br />

J111631.06+305838.7 sdB4V:He5 9474 52 2.754 0.021 -0.368 0.012 1.84E+00<br />

J111719.94+241207.1 sdB4V:He5 11157 73 2.806 0.023 -0.292 0.006 1.10E+00<br />

J111819.13+093144.4 sdA2V:He5 12109 52 3.182 0.022 -0.131 0.003 1.04E+00<br />

J112129.35+111917.0 sdB4V:He6 12729 67 3.204 0.024 0.056 0.000 1.14E+00<br />

J112832.64+603859.3 sdF5V:He3 10302 52 3.025 0.017 -0.307 0.005 1.53E+00<br />

J113435.70+664252.6 sdB4V:He6 12712 67 3.131 0.023 0.057 0.000 1.07E+00<br />

J113633.63+750653.7 sdO9VII:He7 35699 62 6.000 0.016 -1.672 0.041 1.43E+00<br />

J113837.54+250043.4 sdB5IV:He4 10255 43 2.594 0.016 -0.136 0.003 2.43E+00<br />

J114454.50+031550.2 sdA7V:He4 9764 53 3.118 0.021 -0.368 0.010 1.37E+00<br />

J122617.00+774312.4 sdB1VI:He6 28443 239 5.889 0.035 -1.996 0.043 2.38E+00<br />

J122745.99+113636.1 sdB6IV:He6 10356 40 2.666 0.017 0.030 0.000 3.30E+00<br />

J122843.58+282036.6 sdB4VI:He5 10230 46 2.844 0.018 -0.083 0.002 1.87E+00<br />

J123014.92+463720.0 sdB5V:He6 12513 44 3.084 0.018 0.154 0.003 1.15E+00<br />

J125049.06+743943.5 sdB2VI:He5 27913 436 5.654 0.038 -1.993 0.043 2.28E+00<br />

J131359.98+183131.3 sdB6VI:He3 10504 52 3.505 0.017 -0.083 0.003 1.15E+00<br />

J132546.78+400827.0 sdB3V:He8 11653 68 2.761 0.023 0.122 0.002 1.50E+00<br />

J132546.78+400827.0 sdB3V:He8 16440 74 4.149 0.014 -0.641 0.023 5.48E+00<br />

J132546.78+400827.0 sdB4V:He6 11653 68 2.761 0.023 0.122 0.002 1.50E+00<br />

J132546.78+400827.0 sdB4V:He6 16440 74 4.149 0.014 -0.641 0.023 5.48E+00<br />

J135515.91+533442.5 sdB3VI:He12 25842 231 5.254 0.030 -0.673 0.008 2.54E+00<br />

J135648.63+210510.1 sdB4V:He5 10516 42 2.609 0.012 0.135 0.002 1.45E+00<br />

J140123.40+742150.5 sdB4V:He7 16490 73 4.140 0.014 -0.689 0.026 1.77E+00<br />

J142127.88+712421.4 sdB2VI:He5 25982 319 5.847 0.037 -2.346 0.096 2.57E+00<br />

J143155.38+172404.9 sdA7V:He3 10112 55 2.881 0.019 -0.404 0.015 1.19E+00<br />

J145239.03+412618.1 sdB7V:He10 9479 31 2.852 0.021 0.439 0.005 5.63E+00<br />

J152653.06+794130.7 sdB0VI:He6 32936 67 5.770 0.015 -2.235 0.075 1.69E+00<br />


Appendix D

The Armagh Observatory Cluster

Over the course of this project, Armagh Observatory, as part of the CosmoGrid¹ initiative, acquired a dedicated computing cluster, which I helped to set up and administer. The cluster's hardware and software configuration at the time of writing is documented below.

D.1 Hardware Configuration

The cluster presently consists of sixteen vertically mounted blade nodes: one master node and fifteen slave nodes. Each slave node contains:

• Two Intel Xeon 3 GHz processors, each with 1 MB of cache
• 2 GB RAM
• One 40 GB Maxtor SATA UDMA/133 hard drive
• One Broadcom BCM5721 1000Base-T PCI Express NIC

The master node has the same basic hardware configuration as the slaves, except for:

• Two 240 GB Maxtor SATA UDMA/133 hard drives
• One CD-RW/DVD-R drive
• One floppy disk drive
• Two 1000Base-T network cards

¹ http://www.cosmogrid.ie/

All of the nodes are interlinked by a single 24-port gigabit Ethernet switch and are connected to a single 16-port KVM unit.
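As a rough capacity check, the resources available to jobs can be totalled from the lists above. The short POSIX shell sketch below assumes one job per physical processor, excludes the master (which, as described under Condor, runs no jobs) and ignores Hyper-Threading and operating-system overhead; the variable names are illustrative only.

```sh
#!/bin/sh
# Rough capacity of the compute (slave) nodes from the hardware list.
# Assumptions: the master runs no jobs, one job per physical CPU,
# Hyper-Threading and OS overhead are ignored.
SLAVES=15
CPUS_PER_NODE=2
RAM_GB_PER_NODE=2
DISK_GB_PER_NODE=40

total_cpus=$((SLAVES * CPUS_PER_NODE))
total_ram_gb=$((SLAVES * RAM_GB_PER_NODE))
total_disk_gb=$((SLAVES * DISK_GB_PER_NODE))
# Memory available to each job if a node's RAM is split evenly between its CPUs
ram_per_cpu_mb=$((RAM_GB_PER_NODE * 1024 / CPUS_PER_NODE))

echo "processors:                $total_cpus"
echo "memory (GB):               $total_ram_gb"
echo "local disk (GB):           $total_disk_gb"
echo "memory per processor (MB): $ram_per_cpu_mb"
```

With the figures listed, this gives 30 processors, 30 GB of memory in total and 1024 MB per processor.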

D.2 Software Configuration

System Software

The operating system currently used on all of the nodes is Red Hat Enterprise Linux AS release 3 (Taroon Update 3).

The following software packages form the core of the cluster setup:

• Condor² version 6.6.10
• Intel Fortran Compiler version 8.1
• MPICH 1.2.4
• Ganglia 3.0

User Account Management<br />

User accounts are managed centrally on the master node by editing /etc/passwd and /etc/shadow with the standard account management tools. Once any changes to the user accounts have been made, the updated /etc/passwd and /etc/shadow must be copied to all of the slave nodes using the brcp and brsh commands.

² http://www.cs.wisc.edu/condor/
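The propagation step can be sketched as follows. The brcp and brsh tools are specific to this cluster and their syntax is not recorded here, so this dry-run sketch substitutes the standard scp command and only prints the commands it would run; the slave hostnames node01 to node15 and the helper name account_sync_cmds are hypothetical.

```sh
#!/bin/sh
# Dry run: print the commands that would push the master's account
# files to each slave. scp stands in for the cluster's own brcp tool.

account_sync_cmds() {
    for host in "$@"; do
        for f in /etc/passwd /etc/shadow; do
            # -p preserves file modes, which matters for /etc/shadow
            echo "scp -p $f root@$host:$f"
        done
    done
}

# Build the (hypothetical) host list node01 .. node15
hosts=""
i=1
while [ "$i" -le 15 ]; do
    hosts="$hosts $(printf 'node%02d' "$i")"
    i=$((i + 1))
done

# Word splitting of $hosts into separate arguments is intentional
account_sync_cmds $hosts
```

Because the files are copied wholesale, any account that exists only on a slave would be overwritten, so system accounts such as condor need to be defined on the master as well.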

Home Directories<br />

The partition holding all user home directories is located on the master node and is exported to all the slave nodes over NFS. This creates a single storage domain for the cluster: user jobs running on the slave nodes can read and write data directly in the user's home directory, avoiding the need for tedious manual file transfers.

Each user has a disk space quota of 10 GB.
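The NFS export, the matching client mount and the 10 GB quota might be expressed as below. This sketch prints the configuration lines rather than applying them; the hostname master matches CONDOR_HOST in the Condor configuration, while the 192.168.0.0/24 network (taken from the HOSTALLOW_READ setting), the mount options and the user name alice are assumptions. Quotas must also be enabled on the /home filesystem for setquota to take effect.

```sh
#!/bin/sh
# Print (do not apply) the NFS and quota settings for /home.
QUOTA_GB=10
# setquota interprets block limits as 1 KiB blocks
QUOTA_BLOCKS=$((QUOTA_GB * 1024 * 1024))

# Entry for /etc/exports on the master
exports_line() {
    echo "/home 192.168.0.0/255.255.255.0(rw,sync)"
}

# Entry for /etc/fstab on each slave
fstab_line() {
    echo "master:/home  /home  nfs  rw,hard,intr  0 0"
}

# Soft and hard block limits of 10 GB, no inode limits (0)
quota_cmd() {
    echo "setquota -u $1 $QUOTA_BLOCKS $QUOTA_BLOCKS 0 0 /home"
}

echo "# /etc/exports on the master:"
exports_line
echo "# /etc/fstab on each slave:"
fstab_line
echo "# quota for a (hypothetical) user:"
quota_cmd alice
```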

Condor<br />

Condor is a specialised batch system for managing compute-intensive jobs. Like most batch systems, Condor provides a queuing mechanism, a scheduling policy, a priority scheme, and resource classifications. Users submit their compute jobs to Condor, which places the jobs in a queue, runs them, and then informs the user of the result.

A Condor cluster consists of a single machine that serves as the central manager, plus an arbitrary number of other machines. Conceptually, the cluster is a collection of resources (machines) and resource requests (jobs).

The role of Condor is to match waiting requests with available resources. Every part of Condor sends periodic updates to the central manager, the centralised repository of information about the state of the cluster. Periodically, the central manager assesses the current state of the cluster and tries to match pending requests with appropriate resources.
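To make the queue-and-match cycle concrete, the sketch below writes a Condor submit description file for ten independent jobs. The program name sfit_run, the input file naming and the requirements expression are hypothetical; universe, executable, arguments, output, error, log, requirements and queue are standard submit-file commands, and $(Process) expands to 0, 1, ..., N-1 for the jobs created by one queue statement. The requirements expression is what the central manager evaluates against each machine's ClassAd when matching.

```sh
#!/bin/sh
# Write a submit description file for N runs of one program.
make_submit_file() {
    # $1 = executable, $2 = number of jobs, $3 = submit file to write
    cat > "$3" <<EOF
# Generated submit description (hypothetical example)
universe     = vanilla
executable   = $1
# \$(Process) becomes 0, 1, ..., N-1 for the jobs queued below
arguments    = input.\$(Process)
output       = job.\$(Process).out
error        = job.\$(Process).err
log          = jobs.log
# Evaluated against each machine ClassAd during matchmaking
requirements = (OpSys == "LINUX") && (Memory >= 512)
queue $2
EOF
}

make_submit_file sfit_run 10 sfit.sub
cat sfit.sub

# On the cluster the file would then be used with:
#   condor_submit sfit.sub    (queue the jobs)
#   condor_q                  (show the job queue)
#   condor_status             (show the machines in the pool)
```

Depending on how FILESYSTEM_DOMAIN is set, jobs may also need Condor's file-transfer submit commands in order to reach their input files.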

The basic Condor setup for the Armagh Observatory cluster nominates the master node as the central manager for the cluster, with the slave nodes functioning as dedicated computing resources. No jobs are permitted to run on the master.

Directory Layout and NFS Shares

The Condor software is installed solely on the master, in the directory /opt/condor-6.6.10. As the name of this directory depends on the installed version of Condor, a symbolic link, /opt/condor, points to whichever directory contains the latest version. This symbolic link has been added to /etc/exports, so that the Condor installation directory is shared out to all the slaves over NFS.

Condor requires every node to have a directory on its local filesystem to which the Condor daemons can write log information and in which they can create temporary working directories for user jobs. This directory is typically /home/condor; however, because the home directories are shared centrally from the master over NFS, /home/condor cannot be unique to each node.

Instead, each slave node has a disk partition called /condorhome, which contains the directory /condorhome/condor/ for use by the local Condor daemons. On the master node, /condorhome is a symbolic link pointing to the /home partition, in which a directory called condor exists.
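The directory layout just described can be reproduced with a few commands. This sketch takes a root directory as an argument so that it can be tried in a scratch area instead of on a live node; the function names are hypothetical, the slave's /condorhome partition is assumed to be mounted already, and the condor account is assumed to exist.

```sh
#!/bin/sh
VERSION=6.6.10

setup_master() {
    root=$1
    mkdir -p "$root/opt/condor-$VERSION" "$root/home/condor"
    # Version-independent name for the current release
    rm -f "$root/opt/condor"
    ln -s "condor-$VERSION" "$root/opt/condor"
    # On the master, /condorhome is simply a link to /home
    rm -f "$root/condorhome"
    ln -s /home "$root/condorhome"
}

setup_slave() {
    root=$1
    # /condorhome is a local partition on each slave; the daemons'
    # LOCAL_DIR lives inside it
    mkdir -p "$root/condorhome/condor"
    # On a real node one would also run: chown condor:condor /condorhome/condor
}

# Demonstration in scratch directories rather than the real filesystem
demo_master=$(mktemp -d)
demo_slave=$(mktemp -d)
setup_master "$demo_master"
setup_slave "$demo_slave"
```

Because the NFS export and the Condor configuration refer only to /opt/condor, upgrading Condor amounts to installing the new release alongside the old one and re-pointing this link.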

Boot Script<br />

To ensure that the Condor daemons are started when a node is first powered on, a boot script named condor is placed in /etc/init.d on each node. This boot script is then symlinked into the runlevel 3 startup scripts directory, /etc/rc3.d/, as the entry S98condor.
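Installing the runlevel link can be sketched as follows; the root directory is again a parameter so the commands can be tested in a scratch area, and install_boot_link is a hypothetical helper name. The relative target ../init.d/condor follows the usual Red Hat convention for links in the rc directories.

```sh
#!/bin/sh
install_boot_link() {
    root=$1
    mkdir -p "$root/etc/init.d" "$root/etc/rc3.d"
    # The start script should be executable
    if [ -f "$root/etc/init.d/condor" ]; then
        chmod 755 "$root/etc/init.d/condor"
    fi
    # "S" = start; "98" = ordering, so it runs after lower-numbered
    # start scripts such as those bringing up networking and NFS mounts
    rm -f "$root/etc/rc3.d/S98condor"
    ln -s ../init.d/condor "$root/etc/rc3.d/S98condor"
}

# Demonstration in a scratch directory
demo=$(mktemp -d)
install_boot_link "$demo"
ls -l "$demo/etc/rc3.d"
```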

The boot script listing is:

#!/bin/sh
# /etc/init.d/condor: start or stop the Condor daemons
export CONDOR_CONFIG=/opt/condor/etc/condor_config
MASTER=/opt/condor/sbin/condor_master
PS="/bin/ps auwx"

case "$1" in
    start)
        if [ -x "$MASTER" ]; then
            echo "Starting up Condor"
            $MASTER
        else
            echo "$MASTER is not executable. Skipping Condor startup."
            exit 1
        fi
        ;;
    stop)
        # Find the PID(s) of condor_master, excluding the grep itself
        pid=$($PS | grep condor_master | grep -v grep | awk '{print $2}')
        if [ -n "$pid" ]; then
            # Send SIGQUIT to the condor_master, which initiates its fast
            # shutdown method. The master itself will start sending
            # SIGKILL to all its children if they're not gone in 20
            # seconds.
            echo "Shutting down Condor (fast-shutdown mode)"
            kill -QUIT $pid
        else
            echo "Condor not running"
        fi
        ;;
    *)
        echo "Usage: condor {start|stop}"
        exit 1
        ;;
esac

User Path Setup

The Condor user commands for submitting jobs to the cluster, checking cluster status and job queues, etc., along with their associated manual pages, are located in the /opt/condor subtree.

To give users easy access to these commands and man pages, the appropriate shell variables are set at login by two system-wide shell profile files, condor.sh and condor.csh, located in /etc/profile.d. These files also set up the environment for MPICH and Intel's Fortran compiler.

For bash users, condor.sh applies this configuration:

export CONDOR_CONFIG=/opt/condor/etc/condor_config

# Prepend the Condor and MPICH directories without leaving an empty element
if [ -z "${PATH}" ]; then
    export PATH=/opt/condor/bin:/opt/mpich/bin
else
    export PATH=/opt/condor/bin:/opt/mpich/bin:$PATH
fi

if [ -z "${MANPATH}" ]; then
    export MANPATH=/opt/condor/man:/opt/mpich/man
else
    export MANPATH=/opt/condor/man:/opt/mpich/man:$MANPATH
fi

# Root also gets the administrative (sbin) commands
if [ "$(id -u)" = 0 ]; then
    export PATH=$PATH:/opt/condor/sbin:/opt/mpich/sbin
fi

### Set up ifort and idb
. /opt/intel_fc_80/bin/ifortvars.sh
. /opt/intel_idb_80/bin/idbvars.sh

condor.csh provides the equivalent settings for tcsh users (without the root-only sbin additions):

setenv CONDOR_CONFIG /opt/condor/etc/condor_config<br />

if !($?PATH) <strong>the</strong>n<br />

setenv PATH /opt/condor/bin:/opt/mpich/bin<br />

else<br />

setenv PATH /opt/condor/bin:/opt/mpich/bin:$PATH<br />

endif<br />

if !($?MANPATH) <strong>the</strong>n<br />

setenv MANPATH /opt/condor/man:/opt/mpich/man<br />

else<br />

setenv MANPATH /opt/condor/man:/opt/mpich/man:$MANPATH<br />

endif<br />

### Set up ifort and idb<br />

source /opt/intel_fc_80/bin/ifortvars.csh<br />

source /opt/intel_idb_80/bin/idbvars.csh
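Both profile scripts branch on whether PATH (and MANPATH) is already set because blindly appending `:$PATH` to an empty value leaves a trailing colon, and an empty PATH entry is interpreted as the current directory. The sketch below factors that prepend logic into a small POSIX sh function; the name `prepend_dirs` is hypothetical and is not part of the Observatory's scripts.

```sh
#!/bin/sh
# Hypothetical helper (sketch): print DIR1:DIR2:... followed by the current
# value of the named variable, omitting the ':' separator when that value is
# empty or unset. This is what the if/else blocks in condor.sh do by hand.
prepend_dirs() {
    var=$1
    shift
    # Join the remaining arguments with ':' (IFS is changed only in the subshell).
    dirs=$(IFS=:; echo "$*")
    # Read the current value of the variable whose name is held in $var.
    eval "current=\${$var-}"
    if [ -z "$current" ]; then
        echo "$dirs"
    else
        echo "$dirs:$current"
    fi
}
```

For example, `PATH=$(prepend_dirs PATH /opt/condor/bin /opt/mpich/bin); export PATH` reproduces the first PATH block of condor.sh.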



Condor Configuration Files

/opt/condor/etc/condor_config is the global Condor configuration file, containing settings for everything from basic cluster setup details to network permissions, user policies, flocking, daemon controls, and so on.

Most of the settings in this file can be left at their defaults. However, Part One of the file contains settings that must be customised for the particular Condor installation at a site. For the Observatory cluster, the settings for Part One are as follows:

CONDOR_HOST               = master
RELEASE_DIR               = /opt/condor
LOCAL_DIR                 = /condorhome/condor
LOCAL_CONFIG_FILE         = $(RELEASE_DIR)/etc/$(HOSTNAME).local
REQUIRE_LOCAL_CONFIG_FILE = TRUE
CONDOR_ADMIN              = root@master
MAIL                      = /usr/bin/mail
UID_DOMAIN                = arm.ac.uk
FILESYSTEM_DOMAIN         = $(FULL_HOSTNAME)

Other miscellaneous settings that have been changed are:

### Only allow daemon read/write access to the
### slave nodes connected on the LAN.
HOSTALLOW_READ = 192.168.0.*
HOSTALLOW_WRITE = 192.168.0.*

### Fully qualified names are not used in /etc/hosts,
### so Condor needs this to be set.
DEFAULT_DOMAIN_NAME = arm.ac.uk

Each of the nodes in the cluster has its own Condor configuration file in /opt/condor/etc. The master node and the slave nodes are treated differently: the master has its own specific settings, while the slaves all share the same settings.

The master's configuration file, m44.local, contains the following:

### The master never runs jobs
START = FALSE

### There are two NICs in the master. This tells
### Condor to use the internal NIC.
NETWORK_INTERFACE = 192.168.0.149

COLLECTOR   = $(SBIN)/condor_collector
NEGOTIATOR  = $(SBIN)/condor_negotiator
DAEMON_LIST = MASTER, COLLECTOR, STARTD, NEGOTIATOR, SCHEDD

JAVA = /usr/bin/java

### Turn off reporting of pool stats to the
### Condor developers
CONDOR_DEVELOPERS_COLLECTOR = NONE
CONDOR_DEVELOPERS = NONE

### PRIORITY_HALFLIFE = 1 adjusts a user's Condor
### priority in real time. Thus, when their jobs
### release their resources, the user's priority
### returns to 0.5 very quickly.
PRIORITY_HALFLIFE = 1

### Turn off all job preemption. No jobs will be
### preempted for any reason.
PREEMPTION_REQUIREMENTS = FALSE
PREEMPTION_RANK = FALSE

As every slave node has the same configuration, a time-saving device has been employed: any modifications to the slave setup are made in a template file called node.local.template, which is then copied by a shell script to create all the nodeXX.local files for the slaves.

At present, node.local.template contains:

### Dedicated scheduler for running MPI jobs.
DedicatedScheduler = "DedicatedScheduler@master"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

START = TRUE
SUSPEND = FALSE
CONTINUE = TRUE
PREEMPT = FALSE
KILL = FALSE
WANT_SUSPEND = FALSE
WANT_VACATE = FALSE
RANK = Scheduler =?= $(DedicatedScheduler)

### Tell the daemons not to pay attention to any
### console activity. Prevents their Condor status
### changing to 'Owner' if someone logs in to a
### node to perform maintenance.
VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = 0
VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 0

The shell script that performs the copying, refresh.sh, reads the node names from /etc/brshtab and then copies the template file in a for loop:

#!/bin/bash
NODES="$(cat /etc/brshtab)"
for I in $NODES
do
    cp node.local.template "$I.local"
done
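Because REQUIRE_LOCAL_CONFIG_FILE is TRUE, Condor exits with an error on a node whose local configuration file cannot be read, so it is worth making the copy step robust. The following is a sketch of a more defensive variant of refresh.sh, written as a POSIX sh function with the hypothetical name `refresh_local_configs`. It assumes /etc/brshtab lists one node name per line (only the first field is used) and skips blank lines and `#` comments; both are assumptions about the file format.

```sh
#!/bin/sh
# Sketch of a more defensive refresh.sh (hypothetical function name).
# Usage: refresh_local_configs TEMPLATE NODELIST OUTDIR
# Copies TEMPLATE to OUTDIR/<node>.local for every node named in NODELIST.
refresh_local_configs() {
    template=$1
    nodelist=$2
    outdir=$3
    if [ ! -r "$template" ]; then
        echo "refresh_local_configs: cannot read $template" >&2
        return 1
    fi
    # Take the first field of each line; also handle a final line without a newline.
    while read -r node rest || [ -n "$node" ]; do
        case $node in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        cp "$template" "$outdir/$node.local" || return 1
    done < "$nodelist"
}
```

Run from /opt/condor/etc, `refresh_local_configs node.local.template /etc/brshtab .` would do the same job as refresh.sh, but it stops with an error instead of silently producing a partial set of files.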

Condor User Policies

Most users of the cluster tend to run large batches of relatively short jobs (typically less than one hour per job), but some have submitted a small number of long-running jobs (of the order of several hours to several days).

In general, each job submitted must be allowed to run to completion without being preempted; otherwise, the job must start again from the beginning when it is reallocated to a slave node. For users who submit large batches of short jobs, such preemption is merely a nuisance. However, for users with long-running jobs, any interruption could mean the loss of several days of computation.

To ensure fair use of cluster resources without job preemption, the PRIORITY_HALFLIFE Condor variable has been set to 1 in the local configuration file for the master node. This allows Condor to adjust a user's priority level almost as soon as their jobs start running. As their jobs begin to use cluster resources, Condor lowers the user's priority. If someone else then submits a batch of jobs to the queue, that second user's priority will be higher than that of the first user. So, as each of the first user's jobs finishes on a node, Condor allocates that node to a job belonging to the user with the highest priority. This gradually allows Condor to balance out the allocation of resources, so that no single user can use all of the resources all of the time.
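The effect of PRIORITY_HALFLIFE can be made concrete with the real-user-priority update rule described in the Condor manual's discussion of user priorities, restated here as a sketch. With h the half-life in seconds, δt the time since the previous update, and ρ(u) the number of machines user u is currently using, the real priority π_r(u) is updated as below; a numerically larger value means a worse priority, and the best possible value is 0.5. The update interval δt = 60 s used in the worked numbers is an assumption for illustration, and h = 86400 s (one day) is taken as Condor's default for comparison.

```latex
\begin{aligned}
\pi_r(u) &\leftarrow \beta\,\pi_r(u) + (1-\beta)\,\rho(u),
  \qquad \beta = 0.5^{\delta t/h} \\
h = 86400\ \mathrm{s}:&\quad \beta = 0.5^{60/86400} \approx 0.99952 \\
h = 1\ \mathrm{s}:&\quad \beta = 0.5^{60} \approx 8.7\times10^{-19}
  \;\Rightarrow\; \pi_r(u) \approx \max\bigl(\rho(u),\,0.5\bigr)
\end{aligned}
```

With h = 1 s, a user's priority therefore reflects only the machines they hold at that moment, and it returns to 0.5 as soon as their jobs release their resources.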

To prevent Condor from preempting currently running jobs when someone with a higher user priority submits jobs to the queue, the configuration variables PREEMPTION_REQUIREMENTS and PREEMPTION_RANK have both been set to FALSE in the master's local configuration file.

The user policy for the cluster will undoubtedly change over time. For the options available, refer to Section 3 of the Condor manual.

D.3 MPICH 1.2.4 RPM Spec File

This spec file can be used to build RPM packages from a standard MPICH v1.2.4 tarball. It ensures that the currently installed Intel Fortran compiler is used to build the F77 and F90 bindings, and it produces two RPMs: a standard RPM containing the MPICH runtime libraries, which should be installed on all the nodes, and a development RPM containing all the MPI compiler wrappers, which should be installed only on the master node.
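Assuming the spec file is saved as mpich.spec and the tarball sits in rpm's SOURCES directory (whose location depends on the distribution), `rpmbuild -ba mpich.spec` builds both binary packages. They follow the usual NAME-VERSION-RELEASE.ARCH.rpm naming, with the devel subpackage named mpich-devel; because mpich-devel declares `Requires: mpich`, the master needs both. The hypothetical helper below just prints which rpm command each host needs. The architecture string (i586) and the host names are placeholders.

```sh
#!/bin/sh
# Hypothetical helper (sketch): print the rpm install command for each host.
# Usage: install_plan MASTER VERSION RELEASE ARCH [SLAVE...]
# Every host gets the runtime package; only the master also gets mpich-devel.
install_plan() {
    master=$1
    ver=$2
    rel=$3
    arch=$4
    shift 4
    for host in "$master" "$@"; do
        pkgs="mpich-$ver-$rel.$arch.rpm"
        if [ "$host" = "$master" ]; then
            pkgs="$pkgs mpich-devel-$ver-$rel.$arch.rpm"
        fi
        echo "$host: rpm -Uvh $pkgs"
    done
}
```

For example, `install_plan master 1.2.4 3 i586 node01 node02` prints three lines, with the devel package listed only for master.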

Name: mpich
License: Other License(s), see package
Group: Development/Libraries/Parallel
URL: ftp://ftp.mcs.anl.gov/pub/mpi/old/
Version: 1.2.4
Release: 3
Summary: A Portable Implementation of MPI
Source: mpich-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-build
Autoreqprov: on
%define _mpich_root /opt/mpich

%description
MPICH is a freely available, portable implementation of
MPI, the Standard for message-passing libraries.

%package devel
Summary: A Portable Implementation of MPI
Group: Development/Libraries/Parallel
Autoreqprov: on
Requires: mpich
Provides: mpich-doc
Obsoletes: mpich-doc

%description devel
MPICH is a freely available, portable implementation of
MPI, the Standard for message-passing libraries.

%prep
%setup -q
DIRS=$(find -type d)

%build
CFLAGS=$RPM_OPT_FLAGS; export CFLAGS;
export F90="ifort" ;
export FC="ifort" ;
export CCFLAGS="-O2";
export FFLAGS="-O2";
export RSHCOMMAND="/opt/condor/sbin/rsh";
sh configure --with-arch=LINUX \
    --with-device=ch_p4 \
    --with-comm=ch_p4 \
    --with-romio \
    --with-mpe \
    --libdir=$RPM_BUILD_ROOT%{_mpich_root}/%_lib \
    --enable-sharedlib \
    --enable-c++ \
    --enable-f77 \
    --enable-f90modules \
    --disable-mpedbg \
    --disable-devdebug \
    --disable-debug \
    -prefix=$RPM_BUILD_ROOT%{_mpich_root} \
    -c++=/usr/bin/g++ \
    -opt=-O2 \
    -cc=/usr/bin/gcc \
    -fc=/opt/intel_fc_80/bin/ifort \
    -f90=/opt/intel_fc_80/bin/ifort \
    -f90flags=-O2 \
    -optcc=-O2 \
    -mpe_opts=-O2
make

%install
rm -rf $RPM_BUILD_ROOT
make install PREFIX=$RPM_BUILD_ROOT%{_mpich_root} \
    MPIINSTALL_OPTS="-manpath=$RPM_BUILD_ROOT/%{_mpich_root}/man" \
    -libdir=$RPM_BUILD_ROOT/%{_mpich_root}/%_lib
find $RPM_BUILD_ROOT%{_mpich_root} -type l -name "mpirun" | \
    xargs rm -f
grep -lr "$RPM_BUILD_ROOT" $RPM_BUILD_ROOT/%{_mpich_root}/ | \
    xargs perl -pi -e "s@$RPM_BUILD_ROOT@@g"
rm -f examples/perftest/config.cache \
    examples/perftest/config.log \
    examples/perftest/config.status \
    examples/test/config.log \
    examples/test/config.status
# libs
rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/lib*
rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared/lib*
[ -e lib/libmpich.a ] && cp -f lib/*.a $RPM_BUILD_ROOT%{_mpich_root}/%_lib
[ -e lib/*.o ] && cp -f lib/*.o $RPM_BUILD_ROOT%{_mpich_root}/%_lib
[ -e lib/*.s* ] && cp -f lib/*.s* $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared
for i in libfmpich libmpich libpmpich; do
    echo Working on $i;
    cp -f lib/shared/$i.so.1.0 $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared
    (
        cd $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared;
        ln -sf $i.so.1.0 $i.so
    )
done
# docs
rm -fr $RPM_BUILD_ROOT%{_mpich_root}/www
export manpath="$manpath /opt/mpich/man"

%clean
#rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,755)
%doc COPYRIGHT
%{_mpich_root}/sbin/*
%{_mpich_root}/bin/mpirun*
%{_mpich_root}/bin/mpiman
%{_mpich_root}/bin/mpireconfig
%{_mpich_root}/bin/mpireconfig.dat
%{_mpich_root}/bin/tarch
%{_mpich_root}/bin/tdevice
%{_mpich_root}/bin/serv_p4
%{_mpich_root}/%_lib/shared/*.so.*
%{_mpich_root}/share/*
%{_mpich_root}/man/mandesc
%{_mpich_root}/man/man1/*.1*

%files devel
%defattr(-,root,root,755)
%{_mpich_root}/doc/*
%{_mpich_root}/examples/*
%{_mpich_root}/man/man3/*.3*
%{_mpich_root}/man/man4/*.4*
%doc COPYRIGHT
%{_mpich_root}/include/mpi2c++/*.h
%{_mpich_root}/include/f90base/*.mod
%{_mpich_root}/include/f90choice/*.mod
%{_mpich_root}/include/*.h
%{_mpich_root}/%_lib/*.a
%{_mpich_root}/%_lib/shared/*.so
%{_mpich_root}/etc/*
%{_mpich_root}/bin/mpicc
%{_mpich_root}/bin/mpiCC
%{_mpich_root}/bin/mpif77
%{_mpich_root}/bin/mpif90



Appendix E

LTE-CODES

LTE-CODES is a package of Fortran programs and supporting libraries for analysing the spectra of hot stars. The main components of the package are:

STERNE computes plane-parallel, line-blanketed model atmospheres for hot stars (T_eff > 8000 K) in local thermal, radiative, and hydrostatic equilibrium. The code handles extremely H-deficient mixtures and composition stratification.

SPECTRUM computes synthetic spectra, line profiles, equivalent widths, and specific intensities, assuming LTE, from model atmospheres of hot stars (T_eff > 8000 K). It can handle atmospheres of arbitrary chemical composition.

SFIT is a general-purpose code designed to fit theoretical stellar spectra to an observed spectrum. The code offers several parameter optimisation methods, including Levenberg-Marquardt, Amoeba, and genetic algorithms, and handles both single and composite (binary) stellar spectra.

As part of this thesis, the old build system for these codes (which was based on a series of hand-coded Makefiles) was overhauled and ported to the GNU Autotools system (http://www.gnu.org/). GNU Autotools is a suite of tools that helps make software projects easy to build across many platforms. It offers a flexible environment for automatically configuring and generating Makefiles according to the needs of the software project, and for adapting them to the specifics of whatever operating system, compilers, and other system tools are at hand.

E.1 Directory Layout

The hierarchical layout of the LTE-CODES package is straightforward. The top-level directory branches into several subdirectories, the most important of which is src. Within src are two subdirectories, libraries and apps, which contain the source code for all the libraries and for the applications (i.e., STERNE, SPECTRUM, and SFIT) respectively. In summary:

lte-codes-x.x
|
|-- config
|-- include
\-- src
    |
    |-- libraries
    |   |-- at
    |   |-- bb
    |   |-- chr
    |   |-- dp
    |   |-- mth
    |   |-- mx
    |   |-- nr
    |   |-- nr_d
    |   |-- op
    |   |-- opk2
    |   |-- phot
    |   |-- phys
    |   |-- prof
    |   |-- qub
    |   |-- rot
    |   |-- rtf
    |   |-- sdb
    |   |-- stn2
    |   |-- str
    |   |-- tap
    |   |-- tap95
    |   |-- util
    |   \-- xfit
    |
    \-- apps
        |-- sfit2
        |-- spectrum
        \-- sterne



E.2 Build System Organisation

The central components of the autotools-based build system are the configure.in file, which resides in the top-level directory, and the Makefile.am files, one of which is found in every directory.

configure.in

configure.in is essentially a shell script interspersed with calls to autoconf and automake macros (written in the m4 macro language) that set up the build environment. The programming language used for the project can be selected, and specific details such as compiler commands and flags can be defined. The autoconf macros also allow the programmer to tell the build system to test the underlying operating system for the existence of particular tools, libraries, and files, and to modify the source files of the project as appropriate.

configure.in is processed by autoconf to generate a configure script. When this script is executed, it traverses the build tree and generates all the necessary Makefiles from the Makefile.in templates produced by automake.

The contents of configure.in for LTE-CODES 1.4 are as follows:

AC_INIT
AC_CONFIG_AUX_DIR(config)
AM_INIT_AUTOMAKE(lte-codes, 1.4, "http://www.arm.ac.uk/~csj")
AC_SUBST(ac_aux_dir)

# Checks for programs.
AC_PROG_F77(ifort ifc)
AC_PROG_LIBTOOL
AC_PROG_MAKE_SET

FFLAGS='-I$(top_srcdir)/include -I$(top_srcdir)/include/mod -cm -w -w90 -w95'

AC_OUTPUT(Makefile \
          src/Makefile \
          src/libraries/Makefile \
          src/libraries/at/Makefile \
          src/libraries/bb/Makefile \
          src/libraries/chr/Makefile \
          src/libraries/dp/Makefile \
          src/libraries/mth/Makefile \
          src/libraries/mx/Makefile \
          src/libraries/nr/Makefile \
          src/libraries/nr_d/Makefile \
          src/libraries/op/Makefile \
          src/libraries/opk2/Makefile \
          src/libraries/phot/Makefile \
          src/libraries/phys/Makefile \
          src/libraries/prof/Makefile \
          src/libraries/qub/Makefile \
          src/libraries/rot/Makefile \
          src/libraries/rtf/Makefile \
          src/libraries/sdb/Makefile \
          src/libraries/stn2/Makefile \
          src/libraries/str/Makefile \
          src/libraries/tap/Makefile \
          src/libraries/tap95/Makefile \
          src/libraries/util/Makefile \
          src/libraries/xfit/Makefile \
          src/apps/Makefile \
          src/apps/sfit2/Makefile \
          src/apps/spectrum/Makefile \
          src/apps/spectrum/data/Makefile \
          src/apps/spectrum/models/Makefile \
          src/apps/spectrum/scripts/Makefile \
          src/apps/sterne/Makefile \
          src/apps/sterne/scripts/Makefile \
          src/apps/sterne/utils/Makefile)
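Every directory containing a Makefile.am must appear in the AC_OUTPUT list, so adding a new library means remembering to edit configure.in as well. A hypothetical helper such as the following (not part of LTE-CODES) can regenerate the list from the source tree for comparison with configure.in; it prints each Makefile path relative to the top-level directory, sorted in the C locale so that the output is stable.

```sh
#!/bin/sh
# Hypothetical helper (sketch): list the Makefiles that configure must
# generate, i.e. one per Makefile.am found under the given top-level directory.
makefile_list() {
    (
        cd "$1" || exit 1
        find . -name Makefile.am |
            sed -e 's|^\./||' -e 's|\.am$||' |
            LC_ALL=C sort
    )
}
```

Running `makefile_list lte-codes-1.4` and checking its output against the AC_OUTPUT entries would reveal a directory that has a Makefile.am but was never added to configure.in.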

If any modifications are made to configure.in, autoconf must be run on it for the changes to take effect. A small shell script called bootstrap has been written to run autoconf, together with the other autotools utilities, so that the entire build system is updated consistently. bootstrap is defined as:

#!/bin/sh
libtoolize --force --copy
aclocal -I config
automake --add-missing --force-missing --gnu --copy
autoconf
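As written, bootstrap runs all four commands even if an earlier one fails, so a failed aclocal run, for example, leaves automake and autoconf working from stale files. A sketch of a variant that stops at the first failure simply chains the same commands with `&&`; the function name `run_bootstrap` is hypothetical.

```sh
#!/bin/sh
# Sketch: the bootstrap steps chained with && so that the sequence stops at
# the first tool that fails (the function name is hypothetical).
run_bootstrap() {
    libtoolize --force --copy &&
        aclocal -I config &&
        automake --add-missing --force-missing --gnu --copy &&
        autoconf
}
# Call it from the top-level source directory, for example:
#   run_bootstrap || { echo "bootstrap failed" >&2; exit 1; }
```

Adding `set -e` as the second line of the original bootstrap script would achieve much the same effect with a one-line change.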

In-depth documentation on autoconf can be found in the manual at http://www.gnu.org/software/autoconf/manual/index.html



Makefile.am

Every Makefile.am is processed by automake to produce a Makefile.in file, which is subsequently used by the configure script to create a Makefile at that point in the build tree. Typically, each Makefile.am contains a number of variable assignments describing which source files are to be compiled, whether the sources form a library or a binary, which subdirectories lie beneath the current directory, and so on.

In the top-level directory, Makefile.am contains the following:

include $(top_srcdir)/config/am_global_include.mk

## Process this file with automake to produce Makefile.in
SUBDIRS = src

# Include bootstrap script and other folders in distribution
EXTRA_DIST = bootstrap include test

# Include files in config directory in distribution
AUX_DIST = $(ac_aux_dir)/config.guess \
           $(ac_aux_dir)/config.sub \
           $(ac_aux_dir)/install-sh \
           $(ac_aux_dir)/ltmain.sh \
           $(ac_aux_dir)/missing \
           $(ac_aux_dir)/mkinstalldirs \
           $(ac_aux_dir)/am_global_include.mk

MAINTAINERCLEANFILES = Makefile.in aclocal.m4 configure config-h.in $(AUX_DIST)

## Make sure config directory and files it contains are correctly
## added to distribution by 'make dist'
dist-hook:
	for file in $(AUX_DIST); do \
	  cp $$file $(distdir)/$$file; \
	done

This file is fairly basic; its most significant entry is the SUBDIRS variable, which specifies the subdirectories that must be traversed from here during the build. The remaining assignments are mostly concerned with telling the build system about other files that are part of the project but do not need to be compiled.

The Makefile.am for a program or a library looks like this (this example is for SPECTRUM, in src/apps/spectrum):


include $(top_srcdir)/config/am_global_include.mk

SUBDIRS = scripts data models

bin_PROGRAMS = spectrum
spectrum_SOURCES = Spectrum.f
spectrum_LDADD = \
    ../../libraries/dp/libdp.a \
    ../../libraries/qub/libqub.a \
    ../../libraries/opk2/libopk2.a \
    ../../libraries/op/libop.a \
    ../../libraries/tap95/libtap95.a \
    ../../libraries/str/libstr.a \
    ../../libraries/chr/libchr.a \
    ../../libraries/rtf/librtf.a \
    ../../libraries/nr/libnr.a \
    ../../libraries/nr_d/libnr_d.a \
    ../../libraries/mth/libmth.a

Here, the name of the final program is specified, along with its source files and the libraries on which it depends. The build system ensures that any such dependencies are compiled before any attempt is made to compile the current program or library.

Further documentation on automake can be found at http://www.gnu.org/software/automake/manual/automake.html

E.3 Installation Instructions

To install LTE-CODES from the source tarball as a non-root user to an arbitrary directory:

1. Unpack the archive: tar -xvzf lte-codes-x.x.tar.gz
2. cd lte-codes-x.x
3. ./configure --prefix=/path/to/install
4. make
5. make install
6. Set the shell environment variable LTECODES to point to the install location.
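Step 6 depends on the login shell, just as the Condor profile scripts come in sh and csh flavours. The hypothetical helper below prints the appropriate line to add to the user's start-up file (for example ~/.bashrc or ~/.cshrc). It is a sketch and is not part of the LTE-CODES distribution.

```sh
#!/bin/sh
# Hypothetical helper (sketch): print the command that sets LTECODES for the
# given shell name, using setenv for csh-family shells and export otherwise.
ltecodes_env() {
    shell_name=$1
    prefix=$2
    case $shell_name in
        *csh) echo "setenv LTECODES $prefix" ;;
        *)    echo "export LTECODES=$prefix" ;;
    esac
}
```

For example, `ltecodes_env "${SHELL##*/}" /path/to/install` prints the line that suits the current login shell.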

