On the Automatic Analysis of Stellar Spectra - Armagh Observatory
On the Automatic Analysis of Stellar Spectra
A thesis submitted for the degree of
Doctor of Philosophy
by
Christopher Winter, B.Eng.
Armagh Observatory
Armagh, Northern Ireland
&
Faculty of Science and Agriculture
Department of Pure and Applied Physics
The Queen’s University of Belfast
Belfast, Northern Ireland
March 2006
“Quia non erit impossibile apud Deum omne verbum”
To Stacey
“Qui invenit mulierem invenit bonum
et hauriet iucunditatem a Domino”
Acknowledgements
I would like to acknowledge and thank my supervisor, C.S. Jeffery, for his sound advice
and direction over the course of this project, and the staff and students of the Armagh
Observatory for their helpful support and assistance.
I am very grateful to J.S. Drilling, E.M. Green, and A. Ahmad, all of whom supplied
spectroscopic data that were used in this project. In addition, my thanks go to C.A.L.
Bailer-Jones for the use of his neural network code, STATNET.
This work was carried out as part of the CosmoGrid project, funded under the
Programme for Research in Third Level Institutions (PRTLI) administered by the Irish
Higher Education Authority under the National Development Plan and with partial
support from the European Regional Development Fund.
This work also uses data from the Sloan Digital Sky Survey (SDSS) data archive.
Funding for the creation and distribution of the SDSS Archive has been provided by the
Alfred P. Sloan Foundation, the Participating Institutions, the National Aeronautics
and Space Administration, the National Science Foundation, the U.S. Department of
Energy, the Japanese Monbukagakusho, and the Max Planck Society. The SDSS Web
site is http://www.sdss.org/.
The SDSS is managed by the Astrophysical Research Consortium (ARC) for the Participating
Institutions. The Participating Institutions are The University of Chicago,
Fermilab, the Institute for Advanced Study, the Japan Participation Group, The Johns
Hopkins University, the Korean Scientist Group, Los Alamos National Laboratory,
the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics
(MPA), New Mexico State University, University of Pittsburgh, University of
Portsmouth, Princeton University, the United States Naval Observatory, and the University
of Washington.
Chris Winter
March, 2006
Abstract
This project investigates the problem of automatically searching for and analysing
astronomical spectra from large data sets. The three core problems of (1) spectral classification,
(2) physical parameterisation, and (3) searching are examined, and a generalisable
set of tools is established based on the techniques of artificial neural networks
(ANNs), χ² minimisation, and principal components analysis (PCA). These tools are
then applied to the archives of the Sloan Digital Sky Survey (SDSS) to automatically
search for and analyse the spectra of hot subdwarf stars.
Spectral classification is tackled by the versatile statistical machine learning method
of ANNs. An ANN is trained to classify hot subdwarf spectra onto the classification
system defined by Drilling et al. (2006), obtaining global errors (σ_rms) of ∼2 subtypes
for spectral type, ∼1 subclass for luminosity class, and ∼4 subclasses for the helium
class. These errors are in line with accuracies achieved by human classifiers.
Physical parameters are obtained by fitting observations to grids of theoretical models
using a χ² minimisation procedure. A new methodology has been developed for
managing and indexing large grids of theoretical models in the χ² minimisation code,
SFIT. Concepts from the field of computational geometry are used to remove several
limitations from this code, and pave the way for its use in a distributed parallel
computing environment.
Searching for the spectra of a particular type of object in large, unknown data sets
is accomplished using the multivariate statistical technique, PCA. The mechanics of
this tool are outlined, and its use demonstrated by searching for hot subdwarf spectra
in the SDSS. This solution provides a means to reduce unknown data sets to quantities
suitable for visual inspection.
282 spectra of hot subdwarf candidates are obtained from the SDSS and analysed.
The results evidence several unexplained phenomena of extended horizontal branch
stars, namely: 1) the existence of the second horizontal branch gap of Newell (1973);
2) two sdB n_He–T_eff sequences; and 3) a clustering of hot, helium-rich stars at T_eff ≈
44,000 K, log g = 5.7. These findings pose important questions for stellar evolution
theory in the realms of the extended horizontal branch.
Contents
Acknowledgements
Abstract
List of Tables
List of Figures
1 Introduction
1.1 Astronomical Data Mining
1.2 Large Data Sets And Their Sources
1.3 Astronomical Spectra
1.3.1 Types Of Objects And Their Spectra
1.3.2 Automatic Methods of Analysis
1.4 Hot Subdwarf Stars
1.4.1 Spectroscopy
1.4.2 Stellar Evolution
1.4.3 Why Study Them?
1.4.4 Why Search For Them In The SDSS?
1.5 Summary
2 Classification - Artificial Neural Networks
2.1 Classifying Hot Subdwarfs
2.1.1 The Training Sample
2.1.2 Methodology
2.1.3 Results
2.2 Physical Parameters
2.2.1 Methodology
2.2.2 Results
2.3 Summary
3 Parameterisation - χ² Fitting
3.1 Analysing Stellar Spectra
3.2 SFIT
3.2.1 Limitations of SFIT
3.2.2 Proposal to Remove SFIT’s Limitations
3.3 Tetrahedralisation: Interpolation and Indexing
3.3.1 Simplex Interpolation
3.3.2 Grid Index - Delaunay Triangulation
3.3.3 Navigating the Index - Point Location
3.4 Testing the Modifications
3.5 Summary
4 Filtering - Principal Components Analysis
4.1 Constructing A PCA-Based Filter
4.1.1 Mathematics of PCA
4.1.2 Building A Hot Subdwarf Filter
4.2 Searching the SDSS for Hot Subdwarfs
4.3 Summary
5 Application I - SDSS Hot Subdwarfs
5.1 Search Criteria And Data Sets
5.2 PCA Filtering
5.3 Analysis
5.4 Results
5.4.1 Parameterisation
5.4.2 Classification
5.4.3 Radial Velocities
5.5 Sources of Error
5.6 Analysis of PCA Filter Efficiency
5.7 Summary
6 Application II - Other Data Sets
6.1 2MASS-Selected Sample
6.2 SDSS sdB-He Stars of Harris et al. (2003)
6.3 Ahmad & Jeffery (2003) He-sdBs
6.4 Summary
7 Conclusions And Future Work
Bibliography
Appendices
A Results for 192 Drilling et al. (2006) Hot Subdwarfs
B Results for 282 SDSS DR3 Hot Subdwarf Candidates
C Results for 83 2MASS-Selected Hot Subdwarf Candidates
D The Armagh Observatory Cluster
D.1 Hardware Configuration
D.2 Software Configuration
D.3 MPICH 1.2.4 RPM Spec File
E LTE-CODES
E.1 Directory Layout
E.2 Build System Organisation
E.3 Installation Instructions
List of Tables
2.1 Results of the leave-one-out procedure as applied to a committee of five 901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations.
2.2 As Table 2.1, but for the committee of five 901:5:5:3 ANNs.
2.3 Results of parameterising the 60 calibration stars.
2.4 A comparison between ANNs and χ² minimisation for parameterising the 133 unparameterised stars.
3.1 Details of the model grid used in the comparison
3.2 Initial parameters used for the Amoeba and Levenberg-Marquardt optimisation routines. The step sizes used for Amoeba are also given
3.3 Results of BD+10 2179 analysis with the unmodified version of SFIT
3.4 Results of BD+10 2179 analysis with the modified version of SFIT
3.5 The model grid used to obtain physical parameters of the set of test models.
3.6 RMS comparison of parameterisation results from each interpolation method with the original parameters of each model. Also given is the RMS difference between the methods, and a comparison between the results in the region of parameter space for which both schemes seem to give their best results (see Figures 3.6 and 3.7).
5.1 Summary of data quantities obtained from the SDSS DR3.
5.2 The model grid used to obtain physical parameters from the SDSS hot subdwarf candidates.
6.1 Parameters of the two calibration stars as obtained by χ²-fitting to NLTE (Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given in parentheses.
6.2 Classification results for the sdB-He stars of Harris et al. (2003).
6.3 Classification results for the Ahmad & Jeffery (2003) He-sdBs.
A.1 Parameterisation Results for 192 Drilling et al. (2006) Hot Subdwarfs
B.1 Results for 282 SDSS Hot Subdwarf Candidates
C.1 Results for 83 2MASS-Selected Hot Subdwarf Candidates
List of Figures
1.1 A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from the SDSS)
1.2 Example of a quasar (top) and carbon star (bottom) spectrum. (Taken from the SDSS)
1.3 The emission spectrum of the Orion nebula (M42).
1.4 Examples from each hot subdwarf spectrographic subgroup. Classifications listed are those from Drilling et al. (2006).
1.5 Schematic temperature-luminosity diagrams showing: a) the positions of stars belonging to the main stellar groups; b) the normal sequence of stellar evolution experienced by a star of a few solar masses; c) possible evolution of an sdB star in a binary system. (Diagram courtesy of C.S. Jeffery).
2.1 The training sample shows clustering in certain regions of the classification space. For clarity, points have been offset by small random shifts in both coordinates.
2.2 Results of the leave-one-out procedure for both ANN architectures at the near-optimal training time of 300 iterations for the 901:10:3 architecture (left column), and 500 iterations for the 901:5:5:3 architecture (right column). Also plotted is the best-fit linear least squares line.
2.3 Parameterisations of the 60 calibration stars. Results from each method have been combined onto each plot. ANN results are indicated by blue crosses, and χ² minimiser results by red pluses.
2.4 Parameterisations of the 133 unparameterised stars using the ANNs and χ² minimiser. Also shown is the best-fit linear least squares line.
3.1 Example of a k-D tree in two dimensions. On the left is the representation of how the k-D tree on the right splits up the x,y plane. (Adapted from Moore 1991.)
3.2 A 1-simplex is a line segment. A 2-simplex is a triangle. A 3-simplex is a tetrahedron.
3.3 In two dimensions, the Delaunay triangulation guarantees that no other points lie in the circumcircle of any simplex.
3.4 The line segment, L, is constructed using the centroid of the starting tetrahedron, T, and the interpolation point, p. The tetrahedra visited on the walk-through are coloured grey.
3.5 Parameterisation results from the linear interpolation in tables method. Clearly visible are anomalous results arising from a suspected defect in the method’s implementation.
3.6 Parameterisation results from the linear interpolation in tables method. Axes have been restricted to give a view of the grid boundaries described in Table 3.5.
3.7 Parameterisation results from the simplex-based interpolation scheme. In contrast with Figures 3.5 and 3.6, the simplex-based scheme clearly restricts the optimisers to the grid boundaries.
4.1 Principal component analysis. u_1 is the first principal component and the axis onto which the projected positions of the data have their maximum sum. u_2 is the second principal component, and u_1 · u_2 = 0.
4.2 Mean spectrum of the Drilling et al. (2006) sample.
4.3 First five PCs of the Drilling et al. (2006) sample.
4.4 Second five PCs of the Drilling et al. (2006) sample.
4.5 Cumulative variance of the first ten PCs of the Drilling et al. (2006) sample.
4.6 Illustration of projecting hot subdwarf spectra onto the first four PCs of the Drilling et al. (2006) standards.
4.7 Histogram of reconstruction errors from the SDSS data sample.
4.8 Spectra in first three reconstruction error histogram bins (R ≲ 3.0).
4.9 Spectra in first three reconstruction error histogram bins (R ≲ 3.0).
4.10 Sample of spectra from the eighth error bin (R ∼ 3.0).
4.11 Sample of spectra from the fourteenth error bin (R ∼ 4.5).
4.12 Sample of high S/N DA white dwarfs from the 22nd–24th error bins (R ∼ 6.4–7.1)
4.13 Sample of spectra from the fifty-third error bin (R > 15.0).
5.1 Histogram of reconstruction errors for the colour-colour selected SDSS sample.
5.2 Parameterisation results of the 282 SDSS hot subdwarf candidates. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
5.3 Four example fits from the 282 SDSS hot subdwarfs. The classification and physical parameters (T_eff (K), log g, log(n_He/n_H)) obtained for each star are printed in the lower corners of each plot.
5.4 The results of applying a kernel density estimate analysis to the data from Figure 5.2. The low-density region at T_eff ≈ 22,500 K is prominent, along with another possible low-density region at T_eff ≈ 41,000 K.
5.5 Classification results of the 282 SDSS hot subdwarf candidates. Points have been given small random offsets in each axis for clarity.
5.6 A comparison of the ANN classifications of the 282 SDSS hot subdwarf candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (right-most plots). Points have been given small random offsets in each axis for clarity.
5.7 A calibration of the ANN classifications onto the Drilling et al. (2006) system using the 282 SDSS hot subdwarf candidates.
5.8 The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates.
5.9 Examples of white dwarf and BHB contaminants. A - BHB star with deep Balmer lines. B - DA white dwarf with strong, broad Balmer lines due to high surface gravity. C - DB white dwarf. D - Uncertain (some evidence of weak carbon absorption, so possibly a DQ white dwarf).
5.10 This gray-shaded region of the log g–T_eff plane represents an area of good probability that the stars within it are subdwarfs.
5.11 TP rates (red) and FP rates (blue) of the PCA filter as a function of the reconstruction error threshold, R. The green curve is the difference between the TP and FP rates.
5.12 A closer examination of the TP and FP rates. The peak in the green TP-FP curve occurs at R ∼ 7.0 and signifies the optimum value for R in the SDSS sample.
6.1 SFIT physical parameters for 2MASS-selected sample. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
6.2 ANN classification for 2MASS-selected sample. Points have been given small random offsets in each axis for clarity.
6.3 The stars assigned late-A and early-F spectral types by the neural network.
6.4 Comparison of ANN classifications with those of Drilling et al. (2006) for the 17 He-sdBs of Ahmad & Jeffery (2003). Points have been given small random offsets in each axis for clarity. Also plotted is the best fit least squares regression line with error bars showing the RMS of the residuals.
7.1 Schematic diagram showing how the work of this thesis fits in with the wider system envisaged by Jeffery (2003).
Chapter 1<br />
Introduction<br />
The spectroscopy <strong>of</strong> light from astronomical objects is one <strong>of</strong> <strong>the</strong> most important<br />
methods for understanding <strong>the</strong> physics at work in <strong>the</strong> universe. Many fundamental<br />
parameters <strong>of</strong> those objects can be determined by analysing <strong>the</strong>ir spectrum, including<br />
temperature, chemical composition, motion, and o<strong>the</strong>r clues about <strong>the</strong>ir origin and<br />
evolution.<br />
Advances in information technology over <strong>the</strong> past 35 years, and <strong>the</strong>ir subsequent influence<br />
on observational methods, have allowed spectroscopic studies <strong>of</strong> unprecedented<br />
numbers <strong>of</strong> objects to be carried out over a short period <strong>of</strong> time. Modern astronomy<br />
is now about dealing with very large quantities <strong>of</strong> data, and <strong>the</strong> problems associated<br />
with its management and analysis.<br />
This project develops a collection <strong>of</strong> tools to assist astronomers in data mining large<br />
sets <strong>of</strong> astronomical spectra. The tools are general in nature, and can be used to search<br />
for and automatically study <strong>the</strong> spectra <strong>of</strong> potentially any type <strong>of</strong> astronomical object.<br />
Toge<strong>the</strong>r, <strong>the</strong> tools form a semi-automatic pipeline allowing a fast progression from<br />
large quantities <strong>of</strong> unknown spectra to useful scientific results.<br />
In <strong>the</strong> past, studies <strong>of</strong> automatic methods <strong>of</strong> spectral analysis have mainly centred<br />
around <strong>the</strong> problem <strong>of</strong> object classification. This makes sense from <strong>the</strong> point <strong>of</strong> view<br />
1
2 Chapter 1 - <strong>On</strong> <strong>the</strong> <strong>Automatic</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Stellar</strong> <strong>Spectra</strong><br />
<strong>of</strong> a survey mission because it is desirable to know what types <strong>of</strong> objects have been<br />
observed, with particular interest being paid to those objects not falling into any known<br />
category.<br />
However, <strong>the</strong> individual astronomer, studying a particular type <strong>of</strong> object, is not<br />
always interested in large-scale classification. He needs a way to search exclusively<br />
for samples in a data set which are most like his object <strong>of</strong> interest. <strong>On</strong>ce located,<br />
those samples are likely to exist in large enough numbers to require fur<strong>the</strong>r automatic<br />
assistance in <strong>the</strong>ir analysis.<br />
The techniques needed to help solve this problem already exist in <strong>the</strong> field, but <strong>the</strong>y<br />
have not yet been brought toge<strong>the</strong>r and adapted to form any sort <strong>of</strong> useful, coherent<br />
system. As such, scientific insights contained in large data sets remain mostly untapped.<br />
The work in this project represents what seems to be <strong>the</strong> first attempt at rectifying<br />
this issue. Three major algorithms are employed to construct a general data mining<br />
tool set.<br />
1. Principal Components Analysis is applied in a supervised classification role to<br />
create a filter that can help search for a specific type of object in an unknown<br />
data set.<br />
2. Artificial Neural Networks have been shown to be a robust and versatile tool for<br />
many tasks in astronomy. They are used here to provide spectral classifications.<br />
3. χ² minimisation is used to derive physical parameters for spectra by fitting them<br />
to grids of theoretical models.<br />
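The third item can be illustrated with a minimal sketch. It assumes a small, hypothetical grid of model spectra sampled on the same wavelength points as the observation, with a known flux uncertainty at each point. The names `chi_squared` and `best_fit`, the parameter labels, and all flux values are invented for illustration and are not taken from SFIT.

```python
from typing import Dict, List, Sequence, Tuple

Params = Tuple[float, float]  # hypothetical (T_eff in K, log g) label for a model


def chi_squared(observed: Sequence[float], model: Sequence[float],
                sigma: Sequence[float]) -> float:
    """Sum of squared, uncertainty-weighted residuals between two spectra."""
    if not (len(observed) == len(model) == len(sigma)):
        raise ValueError("spectra and uncertainties must have equal length")
    return sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))


def best_fit(observed: Sequence[float], sigma: Sequence[float],
             grid: Dict[Params, List[float]]) -> Tuple[float, Params]:
    """Return (chi2, params) for the grid model with the smallest chi-squared."""
    return min((chi_squared(observed, flux, sigma), params)
               for params, flux in grid.items())


# Toy grid: three made-up model spectra on four wavelength points.
grid = {
    (30000.0, 5.5): [1.00, 0.80, 0.95, 1.00],
    (35000.0, 5.5): [1.00, 0.70, 0.90, 1.00],
    (40000.0, 6.0): [1.00, 0.60, 0.85, 1.00],
}
observed = [1.01, 0.71, 0.89, 0.99]
sigma = [0.02] * 4

chi2, params = best_fit(observed, sigma, grid)
print(params, round(chi2, 2))
```

This sketch only selects the nearest grid point. A practical fitting code would normally also interpolate between neighbouring models to refine the parameter estimates.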
Additional minor tools to facilitate data processing, management, and visualisation<br />
are also prototyped.<br />
Furthermore, a new and original methodology has been developed to extend the<br />
functionality of the χ² minimisation code, SFIT, used at the Armagh Observatory.
The code is modified using concepts from the field of computational geometry to allow<br />
the use of arbitrarily large, three-dimensional grids of theoretical models. This removes<br />
several severe limitations from the program, and prepares it for further modification to<br />
permit its use in a distributed computational environment.<br />
The specific outcome of this project is a set of general tools which can be used<br />
to study the spectra of any astronomical object, and a “real-world” demonstration<br />
of these tools through their application to search for and analyse the spectra of hot<br />
subdwarf stars from the archives of the Sloan Digital Sky Survey. The results evidence<br />
several unexplained phenomena of extended horizontal branch stars that pose important<br />
questions for the theory of stellar evolution.<br />
The work undertaken in this project is a step towards the larger computational<br />
framework of Jeffery (2003) which outlines a wider system incorporating the management<br />
of atomic data, dynamic generation and storage of grids of theoretical models,<br />
parameter space visualisation, and automated analysis. The use of distributed computational<br />
resources, such as the Grid, is also envisaged.<br />
1.1 Astronomical Data Mining<br />
The term “data mining” refers to the use of a broad set of techniques and algorithms for<br />
extracting useful patterns and models from very large data sets. Typically, the goal is<br />
to discover either something hitherto unknown about a phenomenon that only becomes<br />
apparent when it is studied en masse, or else a new phenomenon that only becomes<br />
apparent when observations are gathered in large enough quantities over a sufficiently<br />
wide range.<br />
Traditionally, in astronomy, much effort was invested in gathering observations of<br />
one particular object, such as a star, in an attempt to understand that object in detail.<br />
Given the universality of physics, the insights gained are usually applicable to other<br />
objects of the same type, allowing a wider understanding to be achieved.<br />
However, advances in technology, such as large-area mosaic CCDs and multi-object<br />
fibre-fed spectrographs, mean that modern telescopes can be made to gather observations<br />
of thousands of objects in a single night. This opens up the possibility of<br />
discovering new facts about particular objects by studying their properties in large<br />
numbers, and also the possibility of discovering completely new objects.<br />
Unfortunately, this abundance of data brings with it a set of new problems. Managing<br />
all of the information requires knowledge of data formats, storage mechanisms, and<br />
techniques for indexing, searching, and analysing it all. Indeed, modern astronomy is<br />
fast becoming a cross-disciplinary endeavour, providing a rich area for exploring many<br />
aspects of computer science and statistics in the context of real-world applications.<br />
Data Types<br />
The nature of astronomical data means that it is inherently heterogeneous in both<br />
format and content, with observations now being gathered over all regions of the electromagnetic<br />
spectrum. Broadly speaking, astronomical data can be classified into five<br />
domains.<br />
• Imaging data are the fundamental component of astronomical observations, capturing<br />
a two-dimensional picture of the universe within a narrow wavelength<br />
region at a particular point in time.<br />
• Catalogues of objects are constructed by analysing imaging data, and recording<br />
many different parameters about each object such as brightness and colour,<br />
morphological information, and coordinates.<br />
• Spectroscopy provides detailed physical quantification of objects including temperature,<br />
chemical composition, and kinematical information.<br />
• Studies of objects in the time-domain provide valuable insight into the nature<br />
of the universe by identifying moving objects, variable sources (e.g., pulsating
stars), or transient objects such as supernovae and gamma-ray bursts.<br />
• Finally, theoretical simulations of astronomical objects are an important source<br />
of data. Comparing theoretical models with observational data is the central<br />
mechanism in understanding how these objects formed and have evolved.<br />
Each of these data domains carries its own particular problems to be solved in a<br />
data management and mining context. Imaging data and catalogue construction require<br />
robust, automatic techniques to identify sources distinct from background-level noise,<br />
to differentiate between different types of objects (e.g., stars, galaxies, and comets),<br />
and finally to index these data to allow fast searching based on spatial criteria.<br />
Spectroscopy and time-domain data require more involved algorithms for the automated<br />
reduction and calibration of observations – algorithms which often have to<br />
be tailored for a specific instrument and telescope setup. The automatic analysis of<br />
spectroscopic data typically seeks to classify an object onto a predefined categorical<br />
system by comparing the object with the set of standards which define the<br />
system. The physics of an object, as manifest in its spectrum, is determined<br />
by computing accurate theoretical models and comparing them with the observations.<br />
Any results then need to be stored and indexed with the observations in a manner that<br />
allows for further re-analysis as improved observations and theoretical models<br />
become available.<br />
Numerical simulations to generate theoretical models are always in need of powerful<br />
and plentiful computational resources to allow more detail and precision to be attained.<br />
As models will always have a shorter shelf-life than observations, appropriate meta-data<br />
needs to be recorded and stored with the models so a historical record can be kept as<br />
the underlying physics improves. This meta-data is also needed to help automate<br />
the parameterisation of observations by providing a means to explore grids of models,<br />
and ascertain when new models need to be generated to cover a required part of the<br />
parameter space.<br />
1.2 Large Data Sets And Their Sources<br />
Three main sources contribute to large observational data sets in astronomy, namely,<br />
those generated by specific surveys, general-purpose observatories, and space missions.<br />
In recent years, Virtual Observatory projects have been investigating ways to combine the<br />
various databases generated by these sources, mapping out the computational infrastructures<br />
and tools needed to explore large data volumes.<br />
Specific Surveys<br />
Digital sky surveys generate very large quantities of homogeneous data over multiple<br />
wavelengths. As such, they are the main drivers behind the study of data mining<br />
methods in astronomy.<br />
The Digitized Palomar Observatory Sky Survey 1 (DPOSS; Djorgovski et al.,<br />
1998) is a digital survey of the entire Northern sky in three visible-light bands, based<br />
on the photographic sky atlas, POSS-II, the second Palomar Observatory Sky Survey<br />
(Reid et al., 1991). A set of three photographic plates (one in each filter), each covering<br />
36 square degrees, were taken at each of 894 pointings spaced by 5 degrees, covering the<br />
Northern sky. The plates were then digitised at the Space Telescope Science Institute<br />
(STScI), producing about 1 gigabyte per plate, and about 3 terabytes of data in total.<br />
Specially developed data mining software called SKICAT (Weir et al., 1995) was<br />
used to perform object classification and measure around 40 parameters for each object,<br />
storing this information in a database which will eventually be released to the<br />
community as the Palomar-Norris Sky Catalog.<br />
The Two Micron All-Sky Survey 2 (2MASS; Skrutskie et al., 2006) is a near-infrared<br />
(J, H, and K_s) all-sky survey. The project is a collaboration between the<br />
1 http://dposs.caltech.edu/<br />
2 http://www.ipac.caltech.edu/2mass/
University of Massachusetts which constructed the observatory facilities and operated<br />
the survey, and the Infrared Processing and Analysis Center at Caltech which is responsible<br />
for all data processing and archive issues. The survey began in the spring<br />
of 1997, completing survey-quality operations in 2000, with the final catalogue being<br />
released in March, 2003.<br />
The survey includes over 12 terabytes of imaging data, with the final catalogue<br />
containing over one million resolved galaxies, and more than three hundred million<br />
stars and other unresolved sources to a limiting magnitude of K_s < 14.3. 2MASS is<br />
currently producing the following data products for the entire sky:<br />
• A digital atlas of the sky comprising approximately 4 million 8′×16′ images,<br />
having about 4″ spatial resolution in each of the three wavelength bands,<br />
• A point source catalog containing accurate positions and fluxes for ∼ 300 million<br />
stars and other unresolved objects,<br />
• An extended source catalog containing positions and total magnitudes for more<br />
than one million galaxies and other nebulae.<br />
The 2dF Galaxy Redshift Survey 3 (2dFGRS; Colless et al., 2001) is a major<br />
spectroscopic survey taking full advantage of the unique capabilities of the 2dF facility<br />
built by the Anglo-Australian Observatory 4. The 2dFGRS obtained spectra for 245,591<br />
objects, mainly galaxies, brighter than a nominal extinction-corrected magnitude limit<br />
of b_J = 19.45. Reliable redshifts were obtained for 221,414 galaxies. The galaxies cover<br />
an area of approximately 1,500 square degrees selected from the extended APM Galaxy<br />
Survey of the South Galactic cap.<br />
The final release dataset comprises the following elements:<br />
• source catalogues for the full survey, containing data for 382,323 objects, together<br />
3 http://www.mso.anu.edu.au/2dFGRS/<br />
4 http://www.aao.gov.au/2df/<br />
with related material,<br />
• spectroscopic catalogues for 245,591 objects, containing the spectroscopic parameters<br />
such as redshifts and spectral types.<br />
The Sloan Digital Sky Survey 5 (SDSS; York et al., 2000) is a project to survey<br />
a 10,000 square degree area (1/4 of the entire sky) of the North Galactic hemisphere<br />
over a 5 year period. The estimated 100 million catalogued sources from this survey<br />
will then be used as the foundation for the largest ever spectroscopic survey of galaxies,<br />
quasars and stars.<br />
A dedicated 2.5 m telescope is specially designed to take wide field (3×3 degree)<br />
images using a 5×6 mosaic of 2048×2048 CCDs, in five wavelength bands, operating<br />
in scanning mode. Spectroscopic targets are then observed using two spectrographs<br />
each with 320 fibres feeding in light from the focal plane. A total of four 2048×2048<br />
CCDs (one for each channel of each spectrograph) collect the spectra.<br />
The total raw data will exceed 40 terabytes, and a processed subset of about 1<br />
terabyte in size will consist of 1 million spectra, positions, and image parameters for<br />
over 100 million objects, plus a mini-image centered on each object in every colour.<br />
The data will be made available to the public at specific milestone releases, and upon<br />
completion of the survey.<br />
General-Purpose Observatories<br />
Traditional ground-based observatories have been saving data, primarily as backups<br />
for the users, for a significant time, accumulating large quantities of valuable, but<br />
heterogeneous, data. Unfortunately, lack of funding and this inherent heterogeneity<br />
make it difficult to archive the data in such a way as to make it available and easy<br />
to access for the wider astronomical community. However, some notable exceptions do<br />
5 http://www.sdss.org/
exist.<br />
The National Optical Astronomy Observatory 6 (NOAO) is a US organisation<br />
that manages ground-based national astronomical observatories including the Kitt Peak<br />
National Observatory, Cerro Tololo Inter-American Observatory, and the National Solar<br />
Observatory.<br />
The NOAO has been archiving all data from their telescopes in a program called<br />
“Save-the-Bits” which, prior to the introduction of survey-grade instrumentation, generated<br />
around half a terabyte and over 250,000 images a year. With the introduction of<br />
survey instruments and related programs, the rate of data accumulation has increased,<br />
and NOAO now manages over 10 terabytes of data.<br />
The European Southern Observatory 7 (ESO) operates a number of telescopes<br />
(including the four 8 m class VLT telescopes) at two observatories in the southern<br />
hemisphere: the La Silla Observatory, and the Paranal Observatory. As with many<br />
other ground-based observatories, ESO has been archiving data for some time, with<br />
storage approaching a steady rate of approximately 20 terabytes of data per year<br />
from all of their telescopes. This number will eventually increase to several hundred<br />
terabytes with the completion of the rest of the planned facilities, including the VST, a<br />
dedicated survey telescope similar in nature to the telescope built for the SDSS project.<br />
Space Missions<br />
Although ground-based observatories are aided by the advancement of technology and<br />
continue to make important discoveries, they will always be encumbered by the restrictions<br />
imposed by the Earth’s atmosphere. Thus, space missions, although extremely<br />
expensive, are critical components in the study of the universe, and all of the data they<br />
produce are very valuable and therefore archived.<br />
6 http://www.noao.edu/<br />
7 http://www.eso.org/<br />
The Multimission Archive at the Space Telescope Science Institute 8 (MAST)<br />
archives a variety of astronomical data gathered from space missions, with the primary<br />
emphasis on the optical, ultraviolet, and near-infrared parts of the spectrum. MAST<br />
provides a cross correlation tool allowing users to search all archived data for all observations<br />
which contain sources from either archived or user-supplied catalogue data. In<br />
addition, MAST provides individual mission query capabilities.<br />
The dominant holding for MAST is the data archive from the Hubble Space Telescope,<br />
but its total holdings currently exceed ten terabytes and include (or provide<br />
links to) archival data for the following missions or projects: Hubble Data Archive,<br />
Galaxy Evolution Explorer, Far Ultraviolet Spectroscopic Explorer, International<br />
Ultraviolet Explorer Final Archive, Extreme Ultraviolet Explorer, Hopkins Ultraviolet<br />
Telescope Archive, Ultraviolet Imaging Telescope Archive, Wisconsin Ultraviolet<br />
Photopolarimeter Experiment Archive, Copernicus UV Satellite Archive, Berkeley<br />
Extreme and Far-UV Spectrometer, The Interstellar Medium Absorption Profile<br />
Spectrograph, Digitized Sky Survey, and The Röntgen SATellite Archive.<br />
Virtual Observatories<br />
The Virtual Observatory (VO) concept represents a scientific and technological framework<br />
aimed at trying to manage the ongoing exponential growth in the volume, quality,<br />
and complexity of astronomical data gathered by all of the sources discussed previously.<br />
Two main challenges are faced:<br />
1. The effective inter-linking of large, geographically distributed data sets and digital<br />
sky archives in a homogeneous manner thereby allowing the optimal use of data<br />
mining algorithms to extract new science.<br />
2. The research and development of data mining and “knowledge discovery in<br />
databases” (KDD) algorithms and techniques for the exploration and scientific<br />
8 http://archive.stsci.edu/mast.html
investigation of large digital sky surveys, including combined multi-wavelength<br />
data sets.<br />
These problems have significant relevance beyond the field of astronomy as many<br />
aspects of society are struggling with information overload.<br />
The National Virtual Observatory 9 (NVO) is a project funded by the US National<br />
Science Foundation to research and explore the technologies necessary to create<br />
a VO. The central themes of this research are the formation and adoption of standards<br />
to make the sharing of astronomical data easier. An NVO standard that has been<br />
adopted worldwide in this regard is “VOTable”, a way to represent a table of data in<br />
XML with good meta-data about the semantic meaning of the data. Grid computing<br />
is seen as an important resource for the large-scale analysis of astronomical data. The<br />
NVO have also produced research prototypes demonstrating that interesting and efficient<br />
research can be done by building upon just a few new protocols and standards<br />
for data exchange and access.<br />
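To make the idea of a table carried in XML together with its meta-data more concrete, the following is a minimal sketch using only the Python standard library. The document is hand-written for illustration: the object names and values are invented, only a handful of VOTable elements are shown, and the XML namespace declared by real VOTable files is deliberately omitted.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal VOTable-like document. The object names and values
# are invented, and the XML namespace used by real VOTable files is omitted.
DOCUMENT = """
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="targets">
      <FIELD name="object" datatype="char" arraysize="*"/>
      <FIELD name="teff" datatype="double" unit="K"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>star_a</TD><TD>32000</TD></TR>
          <TR><TD>star_b</TD><TD>28500</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

root = ET.fromstring(DOCUMENT)

# Column meta-data (name and unit) is declared once, separately from the rows.
fields = [(f.get("name"), f.get("unit")) for f in root.iter("FIELD")]

# Each TR element is one row; each TD element is one cell, stored as text.
rows = [[td.text for td in tr.iter("TD")] for tr in root.iter("TR")]

print(fields)
print(rows)
```

Because each column description travels with the data, a program reading the table can discover the units of a column without any prior knowledge of the archive that produced it.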
The AstroGrid 10 project is a UK government funded, open source project designed<br />
to create a working VO for UK and international astronomers. The goals of the AstroGrid<br />
project are:<br />
• A working datagrid for key UK databases<br />
• High throughput data mining facilities for interrogating those databases<br />
• A uniform archive query and data-mining software interface<br />
• The ability to browse simultaneously multiple datasets<br />
• A set of tools for integrated on-line analysis of extracted data<br />
• A set of tools for on-line database analysis and exploration<br />
9 http://www.us-vo.org/<br />
10 http://www.astrogrid.org/<br />
• A facility for users to upload code to run their own algorithms on the data mining<br />
machines<br />
• An exploration of techniques for open-ended resource discovery<br />
Many of these goals are common to other nations and other disciplines, and the<br />
AstroGrid project is working closely with other VO projects worldwide through the<br />
International Virtual Observatory Alliance (IVOA) – jointly formed with the NVO,<br />
and other world-wide VO efforts – to deliver these goals.<br />
1.3 Astronomical Spectra<br />
It is clear that much work lies ahead if astronomers are to keep up with the ever<br />
increasing amounts of data their telescopes are able to gather. As such, the project<br />
presented in this thesis focusses on one particular aspect of the data mining problem:<br />
methods to analyse digitised astronomical spectra in an automated fashion.<br />
The central idea of data mining is to be able to turn large quantities of unknown<br />
information into meaningful interpretations, and this is very much a non-trivial task in<br />
the context of astronomical spectra. Before large-scale statistics can be done to search<br />
for patterns, the spectra of an interesting type of object need to be selected from a<br />
set of unknown data. Then, the major analytical tasks are usually the classification<br />
and physical parameterisation of the spectra, after which pattern searching can be<br />
performed.<br />
The problems of searching, classification, and physical parameterisation all involve<br />
some kind of pattern matching in and of themselves. Searching, which is basically a very<br />
coarse initial classification, matches unknown spectra to a set of known examples of a<br />
search target, retaining only those spectra which are within some acceptable distance<br />
from the set of examples. Classification assigns a fine-grained category to an object<br />
based on how well it matches the spectral standards of the classification system used.
Physical parameterisation matches observations to grids of theoretical models in an<br />
attempt to find the best fit and, consequently, estimates for the main physical quantities<br />
of interest.<br />
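The searching step described above can be sketched as a simple distance filter. This is a deliberately simplified illustration that uses the Euclidean distance between raw flux vectors; it is not the Principal Components Analysis filter developed in this project. The function names, the three-point spectra, and the threshold of 0.1 are all hypothetical.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two spectra on a common wavelength grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(unknown: List[Sequence[float]], examples: List[Sequence[float]],
           max_distance: float) -> List[int]:
    """Indices of unknown spectra lying within max_distance of any example."""
    return [i for i, spectrum in enumerate(unknown)
            if min(distance(spectrum, e) for e in examples) <= max_distance]


examples = [[1.0, 0.6, 1.0], [1.0, 0.7, 1.0]]   # made-up target spectra
unknown = [[1.0, 0.65, 1.0],                    # resembles the target
           [0.4, 0.9, 0.2],                     # clearly different
           [1.0, 0.72, 0.98]]                   # resembles the target
selected = search(unknown, examples, max_distance=0.1)
print(selected)
```

Only the retained spectra would then be passed on to the finer-grained classification and parameterisation stages.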
1.3.1 Types Of Objects And Their Spectra<br />
All objects in the night sky can be studied by spectroscopic analysis. Each object has<br />
a set of distinct features which can be found in its spectrum, reflecting the specific<br />
physical processes at work in or around the object. This section gives some examples<br />
of these objects and the spectra they produce.<br />
In Figure 1.1, the top plot shows the spectrum of a hot star. The overall shape<br />
of a stellar spectrum approximates the curve of a black body at the same effective<br />
temperature. This temperature can be estimated from the peak wavelength (Wien’s<br />
displacement law) or from the area under the spectrum (using the Stefan-Boltzmann<br />
law). The absorption lines in the spectrum reflect the various chemicals present in the<br />
star’s atmosphere, and tell of the specific physical conditions in that region of the star.<br />
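The two temperature estimates mentioned above can be written out as a short worked example. It assumes an ideal black body, and it assumes that the integrated flux is known at the stellar surface, which in practice requires the star's distance and radius. The peak wavelength of 100 nm is an arbitrary illustrative value, and the function names are hypothetical.

```python
WIEN_B = 2.897771955e-3      # Wien displacement constant, m K
SIGMA_SB = 5.670374419e-8    # Stefan-Boltzmann constant, W m^-2 K^-4


def temperature_from_peak(peak_wavelength_m: float) -> float:
    """Wien's displacement law: T = b / lambda_max."""
    return WIEN_B / peak_wavelength_m


def temperature_from_flux(surface_flux_w_m2: float) -> float:
    """Stefan-Boltzmann law: F = sigma T^4, so T = (F / sigma)^(1/4)."""
    return (surface_flux_w_m2 / SIGMA_SB) ** 0.25


t_peak = temperature_from_peak(100e-9)   # peak at 100 nm (illustrative value)
flux = SIGMA_SB * t_peak ** 4            # flux a black body at t_peak would emit
t_flux = temperature_from_flux(flux)     # recovers the same temperature
print(round(t_peak), round(t_flux))
```

For a true black body the two routes agree by construction. For a real star they differ, because absorption lines and other departures from the black body curve change both the peak position and the integrated flux.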
The bottom plot in Figure 1.1 is that of a galaxy spectrum. The overall spectrum<br />
of a galaxy is simply the combined spectrum of all the stars and other radiating matter<br />
in the galaxy. As galaxies differ in structure and relative composition of stellar type<br />
and gas, their spectra will also differ.<br />
Unlike stars, galaxies are not point sources, so their spectra must be obtained differently.<br />
As a galaxy can often be resolved as an extended object, it is possible to take a<br />
spectrum of different parts of the galaxy, providing information about its composition,<br />
the stellar birth rates, and rotational velocity for that particular region.<br />
Quasars exhibit very bright emission features relative to a low intensity continuum<br />
in their spectra, as can be seen in the top plot of Figure 1.2. In fact, it was only through<br />
careful analysis of the spectra of quasars that astronomers realised they were not just<br />
Figure 1.1: A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from<br />
the SDSS)
Figure 1.2: Example of a quasar (top) and carbon star (bottom) spectrum. (Taken<br />
from the SDSS)<br />
<strong>On</strong> <strong>the</strong> <strong>Automatic</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Stellar</strong> <strong>Spectra</strong>
16 Chapter 1 - <strong>On</strong> <strong>the</strong> <strong>Automatic</strong> <strong>Analysis</strong> <strong>of</strong> <strong>Stellar</strong> <strong>Spectra</strong><br />
Figure 1.3: The emission spectrum <strong>of</strong> <strong>the</strong> Orion nebula (M42).<br />
faint stars. The emission lines in quasar spectra are not where <strong>the</strong>y are expected to be<br />
seen if <strong>the</strong> object was a nearby star. The standard explanation is that <strong>the</strong> quasar is<br />
at a vast distance and so appears to be receding from us due to <strong>the</strong> expansion <strong>of</strong> <strong>the</strong><br />
Universe. This high recession velocity relative to <strong>the</strong> Earth causes <strong>the</strong> spectral lines to<br />
be redshifted to longer wavelengths.<br />
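The size of this shift is usually expressed as a dimensionless redshift. The standard definition is given here for reference; the symbols λ_obs (observed wavelength) and λ_rest (laboratory wavelength) are notation introduced for this illustration and do not appear elsewhere in the text.

```latex
z = \frac{\lambda_{\mathrm{obs}} - \lambda_{\mathrm{rest}}}{\lambda_{\mathrm{rest}}}
```

For example, a line with a rest wavelength of 6563 Å that is observed at 7219 Å corresponds to z ≈ 0.10.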
Exotic stars, such as Wolf-Rayet stars or <strong>the</strong> carbon star in <strong>the</strong> bottom plot <strong>of</strong><br />
Figure 1.2, are identified by <strong>the</strong> features present in <strong>the</strong>ir spectra. Carbon stars can<br />
have similar temperatures to G, K, and M-class stars (4,600 - 3,100 K) but have a<br />
much higher abundance of carbon than normal stars, which appears in the spectrum<br />
as very strong molecular bands (C₂). As these stars have such low temperatures, they<br />
appear red in colour, but <strong>the</strong> carbon molecules absorb light at blue wavelengths which<br />
makes <strong>the</strong> star appear even redder. Carbon stars are assigned a type C spectral class.<br />
Emission nebulae are clouds <strong>of</strong> high temperature gas. The atoms in <strong>the</strong> cloud are<br />
ionised by ultraviolet light from a nearby star and emit radiation as <strong>the</strong> electrons fall
back into atomic orbitals, so <strong>the</strong>ir spectra show strong emission lines, as can be seen<br />
in Figure 1.3.<br />
These nebulae usually appear to be red because <strong>the</strong> predominant emission line <strong>of</strong><br />
hydrogen in <strong>the</strong> optical (Hα) happens to be red. Although o<strong>the</strong>r colours are produced<br />
by o<strong>the</strong>r atoms, hydrogen is by far <strong>the</strong> most abundant. Emission nebulae are usually<br />
<strong>the</strong> sites <strong>of</strong> recent and ongoing star formation.<br />
1.3.2 <strong>Automatic</strong> Methods <strong>of</strong> <strong>Analysis</strong><br />
Despite <strong>the</strong> diversity in features present in <strong>the</strong> spectra <strong>of</strong> astronomical objects, <strong>the</strong>ir<br />
general character always remains <strong>the</strong> same, namely, flux intensities measured across<br />
some wavelength range. This permits an automated method <strong>of</strong> analysis developed for<br />
one type <strong>of</strong> object to be applied, in principle, to <strong>the</strong> spectra <strong>of</strong> ano<strong>the</strong>r.<br />
Over the years, a small number of automatic pattern matching techniques have found<br />
widespread use in the field. One of the first, and simplest, is the cross-correlation function.<br />
This is a signal processing technique wherein two signals are convolved according<br />
to <strong>the</strong> integral<br />
$$c(z) = \int_{-\infty}^{\infty} T(x)\,G(z - x)\,dx, \qquad (1.1)$$<br />
which convolves two functions, T(x) and G(x), by integrating over all x ∈ (−∞, ∞),<br />
yielding the cross-correlation function, c(z), as a function of the lag z.<br />
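As a minimal sketch of how such a function is used in practice, the following Python example cross-correlates a synthetic template with a displaced copy of itself and reads the displacement off the peak of c(z). It is an illustration only, not code from this thesis: the function name `lag_of_peak`, the Gaussian line profile, and the 7-pixel shift are all assumptions. Note that `numpy.correlate` evaluates the discrete correlation sum, which differs from the convolution form of Equation 1.1 by a reflection of one input.

```python
import numpy as np

def lag_of_peak(observed, template):
    """Return the integer pixel lag at which the cross-correlation peaks."""
    # Subtract the mean so the continuum level does not dominate the sum.
    obs = observed - observed.mean()
    tpl = template - template.mean()
    # mode="full" evaluates every lag from -(len(tpl)-1) to len(obs)-1.
    c = np.correlate(obs, tpl, mode="full")
    lags = np.arange(-(len(tpl) - 1), len(obs))
    return int(lags[np.argmax(c)])

# Synthetic template: flat continuum with one Gaussian absorption line.
pixels = np.arange(200)
template = 1.0 - 0.5 * np.exp(-0.5 * ((pixels - 100) / 3.0) ** 2)
# Hypothetical "observed" spectrum: the same line displaced by 7 pixels.
observed = 1.0 - 0.5 * np.exp(-0.5 * ((pixels - 107) / 3.0) ** 2)

shift = lag_of_peak(observed, template)
```

Here `shift` recovers the 7-pixel displacement. When spectra are sampled on a logarithmic wavelength scale, a pixel lag of this kind corresponds to a velocity shift, which is why the method is so widely used for radial velocities.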
Simkin (1974) demonstrated <strong>the</strong> use <strong>of</strong> <strong>the</strong> cross-correlation function for measuring<br />
<strong>the</strong> radial velocities <strong>of</strong> stars and galaxies. Tonry & Davis (1979) <strong>the</strong>n applied <strong>the</strong> technique<br />
in a survey to measure galaxy redshifts. Kurtz (1982) used cross-correlation to<br />
classify low resolution (14 Å) stellar spectra onto <strong>the</strong> MK classification system (Morgan<br />
et al., 1978). Cross-correlation remains an important, basic tool that is widely used,<br />
mainly as a method for calculating radial velocities.<br />
Related to <strong>the</strong> cross-correlation function are minimum distance methods (MDM).<br />
Here, an observation is compared with a set <strong>of</strong> templates with <strong>the</strong> intention <strong>of</strong> finding a<br />
match which minimises some distance metric. Kurtz (1982), Lasala (1994), and Gulati<br />
et al. (1994a) used this technique to classify stellar spectra with very positive results.<br />
The application <strong>of</strong> minimum distance methods to <strong>the</strong> parameterisation <strong>of</strong> stellar spectra<br />
by fitting observations to grids <strong>of</strong> <strong>the</strong>oretical models is discussed in Chapter 3.<br />
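The idea behind a minimum distance method can be sketched in a few lines of Python. The example below is an illustration under stated assumptions rather than any of the cited implementations: it uses the squared Euclidean distance as the metric, and the toy one-line spectra, the `toy_spectrum` helper, and the noise level are hypothetical.

```python
import numpy as np

def best_template(observed, templates):
    """Return (index, distance) of the template closest to the observation."""
    # Squared Euclidean distance between the observation and each template.
    d2 = np.sum((templates - observed) ** 2, axis=1)
    best = int(np.argmin(d2))
    return best, float(d2[best])

wave = np.linspace(4000.0, 5000.0, 500)

def toy_spectrum(depth):
    # Hypothetical one-line spectrum; the line depth stands in for a class.
    return 1.0 - depth * np.exp(-0.5 * ((wave - 4340.0) / 10.0) ** 2)

depths = np.array([0.1, 0.3, 0.5, 0.7])
templates = np.vstack([toy_spectrum(d) for d in depths])

# Simulated observation: the depth-0.5 template plus Gaussian noise.
rng = np.random.default_rng(0)
observed = toy_spectrum(0.5) + rng.normal(0.0, 0.01, wave.size)

index, distance = best_template(observed, templates)
```

The observation is assigned to the template with the smallest distance (here the third one, `index == 2`). Weighting each squared difference by the inverse variance of the flux would turn the same loop into a χ² comparison.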
Artificial neural networks (ANNs) are statistical pattern matching algorithms which<br />
have found wide application due to their powerful ability to “learn” highly non-linear<br />
function mappings by studying examples of such mappings. von Hippel et al. (1994)<br />
outline <strong>the</strong> use <strong>of</strong> ANNs for <strong>the</strong> classification <strong>of</strong> stellar spectra. Folkes et al. (1996)<br />
use ANNs to provide automatic classifications <strong>of</strong> low S/N galaxy spectra. Gulati et al.<br />
(1997a) show <strong>the</strong> use <strong>of</strong> ANNs in determining reddening estimates from low-dispersion<br />
ultraviolet spectra <strong>of</strong> O and B stars. Weaver (2000a) demonstrates an ANN-based<br />
technique for performing two-dimensional classification <strong>of</strong> <strong>the</strong> components <strong>of</strong> binary<br />
stars. Qin et al. (2003) use a form <strong>of</strong> ANN to perform automatic star-galaxy separation<br />
by spectra with a high success rate. The use <strong>of</strong> ANNs to provide classifications and<br />
physical parameterisations <strong>of</strong> stellar spectra is studied in Chapter 2.<br />
Principal Components Analysis (PCA) is a multivariate statistical technique which<br />
facilitates the discovery of linear correlations between observed variables. Early work<br />
by Deeming (1964), Kurtz (1982), and Whitney (1983) examines <strong>the</strong> application <strong>of</strong><br />
PCA to <strong>the</strong> unsupervised classification <strong>of</strong> stellar spectra. Since <strong>the</strong>n, PCA has found a<br />
wide application in spectral analysis such as creating classification systems for galaxy<br />
spectra (Sodre et al., 1998; Galaz & de Lapparent, 1998; Connolly & Szalay, 1999),<br />
determination <strong>of</strong> galactic redshifts (Glazebrook et al., 1998), and investigating <strong>the</strong><br />
polarisation properties <strong>of</strong> broad absorption line quasars (Lamy & Hutsemékers, 2004).<br />
The application <strong>of</strong> PCA to stellar spectra is examined in more detail in Chapter 4.
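To make the idea concrete, the sketch below applies PCA to a small set of synthetic spectra using a singular value decomposition. It is a minimal illustration, not the implementation developed later in this thesis; the function name `pca`, the toy spectra, and the noise level are assumptions.

```python
import numpy as np

def pca(spectra, n_components):
    """Project mean-subtracted spectra onto their leading principal components."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Rows of vt are the principal components (eigenvectors of the covariance).
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    # Fraction of the total variance carried by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return centred @ components.T, components, explained[:n_components]

rng = np.random.default_rng(1)
wave = np.linspace(4000.0, 5000.0, 300)
line = np.exp(-0.5 * ((wave - 4340.0) / 10.0) ** 2)
# 50 toy spectra whose only systematic variation is the depth of one line.
depths = rng.uniform(0.1, 0.7, 50)
spectra = 1.0 - depths[:, None] * line[None, :] + rng.normal(0.0, 0.005, (50, 300))

scores, components, explained = pca(spectra, n_components=2)
```

Because the toy spectra vary along a single direction (the line depth), the first component carries most of the variance and its scores track the depths closely. Real spectra generally need more components.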
1.4 Hot Subdwarf Stars<br />
The automatic analysis tool set established in this <strong>the</strong>sis, although general in nature,<br />
has been applied to <strong>the</strong> analysis <strong>of</strong> a specific type <strong>of</strong> astronomical object in order to<br />
demonstrate <strong>the</strong> effectiveness <strong>of</strong> <strong>the</strong> tools, and how <strong>the</strong>y might be used in a real-world<br />
scenario.<br />
The early type subluminous dwarfs (Greenstein & Sargent, 1974) are defined as stars<br />
which populate a region located below the upper main sequence on the Hertzsprung-Russell<br />
diagram, extending the horizontal branch to higher effective temperatures. They<br />
are mostly considered to be low-mass (M_core ≈ 0.50 − 0.55 M⊙), core helium burning<br />
objects surrounded by a thin envelope of hydrogen. Visually, they are quite blue objects,<br />
(B − V) ≈ −0.3, (U − B) ≈ −1.0, and have been shown to dominate the population of<br />
faint blue stars in the galaxy (m_B ≤ 16) (Green et al., 1986). Regardless of their prior<br />
evolution, hot subdwarfs are thought to be direct progenitors of white dwarfs, although<br />
only a small fraction (< 2%) of white dwarfs are formed through this route.<br />
1.4.1 Spectroscopy<br />
The hot subdwarfs fall into three broad subgroups based on spectroscopic criteria.<br />
sdB: Strong Stark-broadened hydrogen lines, with weak He I and no Mg II absorption<br />
lines.<br />
sdOB/He-sdB: Strong He I absorption with weak or absent hydrogen Balmer lines,<br />
and He II. Carbon lines of varying strength.<br />
sdO: Strong He II and weak He I lines, with broad and shallow hydrogen Balmer lines<br />
superimposed with He II lines.<br />
Examples from each <strong>of</strong> <strong>the</strong>se subgroups can be seen in Figure 1.4.<br />
[Figure 1.4 plot: flux (continuum = 1) + const. against wavelength (4000 to 5200 Angstroms) for PG1220-056 (sdO3VII:He40), FEIGE 110 (sdO8VII:He6), PG1532+523 (sdB1VII:He4), and PG1544+488 (sdBC1VII:He39), with He II, He I, H, C II, and C III lines marked.]<br />
Figure 1.4: Examples from each hot subdwarf spectrographic subgroup. Classifications<br />
listed are those from Drilling et al. (2006).<br />
Analyses of sdB spectra (e.g., Edelmann et al., 2003) show them to have effective<br />
temperatures in the range 20,000 ≤ T_eff/K ≤ 40,000, surface gravities in the range<br />
5.0 ≤ log g (cgs) ≤ 6.0, and extremely helium-deficient atmospheres, n_He/n_H ≲ 0.01.<br />
sdB stars are thought to be low-mass (M_core ≈ 0.50 − 0.55 M⊙, Caloi 1976), core helium<br />
burning objects, with a very thin hydrogen envelope (M_env ≲ 0.02 M⊙, Heber 1986).<br />
The helium deficiency of sdB stars is believed to be caused by gravitational settling,<br />
i.e., the settling of heavier elements due to gravity (Wesemael et al., 1982). However,<br />
Heber (1991) found that some sdB stars show metals like carbon and silicon to be<br />
over-abundant in their atmospheres, which is believed to be due to strong radiative<br />
levitation of those elements.<br />
[Figure 1.5 panels: a) The Hertzsprung Russell Diagram; b) Normal Stellar Evolution; c) binary star, with expansion slowed and the envelope removed by a companion. Each panel plots luminosity (L) against temperature (T).]<br />
Figure 1.5: Schematic temperature-luminosity diagrams showing: a) the positions of<br />
stars belonging to the main stellar groups; b) the normal sequence of stellar evolution<br />
experienced by a star of a few solar masses; c) possible evolution of an sdB star in a<br />
binary system. (Diagram courtesy of C.S. Jeffery).<br />
Analyses of sdO spectra performed by Dreizler et al. (1990) and Thejll et al. (1994)<br />
find that they have effective temperatures in the range 40,000 ≤ T_eff/K ≤ 80,000, with<br />
the majority lying between 40,000 and 50,000 K. Surface gravities lie in the range 4.0 ≤<br />
log g (cgs) ≤ 6.5, and the atmospheres of most sdO stars are helium-rich, n_He ≳ 0.50,<br />
with additional enrichment of carbon and nitrogen.<br />
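The parameter ranges quoted above for sdB and sdO stars can be collected into a simple lookup, sketched below in Python. This is only a convenience for checking whether a measured (T_eff, log g) pair is consistent with the quoted ranges; it is not a classifier used in this thesis, and the names `RANGES` and `consistent_classes` are hypothetical. Actual classification relies on the spectral line criteria, not on these ranges alone.

```python
# Parameter ranges quoted in the text: (T_eff in K, log g in cgs units).
RANGES = {
    "sdB": ((20_000.0, 40_000.0), (5.0, 6.0)),
    "sdO": ((40_000.0, 80_000.0), (4.0, 6.5)),
}

def consistent_classes(teff, logg):
    """List the subdwarf classes whose quoted ranges contain (teff, logg)."""
    matches = []
    for name, ((t_lo, t_hi), (g_lo, g_hi)) in RANGES.items():
        if t_lo <= teff <= t_hi and g_lo <= logg <= g_hi:
            matches.append(name)
    return matches
```

The function returns a list rather than a single label because the two temperature ranges meet at 40,000 K, so a star on that boundary is consistent with both classes.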
Drilling (1996) and Jeffery et al. (1997) represent <strong>the</strong> first attempts to introduce a<br />
homogeneous classification system for hot subdwarfs. This past work has been extended<br />
and fur<strong>the</strong>r refined by Drilling et al. (2006) to produce a three-dimensional classification<br />
system based on a spectral type, luminosity class, and a helium class. The standard<br />
stars <strong>of</strong> this system are used in Chapter 2 as <strong>the</strong> basis for training an artificial neural<br />
network to automatically classify hot subdwarf spectra.<br />
1.4.2 <strong>Stellar</strong> Evolution<br />
<strong>On</strong>e <strong>of</strong> <strong>the</strong> most useful tools in stellar astronomy is <strong>the</strong> Hertzsprung-Russell (HR) diagram<br />
which plots absolute magnitude against spectral type. The relationship between<br />
<strong>the</strong>se two parameters shows several important patterns, with <strong>the</strong> most significant being<br />
that <strong>the</strong> majority <strong>of</strong> stars lie within a band stretching from <strong>the</strong> region <strong>of</strong> bright, hot<br />
stars to <strong>the</strong> region <strong>of</strong> dim, cool stars. This band is called <strong>the</strong> main sequence <strong>of</strong> <strong>the</strong> HR<br />
diagram. The giant stars are seen as a large cluster occurring above the cooler end of<br />
<strong>the</strong> main sequence, and <strong>the</strong> white dwarfs populate a sequence <strong>of</strong> dim, hot stars running<br />
almost parallel to <strong>the</strong> main sequence. Evidently, <strong>the</strong> HR diagram serves as a kind <strong>of</strong><br />
atlas for <strong>the</strong> different types <strong>of</strong> stars, and stellar evolution is usually described in terms<br />
<strong>of</strong> how <strong>the</strong> underlying physics changes a star’s position on <strong>the</strong> HR diagram over time.<br />
The HR diagram can also be plotted as <strong>the</strong> relationship between colour and absolute<br />
magnitude, <strong>the</strong> version frequently used by observers. Theorists prefer to plot luminosity<br />
(or surface gravity) against effective temperature, as shown in <strong>the</strong> schematic diagram<br />
of Figure 1.5a. The log g–T_eff version of the HR diagram will be used later in this thesis<br />
(Chapters 5 and 6).<br />
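Surface gravity can stand in for luminosity because, for a given stellar mass, the two are tied together through the effective temperature. A short derivation from the standard definitions makes this explicit; it is textbook material added for illustration, with M, R, and σ denoting the stellar mass, the stellar radius, and the Stefan-Boltzmann constant.

```latex
g = \frac{G M}{R^{2}}, \qquad L = 4 \pi R^{2} \sigma T_{\mathrm{eff}}^{4}
\quad \Longrightarrow \quad
g = \frac{4 \pi G \sigma \, M \, T_{\mathrm{eff}}^{4}}{L},
\qquad
\log g = \log M + 4 \log T_{\mathrm{eff}} - \log L + \mathrm{const.}
```

At fixed mass and effective temperature, a higher luminosity therefore corresponds to a lower surface gravity.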
Canonical stellar evolution <strong>the</strong>ory (see Figure 1.5b) predicts that a low-mass, core<br />
hydrogen burning main sequence star will eventually exhaust all <strong>the</strong> hydrogen in its<br />
core, converting it, through nuclear fusion, into helium.<br />
At <strong>the</strong> point when core hydrogen fusion ceases, <strong>the</strong> core is not hot enough to begin<br />
helium fusion and starts to collapse because no energy is being generated to counteract<br />
<strong>the</strong> effect <strong>of</strong> gravity. The collapsing core heats up, with some <strong>of</strong> this heat being transferred<br />
into <strong>the</strong> hydrogen envelope surrounding <strong>the</strong> core. Eventually, this envelope can<br />
become hot enough to fuse hydrogen in a thin shell at <strong>the</strong> core boundary.<br />
The continued core collapse and hydrogen shell burning cause the temperature and<br />
pressure in <strong>the</strong> shell to increase. The increasing shell temperature supplies sufficient<br />
pressure to <strong>the</strong> outer layers <strong>of</strong> <strong>the</strong> star, causing <strong>the</strong>m to expand and cool. The star leaves<br />
<strong>the</strong> main sequence, and evolves to lower temperatures at nearly constant luminosity,<br />
eventually reaching <strong>the</strong> red giant branch. Mass can be lost from <strong>the</strong> outer layers due<br />
to stellar winds.<br />
The core collapse continues until <strong>the</strong> helium ceases to behave like an ideal gas, and<br />
becomes electron degenerate. Essentially, this means that the gas does not expand very<br />
much as its temperature increases. The hydrogen burning shell adds helium to <strong>the</strong><br />
core which continues to increase in temperature. The core finally becomes hot enough<br />
to fuse helium and commences this reaction in an explosive manner called <strong>the</strong> helium<br />
flash.
The degeneracy <strong>of</strong> <strong>the</strong> core is removed, and it expands and cools as helium burning<br />
continues. The hydrogen envelope also cools. The star contracts<br />
again as a new state <strong>of</strong> equilibrium is reached, and settles on <strong>the</strong> horizontal branch.<br />
A star on <strong>the</strong> horizontal branch has two energy sources: a helium burning core, and<br />
a hydrogen burning shell. The star evolves at nearly constant luminosity, with <strong>the</strong> core<br />
converting helium into mostly carbon and oxygen. When <strong>the</strong> helium is exhausted, <strong>the</strong><br />
core again begins to contract under gravity. Now, <strong>the</strong>re is a hydrogen burning shell<br />
and a helium burning shell which cause <strong>the</strong> star to expand, evolving with increasing<br />
luminosity to <strong>the</strong> asymptotic giant branch.<br />
This stage <strong>of</strong> <strong>the</strong> star’s life is characterised by high mass loss due to stellar winds.<br />
The process <strong>of</strong> helium fusion is very sensitive to temperature, so <strong>the</strong> helium burning<br />
shell goes through a series <strong>of</strong> <strong>the</strong>rmal pulses alternating with periods <strong>of</strong> quiescence.<br />
This is thought to enhance <strong>the</strong> efficiency <strong>of</strong> <strong>the</strong> stellar winds until <strong>the</strong> entire outer<br />
envelope <strong>of</strong> <strong>the</strong> star is lost. When <strong>the</strong> mass <strong>of</strong> <strong>the</strong> envelope is almost entirely depleted,<br />
<strong>the</strong> star begins to evolve across <strong>the</strong> HR diagram at constant luminosity.<br />
A significant fraction of material has been ejected from the outer regions of the star,<br />
and the expelled gas is ionised by the star (temperatures of such stars often exceed<br />
50,000 K), forming a planetary nebula. The nebula eventually disperses into interstellar space.<br />
The hydrogen and helium burning layers eventually extinguish, and <strong>the</strong> star becomes<br />
a white dwarf with a degenerate carbon–oxygen core. The core cools quickly and<br />
luminosity decreases, but it takes a long time for <strong>the</strong> <strong>the</strong>rmal energy in <strong>the</strong> core to be<br />
radiated away completely.<br />
sdB Evolution<br />
Extended horizontal branch stars tend to differ from true horizontal branch stars in<br />
terms <strong>of</strong> <strong>the</strong> luminosity <strong>of</strong> <strong>the</strong> hydrogen burning shell. As noted above, <strong>the</strong> mass <strong>of</strong><br />
this envelope is very small (M_env ≲ 0.02 M⊙) for a subdwarf B star, meaning that its<br />
luminosity is negligible. For a normal horizontal branch star, <strong>the</strong> luminosity <strong>of</strong> its<br />
hydrogen envelope equals or even exceeds that <strong>of</strong> <strong>the</strong> helium core.<br />
How <strong>the</strong> hot subdwarfs come to arrive on <strong>the</strong> extended horizontal branch is still<br />
under debate. A number <strong>of</strong> scenarios have been proposed to explain <strong>the</strong> evolution <strong>of</strong><br />
sdB stars.<br />
In <strong>the</strong> single star scenario, enhanced mass loss on <strong>the</strong> red giant branch due to stellar<br />
winds may remove all <strong>of</strong> <strong>the</strong> hydrogen-rich envelope before core helium burning begins<br />
(D’Cruz et al., 1996).<br />
In the binary scenario (see Figure 1.5c), Mengel et al. (1976) suggest sdBs could<br />
be formed from relatively wide binaries. Mass transfer through stable Roche lobe<br />
overflow results in a depletion <strong>of</strong> <strong>the</strong> hydrogen-rich envelope prior to <strong>the</strong> helium core<br />
flash. If <strong>the</strong> sdB progenitor and its compact companion are in a close binary system, a<br />
common-envelope phase can result in <strong>the</strong> creation <strong>of</strong> a helium star. More recent work<br />
(Maxted et al., 2001) suggests ∼2/3 <strong>of</strong> sdBs are in close binary systems.<br />
sdO Evolution<br />
The atmospheric parameters of sdO stars show them to be less homogeneous than the<br />
sdBs. Generally, they appear to fall into two subgroups on the log g–T_eff plane. One<br />
group (“compact” sdOs) lies close to the theoretical post-extended horizontal branch<br />
evolutionary tracks, and therefore might have evolved from sdB stars. The other group<br />
(“luminous” sdOs) has lower surface gravities, lying closer to the post-asymptotic<br />
giant branch tracks. These stars are found in the same region on the log g–T_eff plane<br />
as the central stars of planetary nebulae.<br />
Various evolutionary scenarios have been proposed to explain the origin of sdO stars,<br />
and it is unlikely that a single scenario can explain both subgroups.<br />
Several <strong>the</strong>ories exist for “compact” sdOs. The Post EHB scenario attempts to
explain <strong>the</strong> large number <strong>of</strong> sdOs found at <strong>the</strong> extreme end <strong>of</strong> <strong>the</strong> horizontal branch,<br />
along <strong>the</strong> helium burning main sequence, which suggests a close connection to sdB stars<br />
(Caloi, 1989; Dorman et al., 1993). But how does an sdB star become an sdO? It has<br />
been suggested that the hydrogen-rich envelope of an sdB can re-ignite during the post-extended<br />
horizontal branch phase, causing the star to evolve towards the asymptotic<br />
giant branch. However, <strong>the</strong> luminosity <strong>of</strong> <strong>the</strong> star is not sufficient to let it ascend <strong>the</strong><br />
asymptotic giant branch, so <strong>the</strong> star returns to <strong>the</strong> sdO region. Dreizler et al. (1990)<br />
propose an alternative theory wherein deep mixing of the star’s atmosphere by helium<br />
shell flashes could explain <strong>the</strong> helium enrichment seen in sdO stars.<br />
O<strong>the</strong>r explanations for compact sdOs include <strong>the</strong> delayed helium flash scenario proposed<br />
by Sweigart (1997) which suggests that if mass loss during <strong>the</strong> red giant branch<br />
is too high, <strong>the</strong>n <strong>the</strong> helium core never reaches ignition mass, and <strong>the</strong> star ends up as a<br />
helium white dwarf without going through a horizontal branch phase. Alternatively, if<br />
<strong>the</strong> ignition <strong>of</strong> helium is delayed but can still occur on <strong>the</strong> white dwarf cooling sequence,<br />
it will take <strong>the</strong> star into <strong>the</strong> region <strong>of</strong> <strong>the</strong> sdO stars.<br />
A third evolutionary scenario comes from binary white dwarf mergers as studied by<br />
Iben (1990). It was found that <strong>the</strong> evolution <strong>of</strong> close binary systems, leading to <strong>the</strong><br />
merger <strong>of</strong> He+He and CO+He white dwarfs, could produce low-mass helium burning<br />
stars similar to sdOs. Strong support for this scenario comes from Napiwotzki et al.<br />
(2004) who found that almost all <strong>of</strong> <strong>the</strong> sdO stars in <strong>the</strong>ir sample were apparently<br />
single.<br />
For <strong>the</strong> “luminous” sdO stars, Heber & Hunger (1987) suggest that <strong>the</strong>y are “born<br />
again post-asymptotic giant branch” stars. In this scenario (Iben et al., 1983), a post-asymptotic<br />
giant branch star undergoes a late helium shell flash, sending it to <strong>the</strong><br />
asymptotic giant branch for a second time. During this phase, <strong>the</strong> outer hydrogen<br />
envelope can be completely removed by stellar winds, leaving <strong>the</strong> star with <strong>the</strong> appearance<br />
<strong>of</strong> a luminous sdO star. Husfeld et al. (1989) suggest that a small number <strong>of</strong> sdO<br />
stars are also formed from normal post-asymptotic giant branch evolution.<br />
1.4.3 Why Study Them?<br />
The study <strong>of</strong> hot subdwarfs is important in several respects. As <strong>the</strong>y exist in large<br />
numbers, and have been shown to be highly evolved stars, <strong>the</strong>y are useful indicators for<br />
studying <strong>the</strong> structure and evolution <strong>of</strong> <strong>the</strong> galaxy. Brown et al. (1997) suggest that<br />
<strong>the</strong>se stars are <strong>the</strong> main cause <strong>of</strong> <strong>the</strong> ultraviolet upturn phenomenon (UV excess) seen<br />
in elliptical galaxies and the bulges of spiral galaxies because they spend a long<br />
time (10^8 yrs) on the extended horizontal branch at high temperatures. They are also<br />
considered to be useful age indicators for elliptical galaxies (Brown et al., 2000).<br />
As described previously, the hot subdwarfs are interesting in their own right because<br />
their evolution does not appear to be explained by canonical stellar evolution theory. This<br />
makes them important objects from an astrophysical point of view.<br />
1.4.4 Why Search For Them In The SDSS?<br />
The Sloan Digital Sky Survey project and the data it produces are a prime example of<br />
where the future of astronomy is heading.<br />
The main observational goal <strong>of</strong> <strong>the</strong> survey is to collect photometric and spectroscopic<br />
data on galaxies and quasars. However, many quasars appear as very blue objects, so<br />
the SDSS will also observe spectra for many blue stars, such as white dwarfs and hot<br />
subdwarfs, because these objects cannot be differentiated at the photometric level.<br />
This makes <strong>the</strong> SDSS an unbiased, magnitude-limited survey containing potentially<br />
hundreds <strong>of</strong> moderate resolution (∼ 3.0Å), fully reduced hot subdwarf spectra which<br />
can be used to statistically identify new subgroups within an extracted sample. The<br />
large, homogeneous, publicly accessible data archives are <strong>the</strong>refore an excellent test site<br />
for <strong>the</strong> tool set developed in this <strong>the</strong>sis.
1.5 Summary<br />
The continual advancement <strong>of</strong> observational and information technology is driving astronomy<br />
forward as a data-rich discipline. A clear need has been identified for robust<br />
automatic methods to help analyse large databases <strong>of</strong> astronomical data, and extract<br />
from <strong>the</strong>m useful science.<br />
This <strong>the</strong>sis focuses on automatic tools to search for and analyse astronomical spectra<br />
in large databases. Artificial neural networks (Chapter 2), χ² minimisation (Chapter<br />
3), and principal components analysis (Chapter 4) are <strong>the</strong> methods used to construct<br />
a generalisable tool kit for performing this task.<br />
The tools will be demonstrated by applying <strong>the</strong>m to <strong>the</strong> problem <strong>of</strong> searching for<br />
and analysing <strong>the</strong> spectra <strong>of</strong> hot subdwarf stars from <strong>the</strong> archives <strong>of</strong> <strong>the</strong> Sloan Digital<br />
Sky Survey (Chapter 5). They will also be used to analyse o<strong>the</strong>r smaller data sets<br />
(Chapter 6).<br />
As <strong>the</strong> amount <strong>of</strong> data ga<strong>the</strong>red by astronomers increases, much work is needed to<br />
improve <strong>the</strong> ways in which it can be analysed, and solve <strong>the</strong> problems that lie ahead.<br />
Some <strong>of</strong> <strong>the</strong> issues encountered during this project are discussed, finally, in Chapter 7.<br />
Chapter 2<br />
Classification - Artificial Neural<br />
Networks<br />
Artificial neural networks (ANNs) are a statistical machine learning algorithm best<br />
thought <strong>of</strong> as arbitrary function estimators. They are able to provide a non-linear<br />
parameterised mapping between some input vector, x, and an output vector, y.<br />
For example, in <strong>the</strong> case <strong>of</strong> stellar spectral classification, x is <strong>the</strong> feature vector<br />
containing <strong>the</strong> flux values <strong>of</strong> a spectrum over some wavelength range, and y is <strong>the</strong> classification<br />
assigned to x according to some classification standard. The mapping performed<br />
by <strong>the</strong> ANN is analogous to <strong>the</strong> process which leads an expert human classifier<br />
to assign classification y to spectrum x. This ability to replicate non-linear functions<br />
makes ANNs a powerful tool in astronomical data mining.<br />
In <strong>the</strong> context <strong>of</strong> machine learning, ANNs are part <strong>of</strong> a wider class <strong>of</strong> methods<br />
to approximate non-linear functions. Some <strong>of</strong> <strong>the</strong> mystery that commonly surrounds<br />
<strong>the</strong>ir use can be dispelled by relating several important issues to <strong>the</strong> simpler process <strong>of</strong><br />
polynomial curve fitting. Here, <strong>the</strong> problem is to fit a polynomial to a set <strong>of</strong> M points<br />
by minimising some error function. The $n$th-order polynomial is given by<br />
$$y(x) = w_0 + w_1 x + \cdots + w_n x^n = \sum_{i=0}^{n} w_i x^i. \qquad (2.1)$$<br />
If this is considered as a non-linear mapping which takes x as input and produces y<br />
as output, <strong>the</strong>n <strong>the</strong> exact form <strong>of</strong> <strong>the</strong> function y(x) is determined by <strong>the</strong> values <strong>of</strong> <strong>the</strong><br />
parameters $w_0, \ldots, w_n$, which are analogous to <strong>the</strong> weights in a neural network.<br />
The weights can be determined by minimising an error function which compares<br />
<strong>the</strong> desired output from <strong>the</strong> polynomial, $d(x_k)$, for each input value, $x_k$, with <strong>the</strong> polynomial’s<br />
actual output, $y(x_k)$. A commonly used choice is <strong>the</strong> sum-<strong>of</strong>-squares error<br />
function,<br />
$$E = \frac{1}{2} \sum_{k} \left( y(x_k) - d(x_k) \right)^2. \qquad (2.2)$$<br />
The minimisation <strong>of</strong> an error function such as Equation 2.2, which involves target<br />
values for <strong>the</strong> polynomial outputs, is called supervised learning since for each input value<br />
<strong>the</strong> desired output is specified. This is also a common way to determine <strong>the</strong> weights <strong>of</strong><br />
a neural network for a particular application (<strong>the</strong> back-propagation algorithm adjusts<br />
<strong>the</strong> weights by calculating <strong>the</strong> derivatives <strong>of</strong> <strong>the</strong> error function with respect to <strong>the</strong><br />
weights). A second form <strong>of</strong> learning, called unsupervised learning, does not involve <strong>the</strong><br />
use <strong>of</strong> target data. In <strong>the</strong> context <strong>of</strong> neural networks, this form <strong>of</strong> learning can be used<br />
to discover clusters or o<strong>the</strong>r patterns in a data set.<br />
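To make the analogy concrete, the following Python sketch fits the polynomial of Equation 2.1 by minimising the sum-of-squares error of Equation 2.2. It is a minimal illustration and not code from this thesis: the function names and the quadratic test data are assumptions, and the minimum is found by linear least squares rather than by an iterative scheme such as back-propagation.

```python
import numpy as np

def fit_polynomial(x, d, order):
    """Return weights w_0..w_n that minimise E = 0.5 * sum_k (y(x_k) - d(x_k))^2."""
    # Design matrix with columns x^0, x^1, ..., x^n (the terms of Equation 2.1).
    design = np.vander(x, order + 1, increasing=True)
    weights, *_ = np.linalg.lstsq(design, d, rcond=None)
    return weights

def sum_of_squares_error(weights, x, d):
    """Evaluate Equation 2.2 for a given weight vector."""
    y = np.vander(x, len(weights), increasing=True) @ weights
    return 0.5 * np.sum((y - d) ** 2)

# Illustrative targets drawn from a known quadratic, d(x) = 1 + 2x - 3x^2.
x = np.linspace(-1.0, 1.0, 11)
d = 1.0 + 2.0 * x - 3.0 * x**2
w = fit_polynomial(x, d, order=2)
error = sum_of_squares_error(w, x, d)
```

Because the targets here are noise-free and the model order matches the generating function, the fitted weights recover the generating coefficients and the error is zero to rounding precision.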
If <strong>the</strong> polynomial <strong>of</strong> Equation 2.1 is being trained to model a particular input-output<br />
mapping via supervised training, <strong>the</strong>n <strong>the</strong> goal is to have a model which gives<br />
good predictions for new data, in o<strong>the</strong>r words one which exhibits good generalisation<br />
properties. <strong>On</strong>e <strong>of</strong> <strong>the</strong> factors which influences a model’s ability to generalise is <strong>the</strong><br />
number <strong>of</strong> free parameters it has (i.e., <strong>the</strong> number <strong>of</strong> degrees <strong>of</strong> freedom). If a first-order<br />
polynomial is chosen to model a non-linear mapping, <strong>the</strong>n it will generalise poorly<br />
because a linear function is not flexible enough to match <strong>the</strong> underlying mapping
function very well. In o<strong>the</strong>r words, <strong>the</strong> model has a high bias, meaning that <strong>the</strong><br />
complexity <strong>of</strong> <strong>the</strong> polynomial is not sufficient to model <strong>the</strong> actual mapping function.<br />
The bias can be reduced by increasing <strong>the</strong> number <strong>of</strong> degrees <strong>of</strong> freedom, i.e., increasing<br />
<strong>the</strong> order <strong>of</strong> <strong>the</strong> polynomial. This gives it greater flexibility to model <strong>the</strong> non-linear<br />
mapping. However, if <strong>the</strong> order is increased too much, <strong>the</strong> polynomial’s approximation<br />
to <strong>the</strong> underlying function will actually get worse - <strong>the</strong> mapping may give an exact<br />
fit to <strong>the</strong> training data, but its ability to generalise is hampered by highly oscillatory<br />
behaviour between training points. Such a model is said to over-fit <strong>the</strong> training data,<br />
and has a high variance, meaning that <strong>the</strong> model is sensitive to <strong>the</strong> training data (i.e.,<br />
quantity, noise, distribution, etc.).<br />
The point <strong>of</strong> best generalisation is determined by a trade-<strong>of</strong>f between <strong>the</strong> model’s<br />
bias and variance, and occurs when <strong>the</strong> number <strong>of</strong> degrees <strong>of</strong> freedom in <strong>the</strong> model is<br />
relatively small compared to <strong>the</strong> size <strong>of</strong> <strong>the</strong> training data set. The quantity <strong>of</strong> training<br />
data is a significant factor in achieving good generalisation. As <strong>the</strong> quantity <strong>of</strong> training<br />
data increases, <strong>the</strong> model’s complexity can be increased, <strong>the</strong>reby reducing bias, while<br />
ensuring that <strong>the</strong> model is more heavily constrained, <strong>the</strong>reby also reducing variance.<br />
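The trade-off can be explored numerically with a small experiment. The sketch below is an illustration under stated assumptions (a sinusoidal underlying function, ten noisy training points, and polynomial orders 1, 3 and 9, none of which come from this thesis). It records the RMS error on the training points and on a dense noise-free test grid for each order.

```python
import numpy as np

rng = np.random.default_rng(0)

def underlying(x):
    """Assumed 'true' mapping, chosen only for illustration."""
    return np.sin(2.0 * np.pi * x)

# Ten noisy training points and a dense grid for measuring generalisation.
x_train = np.linspace(0.0, 1.0, 10)
d_train = underlying(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.0, 1.0, 200)

def rms_errors(order):
    """Fit a polynomial of the given order; return (training RMS, test RMS)."""
    design = np.vander(x_train, order + 1, increasing=True)
    w, *_ = np.linalg.lstsq(design, d_train, rcond=None)
    fit_train = design @ w
    fit_test = np.vander(x_test, order + 1, increasing=True) @ w
    train_rms = np.sqrt(np.mean((fit_train - d_train) ** 2))
    test_rms = np.sqrt(np.mean((fit_test - underlying(x_test)) ** 2))
    return train_rms, test_rms

# Order 1 has too few degrees of freedom; order 9 has as many as data points.
results = {order: rms_errors(order) for order in (1, 3, 9)}
```

The training error can only fall as the order rises, and the ninth-order fit passes through every training point. The test error is the quantity that reveals whether the extra flexibility has helped or merely fitted the noise.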
In <strong>the</strong> context <strong>of</strong> neural networks, <strong>the</strong> complexity <strong>of</strong> <strong>the</strong> model is determined by <strong>the</strong><br />
number and structure <strong>of</strong> <strong>the</strong> internal weights. The weights are arranged in a network <strong>of</strong><br />
layers, with more layers allowing <strong>the</strong> ANN to model essentially any non-linear function.<br />
However, as illustrated in <strong>the</strong> discussion <strong>of</strong> polynomial fitting, a neural network with<br />
too much complexity may succeed in “memorising” <strong>the</strong> training data by over-fitting<br />
<strong>the</strong>m, and <strong>the</strong>refore yield poor generalisation properties. A number <strong>of</strong> techniques<br />
exist to combat over-fitting and regularise, or smooth, <strong>the</strong> mapping produced by neural<br />
networks. Weight decay adds a penalty term to <strong>the</strong> error function that<br />
penalises large values <strong>of</strong> <strong>the</strong> network’s internal weights. Early stopping <strong>of</strong> <strong>the</strong><br />
training process also prevents <strong>the</strong> network weights from becoming too large. Adding<br />
noise to <strong>the</strong> training data set makes it more difficult for <strong>the</strong> neural network to over-fit.<br />
A more detailed review <strong>of</strong> basic neural network <strong>the</strong>ory can be found in Bishop (1995).<br />
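As a worked form of the weight decay idea, a common textbook choice adds a quadratic penalty to the sum-of-squares error of Equation 2.2. The form below is assumed for illustration, with a hypothetical decay coefficient $\nu$; the exact penalty used by any particular code may differ.

```latex
\tilde{E} = \frac{1}{2} \sum_{k} \left( y(x_k) - d(x_k) \right)^2
          + \frac{\nu}{2} \sum_{i} w_i^2 ,
\qquad
\frac{\partial \tilde{E}}{\partial w_i} = \frac{\partial E}{\partial w_i} + \nu\, w_i .
```

The extra gradient term $\nu w_i$ pulls every weight towards zero at each update, so a weight grows large only where doing so reduces the data term by more than the penalty costs.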
Previous work by o<strong>the</strong>rs in <strong>the</strong> field <strong>of</strong> automatic stellar spectral analysis demonstrates<br />
that ANNs are well-suited to fast classification and parameterisation <strong>of</strong> large<br />
quantities <strong>of</strong> spectra from across <strong>the</strong> main sequence. See, for example, Gulati et al.<br />
(1994b), von Hippel et al. (1994), Storrie-Lombardi et al. (1994), Weaver & Torres-<br />
Dodgen (1995), Bailer-Jones (1996), Gulati et al. (1996), Bailer-Jones (1997), Gulati<br />
et al. (1997b), Weaver & Torres-Dodgen (1997), Bailer-Jones et al. (1997), Bailer-Jones<br />
et al. (1998), Singh et al. (1998), Rhee et al. (1999), Allende Prieto et al. (2000), Weaver<br />
(2000b), and Snider et al. (2001).<br />
Here, <strong>the</strong> feedforward multilayer back-propagation ANN code STATNET <strong>of</strong> Bailer-<br />
Jones (1996) is used to obtain classifications and acquire astrophysical parameters from<br />
a sample <strong>of</strong> hot subdwarf spectra. The values for <strong>the</strong> astrophysical parameters are<br />
compared with those obtained from a different computerised technique, that <strong>of</strong> $\chi^2$<br />
minimisation as implemented in <strong>the</strong> code SFIT (see Chapter 3).<br />
2.1 Classifying Hot Subdwarfs<br />
The hot subdwarfs do not fall within <strong>the</strong> scope <strong>of</strong> <strong>the</strong> standard MK system (Morgan<br />
et al., 1978). Drilling et al. (2006) have <strong>the</strong>refore extended and refined <strong>the</strong> earlier work<br />
<strong>of</strong> Drilling (1996) and Jeffery et al. (1997) to construct a three-dimensional MK-like<br />
classification scale for hot subdwarfs. This scale is based upon a sample <strong>of</strong> spectra<br />
from a number <strong>of</strong> sources, covering <strong>the</strong> wavelength region 4050–4900 Å at a resolution<br />
<strong>of</strong> 2.5 Å, and consists <strong>of</strong> a ‘spectral’ class, a ‘luminosity’ class, and a ‘helium’ class.<br />
The classification scale uses a spectral type running from sdO1 to sdA (1 – 20),<br />
analogous to MK spectral classes. It introduces a helium class (0 – 40) based on H, HeI<br />
and HeII line strengths, and uses luminosity classes IV – VIII, where most subdwarfs<br />
have luminosity class ∼VII. The mapping between <strong>the</strong> Drilling et al. (2006) classes and<br />
those used elsewhere, e.g. <strong>the</strong> PG survey (Green et al., 1986), is illustrated in figure 16<br />
<strong>of</strong> Drilling et al. (2006).
2.1.1 The Training Sample<br />
A set <strong>of</strong> subdwarf spectra was taken from a collection compiled by Drilling et al. (2006)<br />
from data provided by Moehler et al. (1990a,b), Dreizler et al. (1990), and Theissen<br />
et al. (1993). It comprises a representative sample <strong>of</strong> 174 PG subdwarfs and blue<br />
horizontal branch stars, plus a few o<strong>the</strong>r stars not included in <strong>the</strong> PG catalog. Several<br />
observations have been supplied for many <strong>of</strong> <strong>the</strong> targets with <strong>the</strong> sample containing<br />
471 spectra in total at an approximate resolution <strong>of</strong> 2.5 Å.<br />
The spectra are not homogeneous. Due to <strong>the</strong> data being ga<strong>the</strong>red by different<br />
observers using different equipment at different locations, etc., a number <strong>of</strong> issues affect<br />
<strong>the</strong> sample including: calibration anomalies, velocity shifting, different windows <strong>of</strong><br />
wavelength coverage, inconsistent S/Ns and dispersion intervals, and so on.<br />
A pre-processing step was needed to correct <strong>the</strong>se problems and establish a more<br />
homogeneous sample. The spectra were visually inspected to select <strong>the</strong> best samples for<br />
each star. The resulting 359 spectra were corrected for large cosmic spikes and instrumental<br />
end-effects. A velocity shift correction was applied by cross-correlating each<br />
spectrum with a grid <strong>of</strong> <strong>the</strong>oretical spectra chosen to coarsely cover <strong>the</strong> approximate<br />
$T_{\rm eff}$, $\log g$, $\log(n_{\rm He}/n_{\rm H})$ range <strong>of</strong> <strong>the</strong> Drilling et al. (2006) classification scale. Finally,<br />
<strong>the</strong> spectra were rebinned onto a common wavelength grid <strong>of</strong> 4050 – 4950 Å with a<br />
dispersion <strong>of</strong> 1 Å pixel$^{-1}$.<br />
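The final rebinning step can be sketched as follows. This is an illustration only: the text does not state which resampling scheme was used, so simple linear interpolation is assumed here (it does not conserve flux exactly), and the input spectrum is a hypothetical one with a single Gaussian absorption line.

```python
import numpy as np

def rebin_to_common_grid(wavelength, flux, start=4050.0, stop=4950.0, step=1.0):
    """Resample a spectrum onto a regular wavelength grid by linear interpolation."""
    # Half a step is added to the stop value so that the end point is included.
    grid = np.arange(start, stop + step / 2.0, step)
    return grid, np.interp(grid, wavelength, flux)

# Hypothetical observed spectrum sampled every 0.7 Å with one absorption line.
wavelength = np.arange(4000.0, 5000.0, 0.7)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wavelength - 4471.0) / 3.0) ** 2)

grid, rebinned = rebin_to_common_grid(wavelength, flux)
```

With a dispersion of 1 Å per pixel, the 4050 – 4950 Å grid contains 901 flux points per spectrum.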
It should be noted that <strong>the</strong> radial velocity correction described above already partly<br />
solves <strong>the</strong> parameterisation problem by choosing <strong>the</strong> best fitting model from <strong>the</strong> grid.<br />
As such, training <strong>the</strong> neural network to solve for this parameter simultaneously alongside<br />
<strong>the</strong> o<strong>the</strong>r astrophysical parameters may be a more convenient approach. However,<br />
this was not attempted here.<br />
[Figure 2.1, three panels: luminosity class (0–IX) against spectral type (O–A); helium class (0–40) against spectral type (O–A); helium class (0–40) against luminosity class (IX–0).]<br />
Figure 2.1: The training sample shows clustering in certain regions <strong>of</strong> <strong>the</strong> classification<br />
space. For clarity, points have been <strong>of</strong>fset by small random shifts in both coordinates.
2.1.2 Methodology<br />
As described at <strong>the</strong> beginning <strong>of</strong> <strong>the</strong> chapter, training an ANN to learn <strong>the</strong> Drilling<br />
et al. (2006) classification system involves iterating over <strong>the</strong> training set and minimising<br />
<strong>the</strong> sum-<strong>of</strong>-squares error function between <strong>the</strong> desired output and <strong>the</strong> network’s<br />
actual output (see Equation 2.2) with respect to <strong>the</strong> ANN’s internal parameters. The<br />
minimisation process continues until some criterion <strong>of</strong> convergence has been reached<br />
(e.g., when <strong>the</strong> weight updates have become very small).<br />
A typical strategy to assess network performance after training is to apply <strong>the</strong> network<br />
to an application set for which <strong>the</strong> “true” classifications are known. Unfortunately,<br />
no o<strong>the</strong>r suitable set <strong>of</strong> spectra previously classified onto <strong>the</strong> Drilling et al. (2006) scale<br />
was available for <strong>the</strong> study presented here.<br />
An alternative is to split <strong>the</strong> Drilling sample into two similarly sized sets, with one<br />
used for training, and <strong>the</strong> o<strong>the</strong>r to quantify performance. However, as <strong>the</strong> Drilling<br />
sample is small, and its distribution across <strong>the</strong> parameter space is limited (see Figure<br />
2.1), a concern is that <strong>the</strong>re may not be enough data to constrain <strong>the</strong> model if <strong>the</strong><br />
sample is split into two smaller subsets.<br />
<strong>On</strong> <strong>the</strong> o<strong>the</strong>r hand, if <strong>the</strong> two subset approach is changed slightly, <strong>the</strong>re is a way to<br />
determine how well a given ANN model performs using only <strong>the</strong> data in <strong>the</strong> Drilling<br />
sample. A technique called N-fold cross-validation, or <strong>the</strong> leave-one-out method, permits<br />
<strong>the</strong> greatest number <strong>of</strong> samples to be used in training while still giving an idea <strong>of</strong><br />
ANN performance over <strong>the</strong> whole sample set.<br />
The method proceeds by assuming a data set <strong>of</strong> size N. Each datum is left out in<br />
turn and <strong>the</strong> ANN is trained on <strong>the</strong> remaining N −1 samples. The ANN’s performance<br />
is <strong>the</strong>n assessed by classifying <strong>the</strong> omitted datum. No random sampling is involved<br />
with this method, so repeating <strong>the</strong> procedure for a particular ANN model always gives<br />
<strong>the</strong> same result.<br />
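The procedure just described can be sketched in a few lines of Python. This is a hedged illustration, not the code used in this work: a linear least-squares model stands in for the ANN, and the data set and all function names are hypothetical.

```python
import numpy as np

def leave_one_out_predictions(inputs, targets, train, predict):
    """Train on N-1 samples and predict the omitted one, for each sample in turn."""
    n = len(inputs)
    predictions = np.empty(n, dtype=float)
    for k in range(n):
        keep = np.arange(n) != k          # mask that omits datum k
        model = train(inputs[keep], targets[keep])
        predictions[k] = predict(model, inputs[k:k + 1])[0]
    return predictions

# Stand-in model (an assumption): linear least squares instead of a neural network.
def train_linear(x, d):
    design = np.column_stack([np.ones(len(x)), x])
    w, *_ = np.linalg.lstsq(design, d, rcond=None)
    return w

def predict_linear(w, x):
    return np.column_stack([np.ones(len(x)), x]) @ w

# Hypothetical data: 30 samples, two input features, one output.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(30, 2))
d = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(0.0, 0.05, 30)

pred = leave_one_out_predictions(x, d, train_linear, predict_linear)
sigma_rms = np.sqrt(np.mean((pred - d) ** 2))   # RMS error over omitted data
r = np.corrcoef(pred, d)[0, 1]                  # correlation coefficient
```

The loop itself involves no random sampling. With a deterministic stand-in model such as this one, repeating the call reproduces the same predictions exactly.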
The leave-one-out method carries with it a large computational cost as each ANN<br />
model must be trained N times (in this case, N = 359). As several models were to<br />
be tested, <strong>the</strong> computational burden was alleviated by <strong>the</strong> construction <strong>of</strong> a small distributed<br />
cluster <strong>of</strong> 15 ordinary desktop workstations at <strong>Armagh</strong> <strong>Observatory</strong> using <strong>the</strong><br />
Condor batch system (e.g., Livny & Raman, 1998). The cluster reduced <strong>the</strong> computation<br />
time for <strong>the</strong> leave-one-out procedure by a factor <strong>of</strong> ∼10 compared to using only a<br />
single workstation.<br />
To determine <strong>the</strong> optimal complexity <strong>of</strong> <strong>the</strong> ANN model, two different ANN architectures<br />
were studied, one with a single hidden layer <strong>of</strong> 10 nodes, and one with two<br />
hidden layers <strong>of</strong> 5 nodes each. The notations used to refer to <strong>the</strong>se architectures are<br />
901:10:3 and 901:5:5:3, respectively.<br />
This notation explains <strong>the</strong> structure <strong>of</strong> <strong>the</strong> neural network in terms <strong>of</strong> layers <strong>of</strong><br />
processing nodes, and <strong>the</strong> number <strong>of</strong> nodes in each layer. For each network being<br />
tested, an input layer <strong>of</strong> 901 nodes corresponds to <strong>the</strong> 901 flux points in <strong>the</strong> preprocessed<br />
observational sample, and an output layer <strong>of</strong> three nodes corresponds to<br />
<strong>the</strong> three parameters <strong>of</strong> <strong>the</strong> classification scale: spectral type, luminosity class, and helium<br />
class.<br />
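The notation also gives a quick estimate of how many free parameters each architecture has. The sketch below assumes fully connected layers with one bias weight per non-input node; this is a common layout, but the text does not describe how STATNET counts its weights, so the totals are indicative only.

```python
def count_weights(architecture, biases=True):
    """Count the weights of a fully connected network written as 'in:hidden:...:out'."""
    layers = [int(n) for n in architecture.split(":")]
    extra = 1 if biases else 0   # assumed: one bias input feeding each layer
    return sum((n_in + extra) * n_out for n_in, n_out in zip(layers, layers[1:]))

single_hidden = count_weights("901:10:3")   # (901 + 1) * 10 + (10 + 1) * 3
double_hidden = count_weights("901:5:5:3")  # (901 + 1) * 5 + (5 + 1) * 5 + (5 + 1) * 3
```

Under these assumptions, the two-hidden-layer network has roughly half as many weights as the single-hidden-layer one, and both totals far exceed the 359 training spectra. This is one reason regularisation matters for this problem.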
For each model architecture, a committee <strong>of</strong> five ANNs was formed. The committee<br />
approach (see Bishop, 1995, sect. 9.6) trains a number <strong>of</strong> ANNs on <strong>the</strong> same data, and<br />
applies <strong>the</strong>m in unison on a new datum. The results from each <strong>of</strong> <strong>the</strong> ANNs are <strong>the</strong>n<br />
averaged toge<strong>the</strong>r to provide a combined result. In STATNET, each network in <strong>the</strong><br />
committee is initialised with different random values for <strong>the</strong> weights, so <strong>the</strong> committee<br />
approach seeks to achieve more robust results by averaging out ‘convergence noise’ due<br />
to <strong>the</strong> variance <strong>of</strong> <strong>the</strong> model causing <strong>the</strong> minimisation process to get caught in local<br />
minima, with <strong>the</strong> final set <strong>of</strong> weights <strong>the</strong>refore being different for each committee ANN.<br />
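The averaging step of the committee approach can be sketched as below. The members here are hypothetical stand-ins, not trained networks: each is a simple function whose output differs from the others by a small random offset, imitating the effect of different weight initialisations.

```python
import numpy as np

def committee_predict(members, x):
    """Apply every committee member to x; return the mean and spread of the outputs."""
    outputs = np.stack([member(x) for member in members])
    return outputs.mean(axis=0), outputs.std(axis=0)

def make_member(offset):
    """Stand-in for a trained network (assumption): a line plus a fixed offset."""
    return lambda x: 2.0 * x + offset

# Five members whose offsets imitate 'convergence noise' between training runs.
rng = np.random.default_rng(2)
members = [make_member(offset) for offset in rng.normal(0.0, 0.3, 5)]

x = np.array([0.0, 1.0, 2.0])
mean_output, spread = committee_predict(members, x)
```

The spread between members is a useful by-product: it indicates how strongly the result depends on the starting weights.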
The leave-one-out method was carried out for five different training epochs for each<br />
architecture: 150, 300, 500, 700, and 1000 iterations <strong>of</strong> <strong>the</strong> optimisation procedure.<br />
This required about four days <strong>of</strong> continuous computation on <strong>the</strong> Condor cluster. The
approach <strong>of</strong> stopping <strong>the</strong> training procedure early is a method <strong>of</strong> regularising <strong>the</strong> ANN<br />
models.<br />
STATNET also implements a weight decay factor in <strong>the</strong> neural network’s sum-<strong>of</strong>-squares<br />
error function, but this feature was not used here (or in <strong>the</strong> parameterisation<br />
network described in <strong>the</strong> next section). Weight decay attempts to prevent <strong>the</strong> ANN<br />
model from over-fitting <strong>the</strong> training data by discriminating against network weights<br />
that become too large during training. Large network weights (which can occur if <strong>the</strong><br />
network is trained for too long) increase <strong>the</strong> complexity <strong>of</strong> <strong>the</strong> mapping because <strong>the</strong>y<br />
produce regions <strong>of</strong> high curvature in <strong>the</strong> input-output parameter space.<br />
As <strong>the</strong> classification training set was small, it was felt that weight decay should<br />
not be used here in order to preserve <strong>the</strong> structure and curvature <strong>of</strong> <strong>the</strong> input-output<br />
mapping. An alternative is early stopping which regularises <strong>the</strong> network by limiting<br />
<strong>the</strong> effective number <strong>of</strong> degrees <strong>of</strong> freedom. This number is supposed to start out small<br />
and <strong>the</strong>n grow during <strong>the</strong> minimisation <strong>of</strong> <strong>the</strong> sum-<strong>of</strong>-squares error function, which<br />
corresponds to a steady increase in <strong>the</strong> complexity <strong>of</strong> <strong>the</strong> model.<br />
If <strong>the</strong> network error is measured against a validation sample, as is done here via <strong>the</strong><br />
leave-one-out method, it is typically observed that this error decreases at<br />
first and <strong>the</strong>n increases as <strong>the</strong> network starts to over-fit. The network’s training<br />
procedure can be terminated close to <strong>the</strong> point <strong>of</strong> smallest error since this gives a<br />
network which is expected to have <strong>the</strong> best generalisation performance.<br />
STATNET possesses <strong>the</strong> capability to add weighting factors to each <strong>of</strong> <strong>the</strong> network’s<br />
outputs so that certain outputs contribute more to <strong>the</strong> sum-<strong>of</strong>-squares error minimisation<br />
than o<strong>the</strong>rs. These weighting factors (called ‘β’ parameters in STATNET) allow<br />
<strong>the</strong> user to control <strong>the</strong> level <strong>of</strong> modelling precision for each output variable. If this is<br />
limited by <strong>the</strong> noise in <strong>the</strong> data, $1/\sqrt{\beta}$ should be approximately equal to <strong>the</strong> standard<br />
deviation <strong>of</strong> <strong>the</strong> noise in <strong>the</strong> output variable.<br />
STATNET includes a data scaling option which separately scales each input and<br />
output variable to have zero mean and unit standard deviation. With respect to <strong>the</strong> β<br />
parameters, <strong>the</strong> variance scaling casts each β in terms <strong>of</strong> <strong>the</strong> scaled variables. Therefore,<br />
$1/\sqrt{\beta}$ can roughly be interpreted as <strong>the</strong> fractional uncertainty in a particular output variable.<br />
As an example, <strong>the</strong> default β value <strong>of</strong> 6.0 corresponds to a standard deviation <strong>of</strong> 0.4.<br />
Thus, if <strong>the</strong> data are variance scaled and roughly normally distributed, 95% <strong>of</strong> <strong>the</strong> data<br />
will lie in <strong>the</strong> range −2 to +2, so this standard deviation corresponds to approximately<br />
a 10% uncertainty.<br />
In terms <strong>of</strong> <strong>the</strong> Drilling et al. (2006) classification parameters, <strong>the</strong> expected accuracy<br />
in each parameter for a human classifier is ±2 spectral types, ±1 luminosity class,<br />
and ±2 helium classes. These correspond to uncertainties <strong>of</strong> 10%, 12.5%, and 5%<br />
respectively. Therefore, <strong>the</strong> STATNET β parameters were set to 6.0 for <strong>the</strong> spectral<br />
type output, 4.0 for luminosity class, and 25 for helium class.<br />
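The arithmetic behind these settings can be written out explicitly. The sketch below follows the reasoning in the text: a fractional uncertainty is converted to a standard deviation in scaled units by multiplying by the approximate range of the scaled data (−2 to +2, i.e. 4), and β is the inverse square of that standard deviation. The function name is hypothetical.

```python
def beta_from_fractional_uncertainty(fraction, scaled_range=4.0):
    """Convert a fractional uncertainty into a beta weighting factor."""
    sigma = fraction * scaled_range   # standard deviation in variance-scaled units
    return 1.0 / sigma**2             # so that 1 / sqrt(beta) equals sigma

uncertainties = {"spectral type": 0.10, "luminosity class": 0.125, "helium class": 0.05}
betas = {name: beta_from_fractional_uncertainty(f) for name, f in uncertainties.items()}
```

This gives approximately 6.25, 4.0 and 25 for the three outputs. The first differs slightly from the value of 6.0 quoted in the text because the default β of 6.0 corresponds to a standard deviation of about 0.41 rather than exactly 0.4.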
2.1.3 Results<br />
Tables 2.1 and 2.2 give <strong>the</strong> $\sigma_{\rm rms}$ and correlation coefficient values, $r$, comparing each<br />
ANN architecture’s results with <strong>the</strong> classifications assigned by Drilling et al. (2006) as<br />
determined by <strong>the</strong> leave-one-out method.<br />
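<table>
<tr><th colspan="2">901:10:3</th><th>150</th><th>300</th><th>500</th><th>700</th><th>1000</th></tr>
<tr><td>$\sigma_{\rm rms}$</td><td>SpT</td><td>2.1041</td><td>2.1967</td><td>2.2338</td><td>2.2947</td><td>2.3434</td></tr>
<tr><td>$\sigma_{\rm rms}$</td><td>LC</td><td>1.1771</td><td>1.1835</td><td>1.2199</td><td>1.2435</td><td>1.2627</td></tr>
<tr><td>$\sigma_{\rm rms}$</td><td>HeC</td><td>5.5604</td><td>4.5434</td><td>4.3255</td><td>4.3540</td><td>4.5109</td></tr>
<tr><td>$r$</td><td>SpT</td><td>0.8710</td><td>0.8621</td><td>0.8586</td><td>0.8523</td><td>0.8473</td></tr>
<tr><td>$r$</td><td>LC</td><td>0.8209</td><td>0.8201</td><td>0.8123</td><td>0.8061</td><td>0.8012</td></tr>
<tr><td>$r$</td><td>HeC</td><td>0.9216</td><td>0.9483</td><td>0.9533</td><td>0.9527</td><td>0.9491</td></tr>
</table>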
Table 2.1: Results <strong>of</strong> <strong>the</strong> leave-one-out procedure as applied to a committee <strong>of</strong> five<br />
901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations.<br />
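<table>
<tr><th colspan="2">901:5:5:3</th><th>150</th><th>300</th><th>500</th><th>700</th><th>1000</th></tr>
<tr><td>$\sigma_{\rm rms}$</td><td>SpT</td><td>1.7446</td><td>1.8296</td><td>1.9593</td><td>2.0626</td><td>2.2202</td></tr>
<tr><td>$\sigma_{\rm rms}$</td><td>LC</td><td>1.0574</td><td>1.0766</td><td>1.1078</td><td>1.1536</td><td>1.2156</td></tr>
<tr><td>$\sigma_{\rm rms}$</td><td>HeC</td><td>6.2962</td><td>5.2257</td><td>4.3019</td><td>4.1405</td><td>4.2633</td></tr>
<tr><td>$r$</td><td>SpT</td><td>0.9065</td><td>0.8983</td><td>0.8858</td><td>0.8759</td><td>0.8599</td></tr>
<tr><td>$r$</td><td>LC</td><td>0.8507</td><td>0.8621</td><td>0.8389</td><td>0.8272</td><td>0.8116</td></tr>
<tr><td>$r$</td><td>HeC</td><td>0.9007</td><td>0.9316</td><td>0.9528</td><td>0.9573</td><td>0.9547</td></tr>
</table>
Table 2.2: As Table 2.1, but for <strong>the</strong> committee <strong>of</strong> five 901:5:5:3 ANNs.<br />
The large $\sigma_{\rm rms}$ values for helium scale classifications, apparent in both tables, suggest<br />
<strong>the</strong> ANNs are having difficulty generalising for this parameter. However, <strong>the</strong> high<br />
correlation coefficients suggest a good learning response. There are several possible<br />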
reasons for this. Firstly, it could be due to a problem with <strong>the</strong> neural network model<br />
itself, ei<strong>the</strong>r a regularisation issue (e.g., not using weight decay), or sub-optimal settings<br />
<strong>of</strong> <strong>the</strong> β parameters. Secondly, it is possible that <strong>the</strong> neural networks simply cannot do<br />
any better for this parameter, in which case <strong>the</strong> attention turns to <strong>the</strong> Drilling et al.<br />
(2006) classification scale itself and <strong>the</strong> observational sample on which this study is<br />
based.<br />
If <strong>the</strong> S/N <strong>of</strong> <strong>the</strong> observational sample is not sufficiently high for <strong>the</strong> ANNs<br />
to generalise well for <strong>the</strong> helium scale, this could be affecting <strong>the</strong> bias and variance <strong>of</strong><br />
<strong>the</strong> models, making it difficult to ascertain <strong>the</strong> underlying mapping function. It is also<br />
possible that <strong>the</strong> helium scale itself is too fine-grained. If <strong>the</strong> helium scale were scaled<br />
down by a factor <strong>of</strong> ∼4, a corresponding four-fold reduction in <strong>the</strong> $\sigma_{\rm rms}$ errors would<br />
be observed (<strong>the</strong> corresponding correlation coefficients would remain unchanged as this<br />
statistic is not affected by scaling). This would bring <strong>the</strong>m in line with those <strong>of</strong><br />
spectral type and luminosity class, i.e., $\sigma_{\rm rms} \sim \pm 1$ helium class.<br />
Fur<strong>the</strong>r investigation <strong>of</strong> this issue is required.<br />
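The scaling argument above can be checked numerically. The sketch below uses hypothetical helium classes and ANN outputs (random numbers, not the thesis data) to confirm that dividing both quantities by four reduces the RMS error four-fold while leaving the correlation coefficient unchanged.

```python
import numpy as np

# Hypothetical classifications on a 0-40 scale, with scatter in the ANN output.
rng = np.random.default_rng(3)
true_class = rng.uniform(0.0, 40.0, 200)
ann_class = true_class + rng.normal(0.0, 4.0, 200)

def rms(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Statistics on the original scale and on a scale compressed by a factor of 4.
rms_full, r_full = rms(ann_class, true_class), corr(ann_class, true_class)
rms_coarse, r_coarse = rms(ann_class / 4.0, true_class / 4.0), corr(ann_class / 4.0, true_class / 4.0)
```

A coarser scale would therefore make the helium class errors look comparable to those of the other two parameters without the networks having learned anything more.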
It can be seen in Tables 2.1 and 2.2 that both architectures are able to learn <strong>the</strong><br />
appropriate spectral features associated with spectral type and luminosity class within<br />
<strong>the</strong> first 250-300 epochs <strong>of</strong> <strong>the</strong> training procedure. After this point, fur<strong>the</strong>r training<br />
only serves to degrade performance with respect to <strong>the</strong>se parameters, which indicates<br />
that <strong>the</strong> models are starting to over-fit <strong>the</strong> training data.<br />
For <strong>the</strong> helium scale, both architectures yield optimal classifications after a few<br />
hundred more training epochs. The 901:10:3 architecture achieved best performance<br />
at around 500 iterations, and <strong>the</strong> 901:5:5:3 architecture reached its optimum at around<br />
700 iterations. A similar phenomenon was reported by Snider et al. (2001, sect. 5.1),<br />
although Willemsen et al. (2005) did not observe <strong>the</strong> same effect.<br />
The optimal trade-<strong>of</strong>f in accuracy between <strong>the</strong> classification parameters occurs at<br />
around 300 training epochs for <strong>the</strong> 901:10:3 architecture, and 500 epochs for <strong>the</strong><br />
901:5:5:3 architecture. The results <strong>of</strong> <strong>the</strong>se two ANNs are compared with <strong>the</strong> actual<br />
Drilling et al. (2006) classifications in Figure 2.2.<br />
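The per-parameter optima can be read directly from the tabulated values. The sketch below copies the $\sigma_{\rm rms}$ rows of Tables 2.1 and 2.2 and finds, for each parameter, the tested number of training iterations with the smallest error. It does not reproduce the combined trade-off quoted above, because the criterion used to weigh the three parameters against each other is not stated.

```python
# Training iterations tested, and sigma_rms values copied from Tables 2.1 and 2.2.
epochs = [150, 300, 500, 700, 1000]
sigma_rms = {
    "901:10:3": {
        "SpT": [2.1041, 2.1967, 2.2338, 2.2947, 2.3434],
        "LC": [1.1771, 1.1835, 1.2199, 1.2435, 1.2627],
        "HeC": [5.5604, 4.5434, 4.3255, 4.3540, 4.5109],
    },
    "901:5:5:3": {
        "SpT": [1.7446, 1.8296, 1.9593, 2.0626, 2.2202],
        "LC": [1.0574, 1.0766, 1.1078, 1.1536, 1.2156],
        "HeC": [6.2962, 5.2257, 4.3019, 4.1405, 4.2633],
    },
}

# For each architecture and parameter, the tested epoch with the lowest sigma_rms.
best_epoch = {
    arch: {param: epochs[values.index(min(values))] for param, values in params.items()}
    for arch, params in sigma_rms.items()
}
```

Among the tested values, spectral type and luminosity class are best at 150 iterations for both architectures, while helium class is best at 500 iterations (901:10:3) and 700 iterations (901:5:5:3).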
2.2 Physical Parameters<br />
The ability <strong>of</strong> neural network models to obtain astrophysical parameters <strong>of</strong> hot subdwarf<br />
spectra was tested by generating a grid <strong>of</strong> syn<strong>the</strong>tic spectra to be used as a training<br />
set, and extracting two application sets from <strong>the</strong> Drilling et al. (2006) sample.<br />
The first application set contains 60 stars which were used by Drilling et al. to<br />
calibrate <strong>the</strong>ir classification system against <strong>the</strong> physical parameters <strong>of</strong> T eff , log g, and<br />
log(n He /n H ). These 60 stars have been previously analysed by <strong>the</strong>ir original observers,<br />
with astrophysical parameters being derived mostly by the method of fine analysis.<br />
The second application set contains 133 stars from <strong>the</strong> Drilling et al. sample for<br />
which no astrophysical parameters have been listed in Drilling et al. (2006).<br />
Using <strong>the</strong> first application set, <strong>the</strong> neural network results for those stars can be<br />
compared against <strong>the</strong> results <strong>of</strong> <strong>the</strong> fine analyses performed by <strong>the</strong> original observers.<br />
However, <strong>the</strong> second application set has no measure <strong>of</strong> comparison. For that, <strong>the</strong> χ 2<br />
fitting code used at <strong>Armagh</strong> <strong>Observatory</strong>, SFIT, is used to derive a set <strong>of</strong> astrophysical<br />
parameters based on a grid <strong>of</strong> syn<strong>the</strong>tic spectra. SFIT is also applied to <strong>the</strong> first<br />
application set to serve as a second comparison for the neural network results.
[Figure 2.2 panels. Left column: architecture 901:10:3; right column: architecture 901:5:5:3. Rows, top to bottom: ANN spectral type against Drilling spectral type (O to A); ANN luminosity class against Drilling luminosity class (0 to IX); ANN helium class against Drilling helium class (0 to 40).]<br />
Figure 2.2: Results <strong>of</strong> <strong>the</strong> leave-one-out procedure for both ANN architectures at <strong>the</strong><br />
near-optimal training time <strong>of</strong> 300 iterations for <strong>the</strong> 901:10:3 architecture (left column),<br />
and 500 iterations for <strong>the</strong> 901:5:5:3 architecture (right column). Also plotted is <strong>the</strong><br />
best-fit linear least squares line.<br />
The neural network training grid contains 2009 syn<strong>the</strong>tic spectra generated using<br />
<strong>the</strong> line-blanketed LTE spectral syn<strong>the</strong>sis code SPECTRUM (Jeffery et al., 2001).<br />
The grid covered the parameter space in T eff : 12000–50000 K, ∆T eff ∼ 5000 K; log g:<br />
3.5–6.0 dex, ∆log g = 0.5 dex; and log(n He /n H ): −3 to +3 dex, in 10 non-uniformly spaced<br />
intervals.<br />
In order to match this training grid to <strong>the</strong> Drilling et al. (2006) observations, each<br />
syn<strong>the</strong>tic spectrum was first convolved with a Gaussian to lower its resolution to that<br />
<strong>of</strong> <strong>the</strong> observations (∼ 2.5 Å), and <strong>the</strong>n re-binned onto <strong>the</strong> same wavelength grid as <strong>the</strong><br />
observations (4050–4950 Å at a 1.0 Å dispersion).<br />
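The matching step described above can be sketched as follows. This is a minimal illustration and not the code used in the thesis: the function names, the uniformly sampled input spectrum, and the use of simple linear interpolation for re-binning (which is not strictly flux-conserving) are all assumptions. It also assumes the synthetic spectrum has negligible intrinsic broadening, so that the full 2.5 Å FWHM is applied.

```python
import numpy as np

def gaussian_kernel(fwhm, step):
    """Unit-sum Gaussian kernel sampled every `step` Angstrom, cut at +/- 4 sigma."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = int(np.ceil(4.0 * sigma / step))
    offsets = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def match_to_observations(wave, flux, fwhm=2.5, lo=4050.0, hi=4950.0, disp=1.0):
    """Smooth a uniformly sampled spectrum, then re-bin onto the observed grid."""
    step = wave[1] - wave[0]
    kernel = gaussian_kernel(fwhm, step)
    # Pad with edge values so the convolution does not depress the ends.
    pad = len(kernel) // 2
    padded = np.pad(flux, pad, mode="edge")
    smooth = np.convolve(padded, kernel, mode="valid")
    new_wave = np.arange(lo, hi + 0.5 * disp, disp)
    return new_wave, np.interp(new_wave, wave, smooth)

# Hypothetical synthetic spectrum: flat continuum with one narrow absorption line.
wave = np.arange(4000.0, 5000.0, 0.1)
flux = 1.0 - 0.8 * np.exp(-0.5 * ((wave - 4471.0) / 0.2) ** 2)
new_wave, new_flux = match_to_observations(wave, flux)
```

With the wavelength limits and dispersion quoted above, the re-binned spectrum has 901 flux points, which matches the 901 input nodes of the network architectures.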
Design limitations in <strong>the</strong> χ 2 minimisation code used at <strong>Armagh</strong> <strong>Observatory</strong>, SFIT<br />
(which are dealt with in Chapter 3), required a smaller grid <strong>of</strong> syn<strong>the</strong>tic spectra to<br />
be used. The grid covered <strong>the</strong> parameter space: T eff = {15,20,25,30,35, 40, 50}kK,<br />
log g = {3,4,5,6}, and log(n He /n H ) = {−3, −1,0,+1,+3}. This grid is commensurate<br />
with <strong>the</strong> dispersion and S/N present in <strong>the</strong> Drilling et al. (2006) sample.<br />
A default instrumental profile of 1 Å (FWHM) was assumed during the fitting for<br />
each application set, and all data points more than 5% above continuum were rejected.<br />
All three intrinsic parameters, T eff , log g, and log(n He /n H ), were free to vary in <strong>the</strong> χ 2<br />
optimisation.<br />
Solutions for v rad and v sin i were also obtained. The correction for v rad used during<br />
the pre-processing stage (Section 2.1.1) appeared to have left residual shifts of a few<br />
km s −1 and, in one case (possibly where Balmer lines were confused with He II lines),<br />
of a couple of Ångströms. Overall, ⟨v rad ⟩ = −1.9 ± 22.3 km s −1 , the mean being<br />
satisfactorily close to the expectation value (0 km s −1 ). The solution for v sin i allowed<br />
SFIT to be tolerant of both the varying instrumental resolution present in the data, and<br />
any rotational broadening present in the source. Formally, ⟨v sin i⟩ = 59 ± 39 km s −1 .<br />
A single normalisation procedure was applied to remove small trends in <strong>the</strong> background<br />
continuum. Nine “continuum” regions free <strong>of</strong> hydrogen and helium lines were
defined. After an initial optimisation step, <strong>the</strong> spectrum was divided by <strong>the</strong> initial fit.<br />
A second-order polynomial was fitted to this ratio using only <strong>the</strong> data in <strong>the</strong> continuum<br />
regions. An estimate <strong>of</strong> <strong>the</strong> true sample S/N was obtained from <strong>the</strong> RMS <strong>of</strong> <strong>the</strong> ratio<br />
around <strong>the</strong> polynomial fit in <strong>the</strong>se same regions. The sample was <strong>the</strong>n multiplied by<br />
<strong>the</strong> polynomial fit before a second optimisation step was applied.<br />
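The polynomial fit and S/N estimate described above might look like the sketch below. It is an illustration only and not SFIT's implementation: the function name, the continuum windows, and the test spectrum are hypothetical, and the S/N is assumed to be the mean continuum level divided by the RMS scatter of the ratio about the polynomial.

```python
import numpy as np

def continuum_correction(wave, observed, model_fit, regions, order=2):
    """Fit a low-order polynomial to observed/model inside the continuum windows.

    Returns the polynomial evaluated at every wavelength and an S/N estimate
    taken from the RMS scatter of the ratio about that polynomial.
    """
    ratio = observed / model_fit
    mask = np.zeros(wave.shape, dtype=bool)
    for lo, hi in regions:
        mask |= (wave >= lo) & (wave <= hi)
    # Centre the wavelengths so the polynomial fit is well conditioned.
    x = wave - wave.mean()
    coeffs = np.polyfit(x[mask], ratio[mask], order)
    trend = np.polyval(coeffs, x)
    rms = np.sqrt(np.mean((ratio[mask] - trend[mask]) ** 2))
    return trend, float(np.mean(trend[mask]) / rms)

# Hypothetical data: a model with one line, a gentle slope, and noise at S/N ~ 50.
rng = np.random.default_rng(0)
wave = np.arange(4050.0, 4951.0, 1.0)
model_fit = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 4340.0) / 5.0) ** 2)
true_trend = 1.0 + 1.0e-4 * (wave - 4500.0)
observed = model_fit * true_trend + rng.normal(0.0, 0.02, wave.size)

# Placeholder windows, not the nine regions used in the thesis.
windows = [(4050, 4080), (4220, 4300), (4410, 4450),
           (4580, 4660), (4740, 4820), (4935, 4950)]
trend, snr = continuum_correction(wave, observed, model_fit, windows)
```

The returned `trend` is the quantity that would be used to adjust the spectrum before the second optimisation step.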
2.2.1 Methodology<br />
A control experiment was carried out to determine if neural network models trained<br />
on a set of synthetic spectra at infinite S/N are able to accurately parameterise other<br />
syn<strong>the</strong>tic spectra over a range <strong>of</strong> S/Ns.<br />
The training grid <strong>of</strong> syn<strong>the</strong>tic spectra was randomly divided into two evenly sized<br />
training and application subsets. Several committees <strong>of</strong> different ANN architectures<br />
were trained on the training subset for a range of training epochs. The intention here<br />
was to establish optimal model complexity for <strong>the</strong> task without using weight decay.<br />
The STATNET β parameters for each <strong>of</strong> <strong>the</strong> network output variables (T eff , log g,<br />
log(n He /n H )) were set to 6.0, estimating a 10% error in each parameter. This is commensurate<br />
with <strong>the</strong> spacing <strong>of</strong> <strong>the</strong> grid points over <strong>the</strong> parameter space, and assumes,<br />
conservatively, that <strong>the</strong> neural network model will do at least as well as nearest neighbour<br />
matching to <strong>the</strong> syn<strong>the</strong>tic spectra in <strong>the</strong> grid. Again, <strong>the</strong> Condor cluster allowed<br />
<strong>the</strong> different experiments to be carried out in parallel.<br />
The application subset was duplicated eight times. Each set was degraded to one <strong>of</strong><br />
<strong>the</strong> following S/Ns by <strong>the</strong> addition <strong>of</strong> Gaussian noise: {∞, 1000, 500, 100, 50, 20, 10,<br />
5}. Each trained ANN committee was applied in turn to <strong>the</strong> noised application sets.<br />
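A minimal sketch of this degradation step is given below. The thesis does not state how the noise amplitude was defined, so the code assumes that the per-pixel Gaussian standard deviation is the local flux divided by the target S/N. The array `clean` is a hypothetical stand-in for the application subset.

```python
import numpy as np

def degrade_to_snr(flux, snr, rng):
    """Return a copy of `flux` with Gaussian noise of sigma = flux / snr added."""
    if np.isinf(snr):
        return flux.copy()
    return flux + rng.normal(0.0, 1.0, size=flux.shape) * (flux / snr)

rng = np.random.default_rng(42)
snr_levels = [np.inf, 1000, 500, 100, 50, 20, 10, 5]

# Hypothetical stand-in: 200 continuum-normalised spectra of 901 points each.
clean = np.ones((200, 901))
noised_sets = {snr: degrade_to_snr(clean, snr, rng) for snr in snr_levels}
```

The same function could be reused to build noised copies of a training grid at chosen S/N values.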
The experiments suggested that <strong>the</strong> optimal network architecture was a 901:10:10:3<br />
configuration, trained for 500 epochs for T eff and log g parameterisations, and 1350<br />
epochs for log(n He /n H ) parameterisations.<br />
The results showed positive correlations between <strong>the</strong> actual parameters and <strong>the</strong><br />
ANN’s results. However, <strong>the</strong> accuracy <strong>of</strong> <strong>the</strong> ANNs declined quickly as <strong>the</strong> S/N <strong>of</strong><br />
<strong>the</strong> application set fell below 100. This observation is important because <strong>the</strong> spectra<br />
in <strong>the</strong> Drilling et al. (2006) sample are not <strong>of</strong> a consistent S/N. The majority <strong>of</strong> <strong>the</strong><br />
sample has an S/N somewhere in <strong>the</strong> 50 – 100 range, so <strong>the</strong> neural network model<br />
should account for this. The results imply that an ANN trained on syn<strong>the</strong>tic spectra<br />
<strong>of</strong> infinite S/N will not give <strong>the</strong> most accurate parameterisations <strong>of</strong> <strong>the</strong> observational<br />
sample.<br />
This result was also reported by Snider et al. (2001) and Willemsen et al. (2005).<br />
In <strong>the</strong> latter, Willemsen et al. reported on <strong>the</strong>ir attempts to improve <strong>the</strong> generalisation<br />
abilities <strong>of</strong> <strong>the</strong>ir neural network models by increasing <strong>the</strong> amount <strong>of</strong> weight decay<br />
taking place. They found that performance improved only when <strong>the</strong> weight decay<br />
term was chosen to be ra<strong>the</strong>r large, indicating that <strong>the</strong> problem lies in regularising <strong>the</strong><br />
model, i.e., a neural network trained on high S/N spectra will over-fit <strong>the</strong> data unless<br />
“restrained”.<br />
The solution chosen in the study presented here was to make two copies of the<br />
entire grid of 2009 theoretical models, one copy being degraded to an S/N of 100 and the<br />
other to 50. The final training set for the optimal network architecture was then a<br />
combination <strong>of</strong> all three grids, totalling 6027 syn<strong>the</strong>tic spectra. This addition <strong>of</strong> noise<br />
to <strong>the</strong> training grid serves as ano<strong>the</strong>r mechanism <strong>of</strong> regularisation. Willemsen et al.<br />
(2005) employed a similar solution. The noise serves to ‘smear out’ each training point,<br />
making it difficult for <strong>the</strong> network to fit individual points precisely, and hence reducing<br />
over-fitting.<br />
Despite increasing <strong>the</strong> size <strong>of</strong> <strong>the</strong> training set, <strong>the</strong>re is no reason to believe that <strong>the</strong><br />
optimal ANN configuration would be consequently changed. The fundamental structure<br />
and physical parameters of the noised spectra are no different from those of the unnoised<br />
spectra.
2.2.2 Results<br />
Application Set 1: 60 Calibration Stars<br />
The results <strong>of</strong> applying <strong>the</strong> two ANN models to <strong>the</strong> 60 calibration stars are given in <strong>the</strong><br />
first column <strong>of</strong> Table 2.3, and <strong>the</strong> actual parameters obtained are listed in Appendix<br />
A.<br />
The correlation coefficients show a reasonable agreement between <strong>the</strong> ANN’s predicted<br />
T eff parameterisations and those <strong>of</strong> Drilling et al. (2006). However, <strong>the</strong> log(n He /n H )<br />
and log g correlation coefficients are not quite as positive. Looking at <strong>the</strong> middle and<br />
last plots in Figure 2.3, it can be seen that <strong>the</strong> ANN’s results in <strong>the</strong>se parameters<br />
(indicated by <strong>the</strong> blue crosses) are visibly more scattered than <strong>the</strong> T eff results given in<br />
<strong>the</strong> first plot.<br />
The typical errors quoted in the original fine analyses of these stars are σ Teff =<br />
±2500 K and σ log g = ±0.2 dex. The results in the first column of Table 2.3 are still<br />
within ∼ 2σ <strong>of</strong> <strong>the</strong> fine analysis errors, which is significant (assuming, <strong>of</strong> course, that<br />
<strong>the</strong> method <strong>of</strong> fine analysis is more accurate than ei<strong>the</strong>r <strong>of</strong> <strong>the</strong> methods used here).<br />
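<table>
<tr><th>Statistic</th><th>Parameter</th><th>ANN/Drilling</th><th>χ 2 /Drilling</th><th>ANN/χ 2</th></tr>
<tr><td>σ rms</td><td>T eff (K)</td><td>4389.79</td><td>4338.85</td><td>3740.99</td></tr>
<tr><td>σ rms</td><td>log g (dex)</td><td>0.4577</td><td>0.3754</td><td>0.4908</td></tr>
<tr><td>σ rms</td><td>log(n He /n H ) (dex)</td><td>0.9796</td><td>0.4769</td><td>0.8382</td></tr>
<tr><td>r</td><td>T eff</td><td>0.9207</td><td>0.9447</td><td>0.9131</td></tr>
<tr><td>r</td><td>log g</td><td>0.7844</td><td>0.8173</td><td>0.7525</td></tr>
<tr><td>r</td><td>log(n He /n H )</td><td>0.8705</td><td>0.9649</td><td>0.8816</td></tr>
</table>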
Table 2.3: Results <strong>of</strong> parameterising <strong>the</strong> 60 calibration stars.<br />
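For readers wishing to reproduce statistics of this kind, the sketch below shows one plausible way to compute them. It assumes that σ rms is the root-mean-square difference between two sets of parameter estimates and that r is the Pearson correlation coefficient; the thesis does not give explicit definitions here. The function names and the five temperature values are hypothetical and are not taken from Appendix A.

```python
import numpy as np

def rms_difference(a, b):
    """Root-mean-square difference between two sets of estimates."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def pearson_r(a, b):
    """Pearson correlation coefficient between two sets of estimates."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical effective temperatures (K) from a reference and from a second method.
teff_ref = [20000.0, 25000.0, 30000.0, 35000.0, 40000.0]
teff_new = [21000.0, 24000.0, 32000.0, 34000.0, 43000.0]

sigma_rms = rms_difference(teff_ref, teff_new)
r = pearson_r(teff_ref, teff_new)
```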
SFIT was applied to <strong>the</strong> 60 calibration stars and <strong>the</strong> results are listed in <strong>the</strong> second<br />
column <strong>of</strong> Table 2.3. The actual parameters obtained are listed in Appendix A.<br />
SFIT compares well with the neural network in T eff and log g, but performs noticeably<br />
better in log(n He /n H ), with a σ rms of 0.4769 against 0.9796 for the ANN.<br />
[Figure 2.3 panels: ANN/χ 2 T eff parameterisations (kK) against Drilling T eff calibrations (kK); ANN/χ 2 log g parameterisations against Drilling log g calibrations; ANN/χ 2 log(n He /n H ) parameterisations against Drilling log(n He /n H ) calibrations.]<br />
Figure 2.3: Parameterisations of the 60 calibration stars. Results from each method<br />
have been combined onto each plot. ANN results are indicated by blue crosses, and χ 2<br />
minimiser results by red pluses.<br />
A direct comparison between the neural network and SFIT’s results is given in the<br />
third column of Table 2.3. The disagreement between the neural network models and<br />
SFIT is of a similar degree to the disagreement of each method with the Drilling et al.<br />
parameters. The σ rms values in column three <strong>of</strong> <strong>the</strong> table are still within twice <strong>the</strong><br />
quoted errors for <strong>the</strong> fine analyses <strong>of</strong> <strong>the</strong> 60 calibration stars, which is a significant<br />
result (again, assuming that fine analysis is <strong>the</strong> more accurate method) and confirms<br />
that ANNs have <strong>the</strong> potential <strong>of</strong> being able to parameterise hot subdwarf spectra to a<br />
similar degree <strong>of</strong> accuracy as <strong>the</strong> more traditional method <strong>of</strong> χ 2 minimisation.<br />
The poor generalisation <strong>of</strong> <strong>the</strong> neural network in <strong>the</strong> log(n He /n H ) parameter is a<br />
significant issue, and requires fur<strong>the</strong>r investigation.<br />
Application Set 2: 133 Unparameterised Stars<br />
The two ANN committees were applied to <strong>the</strong> remaining 133 unparameterised stars in<br />
<strong>the</strong> sample. These stars were also parameterised using SFIT. The parameters obtained<br />
from both methods are listed in Appendix A.<br />
A direct comparison between <strong>the</strong> two methods was made. The results are presented<br />
in Table 2.4, and Figure 2.4. For approximately twice as many stars, <strong>the</strong> σ rms values<br />
are only slightly worse than <strong>the</strong> values in <strong>the</strong> last column <strong>of</strong> Table 2.3. Tentatively<br />
speaking, <strong>the</strong> results could still be considered to support <strong>the</strong> view that ANNs have<br />
<strong>the</strong> potential <strong>of</strong> being able to parameterise hot subdwarf spectra to a similar degree <strong>of</strong><br />
accuracy as χ 2 minimisers.<br />
As has been pointed out previously, <strong>the</strong> neural network models seem to be suffering<br />
from regularisation issues when training on syn<strong>the</strong>tic spectra. With fur<strong>the</strong>r investigation<br />
on this matter, a significant improvement in the neural network’s generalisation<br />
performance could be obtained.<br />
[Figure 2.4 panels: χ 2 T eff parameterisations (kK) against ANN T eff parameterisations (kK); χ 2 log g parameterisations against ANN log g parameterisations; χ 2 log(n He /n H ) parameterisations against ANN log(n He /n H ) parameterisations.]<br />
Figure 2.4: Parameterisations <strong>of</strong> <strong>the</strong> 133 unparameterised stars using <strong>the</strong> ANNs and<br />
χ 2 minimiser. Also shown is <strong>the</strong> best-fit linear least squares line.
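<table>
<tr><th>Statistic</th><th>Parameter</th><th>ANN/χ 2</th></tr>
<tr><td>σ rms</td><td>T eff (K)</td><td>5768.74</td></tr>
<tr><td>σ rms</td><td>log g (dex)</td><td>0.6853</td></tr>
<tr><td>σ rms</td><td>log(n He /n H ) (dex)</td><td>0.9926</td></tr>
<tr><td>r</td><td>T eff</td><td>0.8850</td></tr>
<tr><td>r</td><td>log g</td><td>0.8003</td></tr>
<tr><td>r</td><td>log(n He /n H )</td><td>0.8875</td></tr>
</table>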
Table 2.4: A comparison between ANNs and χ 2 minimisation for parameterising <strong>the</strong><br />
133 unparameterised stars.<br />
2.3 Summary<br />
Artificial neural networks are a fast and powerful method for automatically classifying<br />
astronomical spectra. A feed-forward neural network configured in a 901:5:5:3 architecture,<br />
and trained for 500 epochs, was able to classify hot subdwarf spectra onto <strong>the</strong><br />
Drilling et al. (2006) scale with global errors (σ rms ) <strong>of</strong> ∼ 2 sub-types for spectral type,<br />
∼ 1 sub-class for luminosity class, and ∼ 4 sub-classes for <strong>the</strong> helium class. This was<br />
<strong>the</strong> most accurate ANN discovered for <strong>the</strong> task.<br />
The use <strong>of</strong> ANNs for obtaining physical parameters from stellar spectra <strong>of</strong>fers <strong>the</strong><br />
possibility <strong>of</strong> having a fast method for deriving initial parameter estimates. However,<br />
establishing <strong>the</strong> optimal network architecture to accurately model <strong>the</strong> flux-space to<br />
physical parameter-space mapping function was found to be cumbersome with much<br />
experimentation required. It was also discovered that attempting to train <strong>the</strong> neural<br />
network model on infinite S/N syn<strong>the</strong>tic spectra led to over-fitting due to insufficient<br />
regularisation. A solution was attempted by <strong>the</strong> addition <strong>of</strong> noise to <strong>the</strong> training set,<br />
but fur<strong>the</strong>r investigation here is needed.<br />
χ 2 methods are <strong>the</strong>refore more desirable for parameterising astronomical spectra in<br />
a general data mining tool kit as <strong>the</strong>y <strong>of</strong>fer more flexibility and greater ease <strong>of</strong> use than<br />
ANNs. Of course, these qualities come at the price of slower speed, with χ 2 methods<br />
unable to compete with ANNs in this regard. This issue is discussed further in the next<br />
chapter. However, if the regularisation issues with parameterising ANNs can be solved,<br />
<strong>the</strong>ir extremely fast application speed would instantly make <strong>the</strong>m <strong>the</strong> preferred tool.
Chapter 3<br />
Parameterisation - χ 2 Fitting<br />
3.1 Analysing <strong>Stellar</strong> <strong>Spectra</strong><br />
Deriving physical parameters (i.e., T eff , log g, abundances) for a star is done by a fine<br />
analysis <strong>of</strong> its spectrum. The traditional method <strong>of</strong> spectroscopic fine analysis is a long,<br />
iterative process requiring several months to complete.<br />
The method is based on measuring equivalent widths <strong>of</strong> spectroscopic lines. The<br />
astronomer must go through a spectrum and manually identify as many spectral lines<br />
as possible, and <strong>the</strong> ions to which <strong>the</strong>y belong. Microturbulent and rotational velocities<br />
are first determined. Then, an initial grid <strong>of</strong> model atmospheres is calculated to cover<br />
<strong>the</strong> approximate T eff , log g, and composition <strong>of</strong> <strong>the</strong> star.<br />
Using <strong>the</strong>se models, <strong>the</strong> <strong>the</strong>oretical equivalent widths are calculated for each <strong>of</strong> <strong>the</strong><br />
identified ion lines in <strong>the</strong> star over <strong>the</strong> range <strong>of</strong> elemental abundances in <strong>the</strong> grid. These<br />
equivalent widths are combined to form curves <strong>of</strong> growth which can <strong>the</strong>n be used to<br />
read <strong>of</strong>f derived abundances for each <strong>of</strong> <strong>the</strong> measured ion line equivalent widths in <strong>the</strong><br />
stellar spectrum.<br />
Temperature and surface gravity are determined by using <strong>the</strong> derived abundances<br />
<strong>of</strong> lines known to be sensitive to temperature (e.g., Fe) and gravity (e.g., H or He), and<br />
performing a process <strong>of</strong> comparison and line fitting with each <strong>of</strong> <strong>the</strong> models in <strong>the</strong> grid.<br />
The derived values <strong>of</strong> T eff and log g are <strong>the</strong>n used, along with <strong>the</strong> measured equivalent<br />
widths, to calculate new abundances. A new grid <strong>of</strong> model atmospheres is computed<br />
with <strong>the</strong>se parameters. The entire analysis process <strong>of</strong> determining curves <strong>of</strong> growth, deriving<br />
values <strong>of</strong> T eff , log g, and abundances, and recomputing <strong>the</strong> model grid is repeated<br />
until <strong>the</strong> derived parameters agree with those used in <strong>the</strong> models (i.e., convergence is<br />
achieved).<br />
An excellent description <strong>of</strong> this process, and demonstration <strong>of</strong> its application, can<br />
be found in Dudley (1992).<br />
Progress Towards Automation<br />
Given <strong>the</strong> iterative nature <strong>of</strong> <strong>the</strong> method <strong>of</strong> fine analysis, and <strong>the</strong> time required to<br />
conduct an analysis for a single star, attempts have been made to find automated<br />
procedures for accomplishing <strong>the</strong> same goal much more quickly.<br />
Hutchison (1971) presents an automatic procedure for detecting spectral features<br />
and determining accurate line frequencies, line depths, and equivalent widths for high-resolution<br />
infrared spectra.<br />
Morossi & Crivellari (1980) describe a method to obtain T eff and log g by comparing<br />
observations to a grid <strong>of</strong> models. Their method is based on a least-squares minimisation<br />
procedure which determines values for <strong>the</strong> parameters which optimise <strong>the</strong> fit between<br />
<strong>the</strong> <strong>the</strong>oretical models and observational data.<br />
Katz et al. (1998) use a χ 2 minimisation procedure to obtain values <strong>of</strong> T eff , log g,<br />
and [Fe/H] from ELODIE spectra by fitting observations to a library <strong>of</strong> 211 reference<br />
stars observed with <strong>the</strong> same instrument for which <strong>the</strong> atmospheric parameters are<br />
well-known.<br />
The method of χ 2 fitting has grown to be very much the de facto procedure for<br />
automating the parameterisation of astronomical spectra. It is a specific case of the<br />
more general class <strong>of</strong> fitting procedures known as metric distance minimisation (or<br />
minimum distance methods), where, as <strong>the</strong> name suggests, results are determined by<br />
minimising some distance metric between <strong>the</strong> object under analysis and each member<br />
<strong>of</strong> a set <strong>of</strong> templates. The object is assigned <strong>the</strong> parameters <strong>of</strong> <strong>the</strong> template which gives<br />
<strong>the</strong> smallest distance.<br />
Let x = (x 1 ,x 2 ,... ,x N ) be <strong>the</strong> spectrum to parameterise, and s = (s 1 ,s 2 ,... ,s N )<br />
be a template spectrum with known physical parameters. The distance metric to be<br />
minimised is <strong>of</strong> <strong>the</strong> form<br />
\[ D = \frac{1}{N} \left[ \sum_{i=1}^{N} w_i \, |x_i - s_i|^{p} \right]^{1/p} , \tag{3.1} \]<br />
where w i is a weight assigned to flux element s i <strong>of</strong> <strong>the</strong> template spectrum. Typically,<br />
s is only one template in a set <strong>of</strong> templates, S = {s 1 ,s 2 ,... ,s M }, and equation 3.1 is<br />
computed for all templates s j . Equation 3.1 becomes χ 2 fitting when p = 2 and<br />
$w_i = \sigma_i^{-2}$, where σ i is the error in x i .<br />
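Equation 3.1 and the nearest-template rule can be sketched in a few lines. The code below is illustrative only; the function names and the three toy templates are hypothetical. Note that with p = 2 the expression is a monotonic function of the usual χ 2 sum, so both select the same template.

```python
import numpy as np

def distance(x, s, w, p=2):
    """Equation 3.1: D = (1/N) * (sum_i w_i |x_i - s_i|^p)^(1/p)."""
    return float(np.sum(w * np.abs(x - s) ** p) ** (1.0 / p) / x.size)

def nearest_template(x, templates, sigma, p=2):
    """Return the index of the template minimising D, and all distances."""
    w = sigma ** -2.0
    d = np.array([distance(x, s, w, p) for s in templates])
    return int(np.argmin(d)), d

# Hypothetical templates: one absorption line with three different depths.
wave = np.linspace(4050.0, 4950.0, 901)
profile = np.exp(-0.5 * ((wave - 4471.0) / 3.0) ** 2)
templates = np.array([1.0 - depth * profile for depth in (0.2, 0.4, 0.6)])

# A noise-free "observation" with a depth closest to the second template.
x = 1.0 - 0.42 * profile
sigma = np.full(wave.size, 0.01)
j, d = nearest_template(x, templates, sigma)
```

The object would then be assigned the physical parameters attached to template `j`.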
For a straightforward nearest neighbour minimisation <strong>of</strong> <strong>the</strong> χ 2 metric over a grid <strong>of</strong><br />
templates, an accurate result requires <strong>the</strong> grid to be finely spaced in each parameter <strong>of</strong><br />
interest so that <strong>the</strong> effects <strong>of</strong> that parameter on <strong>the</strong> flux vector can be ascertained. This<br />
can create a large data requirement, and <strong>the</strong> computation time required to parameterise<br />
one spectrum can increase prohibitively as equation 3.1 must be evaluated for all <strong>the</strong><br />
templates.<br />
<strong>On</strong>e solution to this problem is to use some method <strong>of</strong> interpolation to “fill in <strong>the</strong><br />
gaps” between templates in a discrete grid. As interpolation creates <strong>the</strong> illusion <strong>of</strong><br />
continuity in <strong>the</strong> grid, it also opens <strong>the</strong> possibility <strong>of</strong> using search-based optimisation<br />
methods to locate <strong>the</strong> minimum <strong>of</strong> D in an efficient manner.<br />
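As a one-dimensional illustration of this idea, the sketch below interpolates linearly between neighbouring templates and then uses a golden-section search to locate the minimum of χ 2 . It is not the algorithm used by SFIT; the function names, the toy grid in T eff , and the choice of golden-section search are assumptions made for the example, and the search presumes that χ 2 has a single minimum over the interval.

```python
import numpy as np

def interpolate_template(t, grid_t, grid_flux):
    """Linearly interpolate the flux vector between the bracketing grid points."""
    k = int(np.clip(np.searchsorted(grid_t, t) - 1, 0, len(grid_t) - 2))
    frac = (t - grid_t[k]) / (grid_t[k + 1] - grid_t[k])
    return (1.0 - frac) * grid_flux[k] + frac * grid_flux[k + 1]

def chi2(t, x, sigma, grid_t, grid_flux):
    model = interpolate_template(t, grid_t, grid_flux)
    return float(np.sum(((x - model) / sigma) ** 2))

def golden_section(f, lo, hi, tol=1e-3):
    """Minimise a unimodal function f on the interval [lo, hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Hypothetical grid: line depth falls linearly with temperature (kK).
wave = np.linspace(4050.0, 4950.0, 901)
profile = np.exp(-0.5 * ((wave - 4471.0) / 3.0) ** 2)
grid_t = np.array([20.0, 25.0, 30.0, 35.0])
grid_flux = np.array([1.0 - (0.8 - 0.015 * t) * profile for t in grid_t])

# Noise-free "observation" lying between two grid points.
x = interpolate_template(27.3, grid_t, grid_flux)
sigma = np.full(wave.size, 0.01)
best_t = golden_section(lambda t: chi2(t, x, sigma, grid_t, grid_flux), 20.0, 35.0)
```

The search needs only a few tens of χ 2 evaluations, whereas sampling the same interval at the 0.001 kK tolerance by brute force would need thousands.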
Unfortunately, with χ 2 fitting, there is no escaping the so-called curse of dimensionality.<br />
As the number of parameters to be determined increases, the number of<br />
templates in the grid increases exponentially.<br />
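A short calculation makes the scaling concrete. The grid sizes on the first three axes are those of the reduced SFIT grid quoted in Section 2.2; the fourth axis and the helper function are hypothetical additions for illustration.

```python
from math import prod

# Grid points per parameter for the reduced SFIT grid quoted in Section 2.2:
# 7 values of T_eff, 4 of log g, and 5 of log(n_He/n_H).
points_per_axis = {"teff": 7, "logg": 4, "log_he": 5}
n_templates = prod(points_per_axis.values())

# Hypothetical: add a fourth parameter sampled at 5 values.
n_with_extra_axis = n_templates * 5

def grid_size(points, n_params):
    """Templates needed for n_params axes sampled at `points` values each."""
    return points ** n_params
```

Each additional parameter multiplies the template count by the number of values sampled along the new axis, which is the exponential growth referred to above.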
χ 2 Fitting for Astronomical Data Mining<br />
The main disadvantage to using χ 2 fitting in <strong>the</strong> context <strong>of</strong> a data mining application<br />
is slowness. In contrast to artificial neural networks, no training procedure exists to<br />
extract information from <strong>the</strong> template grid, and <strong>the</strong> grid is required to be present and<br />
searched to minimise D for every new spectrum to be parameterised.<br />
<strong>On</strong>e solution to <strong>the</strong> speed issue is to distribute <strong>the</strong> grid search over many computers<br />
in a parallel cluster. The grid <strong>of</strong> templates could be broken up into N sections, where<br />
N is <strong>the</strong> number <strong>of</strong> processing nodes in <strong>the</strong> cluster. Each node <strong>the</strong>n receives its section<br />
<strong>of</strong> <strong>the</strong> grid, finds <strong>the</strong> local minimum <strong>of</strong> D for an observed spectrum, and reports this<br />
value back to a master processing node which <strong>the</strong>n selects <strong>the</strong> global minimum from<br />
all <strong>the</strong> node results.<br />
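The partitioning scheme described above can be sketched on a single machine, with worker threads standing in for the processing nodes of a cluster. This is an assumption-laden illustration rather than the parallel design adopted later: the function names and the random template grid are hypothetical, and the number of nodes is assumed not to exceed the number of templates.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chi2_section(x, sigma, section, offset):
    """Worker: return (minimum chi2, global template index) for one grid section."""
    chi2 = np.sum(((section - x) / sigma) ** 2, axis=1)
    k = int(np.argmin(chi2))
    return float(chi2[k]), int(offset) + k

def parallel_grid_search(x, sigma, templates, n_nodes):
    """Split the grid into n_nodes sections and search them concurrently."""
    sections = np.array_split(templates, n_nodes)
    offsets = np.cumsum([0] + [len(s) for s in sections[:-1]])
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(chi2_section,
                                [x] * n_nodes, [sigma] * n_nodes,
                                sections, offsets))
    # The "master" step: select the global minimum from the per-node minima.
    return min(results)

# Hypothetical grid of 140 templates with 901 flux points each.
rng = np.random.default_rng(1)
templates = rng.uniform(0.5, 1.0, size=(140, 901))
sigma = np.full(901, 0.01)
x = templates[97] + rng.normal(0.0, 0.001, 901)

best_chi2, best_index = parallel_grid_search(x, sigma, templates, n_nodes=4)
```

Only the per-section minimum and its index travel back to the master, so the communication cost is small compared with the cost of evaluating the metric over each section.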
In a data mining context, the template grid is likely to cover a large region<br />
of the parameter space of interest in reasonable detail, so as to account for the diversity<br />
of objects that will be encountered. Large template grids pose storage and access<br />
problems for the χ² minimisation program because the main memory of the<br />
computer may not be large enough to hold all the templates at once.<br />
This chapter is concerned with taking a pre-existing χ² minimisation<br />
program used at the Armagh Observatory, and beginning the modifications necessary<br />
to use the program more efficiently in a data mining context. Parallelising the<br />
program is a relatively straightforward task; however, the problem of managing large<br />
template grids is much more involved and needs to be tackled first.<br />
3.2 SFIT<br />
SFIT (Jeffery et al., 2001) is a Fortran 90 implementation of the χ² minimisation<br />
method outlined in the previous section. Given a grid of theoretical model spectra<br />
and an observed spectrum, SFIT finds the combination of physical parameters of the<br />
model which most closely matches the observed spectrum by minimising the χ² distance<br />
metric.<br />
The program considers several broadening processes which must be applied to the<br />
theoretical spectra before comparison with an observed spectrum. These include instrumental<br />
broadening I(∆λ), rotational broadening V(v sin i, β), acceleration broadening<br />
A(v), and projection broadening P(v − v̄).<br />
Model grids are discrete in three dimensions: T<sub>eff</sub>, log g, and n<sub>atm</sub>, the fractional<br />
atmospheric abundance of an element. Linear interpolation in tables is used<br />
to estimate the model space between grid points. Fitting solutions can be obtained<br />
in several parameters (T<sub>eff</sub>, log g, n<sub>atm</sub>, v sin i, and v<sub>rad</sub>) for both single and composite<br />
spectra. The χ² minimisation can be carried out using either the Nelder-Mead<br />
downhill simplex optimisation procedure, implemented as a variant of the AMOEBA<br />
algorithm of Press et al. (1986), or the Levenberg-Marquardt algorithm (Levenberg,<br />
1944; Marquardt, 1963). Nearest neighbour χ² fitting is also possible.<br />
The Amoeba algorithm minimises a function (in this case, the χ² difference between<br />
the observed spectrum and the models in the grid) by defining an initial simplex with<br />
N + 1 vertices, where N is the number of dimensions in the function’s parameter space.<br />
The method then takes a series of steps, most of which just move the point of the<br />
simplex where the function to be minimised is largest through the opposite face of<br />
the simplex to a lower point. The simplex “moves” through the parameter space by<br />
contracting and expanding until the distance “moved” is smaller than some tolerance<br />
threshold, at which point the method is deemed to have converged on a solution.<br />
The Levenberg-Marquardt method is an iterative method specifically catering for<br />
the minimisation of sum-of-squares error functions (i.e., the χ² function used in SFIT).<br />
The algorithm expands the error function around a point and examines the derivatives<br />
to search for a minimum by dynamically setting the step size according to the direction<br />
of the gradient. As the solution approaches the minimum, the step size decreases, and<br />
the algorithm usually converges quickly. The use of this method in SFIT requires the<br />
initial guess for the parameters to be reasonably close to the solution, as<br />
Levenberg-Marquardt can get trapped in local minima. Although slower, Amoeba is more robust<br />
against this possibility.<br />
Both methods assume that the error function is either continuous, or can be evaluated<br />
for any point within the boundaries of the parameter space. Evaluation of the χ²<br />
error function in SFIT depends upon a grid of models which is discrete. As mentioned<br />
in the previous section, an interpolation method is used to “fill in the gaps” of the<br />
grid, thereby creating the illusion of a continuous parameter space. As the Amoeba or<br />
Levenberg-Marquardt optimisers examine the properties of the error function throughout<br />
the parameter space, they are more often than not examining an interpolation of<br />
the model spectra.<br />
Once T<sub>eff</sub>, log g, and v sin i have been determined, SFIT can estimate the composition<br />
of a star by adjusting the abundances of the different atomic species which<br />
contribute to the absorption spectrum until the theoretical spectrum matches the observed<br />
spectrum. As the number of free parameters in such an analysis is so large (i.e.,<br />
the abundances of H, C, N, O, Al, and so on, along with the microturbulent velocity,<br />
v<sub>t</sub>), pre-computing multidimensional grids of theoretical spectra is infeasible. SFIT<br />
solves this problem by computing synthetic spectra as demanded by the χ² minimisation<br />
algorithm.<br />
SFIT is currently distributed with STERNE and SPECTRUM, the model atmosphere<br />
and spectral synthesis codes used at the Armagh Observatory. As part of<br />
this thesis, the source codes for all three programs, and their associated libraries, were<br />
ported from a simple build system based on GNU make to a more flexible build system<br />
based on the GNU autotools (see Appendix E).<br />
3.2.1 Limitations of SFIT<br />
Analyses performed with SFIT are hindered by the program’s restrictions on the size<br />
of the model grid. Grids are limited to three dimensions (T<sub>eff</sub>, log g, and n<sub>atm</sub>), and, at<br />
maximum, nine points in T<sub>eff</sub>, five in log g, and five in n<sub>atm</sub>. Models are permitted to<br />
have no more than five thousand wavelength points.<br />
These limits are due to design decisions made during SFIT’s initial construction,<br />
which chose to store the model grid entirely in the computer’s main memory. The<br />
restrictions on dimensionality and number of grid points are merely hard-coded numbers<br />
within the program, and therefore cannot be changed without recompiling the source<br />
codes.<br />
Storing the model grid in main memory, whilst providing fast access to the models,<br />
also presents a problem in that main memory is finite, and orders of magnitude<br />
smaller than the space available on secondary storage devices, such as hard disks. Despite<br />
ever increasing main memory capacities, the implied upper limit on the number of<br />
models and their detail will always be much smaller than if secondary storage were<br />
used.<br />
Another restriction SFIT places on the model grid is that it must be rectangular<br />
and complete, with no missing grid points. This is problematic because it may be<br />
difficult or impossible for model atmosphere simulations to converge for a given set of<br />
physical parameters. In such an instance, a makeshift solution is employed wherein a<br />
converged model close to the desired physical parameters is used to “plug the gap”.<br />
The rectangularity and completeness requirements are a result of SFIT’s interpolation<br />
scheme, which generates approximations of models in the parameter space between<br />
discrete grid points by linear interpolation in tables. An irregular grid, or a missing<br />
grid point, prevents the interpolation scheme from operating correctly.<br />
3.2.2 Proposal to Remove SFIT’s Limitations<br />
In summary, the limitations of SFIT’s treatment of model grids are:<br />
1. Size limitations due to initial program design decisions and storage of grids in<br />
main memory.<br />
2. Interpolation scheme cannot handle irregular or incomplete grids.<br />
Modifying SFIT to be more useful in a data mining context requires removing these<br />
two limitations.<br />
The solution to the first limitation is obvious: correct the limiting initial design<br />
decisions, and store the model grids on secondary storage (i.e., hard disk), reading them<br />
into main memory individually and only when needed. An indexing scheme is<br />
then required, one that can be held in main memory in place of the models, and quickly<br />
searched to determine which models are to be read in and their location on disk.<br />
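The idea of keeping only an index in memory while the models stay on disk can be sketched as follows. This is an illustrative toy under stated assumptions, not SFIT's file format: the fixed record layout, the four-point models, and the names `write_grid` and `read_model` are invented for the example.

```python
import os
import struct
import tempfile

N_WAVE = 4                               # wavelength points per model (toy value)
RECORD = struct.Struct(f"<{N_WAVE}d")    # one fixed-length record per model

def write_grid(path, models):
    """Write fluxes to a fixed-record file; return {params: record number}."""
    index = {}
    with open(path, "wb") as fh:
        for record_no, (params, flux) in enumerate(models):
            fh.write(RECORD.pack(*flux))
            index[params] = record_no
    return index

def read_model(path, index, params):
    """Seek straight to one model's record and read only that model."""
    with open(path, "rb") as fh:
        fh.seek(index[params] * RECORD.size)
        return list(RECORD.unpack(fh.read(RECORD.size)))

# Invented (T_eff, log g, n_atm) models with four flux points each.
models = [((20000.0, 4.0, 0.1), [1.0, 0.9, 0.8, 1.0]),
          ((25000.0, 4.5, 0.1), [1.0, 0.7, 0.6, 1.0])]
with tempfile.TemporaryDirectory() as tmp:
    grid_file = os.path.join(tmp, "grid.dat")
    index = write_grid(grid_file, models)
    flux = read_model(grid_file, index, (25000.0, 4.5, 0.1))
```

Only the small dictionary `index` lives in memory; its size grows with the number of models, not with the number of wavelength points per model.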
The nature of this index is dependent on the interpolation scheme chosen to correct<br />
the second SFIT limitation. Interpolation allows a complicated function to be<br />
approximated at an unknown point by using known surrounding points to construct a<br />
simpler, estimating function. Different interpolation schemes use the known surrounding<br />
points in different ways, so the design and function of the proposed model grid<br />
indexing scheme must be tailored accordingly.<br />
Many interpolation schemes exist in the literature. The ideal scheme for this application<br />
should be multidimensional (although this can be relaxed due to the curse of<br />
dimensionality), have low computational cost, and be able to operate over potentially<br />
randomly sampled functions. The interpolating function must also be continuous, and<br />
be based on known data points local to the interpolation point (as opposed to global<br />
methods, in which the interpolated value is influenced by all of the available data).<br />
Two interpolation functions which stand out in terms of their simplicity, multidimensionality,<br />
and ability to handle incomplete grids are weighted average interpolation<br />
and simplex interpolation.<br />
Weighted Average Interpolation<br />
The most common weighted average method referred to in the literature is that of<br />
Shepard (1968) and its modifications, such as Renka (1988).<br />
Given an underlying function, f, with values f<sub>i</sub> at nodes (x<sub>i</sub>, y<sub>i</sub>) for i = 1, ..., N,<br />
the interpolating formula is of the form,<br />
\[ F(x,y) = \frac{\sum_{k=1}^{N} W_k(x,y)\, f_k(x,y)}{\sum_{i=1}^{N} W_i(x,y)}. \tag{3.2} \]<br />
The weighting function, W<sub>k</sub>, is defined by some inverse distance function,<br />
\[ W_k(x,y) = \frac{1}{d_k^2}, \tag{3.3} \]<br />
where d<sub>k</sub>(x,y) denotes the Euclidean distance between (x,y) and (x<sub>k</sub>, y<sub>k</sub>).<br />
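Equations 3.2 and 3.3 can be illustrated with a short sketch. It assumes the simplest case, in which each nodal term is the constant node value f<sub>k</sub> (as in Shepard's original method); the function name `shepard` and the optional `radius` cut-off are illustrative choices, not part of any existing code.

```python
def shepard(x, y, nodes, radius=None):
    """Inverse-distance weighted average of node values.

    nodes is a list of (x_k, y_k, f_k) tuples. If radius is given, only
    nodes within that distance of (x, y) contribute (a local variant).
    """
    numerator = denominator = 0.0
    for xk, yk, fk in nodes:
        d2 = (x - xk) ** 2 + (y - yk) ** 2      # squared distance d_k^2
        if d2 == 0.0:
            return fk                            # on a node: weight is unbounded
        if radius is not None and d2 > radius ** 2:
            continue                             # skip distant nodes
        weight = 1.0 / d2                        # Equation 3.3
        numerator += weight * fk
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no nodes within the given radius")
    return numerator / denominator               # Equation 3.2

nodes = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint_value = shepard(1.0, 0.0, nodes)
```

At the midpoint both nodes carry equal weight, so the result is the plain average of the two node values; at a node the interpolant returns that node's value exactly.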
A suitable indexing scheme for weighted average interpolation should allow fast<br />
searching of the node points to determine which are within a specified radius of the<br />
interpolation point (i.e., nearest neighbour searching).<br />
The field of computational geometry contains many algorithms and data structures<br />
for indexing and searching a set of N-dimensional points in a computationally efficient<br />
manner. One data structure that is well suited to nearest neighbour searching<br />
problems is the k-D tree (Moore, 1991). Figure 3.1 demonstrates the k-D tree in two<br />
dimensions.<br />
Figure 3.1: Example of a k-D tree in two dimensions. On the left is the representation<br />
of how the k-D tree on the right splits up the x,y plane. The nodes shown are [4,9], [2,5], [8,7], and [3,2]. (Adapted from Moore 1991.)<br />
This data structure is a binary tree which represents a series of partitions in k-dimensional<br />
space, organising a set of points into a collection of hyper-rectangular<br />
regions. Nearest neighbour searching can be carried out in O(log<sub>2</sub> N) time on average,<br />
where N is the number of nodes in the tree.<br />
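A k-D tree and its nearest neighbour search can be sketched in a few lines. This is a minimal illustration assuming median splits that cycle through the axes; the dictionary-based node layout and the function names are choices made for the example, and the four points are those labelled in Figure 3.1.

```python
def build(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the
    # best point found so far; this pruning gives the logarithmic average cost.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best

points = [(4, 9), (2, 5), (8, 7), (3, 2)]
tree = build(points)
closest = nearest(tree, (7, 6))
```

For weighted average interpolation the same traversal would be extended to collect every node within a given radius, or the k closest nodes, rather than a single point.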
All that remains is to determine how many nearest neighbours are needed, and the<br />
weighted average interpolation can be performed immediately.<br />
Simplex Interpolation<br />
A simplex, or N-simplex, is the N-D analogue of a triangle in 2-D and a tetrahedron<br />
in 3-D, as demonstrated in Figure 3.2.<br />
Simplex-based interpolation uses a weighted linear combination of the simplex vertices<br />
to approximate a function at a point located on or within the simplex boundary.<br />
These weights are computed as the barycentric coordinates of the interpolation point<br />
within the simplex.<br />
Given a collection of N-dimensional points, such as a grid of model spectra, a suitable<br />
indexing scheme must allow the vertices of the enclosing N-simplex to be located<br />
quickly.<br />
Figure 3.2: (a) A 1-simplex is a line segment. (b) A 2-simplex is a triangle. (c) A 3-simplex is a<br />
tetrahedron.<br />
As this is again a nearest neighbour problem, the method of k-D trees could<br />
be a viable solution. However, if the dimensionality of the grid is kept to three dimensions<br />
or fewer, the field of computational geometry offers another approach.<br />
Several algorithms exist which can take a cloud of two or three-dimensional points<br />
and generate a triangular or tetrahedral mesh. All that is then needed is a method to<br />
search the mesh for the triangle or tetrahedron that contains the interpolation point.<br />
Choosing the Solution<br />
Preliminary testing of both interpolation and indexing schemes was carried out to help<br />
determine which solution would be more viable.<br />
Constructing a suitable prototype of the weighted average/k-D tree solution was hindered<br />
by Fortran 90’s insufficient flexibility to support the implementation of advanced<br />
data structures. No suitable third-party libraries were available to speed development,<br />
and, as a result of time constraints, the pursuit of this solution had to be abandoned.<br />
On the other hand, if it is assumed that SFIT model grids are limited to three<br />
dimensions, then several freely available third-party libraries exist which can generate<br />
tetrahedral meshes from a cloud of random points. From a purely pragmatic standpoint,<br />
this makes the simplex interpolation scheme very attractive. After the mesh has<br />
been generated, the methods then required to search for the tetrahedron enclosing an<br />
interpolation point are simple geometric operations.<br />
Thus, the simplex interpolation method was chosen to solve SFIT’s grid management<br />
problems. The weighted-average/k-D tree solution is an interesting idea (which,<br />
unlike the simplex scheme, is not limited to three dimensions), and should be pursued<br />
in future work.<br />
3.3 Tetrahedralisation: Interpolation and Indexing<br />
In developing the simplex interpolation and corresponding grid indexing scheme, it<br />
was assumed that SFIT grids will always be three dimensional due to the curse of<br />
dimensionality.<br />
From this assumption, the tetrahedral mesh indexing scheme, described previously,<br />
can be constructed using third-party libraries. This affords a very pragmatic solution<br />
to the problem.<br />
3.3.1 Simplex Interpolation<br />
Barycentric coordinates express the location of any point within an N-simplex in terms<br />
of a set of homogeneous coordinates that form a linear combination of the simplex<br />
vertices. Given a tetrahedron defined by four arbitrary vertices, v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>, and<br />
v<sub>4</sub>, and some point p within this tetrahedron, p can be expressed as the weighted<br />
combination of the four vertices<br />
\[ p = \lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3 + \lambda_4 v_4, \tag{3.4} \]<br />
where λ<sub>1</sub>, λ<sub>2</sub>, λ<sub>3</sub>, and λ<sub>4</sub> are the barycentric coordinates. These are subject to the<br />
constraints that<br />
\[ 0 \le \lambda_1, \lambda_2, \lambda_3, \lambda_4 \le 1, \tag{3.5} \]<br />
and<br />
\[ \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1. \tag{3.6} \]<br />
Calculating the barycentric coordinates of a point inside a given tetrahedron is<br />
accomplished by reformulating equation 3.4 as follows,<br />
\[
\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix}
=
\begin{bmatrix}
v_{1x} & v_{2x} & v_{3x} & v_{4x} \\
v_{1y} & v_{2y} & v_{3y} & v_{4y} \\
v_{1z} & v_{2z} & v_{3z} & v_{4z} \\
1 & 1 & 1 & 1
\end{bmatrix}
\cdot
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \end{bmatrix},
\tag{3.7}
\]<br />
or, rewriting in matrix notation,<br />
\[ b = A \cdot x, \tag{3.8} \]<br />
where b = [ p<sub>x</sub> p<sub>y</sub> p<sub>z</sub> 1 ]<sup>T</sup>, x = [ λ<sub>1</sub> λ<sub>2</sub> λ<sub>3</sub> λ<sub>4</sub> ]<sup>T</sup>, and<br />
\[
A =
\begin{bmatrix}
v_{1x} & v_{2x} & v_{3x} & v_{4x} \\
v_{1y} & v_{2y} & v_{3y} & v_{4y} \\
v_{1z} & v_{2z} & v_{3z} & v_{4z} \\
1 & 1 & 1 & 1
\end{bmatrix}.
\]<br />
Therefore, x can be found through the standard methods of solving equation 3.8.<br />
As will be useful later on, if the computed barycentric coordinates do not conform<br />
to the constraints of equations 3.5 and 3.6, then the point of interest can be determined to lie<br />
outside the given tetrahedron.<br />
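The calculation in equations 3.7 and 3.8, together with the containment test, can be sketched as follows. This is a minimal sketch that assumes Gaussian elimination with partial pivoting as the "standard method" for the 4×4 system; the function names and the tolerance `tol` are illustrative assumptions, not SFIT routines.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate tetrahedron")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                       # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def barycentric(p, verts):
    """Barycentric coordinates of p in the tetrahedron verts (Equation 3.7)."""
    a = [[float(v[k]) for v in verts] for k in range(3)] + [[1.0] * 4]
    return solve(a, [p[0], p[1], p[2], 1.0])

def contains(lams, tol=1e-9):
    """Constraint 3.5; constraint 3.6 is enforced by the last row of A."""
    return all(-tol <= lam <= 1.0 + tol for lam in lams)

def interpolate(lams, values):
    """Weighted linear combination of the values held at the four vertices."""
    return sum(lam * f for lam, f in zip(lams, values))

verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
lams = barycentric((0.25, 0.25, 0.25), verts)
value = interpolate(lams, [0.0, 1.0, 2.0, 3.0])   # samples of f = x + 2y + 3z
outside = barycentric((1.0, 1.0, 1.0), verts)
```

For model spectra, `interpolate` would be applied once per wavelength point, with `values` holding the four vertex models' fluxes at that wavelength. Because the scheme is linear, it reproduces any linear function exactly, as the example value shows.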
3.3.2 Grid Index - Delaunay Triangulation<br />
The Delaunay triangulation (O’Rourke, 1998) is frequently used to generate meshes<br />
of N-simplices from a set of N-dimensional points because it has certain desirable<br />
properties, the most important of which is the following: inside the circum-hypersphere<br />
of any simplex, there are no other points of the set (see Figure 3.3).<br />
This property yields a triangulation which is “natural” and provably optimal<br />
in many respects. It is known that the Delaunay triangulation exists and is unique<br />
for a set of points in general position, that is, no N + 1 points are on the same hyperplane<br />
and no N + 2 points are on the same hypersphere, for an N-dimensional set of<br />
points.<br />
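The empty circum-hypersphere property can be checked directly. The sketch below does so in two dimensions using the standard in-circle determinant; it only verifies a given triangulation and does not construct one, and the function names and example points are assumptions made for illustration.

```python
def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    # The sign of det depends on the vertex order, so multiply by the
    # orientation of abc to accept clockwise and anticlockwise triangles.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0

def is_delaunay(points, triangles):
    """Check that no triangle's circumcircle contains another point of the set."""
    for tri in triangles:
        a, b, c = (points[i] for i in tri)
        for j, d in enumerate(points):
            if j not in tri and in_circumcircle(a, b, c, d):
                return False
    return True

points = [(0, 0), (2, 0), (0, 2), (3, 3)]
good = [(0, 1, 2), (1, 3, 2)]    # quadrilateral split along one diagonal
bad = [(0, 1, 3), (0, 3, 2)]     # the same quadrilateral, other diagonal
```

The two candidate triangulations cover the same quadrilateral; only the first satisfies the Delaunay property. A production implementation would use exact arithmetic for the determinant rather than plain floating point.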
In the context of SFIT, the Delaunay tetrahedralisation of a model grid is generated<br />
by the third-party library TetGen<sup>1</sup>.<br />
TetGen is a portable C++ program implementing the Delaunay triangulation algorithm<br />
of Edelsbrunner & Shah (1992). This algorithm is simple and fast, and TetGen’s<br />
implementation is numerically robust due to the use of adaptive exact arithmetic code<br />
(Shewchuk, 1996). TetGen can be compiled as a set of library functions which can then<br />
be integrated into other applications, in this case, SFIT.<br />
A technical difficulty arises in that SFIT is a Fortran 90 program, but TetGen is<br />
written in C++. Unfortunately, the Fortran 90 standard does not provide for calling<br />
functions written in other programming languages, and it has been left up to the<br />
individual compiler implementors to include a solution.<br />
SFIT is currently based around the Intel Fortran compiler for Linux<sup>2</sup>, and it is relatively<br />
straightforward to call out to C/C++ functions using the mechanisms provided<br />
by this compiler.<br />
<sup>1</sup> http://tetgen.berlios.de<br />
<sup>2</sup> http://www.intel.com/cd/software/products/asmo-na/eng/compilers/flin/<br />
Figure 3.3: In two dimensions, the Delaunay triangulation guarantees that no other<br />
points lie in the circumcircle of any simplex.<br />
To simplify the process of calling the TetGen library from Fortran, a small “glue”<br />
function was written in C. This function accepts a flattened array of three-dimensional<br />
model grid points, copies the data into the data structure used by TetGen, calls TetGen<br />
to perform the tetrahedralisation, then returns a flattened array of vertices for the<br />
generated tetrahedra, and a flattened array denoting the neighbouring tetrahedra for<br />
each generated tetrahedron.<br />
This process of calling TetGen to construct the new model grid indexing scheme fits<br />
in with SFIT’s normal grid generation procedure, as outlined in algorithm 1.<br />
Algorithm 1 Generating a Tetrahedralisation of a Model Grid<br />
for all models in the grid file list do<br />
  read model parameters from header<br />
  record each parameter in corresponding grid axis array<br />
  write model fluxes to direct access grid file<br />
  append model parameters and corresponding direct access file record numbers to a linked list<br />
end for<br />
rescale parameters in linked list to yield more optimal tetrahedra<br />
flatten the parameters in linked list to an array of 3D points<br />
pass array of points into TetGen {TetGen returns two arrays: a list of tetrahedra vertices, and a list of tetrahedra neighbours}<br />
write grid axis arrays to beginning of index file<br />
write linked list of model data to index file<br />
write list of tetrahedra vertices to index file<br />
write list of tetrahedra neighbours to index file<br />
As noted in the pseudo-code, the parameters of the models are rescaled before being<br />
passed to TetGen. This allows the generation of a mesh which is composed, more<br />
optimally, of “fat” tetrahedra, avoiding degenerate tetrahedra or “slivers” which would<br />
cause numerical problems for the simplex interpolation scheme and the point location<br />
algorithm outlined in the next section.<br />
Such degenerate tetrahedra would arise because of a scale disparity between the<br />
model grid axes. For instance, the T<sub>eff</sub> axis contains effective temperatures measured in<br />
kelvin and scaled down by a factor of 100.<br />
On the other hand, the n<sub>atm</sub> axis typically contains fractional values 0 ≤ n<sub>atm</sub> ≤ 1.<br />
This disparity means that model grids are very compact in the n<sub>atm</sub> dimension, and<br />
comparatively widely spaced in the T<sub>eff</sub> dimension.<br />
Given the model grid axis arrays accumulated during the model grid creation process<br />
(which typically correspond to the dimensions of T<sub>eff</sub>, log g, and n<sub>atm</sub>), each axis is<br />
rescaled in the following manner.<br />
Let A<sub>i</sub> be the i<sup>th</sup> model grid axis comprising the list of m monotonically increasing<br />
points {a<sub>i1</sub>, a<sub>i2</sub>, ..., a<sub>im</sub>}. A<sub>i</sub> is rescaled according to the mapping function f : A<sub>i</sub> → R<sub>i</sub><br />
such that f(a) for every a ∈ A<sub>i</sub> is defined as<br />
\[ f(a) = \frac{a - a_{i1}}{a_{i2} - a_{i1}} \times 100. \tag{3.9} \]<br />
This simple function translates A<sub>i</sub> to the origin, and rescales the points onto a more<br />
widely spaced grid. Assuming a constant distance between adjacent points a<sub>ij</sub>, this mapping<br />
yields a list of m monotonically increasing points R<sub>i</sub> = {0, 100, 200, ..., (m − 1) × 100}.<br />
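Equation 3.9 translates directly into code. The sketch below is illustrative only; the function name `rescale_axis` and the example axis values are assumptions, not taken from SFIT.

```python
def rescale_axis(axis, spacing=100.0):
    """Map a monotonically increasing grid axis using Equation 3.9.

    The first point goes to 0 and the second to `spacing`; every other
    point is scaled by the same factor.
    """
    first, second = axis[0], axis[1]
    return [(a - first) / (second - first) * spacing for a in axis]

teff_axis = rescale_axis([20000.0, 22000.0, 24000.0, 26000.0])
logg_axis = rescale_axis([4.0, 4.5, 5.0])
uneven_axis = rescale_axis([4.0, 4.5, 6.0])
```

With evenly spaced input, every axis maps onto 0, 100, 200, and so on, whatever its original units, which removes the scale disparity between axes. If the spacing is uneven, the unit of the rescaled axis is set by the first interval alone, as the last example illustrates.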
3.3.3 Navigating the Index - Point Location<br />
The algorithm for locating the tetrahedron which encloses any given interpolation point<br />
is based on a randomised jump-and-walk methodology, inspired by the work of Mücke<br />
et al. (1996).<br />
The basic idea is simple. A “good starting point” is established by randomly sampling<br />
the set of tetrahedra. The distances between each tetrahedron’s centroid and the<br />
given interpolation point are calculated, and the tetrahedron closest to the interpolation<br />
point is selected.<br />
A line segment is then constructed using the chosen tetrahedron’s centroid and the<br />
interpolation point. The tetrahedron containing the interpolation point is located by<br />
“walking through” the tetrahedra which intersect this line. Figure 3.4 illustrates the<br />
concept in two dimensions.<br />
More formally, given <strong>the</strong> tetrahedralisation D <strong>of</strong> a model grid containing n tetrahedra,<br />
and an interpolation point p (rescaled using Equation 3.9), <strong>the</strong> following procedure<br />
locates <strong>the</strong> tetrahedron <strong>of</strong> D, if any, which contains p:<br />
1. Select m tetrahedra T 1 , · · · ,T m at random from D, where $m = \lceil 2n^{1/3} \rceil$.<br />
Figure 3.4: The line segment, L, is constructed using <strong>the</strong> centroid <strong>of</strong> <strong>the</strong> starting<br />
tetrahedron, T, and <strong>the</strong> interpolation point, p. The tetrahedra visited on <strong>the</strong> walk-through<br />
are coloured grey.<br />
2. Determine <strong>the</strong> index j ∈ {1, · · · ,m} <strong>of</strong> <strong>the</strong> tetrahedron minimising <strong>the</strong> Euclidean<br />
distance d(centroid(T j ),p). Set T = T j .<br />
3. Locate <strong>the</strong> tetrahedron containing p (if it exists) by traversing all tetrahedra<br />
intersected by <strong>the</strong> line segment L = (centroid(T),p).<br />
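Steps 1 and 2 of the procedure above (the "jump") can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, each tetrahedron is assumed to be stored as four (x, y, z) vertex tuples, and the sample size is capped at n for very small meshes.

```python
import math
import random


def centroid(vertices):
    """Centroid of a tetrahedron given as four (x, y, z) vertices."""
    return tuple(sum(v[k] for v in vertices) / len(vertices) for k in range(3))


def sample_size(n):
    """Number of tetrahedra to sample, m = ceil(2 * n**(1/3))."""
    return math.ceil(2.0 * n ** (1.0 / 3.0))


def closest_sampled_tetrahedron(tetrahedra, p, rng=random):
    """Index of the randomly sampled tetrahedron whose centroid is nearest p."""
    n = len(tetrahedra)
    m = min(n, sample_size(n))            # cannot sample more than n items
    candidates = rng.sample(range(n), m)  # step 1: random selection
    # Step 2: minimise the Euclidean distance d(centroid(T_j), p).
    return min(candidates, key=lambda j: math.dist(centroid(tetrahedra[j]), p))
```

The index returned is only a good starting point; the enclosing tetrahedron is still found by the walk-through of step 3.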
Step 3 is implemented in constant time per tetrahedron visited once <strong>the</strong> initial<br />
tetrahedron, intersected by L and incident on starting point T, is determined. This is<br />
due to <strong>the</strong> fact that TetGen conveniently returns an array which describes, for every<br />
tetrahedron in <strong>the</strong> mesh, which tetrahedra are its neighbours.<br />
The implementation <strong>of</strong> <strong>the</strong> walk-through mechanism is based on <strong>the</strong> fast ray-triangle<br />
intersection algorithm <strong>of</strong> Möller & Trumbore (1997), which is straightforward.<br />
A ray $R(t)$ with origin $O$ and normalised direction $D$ is defined as<br />
$$R(t) = O + tD, \qquad (3.10)$$<br />
and a triangle is defined by three vertices $V_0$, $V_1$, and $V_2$. A point, $T(u,v)$, on a<br />
triangle is given by<br />
$$T(u,v) = (1 - u - v)V_0 + uV_1 + vV_2, \qquad (3.11)$$<br />
where $(u,v)$ are <strong>the</strong> barycentric coordinates which must fulfill $u, v \geq 0$ and $u + v \leq 1$.<br />
Computing <strong>the</strong> intersection between <strong>the</strong> ray, $R(t)$, and <strong>the</strong> triangle, $T(u,v)$, is<br />
equivalent to $R(t) = T(u,v)$, which yields<br />
$$O + tD = (1 - u - v)V_0 + uV_1 + vV_2. \qquad (3.12)$$<br />
Rearranging <strong>the</strong> terms gives<br />
$$\begin{bmatrix} -D, & V_1 - V_0, & V_2 - V_0 \end{bmatrix} \begin{bmatrix} t \\ u \\ v \end{bmatrix} = O - V_0. \qquad (3.13)$$<br />
The barycentric coordinates (u,v) and <strong>the</strong> distance, t, from <strong>the</strong> ray origin to <strong>the</strong><br />
intersection point can be found by solving <strong>the</strong> linear system <strong>of</strong> equations above. If <strong>the</strong><br />
barycentric coordinates meet <strong>the</strong> requirements stipulated earlier, <strong>the</strong>n <strong>the</strong> ray intersects<br />
<strong>the</strong> triangle.<br />
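The linear system of Equation 3.13 can be solved with a few cross and dot products, following the approach published by Möller & Trumbore (1997). The Python sketch below is illustrative rather than the code used in SFIT; the function and helper names are hypothetical, and the tolerance `eps` is an assumed value.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-12):
    """Return (t, u, v) if the ray's line hits the triangle, otherwise None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, e2)
    det = dot(e1, pvec)
    if abs(det) < eps:
        return None                       # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)                # right-hand side of Equation 3.13
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, e1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, qvec) * inv_det           # signed distance along the ray
    return t, u, v
```

For a line segment of length ℓ, as used in the walk-through, the caller would additionally require 0 ≤ t ≤ ℓ, since the function itself does not restrict the sign of t.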
From <strong>the</strong> starting point <strong>of</strong> <strong>the</strong> walk-through method, each triangular face <strong>of</strong> <strong>the</strong><br />
tetrahedron is tested using this algorithm to determine if it is intersected by <strong>the</strong> line<br />
segment L. If an intersecting face is found, <strong>the</strong> walk-through moves to <strong>the</strong> tetrahedron<br />
opposite that face (in constant time).<br />
This new tetrahedron is first tested to see if it contains point p by way <strong>of</strong> <strong>the</strong> simplex<br />
interpolation method discussed in section 3.3.1. If <strong>the</strong> tetrahedron does not contain<br />
p, <strong>the</strong> ray-triangle intersection test is performed, and <strong>the</strong> walk-through moves to <strong>the</strong><br />
neighbouring tetrahedron on <strong>the</strong> o<strong>the</strong>r side <strong>of</strong> <strong>the</strong> face intersected by <strong>the</strong> ray. If <strong>the</strong><br />
tetrahedron does contain p, <strong>the</strong>n <strong>the</strong> walk-through procedure can terminate successfully<br />
by returning <strong>the</strong> interpolation weights (i.e., <strong>the</strong> barycentric coordinates) obtained from<br />
<strong>the</strong> point-in-simplex test.<br />
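The point-in-simplex test and the interpolation weights it returns can be sketched as below. The sketch assumes that the test amounts to solving for the barycentric coordinates of p and checking that none is negative; the function names, the tolerance `tol`, and the use of Cramer's rule are illustrative choices, not details taken from SFIT.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def barycentric(tet, p):
    """Barycentric coordinates of p with respect to the four vertices of tet."""
    v0, v1, v2, v3 = tet
    e1, e2, e3 = sub(v1, v0), sub(v2, v0), sub(v3, v0)
    r = sub(p, v0)
    scale = det3(e1, e2, e3)              # six times the signed volume
    if scale == 0.0:
        raise ValueError("degenerate tetrahedron")
    b1 = det3(r, e2, e3) / scale          # Cramer's rule, one column at a time
    b2 = det3(e1, r, e3) / scale
    b3 = det3(e1, e2, r) / scale
    return (1.0 - b1 - b2 - b3, b1, b2, b3)


def interpolate(tet, values, p, tol=1e-12):
    """Weighted sum of vertex values, or None if p lies outside tet."""
    weights = barycentric(tet, p)
    if min(weights) < -tol:
        return None
    return sum(w * f for w, f in zip(weights, values))
```

In a spectral grid, `values` would hold one quantity per vertex model (for example, the flux at a given wavelength), and the same four weights would be reused for every wavelength point.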
It is possible that point p could lie outside <strong>the</strong> convex hull <strong>of</strong> <strong>the</strong> tetrahedralisation.<br />
The walk-through algorithm recognises this eventuality when <strong>the</strong> line segment L intersects<br />
<strong>the</strong> face <strong>of</strong> a tetrahedron which is a member <strong>of</strong> <strong>the</strong> convex hull and <strong>the</strong>refore has<br />
no neighbour listed in <strong>the</strong> array returned by TetGen.<br />
Ra<strong>the</strong>r than allowing <strong>the</strong> walk-through algorithm to spend time traversing <strong>the</strong> tetrahedralisation<br />
in order to discover that point p lies outside <strong>the</strong> convex hull, it is possible<br />
to test for this case immediately after forming <strong>the</strong> line segment L.<br />
In addition to generating <strong>the</strong> Delaunay tetrahedralisation <strong>of</strong> a model grid, TetGen is<br />
also able to return a list <strong>of</strong> those tetrahedron faces which comprise <strong>the</strong> convex hull. After<br />
forming <strong>the</strong> line segment L, each <strong>of</strong> <strong>the</strong>se faces could <strong>the</strong>n be tested for intersection.<br />
However, <strong>the</strong> choice <strong>of</strong> method makes little practical difference because, if point p lies<br />
outside <strong>the</strong> convex hull, <strong>the</strong> simplex interpolation method dictates that SFIT can no<br />
longer proceed with a fitting run and must stop.<br />
In summary, pseudo-code for <strong>the</strong> algorithms outlined in this section is given in<br />
Algorithms 2, 3, and 4.<br />
Algorithm 2 Locating a Point in a Tetrahedralisation<br />
rescale point, p, onto axes <strong>of</strong> rescaled model grid<br />
if no starting tetrahedron exists <strong>the</strong>n<br />
find close starting tetrahedron by random selection<br />
end if<br />
walk through tetrahedralisation<br />
if enclosing tetrahedron found <strong>the</strong>n<br />
return barycentric coordinates <strong>of</strong> point p within <strong>the</strong> tetrahedron<br />
else<br />
point lies outside <strong>the</strong> convex hull <strong>of</strong> <strong>the</strong> tetrahedralisation<br />
exit SFIT<br />
end if<br />
Algorithm 3 Finding Walk-Through Starting Point<br />
select at random $m = \lceil 2n^{1/3} \rceil$ tetrahedra from <strong>the</strong> tetrahedralisation, where n is <strong>the</strong><br />
total number <strong>of</strong> tetrahedra in <strong>the</strong> tetrahedralisation<br />
compute <strong>the</strong> Euclidean distance from each selected tetrahedron’s centroid to <strong>the</strong><br />
interpolation point<br />
return <strong>the</strong> index <strong>of</strong> <strong>the</strong> closest tetrahedron<br />
Algorithm 4 Walk-Through <strong>of</strong> Tetrahedralisation<br />
construct <strong>the</strong> line segment, L, from given starting tetrahedron’s centroid to <strong>the</strong> interpolation<br />
point, p<br />
current tetrahedron = given starting tetrahedron<br />
loop<br />
if current tetrahedron contains <strong>the</strong> interpolation point <strong>the</strong>n<br />
return <strong>the</strong> barycentric coordinates <strong>of</strong> its location<br />
else<br />
test each triangular face <strong>of</strong> <strong>the</strong> current tetrahedron for intersection with L<br />
current tetrahedron = neighbouring tetrahedron on o<strong>the</strong>r side <strong>of</strong> intersected face<br />
if current tetrahedron is null <strong>the</strong>n<br />
interpolation point lies outside convex hull<br />
exit SFIT<br />
end if<br />
end if<br />
end loop<br />
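The control flow of Algorithm 4 can be demonstrated with a two-dimensional analogue, in which triangles stand in for tetrahedra and edges for faces (as in Figure 3.4). The sketch below is illustrative only: the mesh, the function names, and the use of -1 to mark a missing neighbour on the convex hull are assumptions, and degenerate cases (such as the segment passing exactly through a vertex) are not handled.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if anticlockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def contains(tri, p):
    """True if p lies inside or on the boundary of the triangle."""
    d = [orient(tri[k], tri[(k + 1) % 3], p) for k in range(3)]
    return all(x >= 0 for x in d) or all(x <= 0 for x in d)


def crosses(a, b, c, d):
    """True if segment ab crosses segment cd (degenerate cases ignored)."""
    return (orient(a, b, c) * orient(a, b, d) <= 0
            and orient(c, d, a) * orient(c, d, b) <= 0)


def walk(vertices, triangles, neighbours, start, p):
    """Walk from triangle `start` towards p; return (index or None, visited)."""
    corners = [vertices[i] for i in triangles[start]]
    origin = (sum(v[0] for v in corners) / 3.0, sum(v[1] for v in corners) / 3.0)
    current, previous, visited = start, None, [start]
    while True:
        tri = [vertices[i] for i in triangles[current]]
        if contains(tri, p):
            return current, visited
        for k in range(3):
            nxt = neighbours[current][k]   # neighbour across the edge opposite vertex k
            if nxt == previous:
                continue                   # do not step back through the entry edge
            if crosses(origin, p, tri[(k + 1) % 3], tri[(k + 2) % 3]):
                break
        else:
            raise RuntimeError("no exit edge found (degenerate configuration)")
        if nxt == -1:
            return None, visited           # the segment leaves the convex hull
        previous, current = current, nxt
        visited.append(current)


# A strip of four triangles covering the rectangle [0, 2] x [0, 1].
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
triangles = [(0, 1, 2), (1, 3, 2), (1, 4, 3), (4, 5, 3)]
neighbours = [(1, -1, -1), (-1, 0, 2), (3, 1, -1), (-1, 2, -1)]

print(walk(vertices, triangles, neighbours, 0, (1.8, 0.7)))  # (3, [0, 1, 2, 3])
print(walk(vertices, triangles, neighbours, 0, (3.0, 0.5)))  # (None, [0, 1, 2, 3])
```

Each step costs a constant amount of work because the neighbour across an edge is looked up directly rather than searched for, which is the property of the neighbour array described earlier in this section.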
3.4 Testing <strong>the</strong> Modifications<br />
The new simplex interpolation and indexing scheme was tested against <strong>the</strong> previous<br />
SFIT grid storage and linear interpolation in tables method using two case studies. The<br />
first conducts an analysis <strong>of</strong> a spectrum from <strong>the</strong> extreme helium star BD+10 2179<br />
(Klemola, 1961) to allow a comparison <strong>of</strong> each <strong>of</strong> <strong>the</strong> different optimisation routines<br />
<strong>of</strong>fered by SFIT over <strong>the</strong> two interpolation schemes. The second uses a coarse grid <strong>of</strong><br />
<strong>the</strong>oretical models to parameterise a large number <strong>of</strong> o<strong>the</strong>r models to give an indication<br />
<strong>of</strong> <strong>the</strong> accuracy <strong>of</strong> <strong>the</strong> two interpolation schemes whilst keeping <strong>the</strong> optimisation method<br />
constant.<br />
Case Study 1: BD+10 2179<br />
The observed high-resolution echelle spectrum <strong>of</strong> BD+10 2179 used in this study covers<br />
<strong>the</strong> wavelength range 3760–5230 Å, at a dispersion <strong>of</strong> 0.1 Å pixel$^{-1}$. The spectrum<br />
has already been wavelength calibrated and normalised. Both versions <strong>of</strong> SFIT fit a<br />
window, 4054–4545 Å, <strong>of</strong> this spectrum to a grid <strong>of</strong> 48 <strong>the</strong>oretical models covering <strong>the</strong><br />
parameter space as described in Table 3.1.<br />
Parameter | Values<br />
T eff (K) | 14,000, 16,000, 18,000, 20,000<br />
log g | 2.00, 2.50, 3.00, 3.50<br />
n He | 0.9960, 0.9890, 0.9690<br />
Table 3.1: Details <strong>of</strong> <strong>the</strong> model grid used in <strong>the</strong> comparison.<br />
The grid also has a latent fourth parameter in carbon abundance.<br />
For each analysis, <strong>the</strong> same initial guesses for each parameter were used for <strong>the</strong><br />
Amoeba and Levenberg-Marquardt optimisation methods. These have been chosen to<br />
be close to expected values <strong>of</strong> <strong>the</strong> final parameter. They, and <strong>the</strong> step sizes given to<br />
<strong>the</strong> Amoeba routine, are listed in Table 3.2.
Parameter | Initial Value | Amoeba Step Size<br />
T eff (kK) | 17.0 | 2.0<br />
log g (dex) | 2.5 | 0.5<br />
n He | 0.989 | 0.01<br />
v sin i | 27.5 | 10.0<br />
v rad | 137.4 | 10.0<br />
Table 3.2: Initial parameters used for <strong>the</strong> Amoeba and Levenberg-Marquardt optimisation<br />
routines. The step sizes used for Amoeba are also given.<br />
An analysis begins by fixing <strong>the</strong> helium abundance, and solving for T eff , log g, v sin i,<br />
and v rad . Then, <strong>the</strong> values for <strong>the</strong>se parameters are fixed, and a solution is found for<br />
n He which, with <strong>the</strong> latent n C parameter, is effectively a first approximation for n C .<br />
Finally, <strong>the</strong> value <strong>of</strong> n He is fixed again, and <strong>the</strong> solutions for T eff and log g are checked.<br />
The results obtained by each optimisation method available in SFIT (Nelder-Mead<br />
simplex (Amoeba), Levenberg-Marquardt (LM), and nearest neighbour (NN) fitting)<br />
are presented in Tables 3.3 and 3.4 for both <strong>the</strong> original SFIT and <strong>the</strong> modified SFIT.<br />
Listed in paren<strong>the</strong>ses for each parameter are <strong>the</strong> standard errors generated by SFIT.<br />
Unmodified SFIT<br />
Parameter | Amoeba | LM | NN<br />
T eff (kK) | 18.000 (±0.014) | 18.087 (±0.016) | 18.00 (±0.011)<br />
log g | 2.743 (±0.004) | 2.747 (±0.004) | 2.50 (±0.003)<br />
n He | 0.997 (±0.001) | 0.994 (±0.001) | 0.996 (±0.001)<br />
v sin i | 33.44 (±0.13) | 36.9 (±0.15) | 27.50 (±0.11)<br />
v rad | 136.23 (±0.0) | 137.4 (±0.0) | 137.4 (±0.0)<br />
χ² Fit | 9.00 | 9.90 | 9.96<br />
Time (secs) | 54.3 | 11.42 | 27.99<br />
Table 3.3: Results <strong>of</strong> BD+10 2179 analysis with <strong>the</strong> unmodified version <strong>of</strong> SFIT<br />
This is a satisfactory result which shows that <strong>the</strong> simplex interpolation and grid<br />
indexing scheme performs slightly better (in terms <strong>of</strong> <strong>the</strong> final χ 2 value) than <strong>the</strong><br />
original linear interpolation in tables method. There is also a small gain in terms <strong>of</strong><br />
execution speed <strong>of</strong> <strong>the</strong> Amoeba method with <strong>the</strong> new simplex-based scheme.<br />
Modified SFIT<br />
Parameter | Amoeba | LM | NN<br />
T eff (kK) | 18.150 (±0.005) | 17.870 (±0.015) | 18.00 (±0.012)<br />
log g | 2.836 (±0.004) | 2.687 (±0.005) | 2.50 (±0.003)<br />
n He | 0.993 (±0.001) | 0.992 (±0.001) | 0.996 (±0.001)<br />
v sin i | 33.46 (±0.13) | 36.90 (±0.14) | 27.50 (±0.12)<br />
v rad | 136.22 (±0.0) | 137.4 (±0.0) | 137.4 (±0.0)<br />
χ² Fit | 8.88 | 9.770 | 10.20<br />
Time (secs) | 20.649 | 14.01 | 4.71<br />
Table 3.4: Results <strong>of</strong> BD+10 2179 analysis with <strong>the</strong> modified version <strong>of</strong> SFIT<br />
It should be noted that <strong>the</strong> 6-fold increase in speed for nearest neighbour searching<br />
reported in Table 3.4 is due to a re-write <strong>of</strong> some SFIT internals to take advantage <strong>of</strong><br />
<strong>the</strong> data structures used in <strong>the</strong> simplex interpolation scheme. The data structures allow<br />
a fast iteration over all <strong>the</strong> models in a grid, reading each in from disk as needed. This<br />
means that <strong>the</strong> χ 2 computation is being performed directly with <strong>the</strong> model itself, in<br />
contrast with <strong>the</strong> linear interpolation in tables scheme which actually tries to interpolate<br />
to <strong>the</strong> grid point instead <strong>of</strong> accessing <strong>the</strong> model directly. This is ano<strong>the</strong>r design flaw<br />
in SFIT that <strong>the</strong> simplex-based scheme corrects. The difference in methodology also<br />
accounts for <strong>the</strong> slightly different χ 2 values for nearest neighbour fitting listed in Tables<br />
3.3 and 3.4.<br />
Case Study 2: Model-Based <strong>Analysis</strong><br />
The grid <strong>of</strong> <strong>the</strong>oretical models used in this case study is given in Table 3.5. It coarsely<br />
covers almost <strong>the</strong> entire parameter space <strong>of</strong> models available in <strong>the</strong> <strong>Armagh</strong> <strong>Observatory</strong><br />
archives.<br />
Parameter | Values<br />
T eff (kK) | 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 50.0<br />
log g | 3.00, 4.00, 5.00, 6.00<br />
n He | 0.001, 0.1, 0.5, 0.9, 0.999<br />
Table 3.5: The model grid used to obtain physical parameters <strong>of</strong> <strong>the</strong> set <strong>of</strong> test models.<br />
The rationale <strong>of</strong> <strong>the</strong> experiment is to use this grid to parameterise a large set <strong>of</strong><br />
models which fall within its boundaries, but are not actually used in <strong>the</strong> grid. Keeping<br />
<strong>the</strong> optimisation method constant, <strong>the</strong> results <strong>of</strong> <strong>the</strong> parameterisations will give an
indication <strong>of</strong> <strong>the</strong> relative accuracy <strong>of</strong> <strong>the</strong> two interpolation schemes.<br />
A total <strong>of</strong> 1238 models were selected to be parameterised by each version <strong>of</strong> SFIT. Each model<br />
was convolved with a Gaussian <strong>of</strong> 1 Å FWHM to degrade its resolution slightly, and<br />
<strong>the</strong>n resampled onto a wavelength grid <strong>of</strong> 4050–4950 Å.<br />
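The degradation step can be illustrated with a direct Gaussian convolution. The sketch below is not the procedure used to prepare the test models; it assumes a uniformly sampled spectrum, truncates the kernel at four standard deviations, and renormalises the kernel at the ends of the array. The function names and the example step size are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_smooth(flux, step, fwhm):
    """Convolve a uniformly sampled spectrum with a Gaussian of the given FWHM.

    `step` is the wavelength spacing of the samples, in the same units as `fwhm`.
    """
    sigma = fwhm_to_sigma(fwhm) / step          # kernel width in pixels
    half = int(math.ceil(4.0 * sigma))          # truncate at four sigma
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(flux)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(flux):              # skip samples beyond the array ends
                num += w * flux[j]
                den += w
        smoothed.append(num / den)              # renormalise the truncated kernel
    return smoothed


# Example: a single narrow feature on a flat continuum, sampled every 0.1 Å.
spectrum = [1.0] * 50
spectrum[25] = 0.0
broadened = gaussian_smooth(spectrum, step=0.1, fwhm=1.0)
```

After smoothing, the feature is shallower and wider, while the continuum level far from it is unchanged.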
The optimisation method used was Nelder-Mead simplex, with initial parameters<br />
and step sizes as follows: T eff = 30kK, δT eff = 5.0kK; log g = 4.5, δ log g = 1.0; n He =<br />
0.5, δn He = 0.1. Results are presented in Figures 3.5 to 3.7, and in Table 3.6.<br />
Before discussing <strong>the</strong> results, <strong>the</strong> presence <strong>of</strong> some anomalies in <strong>the</strong> linear interpolation<br />
in tables parameterisations must be noted and dealt with. Figure 3.5 plots <strong>the</strong><br />
parameterisation results for all <strong>of</strong> <strong>the</strong> 1238 models. At T eff ∼ 50,000K, <strong>the</strong> optimiser<br />
returns implausible values <strong>of</strong> log g for some models. The T eff parameterisations also appear to<br />
fail at <strong>the</strong> 50,000K grid boundary, as some models<br />
with log g ∼ 3.5 are assigned temperatures much larger than 50,000K.<br />
The implementation <strong>of</strong> <strong>the</strong> linear interpolation in tables method used in SFIT does<br />
not take any steps to limit <strong>the</strong> optimisation routines to <strong>the</strong> boundaries <strong>of</strong> <strong>the</strong> grid,<br />
and actually allows some extrapolation to occur at <strong>the</strong> edges <strong>of</strong> <strong>the</strong> grid. However,<br />
it is unclear whe<strong>the</strong>r <strong>the</strong> anomalous T eff and log g values are due to <strong>the</strong> optimisation<br />
routine (in this case, Amoeba) extrapolating too far outside <strong>the</strong> grid space (i.e., <strong>the</strong>re<br />
is a problem with <strong>the</strong> implementation <strong>of</strong> <strong>the</strong> interpolation routine), or if <strong>the</strong>re is a<br />
problem with <strong>the</strong> models.<br />
If Figure 3.5 is replotted with axes closer to <strong>the</strong> grid boundaries, as in Figure 3.6, <strong>the</strong><br />
best performance <strong>of</strong> <strong>the</strong> interpolation method appears to occur below T eff = 40,000K.<br />
Between 40,000K and 50,000K, <strong>the</strong> parameterisations are more randomly distributed<br />
indicating a greater level <strong>of</strong> “confusion” from <strong>the</strong> interpolation routine. A cursory<br />
inspection <strong>of</strong> <strong>the</strong> models reveals no significant problems, so it could be hypo<strong>the</strong>sised<br />
that <strong>the</strong>re is an issue with <strong>the</strong> implementation. However, an inspection <strong>of</strong><br />
Figure 3.7 shows a similar “confusion” from <strong>the</strong> simplex-based method.<br />
[Plot: two panels, log g against T eff (K) and log( n He / n H ) against T eff (K).]<br />
Figure 3.5: Parameterisation results from <strong>the</strong> linear interpolation in tables method.<br />
Clearly visible are anomalous results arising from a suspected defect in <strong>the</strong> method’s<br />
implementation.
[Plot: two panels, log g against T eff (K) and log( n He / n H ) against T eff (K).]<br />
Figure 3.6: Parameterisation results from <strong>the</strong> linear interpolation in tables method.<br />
Axes have been restricted to give a view <strong>of</strong> <strong>the</strong> grid boundaries described in Table 3.5.<br />
[Plot: two panels, log g against T eff (K) and log( n He / n H ) against T eff (K).]<br />
Figure 3.7: Parameterisation results from <strong>the</strong> simplex-based interpolation scheme.<br />
In contrast with Figures 3.5 and 3.6, <strong>the</strong> simplex-based scheme clearly restricts <strong>the</strong><br />
optimisers to <strong>the</strong> grid boundaries.
At T eff ≥ 40,000K, <strong>the</strong> helium-rich models are most likely confusing <strong>the</strong> optimiser<br />
because <strong>the</strong> HeII ion lines manifest at wavelengths close to those <strong>of</strong> <strong>the</strong> neutral hydrogen<br />
lines. This problem requires fur<strong>the</strong>r investigation, but, to work around <strong>the</strong> issue, a<br />
comparison <strong>of</strong> parameterisation results for those models with T eff ≤ 40,000K, and<br />
log g ≤ 6.0 is also given in Table 3.6. These RMS metrics give a better indication <strong>of</strong><br />
<strong>the</strong> relative performance <strong>of</strong> <strong>the</strong> two methods.<br />
σ rms | All Models | | | T eff ≤ 40kK, log g ≤ 6.0 | |<br />
Comparison | T eff (K) | log g | n He | T eff (K) | log g | n He<br />
Simplex/Models | 3592.74 | 0.329 | 0.102 | 2666.79 | 0.355 | 0.068<br />
Linear/Models | 4695.11 | 3.362 | 0.149 | 1905.47 | 0.349 | 0.056<br />
Linear/Simplex | 3455.88 | 3.376 | 0.150 | 1928.02 | 0.306 | 0.065<br />
Table 3.6: RMS comparison <strong>of</strong> parameterisation results from each interpolation<br />
method with <strong>the</strong> original parameters <strong>of</strong> each model. Also given is <strong>the</strong> RMS difference<br />
between <strong>the</strong> methods, and a comparison between <strong>the</strong> results in <strong>the</strong> region <strong>of</strong> parameter<br />
space for which both schemes seem to give <strong>the</strong>ir best results (see Figures 3.6 and 3.7).<br />
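For reference, an RMS metric of the kind reported in Table 3.6 can be computed as sketched below. The definition used here, the square root of the mean squared difference between estimated and reference values, is an assumption, since the exact formula is not stated in this section; the function name is hypothetical.

```python
import math


def rms_difference(estimated, reference):
    """Root-mean-square difference between two equal-length sequences."""
    if len(estimated) != len(reference):
        raise ValueError("sequences must have the same length")
    if not estimated:
        raise ValueError("sequences must not be empty")
    total = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(total / len(estimated))
```

Applied once per parameter, with the fitted values as `estimated` and the known model parameters as `reference`, this would give one entry of a "Simplex/Models" or "Linear/Models" row; comparing the two sets of fitted values with each other would give a "Linear/Simplex" entry.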
Within <strong>the</strong> restricted region (T eff ≤ 40kK, log g ≤ 6.0), <strong>the</strong> linear interpolation in tables scheme yields slightly more accurate results than<br />
<strong>the</strong> simplex-based method for all three parameters. This is most likely due to <strong>the</strong> coarse<br />
grid spacing used in <strong>the</strong> experiment, with a finer-grained grid allowing <strong>the</strong> simplex<br />
interpolation method to achieve more accuracy. Using a finer-grained grid with SFIT<br />
is now possible because <strong>the</strong> simplex-based grid management scheme removes all <strong>the</strong><br />
grid size, shape, and completeness restrictions imposed by <strong>the</strong> old linear interpolation<br />
in tables method.<br />
The speed difference between <strong>the</strong> two methods should also be emphasised. To parameterise<br />
all <strong>the</strong> models, SFIT took approximately 10 minutes with <strong>the</strong> simplex-based<br />
scheme, but over 90 minutes with <strong>the</strong> old methodology. This significant gain in speed,<br />
along with <strong>the</strong> o<strong>the</strong>r advantages <strong>of</strong>fered by <strong>the</strong> new simplex-based scheme, outweighs<br />
<strong>the</strong> possible slight loss <strong>of</strong> accuracy indicated in Table 3.6.<br />
3.5 Summary<br />
The χ 2 fitting code, SFIT, has been modified and extended to handle arbitrarily large<br />
grids <strong>of</strong> <strong>the</strong>oretical model spectra. This paves <strong>the</strong> way to making SFIT more amenable<br />
to parameterising very large quantities <strong>of</strong> stellar spectra in an astronomical data mining<br />
application.<br />
Two major problems were identified with <strong>the</strong> way SFIT manages grids <strong>of</strong> models.<br />
Grids were restricted in size due to hard-coded limits written into <strong>the</strong> program, and<br />
<strong>the</strong> interpolation scheme used to approximate <strong>the</strong> space between grid points could not<br />
handle irregular or incomplete grids.<br />
These problems were solved by developing a new grid management and interpolation<br />
scheme based on simplex interpolation and Delaunay triangulation.<br />
This new scheme was tested against <strong>the</strong> old version <strong>of</strong> SFIT by parameterising a well-studied<br />
spectrum, and a large quantity <strong>of</strong> <strong>the</strong>oretical models. The new version <strong>of</strong> SFIT<br />
was found to perform much faster than <strong>the</strong> old version, with a more accurate fit being<br />
reported for <strong>the</strong> individual spectrum, and slightly (but not significantly) worse results<br />
in <strong>the</strong> parameterisation <strong>of</strong> <strong>the</strong> models. This slight loss <strong>of</strong> accuracy is outweighed by <strong>the</strong><br />
increase in overall speed, and <strong>the</strong> removal <strong>of</strong> several severe restrictions<br />
on <strong>the</strong> size, shape, and completeness <strong>of</strong> SFIT model grids.
Chapter 4<br />
Filtering - Principal Components <strong>Analysis</strong><br />
Modern astronomical data sets <strong>of</strong>ten contain observations <strong>of</strong> many different types <strong>of</strong><br />
objects, and are rarely typologically homogeneous (Chapter 1). Searching for particular<br />
types <strong>of</strong> objects in such large databases requires computer assistance. Query parameters<br />
can be used to narrow down <strong>the</strong> data set to objects <strong>of</strong> a particular colour range,<br />
redshift, morphology, or some o<strong>the</strong>r parameter combination <strong>of</strong> significance. However,<br />
this reduced data set will invariably still contain objects that <strong>the</strong> astronomer would<br />
like to discard. Manual inspection <strong>of</strong> <strong>the</strong> data is not time-efficient unless quantities are<br />
small. It is more expedient to have an automated, or semi-automated, tool that can be<br />
used to assist in filtering through <strong>the</strong> data.<br />
Filtering is essentially a coarse-grained classification problem. An unknown spectrum<br />
is compared with a collection <strong>of</strong> known, or template, spectra to determine if it<br />
belongs to that particular class <strong>of</strong> object. The well-known techniques <strong>of</strong> cross correlation<br />
(Tonry & Davis, 1979) and χ 2 minimisation (Chapter 3) are immediately applicable.<br />
However, in a data mining context, speed is <strong>of</strong> importance, and <strong>the</strong>se techniques are<br />
slow.<br />
<strong>On</strong>e way to construct a fast filtering method is to extract <strong>the</strong> defining features from<br />
a set <strong>of</strong> known spectra, and use <strong>the</strong>m to summarise and represent that set. Instead <strong>of</strong><br />
comparing an unknown spectrum with each template spectrum, it can <strong>the</strong>n be weighed<br />
against <strong>the</strong> summarised form in a more computationally expedient manner.<br />
Principal Components <strong>Analysis</strong> (PCA; Murtagh & Heck, 1987) can be used to<br />
construct such a summary. It is a multivariate statistical technique which seeks to<br />
summarise <strong>the</strong> variance <strong>of</strong> an N-dimensional data set in a handful <strong>of</strong> independent<br />
parameters. These parameters capture <strong>the</strong> main sources <strong>of</strong> linear variation in <strong>the</strong> data<br />
set, and can be used to construct a fast test to determine if an unknown spectrum is<br />
similar to a collection <strong>of</strong> known spectra.<br />
Ano<strong>the</strong>r advantage to using PCA as a filter is that <strong>the</strong> independent parameters produced<br />
are unique to each data set. This means that a PCA-based filter is generalisable,<br />
and can be used to construct a filter for any type <strong>of</strong> astronomical object.<br />
As a testament to its versatility, PCA has been applied on several occasions to <strong>the</strong><br />
classification <strong>of</strong> astronomical spectra. Deeming (1964) applied it to <strong>the</strong> classification <strong>of</strong><br />
G and K-type giants. Connolly et al. (1995) used PCA to classify galaxy spectra, and<br />
Francis et al. (1992) applied it to <strong>the</strong> classification <strong>of</strong> quasar spectra. Whereas <strong>the</strong>se<br />
studies used PCA in an unsupervised manner, it is used here in a supervised fashion.<br />
A Filter for Hot Subdwarfs<br />
Chapter 1 outlines a general data mining toolkit for astronomical spectra, with a specific<br />
application to hot subdwarfs. As such, <strong>the</strong> apparatus <strong>of</strong> <strong>the</strong> PCA-based filter outlined<br />
here will be applied to <strong>the</strong> data set obtained from Drilling et al. (2006) to construct a<br />
filter for hot subdwarfs.<br />
The operation <strong>of</strong> this filter will <strong>the</strong>n be applied to a set <strong>of</strong> real-world low-dispersion<br />
spectra obtained from <strong>the</strong> Sloan Digital Sky Survey in an attempt to data mine a<br />
collection <strong>of</strong> hot subdwarf candidates for fur<strong>the</strong>r study.<br />
[Plot: a two-dimensional data set on axes X and Y, with <strong>the</strong> principal component axes u 1 and u 2 .]<br />
Figure 4.1: Principal component analysis. u 1 is <strong>the</strong> first principal component and <strong>the</strong><br />
axis onto which <strong>the</strong> projected positions <strong>of</strong> <strong>the</strong> data have <strong>the</strong>ir maximum sum. u 2 is <strong>the</strong><br />
second principal component, and u 1 · u 2 = 0.<br />
4.1 Constructing A PCA-Based Filter<br />
Principal components analysis transforms an N-dimensional data set onto a new set<br />
<strong>of</strong> optimally defined axes. These axes represent <strong>the</strong> directions <strong>of</strong> maximum variance<br />
between variables in the data set, and are called the Principal Components (PCs). The<br />
technique essentially amounts to a rotation from the original axes to the new ones, and<br />
is therefore a linear transformation of the data.<br />
Figure 4.1 illustrates the concept with a two-dimensional data set. The direction<br />
of maximum variance in the data is represented by $\mathbf{u}_1$. This new axis (the first PC of<br />
the data set) better describes the data than either of the original axes, $x_1$ or $x_2$. The remaining variance<br />
in the data, once they have been projected onto the first PC, is described by $\mathbf{u}_2$, the<br />
second PC. Thus, $\mathbf{u}_1$ and $\mathbf{u}_2$ form a better aligned basis set for this<br />
particular data set.<br />
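The two-dimensional picture can be checked numerically. The following is a minimal sketch, assuming a synthetic data set (not taken from the thesis) scattered along a line at 30 degrees to the horizontal axis; the variable names are illustrative only.

```python
import numpy as np

# Toy 2-D data set (an assumption for illustration): points scattered
# along a line at 30 degrees, with a small perpendicular spread.
rng = np.random.default_rng(0)
theta = np.deg2rad(30.0)
along = rng.normal(0.0, 3.0, size=500)    # large spread along the line
across = rng.normal(0.0, 0.3, size=500)   # small spread across it
direction = np.array([np.cos(theta), np.sin(theta)])
normal = np.array([-np.sin(theta), np.cos(theta)])
data = np.outer(along, direction) + np.outer(across, normal)

# Centre the data, then take the eigenvectors of the 2x2 scatter matrix.
centred = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(centred.T @ centred)  # ascending order
u1 = eigvecs[:, 1]   # first PC: direction of maximum variance
u2 = eigvecs[:, 0]   # second PC: orthogonal to the first

# Orientation of u1, folded into [0, 180) because the sign of a PC is arbitrary.
angle = np.degrees(np.arctan2(u1[1], u1[0])) % 180.0
print(f"u1 lies at {angle:.1f} degrees; u1.u2 = {u1 @ u2:.1e}")
```

The first PC recovers the direction along which the points were scattered, and the second PC is perpendicular to it, which is the rotation described in the text.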
The PCs are derived in decreasing order <strong>of</strong> importance, with <strong>the</strong> first PC describing<br />
most <strong>of</strong> <strong>the</strong> variance in <strong>the</strong> data, and subsequent PCs representing less and less<br />
information about <strong>the</strong> variance. In <strong>the</strong> case <strong>of</strong> a large N-dimensional data set, a successful<br />
derivation <strong>of</strong> <strong>the</strong> principal components means that <strong>the</strong> first few components can<br />
be used to give a compressed representation <strong>of</strong> <strong>the</strong> data without a significant loss <strong>of</strong><br />
information.<br />
Lesser principal components will typically contain information on features in <strong>the</strong><br />
data which are not very well correlated, such as noise or anomalies. By discarding<br />
<strong>the</strong>se components, a compressed representation will preferentially remove undesired<br />
features, and features which do not vary over a sufficient fraction <strong>of</strong> <strong>the</strong> data set.<br />
4.1.1 Ma<strong>the</strong>matics <strong>of</strong> PCA<br />
This presentation <strong>of</strong> PCA <strong>the</strong>ory follows that <strong>of</strong> Bailer-Jones (1996) and Murtagh &<br />
Heck (1987).<br />
Let the vector $\mathbf{x} = (x_1, x_2, x_3, \ldots, x_N)$ be a stellar spectrum with N flux bins.<br />
A spectrum can then be considered a point in N-dimensional space, with each axis<br />
representing one flux bin. M such spectra can be described by the (M × N) matrix $X$, where<br />
$X^T = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M)$.<br />
The first principal component is the normalised vector, $\mathbf{u}$, which best fits the points<br />
in $X$. The criterion of goodness of fit of this axis to the point set is defined as the<br />
squared deviation of the points from the axis. Minimising the sum of squared distances between<br />
the points and the axis is equivalent to maximising the sum of squared projections onto<br />
the axis, i.e., maximising the variance of the points when projected onto this axis.<br />
The sum of squared projections of the points in $X$ onto the new axis, $\mathbf{u}$, is<br />
$$(X\mathbf{u})^T (X\mathbf{u}). \tag{4.1}$$
In maximising this quadratic form, the constraint $\mathbf{u}^T\mathbf{u} = 1$ must be imposed, otherwise<br />
the projection can be made arbitrarily large. Setting $S = X^T X$, and introducing<br />
the Lagrange multiplier, $\lambda$, the maximum is obtained by differentiating<br />
$$\mathbf{u}^T S \mathbf{u} - \lambda(\mathbf{u}^T\mathbf{u} - 1), \tag{4.2}$$<br />
with respect to $\mathbf{u}$, which gives<br />
$$2S\mathbf{u} - 2\lambda\mathbf{u}. \tag{4.3}$$<br />
Setting this equal to zero, the optimal value of $\mathbf{u}$ is the solution of<br />
$$S\mathbf{u} = \lambda\mathbf{u}. \tag{4.4}$$<br />
This is a standard eigenvector problem. The eigenvector of $S$ with the largest eigenvalue, $\mathbf{u}$, is the line of best<br />
fit, and the corresponding eigenvalue, $\lambda$, indicates the amount of variance described by<br />
this line.<br />
Calculating the remaining axes proceeds in a similar manner. The second axis is<br />
found by again maximising $\mathbf{u}^T S \mathbf{u}$, but with the added constraint that the second axis<br />
be orthogonal to the first, i.e., $\mathbf{u}_2^T\mathbf{u}_1 = 0$. Introducing the Lagrange multipliers, $\lambda_2$ and<br />
$\mu$, the maximum is obtained by differentiating<br />
$$\mathbf{u}_2^T S \mathbf{u}_2 - \lambda_2(\mathbf{u}_2^T\mathbf{u}_2 - 1) - \mu(\mathbf{u}_2^T\mathbf{u}_1), \tag{4.5}$$<br />
giving<br />
$$2S\mathbf{u}_2 - 2\lambda_2\mathbf{u}_2 - \mu\mathbf{u}_1. \tag{4.6}$$<br />
Setting this equal to zero, and multiplying through by $\mathbf{u}_1^T$, the first two terms vanish<br />
(because $S$ is symmetric, $\mathbf{u}_1^T S \mathbf{u}_2 = \lambda_1 \mathbf{u}_1^T\mathbf{u}_2 = 0$, where $\lambda_1$ is the eigenvalue belonging to $\mathbf{u}_1$), which yields<br />
$$\mu\mathbf{u}_1^T\mathbf{u}_1 = 0, \tag{4.7}$$<br />
which implies that $\mu = 0$. Therefore, equation 4.6 is of the same form as equation<br />
4.4, meaning $\lambda_2$ and $\mathbf{u}_2$ are the second largest eigenvalue of $S$ and its corresponding eigenvector.<br />
Thus, the principal components of a set of N-dimensional points, $X$, are the eigenvectors<br />
of the matrix of sums of squares and cross products, $S = X^T X$. There are N<br />
eigenvectors for an N-dimensional problem.<br />
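The derivation can be verified numerically. The sketch below assumes a small random data matrix (hypothetical sizes, not thesis data) and uses NumPy's symmetric eigen-solver to confirm that the leading eigenvector of S maximises the sum of squared projections of equation 4.1, and that the second eigenvector is orthogonal to it.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 200, 6                      # hypothetical: 200 "spectra", 6 flux bins
X = rng.normal(size=(M, N)) @ rng.normal(size=(N, N))  # correlated columns

S = X.T @ X                        # sums of squares and cross products (N x N)
lam, U = np.linalg.eigh(S)         # eigh is for symmetric matrices; ascending order
lam, U = lam[::-1], U[:, ::-1]     # reorder so the largest eigenvalue comes first
u1, u2 = U[:, 0], U[:, 1]

# Equation 4.1 evaluated at u1 equals the largest eigenvalue.
proj_u1 = (X @ u1) @ (X @ u1)

# No random unit vector should give a larger sum of squared projections.
trials = rng.normal(size=(1000, N))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
proj_trials = np.sum((X @ trials.T) ** 2, axis=0)

print(proj_u1, lam[0], proj_trials.max())
```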
The principal components form a directional basis set, meaning that PCA is best<br />
applied to data that are centred. Geometrically speaking, centring is equivalent to<br />
a shift in <strong>the</strong> origin <strong>of</strong> <strong>the</strong> co-ordinate system, and is performed by calculating and<br />
subtracting <strong>the</strong> mean from <strong>the</strong> row vectors <strong>of</strong> X.<br />
Let $\bar{x}_i$ be the average of element $x_i$ over all M data points. The $i$th element<br />
of the $p$th centred point is then given by<br />
$$\Delta x_{i,p} = x_{i,p} - \bar{x}_i. \tag{4.8}$$<br />
S now becomes <strong>the</strong> covariance matrix <strong>of</strong> <strong>the</strong> data points. The result <strong>of</strong> equation 4.4<br />
remains unchanged. Subtracting <strong>the</strong> mean also has <strong>the</strong> advantage that <strong>the</strong> dynamic<br />
range <strong>of</strong> S is reduced, increasing <strong>the</strong> numerical stability <strong>of</strong> <strong>the</strong> solution to <strong>the</strong> eigenvector<br />
problem.<br />
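As an illustration of centring, the following sketch assumes a toy matrix of "normalised spectra" with values near 1.0 (hypothetical data). It applies equation 4.8 to every row and checks that the resulting matrix, divided by M − 1, agrees with NumPy's sample covariance; the division only rescales the eigenvalues and leaves the eigenvectors unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 50, 4
X = 1.0 + 0.1 * rng.normal(size=(M, N))   # toy "normalised spectra" near 1.0

mean_spectrum = X.mean(axis=0)            # average of each flux bin over all rows
dX = X - mean_spectrum                    # equation 4.8 applied to every row
S = dX.T @ dX                             # scatter matrix of the centred data

# Dividing by (M - 1) gives the sample covariance; this rescales the
# eigenvalues but leaves the eigenvectors (the PCs) unchanged.
cov = S / (M - 1)

# Centring also shrinks the dynamic range of the matrix elements.
print(np.abs(X.T @ X).max(), np.abs(S).max())
```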
4.1.2 Building A Hot Subdwarf Filter<br />
By retaining only the most significant principal components of an N-dimensional data<br />
set, a quick test can determine if a new data point is in a similar region of N-dimensional<br />
space as the original data set. This is the principle upon which a filter can be built<br />
to help search for astronomical objects of a particular type from a large collection of<br />
unknown spectra.<br />
Figure 4.2: Mean spectrum of the Drilling et al. (2006) sample (normalised flux against wavelength in Angstroms).<br />
As described at <strong>the</strong> start <strong>of</strong> <strong>the</strong> chapter, such a filter will now be developed using<br />
<strong>the</strong> collection <strong>of</strong> 177 standard hot subdwarf spectra obtained from Drilling et al. (2006)<br />
(see also Chapter 2).<br />
The first step is to construct the mean spectrum and subtract it from each spectrum in<br />
the set, thereby forming the matrix of difference spectra using equation 4.8. The mean<br />
spectrum is plotted in Figure 4.2.<br />
The elements of the covariance matrix, $S$, are then calculated from<br />
$$s_{i,j} = \sum_{p=1}^{M} \Delta x_{i,p}\, \Delta x_{j,p}. \tag{4.9}$$<br />
The use <strong>of</strong> <strong>the</strong> covariance matrix in <strong>the</strong> formulation <strong>of</strong> PCA assumes that <strong>the</strong> data<br />
do not need to be standardised, i.e., that all <strong>the</strong> variables are on <strong>the</strong> same scale. This<br />
assumption is valid here because <strong>the</strong> Drilling et al. (2006) spectra have all been continuum<br />
normalised, and <strong>the</strong> application <strong>of</strong> <strong>the</strong> filter will be to normalised spectra.<br />
If the variables were on different scales, e.g., if the Drilling et al. (2006) set of spectra<br />
were unnormalised and half had flux scales several orders of magnitude greater than<br />
the other half, then the large differences between the variances of the variables would cause<br />
weaker variables to be ignored. Likewise, PCA can be sensitive to outliers in the data<br />
set, which can contribute greatly to the variance.<br />
Scale dependences must be removed if PCA is to generate useful components. Common<br />
approaches to normalisation include standardising <strong>the</strong> variables to have unit variance,<br />
compressing <strong>the</strong>m onto <strong>the</strong> scale 0-1, or taking logarithms. The results <strong>of</strong> <strong>the</strong><br />
PCA will depend on <strong>the</strong> normalisation method used.<br />
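The effect of scale can be seen in a small experiment. This sketch assumes two correlated toy variables, one of which is inflated by a factor of 1000 (hypothetical values); `first_pc` is an illustrative helper, not a routine from the thesis.

```python
import numpy as np

def first_pc(data):
    """Return the first principal component of the row vectors in `data`."""
    centred = data - data.mean(axis=0)
    _, vecs = np.linalg.eigh(centred.T @ centred)
    return vecs[:, -1]            # eigh sorts ascending, so take the last column

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, 300)
b = 0.5 * a + rng.normal(0.0, 1.0, 300)   # correlated with a
raw = np.column_stack([1000.0 * a, b])    # first variable on a much larger scale

# Standardise each variable to zero mean and unit variance.
standardised = (raw - raw.mean(axis=0)) / raw.std(axis=0)

pc_raw = first_pc(raw)            # dominated by the large-scale variable
pc_std = first_pc(standardised)   # both variables now carry equal weight
print(pc_raw, pc_std)
```

Without standardisation the first PC points almost entirely along the large-scale variable; after standardisation both variables contribute equally.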
In this application of PCA to stellar spectra, the covariance matrix, $S$, will always<br />
be real and symmetric. As such, equation 4.4 does not need to be solved as is, because any<br />
real symmetric matrix is diagonalised by the matrix of its eigenvectors (see Golub & Van Loan<br />
1989).<br />
Any real and symmetric matrix can be reliably diagonalised using a technique such<br />
as Jacobi’s method. Here, a QR-based singular value decomposition (see Press et al.<br />
1986) routine has been used to calculate <strong>the</strong> eigenvectors. The results <strong>of</strong> <strong>the</strong> PCA<br />
analysis are presented in Figures 4.3 and 4.4 wherein <strong>the</strong> first ten principal components<br />
<strong>of</strong> <strong>the</strong> Drilling et al. (2006) spectra have been plotted.<br />
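The equivalence between the eigenvector route and the singular value decomposition route can be sketched as follows. NumPy's LAPACK-backed routines are used here in place of the Press et al. (1986) routine, and the data are random (hypothetical), so this is an illustration of the principle rather than a reproduction of the analysis.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 40, 8
X = rng.normal(size=(M, N)) @ rng.normal(size=(N, N))
dX = X - X.mean(axis=0)                   # centred data matrix

# Route 1: eigen-decomposition of the symmetric matrix S = dX^T dX.
lam, U = np.linalg.eigh(dX.T @ dX)
lam, U = lam[::-1], U[:, ::-1]            # decreasing eigenvalue order

# Route 2: singular value decomposition dX = W diag(s) Vt.
# The rows of Vt are the PCs and s**2 are the eigenvalues of S.
_, s, Vt = np.linalg.svd(dX, full_matrices=False)

# Eigenvectors are only defined up to sign, so compare |u . v| for each pair.
overlap = np.abs(np.sum(U * Vt.T, axis=0))
print(overlap)
```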
The PCs are rotations in the data space of the original axes; they therefore resemble<br />
spectra, and have the same number of elements as the original spectra. It can be clearly<br />
seen that the first PC differentiates between hydrogen and helium lines. This reification<br />
makes sense as it is these features which vary most across the Drilling et al. (2006) data<br />
set. The second PC also clearly differentiates between He I and He II line series. For the<br />
remaining PCs, it becomes harder to attach any meaningful interpretation.<br />
Figure 4.3: First five PCs of the Drilling et al. (2006) sample (panels PC 0 to PC 4; horizontal axis 4100 to 4900, vertical axis -0.15 to 0.15).<br />
Figure 4.4: Second five PCs of the Drilling et al. (2006) sample (panels PC 5 to PC 9; horizontal axis 4100 to 4900, vertical axis -0.15 to 0.15).<br />
Figure 4.5: Cumulative variance of the first ten PCs of the Drilling et al. (2006)<br />
sample (cumulative percentage of total variance against principal component, 0 to 9).<br />
The question remains as to how many principal components should be retained in<br />
order to form an adequate representation <strong>of</strong> <strong>the</strong> Drilling et al. (2006) standard stars.<br />
Figure 4.5 shows <strong>the</strong> cumulative percentage variance accounted for by <strong>the</strong> first ten<br />
principal components.<br />
The first principal component alone accounts for 94.66% of the total variance, which<br />
is not surprising given the reification outlined previously. All ten PCs account for<br />
99.83% of the variance; however, 99.30% is described by the first four PCs, making<br />
them sufficient to give a compressed representation of the Drilling et al.<br />
(2006) hot standards.<br />
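The selection of the number of components from the cumulative variance can be sketched as below. The eigenvalues are hypothetical and are not those of the Drilling et al. (2006) sample; the function names and the 98.5% target are likewise illustrative assumptions.

```python
import numpy as np

def cumulative_variance_percent(eigenvalues):
    """Cumulative percentage of total variance, largest eigenvalue first."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return 100.0 * np.cumsum(lam) / lam.sum()

def components_needed(eigenvalues, target_percent):
    """Smallest number of leading PCs whose cumulative variance reaches the target."""
    cum = cumulative_variance_percent(eigenvalues)
    n = int(np.searchsorted(cum, target_percent)) + 1
    return min(n, len(cum))       # cap at the number of available PCs

# Hypothetical eigenvalues, NOT those of the Drilling et al. (2006) sample.
toy_eigenvalues = [90.0, 5.0, 3.0, 1.0, 0.5, 0.3, 0.2]
cum = cumulative_variance_percent(toy_eigenvalues)
n_keep = components_needed(toy_eigenvalues, 98.5)
print(cum, n_keep)
```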
It should be noted that this selection criterion <strong>of</strong> maximal variance may unwisely<br />
discard <strong>the</strong> less significant PCs. Lahav et al. (1996) point out that, in <strong>the</strong> role <strong>of</strong><br />
classification <strong>of</strong> galaxy spectra, <strong>the</strong> fractional variance on its own was not sufficient to<br />
determine how many PCs were needed for classification. The reason for this may be<br />
due to non-linearity in <strong>the</strong> data (a spectrum is not a linear combination <strong>of</strong> line features,<br />
and <strong>the</strong> lines do not separate into different principal components), <strong>the</strong> effect <strong>of</strong> noise<br />
on <strong>the</strong> deduction <strong>of</strong> <strong>the</strong> PCs, or <strong>the</strong> fact that classification requires more information<br />
than that given simply by <strong>the</strong> maximal variance.<br />
In <strong>the</strong> application <strong>of</strong> PCA here to <strong>the</strong> filtering <strong>of</strong> stellar spectra, only an adequate<br />
representation <strong>of</strong> a data set is sought through PCA, and not an adequate discrimination<br />
between classes within a data set. As such, <strong>the</strong> criterion <strong>of</strong> maximal variance remains<br />
valid.<br />
Now, let the (N × 4) matrix $E = (\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4)$ contain the first four principal components<br />
of the Drilling et al. (2006) hot standards as its columns. To determine the similarity of some unknown<br />
spectrum $\mathbf{y} = (y_1, y_2, y_3, \ldots, y_N)$ to the Drilling et al. (2006) standards, first, the vector,<br />
$\mathbf{p}$, is constructed, which contains the magnitudes of the projections of $\mathbf{y}$ onto each of the four<br />
principal components in $E$,<br />
$$\mathbf{p} = \Delta\mathbf{y} \cdot E, \tag{4.10}$$<br />
where $\Delta\mathbf{y}$ is the mean-subtracted difference spectrum of $\mathbf{y}$ (i.e., $\Delta\mathbf{y} = \mathbf{y} - \bar{\mathbf{x}}$, where<br />
$\bar{\mathbf{x}}$ is the mean spectrum of the Drilling et al. (2006) standards), treated as a row vector.<br />
The reduced reconstruction of $\mathbf{y}$, $\mathbf{y}_r$, is then given by<br />
$$\mathbf{y}_r = \bar{\mathbf{x}} + \mathbf{p} \cdot E^T. \tag{4.11}$$<br />
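Equations 4.10 and 4.11 amount to two matrix products. The sketch below assumes that the principal components are stored as the columns of an (N × 4) array `E` and that spectra are row vectors; the training data are synthetic (built from two hidden features) and the function name is hypothetical.

```python
import numpy as np

def reduced_reconstruction(y, mean_spectrum, E):
    """Project y onto the PCs (columns of E) and rebuild it (equations 4.10, 4.11)."""
    p = (y - mean_spectrum) @ E          # equation 4.10: one coefficient per PC
    return mean_spectrum + p @ E.T       # equation 4.11

# Toy training set standing in for the standards (hypothetical data).
rng = np.random.default_rng(5)
N = 50
basis = rng.normal(size=(2, N))                       # two hidden "features"
train = 1.0 + rng.normal(size=(100, 2)) @ basis       # spectra built from them
mean_spectrum = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean_spectrum, full_matrices=False)
E = Vt[:4].T                                          # (N x 4): first four PCs

y = 1.0 + np.array([0.7, -1.2]) @ basis               # new spectrum, same features
y_r = reduced_reconstruction(y, mean_spectrum, E)
print(np.abs(y - y_r).max())
```

Because the new spectrum is built from the same features as the training set, its reduced reconstruction matches it to within rounding error.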
Figure 4.6 shows <strong>the</strong> results <strong>of</strong> projecting two hot subdwarf spectra onto <strong>the</strong> first<br />
four principal components <strong>of</strong> <strong>the</strong> Drilling et al. (2006) hot standards.<br />
At the top, spectrum A is a relatively good S/N observation of a cooler subdwarf.<br />
The original spectrum is plotted in red, and its reduced reconstruction in blue.<br />
Spectrum B is a lower S/N observation of a hotter subdwarf. Again, the<br />
original spectrum is plotted in red, with the reduced reconstruction in blue.<br />
Spectrum A compares well with its reduced reconstruction, the latter showing very<br />
little difference from the original. However, spectrum B is noisier, and its reduced reconstruction<br />
matches the spectrum well but for the noise (here, the noise-filtering capabilities<br />
of PCA can be observed).<br />
Figure 4.6: Illustration of projecting hot subdwarf spectra onto the first four PCs of<br />
the Drilling et al. (2006) standards. Each panel shows the original spectrum and its reduced reconstruction; panel A is labelled 1.89970 and panel B 6.22063.<br />
Certainly, spectrum A, if encountered in a large set <strong>of</strong> unknown spectra, would be<br />
desirable to <strong>the</strong> astronomer, whereas spectrum B could be considered too noisy for any<br />
fur<strong>the</strong>r analysis. Thus, when filtering through a large set <strong>of</strong> unknown spectra, those<br />
spectra which compare well with <strong>the</strong>ir reduced reconstructions will be <strong>of</strong> most interest<br />
to <strong>the</strong> hot subdwarf astronomer.<br />
A suitable quantitative measure for this comparison is the reconstruction error<br />
$$R = 100 \times \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - y_{r,i})^2}, \tag{4.12}$$<br />
where $y_i$ is the $i$th flux bin of the original spectrum, $\mathbf{y}$, and $y_{r,i}$ is the $i$th flux bin<br />
of the reduced reconstruction of $\mathbf{y}$, $\mathbf{y}_r$. This error metric gives the RMS difference<br />
in each flux bin between the original spectrum and its reconstruction. The factor of<br />
100 is simply a scaling factor to make the final error values easier to work with (it is<br />
anticipated that the majority of values for R will lie in the range 0 ≤ R ≤ 1).<br />
The reconstruction errors for each spectrum in Figure 4.6 are shown in <strong>the</strong> top left<br />
region <strong>of</strong> each plot.<br />
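Equation 4.12 is a scaled root-mean-square difference, which can be written in a few lines. This is a minimal sketch; the function name is an assumption, and the example uses an artificial flat spectrum of 901 bins offset from its reconstruction by 0.02 in every bin.

```python
import numpy as np

def reconstruction_error(y, y_r):
    """Equation 4.12: 100 x RMS difference between a spectrum and its reconstruction."""
    y = np.asarray(y, dtype=float)
    y_r = np.asarray(y_r, dtype=float)
    return 100.0 * np.sqrt(np.mean((y - y_r) ** 2))

# Artificial example: a constant offset of 0.02 in every flux bin.
y = np.full(901, 1.00)
y_r = np.full(901, 0.98)
R = reconstruction_error(y, y_r)
print(R)
```

A uniform offset of 0.02 in normalised flux therefore corresponds to a reconstruction error of about 2.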
How “well” a spectrum should compare in this manner with its reduced reconstruction<br />
is a subjective measure dependent on the type of object an astronomer is filtering<br />
for, and what further analysis is intended. In the hot subdwarf case, for classification<br />
purposes, a spectrum such as B in Figure 4.6 may mark the upper limit of the<br />
reconstruction errors that are to be accepted. However, if the derivation of physical<br />
parameters is the goal, then reconstruction errors close to that of spectrum A, and<br />
well below that of B, may be desired.<br />
As mentioned in <strong>the</strong> introduction to this chapter, PCA is a data-driven tool, with<br />
<strong>the</strong> principal components derived for one data set being unique to those data. As such,<br />
if, say, a galaxy spectrum is reconstructed using the PCs of the Drilling et al. (2006)<br />
standards, its reconstruction error will be very high as it will not have many (if any)<br />
features in common with hot subdwarfs. The same is true for noisy or incomplete<br />
spectra, making them easy to filter out.
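The behaviour of the filter on in-family, noisy, and unrelated spectra can be demonstrated end to end with synthetic data. In this sketch the "standards" are toy spectra with two Gaussian absorption lines at fixed positions; the line positions, depths, the threshold `r_max`, and all function names are assumptions made for illustration.

```python
import numpy as np

def pca_filter(spectra, mean_spectrum, E, r_max):
    """Return (R values, boolean mask of spectra with R <= r_max)."""
    p = (spectra - mean_spectrum) @ E              # projections, equation 4.10
    recon = mean_spectrum + p @ E.T                # reconstructions, equation 4.11
    R = 100.0 * np.sqrt(np.mean((spectra - recon) ** 2, axis=1))  # equation 4.12
    return R, R <= r_max

rng = np.random.default_rng(6)
wave = np.arange(4050.0, 4951.0)                   # 901 bins at 1 A spacing

def line(centre, depth, width=8.0):
    """Gaussian absorption profile (a stand-in for a real spectral line)."""
    return depth * np.exp(-0.5 * ((wave - centre) / width) ** 2)

# Toy "standards": two lines of varying depth at fixed (hypothetical) positions.
depths = rng.uniform(0.1, 0.5, size=(60, 2))
standards = 1.0 - np.array([line(4340.0, a) + line(4686.0, b) for a, b in depths])
mean_spectrum = standards.mean(axis=0)
_, _, Vt = np.linalg.svd(standards - mean_spectrum, full_matrices=False)
E = Vt[:2].T                                       # two PCs suffice for this toy set

clean = 1.0 - line(4340.0, 0.3) - line(4686.0, 0.2)      # same family
noisy = clean + rng.normal(0.0, 0.1, wave.size)          # same family, low S/N
foreign = 1.0 - line(4500.0, 0.4, width=40.0)            # unrelated feature
R, keep = pca_filter(np.array([clean, noisy, foreign]), mean_spectrum, E, r_max=1.0)
print(R, keep)
```

Only the clean in-family spectrum passes the threshold; the noisy and unrelated spectra both receive large reconstruction errors and are rejected.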
4.2 Searching <strong>the</strong> SDSS for Hot Subdwarfs<br />
The PCA hot subdwarf filter was applied to a sample <strong>of</strong> 4610 spectra obtained from<br />
<strong>the</strong> Sloan Digital Sky Survey, Data Release 3 database. The selection criteria used to<br />
obtain <strong>the</strong> sample from <strong>the</strong> SDSS are outlined in <strong>the</strong> following SQL query,<br />
SELECT s.plate, s.mjd, s.fiberid<br />
FROM BESTDR3..SpecPhotoAll AS s<br />
WHERE s.specClass = dbo.fSpecClass('STAR')<br />
AND (s.primTarget & (dbo.fPrimTarget('TARGET_STAR_BHB')<br />
+ dbo.fPrimTarget('TARGET_STAR_SUB_DWARF')) > 0)<br />
AND (s.objType = 2)<br />
The criteria naively rely upon the classifications automatically assigned by the SDSS<br />
spectrophotometric pipeline.<br />
The SDSS supplies spectra in FITS format with each FITS file including a calibrated<br />
spectrum, a normalised spectrum, and all measured parameters (redshift, line fits, line<br />
indices, per-pixel resolution, etc.) stored in <strong>the</strong> FITS header.<br />
For convenience, <strong>the</strong> normalised spectra were extracted from <strong>the</strong> FITS files, and<br />
subsequently velocity corrected using <strong>the</strong> redshift stored in each FITS header. The<br />
spectra were then rebinned onto the common wavelength grid of 4050–4950 Å at a<br />
dispersion of 1 Å pixel$^{-1}$ to match the Drilling et al. (2006) spectra.<br />
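The velocity correction and rebinning step can be sketched as follows. This assumes a simple division of the observed wavelengths by (1 + z) and linear interpolation onto the common grid; the rebinning scheme actually used is not stated in the text, and the observed spectrum here is synthetic with a hypothetical redshift.

```python
import numpy as np

def to_rest_frame_grid(wave_obs, flux, z, grid):
    """Shift a spectrum to the rest frame and linearly rebin it onto `grid`."""
    wave_rest = np.asarray(wave_obs, dtype=float) / (1.0 + z)  # remove the redshift
    return np.interp(grid, wave_rest, flux)                    # linear interpolation

# Common grid used in the text: 4050-4950 A at 1 A per pixel (901 bins).
grid = np.arange(4050.0, 4951.0, 1.0)

# Hypothetical observed spectrum: a line at rest wavelength 4340 A, seen at z = 0.001.
z = 0.001
wave_obs = np.linspace(3800.0, 5200.0, 2801)       # 0.5 A sampling
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave_obs - 4340.0 * (1.0 + z)) / 5.0) ** 2)

rebinned = to_rest_frame_grid(wave_obs, flux, z, grid)
line_centre = grid[np.argmin(rebinned)]
print(rebinned.shape, line_centre)
```

After correction the line minimum falls back on its rest wavelength, so the spectrum can be compared bin by bin with the standards.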
The PCA filter was applied using equations 4.10 and 4.11, outlined in <strong>the</strong> previous<br />
section, to construct <strong>the</strong> set <strong>of</strong> reduced reconstructions. The reconstruction errors were<br />
<strong>the</strong>n calculated as per equation 4.12.<br />
The distribution <strong>of</strong> <strong>the</strong> reconstruction errors is displayed in Figure 4.7.<br />
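A histogram of this kind, with a final bin that collects all large errors, might be built as in the sketch below. The bin width of 0.25 is an assumption (it is not stated in the text), the R values are invented for illustration, and the overflow bin here collects R ≥ 15.

```python
import numpy as np

def error_histogram(R, bin_width=0.25, r_overflow=15.0):
    """Histogram of reconstruction errors with a final bin collecting R >= r_overflow."""
    n_bins = int(round(r_overflow / bin_width))
    edges = np.linspace(0.0, r_overflow, n_bins + 1)
    edges = np.append(edges, r_overflow + bin_width)      # edge closing the overflow bin
    # Pull every large value into the middle of the overflow bin.
    clipped = np.minimum(np.asarray(R, dtype=float), r_overflow + bin_width / 2)
    counts, _ = np.histogram(clipped, bins=edges)
    return counts, edges

R = np.array([1.6, 1.7, 3.0, 4.6, 7.2, 22.0, 140.0])     # hypothetical values
counts, edges = error_histogram(R)
print(counts[-1], counts.sum())
```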
The histogram shows that most of the spectra in the SDSS sample are concentrated<br />
in the region $R \lesssim 4.0$. The contents of the first three error bins ($R \lesssim 1.8$) are<br />
shown in Figures 4.8 and 4.9. Clearly, these eight spectra are of a good S/N, strong<br />
subdwarf candidates, and well-suited to further analysis.<br />
Figure 4.7: Histogram of reconstruction errors from the SDSS data sample (number of spectra against reconstruction error R).<br />
As the reconstruction error increases, the S/N of the spectra starts to decrease.<br />
Figure 4.10 shows four spectra sampled from the maximal error bin, R ∼ 3.0.<br />
They are slightly noisier spectra than those in Figures 4.8 and 4.9, yet the<br />
reconstructions are still a close match, meaning they could still be suitable for further<br />
analysis.<br />
By around R ≈ 4.5, the reconstruction quality is becoming noticeably poorer, as<br />
demonstrated in Figure 4.11. Here, the S/N is becoming progressively lower, and<br />
objects with spectra quite unlike those of subdwarfs, such as white dwarfs, begin to<br />
make an appearance in the succeeding error bins.<br />
One interesting feature of note is the final error bin, which contains all the SDSS<br />
spectra with reconstruction errors R > 15.0. It contains a large number of spectra in<br />
comparison to the preceding error bins. Such high reconstruction errors are indicative<br />
of spectra with features poorly matched to typical subdwarf spectra. Figure 4.13 shows<br />
a sample of four spectra from this error bin.<br />
Figure 4.8: Spectra in first three reconstruction error histogram bins ($R \lesssim 3.0$). Panels, with reconstruction errors: A, J234137.25+000123.2 (1.52992); B, J171531.67+271545.5 (1.66001); C, J155612.59+022152.9 (1.62708); D, J153701.88-011307.9 (1.74365).<br />
Figure 4.9: Spectra in first three reconstruction error histogram bins ($R \lesssim 3.0$). Panels, with reconstruction errors: A, J152357.12+354009.4 (1.73210); B, J151722.09+603546.3 (1.79120); C, J125244.60-002512.9 (1.71810); D, J112015.43+650003.2 (1.76950).<br />
The first three spectra are dominated by noise, with spectrum B exhibiting an<br />
anomalous gap in the data at around 4110 Å. Spectrum D is incomplete, hence the<br />
large reconstruction error.<br />
The PCA filter is effective at separating out the very low S/N exemplars and incomplete<br />
spectra, as shown in Figure 4.13. However, it does not by itself isolate<br />
subdwarf candidates. Invariably, they will be mixed in with stars that are<br />
spectroscopically very similar to subdwarfs. An example of high S/N spectra that are not<br />
subdwarfs, and which are filtered out, is shown in Figure 4.12.<br />
The SDSS sample used here was predominantly composed <strong>of</strong> cooler BHB and main<br />
sequence stars, with some white dwarfs. Thus, any subdwarf candidates were difficult<br />
for <strong>the</strong> filter to extract from amidst <strong>the</strong> spectroscopically similar cooler stars. This<br />
problem was due to <strong>the</strong> search criteria used in <strong>the</strong> initial SQL query, but it can be<br />
rectified by altering <strong>the</strong> database query to select by photometric colour which would<br />
exclude most <strong>of</strong> <strong>the</strong> cooler stars and subdwarf-main sequence binaries.<br />
The reconstruction error calculation described in equation 4.12 provides a description<br />
<strong>of</strong> <strong>the</strong> mean difference between an original spectrum and its reconstruction. As<br />
such, it served to rank <strong>the</strong> SDSS spectra mostly according to noise content. This<br />
meant that objects such as white dwarfs started to be found ranked alongside lower<br />
S/N subdwarf candidates/BHB stars with reconstruction errors <strong>of</strong> around R ≈ 7.0.<br />
This is not necessarily a problem per se because, by about R ≈ 5.0, any subdwarfs<br />
to be found are going to be dominated by noise levels that may not be conducive to<br />
useful fur<strong>the</strong>r analysis.<br />
Practically speaking, the PCA filter allows a value of R to be established beyond<br />
which any spectra can be safely discarded on the grounds that they are not of sufficient<br />
S/N for whatever further analysis the astronomer has in mind. This will also safely<br />
discard objects whose spectra are not sufficiently similar to the objects of interest.<br />
[Figure 4.10: Sample of spectra from the eighth error bin (R ∼ 3.0). Panels: A, R = 2.97285, J023624.84-072238.1; B, R = 3.00098, J001832.61+155540.1; C, R = 2.89146, J224640.34-090631.8; D, R = 2.88735, J145418.66-022346.1.]<br />
[Figure 4.11: Sample of spectra from the fourteenth error bin (R ∼ 4.5). Panels: A, R = 4.66907, J001146.72+152147.5; B, R = 4.50669, J165401.98+294801.7; C, R = 4.54534, J074623.09+205546.7; D, R = 4.50072, J113044.42+612111.7.]<br />
[Figure 4.12: Sample of high S/N DA white dwarfs from the 22nd–24th error bins (R ∼ 6.4–7.1). Panel R values: A, 6.48477; B, 6.94160; C, 6.99275; D, 7.03080. Objects shown: J085128.17+060551.2, J110651.79+625024.0, J092252.13+524446.4, J080051.56+223558.5.]<br />
[Figure 4.13: Sample of spectra from the fifty-third error bin (R > 15.0). Panels: A, R = 15.15518, J075647.73+232913.6; B, R = 20.11230, J141509.80-021147.2; C, R = 38.54495, J140804.49+011320.1; D, R = 66.91276, J145616.92+024549.6.]<br />
For the spectra that remain, a visual inspection is still necessary to separate out<br />
candidates of interest from objects for which the reconstruction error calculation is<br />
not sensitive enough to mark for removal. In the obtained SDSS sample, spectra<br />
with a reconstruction error of R < 5.0 were generally suitable for classification or<br />
parameterisation; however, as mentioned previously, any real subdwarf candidates in<br />
that sub-sample were mixed in with cooler BHB and main-sequence stars.<br />
4.3 Summary<br />
The concept of the PCA-based filtering tool presented here is certainly sound from the<br />
point of view of necessity. In the construction of a filter for hot subdwarfs, and its application<br />
to search for such stars in the SDSS, it was discovered that the SDSS-assigned spectral<br />
classifications are not a useful criterion to include in an initial search.<br />
The data set obtained was composed <strong>of</strong> a large quantity <strong>of</strong> blue horizontal branch<br />
stars. As <strong>the</strong>y are spectroscopically very similar to hot subdwarfs, this made it difficult<br />
for <strong>the</strong> filter to provide a robust discrimination between <strong>the</strong> two object types.<br />
This point highlights <strong>the</strong> need to use appropriate and specific search criteria when<br />
extracting data from a very large survey database. In <strong>the</strong> case <strong>of</strong> hot subdwarfs and <strong>the</strong><br />
SDSS, a photometric colour-based search would allow cooler BHB stars to be avoided.<br />
Still, <strong>the</strong> PCA filter is not completely automated, and cannot be treated as a black<br />
box. A user must be aware <strong>of</strong> <strong>the</strong> correct manner <strong>of</strong> operation:<br />
1. The set of training data from which a filter is to be constructed must be pre-processed<br />
into a homogeneous form.<br />
2. Application data must be pre-processed to have the same properties as the training<br />
set (i.e., wavelength range, dispersion, etc.).<br />
3. An acceptable reconstruction error threshold is a subjective decision that <strong>the</strong><br />
user must make. It can only be determined through examination <strong>of</strong> <strong>the</strong> filtering<br />
results, and prior experience.<br />
4. A visual inspection <strong>of</strong> data below <strong>the</strong> acceptable error threshold is still required<br />
to ensure <strong>the</strong> correct extraction <strong>of</strong> candidate objects from undesired but spectroscopically<br />
similar objects.<br />
The diversity <strong>of</strong> real-world data makes decisive filtering a very hard problem, but<br />
<strong>the</strong> PCA filter presented here is able to reduce <strong>the</strong> search space by at least an order <strong>of</strong><br />
magnitude, making <strong>the</strong> job <strong>of</strong> visual inspection a lot more tractable.<br />
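The operating procedure listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: it relies on NumPy, it takes R to be the root-mean-square difference between a spectrum and its reconstruction (equation 4.12 is described only as a mean difference, so the exact form and scaling of R are assumptions), and the function names, the number of components and the threshold are hypothetical.

```python
import numpy as np

def build_pca_filter(training, n_components):
    """Return (mean, components) from homogeneous training spectra.

    training: array of shape (n_spectra, n_pixels), all on one wavelength grid.
    """
    mean = training.mean(axis=0)
    # Rows of vt are the principal components, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(spectra, mean, components):
    """RMS difference between each spectrum and its PCA reconstruction.

    This RMS form is an assumed stand-in for the R of equation 4.12.
    """
    centred = spectra - mean
    reconstructed = centred @ components.T @ components
    return np.sqrt(np.mean((centred - reconstructed) ** 2, axis=1))

def apply_filter(spectra, mean, components, threshold):
    """Indices of spectra with R at or below a user-chosen threshold, plus all R values."""
    r = reconstruction_error(spectra, mean, components)
    return np.flatnonzero(r <= threshold), r
```

The threshold remains a user decision, as in item 3 of the list, and the spectra that pass still need visual inspection, as in item 4.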
Chapter 5<br />
Application I - SDSS Hot Subdwarfs<br />
The tools established in Chapters 2 to 4 for data mining large sets of astronomical<br />
spectra are now applied in unison to extract and analyse hot subdwarf<br />
candidates from the Sloan Digital Sky Survey.<br />
Firstly, a set <strong>of</strong> search criteria based on SDSS photometric colours is devised to<br />
obtain a data set which excludes most <strong>of</strong> <strong>the</strong> horizontal branch stars encountered in<br />
<strong>the</strong> previous chapter. This data set is <strong>the</strong>n filtered with <strong>the</strong> aid <strong>of</strong> <strong>the</strong> PCA filter, and<br />
pre-processed before being fed into <strong>the</strong> analysis pipeline for classification and parameterisation.<br />
5.1 Search Criteria And Data Sets<br />
Following the work of Harris et al. (2003) and Kleinman et al. (2004) (based on the photometric<br />
simulations of Fan 1999), a search was made of the SDSS Data Release 3<br />
database using the following selection criteria on SDSS ugriz point spread function<br />
magnitudes and colours,<br />
SELECT s.plate, s.mjd, s.fiberid<br />
FROM BESTDR3..SpecPhotoAll AS s<br />
WHERE s.psfMag_u < 21<br />
AND (s.psfMag_u - s.psfMag_g) < 0.7<br />
AND (s.psfMag_g - s.psfMag_r) < -0.1<br />
AND s.specClass != dbo.fSpecClass('QSO')<br />
For completeness, the spectra chosen by the SDSS as their hot standards were also<br />
retrieved using a separate query,<br />
SELECT s.plate, s.mjd, s.fiberid<br />
FROM BESTDR3..SpecObj AS s<br />
WHERE s.objType = dbo.fObjType('HOT_STD')<br />
The total data quantities retrieved by these two queries are summarised in Table 5.1.<br />
Data Set | Spectra Retrieved<br />
Colour-Colour | 6539<br />
Hot Standards | 1411<br />
Total | 7950 (6764 unique)<br />
Table 5.1: Summary of data quantities obtained from the SDSS DR3.<br />
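The photometric cuts and the merging of the two result sets can be illustrated with a short Python sketch. It is an illustration only: the function names are hypothetical, the spectral class exclusion of the first query is not reproduced, and it assumes that a spectrum is uniquely identified by its (plate, mjd, fiberid) triple.

```python
def passes_colour_cut(psf_u, psf_g, psf_r):
    """Apply the photometric cuts of the colour-colour query to one object's PSF magnitudes."""
    return psf_u < 21 and (psf_u - psf_g) < 0.7 and (psf_g - psf_r) < -0.1

def merge_unique(*result_sets):
    """Merge query results, keeping the first entry for each (plate, mjd, fiberid) key."""
    seen = set()
    merged = []
    for rows in result_sets:
        for key in rows:
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

In terms of Table 5.1, the two queries return 6539 + 1411 = 7950 entries, of which 6764 are unique, so 1186 entries are duplicates.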
5.2 PCA Filtering<br />
The PCA filter from Chapter 4 was applied to the 6764 unique spectra obtained from<br />
the SDSS. The SDSS-normalised spectrum was extracted from each of the downloaded<br />
FITS files, and velocity corrected using the SDSS-derived redshift stored in<br />
each file’s FITS header. The histogram of reconstruction errors is plotted in Figure<br />
5.1.<br />
[Figure 5.1: Histogram of reconstruction errors for the colour-colour selected SDSS sample. Axes: number of spectra against reconstruction error R.]<br />
The large quantity of spectra located at the error bin R ≈ 2.46 are blank – the<br />
normalised flux level is constant at 1.0 for all wavelengths. This is due to the rebinning<br />
routine’s default behaviour of assigning a flux value of 1.0 to those wavelengths where<br />
no flux information is available for interpolation. In this case, the spectra in question<br />
seem to originally cover a lower wavelength range than the chosen 4050–4950 Å<br />
range.<br />
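The fill-value behaviour described above is easy to reproduce, and the resulting blank spectra are easy to flag automatically. The sketch below is a hedged illustration rather than the rebinning routine actually used: it assumes NumPy and simple linear interpolation, and the names `rebin` and `is_blank` are hypothetical.

```python
import numpy as np

GRID = np.arange(4050.0, 4951.0, 1.0)  # 4050-4950 Å at 1 Å per pixel

def rebin(wavelength, flux, grid=GRID, fill=1.0):
    """Linearly interpolate onto grid; pixels outside the data range get `fill`."""
    return np.interp(grid, wavelength, flux, left=fill, right=fill)

def is_blank(rebinned, fill=1.0):
    """True when every pixel equals the fill value, i.e. no data overlapped the grid."""
    return bool(np.all(rebinned == fill))
```

A spectrum whose wavelength coverage ends below 4050 Å comes back from `rebin` as a constant 1.0 and is reported as blank, so such spectra could be removed before the PCA filter is applied.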
Otherwise, visual examination of the error bins reveals that all of the hot subdwarf<br />
candidates of reasonable S/N are located at reconstruction errors of R ≤ 6.4,<br />
where they are mixed in with many white dwarf and blue horizontal branch spectra. These are<br />
hard to separate out because they often show almost no spectral features that would allow<br />
the PCA filter to distinguish them clearly from hot subdwarf candidates. At R > 6.4,<br />
the error bins are composed almost entirely of various types of white dwarfs, with only<br />
a few very low S/N hot subdwarf candidates.<br />
Selecting all those spectra with reconstruction errors R ≤ 6.4 yielded 817 samples,<br />
approximately 400 of which were the “blank” spectra discussed previously. Removing<br />
them left a final set of 400 spectra, which were manually processed to select the hot<br />
subdwarf candidates from amidst the white dwarfs. This proceeded quickly as white<br />
dwarf spectra are quite distinct.<br />
A final data set <strong>of</strong> 282 hot subdwarf candidates was obtained.<br />
5.3 <strong>Analysis</strong><br />
The SDSS-normalised spectra are created by fitting a pseudo-continuum using a median/mean<br />
filter. A sliding window of length 300 pixels is used for stars, and a<br />
set of reference lines is used to mask out major absorption features by excluding<br />
pixels closer than 8 pixels to any reference line. The remaining pixels are ordered,<br />
and the values between the 40th and 60th percentiles are averaged to give the pseudo-continuum.<br />
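The median/mean filter described above can be sketched as follows. This is an illustration of the general technique, not the SDSS pipeline code: it assumes NumPy, a window centred on each pixel and truncated at the ends of the spectrum, and reference lines given as pixel indices; the function name and these edge-handling choices are assumptions.

```python
import numpy as np

def pseudo_continuum(flux, line_pixels, window=300, mask_halfwidth=8):
    """Sliding-window pseudo-continuum from the 40th-60th percentile mean.

    Pixels closer than `mask_halfwidth` pixels to any reference line are ignored.
    """
    flux = np.asarray(flux, dtype=float)
    n = flux.size
    idx = np.arange(n)
    usable = np.ones(n, dtype=bool)
    for p in line_pixels:
        usable[np.abs(idx - p) < mask_halfwidth] = False

    continuum = np.full(n, np.nan)  # NaN where a window holds no usable pixels
    half = window // 2
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half)
        values = np.sort(flux[lo:hi][usable[lo:hi]])
        if values.size == 0:
            continue
        a = int(np.floor(0.4 * values.size))
        b = max(a + 1, int(np.ceil(0.6 * values.size)))
        continuum[i] = values[a:b].mean()
    return continuum
```

Averaging only the central percentiles makes the estimate insensitive to isolated narrow lines, but broad, blended features that fill a large part of the window can still pull it down.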
However, this pseudo-continuum tends to underfit the real continuum for the higher-order<br />
Balmer lines, with blending between the broad wings pulling the pseudo-continuum<br />
down. Although the SDSS-normalised spectra are sufficient for the coarse filtering performed<br />
by the PCA filter, the underfitting associated with the pseudo-continuum makes<br />
them unsuitable for use in classification or parameterisation.<br />
Instead, the SDSS-calibrated spectra of the 282 hot subdwarf candidates were renormalised<br />
using an automated method based on cubic spline fitting, after having been<br />
velocity corrected, again, using the SDSS redshifts. Each spectrum was then resampled<br />
onto the common wavelength grid of 4050–4950 Å at a sampling of 1 Å pixel⁻¹, ready<br />
for analysis by the classification neural network and SFIT.<br />
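The order of operations described above (velocity correction first, then resampling) can be sketched as below. This is a minimal illustration and not the pre-processing code used here: it assumes NumPy, treats the SDSS redshift as z = v/c, and uses linear interpolation for the resampling; the spline renormalisation step is not shown, and all function names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def redshift_from_velocity(v_kms):
    """Non-relativistic redshift z = v/c for a radial velocity in km/s."""
    return v_kms / C_KMS

def to_rest_frame(wavelength, z):
    """Shift observed wavelengths to the rest frame: lambda_rest = lambda_obs / (1 + z)."""
    return np.asarray(wavelength, dtype=float) / (1.0 + z)

def resample(wavelength, flux, start=4050.0, stop=4950.0, step=1.0):
    """Linearly interpolate a spectrum onto a uniform grid (inclusive of both ends)."""
    grid = np.arange(start, stop + step / 2.0, step)
    return grid, np.interp(grid, wavelength, flux)
```

A typical call would be `resample(to_rest_frame(wavelength, z), flux)`, so that spectral lines fall on the same pixels in every spectrum before classification and fitting.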
Physical parameters in T_eff, log g, and log(n_He/n_H) were derived by fitting each<br />
spectrum to a large grid of 2426 LTE model spectra generated using STERNE and<br />
SPECTRUM. Details of the grid are summarised in Table 5.2.<br />
Parameter | Values<br />
T_eff (kK) | 8.0, 9.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 22.0, 24.0, 25.0, 26.0, 28.0, 30.0, 32.0, 34.0, 35.0, 36.0, 38.0, 40.0, 45.0, 50.0<br />
log g | 2.50, 3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00<br />
n_He | 0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 0.999<br />
Table 5.2: The model grid used to obtain physical parameters from the SDSS hot<br />
subdwarf candidates.<br />
5.4 Results<br />
The results <strong>of</strong> both classification and parameterisation are presented in Figures 5.2-5.8,<br />
and tabulated in Appendix B.<br />
5.4.1 Parameterisation<br />
A number of interesting features are present in the diagrams of Figure 5.2. Most<br />
prominent in the log g–T_eff plot is the low density region centred at T_eff ≈ 22,500 K.<br />
Figure 5.4 overlays Figure 5.2 with density estimate contours which better illustrate<br />
the presence of the gap.<br />
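The idea behind a density estimate can be shown with a one-dimensional Gaussian kernel estimator. The sketch below is a simplified illustration only: the analysis in this chapter is two-dimensional, and the kernel shape, the bandwidth and the function name used here are assumptions.

```python
import math

def gaussian_kde_1d(samples, bandwidth):
    """Return a function estimating the density of `samples` with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density
```

Evaluating the returned function over a range of T_eff values gives a smooth curve in which a sparsely populated interval between two groups of stars appears as a dip.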
This low density region appears to separate the blue horizontal branch stars from<br />
the extended horizontal branch. However, it occurs at the same position as the zero-age<br />
main sequence, so could it be the result of selection effects? The answer is probably no,<br />
because an early B-type main sequence star with an apparent magnitude of m_V = 15,<br />
similar to the stars in the hot subdwarf sample, and an absolute magnitude of M_V =<br />
−2.4, would be located ∼ 30 kpc away, out of the plane of the galaxy. The existence of<br />
such a star at this position is unlikely.<br />
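The distance quoted above follows from the distance modulus, m − M = 5 log10(d) − 5 with d in parsecs. The short Python check below reproduces the figure; it neglects interstellar extinction, which is an assumption of this sketch, and the function name is hypothetical.

```python
def distance_pc(apparent_mag, absolute_mag):
    """Distance in parsecs from the distance modulus, ignoring extinction."""
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)

d = distance_pc(15.0, -2.4)
print(f"{d / 1000.0:.1f} kpc")
```

With m_V = 15 and M_V = −2.4 the modulus is 17.4 magnitudes, which corresponds to a distance of roughly 30 kpc.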
The same low density region was also observed by Green et al. (2006) and Saffer<br />
et al. (1994), and corresponds with the second gap identified in observations of blue<br />
halo stars by Newell (1973).<br />
[Figure 5.2: Parameterisation results of the 282 SDSS hot subdwarf candidates. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted. Panels: log g against effective temperature (K), with the ZAMS, ZAHB and He-MS marked; log(n_He/n_H) against effective temperature (K).]<br />
[Figure 5.3: Four example fits from the 282 SDSS hot subdwarfs. The classification and physical parameters (T_eff (K), log g, log(n_He/n_H)) obtained for each star are printed in the lower corners of each plot. Panels, plotted against wavelength (Angstroms): sdO4VII:He26 (50654, 6.001, -0.913); sdB1VI:He29 (34502, 5.581, -0.568); sdB3VI:He2 (25219, 5.303, -2.769); sdB7III:He2 (12653, 3.342, -3.004).]<br />
Heber et al. (1984) and Newell (1973) propose evolutionary explanations for this<br />
gap based on variations in hydrogen envelope mass along the horizontal branch, but<br />
this was before the discovery that possibly 2/3 of the sdB stars blueward of the gap<br />
are short-period binaries (Maxted et al., 2001), and therefore products of the common<br />
envelope binary evolutionary channel.<br />
Monte Carlo simulations of single star evolution on the extended horizontal branch,<br />
carried out at St. Andrews (Jeffery & Jardine 1984, unpublished), did not reveal<br />
the existence of such a gap. It is therefore our hypothesis that the second gap of<br />
Newell (1973) reflects differing evolutionary scenarios for blue horizontal branch stars<br />
and extended horizontal branch stars, primarily that subdwarf B stars result from<br />
common-envelope binary evolution.<br />
[Figure 5.4: The results of applying a kernel density estimate analysis to the data from Figure 5.2. The low-density region at T_eff ≈ 22,500 K is prominent, along with another possible low-density region at T_eff ≈ 41,000 K. Panels: log g against effective temperature (K), with the ZAMS, ZAHB and He-MS marked; log(n_He/n_H) against effective temperature (K).]<br />
In the single star evolution hypothesis, a strong stellar wind is believed to occur on the RGB,<br />
but it fails to remove the entire outer hydrogen envelope before the helium<br />
core flash takes place. After the helium flash, the star evolves to the horizontal branch.<br />
The distribution of stellar masses along the horizontal branch must be continuous because<br />
evolutionary models do not predict gaps if the factors affecting mass loss in single<br />
stars (e.g., metallicity, rotation rate, magnetic field strength, etc.) are not discrete.<br />
In the binary star evolution scenario, most of the hydrogen-rich envelope is removed<br />
(either by Roche lobe overflow, or by a common envelope phase) at the tip of the RGB,<br />
meaning that evolution proceeds to the blue end of the horizontal branch. The distribution<br />
of post-common envelope binaries is not continuous because a partial removal<br />
of the hydrogen envelope does not occur.<br />
The second feature of interest in Figure 5.2 is the cluster of stars at T_eff ≈ 44,000 K,<br />
log g = 5.7. The clump is also noticeable in the log(n_He/n_H)–T_eff plot in Figure 5.2 as<br />
the group of extremely helium-rich stars at log(n_He/n_H) ≈ 1.2. Heber et al. (2006), in<br />
a spectral analysis of sdO stars selected from the Supernova Ia Progenitor Survey, the<br />
Hamburg Quasar Survey, and the SDSS, show a similar clustering at the same location<br />
on their log g–T_eff diagram.<br />
The log(n_He/n_H)–T_eff diagram in Figure 5.2 shows that the majority of the stars in<br />
the sample have helium-deficient atmospheres (less than 0.5 times the solar abundance).<br />
This has been attributed to diffusion and gravitational settling processes at work in<br />
the extended horizontal branch stars (Wesemael et al., 1982).<br />
For 28,000 K ≤ T_eff ≤ 40,000 K, a correlation between helium abundance and T_eff<br />
can be seen, with the helium abundance increasing with temperature. The same phenomenon<br />
was reported by Edelmann et al. (2003) in their analysis of sdBs from the<br />
Hamburg Quasar Survey, and Saffer et al. (1994) in a study of 92 field sdBs drawn<br />
largely from the PG catalogue. Both studies also report the existence of two sequences<br />
in the correlation, with a smaller fraction of stars having lower helium abundances at<br />
the same temperatures than the bulk of the sdBs. There is evidence to suggest the<br />
existence of these two sequences in Figure 5.2. Heber et al. (2006) also expand on this<br />
phenomenon by showing that the “cooler” sdO stars in their sample adhere to two<br />
distinct sequences, and extend the trend to higher T_eff.<br />
The band of stars evident at log(n_He/n_H) = −3 corresponds to the boundary of the<br />
model grid used in the analysis.<br />
5.4.2 Classification<br />
The neural network classification results of the 282 hot subdwarf candidates are shown<br />
in Figure 5.5. Although the neural network gives real-valued outputs for each classification<br />
parameter, these have been rounded to their closest value on the discrete Drilling<br />
et al. (2006) system to reflect how a human classifier would use the system.<br />
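The rounding step described above can be sketched as follows. The sketch is illustrative only: the class ranges (luminosity class 0 to IX, helium class 0 to 40) are read from the axes of Figure 5.5, integer steps are assumed for both scales, spectral type is omitted because its numerical encoding is not given here, and the function names are hypothetical.

```python
import math

ROMAN = ["0", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

def nearest_class(value, lowest, highest):
    """Round a real-valued network output to the nearest integer class, clipped to the scale."""
    rounded = math.floor(value + 0.5)  # round half up, unlike Python's round()
    return min(highest, max(lowest, rounded))

def format_classification(lum_output, he_output):
    """Format rounded luminosity and helium classes, e.g. 'VI:He29'."""
    lum = nearest_class(lum_output, 0, 9)
    he = nearest_class(he_output, 0, 40)
    return f"{ROMAN[lum]}:He{he}"
```

Outputs that fall outside the range of a scale are clipped to its nearest end rather than rejected, which is a design choice of this sketch.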
A correlation can be seen between luminosity class and spectral type, with luminosity<br />
decreasing as spectral type progresses from O to A. As the physical analogues<br />
to luminosity and spectral type are log g and T_eff respectively, this trend mirrors that<br />
found in the log g–T_eff plot of Figure 5.2.<br />
From the plot of helium class against spectral type, it can be seen that the stars<br />
in the sample are either helium poor or helium rich. There is a group of early-type<br />
sdBs showing a higher helium class than the bulk of such stars at the same spectral<br />
type. These are most likely the interesting subset of hot subdwarf stars known as He-sdBs<br />
(Jeffery et al., 1996; Ahmad, 2004).<br />
Figure 5.6 gives a comparison of the neural network classification results with the<br />
distribution of stars originally classified by Drilling et al. (2006) in their paper. The<br />
trends in the two distributions are similar if one takes into account the differing sample<br />
sizes.<br />
[Figure 5.5: Classification results of the 282 SDSS hot subdwarf candidates. Points have been given small random offsets in each axis for clarity. Panels: luminosity class against spectral type; helium class against spectral type.]<br />
[Figure 5.6: A comparison of the ANN classifications of the 282 SDSS hot subdwarf candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (rightmost plots). Points have been given small random offsets in each axis for clarity. Panels: helium class against spectral type; luminosity class against spectral type.]<br />
[Figure 5.7: A calibration of the ANN classifications onto the Drilling et al. (2006) system using the 282 SDSS hot subdwarf candidates. Panels: effective temperature (K) against spectral type; log g against luminosity class; log(n_He/n_H) against helium class.]<br />
One feature of interest in the luminosity class–spectral type plot of the Drilling<br />
et al. (2006) data is the group of high-luminosity B-type giant stars. These correspond<br />
with a group of MK stars used by Drilling et al. (2006) to interface their hot subdwarf<br />
classification system with the MK system. In the corresponding plot for the 282 SDSS<br />
hot subdwarfs studied here, no such B-type stars of low luminosity class (that is, high<br />
luminosity) are contained in the sample.<br />
A third-order calibration of the Drilling et al. (2006) classification system is shown<br />
in Figure 5.7 (i.e., the Drilling et al. (2006) parameters are being correlated to their<br />
corresponding physical parameters using a sample of spectra that does not consist of<br />
the original standard stars, and has not been classified by Drilling et al. or any other<br />
human trained to use the Drilling et al. (2006) scale).<br />
Although a linear correlation can be discerned between T eff vs. spectral type, and<br />
log(n He /n H ) vs. helium class, <strong>the</strong> correlations are quite poor. This could be due to<br />
systematic noise introduced during <strong>the</strong> renormalisation <strong>of</strong> <strong>the</strong> SDSS data, and may<br />
also signify that <strong>the</strong> neural network is having difficulty interpolating in regions not<br />
well represented by <strong>the</strong> original Drilling et al. (2006) training data (Figure 2.1 shows<br />
two low-density regions around spectral types O5 and B5, which is where <strong>the</strong> most<br />
“confusion” is seen in <strong>the</strong> correlation <strong>of</strong> Figure 5.7).<br />
Despite the noise, the log(n<sub>He</sub>/n<sub>H</sub>) vs. helium class plot still follows the trend of<br />
Figure 14 of Drilling et al. (2006).<br />
Between log g and luminosity class, no significant correlation can be seen. This is<br />
because the majority of subdwarfs reside in luminosity classes VI and VII, and<br />
have log g values between 5.0 and 6.0. The seemingly bimodal distribution of this plot<br />
corresponds to the separation between the lower-T<sub>eff</sub>, lower-log g BHB stars in the SDSS<br />
sample, and the higher-T<sub>eff</sub>, higher-log g subdwarfs. It is impossible to constrain any<br />
linear fit to the distribution due to the under-representation of the lower-log g, higher<br />
luminosity class region. The concentration of points in luminosity classes VI and VII<br />
reflects a similar pattern observed in Figure 15 of Drilling et al. (2006).<br />
[Figure 5.8: histogram of Stars Per Bin against Redshift (km s<sup>−1</sup>), spanning −600 to 600 km s<sup>−1</sup>.]<br />
Figure 5.8: The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates.<br />
5.4.3 Radial Velocities<br />
As an interesting aside, the radial velocities of the 282 hot subdwarf candidates, as<br />
measured by the SDSS, are plotted in Figure 5.8. The errors in the radial velocities are<br />
of the order of 30 km s<sup>−1</sup>. Several studies of the kinematical behaviour of hot subdwarfs<br />
have been conducted in the past, e.g., Altmann et al. (2004), Maxted et al. (2001), de<br />
Boer et al. (1997), Colin et al. (1994).<br />
Altmann et al. (2004) point out that short-period sdB binaries could exhibit orbital<br />
velocities in excess of 200 km s<sup>−1</sup>, but with most being of the order of 50 km s<sup>−1</sup> or less.<br />
Based on the parameterisation and classification results of the hot subdwarf sample<br />
studied here, it is clear that the majority of the sample are sdBs and, consequently,<br />
may be short-period binaries (see also Maxted et al. 2001).<br />
As the SDSS observes out of the galactic plane, most of the hot subdwarf candidates<br />
will be either thick disk or halo objects, with greater radial velocities due to their orbits<br />
not conforming with the local standard of rest (see Altmann et al. 2004). There are<br />
a few objects in the hot subdwarf sample with velocities |cz| > 400 km s<sup>−1</sup>. Although<br />
these velocities are unverified and could be anomalous, they are greater than what can<br />
be accounted for by the previously outlined mechanisms. As such, they are of interest<br />
for further study (e.g., Hirsch et al. 2005).<br />
5.5 Sources <strong>of</strong> Error<br />
The results <strong>of</strong> this chapter are affected by a number <strong>of</strong> error sources. The issues <strong>of</strong><br />
primary concern are systematic errors arising from <strong>the</strong> internal accuracy <strong>of</strong> <strong>the</strong> tools<br />
<strong>the</strong>mselves, whe<strong>the</strong>r <strong>the</strong> training data for <strong>the</strong> tools are representative <strong>of</strong> <strong>the</strong> application<br />
domain, <strong>the</strong> assumptions used in generating <strong>the</strong> model spectra, and random errors in<br />
<strong>the</strong> application spectra along with systematic errors introduced during <strong>the</strong> observation<br />
and reduction stage.<br />
In terms of the physical parameters derived using SFIT, SFIT produces standard<br />
errors for each parameter it fits based on the curvature of the χ<sup>2</sup> function in the region<br />
of parameter space about the located minimum. These errors give an indication of<br />
the internal accuracy of the fitting method, with the χ<sup>2</sup> function giving an indication<br />
of the goodness-of-fit. At the boundaries of the grid, where the curvature is difficult<br />
to estimate, or in regions of low curvature, the standard errors may not be as useful a<br />
measure of SFIT’s internal uncertainty.<br />
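The link between χ² curvature and a standard error can be illustrated with a minimal one-parameter sketch. This is not SFIT's code: the function name, the finite-difference step and the toy χ² surface are assumptions made for illustration, and the usual quadratic approximation σ = sqrt(2 / (d²χ²/dp²)) is assumed to hold near the minimum.

```python
import math


def chi2_standard_error(chi2, p_min, step):
    """Estimate a 1-sigma error from the curvature of chi2 at its minimum.

    The second derivative is taken by central finite differences and
    converted with the quadratic approximation sigma = sqrt(2 / curvature).
    """
    curvature = (chi2(p_min + step) - 2.0 * chi2(p_min) + chi2(p_min - step)) / step**2
    if curvature <= 0.0:
        # Flat or ill-defined minimum (e.g. at a grid boundary): the
        # curvature carries no usable error information.
        return math.inf
    return math.sqrt(2.0 / curvature)


# Hypothetical chi2 surface for one parameter (say an effective temperature)
# with a built-in 1-sigma width of 250 K around 30000 K.
def toy_chi2(teff):
    return ((teff - 30000.0) / 250.0) ** 2


sigma = chi2_standard_error(toy_chi2, 30000.0, 50.0)
```

A flat χ² surface returns an infinite error in this sketch, which mirrors the remark that standard errors lose their usefulness in regions of low curvature.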
A major error source is <strong>the</strong> grid <strong>of</strong> <strong>the</strong>oretical models to which observations are fit.<br />
Here, models have been used which assume a stellar atmosphere that is plane-parallel,
and in local <strong>the</strong>rmal, radiative and hydrostatic equilibrium. Opacities are modelled<br />
using opacity distribution functions, which differs fundamentally from <strong>the</strong> methods<br />
used in stellar atmospheres that do not make <strong>the</strong> LTE assumption. It is known that<br />
<strong>the</strong> LTE approximation is good up to 40,000K, after which NLTE effects become more<br />
significant. There is also <strong>the</strong> question <strong>of</strong> whe<strong>the</strong>r or not <strong>the</strong> inclusion <strong>of</strong> physical effects,<br />
such as magnetic fields, is an important issue.<br />
Within SFIT itself, the assumption is made that changes in the physical parameters<br />
of a model have a corresponding linear effect on the flux distribution. It is known from<br />
theory that changes in the physical parameters have a nonlinear effect on the flux<br />
distribution, but a trade-off must be made between accuracy and efficiency, especially<br />
in a data mining context.<br />
O<strong>the</strong>r sources <strong>of</strong> error, such as from <strong>the</strong> SDSS observation and reduction pipeline<br />
or <strong>the</strong> hot subdwarf classification standards obtained from Drilling et al. (2006), are<br />
difficult to quantify. For <strong>the</strong> same reason, discussion <strong>of</strong> errors arising from models is<br />
a complicated topic and beyond <strong>the</strong> scope <strong>of</strong> this <strong>the</strong>sis. However, see, for example,<br />
Behara & Jeffery (2006) for an investigation <strong>of</strong> <strong>the</strong> influence <strong>of</strong> improving <strong>the</strong> opacities<br />
used in <strong>the</strong> models.<br />
Never<strong>the</strong>less, <strong>the</strong> issue <strong>of</strong> <strong>the</strong> robustness <strong>of</strong> <strong>the</strong> results presented in this chapter<br />
(and also <strong>the</strong> conclusions which are drawn from <strong>the</strong> results) is very important, but<br />
quantifying <strong>the</strong> influence <strong>of</strong> all <strong>the</strong> possible error sources requires fur<strong>the</strong>r investigation.<br />
5.6 <strong>Analysis</strong> <strong>of</strong> PCA Filter Efficiency<br />
Figure 5.9 gives some examples <strong>of</strong> <strong>the</strong> BHB and white dwarf contaminants mentioned<br />
earlier in <strong>the</strong> chapter. In cases B, C, and D, <strong>the</strong> differences between <strong>the</strong> original<br />
spectrum and its reconstruction are not sufficient to produce a reconstruction error<br />
greater than <strong>the</strong> chosen threshold <strong>of</strong> 6.4. In case A, <strong>the</strong> BHB star, <strong>the</strong> reconstruction<br />
matches <strong>the</strong> original spectrum very closely, except for a slight difference in Hδ. Physical<br />
parameters obtained for this star using SFIT show that it is too cool to be a subdwarf<br />
(T<sub>eff</sub> = 12,000 K, log g = 3.42, n<sub>He</sub> = 0.004).<br />
The simple RMS error calculation <strong>of</strong> Equation 4.12 yields <strong>the</strong> scaled RMS difference<br />
between each flux point <strong>of</strong> <strong>the</strong> original spectrum and its PCA reconstruction.<br />
Clearly, then, for such small differences this error metric is not sensitive enough to<br />
filter out the BHB and white dwarf contaminants. This limitation of the PCA filter<br />
could be diminished by further developing the reconstruction error calculation to include<br />
a weighting scheme that gives more significance to the spectral lines and features commonly<br />
found in the objects under investigation. A disadvantage of this approach is that<br />
the weighting scheme must be crafted and optimised manually to suit the quirks of<br />
the PCA filter and spectral features of the target objects. A more robust error metric<br />
that does not require user input is a topic for future work.<br />
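A weighted reconstruction error of the kind suggested above might look like the following sketch. Equation 4.12 is not reproduced in this chapter, so the plain RMS form is assumed and any scaling factor is omitted. The window half-width, the boost factor and the toy spectrum are hypothetical choices, and the line list holds approximate rest wavelengths of Hδ, Hγ and Hβ.

```python
import math


def weighted_rms(original, reconstructed, weights=None):
    """RMS difference between two spectra, optionally weighted per pixel."""
    if weights is None:
        weights = [1.0] * len(original)
    num = sum(w * (o - r) ** 2 for o, r, w in zip(original, reconstructed, weights))
    return math.sqrt(num / sum(weights))


def line_weights(wavelengths, line_centres, half_width, boost):
    """Give pixels within half_width of any line centre a larger weight."""
    return [
        boost if any(abs(w - c) <= half_width for c in line_centres) else 1.0
        for w in wavelengths
    ]


# Toy example: a flat spectrum whose reconstruction is wrong only near H-delta.
wavelengths = [4050.0 + i for i in range(901)]   # 4050-4950 A, 1 A per pixel
original = [1.0] * len(wavelengths)
reconstructed = [1.05 if abs(w - 4101.7) <= 5.0 else 1.0 for w in wavelengths]

balmer = [4101.7, 4340.5, 4861.3]                # H-delta, H-gamma, H-beta
plain = weighted_rms(original, reconstructed)
weighted = weighted_rms(original, reconstructed,
                        line_weights(wavelengths, balmer, 10.0, 5.0))
```

In this toy case the weighted error is roughly twice the plain RMS, because the only mismatch lies inside a boosted window. The hand-picked line list illustrates the manual tuning that the text identifies as the drawback of the approach.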
Quantitative Estimation <strong>of</strong> Filter Efficiency<br />
To give an estimate of the success (and failure) of the PCA filter as deployed in this<br />
chapter, the word “success” needs to be more clearly defined.<br />
Based on the results plotted in Figure 5.2, the assumption can be made that<br />
most subdwarfs in the SDSS sample lie, with good probability, in the region T<sub>eff</sub> ≥<br />
23,000 K, log g ≥ 4.7, shown shaded in Figure 5.10.<br />
For any chosen value of R for the reconstruction error threshold, stars with a reconstruction<br />
error within R and parameters inside this region will be assumed to be true positives,<br />
i.e., actual subdwarfs that the filter has successfully separated out. False positives are<br />
those stars which are within the value of R but lie outside this region, i.e., stars which<br />
the filter should have excluded but did not. True negatives lie both outside the shaded<br />
region and beyond the threshold of R. Finally, false negatives lie within the shaded<br />
region but are outside the filter’s error threshold.<br />
[Figure 5.9: four spectral panels with wavelength ticks from 4100 to 4900 and flux ticks from 0.5 to 1.5. Panel labels as printed: A 2.30372, B 2.78036, C 3.34386, D 3.90771. Objects named in the figure: J220403.45+122507.3, J213301.41+122831.1, J135532.42+001124.0, J101805.04+011123.5.]<br />
Figure 5.9: Examples of white dwarf and BHB contaminants. A - BHB star with<br />
deep Balmer lines. B - DA white dwarf with strong, broad Balmer lines due to high<br />
surface gravity. C - DB white dwarf. D - Uncertain (some evidence of weak carbon<br />
absorption, so possibly a DQ white dwarf).<br />
[Figure 5.10: log g (2 to 7) against Effective Temperature (K) (50000 down to 10000), with the ZAMS, ZAHB and He-MS marked.]<br />
Figure 5.10: This gray-shaded region of the log g–T<sub>eff</sub> plane represents an area of good<br />
probability that the stars within it are subdwarfs.<br />
Using <strong>the</strong>se definitions, <strong>the</strong> PCA filter’s efficiency can be quantitatively stated for<br />
any value <strong>of</strong> R. Of course, <strong>the</strong> assumption is that every star passing through <strong>the</strong> filter<br />
has values for T eff and log g. Estimates <strong>of</strong> <strong>the</strong>se parameters for <strong>the</strong> SDSS sample were<br />
obtained by applying SFIT to <strong>the</strong> whole data set.<br />
The quantitative measures used are the percentage rate of true positives (which<br />
measures how successful the PCA filter is, according to the aforementioned definition<br />
of “success”),<br />
\[ \mathrm{TPRate} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \times 100\% \qquad (5.1) \]<br />
where TP is the number of true positives and FN the number of false negatives, and<br />
also the rate of false positives (which measures how often the filter fails),<br />
\[ \mathrm{FPRate} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \times 100\% \qquad (5.2) \]<br />
where FP is the number of false positives, and TN is the number of true negatives.<br />
[Figure 5.11: Percentage (%) against Reconstruction Error, R (0 to 50); curves: TP Rate, FP Rate, TP - FP.]<br />
Figure 5.11: TP rates (red) and FP rates (blue) of the PCA filter as a function of the<br />
reconstruction error threshold, R. The green curve is the difference between the TP<br />
and FP rates.<br />
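The definitions of true and false positives and negatives, together with Equations 5.1 and 5.2, can be sketched in a few lines. The function names and the six-star sample below are invented for illustration. The region test uses the T eff ≥ 23,000 K, log g ≥ 4.7 boundary quoted in the text, and "within R" is taken to mean a reconstruction error less than or equal to R.

```python
def in_subdwarf_region(teff, logg):
    """Region of the log g - T_eff plane assumed to contain subdwarfs."""
    return teff >= 23000.0 and logg >= 4.7


def confusion_counts(stars, r_threshold):
    """Count (TP, FP, TN, FN) for stars given as (recon_error, teff, logg)."""
    tp = fp = tn = fn = 0
    for err, teff, logg in stars:
        passed = err <= r_threshold          # star passes the PCA filter
        inside = in_subdwarf_region(teff, logg)
        if passed and inside:
            tp += 1
        elif passed:
            fp += 1
        elif inside:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """TP and FP rates in percent, following Equations 5.1 and 5.2."""
    tp_rate = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    fp_rate = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


# Invented toy sample: (reconstruction error, T_eff in K, log g).
toy = [
    (2.1, 30000.0, 5.6),    # passes, inside region  -> true positive
    (5.0, 35000.0, 5.9),    # passes, inside region  -> true positive
    (9.0, 28000.0, 5.4),    # rejected, inside       -> false negative
    (2.3, 12000.0, 3.4),    # passes, outside        -> false positive
    (12.0, 9000.0, 4.0),    # rejected, outside      -> true negative
    (20.0, 15000.0, 8.0),   # rejected, outside      -> true negative
]
counts = confusion_counts(toy, 6.4)
tp_rate, fp_rate = rates(*counts)
```

For this toy sample the counts are (2, 1, 2, 1), which gives a TP rate of about 66.7% and an FP rate of about 33.3% at R = 6.4.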
Figure 5.11 shows how <strong>the</strong> TP and FP rates vary as a function <strong>of</strong> R in <strong>the</strong> application<br />
to <strong>the</strong> SDSS data set. The rate <strong>of</strong> true positives increases rapidly until R ∼ 10 after<br />
which it begins to level <strong>of</strong>f. The percentage <strong>of</strong> false positives increases slowly until<br />
R ∼ 5.5. From this point until R ∼ 13 <strong>the</strong> filter begins to produce false positives at<br />
<strong>the</strong> maximum rate before starting to level <strong>of</strong>f. At R ∼ 28, <strong>the</strong> rate <strong>of</strong> false positives<br />
surpasses that <strong>of</strong> true positives meaning that <strong>the</strong> filter now fails more than it succeeds.<br />
An idea <strong>of</strong> <strong>the</strong> optimum value for R can be determined by plotting <strong>the</strong> difference<br />
between <strong>the</strong> rates <strong>of</strong> true positives and false positives for each R. This is <strong>the</strong> green curve<br />
in Figure 5.11. There is a noticeable and very definite peak. Figure 5.12 shows a<br />
close-up view of the region of this peak, which occurs at R ∼ 7.0. At this error threshold,<br />
the excess of the true positive rate over the false positive rate is at its greatest.<br />
In other words, this is the optimum value of R for this particular application.<br />
This compares favourably with the chosen reconstruction error threshold of R ≤ 6.4<br />
reported in section 5.2.<br />
[Figure 5.12: Percentage (%) against Reconstruction Error, R (0 to 10); curves: TP Rate, FP Rate, TP - FP.]<br />
Figure 5.12: A closer examination of the TP and FP rates. The peak in the green<br />
TP-FP curve occurs at R ∼ 7.0 and signifies the optimum value for R in the SDSS<br />
sample.<br />
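Locating the peak of the TP − FP curve amounts to a one-dimensional scan over candidate thresholds. The sketch below assumes each star is already flagged as lying inside or outside the assumed subdwarf region. The data, the grid spacing and the function names are hypothetical.

```python
def rate_difference(stars, r_threshold):
    """TP rate minus FP rate (percent) for stars given as (error, is_subdwarf)."""
    tp = sum(1 for err, sd in stars if sd and err <= r_threshold)
    fn = sum(1 for err, sd in stars if sd and err > r_threshold)
    fp = sum(1 for err, sd in stars if not sd and err <= r_threshold)
    tn = sum(1 for err, sd in stars if not sd and err > r_threshold)
    tp_rate = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    fp_rate = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    return tp_rate - fp_rate


def best_threshold(stars, r_values):
    """Return the first threshold in r_values that maximises TP rate - FP rate."""
    return max(r_values, key=lambda r: rate_difference(stars, r))


# Invented sample: five stars inside the subdwarf region, five outside.
toy = [(1.5, True), (3.0, True), (4.5, True), (6.5, True), (12.0, True),
       (2.5, False), (8.0, False), (9.5, False), (15.0, False), (30.0, False)]
r_grid = [0.5 * k for k in range(1, 101)]    # candidate thresholds 0.5 ... 50.0
r_best = best_threshold(toy, r_grid)
```

With this toy sample the scan settles on R = 6.5, where the difference is 60 percentage points. As the text goes on to note, the scan is only possible once every star has at least a rough parameter estimate.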
It should be pointed out that there does not seem to be a reliable method for<br />
determining the optimal threshold value of R for a filter and data set, a priori, without<br />
first establishing at least a rough estimate of physical or classification parameters. If<br />
the PCA filter (which is fast in its operation) were paired with a parameterising neural<br />
network or a fast nearest neighbour χ<sup>2</sup> fitting, then an estimate of the optimal PCA<br />
error threshold could be obtained using the same method as above.<br />
5.7 Summary<br />
The tools developed in Chapters 2 to 4 have been deployed on a real-world data set<br />
with some interesting outcomes. The hot subdwarf candidates extracted from <strong>the</strong><br />
SDSS represent a completely homogeneous set, and <strong>the</strong>ir analysis evidences several<br />
unexplained phenomena:<br />
1. Existence of the second horizontal branch gap of Newell (1973) at T<sub>eff</sub> ≈ 22,500 K.<br />
2. Two sdB n<sub>He</sub>–T<sub>eff</sub> sequences, also observed by Edelmann et al. (2003).<br />
3. A clustering of hot, helium-rich sdO stars at T<sub>eff</sub> ≈ 44,000 K, log g = 5.7, also<br />
observed by Heber et al. (2006).<br />
These results reiterate <strong>the</strong> challenge to provide evolutionary explanations for <strong>the</strong> variety<br />
<strong>of</strong> stars present on <strong>the</strong> extended horizontal branch, and <strong>the</strong> subsequent importance<br />
<strong>of</strong> continuing research into hot subdwarfs.<br />
Chapter 6<br />
Application II - O<strong>the</strong>r Data Sets<br />
The work presented in this chapter details <strong>the</strong> application <strong>of</strong> <strong>the</strong> analysis pipeline to<br />
three smaller data sets obtained in collaboration with o<strong>the</strong>rs in <strong>the</strong> field. This reflects<br />
the situation described in Chapter 1 regarding the heterogeneous data sets amassed<br />
by various ground-based observatories. When data from <strong>the</strong>se observatories are made<br />
available, robust tools will be needed to process <strong>the</strong>m into a homogeneous form, and<br />
provide fast analyses.<br />
6.1 2MASS-Selected Sample<br />
A preliminary analysis <strong>of</strong> <strong>the</strong> 282 SDSS hot subdwarf candidates in <strong>the</strong> previous chapter<br />
was presented at <strong>the</strong> Second Meeting on Hot Subdwarfs and Related Objects in La<br />
Palma, June 2005. As a result <strong>of</strong> this conference, E. M. Green provided <strong>the</strong> author<br />
with a sample <strong>of</strong> high S/N, low-resolution spectra selected from 2MASS 1 photometry<br />
(see Green et al. 2006) to be classified and parameterised with <strong>the</strong> tools developed in<br />
this <strong>the</strong>sis.<br />
A total of 83 2MASS-selected spectra were made available, with an average S/N of about 133<br />
but ranging from as low as 70 to as high as 273. The wavelength range covered is<br />
3615–6900 Å at a resolution of R ≈ 922.<br />
1 http://www.ipac.caltech.edu/2mass<br />
Spectra for two known stars, Balloon 090900004 and BD+48 2721, were also supplied<br />
along with physical parameters (T<sub>eff</sub>, log g, log(n<sub>He</sub>/n<sub>H</sub>)) obtained using NLTE<br />
model atmospheres (H+He, zero metal). The purpose of these stars is to provide a<br />
temperature calibration for the hot and cool ends of the sdOB sequence, so that the<br />
parameterisation results obtained with SFIT (and LTE model atmospheres) can be<br />
compared with those derived from other model atmospheres.<br />
All of the spectra were previously flux and wavelength calibrated. Normalisation<br />
was carried out using a cubic spline fitting routine, and the spectra were then resampled<br />
onto a common wavelength grid of 4050–4950 Å at a sampling of 1 Å pixel<sup>−1</sup>. Radial<br />
velocity shifts were corrected by cross-correlating each spectrum with a grid of 101<br />
theoretical models coarsely varying over T<sub>eff</sub>, log g, and log(n<sub>He</sub>/n<sub>H</sub>).<br />
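The resampling and velocity-correction steps can be sketched as follows. This is not the pipeline used in the thesis. It substitutes linear interpolation for the resampling, and it replaces the cross-correlation against 101 models with a brute-force search over trial velocities for a single template. The Gaussian line profile, the trial grid and all function names are assumptions.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def resample(wave, flux, new_wave):
    """Linearly interpolate flux(wave) onto new_wave (both increasing)."""
    out = []
    j = 0
    for w in new_wave:
        if w <= wave[0]:
            out.append(flux[0])       # clamp below the covered range
        elif w >= wave[-1]:
            out.append(flux[-1])      # clamp above the covered range
        else:
            while wave[j + 1] < w:
                j += 1
            t = (w - wave[j]) / (wave[j + 1] - wave[j])
            out.append(flux[j] + t * (flux[j + 1] - flux[j]))
    return out


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def radial_velocity(wave, flux, template_flux, trial_velocities):
    """Trial velocity (km/s) whose Doppler-shifted template best matches flux."""
    best_v, best_c = None, -2.0
    for v in trial_velocities:
        shifted_wave = [w * (1.0 + v / C_KMS) for w in wave]
        shifted = resample(shifted_wave, template_flux, wave)
        c = correlation(flux, shifted)
        if c > best_c:
            best_v, best_c = v, c
    return best_v


def gaussian_line(wave, centre, depth=0.5, sigma=4.0):
    """Normalised continuum with a single Gaussian absorption line."""
    return [1.0 - depth * math.exp(-0.5 * ((w - centre) / sigma) ** 2) for w in wave]


grid = [4050.0 + i for i in range(901)]      # 4050-4950 A at 1 A per pixel
H_GAMMA = 4340.5                             # approximate rest wavelength (A)
template = gaussian_line(grid, H_GAMMA)
observed = gaussian_line(grid, H_GAMMA * (1.0 + 150.0 / C_KMS))
v_best = radial_velocity(grid, observed, template, range(-600, 601, 10))
```

The synthetic spectrum is shifted by 150 km s−1, and the search should recover that value to within one 10 km s−1 trial step. A real implementation would normally correlate in log-wavelength space and interpolate around the correlation peak.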
During this pre-processing stage, it was discovered that two <strong>of</strong> <strong>the</strong> stars in <strong>the</strong> sample<br />
were white dwarfs, so <strong>the</strong>y were excluded from any fur<strong>the</strong>r analysis. Application <strong>of</strong> <strong>the</strong><br />
PCA filter <strong>of</strong> Chapter 4 was deemed unnecessary given <strong>the</strong> small sample size.<br />
<strong>Analysis</strong> And Results<br />
Classification and parameterisation on <strong>the</strong> final 83 stars was carried out using <strong>the</strong><br />
classification neural network <strong>of</strong> Chapter 2, and SFIT using <strong>the</strong> same grid <strong>of</strong> models as<br />
in Chapter 5 (Table 5.2). Results are plotted in Figures 6.1 and 6.2, and tabulated in<br />
Appendix C.<br />
The parameterisation results of the two calibration stars, Balloon 090900004 and<br />
BD+48 2721, are given in Table 6.1. Differences exist between the LTE and NLTE parameters for<br />
both stars, the largest being a temperature difference of ∼ 9700 K for the hotter star,<br />
Balloon 090900004. This is not unexpected considering the inherent differences between the<br />
LTE and NLTE approaches.<br />
Identifier | Parameter | NLTE | LTE<br />
BD+48 2721 | T<sub>eff</sub> (K) | 23017 (248) | 22979 (240)<br />
BD+48 2721 | log g | 5.035 (0.028) | 5.267 (0.032)<br />
BD+48 2721 | log(n<sub>He</sub>/n<sub>H</sub>) | -2.135 (0.022) | -1.629 (0.018)<br />
Balloon 090900004 | T<sub>eff</sub> (K) | 40897 (248) | 31147 (278)<br />
Balloon 090900004 | log g | 5.369 (0.022) | 4.757 (0.054)<br />
Balloon 090900004 | log(n<sub>He</sub>/n<sub>H</sub>) | -2.842 (0.046) | -1.811 (0.056)<br />
Table 6.1: Parameters of the two calibration stars as obtained by χ<sup>2</sup>-fitting to NLTE<br />
(Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given<br />
in parentheses.<br />
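The size of the LTE−NLTE differences in Table 6.1 relative to the formal errors can be checked with a short calculation. Combining the two formal errors in quadrature treats them as independent Gaussian uncertainties, which is an assumption made here for illustration and not a statement from the source. The dictionary layout and names are likewise hypothetical.

```python
import math

# (value, formal error) pairs copied from Table 6.1, stored as (NLTE, LTE).
table_6_1 = {
    "BD+48 2721": {
        "teff": ((23017.0, 248.0), (22979.0, 240.0)),
        "logg": ((5.035, 0.028), (5.267, 0.032)),
        "log_he": ((-2.135, 0.022), (-1.629, 0.018)),
    },
    "Balloon 090900004": {
        "teff": ((40897.0, 248.0), (31147.0, 278.0)),
        "logg": ((5.369, 0.022), (4.757, 0.054)),
        "log_he": ((-2.842, 0.046), (-1.811, 0.056)),
    },
}


def compare(nlte, lte):
    """Return (LTE - NLTE, difference in units of the combined formal error)."""
    diff = lte[0] - nlte[0]
    sigma = math.hypot(nlte[1], lte[1])   # errors added in quadrature
    return diff, diff / sigma


differences = {
    star: {name: compare(*pair) for name, pair in params.items()}
    for star, params in table_6_1.items()
}
```

For Balloon 090900004 the temperature difference is −9750 K, about 26 combined formal errors. For BD+48 2721 it is −38 K, well inside one. This suggests the formal errors capture the internal precision of each fit and not the systematic offset between the LTE and NLTE model grids.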
The parameterisation results of Figure 6.1 show distributions with some similarity<br />
to those of the SDSS hot subdwarf candidates in Figure 5.2. The second gap of Newell<br />
(1973) seems to be present at T<sub>eff</sub> ≈ 23,000 K (however, it is unclear whether Green’s sample<br />
suffers from any selection effects). Some main sequence late-type B and A stars appear<br />
to be present in the sample.<br />
The log(n<sub>He</sub>/n<sub>H</sub>)–T<sub>eff</sub> results in Figure 6.1 show the atmospheric helium deficiency<br />
of the sdB stars, and the cluster of blue horizontal branch stars with normal helium<br />
abundances. The main sequence stars present in the sample can be seen again as the<br />
low temperature, hydrogen-rich data points. Not enough sdB stars are present in the<br />
sample to confirm any correlation between helium abundance and T<sub>eff</sub>, although such<br />
a correlation appears to be suggested by the results.<br />
The distribution <strong>of</strong> classifications in Figure 6.2 again shows some similarity to that<br />
<strong>of</strong> <strong>the</strong> SDSS hot subdwarf candidates in Figure 5.5. Not plotted in Figure 6.2 are <strong>the</strong><br />
late-A and early-F spectral classifications assigned to some stars by <strong>the</strong> neural network.<br />
The parameterisation results suggest <strong>the</strong> existence <strong>of</strong> such stars in <strong>the</strong> sample, but it<br />
is <strong>of</strong> interest that <strong>the</strong> neural network would distinguish and assign <strong>the</strong>m (unreliable)<br />
classes for which no samples were present in <strong>the</strong> training data. Figure 6.3 plots <strong>the</strong>se<br />
stars. The deep and broad hydrogen Balmer lines correspond with <strong>the</strong> late-A and<br />
early-F spectral types. This would seem to demonstrate that <strong>the</strong> neural network has<br />
very good generalisation properties.<br />
[Figure 6.1, two panels: log g (2 to 7) against Effective Temperature (K) (50000 down to 10000), with the ZAMS, ZAHB and He-MS marked; and log(n<sub>He</sub>/n<sub>H</sub>) (−4 to 2) against Effective Temperature (K) (50000 down to 10000).]<br />
Figure 6.1: SFIT physical parameters for the 2MASS-selected sample. The helium main<br />
sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al.<br />
(1993) are also plotted.<br />
[Figure 6.2, two panels: Luminosity Class (0 to IX) against Spectral Type (O to A); and Helium Class (0 to 40) against Spectral Type (O to A).]<br />
Figure 6.2: ANN classification for the 2MASS-selected sample. Points have been given<br />
small random offsets in each axis for clarity.<br />
[Figure 6.3: Flux (continuum = 1) + const. against Wavelength (Angstroms), 4100 to 4900. Spectra and class labels as printed: J143155.30+172404.9 (sdA7V:He3); J095854.23+360314.3 (sdF8VI:He2); J114454.50+031550.2 and J112832.64+603859.3 (sdA7V:He4 and sdF5V:He3); J111819.13+093144.4 (sdA2V:He5); J083127.37+422201.7 (sdA5VI:He2).]<br />
Figure 6.3: The stars assigned late-A and early-F spectral types by the neural network.<br />
6.2 SDSS sdB-He Stars <strong>of</strong> Harris et al. (2003)<br />
In collaboration with Ahmad (Ahmad et al., 2006) <strong>the</strong> classification neural network<br />
was used to classify a small set <strong>of</strong> “helium-rich” sdB-He stars obtained from <strong>the</strong> SDSS<br />
by Harris et al. (2003). Results <strong>of</strong> this analysis, along with helium abundances derived<br />
by Ahmad using SFIT and a grid <strong>of</strong> LTE model atmospheres, are presented in Table<br />
6.2.<br />
SDSS Identifier | n<sub>He</sub> | ANN Class<br />
J094044.08+004759 | 0.16 | sdB0VIII:He23<br />
J113840.69-003531 | 0.01 | sdB3V:He1<br />
J124346.38+002534 | 0.05 | sdB1V:He23<br />
J125410.86-010408 | 0.01 | sdB3III:He5<br />
J131745.80+010450 | 0.01 | sdB0VI:He3<br />
J134545.24-000641 | 0.15 | sdO9VII:He21<br />
J134635.68-001804 | 0.09 | sdA2IV:He0<br />
J135707.35+010454 | 0.36 | sdO6VII:He30<br />
J141556.68-005814 | 0.21 | sdB8VI:He14<br />
J143917.64+010251 | 0.01 | sdB6V:He3<br />
J144514.93+000249 | 0.02 | sdB1VII:He11<br />
J152708.31+003308 | 0.45 | sdO9VIII:He35<br />
J152905.62+002137 | 0.06 | sdO9VII:He10<br />
J154238.43-003758 | 0.07 | sdA2III:He2<br />
Table 6.2: Classification results for the sdB-He stars of Harris et al. (2003).<br />
The aim <strong>of</strong> this work was to determine if <strong>the</strong> sdB-He stars <strong>of</strong> Harris et al. (2003) are<br />
similar to He-sdB stars (see Ahmad 2004) as this would increase <strong>the</strong> number <strong>of</strong> known<br />
helium-rich subdwarfs for fur<strong>the</strong>r study.<br />
However, it is clear from <strong>the</strong> classification and parameterisation results obtained<br />
that most <strong>of</strong> <strong>the</strong> sdB-He stars show very little helium enrichment, with half <strong>of</strong> <strong>the</strong><br />
stars in <strong>the</strong> sample having surface gravities too low to be subdwarfs (Ahmad, private<br />
communication). Of <strong>the</strong> remaining subdwarfs, only a handful are helium-rich (i.e.,<br />
having n_He ≥ 0.10, or He class > 20).<br />
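The helium-rich criterion quoted above can be applied mechanically to the rows of Table 6.2. The following is a minimal sketch only: the function names are hypothetical, and the surface-gravity cut mentioned in the text is not applied because gravities are not tabulated here, so the selection is made on helium content alone.

```python
import re

# Rows of Table 6.2: (SDSS identifier, n_He, ANN class).
TABLE_6_2 = [
    ("J094044.08+004759", 0.16, "sdB0VIII:He23"),
    ("J113840.69-003531", 0.01, "sdB3V:He1"),
    ("J124346.38+002534", 0.05, "sdB1V:He23"),
    ("J125410.86-010408", 0.01, "sdB3III:He5"),
    ("J131745.80+010450", 0.01, "sdB0VI:He3"),
    ("J134545.24-000641", 0.15, "sdO9VII:He21"),
    ("J134635.68-001804", 0.09, "sdA2IV:He0"),
    ("J135707.35+010454", 0.36, "sdO6VII:He30"),
    ("J141556.68-005814", 0.21, "sdB8VI:He14"),
    ("J143917.64+010251", 0.01, "sdB6V:He3"),
    ("J144514.93+000249", 0.02, "sdB1VII:He11"),
    ("J152708.31+003308", 0.45, "sdO9VIII:He35"),
    ("J152905.62+002137", 0.06, "sdO9VII:He10"),
    ("J154238.43-003758", 0.07, "sdA2III:He2"),
]

HELIUM_CLASS = re.compile(r":He(\d+)$")


def helium_class(label):
    """Return the integer helium class from a label such as 'sdB0VIII:He23'."""
    match = HELIUM_CLASS.search(label)
    if match is None:
        raise ValueError(f"no helium class found in {label!r}")
    return int(match.group(1))


def is_helium_rich(n_he, label, n_he_min=0.10, he_class_min=20):
    """Criterion from the text: n_He >= 0.10, or He class > 20."""
    return n_he >= n_he_min or helium_class(label) > he_class_min


# Identifiers satisfying the helium criterion (no gravity cut applied).
helium_rich = [name for name, n_he, label in TABLE_6_2 if is_helium_rich(n_he, label)]
```

Because the two halves of the criterion are joined by "or", a star such as J141556.68-005814 is selected on its abundance even though its helium class is below 20.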
6.3 Ahmad & Jeffery (2003) He-sdBs<br />
Ahmad & Jeffery (2003) undertook <strong>the</strong> first systematic study <strong>of</strong> a set <strong>of</strong> helium-rich<br />
subdwarf B stars, obtaining observations and physical parameters for 17 targets.<br />
These stars have been previously classified by Drilling et al. (2006) using observations<br />
from different sources. As such, <strong>the</strong> re-classification <strong>of</strong> <strong>the</strong>se stars by <strong>the</strong> neural network<br />
developed in Chapter 2, using <strong>the</strong> new observations <strong>of</strong> Ahmad & Jeffery (2003), presents an<br />
opportunity to verify <strong>the</strong> neural network’s performance.<br />
Ahmad & Jeffery (2003) observed <strong>the</strong> targets over a variety <strong>of</strong> wavelength ranges<br />
between 3900 and 5000 Å, with <strong>the</strong> spectra being bias corrected, flat-fielded, sky subtracted,<br />
and wavelength calibrated using standard procedures. All spectra were normalised<br />
by defining a smooth polynomial continuum from sections <strong>of</strong> local continuum,<br />
with care being taken to avoid <strong>the</strong> wings <strong>of</strong> broad absorption lines.<br />
Before being passed to <strong>the</strong> neural network, <strong>the</strong> spectra were rebinned onto <strong>the</strong> common<br />
wavelength grid <strong>of</strong> 4050–4950 Å at a sampling <strong>of</strong> 1 Å pixel⁻¹. Any wavelength bins<br />
in this grid for which no flux data were available in <strong>the</strong> original observations (i.e., in<br />
<strong>the</strong> case <strong>of</strong> a short spectrum) were automatically assigned a flux value <strong>of</strong> 1.0.<br />
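The rebinning step described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: linear interpolation is assumed (the text does not name the resampling scheme), the grid is assumed to include both end points, and the function name is hypothetical. Only the grid limits, the 1 Å sampling and the fill value of 1.0 come from the text.

```python
from bisect import bisect_right


def rebin_to_common_grid(wave, flux, start=4050.0, stop=4950.0, step=1.0, fill=1.0):
    """Resample a spectrum onto a regular wavelength grid.

    wave must be strictly increasing. Grid bins that fall outside the
    observed wavelength range receive the value `fill`.
    """
    n_bins = int(round((stop - start) / step)) + 1
    grid = [start + i * step for i in range(n_bins)]
    rebinned = []
    for w in grid:
        if w < wave[0] or w > wave[-1]:
            rebinned.append(fill)          # no flux data for this bin
            continue
        j = bisect_right(wave, w)
        if j == len(wave):                 # w coincides with the last sample
            rebinned.append(flux[-1])
            continue
        # Linear interpolation between the two bracketing samples (assumed scheme).
        w0, w1 = wave[j - 1], wave[j]
        t = (w - w0) / (w1 - w0)
        rebinned.append(flux[j - 1] + t * (flux[j] - flux[j - 1]))
    return grid, rebinned
```

Filling with 1.0 places the missing bins on the normalised continuum, so a short spectrum presents no spurious features to the network in the uncovered region.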
The results are presented in Table 6.3, with a graphical comparison between <strong>the</strong><br />
neural network classifications and those <strong>of</strong> Drilling et al. (2006) plotted in Figure 6.4.<br />
Although <strong>the</strong> sample covers only a limited region <strong>of</strong> <strong>the</strong> classification parameter space,<br />
good agreement can be seen between <strong>the</strong> neural network and Drilling et al. (2006),<br />
providing confirmation <strong>of</strong> <strong>the</strong> work presented in Chapter 2.<br />
6.4 Summary<br />
The application <strong>of</strong> <strong>the</strong> analytical tools developed in previous chapters to a collection<br />
<strong>of</strong> small data sets from different sources highlights <strong>the</strong>ir versatility and usefulness.
[Figure 6.4, three panels: ANN Helium Class against Drilling Helium Class (axes 10–40); ANN Luminosity Class against Drilling Luminosity Class (axes IV–IX); ANN Spectral Type against Drilling Spectral Type (axes O5–B5).]<br />
Figure 6.4: Comparison <strong>of</strong> ANN classifications with those <strong>of</strong> Drilling et al. (2006)<br />
for <strong>the</strong> 17 He-sdBs <strong>of</strong> Ahmad & Jeffery (2003). Points have been given small random<br />
<strong>of</strong>fsets in each axis for clarity. Also plotted is <strong>the</strong> best fit least squares regression line<br />
with error bars showing <strong>the</strong> RMS <strong>of</strong> <strong>the</strong> residuals.<br />
Identifier Drilling Class ANN Class<br />
HS1000+471 sdBC0.2VII:He28 sdB0VII:He29<br />
HS1844+637 sdB1VII:He39 sdB2VII:He37<br />
LSIV-14 116 sdB0.2VII:He17 sdB0VIII:He20<br />
PG0229+064 sdB3V:He13 sdB4V:He18<br />
PG0240+046 sdBC0.2VII:He24 sdB2VII:He28<br />
PG0902+057 sdB0VII:He38 sdO9VII:He35<br />
PG1127+019 sdOC9VII:He40 sdO8VI:He41<br />
PG1415+492 sdBC1VI:He39 sdB0VI:He38<br />
PG1544+488 sdBC1VIII:He39 sdB0VII:He37<br />
PG1554+408 sdB0.2VII:He39 sdB0VII:He36<br />
PG1600+171 sdOC8.5VII:He39 sdO8VI:He37<br />
PG1615+413 sdB1VII:He37 sdB2VII:He34<br />
PG1658+273 sdOC9.5VII:He39 sdO8VII:He40<br />
PG1715+273 sdB1VII:He37 sdO5VIII:He36<br />
PG2258+155 sdB0.2VII:He39 sdB1VII:He35<br />
PG2321+214 sdB0VII:He37 sdB2VII:He37<br />
TON107 sdBC0.5VII:He28 sdB1VII:He27<br />
Table 6.3: Classification results for <strong>the</strong> Ahmad & Jeffery (2003) He-sdBs.<br />
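The least-squares line and RMS residual described in the caption of Figure 6.4 can be recomputed for the helium class directly from Table 6.3. The sketch below makes two assumptions that the text does not settle: the ANN class is regressed on the Drilling class, and the RMS is taken over all 17 residuals without a degrees-of-freedom correction. Function names are hypothetical.

```python
import math
import re

# Rows of Table 6.3: (identifier, Drilling class, ANN class).
TABLE_6_3 = [
    ("HS1000+471", "sdBC0.2VII:He28", "sdB0VII:He29"),
    ("HS1844+637", "sdB1VII:He39", "sdB2VII:He37"),
    ("LSIV-14 116", "sdB0.2VII:He17", "sdB0VIII:He20"),
    ("PG0229+064", "sdB3V:He13", "sdB4V:He18"),
    ("PG0240+046", "sdBC0.2VII:He24", "sdB2VII:He28"),
    ("PG0902+057", "sdB0VII:He38", "sdO9VII:He35"),
    ("PG1127+019", "sdOC9VII:He40", "sdO8VI:He41"),
    ("PG1415+492", "sdBC1VI:He39", "sdB0VI:He38"),
    ("PG1544+488", "sdBC1VIII:He39", "sdB0VII:He37"),
    ("PG1554+408", "sdB0.2VII:He39", "sdB0VII:He36"),
    ("PG1600+171", "sdOC8.5VII:He39", "sdO8VI:He37"),
    ("PG1615+413", "sdB1VII:He37", "sdB2VII:He34"),
    ("PG1658+273", "sdOC9.5VII:He39", "sdO8VII:He40"),
    ("PG1715+273", "sdB1VII:He37", "sdO5VIII:He36"),
    ("PG2258+155", "sdB0.2VII:He39", "sdB1VII:He35"),
    ("PG2321+214", "sdB0VII:He37", "sdB2VII:He37"),
    ("TON107", "sdBC0.5VII:He28", "sdB1VII:He27"),
]


def helium_class(label):
    """Extract the integer after ':He' at the end of a class label."""
    return int(re.search(r":He(\d+)$", label).group(1))


def least_squares_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rms_residual(x, y, slope, intercept):
    """Root-mean-square of the residuals about the fitted line."""
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


drilling_he = [helium_class(drilling) for _, drilling, _ in TABLE_6_3]
ann_he = [helium_class(ann) for _, _, ann in TABLE_6_3]
slope, intercept = least_squares_line(drilling_he, ann_he)
rms = rms_residual(drilling_he, ann_he, slope, intercept)
```

The same two functions could be reused for spectral type and luminosity class once those labels are mapped onto a numerical scale; that mapping is defined in Chapter 2 and is not reproduced here.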
The results <strong>of</strong> <strong>the</strong> 2MASS-selected sample appear to confirm <strong>the</strong> findings <strong>of</strong> Green<br />
et al. (2006), and lend support to <strong>the</strong> results described in <strong>the</strong> previous chapter. Before<br />
<strong>the</strong> evolutionary details causing <strong>the</strong> observed distributions can be understood,<br />
additional data, e.g., stellar masses, need to be ga<strong>the</strong>red.<br />
The application <strong>of</strong> <strong>the</strong> classification neural network to <strong>the</strong> helium-rich subdwarf B<br />
stars <strong>of</strong> Harris et al. (2003) highlights <strong>the</strong> need for a homogeneous classification scheme<br />
for hot subdwarfs.
Chapter 7<br />
Conclusions And Future Work<br />
This project set out to examine <strong>the</strong> problem <strong>of</strong> analysing large sets <strong>of</strong> astronomical<br />
spectra. Specifically, <strong>the</strong> intention was to establish a set <strong>of</strong> tools that can automatically<br />
extract and analyse <strong>the</strong> spectra <strong>of</strong> any type <strong>of</strong> object from a large database <strong>of</strong> unknown<br />
observations, and <strong>the</strong>n apply <strong>the</strong>se tools to a real survey database.<br />
Analysing large sets <strong>of</strong> astronomical spectra consists <strong>of</strong> three core problems: classification,<br />
physical parameterisation, and <strong>the</strong> extraction <strong>of</strong> particular types <strong>of</strong> objects<br />
from an unknown data set.<br />
In this project, classification was tackled by <strong>the</strong> highly versatile statistical machine<br />
learning method <strong>of</strong> artificial neural networks, which has seen widespread use in astronomy.<br />
Chapter 2 studied <strong>the</strong> use <strong>of</strong> ANNs to classify hot subdwarf spectra onto <strong>the</strong><br />
system defined by Drilling et al. (2006). Global errors (σ_rms) on <strong>the</strong> classifications <strong>of</strong><br />
∼ 2 subtypes for spectral type, ∼ 1 subclass for luminosity class, and ∼ 4 subclasses for<br />
<strong>the</strong> helium class were achieved. These errors are in line with <strong>the</strong> accuracies achieved<br />
by human classifiers.<br />
Physical parameters were obtained by fitting observations to grids <strong>of</strong> <strong>the</strong>oretical<br />
models using a χ² minimisation procedure. SFIT, <strong>the</strong> χ² minimisation code used at <strong>the</strong><br />
<strong>Armagh</strong> <strong>Observatory</strong>, has been improved in Chapter 3 using concepts from <strong>the</strong> domain<br />
<strong>of</strong> computational geometry to provide a new methodology for storing and accessing<br />
arbitrarily large, three-dimensional grids <strong>of</strong> models, paving <strong>the</strong> way to extending <strong>the</strong><br />
code to operate in distributed parallel computing environments.<br />
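The idea of fitting an observation to a grid of models by χ² minimisation can be illustrated in its simplest form. The sketch below is not SFIT: it only evaluates χ² at the grid nodes and returns the best node, whereas a practical code must also interpolate between models, which is where the grid storage and access scheme of Chapter 3 comes in. The function names and the dictionary layout of the grid are assumptions made for the example.

```python
def chi_squared(observed, model, sigma):
    """Sum of squared, error-weighted differences between data and model."""
    return sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))


def best_grid_model(observed, sigma, grid):
    """Return the grid node with the smallest chi-squared.

    grid maps a parameter tuple, e.g. (T_eff, log g, n_He), to a model
    spectrum sampled on the same wavelength points as `observed`.
    """
    best_params, best_chi2 = None, float("inf")
    for params, model in grid.items():
        chi2 = chi_squared(observed, model, sigma)
        if chi2 < best_chi2:
            best_params, best_chi2 = params, chi2
    return best_params, best_chi2
```

An exhaustive scan of this kind scales with the number of grid nodes, which is one reason an indexed grid and a proper optimiser are preferable for large three-dimensional grids.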
Locating <strong>the</strong> spectra <strong>of</strong> a particular type <strong>of</strong> object in a large set <strong>of</strong> unknown observations<br />
was accomplished using <strong>the</strong> multivariate statistical technique, Principal Components<br />
<strong>Analysis</strong>. Chapter 4 outlined <strong>the</strong> mechanics <strong>of</strong> <strong>the</strong> filter, and demonstrated how<br />
it was used to extract hot subdwarf spectra from a data set obtained from <strong>the</strong> SDSS.<br />
This solution provides a means to reduce unknown data sets to quantities suitable for<br />
closer visual inspection.<br />
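A filter based on PCA reconstruction error can be sketched as follows. This is a minimal illustration, assuming that spectra which are well reconstructed from components derived from a set of template spectra are the ones retained; the actual filter is that of Chapter 4. Only the leading component is computed, by power iteration, to keep the example free of external libraries, and all function names are hypothetical.

```python
import math


def first_component(templates, iterations=200):
    """Mean spectrum and leading principal component of the templates."""
    n = len(templates)
    mean = [sum(column) / n for column in zip(*templates)]
    centred = [[f - m for f, m in zip(row, mean)] for row in templates]
    start = next((row for row in centred if any(v != 0.0 for v in row)), None)
    if start is None:
        raise ValueError("templates are identical; no variance to model")
    vec = list(start)
    for _ in range(iterations):
        # One power-iteration step: vec <- X^T (X vec), then normalise.
        scores = [sum(c * v for c, v in zip(row, vec)) for row in centred]
        new = [sum(s * row[j] for s, row in zip(scores, centred))
               for j in range(len(vec))]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    return mean, vec


def reconstruction_error(spectrum, mean, components):
    """RMS difference between a spectrum and its PCA reconstruction."""
    centred = [f - m for f, m in zip(spectrum, mean)]
    recon = [0.0] * len(centred)
    for comp in components:
        score = sum(c * v for c, v in zip(centred, comp))
        recon = [r + score * v for r, v in zip(recon, comp)]
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(centred, recon)) / len(centred))


def pca_filter(spectra, mean, components, threshold):
    """Keep the spectra whose reconstruction error is below the threshold."""
    return [s for s in spectra if reconstruction_error(s, mean, components) < threshold]
```

The threshold controls the trade-off between completeness and the number of spectra left for visual inspection; its value has to be chosen empirically for a given data set.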
Collectively, <strong>the</strong>se tools were applied to <strong>the</strong> archives <strong>of</strong> <strong>the</strong> SDSS to extract and<br />
analyse <strong>the</strong> spectra <strong>of</strong> hot subdwarf stars. The PCA filter was able to reduce a set<br />
<strong>of</strong> almost 7000 unknown spectra to a collection <strong>of</strong> approximately 400 samples from<br />
which 282 hot subdwarf candidates were quickly extracted by visual inspection. The<br />
classification ANN successfully assigned classes to <strong>the</strong>se stars based on <strong>the</strong> Drilling et al.<br />
(2006) system, and physical parameters were derived using SFIT and a grid <strong>of</strong> LTE<br />
model atmospheres. The results revealed several unexplained phenomena <strong>of</strong> extended<br />
horizontal branch stars, namely,<br />
1. Existence <strong>of</strong> <strong>the</strong> second horizontal branch gap <strong>of</strong> Newell (1973) at T_eff ≈ 22,500 K.<br />
2. Two sdB n_He–T_eff sequences, also observed by Edelmann et al. (2003).<br />
3. A clustering <strong>of</strong> hot, helium-rich sdO stars at T_eff ≈ 44,000 K, log g = 5.7, also<br />
observed by Heber et al. (2006).<br />
These findings pose important questions for stellar evolution <strong>the</strong>ory, and represent<br />
a successful demonstration <strong>of</strong> what this project set out to achieve.
Future Directions<br />
Working with <strong>the</strong> data from <strong>the</strong> SDSS highlighted a number <strong>of</strong> improvements that could<br />
be made to <strong>the</strong> tools <strong>the</strong>mselves, but several important problems concerning spectral<br />
analysis and its large-scale application were also made apparent.<br />
Continuum Normalisation<br />
<strong>On</strong>e <strong>of</strong> <strong>the</strong> most troubling was <strong>the</strong> normalisation <strong>of</strong> stellar continua. As noted in Chapter<br />
5, <strong>the</strong> SDSS uses a method based on median/mean filtering which tends to underfit<br />
<strong>the</strong> continuum in regions where <strong>the</strong> blending <strong>of</strong> lines becomes very strong.<br />
An automatic renormalisation method based on cubic spline fitting was employed<br />
in Chapter 5 in an attempt to gain a more precise fit to <strong>the</strong> continuum. This method<br />
used several sets <strong>of</strong> pre-programmed wavelength locations as control points for <strong>the</strong> cubic<br />
spline fit. The control points in each set were chosen manually by iterative refinement,<br />
and <strong>the</strong> different sets essentially conformed to a coarse temperature–abundance classification<br />
system because different control points were needed for hot, helium-rich stars<br />
and cooler, helium-poor stars.<br />
<strong>On</strong>ce <strong>the</strong> sets <strong>of</strong> control points were established, <strong>the</strong> method gave good results<br />
for <strong>the</strong> final set <strong>of</strong> hot subdwarf candidates. Obviously, this particular methodology<br />
is poorly suited to a general data mining application because it is tied to one<br />
particular type <strong>of</strong> object.<br />
A more robust and general automatic algorithm is required.<br />
However, this is an extremely difficult problem because such an algorithm must take<br />
into account many factors: noise, regions where <strong>the</strong> spectral flux changes rapidly, cosmic<br />
spikes and o<strong>the</strong>r anomalies, and troublesome regions like that <strong>of</strong> <strong>the</strong> higher-order<br />
Balmer lines where <strong>the</strong> actual continuum runs above <strong>the</strong> flux information present. An<br />
acceptable solution will be very hard to come by.<br />
Data Management<br />
Ano<strong>the</strong>r major problem encountered was <strong>the</strong> management <strong>of</strong> large data sets. The two<br />
main issues are storing sets <strong>of</strong> spectra in a meaningful and easily accessible manner,<br />
and keeping track <strong>of</strong> <strong>the</strong> changes to each spectrum over <strong>the</strong> course <strong>of</strong> time.<br />
Almost 7000 unique spectra were extracted from <strong>the</strong> SDSS in Chapter 5. Over <strong>the</strong><br />
course <strong>of</strong> <strong>the</strong> analysis, <strong>the</strong> spectra were converted from FITS files to ASCII format,<br />
filtered, renormalised, velocity corrected, resampled, and collected toge<strong>the</strong>r into <strong>the</strong><br />
specific formats required by <strong>the</strong> classification and parameterisation codes. Eventually,<br />
this trail <strong>of</strong> data became cumbersome to manage and keep track <strong>of</strong> as it was replicated<br />
into different folders and different files across <strong>the</strong> computer’s file system. There was<br />
also an unfortunate incident in which a mistyped command accidentally deleted several<br />
very important folders <strong>of</strong> data.<br />
When <strong>the</strong> analysis <strong>of</strong> <strong>the</strong> 282 hot subdwarfs was complete, <strong>the</strong> results were stored<br />
in several ASCII-format files which had to be processed manually in order to correlate<br />
<strong>the</strong> classifications <strong>of</strong> <strong>the</strong> ANN with <strong>the</strong> parameters found by SFIT. This led to several<br />
such files in different folders with no attached information to say when <strong>the</strong> results were<br />
obtained, from what data set, using which models, and which ANN.<br />
Both <strong>of</strong> <strong>the</strong>se issues highlight <strong>the</strong> need for a centralised database which can keep<br />
track <strong>of</strong> <strong>the</strong> changes made to <strong>the</strong> data as an analysis proceeds. Such an idea is already<br />
widely used in tools to help manage computer s<strong>of</strong>tware projects (e.g., CVS 1 ). These<br />
tools record all <strong>the</strong> changes made to each individual source file, allowing <strong>the</strong> changes to<br />
be rolled back to any previous version should something go awry. Auditing analyses <strong>of</strong><br />
astronomical spectra in this manner would bring with it not only data integrity, but a<br />
trail <strong>of</strong> operations conducted on <strong>the</strong> data which could be analysed in detail later should<br />
an erroneous methodology need to be verified.<br />
1 http://www.nongnu.org/cvs/<br />
A centralised database would also allow structured metadata to be recorded concerning<br />
<strong>the</strong> dates and times <strong>of</strong> analyses, <strong>the</strong> tools used and <strong>the</strong>ir version numbers,<br />
<strong>the</strong> <strong>the</strong>oretical models used, <strong>the</strong> date <strong>the</strong>y were generated, and <strong>the</strong> codes and atomic<br />
data used to generate <strong>the</strong>m, and so on. Such metadata would prove invaluable if, for<br />
example, an analysis is revised at a later date.<br />
Finally, storing results alongside <strong>the</strong> data in a homogeneous database would greatly<br />
simplify tasks such as producing plots for publication, applying clustering algorithms<br />
to automatically look for patterns in <strong>the</strong> results, and cross-correlating <strong>the</strong> database<br />
with o<strong>the</strong>r databases accessible over <strong>the</strong> internet.<br />
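As an illustration of the kind of centralised record discussed in this section, the sketch below uses the SQLite module from the Python standard library to log each operation performed on a spectrum together with the tool and version responsible, and to hold results alongside the data. The schema, table names, column names and tool names are all hypothetical; they are one possible layout, not a design taken from this work.

```python
import sqlite3

SCHEMA = """
CREATE TABLE spectrum (
    id INTEGER PRIMARY KEY,
    identifier TEXT NOT NULL UNIQUE
);
CREATE TABLE operation (
    id INTEGER PRIMARY KEY,
    spectrum_id INTEGER NOT NULL REFERENCES spectrum(id),
    step TEXT NOT NULL,            -- e.g. 'renormalised', 'resampled'
    tool TEXT NOT NULL,
    tool_version TEXT NOT NULL,
    performed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE result (
    id INTEGER PRIMARY KEY,
    spectrum_id INTEGER NOT NULL REFERENCES spectrum(id),
    ann_class TEXT,
    n_he REAL,
    model_grid TEXT,
    obtained_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_database(path=":memory:"):
    """Create the tables in a new database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_operation(conn, identifier, step, tool, tool_version):
    """Append one processing step to the audit trail of a spectrum."""
    conn.execute("INSERT OR IGNORE INTO spectrum (identifier) VALUES (?)", (identifier,))
    (spectrum_id,) = conn.execute(
        "SELECT id FROM spectrum WHERE identifier = ?", (identifier,)
    ).fetchone()
    conn.execute(
        "INSERT INTO operation (spectrum_id, step, tool, tool_version) VALUES (?, ?, ?, ?)",
        (spectrum_id, step, tool, tool_version),
    )
    conn.commit()


def history(conn, identifier):
    """Return the processing steps applied to a spectrum, in order."""
    rows = conn.execute(
        "SELECT o.step, o.tool, o.tool_version FROM operation o "
        "JOIN spectrum s ON s.id = o.spectrum_id "
        "WHERE s.identifier = ? ORDER BY o.id",
        (identifier,),
    )
    return rows.fetchall()
```

With classifications and fitted parameters held in one `result` table keyed on the spectrum, correlating ANN classes with SFIT parameters becomes a single query rather than a manual merge of ASCII files.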
Data Visualisation<br />
When dealing with large quantities <strong>of</strong> data, one extremely useful tool is interactive<br />
visualisation. Being able to graphically represent data in useful ways, and manipulate<br />
<strong>the</strong>m by way <strong>of</strong> visualisation, facilitates <strong>the</strong> process <strong>of</strong> discovery and understanding.<br />
When analysing <strong>the</strong> SDSS data in Chapter 5, <strong>the</strong> final hot subdwarf sample was manually<br />
selected from <strong>the</strong> PCA filtering results. This stage would have proceeded much<br />
more quickly if a good visualisation tool had been in place.<br />
In this project, extensive use was made <strong>of</strong> Gnuplot 2 to visualise spectra. Although<br />
Gnuplot is an excellent plotting tool, it is not designed for interactive investigation <strong>of</strong><br />
<strong>the</strong> data being plotted. As such, to visualise <strong>the</strong> SDSS data, Gnuplot was invoked from<br />
a script to produce thousands <strong>of</strong> plots that were subsequently displayed in a series <strong>of</strong><br />
static web pages. Clearly, this is awkward, adding ano<strong>the</strong>r layer <strong>of</strong> data management<br />
to complicate <strong>the</strong> problems mentioned previously. A better solution is desperately<br />
needed.<br />
2 http://www.gnuplot.info/<br />
Algorithm Development<br />
Working with <strong>the</strong> main analytical tools used in this project showed that <strong>the</strong>y could<br />
be improved in several ways. The errors obtained for <strong>the</strong> classifications produced<br />
by <strong>the</strong> neural network in Chapter 2 are global estimates based on <strong>the</strong> leave-one-out<br />
cross-validation that was carried out. It would be far more useful if proper confidence<br />
intervals were available for each individual result produced by <strong>the</strong> ANN. Such confidence<br />
intervals can be obtained through <strong>the</strong> bootstrap statistical technique (e.g., Willemsen<br />
et al., 2005), or Bayesian methods (see Bishop, 1995, sect. 10.2).<br />
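The bootstrap idea mentioned above can be sketched in a few lines: the training set is resampled with replacement, a model is retrained on each resample, and the spread of the resulting predictions for a given target provides an interval for that individual result. The sketch below uses a percentile interval and a trivial nearest-neighbour stand-in for the network; both are assumptions chosen for brevity, and neither is claimed to be the procedure of the cited works.

```python
import random


def bootstrap_interval(training_set, train, predict, target,
                       n_resamples=200, level=0.68, seed=0):
    """Percentile bootstrap interval for the prediction made for `target`.

    train(resample) returns a fitted model; predict(model, target) returns
    a number. Both are supplied by the caller.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_resamples):
        # Resample the training set with replacement, keeping its size.
        resample = [rng.choice(training_set) for _ in training_set]
        outputs.append(predict(train(resample), target))
    outputs.sort()
    tail = (1.0 - level) / 2.0
    lower = outputs[int(tail * (n_resamples - 1))]
    upper = outputs[int((1.0 - tail) * (n_resamples - 1))]
    return lower, upper


# Hypothetical stand-in for the neural network: a one-dimensional
# nearest-neighbour predictor over (feature, label) pairs.
def train_nearest(examples):
    return list(examples)


def predict_nearest(model, target):
    return min(model, key=lambda example: abs(example[0] - target))[1]
```

For a real network the cost is dominated by the repeated training, so the number of resamples would in practice be set by the available computing time.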
The SFIT model grid indexing and searching methodology in Chapter 3 works well<br />
for two- and three-dimensional grids. Although it was stated in that chapter that<br />
higher-dimensional grids are unlikely to be used owing to <strong>the</strong> curse <strong>of</strong> dimensionality, <strong>the</strong><br />
use <strong>of</strong> four- or possibly five-dimensional grids may not be out <strong>of</strong> <strong>the</strong> question as computer<br />
technology continues to improve. In <strong>the</strong>ory, <strong>the</strong> Delaunay triangulation methodology in<br />
Chapter 3 could be extended to higher dimensional geometries, but a different approach<br />
(perhaps <strong>the</strong> k-D tree-based algorithm discussed in <strong>the</strong> chapter) may be more flexible<br />
and less complicated.<br />
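To indicate why a k-D tree extends naturally to grids of any dimension, the following is a generic nearest-neighbour search over points of arbitrary dimensionality. It is a textbook sketch, not the algorithm discussed in Chapter 3, and the function names are hypothetical. The only dimension-dependent quantity is the splitting axis, which cycles through the coordinates.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-D tree from a list of equal-length tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kdtree(ordered[:mid], depth + 1),
        "right": build_kdtree(ordered[mid + 1:], depth + 1),
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, best=None):
    """Return the stored point closest to `target`."""
    if node is None:
        return best
    if best is None or squared_distance(node["point"], target) < squared_distance(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < squared_distance(best, target):
        best = nearest(far, target, best)
    return best
```

Since model parameters such as effective temperature and surface gravity differ in scale by orders of magnitude, the coordinates would need to be rescaled to comparable ranges before Euclidean distances are meaningful.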
As it stands, SFIT, with <strong>the</strong> modifications <strong>of</strong> Chapter 3, is a flexible and robust<br />
tool for spectral parameterisation. The next step forward is to introduce parallel programming<br />
constructs to allow its use in a distributed computing environment, such<br />
as <strong>the</strong> computing cluster at <strong>the</strong> <strong>Armagh</strong> <strong>Observatory</strong> (see Appendix D), or <strong>the</strong> Grid.<br />
Programmatically speaking, this is not a very difficult task, but it does require some<br />
planning.<br />
The Principal Components <strong>Analysis</strong> filtering tool <strong>of</strong> Chapter 4 worked well for <strong>the</strong><br />
application to hot subdwarf spectra. A visual selection process is still required on<br />
<strong>the</strong> final filtered data set because precise filtering is a hard problem. Never<strong>the</strong>less,<br />
future work could help improve <strong>the</strong> efficiency <strong>of</strong> <strong>the</strong> PCA filter perhaps by devising<br />
a new reconstruction error calculation that is more sensitive to <strong>the</strong> finer details <strong>of</strong>
astronomical spectra.<br />
In <strong>the</strong> application <strong>of</strong> <strong>the</strong> hot subdwarf filter to <strong>the</strong> data sets obtained from <strong>the</strong> SDSS<br />
in Chapters 4 and 5, <strong>the</strong> filter might have worked better had more weight been given to<br />
differences in <strong>the</strong> cores and wings <strong>of</strong> spectral lines. This would burden <strong>the</strong> user with<br />
supplying some sort <strong>of</strong> line list giving <strong>the</strong> wavelengths and perhaps equivalent widths<br />
<strong>of</strong> spectral lines to which <strong>the</strong> error calculation should pay attention, but a little effort<br />
spent in preparation could save a lot <strong>of</strong> time when it comes to <strong>the</strong> visual inspection<br />
stage.<br />
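One way such a line-weighted error could be formulated is sketched below. The user supplies a line list of central wavelengths and half-widths, and residuals falling within a listed line are given a larger weight than those in the continuum. The weighting scheme, the default weight and the function names are assumptions made for illustration; no particular form is proposed in the text.

```python
import math


def line_weights(wavelengths, line_list, line_weight=5.0):
    """Weight of 1.0 in the continuum and `line_weight` inside listed lines.

    line_list is a sequence of (centre, half_width) pairs in the same
    units as `wavelengths`.
    """
    weights = []
    for w in wavelengths:
        in_line = any(abs(w - centre) <= half_width for centre, half_width in line_list)
        weights.append(line_weight if in_line else 1.0)
    return weights


def weighted_rms_error(spectrum, reconstruction, weights):
    """Weighted RMS difference between a spectrum and its reconstruction."""
    total = sum(wt * (s - r) ** 2 for s, r, wt in zip(spectrum, reconstruction, weights))
    return math.sqrt(total / sum(weights))
```

With this form, a mismatch of a given size in a line core raises the error more than the same mismatch in the continuum, which is the behaviour the paragraph above asks for.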
The tools used in this project were chosen based on <strong>the</strong>ir previous successful applications<br />
to analysing astronomical spectra, but many o<strong>the</strong>r machine learning techniques<br />
have <strong>the</strong> potential to be employed (see Russell & Norvig, 2003). Algorithms such as<br />
<strong>the</strong> Kohonen self-organising map (Kohonen, 1990; Kohonen et al., 1996), and Bayesian<br />
probabilistic methods like those embodied in <strong>the</strong> AutoClass program 3 , can take an unknown<br />
dataset and automatically derive classes for that set based on <strong>the</strong> information<br />
present in <strong>the</strong> data. This makes <strong>the</strong>m <strong>of</strong> particular interest for filtering and classification<br />
problems, and it would be a worthwhile project to investigate <strong>the</strong>ir ability in this<br />
regard.<br />
Afterword<br />
As noted in Chapter 1, improvements in observational and information technology<br />
mean that <strong>the</strong> amount <strong>of</strong> data being ga<strong>the</strong>red in astronomy is always increasing. The<br />
specific result <strong>of</strong> this <strong>the</strong>sis is a set <strong>of</strong> tools which can be used to analyse <strong>the</strong> very large<br />
databases that will be generated by new survey projects such as SDSS-II and <strong>the</strong> GAIA<br />
space mission.<br />
The ultimate future goal <strong>of</strong> <strong>the</strong> work presented in this <strong>the</strong>sis is, however, to continue<br />
<strong>the</strong> development <strong>of</strong> <strong>the</strong> computational framework <strong>of</strong> Jeffery (2003). This framework<br />
incorporates <strong>the</strong> tool set developed here into a much wider system to analyse and<br />
manage astronomical data, making use <strong>of</strong> distributed computing initiatives such as <strong>the</strong><br />
Grid (see Figure 7.1). This system will help us set sail on <strong>the</strong> seas <strong>of</strong> astronomical data,<br />
charting our way into <strong>the</strong> unknown mysteries <strong>of</strong> <strong>the</strong> universe.<br />
3 http://ic.arc.nasa.gov/ic/projects/bayes-group/autoclass/<br />
Figure 7.1: Schematic diagram showing how <strong>the</strong> work <strong>of</strong> this <strong>the</strong>sis fits in with <strong>the</strong><br />
wider system envisaged by Jeffery (2003).<br />
[Figure 7.1, diagram components: Training Data; Unknown Data Set (e.g. SDSS); Pre-Processing; PCA Filtering; Manual Selection; ANN Classification; χ² Model Fitting; Results Database; Results Exploration & Visualisation; Remote Astronomical Databases (e.g. Simbad); Distributed Computing Resources; Theoretical Models Database; Parameter Space Exploration; Model Generation; Remote Atomic Database; Third-Party Codes; Request New Data; R-Matrix II Calculation.]<br />
Bibliography<br />
Ahmad, A. 2004, PhD <strong>the</strong>sis, The Queen’s University <strong>of</strong> Belfast<br />
Ahmad, A. & Jeffery, C. S. 2003, A&A, 402, 335<br />
Ahmad, A., Winter, C., & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2,<br />
The Proceedings <strong>of</strong> <strong>the</strong> 2nd Meeting on Hot Subdwarfs and Related Objects, ed.<br />
R. H. Østensen, 159–162<br />
Allende Prieto, C., Rebolo, R., López, R. J. G., Serra-Ricart, M., Beers, T. C., Rossi,<br />
S., Bonifacio, P., & Molaro, P. 2000, AJ, 120, 1516<br />
Altmann, M., Edelmann, H., & de Boer, K. S. 2004, A&A, 414, 181<br />
Bailer-Jones, C. A. L. 1996, PhD <strong>the</strong>sis, University <strong>of</strong> Cambridge<br />
—. 1997, PASP, 109, 932<br />
Bailer-Jones, C. A. L., Irwin, M., Gilmore, G., & von Hippel, T. 1997, MNRAS, 292,<br />
157<br />
Bailer-Jones, C. A. L., Irwin, M., & von Hippel, T. 1998, MNRAS, 298, 361<br />
Behara, N. T. & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The<br />
Proceedings <strong>of</strong> <strong>the</strong> 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H.<br />
Østensen, 115–122<br />
Bishop, C. M. 1995, Neural Networks for Pattern Recognition (Oxford: Oxford University<br />
Press)<br />
Brown, T. M., Bowers, C. W., Kimble, R. A., & Ferguson, H. C. 2000, ApJ, 529, L89<br />
Brown, T. M., Ferguson, H. C., Davidsen, A. F., & Dorman, B. 1997, ApJ, 482, 685<br />
Caloi, V. 1976, A&A, 50, 471<br />
—. 1989, A&A, 221, 27<br />
Colin, J., de Boer, K. S., Dauphole, B., Ducourant, C., Dulou, M. R., Geffert, M., Le<br />
Campion, J.-F., Moehler, S., Odenkirchen, M., Schmidt, J. H. K., & Theissen, A.<br />
1994, A&A, 287, 38<br />
Colless, M., Dalton, G., Maddox, S., Su<strong>the</strong>rland, W., Norberg, P., Cole, S., Bland-<br />
Hawthorn, J., Bridges, T., Cannon, R., Collins, C., Couch, W., Cross, N., Deeley,<br />
K., De Propris, R., Driver, S. P., Efstathiou, G., Ellis, R. S., Frenk, C. S., Glazebrook,<br />
K., Jackson, C., Lahav, O., Lewis, I., Lumsden, S., Madgwick, D., Peacock, J. A.,<br />
Peterson, B. A., Price, I., Seaborne, M., & Taylor, K. 2001, MNRAS, 328, 1039<br />
Connolly, A. J. & Szalay, A. S. 1999, AJ, 117, 2052<br />
Connolly, A. J., Szalay, A. S., Bershady, M. A., Kinney, A. L., & Calzetti, D. 1995,<br />
AJ, 110, 1071<br />
D’Cruz, N. L., Dorman, B., Rood, R. T., & O’Connell, R. W. 1996, ApJ, 466, 359<br />
de Boer, K. S., Aguilar Sanchez, Y., Altmann, M., Geffert, M., Odenkirchen, M.,<br />
Schmidt, J. H. K., & Colin, J. 1997, A&A, 327, 577<br />
Deeming, T. J. 1964, MNRAS, 127, 493<br />
Djorgovski, S. G., Gal, R. R., Odewahn, S. C., de Carvalho, R. R., Brunner, R., Longo,<br />
G., & Scaramella, R. 1998, in Wide Field Surveys in Cosmology, 14th IAP meeting<br />
held May 26-30, 1998, Paris. Publisher: Editions Frontieres. ISBN: 2-86332-241-9,<br />
p. 89., ed. S. Colombi, Y. Mellier, & B. Raban, 89–+<br />
Dorman, B., Rood, R. T., & O’Connell, R. W. 1993, ApJ, 419, 596<br />
Dreizler, S., Heber, U., Werner, K., Moehler, S., & de Boer, K. S. 1990, A&A, 235, 234<br />
Drilling, J. S. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient Stars, 461<br />
Drilling, J. S., Jeffery, C. S., Moehler, S., Heber, U., & Napiwotzki, R. 2006, in preparation<br />
Dudley, R. E. 1992, PhD <strong>the</strong>sis, The University <strong>of</strong> St. Andrews<br />
Edelmann, H., Heber, U., Hagen, H.-J., Lemke, M., Dreizler, S., Napiwotzki, R., &<br />
Engels, D. 2003, A&A, 400, 939<br />
Edelsbrunner, H. & Shah, N. R. 1992, in SCG ’92: Proceedings <strong>of</strong> <strong>the</strong> eighth annual<br />
symposium on Computational geometry (New York, NY, USA: ACM Press), 43–52<br />
Fan, X. 1999, AJ, 117, 2528<br />
Folkes, S. R., Lahav, O., & Maddox, S. J. 1996, MNRAS, 283, 651<br />
Francis, P. J., Hewett, P. C., Foltz, C. B., & Chaffee, F. H. 1992, ApJ, 398, 476<br />
Galaz, G. & de Lapparent, V. 1998, A&A, 332, 459<br />
Glazebrook, K., Offer, A. R., & Deeley, K. 1998, ApJ, 492, 98<br />
Golub, G. H. & Van Loan, C. F. 1989, Matrix Computations, 2nd edn. (Baltimore,<br />
Maryland 21218: The Johns Hopkins University Press)
Green, E. M., Fontaine, G., Hyde, E. A., Charpinet, S., & Chayer, P. 2006, in Baltic<br />
Astronomy, Vol. 15, Nos. 1-2, The Proceedings <strong>of</strong> <strong>the</strong> 2nd Meeting on Hot Subdwarfs<br />
and Related Objects, ed. R. H. Østensen, 167–174<br />
Green, R. F., Schmidt, M., & Liebert, J. 1986, ApJS, 61, 305<br />
Greenstein, J. L. & Sargent, A. I. 1974, ApJS, 28, 157<br />
Gulati, R., Gupta, R., & Singh, H. 1997a, PASP, 109, 843<br />
Gulati, R. K., Gupta, R., Gothoskar, P., & Khobragade, S. 1994a, ApJ, 426, 340<br />
—. 1994b, Vistas in Astronomy, 38, 293<br />
—. 1996, Bulletin <strong>of</strong> <strong>the</strong> Astronomical Society <strong>of</strong> India, 24, 21<br />
Gulati, R. K., Gupta, R., & Rao, N. K. 1997b, A&A, 322, 933<br />
Harris, H. C., Liebert, J., Kleinman, S. J., Nitta, A., Anderson, S. F., Knapp, G. R.,<br />
Krzesiński, J., Schmidt, G., Strauss, M. A., Vanden Berk, D., Eisenstein, D., Hawley,<br />
S., Margon, B., Munn, J. A., Silvestri, N. M., Smith, J. A., Szkody, P., Collinge,<br />
M. J., Dahn, C. C., Fan, X., Hall, P. B., Schneider, D. P., Brinkmann, J., Burles,<br />
S., Gunn, J. E., Hennessy, G. S., Hindsley, R., Ivezić, Z., Kent, S., Lamb, D. Q.,<br />
Lupton, R. H., Nichol, R. C., Pier, J. R., Schlegel, D. J., SubbaRao, M., Uomoto,<br />
A., Yanny, B., & York, D. G. 2003, AJ, 126, 1023<br />
Heber, U. 1986, A&A, 155, 33<br />
Heber, U. 1991, in IAU Symp. 145: Evolution <strong>of</strong> Stars: <strong>the</strong> Photospheric Abundance<br />
Connection, ed. G. Michaud & A. V. Tutukov, 363–+<br />
Heber, U., Hirsch, H., Ströer, A., O’Toole, S., Haas, S., & Dreizler, S. 2006, in Baltic<br />
Astronomy, Vol. 15, Nos. 1-2, The Proceedings <strong>of</strong> <strong>the</strong> 2nd Meeting on Hot Subdwarfs<br />
and Related Objects, ed. R. H. Østensen, 104–111<br />
Heber, U. & Hunger, K. 1987, in IAU Colloq. 95: Second Conference on Faint Blue<br />
Stars, ed. A. G. D. Philip, D. S. Hayes, & J. W. Liebert, 599–602<br />
Heber, U., Hunger, K., Jonas, G., & Kudritzki, R. P. 1984, A&A, 130, 119<br />
Hirsch, H. A., Heber, U., O’Toole, S. J., & Bresolin, F. 2005, A&A, 444, L61<br />
Husfeld, D., Butler, K., Heber, U., & Drilling, J. S. 1989, A&A, 222, 150<br />
Hutchison, R. B. 1971, AJ, 76, 711<br />
Iben, I., Kaler, J. B., Truran, J. W., & Renzini, A. 1983, ApJ, 264, 605<br />
Iben, I. J. 1990, ApJ, 353, 215<br />
Jeffery, C. S. 2003, in ASP Conf. Ser. 288: <strong>Stellar</strong> Atmosphere Modeling, ed. I. Hubeny,<br />
D. Mihalas, & K. Werner, 141–+<br />
Jeffery, C. S., Drilling, J. S., Harrison, P. M., Heber, U., & Moehler, S. 1997, A&AS,<br />
125, 501<br />
Jeffery, C. S., Heber, U., Hill, P. W., Dreizler, S., Drilling, J. S., Lawson, W. A.,<br />
Leuenhagen, U., & Werner, K. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient<br />
Stars, ed. C. S. Jeffery & U. Heber, 471–+<br />
Jeffery, C. S., Woolf, V. M., & Pollacco, D. L. 2001, A&A, 376, 497<br />
Katz, D., Soubiran, C., Cayrel, R., Adda, M., & Cautain, R. 1998, A&A, 338, 151<br />
Kleinman, S. J., Harris, H. C., Eisenstein, D. J., Liebert, J., Nitta, A., Krzesiński, J.,<br />
Munn, J. A., Dahn, C. C., Hawley, S. L., Pier, J. R., Schmidt, G., Silvestri, N. M.,<br />
Smith, J. A., Szkody, P., Strauss, M. A., Knapp, G. R., Collinge, M. J., Mukadam,<br />
A. S., Koester, D., Uomoto, A., Schlegel, D. J., Anderson, S. F., Brinkmann, J.,<br />
Lamb, D. Q., Schneider, D. P., & York, D. G. 2004, ApJ, 607, 426<br />
Klemola, A. R. 1961, ApJ, 134, 130<br />
Kohonen, T. 1990, in New Concepts in Computer Science: Proc. Symp. in Honour <strong>of</strong><br />
Jean-Claude Simon (Paris, France: AFCET), 181–190<br />
Kohonen, T., Hynninen, J., Kangas, J., & Laaksonen, J. 1996, SOM PAK: The Self-<br />
Organizing Map program package, Tech. Rep. A31, Laboratory <strong>of</strong> Computer and<br />
Information Science, Helsinki University <strong>of</strong> Technology<br />
Kurtz, M. J. 1982, Ph.D. Thesis<br />
Lahav, O., Naim, A., Sodré, L., & Storrie-Lombardi, M. C. 1996, MNRAS, 283, 207<br />
Lamy, H. & Hutsemékers, D. 2004, A&A, 427, 107<br />
Lasala, J. 1994, in ASP Conf. Ser. 60: The MK Process at 50 Years: A Powerful Tool<br />
for Astrophysical Insight, ed. C. J. Corbally, R. O. Gray, & R. F. Garrison, 312–+<br />
Levenberg, K. 1944, Questions <strong>of</strong> Applied Ma<strong>the</strong>matics, 2, 164<br />
Livny, M. & Raman, R. 1998, in The Grid: Blueprint for a New Computing Infrastructure,<br />
ed. I. Foster & C. Kesselman (Morgan Kaufmann)<br />
Marquardt, D. W. 1963, Journal <strong>of</strong> <strong>the</strong> Society for Industrial and Applied Ma<strong>the</strong>matics,<br />
11, 431<br />
Maxted, P. F. L., Heber, U., Marsh, T. R., & North, R. C. 2001, MNRAS, 326, 1391<br />
Mengel, J. G., Norris, J., & Gross, P. G. 1976, ApJ, 204, 488<br />
Moehler, S., de Boer, K. S., & Heber, U. 1990a, A&A, 239, 265<br />
Moehler, S., Richtler, T., de Boer, K. S., Dettmar, R. J., & Heber, U. 1990b, A&AS,<br />
86, 53<br />
Möller, T. & Trumbore, B. 1997, Journal <strong>of</strong> Graphics Tools, 2, 21, see:<br />
http://www.acm.org/jgt/papers/MollerTrumbore97/
Moore, A. 1991, A tutorial on kd-trees, Extract from PhD Thesis, available from<br />
http://www.cs.cmu.edu/~awm/papers.html<br />
Morgan, W. W., Abt, H. A., & Tapscott, J. W. 1978, Revised MK <strong>Spectra</strong>l Atlas for<br />
stars earlier than <strong>the</strong> sun (Williams Bay: Yerkes <strong>Observatory</strong>, and Tucson: Kitt Peak<br />
National <strong>Observatory</strong>, 1978)<br />
Morossi, C. & Crivellari, L. 1980, A&AS, 41, 299<br />
Mücke, E. P., Saias, I., & Zhu, B. 1996, in SCG ’96: Proceedings <strong>of</strong> <strong>the</strong> twelfth annual<br />
symposium on Computational geometry (New York, NY, USA: ACM Press), 274–283<br />
Murtagh, F. & Heck, A. 1987, Multivariate Data <strong>Analysis</strong> (Dordrecht, Holland: D.<br />
Reidel Publishing Co.)<br />
Napiwotzki, R., Karl, C. A., Lisker, T., Heber, U., Christlieb, N., Reimers, D., Nelemans,<br />
G., & Homeier, D. 2004, Ap&SS, 291, 321<br />
Newell, E. B. 1973, ApJS, 26, 37<br />
O’Rourke, J. 1998, Computational Geometry in C, 2nd edn. (Cambridge (UK) and<br />
New York: Cambridge University Press)<br />
Paczyński, B. 1971, Acta Astronomica, 21, 1<br />
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. 1986, Numerical<br />
Recipes: The Art <strong>of</strong> Scientific Computing, 1st edn. (Cambridge (UK) and New York:<br />
Cambridge University Press)<br />
Qin, D.-M., Guo, P., Hu, Z.-Y., & Zhao, Y.-H. 2003, Chinese Journal <strong>of</strong> Astronomy and<br />
Astrophysics, 3, 277<br />
Reid, I. N., Brewer, C., Brucato, R. J., McKinley, W. R., Maury, A., Mendenhall,<br />
D., Mould, J. R., Mueller, J., Neugebauer, G., Phinney, J., Sargent, W. L. W.,<br />
Schombert, J., & Thicksten, R. 1991, PASP, 103, 661<br />
Renka, R. J. 1988, ACM Trans. Math. S<strong>of</strong>tw., 14, 139<br />
Rhee, J., Beers, T. C., & Irwin, M. J. 1999, Bulletin <strong>of</strong> <strong>the</strong> American Astronomical<br />
Society, 31, 971<br />
Russell, S. & Norvig, P. 2003, Artificial Intelligence A Modern Approach, 2nd edn.<br />
(Upper Saddle River, New Jersey 07458: Pearson Education Inc.)<br />
Saffer, R. A., Bergeron, P., Koester, D., & Liebert, J. 1994, ApJ, 432, 351<br />
Shepard, D. 1968, in Proceedings <strong>of</strong> <strong>the</strong> 1968 23rd ACM national conference (New<br />
York, NY, USA: ACM Press), 517–524<br />
Shewchuk, J. R. 1996, in SCG ’96: Proceedings <strong>of</strong> <strong>the</strong> twelfth annual symposium on<br />
Computational geometry (New York, NY, USA: ACM Press), 141–150<br />
Simkin, S. M. 1974, A&A, 31, 129<br />
Singh, H. P., Gulati, R. K., & Gupta, R. 1998, MNRAS, 295, 312<br />
Skrutskie, M. F., Cutri, R. M., Stiening, R., Weinberg, M. D., Schneider, S., Carpenter,<br />
J. M., Beichman, C., Capps, R., Chester, T., Elias, J., Huchra, J., Liebert, J.,<br />
Lonsdale, C., Monet, D. G., Price, S., Seitzer, P., Jarrett, T., Kirkpatrick, J. D.,<br />
Gizis, J. E., Howard, E., Evans, T., Fowler, J., Fullmer, L., Hurt, R., Light, R.,<br />
Kopan, E. L., Marsh, K. A., McCallon, H. L., Tam, R., Van Dyk, S., & Wheelock,<br />
S. 2006, AJ, 131, 1163<br />
Snider, S., Allende Prieto, C., von Hippel, T., Beers, T. C., Sneden, C., Qu, Y., &<br />
Rossi, S. 2001, ApJ, 562, 528<br />
Sodre, L. J., Cuevas, H., & Capelato, H. V. 1998, in Wide Field Surveys in Cosmology,<br />
14th IAP meeting held May 26-30, 1998, Paris, ed. S. Colombi, Y. Mellier, & B. Raban<br />
(Editions Frontieres; ISBN: 2-8 6332-241-9), 424–+<br />
Storrie-Lombardi, M. C., Irwin, M. J., von Hippel, T., & Storrie-Lombardi, L. J. 1994,<br />
Vistas in Astronomy, 38, 331<br />
Sweigart, A. V. 1997, ApJ, 474, L23+<br />
Theissen, A., Moehler, S., Heber, U., & de Boer, K. S. 1993, A&A, 273, 524<br />
Thejll, P., Bauer, F., Saffer, R., Liebert, J., Kunze, D., & Shipman, H. L. 1994, ApJ,<br />
433, 819<br />
Tonry, J. & Davis, M. 1979, AJ, 84, 1511<br />
von Hippel, T., Storrie-Lombardi, L. J., Storrie-Lombardi, M. C., & Irwin, M. J. 1994,<br />
MNRAS, 269, 97<br />
Weaver, W. B. 2000a, Bulletin <strong>of</strong> <strong>the</strong> American Astronomical Society, 32, 1430<br />
—. 2000b, ApJ, 541, 298<br />
Weaver, W. B. & Torres-Dodgen, A. V. 1995, ApJ, 446, 300<br />
—. 1997, ApJ, 487, 847<br />
Weir, N., Fayyad, U. M., Djorgovski, S. G., & Roden, J. 1995, PASP, 107, 1243<br />
Wesemael, F., Winget, D. E., Cabot, W., van Horn, H. M., & Fontaine, G. 1982, ApJ,<br />
254, 221<br />
Whitney, C. A. 1983, A&AS, 51, 443<br />
Willemsen, P. G., Hilker, M., Kayser, A., & Bailer-Jones, C. A. L. 2005, A&A, 436,<br />
379<br />
York, D. G., Adelman, J., Anderson, J. E., Anderson, S. F., Annis, J., Bahcall, N. A.,<br />
Bakken, J. A., Barkhouser, R., Bastian, S., Berman, E., Boroski, W. N., Bracker, S.,<br />
Briegel, C., Briggs, J. W., Brinkmann, J., Brunner, R., Burles, S., Carey, L., Carr,<br />
M. A., Castander, F. J., Chen, B., Colestock, P. L., Connolly, A. J., Crocker, J. H.,
Csabai, I., Czarapata, P. C., Davis, J. E., Doi, M., Dombeck, T., Eisenstein, D.,<br />
Ellman, N., Elms, B. R., Evans, M. L., Fan, X., Federwitz, G. R., Fiscelli, L., Friedman,<br />
S., Frieman, J. A., Fukugita, M., Gillespie, B., Gunn, J. E., Gurbani, V. K.,<br />
de Haas, E., Haldeman, M., Harris, F. H., Hayes, J., Heckman, T. M., Hennessy,<br />
G. S., Hindsley, R. B., Holm, S., Holmgren, D. J., Huang, C.-h., Hull, C., Husby, D.,<br />
Ichikawa, S.-I., Ichikawa, T., Ivezić, Ž., Kent, S., Kim, R. S. J., Kinney, E., Klaene,<br />
M., Kleinman, A. N., Kleinman, S., Knapp, G. R., Korienek, J., Kron, R. G., Kunszt,<br />
P. Z., Lamb, D. Q., Lee, B., Leger, R. F., Limmongkol, S., Lindenmeyer, C.,<br />
Long, D. C., Loomis, C., Loveday, J., Lucinio, R., Lupton, R. H., MacKinnon, B.,<br />
Mannery, E. J., Mantsch, P. M., Margon, B., McGehee, P., McKay, T. A., Meiksin,<br />
A., Merelli, A., Monet, D. G., Munn, J. A., Narayanan, V. K., Nash, T., Neilsen,<br />
E., Neswold, R., Newberg, H. J., Nichol, R. C., Nicinski, T., Nonino, M., Okada, N.,<br />
Okamura, S., Ostriker, J. P., Owen, R., Pauls, A. G., Peoples, J., Peterson, R. L.,<br />
Petravick, D., Pier, J. R., Pope, A., Pordes, R., Prosapio, A., Rechenmacher, R.,<br />
Quinn, T. R., Richards, G. T., Richmond, M. W., Rivetta, C. H., Rockosi, C. M.,<br />
Ruthmansdorfer, K., Sandford, D., Schlegel, D. J., Schneider, D. P., Sekiguchi, M.,<br />
Sergey, G., Shimasaku, K., Siegmund, W. A., Smee, S., Smith, J. A., Snedden, S.,<br />
Stone, R., Stoughton, C., Strauss, M. A., Stubbs, C., SubbaRao, M., Szalay, A. S.,<br />
Szapudi, I., Szokoly, G. P., Thakar, A. R., Tremonti, C., Tucker, D. L., Uomoto, A.,<br />
Vanden Berk, D., Vogeley, M. S., Waddell, P., Wang, S.-i., Watanabe, M., Weinberg,<br />
D. H., Yanny, B., & Yasuda, N. 2000, AJ, 120, 1579<br />
Appendices<br />
Appendix A<br />
Results for 192 Drilling et al.<br />
(2006) Hot Subdwarfs<br />
This table lists <strong>the</strong> parameterisation results for both <strong>the</strong> calibrated and uncalibrated<br />
stars obtained from Drilling et al. (2006). Results obtained from <strong>the</strong> parameterisation<br />
neural network and SFIT are given, with <strong>the</strong> internal errors <strong>of</strong> SFIT also listed.<br />
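The rows of Table A.1 can be read programmatically. The following is a minimal sketch, assuming each row holds an identifier (which may itself contain spaces) followed by ten numeric fields in the column order of the table header. The names `TableA1Row`, `parse_row` and `teff_offset_in_sigma` are hypothetical and are not part of SFIT or STATNET.

```python
from typing import NamedTuple


class TableA1Row(NamedTuple):
    identifier: str
    sfit_teff: float      # K
    sfit_teff_err: float  # K, SFIT internal error
    sfit_logg: float      # cgs
    sfit_logg_err: float
    sfit_he: float        # log(n_He/n_H)
    sfit_he_err: float
    chi2: float
    ann_teff: float       # K
    ann_logg: float       # cgs
    ann_he: float         # log(n_He/n_H)


def parse_row(line: str) -> TableA1Row:
    """Split one table line; identifiers may contain spaces, so split from the right."""
    text = line.replace("<br />", "").strip()
    parts = text.rsplit(None, 10)  # ten numeric fields follow the identifier
    if len(parts) != 11:
        raise ValueError(f"expected an identifier plus 10 numbers: {line!r}")
    return TableA1Row(parts[0], *(float(p) for p in parts[1:]))


def teff_offset_in_sigma(row: TableA1Row) -> float:
    """(ANN - SFIT) effective temperature in units of the SFIT internal error."""
    return (row.ann_teff - row.sfit_teff) / row.sfit_teff_err


row = parse_row("BD-07 3477 27748 1362 5.420 0.120 -2.673 0.204 "
                "1.90E-01 26360.4620 5.4370 -2.4641<br />")
print(row.identifier, round(teff_offset_in_sigma(row), 2))
```

Splitting from the right keeps two-word identifiers such as "Feige 110" intact. The offset is expressed relative to the SFIT internal error only, which is a choice made for this illustration rather than a statistic used in the thesis.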
Table A.1: Parameterisation Results for 192 Drilling et al. (2006) Hot Subdwarfs<br />
Columns: Identifier | SFIT results: T_eff (K), δT_eff (K), log g (cgs), δlog g, log(n_He/n_H), δlog(n_He/n_H), χ² | ANN results: T_eff (K), log g (cgs), log(n_He/n_H)<br />
BD-07 3477 27748 1362 5.420 0.120 -2.673 0.204 1.90E-01 26360.4620 5.4370 -2.4641<br />
BD+25 3941 28478 447 4.645 0.058 -1.422 0.034 1.89E+00 30794.9018 4.7807 -2.2476<br />
BD+28 4211 48135 1120 5.773 0.072 -1.121 0.040 3.72E-01 59518.9743 6.6219 -3.3647<br />
BD+40 4032 27895 713 4.083 0.069 -1.079 0.036 1.02E+00 27197.0607 3.7571 -0.9165<br />
Feige 110 40000 196 5.776 0.042 -2.020 0.136 8.78E-01 45638.2315 5.9658 -3.3709<br />
Feige 15 12000 162 4.500 0.053 -2.201 0.276 2.48E+00 13763.3319 3.9102 -1.4225<br />
Feige 38 29629 504 5.546 0.055 -2.483 0.132 2.71E-01 30014.7462 5.5768 -2.5487<br />
Feige 56 15571 279 3.608 0.048 -1.765 0.101 2.80E-01 17780.0476 3.7579 -2.3262<br />
Feige 98 11590 196 3.793 0.064 -2.541 0.302 7.42E-01 13083.8300 3.8179 -3.0230<br />
FHB 18 10819 133 4.179 0.037 -1.602 0.052 5.28E-01 12901.1081 4.2528 -2.7523<br />
FHB 23 11646 189 4.394 0.053 -2.586 0.334 4.27E-01 14271.8636 4.4317 -3.3414<br />
HD 144941 22000 348 3.835 0.055 0.963 0.008 1.31E+00 21681.7225 3.6821 1.6263<br />
HD 160641 30614 207 3.153 0.025 2.237 0.225 5.06E+00 31303.9914 2.8086 1.8212<br />
HD 17520 34793 723 3.804 0.067 -0.661 0.030 1.46E+00 35330.3819 4.0278 -0.9810<br />
HD 184279 25927 292 3.917 0.034 -0.400 0.016 2.06E+00 28409.0447 3.9086 -0.0132<br />
HD 192281 11337 194 4.012 0.044 -1.856 0.405 6.63E-01 11143.6105 3.6825 -2.2308<br />
HD 217086 37427 439 4.691 0.065 -1.237 0.045 8.02E-01 45007.2314 5.1943 -1.5241<br />
Hiltner 600 29739 564 4.717 0.063 -1.112 0.034 6.06E-01 27869.8814 4.7985 -0.9717<br />
HR 6092 22693 545 4.026 0.070 -1.477 0.039 4.36E+00 14259.7656 2.7584 -0.0691<br />
HR 6588 23305 466 3.682 0.063 -0.938 0.033 3.60E+00 16634.1650 2.5556 -0.1572<br />
HR 6719 25813 814 3.500 0.056 -0.597 0.026 2.68E+00 19240.4897 2.3722 -0.0260<br />
HR 7287 19088 391 3.598 0.058 -1.929 0.074 4.50E+00 12007.6482 2.1732 0.1142<br />
HR 8622 28951 1186 2.919 0.063 0.107 0.006 3.06E+00 29216.7530 2.8812 0.2961<br />
HS 0016+0044 29725 540 5.523 0.075 -2.939 0.377 6.74E-01 30591.2646 5.7893 -3.2113<br />
HS 1000+471 40766 137 5.659 0.036 0.521 0.013 6.30E+00 35501.0360 4.3467 -0.6256<br />
HS 1844+637 29045 177 3.633 0.026 0.976 0.008 8.46E+00 39133.7152 4.5950 3.4902<br />
HS 2253+0900 13534 281 3.863 0.060 -1.156 0.062 5.37E-01 13688.8067 3.9192 -1.8474<br />
HS 2301+0728 17658 528 4.378 0.065 -2.771 0.257 2.29E+00 10788.7216 2.7773 -3.8880<br />
HZ 15 20434 584 3.000 0.065 -0.630 0.025 2.37E+00 24751.3086 3.0214 -1.0013<br />
HZ 44 38507 224 5.381 0.040 0.088 0.003 1.58E+00 37595.6085 4.8985 -0.1747<br />
LSIV-14 37999 240 5.648 0.045 -0.573 0.017 3.63E+00 37185.7816 5.3742 -0.8631<br />
LSIV-6 28974 257 3.747 0.046 2.522 0.144 2.73E+00 26745.5011 2.7521 3.2416<br />
LSS 5121 30511 245 3.216 0.037 2.273 0.163 3.95E+00 32165.1540 3.2400 1.0310<br />
PG0001+275 35180 300 5.406 0.053 -3.000 0.434 4.31E+00 34205.9951 5.0684 -4.2459<br />
PG0004+133 26205 1118 4.828 0.110 -2.037 0.094 3.67E+00 32994.6787 5.2114 -1.8659<br />
PG0009+036 20214 629 4.496 0.084 -2.488 0.134 2.29E+00 23334.3874 4.8444 -2.5336<br />
PG0039+049 28606 391 4.668 0.068 -2.995 0.430 2.26E+00 20927.6745 3.7393 -3.3758<br />
PG0039+135 45029 212 5.408 0.081 0.514 0.030 3.51E+00 42789.5834 5.3039 -0.3035<br />
PG0057+155 32203 375 5.500 0.063 -1.785 0.053 1.02E+00 33110.1795 5.5769 -2.0610<br />
PG0101+039 27565 1344 5.357 0.114 -3.000 0.434 1.66E+00 27132.9594 5.1346 -2.8721<br />
PG0133+114 35999 61 6.000 0.034 -2.995 0.430 1.54E+00 30996.2889 5.0018 -3.0707<br />
PG0135+242 25308 323 3.375 0.057 2.452 0.123 3.30E+00 22720.6398 2.5661 2.3728<br />
PG0142+148 28738 565 5.022 0.080 -2.966 0.402 3.72E+00 33990.4595 5.0653 -3.3974<br />
PG0208+016 44506 160 5.926 0.041 1.218 0.043 4.17E+00 42334.8969 5.7844 0.7455<br />
PG0229+064 18305 192 3.991 0.056 -0.798 0.035 9.77E-01 23366.1468 4.5001 -1.1572<br />
PG0232+095 35000 435 4.861 0.030 -1.392 0.053 5.88E+00 26042.8910 3.8106 -2.5768<br />
PG0304+183 29953 710 5.182 0.077 -2.913 0.356 7.50E+00 25550.8542 4.7584 -3.9436<br />
PG0314+146 8541 37 1.235 0.028 -2.004 0.088 7.27E+00 4603.7402 1.2050 -3.8688<br />
PG0342+026 21878 819 4.731 0.073 -3.000 0.434 2.06E+00 30227.8532 5.3591 -3.3190<br />
PG0838+133 40055 139 4.500 0.031 0.990 0.013 3.52E+00 46872.5237 4.9328 4.0786<br />
PG0856+121 26869 620 5.600 0.067 -3.000 0.434 2.70E+00 27571.6464 5.7342 -3.3096<br />
PG0902+058 42352 111 6.000 0.041 1.912 0.106 3.85E+00 42180.4296 5.7766 1.9591<br />
PG0907+123 26482 641 5.075 0.071 -3.000 0.434 4.06E+00 24278.8101 4.9274 -3.3162<br />
PG0909+164 31880 562 4.847 0.086 -3.000 0.434 1.57E+00 35575.3699 4.6247 -3.7231<br />
PG0909+275 34009 427 4.795 0.057 -0.685 0.022 1.93E+00 36507.4485 5.1157 -1.2373<br />
PG0918+029 31029 310 5.500 0.066 -2.649 0.193 2.41E+00 24129.8781 4.4794 -2.2902<br />
PG0920+029 25992 1329 4.781 0.111 -3.000 0.434 1.48E+00 27712.3799 4.8930 -3.3238<br />
PG0921+161 32868 292 5.329 0.060 -1.640 0.057 2.59E+00 35284.4531 5.1844 -3.0325<br />
PG0921+311 42320 137 5.873 0.047 1.068 0.015 2.72E+00 41033.2928 5.5140 0.5536<br />
PG0934+145 16681 212 4.031 0.037 -0.899 0.034 2.22E+00 13314.2430 3.7870 -0.7299<br />
PG0954+049 13384 239 3.399 0.058 -1.526 0.087 1.31E+00 12465.6053 3.0812 -2.6841<br />
PG1000+375 32896 237 5.814 0.053 -1.761 0.050 1.31E+00 20945.0047 4.8115 -1.6003<br />
PG1017+431 32192 454 4.816 0.074 -2.991 0.425 5.45E-01 44773.4090 5.8294 -3.6347<br />
PG1018-047 30361 252 5.365 0.057 -3.000 0.434 2.23E+00 30145.1825 5.3885 -5.4213<br />
PG1047+003 32977 318 5.459 0.066 -2.130 0.117 4.07E-01 34846.5851 5.5663 -2.5655<br />
PG1049+013 32754 427 4.725 0.069 -2.610 0.177 8.38E-01 44195.8859 5.5828 -2.9385<br />
PG1050-065 34509 236 5.591 0.047 -1.212 0.035 2.91E+00 36241.7231 5.3524 -1.2791<br />
PG1118+061 28321 386 5.224 0.065 -3.000 0.434 3.84E-01 27695.5481 5.1031 -2.9965<br />
PG1127+019 40812 131 4.965 0.070 1.940 0.265 1.81E+00 39675.7493 4.6297 2.1534<br />
PG1136-003 30576 260 5.250 0.056 -3.000 0.434 4.32E-01 28147.6630 5.0249 -3.0350<br />
PG1154-070 28000 963 5.430 0.092 -2.200 0.069 3.92E-01 26478.4624 5.2617 -2.1939<br />
PG1220-056 49308 883 5.460 0.107 0.309 0.013 8.09E-01 52185.9827 6.0800 -1.0282<br />
PG1230+067 38843 191 4.926 0.056 1.013 0.013 2.36E+00 40314.1570 5.0314 2.8373<br />
PG1245-042 15232 266 3.803 0.056 -1.820 0.144 5.97E-01 15548.7313 3.8416 -1.4344<br />
PG1246-122 32573 361 4.001 0.044 -1.012 0.035 3.11E+00 33475.4066 3.7321 -2.2059<br />
PG1249+762 50000 276 5.623 0.087 0.368 0.028 1.06E+00 65439.9717 6.7602 0.0994<br />
PG1255+547 32774 330 5.500 0.059 -1.547 0.046 1.20E+00 30935.1727 5.6242 -2.1342<br />
PG1258-030 13075 278 3.656 0.048 -2.286 0.168 1.03E+00 12502.8404 3.3526 -2.0501<br />
PG1300+279 48677 632 5.955 0.053 -0.041 0.002 1.56E+00 46441.3479 6.5777 -0.9764<br />
PG1303-114 31245 354 5.502 0.068 -3.000 0.434 1.82E+00 30588.0996 5.4478 -5.0836<br />
PG1325+054 44232 208 5.915 0.033 0.326 0.011 2.18E+00 45019.5994 6.0397 1.9269<br />
PG1336-018 31271 245 5.567 0.049 -2.519 0.143 6.11E-01 37061.5891 5.9196 -3.4263<br />
PG1343-102 29958 707 5.424 0.087 -2.910 0.353 6.71E-01 31263.5069 5.3800 -3.4850<br />
PG1343+578 15335 301 3.396 0.062 -1.835 0.089 7.67E-01 14789.0812 3.2773 -2.8106<br />
PG1348+607 45000 223 5.386 0.076 0.000 0.000 2.17E+00 56671.0493 6.6681 2.0785<br />
PG1352-023 47661 1197 6.000 0.091 -1.723 0.092 2.46E+00 54114.0712 6.2103 -3.7261<br />
PG1355-064 49999 733 5.299 0.115 0.364 0.029 9.95E-01 55817.7736 5.5139 -1.3330<br />
PG1401+289 47629 247 5.753 0.087 0.184 0.014 9.22E-01 43645.2377 5.6074 0.1812<br />
PG1409-103 39399 768 5.008 0.089 -2.166 0.127 1.45E+00 50621.5726 5.7496 -4.4764<br />
PG1413+114 43416 117 6.000 0.039 1.261 0.063 3.54E+00 44247.9760 5.7971 3.0220<br />
PG1415+492 31467 283 4.135 0.047 2.512 0.141 2.26E+00 30378.3273 3.7367 2.7468<br />
PG1426-067 34262 380 5.368 0.066 -3.000 0.434 1.17E+00 34904.3068 5.1176 -2.8789<br />
PG1432+004 24561 1051 4.987 0.114 -2.308 0.088 9.80E-01 24852.4845 5.0079 -2.5775<br />
PG1433+239 35306 378 5.345 0.061 -2.970 0.405 3.89E+00 39665.7300 5.3344 -3.5664<br />
PG1441+407 49802 277 6.000 0.089 0.464 0.038 1.61E+00 46391.6786 5.5498 0.7923<br />
PG1448-052 32489 304 5.189 0.058 -3.000 0.434 7.68E-01 50271.3965 6.1452 -3.8949<br />
PG1449+652 30456 269 4.598 0.057 -3.000 0.434 1.23E+00 22392.8613 3.6074 -4.2284<br />
PG1451+492 17996 482 3.999 0.067 -1.919 0.108 8.85E-01 21898.4813 4.1696 -2.0822<br />
PG1453-081 16393 281 3.977 0.068 -1.196 0.095 1.02E+00 20030.3017 3.9852 -1.7400<br />
PG1453-085 12264 169 3.175 0.044 -2.777 0.260 6.93E-01 14759.0551 3.3091 -3.1086<br />
PG1458+423 29151 811 5.000 0.104 -2.995 0.430 6.94E-01 29104.1751 4.8454 -3.9124<br />
PG1506-052 38002 479 5.227 0.071 -2.226 0.073 2.32E+00 57963.7346 6.0973 -6.7294<br />
PG1510+635 12489 184 3.256 0.048 -2.906 0.350 2.53E+00 14137.8470 3.2197 -3.4824<br />
PG1518-098 47134 1376 5.000 0.098 0.276 0.029 1.04E+00 63938.6641 5.7489 2.8699<br />
PG1519+640 28801 669 5.000 0.099 -2.779 0.261 8.55E-01 29147.1760 4.9799 -3.1609<br />
PG1526+440 41199 139 5.826 0.037 0.738 0.023 3.90E+00 37868.3371 4.9471 -0.1320<br />
PG1532+523 30618 263 5.257 0.056 -2.935 0.374 1.03E+00 25389.6717 4.5495 -3.1041<br />
PG1534-018 45010 166 5.656 0.043 1.224 0.087 8.31E-01 43114.3697 5.3541 0.8758<br />
PG1536+690 50000 277 5.647 0.087 0.364 0.028 9.38E-01 55913.0919 6.1882 -0.0670<br />
PG1537-046 47494 676 5.249 0.108 0.174 0.008 1.17E+00 58302.0139 6.0350 2.8671<br />
PG1538+401 32260 320 5.446 0.064 -3.000 0.434 8.57E-01 34944.9166 5.5433 -2.9674<br />
PG1538+611 30528 307 5.416 0.063 -3.000 0.434 1.32E+00 29648.9555 5.0240 -4.0399<br />
PG1543+629 40002 185 5.536 0.043 -2.137 0.119 1.07E+00 51462.0631 5.9720 -3.9767<br />
PG1544+488 30992 273 4.202 0.046 2.522 0.144 1.50E+00 33677.6132 4.4517 3.5425<br />
PG1544+601 30000 518 5.543 0.054 -2.579 0.165 4.27E-01 28454.0634 5.5260 -2.6698<br />
PG1545+035 38820 419 5.000 0.083 -1.081 0.036 1.43E+00 47852.7894 5.5488 -4.0558<br />
PG1549+006 31939 559 5.288 0.070 -2.074 0.103 1.08E+00 29109.5245 5.0910 -1.8754<br />
PG1553-077 44904 210 5.707 0.045 0.256 0.014 1.56E+00 42387.5197 5.3536 1.0981<br />
PG1554+408 34356 255 4.320 0.056 2.522 0.289 3.41E+00 38902.7912 4.9593 2.1864<br />
PG1558-007 21419 753 4.922 0.081 -2.709 0.222 4.26E-01 26843.3693 5.3108 -2.7541<br />
PG1559+048 36325 228 5.600 0.049 -0.938 0.033 1.05E+00 36003.1956 5.3491 -0.9926<br />
PG1559+222 42434 138 6.000 0.047 2.129 0.117 2.79E+00 41856.4568 5.5616 1.0422<br />
PG1559+533 29420 633 5.500 0.090 -2.817 0.285 1.52E+00 21995.1194 5.0622 -1.9979<br />
PG1600+171 45883 363 5.999 0.092 0.943 0.026 3.97E+00 42817.5697 5.4414 1.6103<br />
PG1602+013 39961 202 5.592 0.042 -1.995 0.129 1.49E+00 53840.6098 6.1985 -3.8367<br />
PG1605+072 30000 824 4.779 0.088 -2.790 0.268 2.97E-01 31978.9966 4.9655 -2.6540<br />
PG1607+174 32181 335 4.674 0.046 -0.268 0.009 5.28E+00 31842.6743 4.4473 -0.4410<br />
PG1610+519 31595 380 4.537 0.070 -3.000 0.434 5.74E-01 38165.4762 4.8614 -4.8178<br />
PG1613+467 23590 1075 4.612 0.083 -2.541 0.151 9.68E-01 29674.0969 5.1593 -3.3097<br />
PG1615+413 28553 189 3.759 0.018 0.990 0.008 3.70E+00 36907.9108 4.9056 2.1208<br />
PG1618+563 33309 341 5.481 0.069 -1.513 0.042 2.17E+00 36466.0770 5.8061 -2.4338<br />
PG1619+525 31178 318 5.500 0.067 -2.371 0.102 4.17E-01 30888.0423 5.4437 -2.4573<br />
PG1624+085 43897 205 5.752 0.058 0.623 0.045 1.94E+00 41755.1142 5.5719 0.1620<br />
PG1627+006 23222 593 5.193 0.068 -2.899 0.344 1.01E+00 20611.6950 4.8277 -2.7701<br />
PG1627+017 22959 522 5.206 0.065 -2.700 0.218 3.61E-01 22424.9747 5.2100 -2.8087<br />
PG1629+466 35779 273 5.000 0.035 0.767 0.030 1.21E+00 37328.4040 5.1599 0.6611<br />
PG1640+645 34458 262 5.591 0.048 -1.590 0.051 4.84E-01 35698.5308 5.9373 -2.2544<br />
PG1644+404 28221 457 5.060 0.062 -3.000 0.434 7.48E-01 32326.3238 5.4986 -3.2872<br />
PG1645+610 28377 522 5.411 0.080 -2.391 0.107 8.12E-01 17587.7044 4.7845 -2.2044<br />
PG1646+607 47595 668 6.000 0.057 -0.032 0.001 2.31E+00 46365.3539 6.0318 0.4509<br />
PG1648+315 41835 130 6.000 0.046 1.040 0.014 4.08E+00 36749.1785 4.9233 0.0589<br />
PG1648+536 30145 630 5.001 0.093 -3.000 0.434 7.59E-01 31819.6176 5.0395 -4.0618<br />
PG1653+633 34667 268 5.790 0.055 -1.693 0.043 1.01E+00 38094.6200 6.2811 -2.4102<br />
PG1656+600 30481 252 6.000 0.056 -2.628 0.184 1.51E+00 33912.5833 6.0668 -3.3172<br />
PG1658+273 43059 113 6.000 0.038 1.261 0.063 3.25E+00 40736.4516 4.8451 1.2766<br />
PG1701+359 32615 88 5.918 0.021 -2.604 0.175 5.87E-01 32564.5119 5.5194 -3.9213<br />
PG1704+222 15926 334 2.561 0.053 -1.131 0.053 5.49E-01 19339.7888 2.9039 -1.5097<br />
PG1705+537 15025 341 3.437 0.062 -1.356 0.049 1.23E+00 19121.6772 3.7978 -1.7138<br />
PG1707+657 36119 114 5.993 0.033 -1.902 0.104 6.97E-01 32424.6910 5.4695 -1.9184<br />
PG1708+142 18154 459 3.480 0.080 -1.245 0.068 2.06E+00 20510.9488 3.6958 -1.5666<br />
PG1708+602 42477 775 5.256 0.074 -0.945 0.060 8.49E-01 52356.2200 5.9487 -3.2523<br />
PG1710+490 29497 546 5.102 0.069 -2.705 0.220 1.06E+00 30354.3920 5.2502 -3.0947<br />
PG1710+567 33908 486 5.110 0.033 -1.538 0.045 9.87E-01 30094.2953 4.8112 -2.7425<br />
PG1715+273 29074 227 3.797 0.016 1.070 0.010 4.32E+00 35385.5095 4.5349 3.0662<br />
PG1717+423 23293 615 4.654 0.074 -2.991 0.425 1.12E+00 26991.4606 4.9110 -3.2374<br />
PG1722+286 33802 292 5.795 0.058 -1.757 0.050 1.33E+00 31508.9630 5.7798 -1.6469<br />
PG1724+590 28706 605 5.000 0.094 -2.420 0.114 9.74E-01 27423.4056 4.9714 -3.2446<br />
PG1738+505 24113 746 4.922 0.081 -1.829 0.059 9.28E-01 28143.7622 5.3359 -1.8641<br />
PG1739+489 22474 539 4.569 0.066 -2.744 0.241 7.71E-01 28940.6842 5.0868 -3.4167<br />
PG1743+477 26873 1206 4.904 0.123 -2.214 0.071 1.82E+00 26957.5253 4.8522 -3.3545<br />
PG2059+013 33086 233 5.583 0.046 -1.607 0.053 2.64E+00 31942.9438 5.5146 -1.9470<br />
PG2111+023 14681 254 4.000 0.055 -1.274 0.073 3.69E+00 18712.6960 4.0693 -2.2958<br />
PG2120+062 32008 512 4.133 0.052 -0.950 0.046 7.43E-01 32664.6531 4.1185 -1.1104<br />
PG2148+095 30001 806 4.555 0.082 -3.000 0.434 5.46E-01 25977.6010 3.9359 -3.3697<br />
PG2151+100 35789 66 5.941 0.034 -2.677 0.206 7.84E-01 41296.8002 5.7262 -3.0819<br />
PG2158+082 49999 743 5.500 0.119 0.364 0.029 1.28E+00 53960.7462 5.8944 -1.2564<br />
PG2159+051 13496 252 3.244 0.050 -1.441 0.084 3.91E-01 13921.4240 3.3088 -1.7585<br />
PG2204+035 31535 302 5.917 0.059 -2.506 0.139 3.75E+00 23039.5221 5.2470 -2.1168<br />
PG2205+023 27156 646 5.622 0.069 -3.000 0.434 1.06E+00 24413.7057 5.4836 -3.0047<br />
PG2215+151 45566 318 5.841 0.068 1.951 0.620 4.56E+00 41433.7407 5.4859 -0.1429<br />
PG2218+020 21062 711 4.768 0.070 -2.707 0.221 1.34E+00 30254.5605 5.4740 -2.9763<br />
PG2219+094 19206 754 3.546 0.068 -1.457 0.112 1.39E+00 24813.0492 4.0618 -1.7815<br />
PG2229+099 16940 236 3.755 0.053 -0.958 0.031 1.41E+00 18337.4404 3.8720 -1.3692<br />
PG2258+155 34000 381 4.481 0.046 0.945 0.008 3.68E+00 37528.2798 5.2630 2.4797<br />
PG2259+134 31323 396 5.772 0.057 -1.975 0.082 1.35E+00 31934.3911 5.8685 -2.0224<br />
PG2301+259 18959 395 4.217 0.061 -1.904 0.070 6.67E-01 16716.7103 4.1106 -1.5974<br />
PG2314+076 30140 305 5.640 0.050 -3.000 0.434 3.38E+00 31564.2513 5.1605 -3.1646<br />
PG2317+046 33177 797 4.504 0.104 -3.000 0.434 1.54E+00 40658.1786 4.6877 -3.3510<br />
PG2318+239 16940 296 3.778 0.036 -1.392 0.075 1.47E+00 20890.7160 4.2557 -1.8445<br />
PG2321+214 38502 268 4.977 0.063 2.314 0.179 1.52E+00 39171.8164 5.0203 2.4143<br />
PG2331+038 29017 428 5.642 0.054 -2.401 0.109 3.59E+00 32370.5612 5.8341 -2.7235<br />
PG2337+070 29563 670 5.735 0.060 -1.997 0.086 7.85E-01 29133.8280 5.8738 -1.4688<br />
PG2339+199 30763 396 4.189 0.043 1.042 0.009 4.17E+00 33495.0283 4.4077 1.0705<br />
PG2345+241 17743 288 3.699 0.044 -0.954 0.046 6.76E-01 19581.9082 3.8835 -1.0702<br />
PG2349+002 28383 334 5.600 0.053 -3.000 0.434 2.00E+00 26081.2311 5.2581 -4.0008<br />
PG2351+198 14539 269 3.768 0.046 -1.380 0.094 1.64E+00 13733.0764 4.0071 -1.6889<br />
PG2352+181 47309 246 5.873 0.089 0.264 0.021 2.81E+00 44348.4299 5.7750 -0.4701<br />
PG2358+107 23768 966 4.978 0.105 -2.685 0.210 3.04E+00 24626.4068 4.9117 -3.6593<br />
PHL 1079 31862 389 5.589 0.055 -2.303 0.087 1.32E+00 31739.7364 5.6372 -2.0741<br />
PHL 4 40197 685 5.000 0.093 -1.254 0.062 6.59E-01 50528.5221 5.6359 -2.8566<br />
TON 107 39369 266 5.602 0.039 -0.076 0.003 4.42E+00 36793.9981 4.8072 -0.2664<br />
VZ1128 M3 34893 388 4.500 0.058 -0.968 0.044 8.72E-01 35857.4448 4.3289 -2.0402<br />
Appendix B<br />
Results for 282 SDSS DR3 Hot<br />
Subdwarf Candidates<br />
This table lists the classification and parameterisation results for the SDSS hot subdwarf<br />
candidates of Chapter 5. Also listed for each star are its position and its redshift<br />
(tabulated as cz, with error δcz), as obtained by the SDSS. The internal errors of SFIT are<br />
given, along with the value of χ² for the best fit.<br />
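As an aid to reading the table in machine form, the following is a minimal sketch of how one row might be parsed. It assumes that each row holds 13 whitespace-separated fields in the order of the column header, that R.A. is given in hours:minutes:seconds, and that Decl. is given in degrees:arcminutes:arcseconds. The names `SubdwarfRow`, `parse_row` and `sexagesimal_to_degrees` are hypothetical and do not come from the thesis or from SFIT.

```python
from dataclasses import dataclass


@dataclass
class SubdwarfRow:
    """One row of Table B.1 (field names are illustrative, not from the thesis)."""
    identifier: str
    ra: str
    dec: str
    cz: float            # km/s
    cz_err: float
    classification: str
    teff: float          # K
    teff_err: float
    logg: float          # cgs
    logg_err: float
    log_he: float        # log(n_He/n_H)
    log_he_err: float
    chi2: float


def parse_row(line: str) -> SubdwarfRow:
    """Split one whitespace-separated table row into typed fields."""
    fields = line.replace("<br />", "").split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    ident, ra, dec, cz, cz_err, cls = fields[:6]
    numbers = [float(x) for x in fields[6:]]  # T_eff ... chi^2, seven values
    return SubdwarfRow(ident, ra, dec, float(cz), float(cz_err), cls, *numbers)


def sexagesimal_to_degrees(value: str, hours: bool = False) -> float:
    """Convert 'hh:mm:ss.ss' or '+dd:mm:ss.s' to decimal degrees."""
    # Take the sign from the string so that '-00:29:53.3' stays negative.
    sign = -1.0 if value.startswith("-") else 1.0
    a, b, c = (float(p) for p in value.lstrip("+-").split(":"))
    result = sign * (a + b / 60.0 + c / 3600.0)
    return result * 15.0 if hours else result


# First row of Table B.1, copied verbatim.
row = parse_row(
    "J000607.88-010320.8 00:06:07.88 -01:03:20.8 -279.634 26.6456 "
    "sdO7VII:He39 42788 155 5.280 0.060 2.093 0.108 6.17E+00<br />"
)
ra_deg = sexagesimal_to_degrees(row.ra, hours=True)
dec_deg = sexagesimal_to_degrees(row.dec)
```

The sign of the declination is read from the string and not from the leading integer, because several entries in the table lie between 0° and −1° and would otherwise lose their sign.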
Table B.1: Results for 282 SDSS Hot Subdwarf Candidates<br />
SDSS Identifier R.A. Decl. cz (km s⁻¹) δcz Classification T_eff (K) δT_eff log g (cgs) δlog g log(n_He/n_H) δlog(n_He/n_H) χ²<br />
J000607.88-010320.8 00:06:07.88 -01:03:20.8 -279.634 26.6456 sdO7VII:He39 42788 155 5.280 0.060 2.093 0.108 6.17E+00<br />
J001651.42-011329.3 00:16:51.42 -01:13:29.3 115.816 44.9755 sdO2VII:He26 47737 3077 3.726 0.083 0.364 0.044 2.59E+00<br />
J001837.14+152150.0 00:18:37.14 +15:21:50.0 -73.1296 25.5077 sdO9VII:He39 38579 94 4.555 0.026 2.522 0.433 6.46E+00<br />
J001930.36+135530.9 00:19:30.36 +13:55:30.9 -225.249 34.6605 sdB0VI:He12 29686 475 5.382 0.045 -1.120 0.023 6.17E+00<br />
J002323.99-002953.3 00:23:23.99 -00:29:53.3 -5.64728 33.4529 sdB3VI:He5 30771 241 5.746 0.042 -1.873 0.032 6.68E-01<br />
J002852.26+135446.5 00:28:52.26 +13:54:46.5 35.3602 31.5028 sdO9VI:He8 33187 387 4.944 0.062 -2.692 0.214 9.48E-01<br />
J004233.43+004717.6 00:42:33.43 +00:47:17.6 24.2894 32.9952 sdB4V:He1 29989 668 4.987 0.080 -3.000 0.434 2.73E+00<br />
J011506.17+140513.5 01:15:06.17 +14:05:13.5 -162.802 29.9172 sdB3IV:He7 32000 380 4.545 0.056 -2.991 0.425 7.56E+00<br />
J013847.59+141532.1 01:38:47.59 +14:15:32.1 -188.872 33.771 sdO9VIII:He7 27012 612 4.972 0.072 -2.212 0.071 7.17E+00<br />
J015026.10-094226.9 01:50:26.10 -09:42:26.9 -18.51 33.4439 sdB8VI:He10 33657 341 5.303 0.029 -1.402 0.022 1.58E+00<br />
J021617.11-095513.1 02:16:17.11 -09:55:13.1 -29.2216 33.6757 sdO8VI:He12 35372 180 5.561 0.034 -1.195 0.027 9.74E-01<br />
J023032.65-081439.5 02:30:32.65 -08:14:39.5 -194.8 28.2265 sdB6IV:He3 13683 176 3.755 0.037 -1.399 0.065 7.83E-01<br />
J031620.13+004222.9 03:16:20.13 +00:42:22.9 -23.1856 31.5085 sdB0V:He8 32906 81 5.692 0.016 -2.227 0.073 1.67E+00<br />
J031854.14+004135.0 03:18:54.14 +00:41:35.0 6.24576 32.428 sdB7IV:He2 20554 512 4.500 0.063 -2.047 0.097 1.82E+00<br />
J033358.21+002007.5 03:33:58.21 +00:20:07.5 117.167 34.2372 sdB2VI:He13 34164 234 5.270 0.036 -0.771 0.017 6.59E+00<br />
J073712.28+264224.7 07:37:12.28 +26:42:24.7 8.71841 31.8302 sdO9VI:He7 32799 75 5.788 0.019 -2.278 0.082 1.71E+00<br />
J073856.99+401942.1 07:38:56.99 +40:19:42.1 -202.597 22.922 sdO1VII:He34 50000 547 5.495 0.087 0.644 0.047 2.18E+00<br />
J074001.91+240127.4 07:40:01.91 +24:01:27.4 -130.711 32.684 sdA1IV:He0 30156 269 5.085 0.042 -2.991 0.425 7.72E+00<br />
J074458.10+324259.9 07:44:58.10 +32:42:59.9 50.6134 38.7134 sdO9VII:He24 38824 74 5.996 0.025 -0.512 0.010 8.41E+00<br />
J074534.16+372718.6 07:45:34.16 +37:27:18.6 29.853 33.6007 sdB0VII:He4 35000 155 5.394 0.033 -3.000 0.434 2.44E+00<br />
J074613.17+333307.6 07:46:13.17 +33:33:07.6 -13.6868 23.2616 sdO7VII:He39 44962 100 6.000 0.031 1.261 0.047 1.60E+00<br />
J074720.59+384910.7 07:47:20.59 +38:49:10.7 25.1426 29.7373 sdB7IV:He3 16696 250 3.821 0.047 -1.588 0.101 2.74E+00<br />
J074806.15+342927.7 07:48:06.15 +34:29:27.7 -54.5131 34.0109 sdB0VI:He6 33653 269 5.478 0.053 -1.697 0.043 1.38E+00<br />
J074811.34+435239.6 07:48:11.34 +43:52:39.6 26.4415 32.7577 sdB1VI:He8 25154 617 5.003 0.067 -1.463 0.025 1.29E+00<br />
J075236.78+441642.5 07:52:36.78 +44:16:42.5 -102.636 34.6248 sdB1V:He9 34243 241 5.496 0.019 -1.059 0.020 2.63E+00<br />
J075249.96+305935.2 07:52:49.96 +30:59:35.2 144.115 29.591 sdB4V:He34 39999 182 5.576 0.027 1.134 0.012 9.97E+00<br />
J080259.80+411438.0 08:02:59.80 +41:14:38.0 -26.9931 22.3976 sdO7VII:He39 44340 95 6.000 0.030 1.261 0.047 1.68E+00<br />
J080628.10+323059.4 08:06:28.10 +32:30:59.4 -38.8078 32.5904 sdB2V:He11 30983 240 5.650 0.035 -1.332 0.019 8.91E-01<br />
J080726.80+303501.8 08:07:26.80 +30:35:01.8 -65.0792 28.5258 sdB5IV:He4 13573 172 3.697 0.035 -1.552 0.062 8.53E-01<br />
J081342.92+275034.8 08:13:42.92 +27:50:34.8 -5.38343 32.9205 sdB3VI:He2 27634 681 5.500 0.070 -2.294 0.085 1.28E+00<br />
J081540.66+430524.5 08:15:40.66 +43:05:24.5 131.458 38.0934 sdB4V:He32 37418 195 5.536 0.033 -0.307 0.008 1.52E+01<br />
J081607.91+480349.7 08:16:07.91 +48:03:49.7 -26.1981 31.9453 sdB2VI:He5 24992 545 5.359 0.058 -2.293 0.085 1.18E+00<br />
J082751.06+410925.9 08:27:51.06 +41:09:25.9 -21.3762 30.7086 sdO5VII:He36 48823 522 5.503 0.084 0.397 0.023 1.30E+01<br />
J082802.04+404009.0 08:28:02.04 +40:40:09.0 -182.813 30.9737 sdO4VII:He11 36831 344 5.088 0.050 -2.117 0.114 3.48E+00<br />
J083006.17+475150.4 08:30:06.17 +47:51:50.4 -6.3029 32.2109 sdB0VII:He3 27934 1042 5.330 0.084 -2.892 0.339 7.32E-01<br />
J083241.96+483445.1 08:32:41.96 +48:34:45.1 26.2838 29.9143 sdB4V:He5 18422 288 4.241 0.036 -1.521 0.072 2.29E+00<br />
J083456.98+422053.2 08:34:56.98 +42:20:53.2 13.8063 28.1785 sdB6III:He5 10816 93 3.075 0.028 -1.789 0.187 3.02E+00<br />
J083842.71+053309.5 08:38:42.71 +05:33:09.5 68.4768 32.1614 sdO9VII:He5 30554 214 5.502 0.047 -2.281 0.083 9.13E-01<br />
J083935.91+030840.8 08:39:35.91 +03:08:40.8 21.5494 31.5714 sdB0VI:He10 35310 415 4.854 0.057 -2.592 0.170 7.74E-01<br />
J084122.67+063029.6 08:41:22.67 +06:30:29.6 -11.2503 31.7429 sdB0VI:He6 32357 79 5.620 0.015 -2.141 0.060 8.36E-01<br />
J084413.77+023229.3 08:44:13.77 +02:32:29.3 268.878 31.7126 sdB6IV:He0 13225 156 4.036 0.033 -2.156 0.124 3.87E+00<br />
J084556.16+542357.6 08:45:56.16 +54:23:57.6 6.03341 31.7933 sdB5IV:He11 30069 344 5.053 0.036 -1.170 0.026 1.02E+01<br />
J084727.88+024814.8 08:47:27.88 +02:48:14.8 136.219 28.589 sdB5IV:He2 13429 182 3.539 0.035 -2.100 0.164 1.47E+00<br />
J085422.40+013651.0 08:54:22.40 +01:36:51.0 -7.05121 26.6296 sdB0VI:He25 34383 203 5.287 0.037 -0.784 0.018 1.70E+00<br />
J085650.28+401730.9 08:56:50.28 +40:17:30.9 -31.3088 19.8957 sdO4VI:He11 27006 1513 3.878 0.098 -3.000 0.434 1.20E+00<br />
J085727.66+424215.4 08:57:27.66 +42:42:15.4 192.101 26.8266 sdB3VI:He35 38783 151 5.406 0.025 0.689 0.018 9.33E+00<br />
J085900.33+023313.1 08:59:00.33 +02:33:13.1 19.9144 33.5645 sdB0VI:He4 33042 250 5.451 0.050 -1.685 0.042 9.75E-01<br />
J090559.15+055442.1 09:05:59.15 +05:54:42.1 316.248 26.3361 sdA0III:He3 12411 140 3.377 0.044 -1.647 0.192 6.02E+00<br />
J091225.13+421922.5 09:12:25.13 +42:19:22.5 -82.0724 32.7113 sdB4V:He6 30073 349 5.150 0.044 -2.647 0.193 2.67E+00<br />
J091544.44+511338.8 09:15:44.44 +51:13:38.8 -19.1172 29.6024 sdB2V:He6 34140 265 5.386 0.051 -1.514 0.028 1.92E+00<br />
J092436.41+040135.7 09:24:36.41 +04:01:35.7 114.477 29.0217 sdB4IV:He4 15224 225 3.614 0.032 -1.715 0.135 2.78E+00<br />
J092520.70+470330.6 09:25:20.70 +47:03:30.6 38.5098 33.0974 sdB5VI:He4 29376 480 5.196 0.054 -2.459 0.125 3.43E+00<br />
J092634.88+473036.0 09:26:34.88 +47:30:36.0 -6.49944 26.6692 sdB3V:He4 17256 215 4.128 0.038 -1.783 0.053 8.97E+00<br />
J092830.55+561811.8 09:28:30.55 +56:18:11.8 -135.281 31.0954 sdO6VI:He18 41016 436 5.000 0.067 -0.737 0.030 1.50E+00<br />
J093059.63+025032.4 09:30:59.63 +02:50:32.4 1.1058 33.0569 sdB1VI:He7 30308 171 5.537 0.037 -2.122 0.058 1.11E+00<br />
J093215.32-002108.5 09:32:15.32 -00:21:08.5 178.664 28.1481 sdB4V:He5 14602 166 3.723 0.036 -2.053 0.147 2.31E+00<br />
J093245.91+081618.6 09:32:45.91 +08:16:18.6 83.619 33.7491 sdB4VI:He4 32561 239 5.360 0.042 -1.662 0.040 1.54E+00<br />
J093322.20+440322.7 09:33:22.20 +44:03:22.7 -49.4753 29.8445 sdB4V:He2 16544 229 4.143 0.026 -1.629 0.111 2.44E+00<br />
J093549.72+544101.0 09:35:49.72 +54:41:01.0 -99.729 34.2498 sdB0VI:He4 36231 258 5.497 0.054 -1.463 0.025 3.48E+00<br />
J094143.53+535833.4 09:41:43.53 +53:58:33.4 -41.6598 28.0576 sdB5IV:He2 12276 163 3.500 0.041 -2.066 0.152 1.12E+00<br />
J094346.62+531429.1 09:43:46.62 +53:14:29.1 -150.267 25.7962 sdB1V:He26 35458 147 5.043 0.030 -0.573 0.014 3.93E+00<br />
J094623.03+040456.1 09:46:23.03 +04:04:56.1 55.3774 34.0528 sdB0VI:He9 37074 124 6.000 0.032 -1.395 0.022 1.01E+00<br />
J094900.45+025702.9 09:49:00.45 +02:57:02.9 245.283 29.1801 sdB8IV:He0 14000 212 3.251 0.040 -2.259 0.158 1.88E+01<br />
J095101.29+034757.0 09:51:01.29 +03:47:57.0 136.159 32.8147 sdB1VI:He3 30001 560 5.417 0.056 -2.228 0.073 9.20E-01<br />
J095847.23+602147.4 09:58:47.23 +60:21:47.4 -298.171 27.6608 sdB5IV:He2 12650 125 3.631 0.031 -1.726 0.115 9.37E-01<br />
J100019.99-003413.3 10:00:19.99 -00:34:13.3 90.0055 48.3655 sdO3VIII:He15 45730 659 5.151 0.072 -0.716 0.043 4.93E+00<br />
J100317.05+025510.4 10:03:17.05 +02:55:10.4 203.522 29.2731 sdB3VI:He34 38522 164 5.908 0.032 0.045 0.001 8.16E+00<br />
J100740.10+454252.5 10:07:40.10 +45:42:52.5 2.52572 33.5207 sdF0IV:He3 30057 388 5.239 0.051 -2.987 0.421 4.46E+00<br />
J101025.64+045357.0 10:10:25.64 +04:53:57.0 94.7419 30.4631 sdB7II:He5 15489 177 4.171 0.034 -2.103 0.110 8.80E+00<br />
J101213.21+064030.7 10:12:13.21 +06:40:30.7 96.1234 36.4263 sdO4VII:He26 42824 249 5.928 0.056 -0.568 0.010 2.14E+00<br />
J101218.95+004413.4 10:12:18.95 +00:44:13.4 -16.2037 36.3286 sdB0VII:He9 35915 227 5.373 0.042 -1.194 0.027 3.83E+00<br />
J101242.22+484937.4 10:12:42.22 +48:49:37.4 30.3123 31.632 sdB2VI:He4 28000 423 5.109 0.042 -2.195 0.068 1.27E+00<br />
J101640.84-010900.6 10:16:40.84 -01:09:00.6 7.0939 31.8275 sdO9VI:He5 30235 296 5.000 0.054 -2.659 0.198 2.13E+00<br />
J102057.16+013751.4 10:20:57.16 +01:37:51.4 238.157 32.448 sdB1VI:He3 28380 289 5.066 0.045 -3.000 0.434 1.43E+00<br />
J102120.45+444636.9 10:21:20.45 +44:46:36.9 -95.3694 22.147 sdO8VII:He40 45002 100 5.828 0.030 1.261 0.047 4.02E+00<br />
J102320.37+462026.8 10:23:20.37 +46:20:26.8 38.9026 20.1206 sdO6VI:He35 45164 147 5.814 0.034 1.272 0.057 5.39E+00<br />
J103022.08+020524.3 10:30:22.08 +02:05:24.3 62.5766 28.0526 sdB7IV:He3 13334 236 3.487 0.045 -2.140 0.120 1.92E+00<br />
J103549.68+092551.9 10:35:49.68 +09:25:51.9 193.763 23.8682 sdO5VII:He39 45000 147 5.745 0.048 1.473 0.232 1.78E+00<br />
J103854.02+525847.8 10:38:54.02 +52:58:47.8 49.5557 30.5354 sdB2VI:He2 27999 411 5.289 0.045 -2.582 0.166 9.41E-01<br />
J104248.95+033355.4 10:42:48.95 +03:33:55.4 370.055 29.0912 sdB0VII:He9 34770 212 5.105 0.041 -2.269 0.081 3.92E+00<br />
J105608.43+034821.3 10:56:08.43 +03:48:21.3 6.83794 29.6862 sdA0III:He-0 14204 151 3.805 0.037 -1.997 0.129 4.82E+00<br />
J110053.56+034622.8 11:00:53.56 +03:46:22.8 278.131 37.5145 sdO0VIII:He18 46166 513 5.616 0.056 -1.181 0.039 3.28E+00<br />
J110215.46+024034.2 11:02:15.46 +02:40:34.2 25.8182 27.2368 sdO3VII:He36 50000 208 5.804 0.066 0.364 0.020 3.51E+00<br />
J110255.98+521858.2 11:02:55.98 +52:18:58.2 -179.727 20.5426 sdO6VII:He38 45355 380 5.549 0.045 0.588 0.030 3.77E+00<br />
J110256.32+010012.3 11:02:56.32 +01:00:12.3 301.18 31.4893 sdB4V:He9 15645 228 3.701 0.037 -1.483 0.053 1.35E+01<br />
J110302.37-010338.7 11:03:02.37 -01:03:38.7 139.329 33.2725 sdO9VII:He4 29649 500 5.096 0.052 -2.579 0.165 1.46E+00<br />
J110445.01+092530.9 11:04:45.01 +09:25:30.9 182.534 29.6543 sdO5VI:He8 30800 333 4.500 0.059 -2.692 0.214 1.38E+00<br />
J111438.57-004024.1 11:14:38.57 -00:40:24.1 110.919 35.1207 sdO1VII:He21 42076 569 5.272 0.055 -0.866 0.038 5.76E+00<br />
J111633.30+052507.9 11:16:33.30 +05:25:07.9 -72.9026 27.5999 sdO9VI:He29 34332 221 4.894 0.037 -0.496 0.010 6.62E+00<br />
J112056.23+093641.8 11:20:56.23 +09:36:41.8 143.332 30.1321 sdO5VII:He11 35000 357 4.714 0.055 -2.584 0.167 1.68E+00<br />
J112242.70+613758.5 11:22:42.70 +61:37:58.5 -64.5492 33.3237 sdB3VI:He6 30001 495 5.622 0.043 -2.053 0.049 8.95E-01<br />
J112504.73+671658.3 11:25:04.73 +67:16:58.3 -88.07 25.0661 sdO9VI:He12 34249 130 4.948 0.033 -1.800 0.082 1.91E+00<br />
J112719.00+660538.7 11:27:19.00 +66:05:38.7 -8.00635 34.4941 sdB5V:He6 30552 197 5.303 0.042 -2.661 0.199 7.43E+00<br />
J113312.13+010824.9 11:33:12.13 +01:08:24.9 535.441 27.4911 sdB5IV:He4 11978 183 3.412 0.047 -2.020 0.319 6.60E+00<br />
J113840.69-003531.8 11:38:40.69 -00:35:31.8 -81.2527 32.8276 sdF5VI:He3 30867 228 5.424 0.048 -2.209 0.070 7.48E-01<br />
J113935.45+614954.0 11:39:35.45 +61:49:54.0 -3.84451 33.6331 sdB0VI:He3 29767 646 5.000 0.079 -2.995 0.430 2.80E+00<br />
J114352.74+660723.4 11:43:52.74 +66:07:23.4 -131.878 35.8462 sdB3VII:He2 30474 218 5.389 0.046 -2.096 0.054 6.31E+00<br />
J114417.53-012914.1 11:44:17.53 -01:29:14.1 286.516 30.0023 sdB4V:He2 13292 160 3.583 0.032 -2.300 0.173 3.84E+00<br />
J114821.30+033625.8 11:48:21.30 +03:36:25.8 -4.57768 35.4397 sdB5V:He5 33168 178 5.686 0.036 -1.602 0.035 6.88E+00<br />
J115009.49+061042.1 11:50:09.49 +06:10:42.1 44.1678 35.0436 sdO3V:He19 40640 288 5.147 0.057 -0.844 0.030 8.63E+00<br />
J115101.04+541003.5 11:51:01.04 +54:10:03.5 -210.473 28.6527 sdO6VI:He8 32000 602 4.484 0.077 -2.241 0.076 2.78E+00<br />
J115115.19-015255.2 11:51:15.19 -01:52:55.2 108.73 21.8454 sdB6V:He12 16099 329 2.774 0.052 -2.995 0.430 1.35E+01<br />
J115654.09-032510.2 11:56:54.09 -03:25:10.2 48.0882 26.4144 sdO9V:He14 32698 279 4.499 0.043 -1.126 0.029 2.16E+00<br />
J115716.38+612410.8 11:57:16.38 +61:24:10.8 -160.892 34.0894 sdB3VII:He1 29999 521 5.224 0.057 -3.000 0.434 2.15E+00<br />
J120311.26+045419.6 12:03:11.26 +04:54:19.6 317.09 31.3655 sdB8IV:He0 12883 188 3.596 0.033 -2.212 0.142 1.39E+01<br />
J120626.55+663352.5 12:06:26.55 +66:33:52.5 -83.8421 22.4985 sdO7VII:He39 43748 87 5.875 0.032 2.007 0.132 2.07E+00<br />
J121123.37+611203.9 12:11:23.37 +61:12:03.9 -147.955 39.0234 sdB7V:He9 32971 192 5.833 0.041 -1.924 0.073 7.23E+00<br />
J121424.81+550226.3 12:14:24.81 +55:02:26.3 -69.0878 35.2988 sdO5VII:He12 41632 331 5.253 0.046 -1.631 0.056 2.43E+00<br />
J121625.83-014804.6 12:16:25.83 -01:48:04.6 67.9746 31.4899 sdB1V:He5 12307 116 3.392 0.041 -1.159 0.087 1.45E+01<br />
J121643.73+020835.9 12:16:43.73 +02:08:35.9 65.6063 30.4367 sdO8VI:He38 40933 127 5.595 0.021 0.065 0.001 1.07E+01<br />
J122057.48-012642.4 12:20:57.48 -01:26:42.4 -4.416 32.305 sdB9III:He1 18548 245 4.613 0.038 -1.862 0.063 1.10E+01<br />
J122444.98+583313.9 12:24:44.98 +58:33:13.9 -85.1992 29.6831 sdB2IV:He-0 11106 135 3.124 0.037 -2.291 0.170 1.74E+01<br />
J122637.12+575927.6 12:26:37.12 +57:59:27.6 -194.41 31.7309 sdB0VII:He3 30382 187 5.350 0.042 -2.718 0.227 2.14E+00<br />
J123808.66+053318.2 12:38:08.66 +05:33:18.2 25.9705 24.1427 sdO4VII:He38 44999 127 5.352 0.041 2.125 0.289 3.96E+00<br />
J123821.49-021211.5 12:38:21.49 -02:12:11.5 83.2689 33.3849 sdO1VIII:He29 48017 516 5.158 0.081 0.210 0.008 7.27E+00<br />
J124706.79-003925.9 12:47:06.79 -00:39:25.9 7.95478 36.143 sdO9VII:He4 36859 318 5.551 0.042 -3.000 0.434 9.50E-01<br />
J124728.16+562958.3 12:47:28.16 +56:29:58.3 -159.902 33.0009 sdO9VII:He1 24936 896 4.876 0.085 -2.823 0.289 3.60E+00<br />
J124819.08+035003.3 12:48:19.08 +03:50:03.3 -56.2564 26.8671 sdO6VIII:He16 49129 1057 5.500 0.069 -1.226 0.051 2.62E+00<br />
J125229.60-030129.6 12:52:29.60 -03:01:29.6 47.7554 33.4044 sdB0V:He6 30736 221 5.454 0.047 -2.345 0.096 1.46E+00<br />
J125248.84+521604.1 12:52:48.84 +52:16:04.1 -128.9 29.8305 sdB7III:He4 13184 155 3.605 0.032 -1.715 0.090 8.25E+00<br />
J125328.45+042044.0 12:53:28.45 +04:20:44.0 -1.7345 29.2904 sdB4IV:He3 12250 168 3.500 0.043 -2.231 0.148 2.41E+00<br />
J125408.32+014324.1 12:54:08.32 +01:43:24.1 2.43989 23.7974 sdO6VII:He40 45000 100 5.999 0.031 1.311 0.053 2.30E+00<br />
J125410.86-010408.4 12:54:10.86 -01:04:08.4 88.5476 33.4203 sdB3IV:He6 20858 320 4.632 0.039 -1.689 0.042 5.36E+00<br />
J125941.88-003928.8 12:59:41.88 -00:39:28.8 45.2168 31.1712 sdO5VIII:He15 34375 213 5.149 0.025 -1.570 0.048 2.75E+00<br />
J130025.53+004530.2 13:00:25.53 +00:45:30.2 69.5908 33.3195 sdO9VII:He16 38249 102 5.756 0.025 -0.845 0.021 1.90E+00<br />
J130059.21+005711.8 13:00:59.21 +00:57:11.8 -30.0593 28.2369 sdB0VII:He25 37650 234 5.212 0.034 -0.443 0.010 2.90E+00<br />
J131425.39+011153.4 13:14:25.39 +01:11:53.4 -116.094 28.1469 sdB5IV:He2 13063 134 3.663 0.032 -1.696 0.086 6.81E-01<br />
J131452.97+023740.3 13:14:52.97 +02:37:40.3 -67.101 30.4253 sdB9IV:He2 17290 316 4.000 0.047 -1.858 0.063 5.24E+00<br />
J131638.48+034818.5 13:16:38.48 +03:48:18.5 6.70663 27.1188 sdB1V:He9 32528 59 5.250 0.031 -1.376 0.021 2.06E+00<br />
J131658.35+641522.5 13:16:58.35 +64:15:22.5 -178.639 24.6498 sdB1VI:He14 34296 190 5.544 0.038 -1.427 0.023 2.10E+00<br />
J131745.80+010450.4 13:17:45.80 +01:04:50.4 60.094 30.6874 sdB1VI:He3 26694 850 4.929 0.089 -2.159 0.063 1.48E+00<br />
J131916.15-011405.0 13:19:16.15 -01:14:05.0 227.376 30.5863 sdB4VI:He3 16894 205 4.132 0.035 -1.737 0.047 1.39E+00<br />
J132503.17+043239.4 13:25:03.17 +04:32:39.4 -179.976 30.226 sdB9V:He3 12173 89 3.128 0.023 -0.750 0.047 1.16E+01<br />
J132556.94-032329.6 13:25:56.94 -03:23:29.6 66.5581 34.6956 sdB5VI:He10 41733 589 5.500 0.064 -1.997 0.129 5.42E+00<br />
J132619.95+035754.4 13:26:19.95 +03:57:54.4 -82.3389 32.8285 sdO9VI:He13 34266 169 5.312 0.011 -1.206 0.028 1.41E+00<br />
J133200.96+673325.8 13:32:00.96 +67:33:25.8 -101.751 33.9539 sdO9VII:He7 34458 233 5.474 0.048 -1.532 0.030 2.48E+00<br />
J133449.26+041014.9 13:34:49.26 +04:10:14.9 83.6733 27.9956 sdB0VII:He29 34610 210 4.778 0.033 -0.136 0.004 5.32E+00<br />
J133546.10+555429.8 13:35:46.10 +55:54:29.8 -21.676 31.3781 sdB4V:He3 16963 259 4.022 0.039 -1.767 0.051 9.94E+00<br />
J133757.40-005647.2 13:37:57.40 -00:56:47.2 -159.448 30.4751 sdB6III:He6 12195 126 3.450 0.037 -2.019 0.181 5.28E+00<br />
J134344.11+465825.3 13:43:44.11 +46:58:25.3 3.03651 30.0884 sdO5VIII:He10 29224 223 4.133 0.047 -3.000 0.434 7.49E+00<br />
J134545.24-000641.6 13:45:45.24 -00:06:41.6 -3.70993 37.7028 sdO8VII:He28 35903 207 5.268 0.035 -0.269 0.006 3.79E+00<br />
J134600.55+052034.3 13:46:00.55 +05:20:34.3 -39.3819 33.4071 sdB4V:He0 30255 205 5.216 0.041 -3.000 0.434 1.85E+00<br />
J134948.30-024639.3 13:49:48.30 -02:46:39.3 -149.397 31.2636 sdB4IV:He3 11972 150 3.635 0.044 -2.284 0.167 7.21E+00<br />
J135140.69+023429.2 13:51:40.69 +02:34:29.2 20.1812 31.8161 sdO9VI:He4 33545 283 5.190 0.048 -3.000 0.434 1.74E+00<br />
J135707.35+010454.4 13:57:07.35 +01:04:54.4 -164.908 27.8615 sdO7VIII:He29 31999 274 3.749 0.049 0.597 0.027 4.46E+00<br />
J135746.59+530758.7 13:57:46.59 +53:07:58.7 -88.3881 32.612 sdB2VI:He2 29997 447 5.104 0.044 -2.552 0.155 3.56E+00<br />
J140118.74-012024.8 14:01:18.74 -01:20:24.8 -138.416 33.4862 sdO6VII:He7 30768 263 5.000 0.053 -2.479 0.131 2.19E+00<br />
J140252.20+465918.5 14:02:52.20 +46:59:18.5 -178.233 32.0421 sdB5VI:He6 29111 465 5.131 0.052 -2.315 0.090 5.23E+00<br />
J140545.26+014419.1 14:05:45.26 +01:44:19.1 -66.9086 31.5385 sdO9VI:He6 28704 421 5.389 0.051 -1.645 0.038 1.07E+00<br />
J140715.42+033147.6 14:07:15.42 +03:31:47.6 31.4863 28.4411 sdO8VI:He35 49941 546 5.500 0.087 0.572 0.039 6.76E+00<br />
J140839.10+653124.4 14:08:39.10 +65:31:24.4 -170.614 33.9581 sdB0VI:He4 30105 320 5.122 0.045 -3.000 0.434 2.97E+00<br />
J141812.51-024427.0 14:18:12.51 -02:44:27.0 -183.21 30.1228 sdO1VIII:He32 50000 209 5.951 0.067 0.364 0.020 2.07E+00<br />
J142226.93-023100.5 14:22:26.93 -02:31:00.5 129.172 31.5639 sdO6V:He2 14553 198 3.828 0.040 -1.470 0.090 1.70E+01<br />
J142339.81+014947.3 14:23:39.81 +01:49:47.3 27.9633 33.792 sdB4V:He6 30705 324 5.509 0.049 -1.753 0.049 2.50E+00<br />
J142416.88-014335.0 14:24:16.88 -01:43:35.0 91.6777 27.5647 sdB3V:He3 14999 189 3.634 0.034 -2.168 0.128 9.53E-01<br />
J142459.58+031943.4 14:24:59.58 +03:19:43.4 37.3083 33.0024 sdO8VII:He6 34325 274 5.321 0.038 -1.414 0.023 1.74E+00<br />
J142551.30-013317.3 14:25:51.30 -01:33:17.3 -37.9939 32.3239 sdO2VII:He13 41093 307 5.375 0.046 -1.716 0.045 1.54E+00<br />
J142956.63+563144.0 14:29:56.63 +56:31:44.0 -91.5749 31.4605 sdB1VI:He3 28557 299 5.068 0.046 -2.598 0.172 1.03E+00<br />
J143006.23+510314.1 14:30:06.23 +51:03:14.1 -112.744 21.9973 sdO4VII:He37 45000 99 5.767 0.030 1.261 0.047 2.27E+00<br />
J143153.06-002824.3 14:31:53.06 -00:28:24.3 -31.9216 39.9306 sdO6VIII:He7 35581 232 5.499 0.043 -0.977 0.020 5.72E+00<br />
J143917.64+010250.8 14:39:17.64 +01:02:50.8 28.1612 30.2275 sdB1VI:He6 19939 368 4.134 0.043 -1.676 0.041 1.34E+00<br />
J143917.64+010251.1 14:39:17.64 +01:02:51.1 24.1876 30.1996 sdB5V:He3 19464 338 4.116 0.041 -1.709 0.044 2.06E+00<br />
J144024.72+022118.7 14:40:24.72 +02:21:18.7 -35.8459 36.2062 sdO6VI:He11 34898 91 5.332 0.021 -1.067 0.020 3.55E+00<br />
J144141.37+450651.4 14:41:41.37 +45:06:51.4 -154.864 30.9137 sdB2VI:He6 19982 484 4.500 0.059 -1.690 0.043 2.50E+00<br />
J144301.69+514410.3 14:43:01.69 +51:44:10.3 -171.018 29.5335 sdB8IV:He4 19181 308 4.358 0.050 -1.592 0.034 1.22E+00<br />
J144346.62+491733.7 14:43:46.62 +49:17:33.7 -80.1405 35.8834 sdO8VIII:He12 35572 224 5.421 0.048 -1.450 0.024 1.91E+00<br />
J144514.93+000249.0 14:45:14.93 +00:02:49.0 0.302733 25.936 sdB1VII:He12 34682 78 5.019 0.033 -1.399 0.033 3.37E+00<br />
J144709.20+511639.8 14:47:09.20 +51:16:39.8 -148.603 33.0281 sdO7VI:He34 45000 100 5.908 0.031 1.260 0.047 1.74E+01<br />
J144737.76+020942.6 14:47:37.76 +02:09:42.6 35.4762 34.5196 sdO5VII:He11 32773 127 5.563 0.016 -2.085 0.053 7.14E+00<br />
J145049.50+624940.9 14:50:49.50 +62:49:40.9 -159.082 31.7249 sdB0VI:He2 28051 771 4.739 0.071 -3.000 0.434 6.07E+00<br />
J145426.67+472004.4 14:54:26.67 +47:20:04.4 21.6808 32.273 sdB2VI:He5 29437 522 5.478 0.056 -2.180 0.066 9.68E-01<br />
J145606.42+500155.3 14:56:06.42 +50:01:55.3 -69.5387 32.8387 sdB5V:He8 30764 300 5.343 0.044 -1.598 0.034 1.78E+00<br />
J145657.73+495310.8 14:56:57.73 +49:53:10.8 -39.8157 24.9202 sdB1VI:He9 33478 223 5.348 0.046 -1.547 0.031 1.62E+00<br />
J145748.84+561323.5 14:57:48.84 +56:13:23.5 -202.486 30.5689 sdB3VI:He2 12512 109 3.669 0.029 -2.057 0.148 8.49E+00<br />
J150829.03+494051.0 15:08:29.03 +49:40:51.0 -136.073 34.0157 sdB0VII:He1 27313 913 5.012 0.087 -2.359 0.099 4.82E+00<br />
J151030.69-014345.9 15:10:30.69 -01:43:45.9 -152.78 22.8201 sdO7VII:He39 44820 94 5.982 0.033 1.866 0.191 2.06E+00<br />
J151042.06+040955.5 15:10:42.06 +04:09:55.5 -63.2658 33.3369 sdO9VI:He10 34792 231 5.512 0.046 -1.360 0.020 1.64E+00<br />
J151105.38+515956.4 15:11:05.38 +51:59:56.4 -151.756 33.1678 sdB3VI:He1 18406 281 4.286 0.047 -1.793 0.081 1.09E+01<br />
J151231.29+005317.7 15:12:31.29 +00:53:17.7 -47.1969 36.6202 sdB1VI:He29 36493 164 5.723 0.031 -0.486 0.010 7.25E+00<br />
J151306.72+011439.1 15:13:06.72 +01:14:39.1 -93.1191 35.0598 sdB3IV:He2 26002 436 5.037 0.045 -2.301 0.087 3.36E+00<br />
J151415.66-012925.2 15:14:15.66 -01:29:25.2 -121.987 24.7826 sdO5VII:He39 45000 164 5.687 0.044 0.560 0.029 2.33E+00<br />
J151617.94+412948.4 15:16:17.94 +41:29:48.4 -181.258 32.9988 sdB3VI:He2 28962 432 5.175 0.052 -2.447 0.122 3.69E+00<br />
J151743.47+514445.4 15:17:43.47 +51:44:45.4 -79.9052 32.279 sdO8VII:He10 34518 236 5.299 0.046 -1.494 0.027 2.61E+00<br />
J151808.48+041043.7 15:18:08.48 +04:10:43.7 -38.9928 26.0169 sdO9VI:He14 35439 221 5.426 0.048 -1.549 0.031 1.38E+00<br />
J151847.69+551154.2 15:18:47.69 +55:11:54.2 -34.4602 29.707 sdB5V:He3 17317 318 4.000 0.047 -1.605 0.035 2.56E+00<br />
J152332.82+353237.0 15:23:32.82 +35:32:37.0 11.3484 34.981 sdB3VI:He24 36703 193 5.500 0.035 -0.685 0.016 2.50E+00<br />
J152607.88+001640.8 15:26:07.88 +00:16:40.8 87.1892 31.4734 sdO5VI:He34 43975 320 5.050 0.042 -0.072 0.004 2.08E+00<br />
J152833.16+440009.7 15:28:33.16 +44:00:09.7 45.6215 28.5694 sdB7III:He2 12000 184 3.413 0.049 -2.108 0.223 2.66E+00<br />
J153056.33+024222.6 15:30:56.33 +02:42:22.6 -69.1411 21.3778 sdO6VII:He39 44761 109 6.000 0.029 1.163 0.019 1.62E+00<br />
J153204.36+324152.7 15:32:04.36 +32:41:52.7 -152.653 30.2446 sdB0VI:He38 39980 117 5.783 0.018 -0.194 0.004 1.69E+01<br />
J153217.24+454621.0 15:32:17.24 +45:46:21.0 -56.1277 31.2027 sdO8VII:He9 33385 349 5.063 0.024 -1.818 0.057 2.48E+00<br />
J153411.10+543345.3 15:34:11.10 +54:33:45.3 -89.259 33.0638 sdB0VI:He3 31907 321 5.106 0.048 -2.394 0.108 1.78E+00<br />
J153508.52+032456.3 15:35:08.52 +03:24:56.3 11.5925 26.1501 sdB4II:He10 13657 282 3.012 0.047 -1.400 0.065 1.69E+01<br />
J154043.10+435950.1 15:40:43.10 +43:59:50.1 -70.6263 24.0808 sdO7VII:He38 44843 123 5.904 0.033 1.218 0.050 2.40E+00<br />
J154338.69+001202.1 15:43:38.69 +00:12:02.1 -29.8052 32.9385 sdB2VI:He2 29654 499 5.247 0.057 -2.589 0.169 2.86E+00<br />
J154531.02+563944.7 15:45:31.02 +56:39:44.7 -104.244 30.7923 sdB3VI:He5 26509 389 4.933 0.053 -2.039 0.047 1.55E+00<br />
J154809.97-004931.4 15:48:09.97 -00:49:31.4 -28.7556 35.4187 sdO9VI:He4 32645 255 5.499 0.049 -2.978 0.413 1.42E+00<br />
J154830.67+003656.7 15:48:30.67 +00:36:56.7 -126.171 29.7408 sdB4IV:He4 11429 135 3.277 0.035 -0.911 0.052 9.04E-01<br />
J155628.35+011335.0 15:56:28.35 +01:13:35.0 67.4947 34.8931 sdB3V:He2 30988 255 5.425 0.049 -3.000 0.434 8.72E-01<br />
J155642.95+501537.5 15:56:42.95 +50:15:37.5 -173.848 29.5376 sdO2VII:He33 50000 208 5.852 0.066 0.364 0.020 1.69E+00<br />
J160241.13-001207.1 16:02:41.13 -00:12:07.1 -69.1264 35.5392 sdO7VII:He2 31639 283 5.111 0.046 -2.624 0.183 4.62E+00<br />
J160759.27+383746.4 16:07:59.27 +38:37:46.4 -49.3069 34.2912 sdO9VII:He14 34081 201 5.542 0.035 -0.941 0.022 1.90E+00<br />
J160810.18+425845.1 16:08:10.18 +42:58:45.1 -163.815 33.3498 sdB6V:He0 31066 258 5.316 0.047 -3.000 0.434 5.77E+00<br />
J161328.22+004703.2 16:13:28.22 +00:47:03.2 -149.526 32.9946 sdB7IV:He5 21059 530 4.379 0.062 -2.274 0.082 5.34E+00<br />
J161418.97+261628.8 16:14:18.97 +26:16:28.8 106.961 33.64 sdB5V:He5 28481 296 5.192 0.048 -2.471 0.128 4.44E+00<br />
J161627.11-002933.0 16:16:27.11 -00:29:33.0 -0.0543509 28.2373 sdO7VI:He39 45000 128 5.573 0.046 2.083 0.210 6.30E+00<br />
J161631.29-003853.3 16:16:31.29 -00:38:53.3 52.866 36.9722 sdB1VII:He6 33369 346 5.202 0.027 -1.568 0.032 2.41E+00<br />
J162250.09+002631.9 16:22:50.09 +00:26:31.9 -15.4032 35.4912 sdB2VI:He5 30731 222 5.410 0.047 -2.238 0.075 1.79E+00<br />
J162256.66+473051.1 16:22:56.66 +47:30:51.1 -68.6 32.3725 sdB2VI:He5 29637 557 5.500 0.059 -1.764 0.050 7.12E-01<br />
J162310.50+425831.2 16:23:10.50 +42:58:31.2 -8.93843 34.4132 sdB6IV:He0 35771 59 5.768 0.025 -2.504 0.139 6.83E+00<br />
J162359.61+375435.3 16:23:59.61 +37:54:35.3 -273.35 27.2163 sdB5V:He3 12662 107 3.617 0.032 -1.457 0.124 1.16E+00<br />
J162535.78+362039.3 16:25:35.78 +36:20:39.3 -218.657 34.2576 sdB5V:He3 30223 254 5.481 0.048 -2.371 0.102 8.39E+00<br />
J162616.71+380710.5 16:26:16.71 +38:07:10.5 -45.5658 21.99 sdO6VII:He40 44792 98 6.000 0.031 1.556 0.094 1.87E+00<br />
J162628.92+370448.6 16:26:28.92 +37:04:48.6 -51.0648 27.457 sdB6IV:He2 12281 171 3.500 0.043 -2.359 0.198 1.25E+00<br />
J162711.81-000950.9 16:27:11.81 -00:09:50.9 34.3853 29.4658 sdB3VII:He34 38699 179 5.927 0.027 0.121 0.003 8.53E+00<br />
J163148.85+372617.2 16:31:48.85 +37:26:17.2 -92.9749 28.006 sdB5III:He2 14660 186 3.539 0.035 -1.962 0.119 2.44E+00<br />
J163306.58+003216.3 16:33:06.58 +00:32:16.3 -61.1765 34.9432 sdB2VI:He1 31471 274 5.390 0.050 -3.000 0.434 1.45E+00<br />
J163446.48-005345.6 16:34:46.48 -00:53:45.6 52.9502 34.1242 sdB4V:He1 29977 519 5.293 0.060 -3.000 0.434 2.42E+00<br />
J163509.13+000235.0 16:35:09.13 +00:02:35.0 41.7581 34.7076 sdB6VI:He1 28046 733 5.097 0.059 -3.000 0.434 3.41E+00<br />
J163702.79-011351.7 16:37:02.79 -01:13:51.7 -20.6207 26.3892 sdO7VI:He39 45001 99 5.664 0.030 1.261 0.047 2.51E+00<br />
J163800.17+010259.7 16:38:00.17 +01:02:59.7 -108.217 29.5044 sdB6V:He3 17182 323 4.000 0.047 -2.192 0.135 1.50E+00<br />
J163815.97-001919.2 16:38:15.97 -00:19:19.2 -136.585 31.9708 sdB4V:He2 20734 493 4.617 0.045 -2.172 0.064 4.45E+00<br />
J163913.62+384957.1 16:39:13.62 +38:49:57.1 -66.4781 29.4706 sdB3V:He2 16016 225 3.671 0.037 -1.574 0.049 3.06E+00<br />
J163936.03+343230.6 16:39:36.03 +34:32:30.6 -228.974 26.0607 sdB2IV:He15 26684 424 4.531 0.046 -1.121 0.023 1.44E+00<br />
J164042.91+311734.6 16:40:42.91 +31:17:34.6 -369.398 41.0764 sdB1VII:He33 30989 270 3.876 0.033 0.945 0.008 1.04E+01<br />
J164122.33+334452.1 16:41:22.33 +33:44:52.1 -49.8183 33.4568 sdO9VI:He7 29235 394 5.530 0.043 -2.085 0.053 1.54E+00<br />
J164204.38+440303.3 16:42:04.38 +44:03:03.3 -336.658 30.8894 sdO9VII:He5 30160 380 4.958 0.058 -2.545 0.152 2.84E+00<br />
J164326.04+330113.2 16:43:26.04 +33:01:13.2 -66.2194 33.4823 sdB2VI:He3 29898 554 5.500 0.059 -2.254 0.078 1.22E+00<br />
J164419.45+452326.8 16:44:19.45 +45:23:26.8 -356.849 35.8627 sdB1VI:He5 32287 245 5.499 0.047 -2.080 0.052 2.67E+00<br />
J164444.94+312345.4 16:44:44.94 +31:23:45.4 -64.8034 31.1266 sdO8VII:He7 32067 275 5.483 0.051 -3.000 0.434 2.03E+00<br />
J165022.05+312749.7 16:50:22.05 +31:27:49.7 8.82859 29.3108 sdB2VI:He26 35685 174 5.204 0.032 0.118 0.003 3.37E+00<br />
J165404.27+303701.8 16:54:04.27 +30:37:01.8 134.707 31.3034 sdB1VI:He6 27306 638 5.502 0.067 -2.177 0.065 1.04E+00<br />
J165422.26+631534.3 16:54:22.26 +63:15:34.3 -20.424 35.2667 sdB2VI:He5 34568 196 5.672 0.037 -1.387 0.021 7.54E-01<br />
J165424.30+303941.3 16:54:24.30 +30:39:41.3 -249.1 28.7513 sdB7IV:He0 13840 313 3.500 0.055 -3.000 0.434 9.77E+00<br />
J165841.83+413115.6 16:58:41.83 +41:31:15.6 -40.2591 30.1816 sdB2VI:He8 32230 294 5.038 0.037 -1.819 0.057 1.59E+00<br />
J170045.67+604308.5 17:00:45.67 +60:43:08.5 -270.047 28.8328 sdO4VII:He36 48271 357 5.869 0.071 0.731 0.052 5.12E+00<br />
J170356.68+341505.0 17:03:56.68 +34:15:05.0 -83.8331 32.767 sdB3VI:He3 28252 453 5.000 0.069 -2.653 0.195 1.91E+00<br />
J170714.27+654025.6 17:07:14.27 +65:40:25.6 -92.8553 34.0747 sdB4VI:He7 34656 250 5.350 0.049 -1.539 0.030 1.04E+00<br />
J171424.17+614711.0 17:14:24.17 +61:47:11.0 -12.9255 33.6796 sdB0VII:He2 32113 394 4.999 0.065 -3.000 0.434 1.81E+00<br />
J171629.93+575121.2 17:16:29.93 +57:51:21.2 -312.791 35.8141 sdO5VIII:He21 34186 229 5.465 0.038 -0.798 0.019 6.08E+00<br />
J171722.10+580558.9 17:17:22.10 +58:05:58.9 -110.822 33.3324 sdO9VI:He9 34248 187 5.153 0.021 -1.578 0.049 1.84E+00<br />
J171813.87+595355.2 17:18:13.87 +59:53:55.2 -109.876 28.7311 sdB5IV:He2 13053 128 3.685 0.032 -2.096 0.163 2.32E+00<br />
J171929.52+273229.3 17:19:29.52 +27:32:29.3 -82.9454 31.6473 sdB1VI:He4 30755 350 4.961 0.061 -3.000 0.434 2.30E+00<br />
J171947.87+591604.3 17:19:47.87 +59:16:04.3 89.6508 29.0229 sdB6IV:He2 16126 217 3.854 0.048 -1.194 0.068 2.05E+00<br />
J172037.66+534009.4 17:20:37.66 +53:40:09.4 -72.6856 35.312 sdO5VI:He7 29304 477 5.200 0.054 -2.384 0.105 8.30E+00<br />
J172338.54+601444.1 17:23:38.54 +60:14:44.1 -41.3729 24.0531 sdO9VI:He12 34424 79 5.235 0.024 -1.353 0.029 2.21E+00<br />
J203729.93+001954.1 20:37:29.93 +00:19:54.1 -79.0277 25.4161 sdO8VII:He9 33681 230 5.173 0.046 -1.762 0.050 1.58E+00<br />
J203826.42+010953.5 20:38:26.42 +01:09:53.5 -113.742 33.1624 sdB4V:He3 29406 458 5.398 0.063 -2.442 0.120 2.16E+00<br />
J204546.82-054355.7 20:45:46.82 -05:43:55.7 -44.8684 28.8934 sdO9VI:He8 32441 54 5.167 0.031 -1.477 0.026 3.53E+00<br />
J204658.84-055100.1 20:46:58.84 -05:51:00.1 -57.0742 33.6601 sdB2VI:He3 33102 231 5.450 0.047 -1.545 0.030 1.77E+00<br />
J204726.94-060325.8 20:47:26.94 -06:03:25.8 -1.97117 33.4697 sdB2VI:He13 35395 177 5.648 0.037 -1.330 0.028 3.50E+00<br />
J205030.40-061957.9 20:50:30.40 -06:19:57.9 -489.474 28.5065 sdO5VI:He38 45663 558 5.545 0.047 0.388 0.023 8.00E+00<br />
J210454.89+110645.6 21:04:54.89 +11:06:45.6 -41.5566 31.6509 sdO5VIII:He8 35000 210 5.086 0.037 -3.000 0.434 2.34E+00<br />
J211045.16+000142.1 21:10:45.16 +00:01:42.1 -103.57 31.6623 sdO7VII:He10 33998 369 4.976 0.061 -2.257 0.079 1.96E+00<br />
J211104.97+091042.9 21:11:04.97 +09:10:42.9 157.456 32.7679 sdB4V:He3 29504 485 5.321 0.059 -2.820 0.287 4.77E+00<br />
J211318.37+001738.4 21:13:18.37 +00:17:38.4 -14.4957 21.8039 sdO7VII:He38 45000 100 5.905 0.031 1.475 0.078 1.70E+00<br />
J211338.31-000940.7 21:13:38.31 -00:09:40.7 -23.6116 37.4435 sdB0VII:He7 36649 409 5.500 0.053 -2.148 0.061 3.58E+00<br />
J211339.69+100640.4 21:13:39.69 +10:06:40.4 -65.7901 26.6768 sdO6VII:He9 32140 432 4.916 0.061 -2.154 0.062 2.72E+00<br />
J211425.02+005517.6 21:14:25.02 +00:55:17.6 14.6469 36.5492 sdO9VII:He17 36832 199 5.805 0.038 -1.058 0.020 5.22E+00<br />
J211651.96-003328.5 21:16:51.96 -00:33:28.5 11.2966 32.9385 sdB4V:He5 28210 447 5.495 0.064 -2.409 0.111 6.76E+00<br />
J211921.36+005749.8 21:19:21.36 +00:57:49.8 -49.1174 20.1741 sdO5VII:He38 44901 114 6.000 0.039 1.816 0.170 1.47E+00<br />
J213112.24+112936.2 21:31:12.24 +11:29:36.2 3.08591 34.7669 sdO8VII:He9 35450 179 5.595 0.036 -1.432 0.023 1.28E+00<br />
J213718.87+123303.3 21:37:18.87 +12:33:03.3 -106.859 28.2812 sdB6IV:He2 14718 176 3.601 0.032 -1.630 0.056 2.40E+00<br />
J213808.12+105741.8 21:38:08.12 +10:57:41.8 26.7674 26.7784 sdB2VI:He10 34133 180 5.095 0.017 -1.620 0.036 3.81E+00<br />
J215049.19+010338.4 21:50:49.19 +01:03:38.4 34.2033 36.462 sdB1V:He6 34388 84 5.351 0.023 -1.279 0.033 6.39E+00<br />
J215053.84+131650.6 21:50:53.84 +13:16:50.6 -108.905 33.9992 sdB1VI:He5 30233 248 5.423 0.047 -2.222 0.072 1.65E+00<br />
J215227.25+115726.7 21:52:27.25 +11:57:26.7 5.51462 32.5347 sdB3VI:He5 34586 272 5.273 0.041 -1.354 0.029 3.73E+00<br />
J215307.34-071948.4 21:53:07.34 -07:19:48.4 -13.1178 35.7679 sdB1VI:He7 32449 166 5.683 0.037 -1.996 0.086 1.58E+00<br />
J215631.56+121237.7 21:56:31.56 +12:12:37.7 -66.6073 23.9981 sdO7VII:He40 45581 220 5.891 0.035 1.303 0.052 3.56E+00<br />
J220403.45+122507.3 22:04:03.45 +12:25:07.3 -153.325 28.335 sdB7IV:He1 11740 170 3.500 0.049 -2.050 0.244 1.14E+00<br />
J220810.05+115913.9 22:08:10.05 +11:59:13.9 -69.4664 29.3261 sdB2VI:He7 26862 735 5.000 0.076 -2.150 0.061 3.14E+00<br />
J221816.78+121400.7 22:18:16.78 +12:14:00.7 -74.2256 32.2427 sdB7IV:He4 25415 605 5.000 0.066 -1.422 0.023 2.25E+00<br />
J222238.69+005125.0 22:22:38.69 +00:51:25.0 -114.061 30.8319 sdO9VII:He11 33853 422 4.627 0.058 -3.000 0.434 1.20E+00<br />
J222932.81-004822.5 22:29:32.81 -00:48:22.5 -18.7916 34.4566 sdO9VII:He8 34802 243 5.328 0.044 -1.360 0.030 1.69E+00<br />
J223008.26+132734.2 22:30:08.26 +13:27:34.2 -30.7785 29.1779 sdB5IV:He2 14433 235 3.762 0.036 -2.304 0.175 3.47E+00<br />
J223839.13+122517.9 22:38:39.13 +12:25:17.9 -96.1084 31.3337 sdB0IV:He6 30656 195 5.244 0.042 -3.000 0.434 5.02E+00<br />
J224105.19+141810.2 22:41:05.19 +14:18:10.2 -202.31 30.0845 sdB9III:He0 13763 258 3.501 0.052 -2.669 0.203 1.51E+01<br />
J231956.10-093937.6 23:19:56.10 -09:39:37.6 16.0919 28.428 sdO9VI:He16 35985 299 4.845 0.049 -0.774 0.023 3.23E+00<br />
J233914.00+134214.3 23:39:14.00 +13:42:14.3 -411.792 25.2673 sdO5VI:He39 45402 513 5.500 0.077 0.607 0.030 4.55E+00<br />
J234421.80-101142.8 23:44:21.80 -10:11:42.8 -72.6667 27.5984 sdB7IV:He2 11132 98 3.217 0.026 -0.954 0.054 7.84E-01<br />
J234853.52+151215.5 23:48:53.52 +15:12:15.5 -69.1169 32.1839 sdB0VI:He5 30651 167 5.556 0.035 -2.315 0.090 6.37E-01<br />
J235108.66+002623.0 23:51:08.66 +00:26:23.0 -195.75 28.5503 sdB0VII:He30 38778 177 5.407 0.039 -0.288 0.007 4.21E+00<br />
Appendix C<br />
Results for 83 2MASS-Selected<br />
Hot Subdwarf Candidates<br />
Parameters and classifications are listed in this table for the 2MASS-selected stars<br />
obtained from E.M. Green (Green et al., 2006). The internal errors of SFIT are given<br />
along with the value of χ² for the best fit.<br />
Table C.1: Results for 83 2MASS-Selected Hot Subdwarf Candidates<br />
Identifier (2MASX J-) Classification T_eff (K) δT_eff log g (cgs) δlog g log(n_He/n_H) δlog(n_He/n_H) χ²<br />
Balloon 090900004 sdO7VII:He11 31147 278 4.757 0.054 -1.811 0.056 1.77E+00<br />
BD+48 2721 sdB2VI:He6 22979 240 5.267 0.032 -1.629 0.018 2.54E+00<br />
J011407.62+160800.6 sdB8VI:He5 10795 61 3.156 0.016 -0.368 0.007 1.16E+00<br />
J020656.17+143858.6 sdB1VII:He7 29873 484 5.850 0.046 -1.897 0.034 1.70E+00<br />
J021555.50+234314.3 sdB1VI:He8 32485 57 5.623 0.008 -0.758 0.010 2.60E+00<br />
J021619.04+275902.0 sdB1VI:He6 27594 292 5.719 0.034 -2.100 0.055 1.66E+00<br />
J021742.16+280329.5 sdO9VII:He8 32698 196 5.838 0.033 -1.341 0.019 1.34E+00<br />
J022512.51+234820.7 sdO6VII:He13 38384 119 6.000 0.030 -1.417 0.023 1.86E+00<br />
J030725.66+175248.0 sdB2V:He10 28000 352 5.095 0.030 -0.701 0.010 2.94E+00<br />
J041550.17+015421.0 sdB0VII:He9 32883 197 5.943 0.035 -1.390 0.011 1.38E+00<br />
J042034.85+012041.0 sdO6VII:He38 40547 120 5.117 0.035 1.301 0.017 3.44E+00<br />
J043037.82-010308.3 sdB5VI:He3 13447 91 3.640 0.028 -0.293 0.004 8.08E-01<br />
J074722.07+622545.2 sdB3VI:He12 27665 271 5.752 0.031 -0.696 0.006 8.00E+00<br />
J075407.66+651540.2 sdB4V:He5 11002 53 2.750 0.020 0.000 0.000 1.42E+00<br />
J075815.66+514348.0 sdB3V:He7 11175 56 2.706 0.020 0.000 0.000 1.67E+00<br />
J080245.68+474817.7 sdB5VII:He2 9783 93 3.797 0.034 -0.954 0.093 1.50E+00<br />
J082643.33+330859.2 sdB4V:He7 18872 171 4.356 0.034 -0.827 0.009 2.38E+00<br />
J082822.23+295131.3 sdB3V:He7 16877 137 4.122 0.022 -0.723 0.013 1.60E+00<br />
J083127.37+422201.7 sdA5VI:He2 10463 50 3.179 0.013 -0.395 0.015 1.39E+00<br />
J083320.34+202424.8 sdB4VI:He14 22956 173 5.216 0.026 -0.407 0.004 6.77E+00<br />
J083535.58+194412.6 sdB3VI:He6 27775 429 5.784 0.041 -1.846 0.030 1.64E+00<br />
J083734.74+672413.6 sdB0V:He17 30001 392 4.688 0.041 -0.578 0.009 4.39E+00<br />
J083909.92+182416.6 sdB4V:He5 10235 43 2.698 0.017 -0.079 0.002 1.78E+00<br />
J084447.93+404426.5 sdB4V:He6 12351 59 3.321 0.024 0.001 0.000 1.29E+00<br />
J084535.67+194150.3 sdB3VI:He7 22899 242 5.236 0.033 -1.309 0.009 1.97E+00<br />
J084937.68+234847.3 sdB3VI:He5 18128 162 4.653 0.029 -1.348 0.019 2.31E+00<br />
J085148.86+434402.5 sdB7VI:He4 10292 49 3.139 0.014 -0.368 0.007 1.55E+00<br />
J085649.27+170114.7 sdB1VI:He5 29527 276 5.747 0.035 -2.111 0.056 1.47E+00<br />
J090158.77+395931.3 sdB6VI:He2 11188 80 3.359 0.026 -0.257 0.006 1.13E+00<br />
J091206.53+091621.7 sdB2V:He10 27999 442 4.689 0.041 -0.672 0.012 1.86E+00<br />
J091706.65+541817.3 sdB5VI:He2 11016 71 3.157 0.019 -0.122 0.002 8.49E-01<br />
J091751.45+615630.1 sdB4V:He5 10286 32 2.662 0.014 0.159 0.002 2.01E+00<br />
J092116.62+023741.0 sdB3VI:He15 27144 227 5.295 0.030 -0.300 0.003 6.12E+00<br />
J092246.92+001741.0 sdB4V:He5 11427 69 2.797 0.019 -0.000 0.000 1.39E+00<br />
J093112.84+051040.4 sdB5VI:He3 10558 56 3.376 0.018 -0.278 0.005 1.11E+00<br />
J093150.58+031848.0 sdB6VI:He1 10097 62 3.303 0.027 -0.645 0.018 1.34E+00<br />
J093426.95+821304.3 sdB7IV:He7 9909 57 2.917 0.025 0.297 0.012 5.71E+00<br />
J093453.32+841851.5 sdO7VII:He37 42886 87 5.789 0.030 1.526 0.015 1.38E+00<br />
J093832.18+041343.9 sdB7V:He1 10290 71 3.306 0.026 -0.553 0.013 1.64E+00<br />
J093935.15+104321.9 sdB3V:He6 10868 53 2.793 0.020 0.000 0.000 1.43E+00<br />
J094047.71+185332.9 sdB4VI:He5 10589 44 2.823 0.014 0.308 0.005 1.64E+00<br />
J094105.31-004755.8 sdO4VII:He33 45003 121 5.312 0.043 0.000 0.000 2.53E+00<br />
J094107.57+375342.6 sdB3V:He7 15712 55 3.398 0.008 -0.597 0.008 1.31E+00<br />
J094353.47+783140.7 sdB2VI:He5 27999 382 5.662 0.036 -2.105 0.055 1.92E+00<br />
J094509.99+553450.2 sdB4V:He6 18717 165 4.251 0.031 -0.898 0.014 1.65E+00<br />
J094637.19+351755.8 sdB7VI:He3 11021 69 3.387 0.021 -0.140 0.005 1.01E+00<br />
J095219.06+441941.9 sdB4V:He7 13014 73 3.197 0.025 0.046 0.000 1.24E+00<br />
J095708.88+223055.6 sdB4V:He5 10910 45 2.714 0.012 0.272 0.004 1.45E+00<br />
J095854.23+360314.3 sdF8VI:He2 9618 68 3.384 0.022 -0.467 0.019 1.81E+00<br />
J095855.78-044413.9 sdB7IV:He6 9321 48 2.795 0.021 -0.368 0.012 3.84E+00<br />
J095859.91+082504.4 sdB6VII:He4 9806 74 3.560 0.013 -0.319 0.015 1.35E+00<br />
J100058.89+024804.4 sdB4V:He8 17115 148 4.229 0.026 -0.632 0.009 2.76E+00<br />
J100145.47+375733.2 sdB3VI:He3 9842 50 3.190 0.011 -0.336 0.016 1.21E+00<br />
J100509.89+384615.2 sdB6VII:He2 10547 70 3.793 0.030 -0.700 0.021 9.22E-01<br />
J100607.62+005326.2 sdB8V:He3 11614 83 3.096 0.017 -0.368 0.009 1.27E+00<br />
J100739.11+202546.7 sdB5VII:He5 10000 51 3.748 0.014 0.065 0.002 1.27E+00<br />
J104130.43+184209.8 sdB0VII:He8 32521 166 5.635 0.030 -1.401 0.022 1.15E+00<br />
J104653.08+515435.9 sdO8VII:He9 30750 262 4.799 0.053 -1.978 0.083 9.67E-01<br />
J104912.91+380014.9 sdB2V:He9 20087 213 4.130 0.030 -0.718 0.015 1.70E+00<br />
J111631.06+305838.7 sdB4V:He5 9474 52 2.754 0.021 -0.368 0.012 1.84E+00<br />
J111719.94+241207.1 sdB4V:He5 11157 73 2.806 0.023 -0.292 0.006 1.10E+00<br />
J111819.13+093144.4 sdA2V:He5 12109 52 3.182 0.022 -0.131 0.003 1.04E+00<br />
J112129.35+111917.0 sdB4V:He6 12729 67 3.204 0.024 0.056 0.000 1.14E+00<br />
J112832.64+603859.3 sdF5V:He3 10302 52 3.025 0.017 -0.307 0.005 1.53E+00<br />
J113435.70+664252.6 sdB4V:He6 12712 67 3.131 0.023 0.057 0.000 1.07E+00<br />
J113633.63+750653.7 sdO9VII:He7 35699 62 6.000 0.016 -1.672 0.041 1.43E+00<br />
J113837.54+250043.4 sdB5IV:He4 10255 43 2.594 0.016 -0.136 0.003 2.43E+00<br />
J114454.50+031550.2 sdA7V:He4 9764 53 3.118 0.021 -0.368 0.010 1.37E+00<br />
J122617.00+774312.4 sdB1VI:He6 28443 239 5.889 0.035 -1.996 0.043 2.38E+00<br />
J122745.99+113636.1 sdB6IV:He6 10356 40 2.666 0.017 0.030 0.000 3.30E+00<br />
J122843.58+282036.6 sdB4VI:He5 10230 46 2.844 0.018 -0.083 0.002 1.87E+00<br />
J123014.92+463720.0 sdB5V:He6 12513 44 3.084 0.018 0.154 0.003 1.15E+00<br />
J125049.06+743943.5 sdB2VI:He5 27913 436 5.654 0.038 -1.993 0.043 2.28E+00<br />
J131359.98+183131.3 sdB6VI:He3 10504 52 3.505 0.017 -0.083 0.003 1.15E+00<br />
J132546.78+400827.0 sdB3V:He8 11653 68 2.761 0.023 0.122 0.002 1.50E+00<br />
J132546.78+400827.0 sdB3V:He8 16440 74 4.149 0.014 -0.641 0.023 5.48E+00<br />
J132546.78+400827.0 sdB4V:He6 11653 68 2.761 0.023 0.122 0.002 1.50E+00<br />
J132546.78+400827.0 sdB4V:He6 16440 74 4.149 0.014 -0.641 0.023 5.48E+00<br />
J135515.91+533442.5 sdB3VI:He12 25842 231 5.254 0.030 -0.673 0.008 2.54E+00<br />
J135648.63+210510.1 sdB4V:He5 10516 42 2.609 0.012 0.135 0.002 1.45E+00<br />
J140123.40+742150.5 sdB4V:He7 16490 73 4.140 0.014 -0.689 0.026 1.77E+00<br />
J142127.88+712421.4 sdB2VI:He5 25982 319 5.847 0.037 -2.346 0.096 2.57E+00<br />
J143155.38+172404.9 sdA7V:He3 10112 55 2.881 0.019 -0.404 0.015 1.19E+00<br />
J145239.03+412618.1 sdB7V:He10 9479 31 2.852 0.021 0.439 0.005 5.63E+00<br />
J152653.06+794130.7 sdB0VI:He6 32936 67 5.770 0.015 -2.235 0.075 1.69E+00<br />
Appendix D<br />
The Armagh Observatory Cluster<br />
Over the course of this project, Armagh Observatory, as part of the CosmoGrid initiative<br />
(http://www.cosmogrid.ie/), acquired a dedicated computing cluster which I helped to set up<br />
and administer. The software configuration used by the cluster at the time of writing is<br />
documented herein.<br />
D.1 Hardware Configuration<br />
The cluster presently consists of sixteen vertically mounted Blade nodes: one master<br />
node, and fifteen slave nodes. Each slave node contains:<br />
• Two Intel Xeon 3GHz processors each with 1MB cache<br />
• 2GB RAM<br />
• One 40GB Maxtor SATA UDMA/133 hard drive<br />
• One Broadcom BCM5721 1000Base-T PCI Express NIC<br />
The master node has the same basic hardware configuration as the slaves except for:<br />
• Two 240GB Maxtor SATA UDMA/133 hard drives<br />
• One CDRW/DVDR drive<br />
• One floppy disk drive<br />
• Two 1000Base-T network cards<br />
All of the nodes are interlinked by one 24 port gigabit ethernet switch, and are<br />
connected to one 16 port KVM unit.<br />
D.2 Software Configuration<br />
System Software<br />
The operating system used on all of the nodes is currently Red Hat Enterprise Linux<br />
AS release 3 (Taroon Update 3).<br />
The following software packages form the core of the cluster setup:<br />
• Condor version 6.6.10 (http://www.cs.wisc.edu/condor/)<br />
• Intel Fortran Compiler Version 8.1<br />
• MPICH 1.2.4<br />
• Ganglia 3.0<br />
User Account Management<br />
User accounts are managed centrally on the master node by editing /etc/passwd and<br />
/etc/shadow using the standard account management tools. Once any changes to the<br />
user accounts have been made, /etc/passwd and /etc/shadow must be refreshed on<br />
all of the slave nodes by using the brcp and brsh commands.<br />
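The refresh step amounts to a loop over the slave nodes. The sketch below is illustrative only: the exact syntax of brcp is not given here, so plain scp stands in for it, and the function name and the node-list file argument are hypothetical.

```shell
# Sketch: push the account files from the master to every slave node.
# ASSUMPTIONS: node names are listed one per line in a file, and scp is
# used as a stand-in for the site's brcp command.
push_account_files() {
    # usage: push_account_files NODELIST [COPY_CMD]
    nodelist=$1
    copy_cmd=${2:-scp}
    while read -r node; do
        [ -z "$node" ] && continue
        for file in /etc/passwd /etc/shadow; do
            # -p asks scp to preserve modes and timestamps;
            # stdin is redirected so the copy cannot swallow the node list
            $copy_cmd -p "$file" "root@${node}:${file}" < /dev/null
        done
    done < "$nodelist"
}
```

Passing `echo` as the second argument prints the copy commands instead of running them, which is a convenient dry run before touching the slaves.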
Home Directories<br />
The central partition of user home directories is located on the master node, and is<br />
shared out to all the slave nodes using NFS. This creates a single storage domain for the<br />
cluster, allowing user jobs running on the slave nodes to read and write data in the<br />
user’s home directory, thus avoiding the need for any bothersome manual file transfer<br />
operations.<br />
Each user has a disk space quota of 10GB.<br />
Condor<br />
Condor is a specialised batch system for managing compute-intensive jobs. Like most<br />
batch systems, Condor provides a queuing mechanism, scheduling policy, priority scheme,<br />
and resource classifications. Users submit their compute jobs to Condor; Condor puts<br />
the jobs in a queue, runs them, and then informs the user of the result.<br />
A Condor cluster comprises a single machine which serves as the central manager,<br />
and an arbitrary number of other machines that are part of the cluster. Conceptually,<br />
the cluster is a collection of resources (machines) and resource requests (jobs).<br />
The role of Condor is to match waiting requests with available resources. Every part<br />
of Condor sends periodic updates to the central manager, the centralised repository of<br />
information about the state of the cluster. Periodically, the central manager assesses<br />
the current state of the cluster and tries to match pending requests with appropriate<br />
resources.<br />
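From the user's side, a resource request is described in a submit description file. The sketch below generates one for a batch of independent jobs; the executable name, output file names and job count are hypothetical, and the vanilla universe is assumed for illustration rather than taken from the cluster's actual usage.

```shell
# Sketch: write a Condor submit description file for a batch of jobs.
# ASSUMPTIONS: file names and the vanilla universe are illustrative only.
write_submit_file() {
    # usage: write_submit_file EXECUTABLE NJOBS OUTFILE
    exe=$1
    njobs=$2
    outfile=$3
    # $(Process) is expanded by Condor to 0, 1, ..., NJOBS-1
    cat > "$outfile" <<EOF
universe   = vanilla
executable = $exe
arguments  = \$(Process)
output     = job.\$(Process).out
error      = job.\$(Process).err
log        = batch.log
queue $njobs
EOF
}

# Typical use (requires a working Condor installation):
#   write_submit_file ./fit_spectrum 100 batch.sub
#   condor_submit batch.sub   # place the jobs in the queue
#   condor_q                  # inspect the job queue
#   condor_status             # list the machines in the pool
```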
The basic Condor setup for the Armagh Observatory cluster nominates the master<br />
node as the central manager for the cluster, with the slave nodes functioning as<br />
dedicated computing resources. No jobs are permitted to run on the master.<br />
Directory Layout And NFS Shares<br />
The Condor software is installed solely on the master in the directory<br />
/opt/condor-6.6.10. As the name of this directory is dependent on the version of<br />
Condor installed, a symbolic link called /opt/condor points to whatever directory contains<br />
the latest version. This symbolic link has been added to /etc/exports, and the<br />
Condor installation directory is shared out to all the slaves over NFS.<br />
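The version-independent link can be maintained with a single ln call. The following is a minimal sketch, assuming versions are installed side by side as condor-VERSION under one prefix; the function name and any version other than 6.6.10 are hypothetical.

```shell
# Sketch: repoint PREFIX/condor at PREFIX/condor-VERSION.
# ASSUMPTION: each release is unpacked into its own condor-VERSION directory.
point_condor_link() {
    # usage: point_condor_link PREFIX VERSION
    prefix=$1
    version=$2
    # refuse to create a dangling link
    [ -d "$prefix/condor-$version" ] || return 1
    # -n replaces an existing link instead of descending into its target
    ln -sfn "condor-$version" "$prefix/condor"
}

# On the cluster this would be run as root, for example:
#   point_condor_link /opt 6.6.10
```

Because the slaves mount the link's path rather than the versioned directory, an upgrade then only needs the link to be moved on the master.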
Condor is set up to require that every node has a directory on its local filesystem<br />
to which the Condor daemons can write log information and create temporary work<br />
folders for user jobs. This directory is typically located at /home/condor; however,<br />
the central NFS share of home directories from the master does not allow a unique<br />
/home/condor for every node.<br />
Instead, each slave node has a disk partition called /condorhome which contains the<br />
directory /condorhome/condor/ that can be used by the local Condor daemons.<br />
On the master node, /condorhome is a symbolic link pointing to the /home partition<br />
wherein a directory called condor exists.<br />
Boot Script<br />
To ensure the Condor daemons are started when a node is first powered on, a boot<br />
script named condor is located in /etc/init.d on each node. This boot script is then<br />
sym-linked into the runlevel 3 startup scripts directory, /etc/rc3.d/, as the entry<br />
S98condor.<br />
The boot script listing is:<br />
#! /bin/sh<br />
export CONDOR_CONFIG=/opt/condor/etc/condor_config<br />
MASTER=/opt/condor/sbin/condor_master<br />
PS="/bin/ps auwx"<br />
case "$1" in<br />
  'start')<br />
    if [ -x "$MASTER" ]; then<br />
      echo "Starting up Condor"<br />
      "$MASTER"<br />
    else<br />
      echo "$MASTER is not executable. Skipping Condor startup."<br />
      exit 1<br />
    fi<br />
    ;;<br />
  'stop')<br />
    pid=`$PS | grep condor_master | grep -v grep | awk '{print $2}'`<br />
    if [ -n "$pid" ]; then<br />
      # send SIGQUIT to the condor_master, which initiates its fast<br />
      # shutdown method. The master itself will start sending<br />
      # SIGKILL to all its children if they're not gone in 20<br />
      # seconds.<br />
      echo "Shutting down Condor (fast-shutdown mode)"<br />
      kill -QUIT $pid<br />
    else<br />
      echo "Condor not running"<br />
    fi<br />
    ;;<br />
  *)<br />
    echo "Usage: condor {start|stop}"<br />
    ;;<br />
esac<br />
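The sym-linking of this script into the runlevel 3 directory can be sketched as follows. The function name and the optional ROOT argument are hypothetical conveniences so that the step can be tried in a scratch directory before running it as root against the real /etc.

```shell
# Sketch: link the condor boot script into the runlevel 3 start-up directory.
# ASSUMPTION: ROOT is empty on a real node and a scratch directory when testing.
enable_boot_script() {
    # usage: enable_boot_script [ROOT]
    root=${1:-}
    [ -f "$root/etc/init.d/condor" ] || return 1
    # a relative target keeps the link valid wherever the tree is mounted;
    # the S98 prefix makes it one of the last services started
    ln -sf ../init.d/condor "$root/etc/rc3.d/S98condor"
}
```

With the link in place, `/etc/init.d/condor start` and `/etc/init.d/condor stop` can still be invoked by hand for maintenance.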
User Path Setup<br />
The Condor user commands for submitting a job to the cluster, checking cluster status<br />
and job queues, etc., along with their associated manual pages, are located in the<br />
/opt/condor subtree.<br />
To give users easy access to the commands and man pages, the appropriate shell<br />
variables are modified on login by two system-wide shell profile files, condor.sh and<br />
condor.csh, located in /etc/profile.d. They also set up the environment for MPICH<br />
and Intel’s Fortran compiler.<br />
For bash users, condor.sh effects this configuration:<br />
export CONDOR_CONFIG=/opt/condor/etc/condor_config<br />
### Prepend the Condor and MPICH user commands to the search path<br />
if [ -z "${PATH}" ]<br />
then<br />
  export PATH=/opt/condor/bin:/opt/mpich/bin<br />
else<br />
  export PATH=/opt/condor/bin:/opt/mpich/bin:$PATH<br />
fi<br />
if [ -z "${MANPATH}" ]<br />
then<br />
  export MANPATH=/opt/condor/man:/opt/mpich/man<br />
else<br />
  export MANPATH=/opt/condor/man:/opt/mpich/man:$MANPATH<br />
fi<br />
### root also gets the administrative commands<br />
if [ "`id -u`" = 0 ]; then<br />
  export PATH=$PATH:/opt/condor/sbin:/opt/mpich/sbin<br />
fi<br />
### Set up ifort and idb<br />
. /opt/intel_fc_80/bin/ifortvars.sh<br />
. /opt/intel_idb_80/bin/idbvars.sh<br />
And condor.csh does the same for tcsh users:<br />
setenv CONDOR_CONFIG /opt/condor/etc/condor_config<br />
if ( ! $?PATH ) then<br />
  setenv PATH /opt/condor/bin:/opt/mpich/bin<br />
else<br />
  setenv PATH /opt/condor/bin:/opt/mpich/bin:$PATH<br />
endif<br />
if ( ! $?MANPATH ) then<br />
  setenv MANPATH /opt/condor/man:/opt/mpich/man<br />
else<br />
  setenv MANPATH /opt/condor/man:/opt/mpich/man:$MANPATH<br />
endif<br />
### Set up ifort and idb<br />
source /opt/intel_fc_80/bin/ifortvars.csh<br />
source /opt/intel_idb_80/bin/idbvars.csh<br />
Condor Configuration Files<br />
/opt/condor/etc/condor_config is the global Condor configuration file containing<br />
settings for everything from basic cluster setup details to network permissions, user<br />
policies, flocking, daemon controls, and so on.<br />
Most of the settings in this file can be left at their defaults. However, Part One of the<br />
file contains settings that must be customised for the particular Condor installation at<br />
a site. For the Observatory cluster, the settings for Part One are as follows:<br />
CONDOR_HOST               = master<br />
RELEASE_DIR               = /opt/condor<br />
LOCAL_DIR                 = /condorhome/condor<br />
LOCAL_CONFIG_FILE         = $(RELEASE_DIR)/etc/$(HOSTNAME).local<br />
REQUIRE_LOCAL_CONFIG_FILE = TRUE<br />
CONDOR_ADMIN              = root@master<br />
MAIL                      = /usr/bin/mail<br />
UID_DOMAIN                = arm.ac.uk<br />
FILESYSTEM_DOMAIN         = $(FULL_HOSTNAME)<br />
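Because LOCAL_CONFIG_FILE is built from macros, every node reads the same global file yet resolves a different local file. The sketch below mimics that substitution with sed purely for illustration; it is not how Condor itself evaluates macros, and the function name is hypothetical.

```shell
# Sketch: show which local config file a given host would resolve.
# ASSUMPTION: simple textual substitution of the two macros is enough
# to illustrate the idea.
local_config_path() {
    # usage: local_config_path RELEASE_DIR HOSTNAME
    printf '%s\n' '$(RELEASE_DIR)/etc/$(HOSTNAME).local' |
        sed -e "s|[\$](RELEASE_DIR)|$1|" -e "s|[\$](HOSTNAME)|$2|"
}

local_config_path /opt/condor m44   # prints /opt/condor/etc/m44.local
```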
Other miscellaneous settings that have been changed are:<br />
### Only allow daemon read/write access to the<br />
### slave nodes connected on the LAN.<br />
HOSTALLOW_READ = 192.168.0.*<br />
HOSTALLOW_WRITE = 192.168.0.*<br />
### Fully qualified names are not used in /etc/hosts<br />
### so Condor likes this set.<br />
DEFAULT_DOMAIN_NAME = arm.ac.uk<br />
Each of the nodes in the cluster has its own Condor configuration file in /opt/condor/etc.<br />
The master node and the slave nodes are treated differently, with the master having its<br />
own specific settings, and the slaves all having the same settings.<br />
The master’s configuration file, m44.local, contains the following:<br />
### The master never runs jobs<br />
START = FALSE<br />
### There are two NICs in the master. This tells<br />
### Condor to use the internal NIC.<br />
NETWORK_INTERFACE = 192.168.0.149<br />
COLLECTOR   = $(SBIN)/condor_collector<br />
NEGOTIATOR  = $(SBIN)/condor_negotiator<br />
DAEMON_LIST = MASTER, COLLECTOR, STARTD, NEGOTIATOR, SCHEDD<br />
JAVA = /usr/bin/java<br />
### Turn off reporting of pool stats to the<br />
### Condor people<br />
CONDOR_DEVELOPERS_COLLECTOR = NONE<br />
CONDOR_DEVELOPERS = NONE<br />
### PRIORITY_HALFLIFE = 1 adjusts a user’s Condor<br />
### priority in real-time. Thus, when their job<br />
### releases any resources, the user’s priority<br />
### returns to 0.5 very quickly.<br />
PRIORITY_HALFLIFE = 1<br />
### Turn off any job preemption. No jobs will be<br />
### preempted for any reason.<br />
PREEMPTION_REQUIREMENTS = FALSE<br />
PREEMPTION_RANK = FALSE<br />
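The effect of PRIORITY_HALFLIFE in the listing above can be illustrated numerically. The sketch assumes the exponential smoothing rule that, to my understanding, the Condor manual gives for a user's real priority: new = b × old + (1 − b) × resources in use, with b = 0.5^(dt/halflife) and a floor of 0.5 (a larger value means a worse priority). The function name and the sample numbers are hypothetical.

```shell
# Sketch: one update step of a half-life smoothed user priority.
# ASSUMPTION: the smoothing rule and the 0.5 floor are as described in
# the lead-in; times are in seconds.
next_priority() {
    # usage: next_priority OLD_PRIORITY RESOURCES_IN_USE DT HALFLIFE
    awk -v p="$1" -v u="$2" -v dt="$3" -v h="$4" 'BEGIN {
        b = exp(log(0.5) * dt / h)   # decay factor for this time step
        q = b * p + (1 - b) * u
        if (q < 0.5) q = 0.5         # priority never drops below 0.5
        printf "%.3f\n", q
    }'
}

next_priority 10 0 10 1       # halflife 1 s: back at the floor, prints 0.500
next_priority 10 0 10 86400   # halflife 1 day: barely changed, prints 9.999
```

With a one-second half-life a user's standing reflects only what they are running right now, which is the behaviour the master's configuration relies on.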
As each slave node has the same configuration, a time-saving device has been employed<br />
wherein any modifications to the slave setup are made in a template file called<br />
node.local.template. This file is then copied using a shell script to create all the<br />
nodeXX.local files for the slaves.<br />
At present, node.local.template contains:<br />
### Dedicated scheduler for running MPI jobs.<br />
DedicatedScheduler = "DedicatedScheduler@master"<br />
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler<br />
START = TRUE<br />
SUSPEND = FALSE<br />
CONTINUE = TRUE<br />
PREEMPT = FALSE<br />
KILL = FALSE<br />
WANT_SUSPEND = FALSE<br />
WANT_VACATE = FALSE
RANK = Scheduler =?= $(DedicatedScheduler)<br />
### Tell the daemons not to pay attention to any<br />
### console activity. Prevents their Condor status<br />
### changing to 'Owner' if someone logs in to a<br />
### node to perform maintenance.<br />
VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = 0<br />
VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 0<br />
The shell script which performs <strong>the</strong> copying, refresh.sh, works by slurping <strong>the</strong><br />
node names from /etc/brshtab, and <strong>the</strong>n copies <strong>the</strong> template file using a for loop:<br />
#!/bin/bash
# Read the slave node names from /etc/brshtab.
NODES="$(cat /etc/brshtab)"
# Create one <node>.local file per slave from the shared template.
for I in $NODES
do
    cp node.local.template "$I.local"
done
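
The copying step can also be written more defensively. The following is a minimal sketch, not the script used on the cluster. It assumes that the node list holds one node name per line and may contain blank lines or '#' comments. The function name refresh_nodes and its three arguments are hypothetical.

```shell
#!/bin/sh
# Sketch of a more defensive refresh.sh (illustrative only).
# Assumption: one node name per line; blank lines and '#' comments are ignored.

refresh_nodes() {
    node_list=$1    # e.g. /etc/brshtab
    template=$2     # e.g. node.local.template
    out_dir=$3      # directory that receives the <node>.local files

    [ -r "$node_list" ] || { echo "cannot read $node_list" >&2; return 1; }
    [ -r "$template" ]  || { echo "cannot read $template" >&2; return 1; }

    # The "|| [ -n ... ]" part handles a final line without a trailing newline.
    while read -r node _rest || [ -n "$node" ]; do
        case $node in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        cp "$template" "$out_dir/$node.local" || return 1
    done < "$node_list"
}
```

A call such as refresh_nodes /etc/brshtab node.local.template . would reproduce the behaviour of the loop above, but it stops with an error message if either input file is unreadable.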

Condor User Policies

Most users of the cluster tend to run large batches of relatively short jobs (of the
order of less than 1 hour per job), but some have submitted a small number of
long-running jobs (of the order of several hours to several days).

In general, each submitted job must be allowed to run to completion without being
preempted; otherwise the job must start again from the beginning when it is reallocated
to a slave node. For users who submit large batches of short jobs, such preemption is
merely an inconvenience. For users with long-running jobs, however, any interruption
could mean the loss of several days of computation.

To ensure fair use of cluster resources without job preemption, the PRIORITY_HALFLIFE
Condor variable has been set to 1 in the local configuration file for the master node.
This allows Condor to adjust a user's priority level almost as soon as their jobs start
running. As their jobs begin to use cluster resources, Condor lowers the user's
priority. If someone else then submits a batch of jobs to the queue, their user priority
will be higher than that of the first user. So, as each of the first user's jobs
finishes on a node, Condor allocates that node to a job belonging to the user with the
highest priority. This gradually balances the allocation of resources so that no one
user can use all of the resources all of the time.
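
The effect of the half-life can be illustrated with a small numerical model. This is a sketch under stated assumptions, not Condor's accounting code. It assumes that the numerical priority value moves exponentially towards the number of machines a user currently holds, closing half of the gap every PRIORITY_HALFLIFE seconds, and that 0.5 is the best value a user can have (numerically larger values mean a worse priority). The function priority_after is hypothetical.

```shell
#!/bin/sh
# Illustrative model only (assumption: exponential approach towards current usage).

priority_after() {
    # usage: priority_after START_PRIORITY MACHINES_IN_USE HALFLIFE_SECONDS ELAPSED_SECONDS
    awk -v p0="$1" -v used="$2" -v half="$3" -v t="$4" 'BEGIN {
        p = used + (p0 - used) * exp(-log(2) * t / half)
        if (p < 0.5) p = 0.5    # 0.5 is the best (numerically lowest) value
        printf "%.2f\n", p
    }'
}

# A user who has held 10 nodes for 10 seconds, starting from the best value of 0.5:
priority_after 0.5 10 1 10        # half-life of 1 s: the value is already close to 10
priority_after 0.5 10 86400 10    # half-life of one day: the value has barely moved
```

Under this model, a half-life of 1 second means that a user's priority reflects their current usage within seconds of their jobs starting, and returns to 0.5 within seconds of their jobs finishing. That is the behaviour the policy relies on.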

To prevent Condor from preempting currently running jobs if someone with a higher
user priority submits jobs to the queue, the configuration variables
PREEMPTION_REQUIREMENTS and PREEMPTION_RANK have both been set to FALSE in the
master's local configuration file.
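
Because the whole policy depends on these two settings, it can be useful to check them after editing the configuration. The following is a minimal sketch. It assumes that the file consists of plain "NAME = VALUE" lines and that the last assignment of a name is the one that takes effect. The helper names config_value and preemption_disabled are hypothetical and are not part of Condor.

```shell
#!/bin/sh
# Sketch: check that a Condor-style configuration file disables preemption.
# Assumptions: plain "NAME = VALUE" lines; the last assignment of a name wins.

config_value() {
    # Print the last value assigned to variable $2 in file $1.
    awk -F= -v key="$2" '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == key { value = $2; gsub(/^[ \t]+|[ \t]+$/, "", value); last = value }
        END { print last }
    ' "$1"
}

preemption_disabled() {
    # Succeeds only if both preemption variables are set to FALSE in file $1.
    for name in PREEMPTION_REQUIREMENTS PREEMPTION_RANK; do
        value=$(config_value "$1" "$name" | tr '[:lower:]' '[:upper:]')
        if [ "$value" != FALSE ]; then
            echo "$name is '$value', expected FALSE" >&2
            return 1
        fi
    done
    return 0
}
```

The function takes the path of the master's local configuration file as its only argument and returns a non-zero status if either variable is missing or set to anything other than FALSE.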

The user policy for the cluster will undoubtedly change over time. When it does, refer
to Section 3 of the Condor manual.

D.3 MPICH 1.2.4 RPM Spec File

This spec file can be used to build RPM packages from a standard MPICH v1.2.4 tarball.
It ensures that the current installation of Intel's Fortran compiler is used to build
the F77 and F90 bindings, and it produces two RPMs: a standard RPM containing the MPICH
runtime libraries, which should be installed on all the nodes, and a development RPM
containing the MPI compiler wrappers, which should be installed only on the master node.

Name: mpich
License: Other License(s), see package
Group: Development/Libraries/Parallel
URL: ftp://ftp.mcs.anl.gov/pub/mpi/old/
Version: 1.2.4
Release: 3
Summary: A Portable Implementation of MPI
Source: mpich-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-build
Autoreqprov: on
%define _mpich_root /opt/mpich

%description
MPICH is a freely available, portable implementation of
MPI, the Standard for message-passing libraries.

%package devel
Summary: A Portable Implementation of MPI
Group: Development/Libraries/Parallel
Autoreqprov: on
Requires: mpich
Provides: mpich-doc
Obsoletes: mpich-doc

%description devel
MPICH is a freely available, portable implementation of
MPI, the Standard for message-passing libraries.

%prep
%setup -q
DIRS=$(find -type d)

%build
CFLAGS=$RPM_OPT_FLAGS; export CFLAGS;
export F90="ifort" ;
export FC="ifort" ;
export CCFLAGS="-O2";
export FFLAGS="-O2";
export RSHCOMMAND="/opt/condor/sbin/rsh";
sh configure --with-arch=LINUX \
    --with-device=ch_p4 \
    --with-comm=ch_p4 \
    --with-romio \
    --with-mpe \
    --libdir=$RPM_BUILD_ROOT%{_mpich_root}/%_lib \
    --enable-sharedlib \
    --enable-c++ \
    --enable-f77 \
    --enable-f90modules \
    --disable-mpedbg \
    --disable-devdebug \
    --disable-debug \
    -prefix=$RPM_BUILD_ROOT%{_mpich_root} \
    -c++=/usr/bin/g++ \
    -opt=-O2 \
    -cc=/usr/bin/gcc \
    -fc=/opt/intel_fc_80/bin/ifort \
    -f90=/opt/intel_fc_80/bin/ifort \
    -f90flags=-O2 \
    -optcc=-O2 \
    -mpe_opts=-O2
make

%install
rm -rf $RPM_BUILD_ROOT
make install PREFIX=$RPM_BUILD_ROOT%{_mpich_root} \
    MPIINSTALL_OPTS="-manpath=$RPM_BUILD_ROOT/%{_mpich_root}/man" \
    -libdir=$RPM_BUILD_ROOT/%{_mpich_root}/%_lib
find $RPM_BUILD_ROOT%{_mpich_root} -type l -name "mpirun" | \
    xargs rm -f
grep -lr "$RPM_BUILD_ROOT" $RPM_BUILD_ROOT/%{_mpich_root}/ | \
    xargs perl -pi -e "s@$RPM_BUILD_ROOT@@g"
rm -f examples/perftest/config.cache \
    examples/perftest/config.log \
    examples/perftest/config.status \
    examples/test/config.log \
    examples/test/config.status
# libs
rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/lib*
rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared/lib*
[ -e lib/libmpich.a ] && cp -f lib/*.a $RPM_BUILD_ROOT%{_mpich_root}/%_lib
[ -e lib/*.o ] && cp -f lib/*.o $RPM_BUILD_ROOT%{_mpich_root}/%_lib
[ -e lib/*.s* ] && cp -f lib/*.s* $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared
for i in libfmpich libmpich libpmpich; do
    echo Working on $i;
    cp -f lib/shared/$i.so.1.0 $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared
    (
        cd $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared;
        ln -sf $i.so.1.0 $i.so
    )
done
# docs
rm -fr $RPM_BUILD_ROOT%{_mpich_root}/www
export manpath="$manpath /opt/mpich/man"

%clean
#rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,755)
%doc COPYRIGHT
%{_mpich_root}/sbin/*
%{_mpich_root}/bin/mpirun*
%{_mpich_root}/bin/mpiman
%{_mpich_root}/bin/mpireconfig
%{_mpich_root}/bin/mpireconfig.dat
%{_mpich_root}/bin/tarch
%{_mpich_root}/bin/tdevice
%{_mpich_root}/bin/serv_p4
%{_mpich_root}/%_lib/shared/*.so.*
%{_mpich_root}/share/*
%{_mpich_root}/man/mandesc
%{_mpich_root}/man/man1/*.1*

%files devel
%defattr(-,root,root,755)
%{_mpich_root}/doc/*
%{_mpich_root}/examples/*
%{_mpich_root}/man/man3/*.3*
%{_mpich_root}/man/man4/*.4*
%doc COPYRIGHT
%{_mpich_root}/include/mpi2c++/*.h
%{_mpich_root}/include/f90base/*.mod
%{_mpich_root}/include/f90choice/*.mod
%{_mpich_root}/include/*.h
%{_mpich_root}/%_lib/*.a
%{_mpich_root}/%_lib/shared/*.so
%{_mpich_root}/etc/*
%{_mpich_root}/bin/mpicc
%{_mpich_root}/bin/mpiCC
%{_mpich_root}/bin/mpif77
%{_mpich_root}/bin/mpif90

Appendix E

LTE-CODES

LTE-CODES is a package of Fortran programs and supporting libraries for analysing the
spectra of hot stars. The main components of the package are:

STERNE computes plane-parallel, line-blanketed model atmospheres for hot stars,
T_eff > 8000 K, in local thermal, radiative, and hydrostatic equilibrium. The code
handles extremely H-deficient mixtures and composition stratification.

SPECTRUM computes synthetic spectra, line profiles, equivalent widths, and specific
intensities, assuming LTE, from model atmospheres of hot stars, T_eff > 8000 K. It can
handle atmospheres of arbitrary chemical composition.

SFIT is a general-purpose code designed to fit theoretical stellar spectra to an
observed spectrum. The code offers several different parameter optimisation methods,
including Levenberg-Marquardt, Amoeba, and Genetic Algorithms. It can fit both single
and composite (binary) stellar spectra.

As part of this thesis, the old build system for these codes (which was based on a
series of hand-coded Makefiles) was overhauled and ported to the GNU Autotools system
(http://www.gnu.org/). GNU Autotools is a suite of tools that assists in making
software projects easy to build across many platforms. It offers a flexible environment
for automatically configuring and generating Makefiles according to the needs of the
software project, and for adapting them to suit whatever operating system, compilers,
and other system tools are at hand.

E.1 Directory Layout

The hierarchical layout of the LTE-CODES package is straightforward. The top-level
directory branches into several subdirectories, the most important of which is src.
Within src are two subdirectories, libraries and apps, which hold the source code for
the supporting libraries and for the applications STERNE, SPECTRUM, and SFIT. In summary:

lte-codes-x.x
|
|-- config
|-- include
\-- src
    |
    |-- libraries
    |   |
    |   |--------------\
    |   |              |
    |   |-- at         |-- prof
    |   |-- bb         |-- qub
    |   |-- chr        |-- rot
    |   |-- dp         |-- rtf
    |   |-- mth        |-- sdb
    |   |-- mx         |-- stn2
    |   |-- nr         |-- str
    |   |-- nr_d       |-- tap
    |   |-- op         |-- tap95
    |   |-- opk2       |-- util
    |   |-- phot       \-- xfit
    |   \-- phys
    |
    \-- apps
        |
        |-- sfit2
        |-- spectrum
        \-- sterne

E.2 Build System Organisation

The central components of the autotools-based build system are the configure.in file,
which resides in the top-level directory, and the Makefile.am files, one of which is
found in every directory.

configure.in

configure.in is essentially a Bourne shell script containing a number of calls to
autoconf and automake macros that set up the build environment. The language used for
the project can be selected, and specific details such as compiler commands and flags
can be defined. The autoconf macros also allow the programmer to tell the build system
to test the underlying operating system for the existence of particular tools,
libraries, and files, and to modify the source files of the project as appropriate.

configure.in is processed by autoconf to generate a configure script. When this script
is executed, it traverses the build tree and generates all the necessary Makefiles.

The contents of configure.in for LTE-CODES 1.4 are as follows:

AC_INIT
AC_CONFIG_AUX_DIR(config)
AM_INIT_AUTOMAKE(lte-codes, 1.4, "http://www.arm.ac.uk/~csj")
AC_SUBST(ac_aux_dir)
# Checks for programs.
AC_PROG_F77(ifort ifc)
AC_PROG_LIBTOOL
AC_PROG_MAKE_SET
FFLAGS='-I$(top_srcdir)/include -I$(top_srcdir)/include/mod -cm -w -w90 -w95'
AC_OUTPUT(Makefile \
    src/Makefile \
    src/libraries/Makefile \
    src/libraries/at/Makefile \
    src/libraries/bb/Makefile \
    src/libraries/chr/Makefile \
    src/libraries/dp/Makefile \
    src/libraries/mth/Makefile \
    src/libraries/mx/Makefile \
    src/libraries/nr/Makefile \
    src/libraries/nr_d/Makefile \
    src/libraries/op/Makefile \
    src/libraries/opk2/Makefile \
    src/libraries/phot/Makefile \
    src/libraries/phys/Makefile \
    src/libraries/prof/Makefile \
    src/libraries/qub/Makefile \
    src/libraries/rot/Makefile \
    src/libraries/rtf/Makefile \
    src/libraries/sdb/Makefile \
    src/libraries/stn2/Makefile \
    src/libraries/str/Makefile \
    src/libraries/tap/Makefile \
    src/libraries/tap95/Makefile \
    src/libraries/util/Makefile \
    src/libraries/xfit/Makefile \
    src/apps/Makefile \
    src/apps/sfit2/Makefile \
    src/apps/spectrum/Makefile \
    src/apps/spectrum/data/Makefile \
    src/apps/spectrum/models/Makefile \
    src/apps/spectrum/scripts/Makefile \
    src/apps/sterne/Makefile \
    src/apps/sterne/scripts/Makefile \
    src/apps/sterne/utils/Makefile)

If any modifications are made to configure.in, autoconf must be run again for the
changes to take effect. A small shell script called bootstrap does this, and also runs
the other autotools utilities so that the entire build system is updated consistently.
bootstrap is defined as:

#!/bin/sh
libtoolize --force --copy
aclocal -I config
automake --add-missing --force-missing --gnu --copy
autoconf
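
As written, bootstrap carries on to the next tool even if an earlier one fails. The following sketch shows one way to stop at the first failing step. It is an illustration under stated assumptions and not part of the LTE-CODES distribution. The helper name run_steps is hypothetical.

```shell
#!/bin/sh
# Sketch: run a sequence of commands in order and stop at the first failure.
# Each argument to run_steps is one complete command line.

run_steps() {
    for step in "$@"; do
        echo "==> $step"
        sh -c "$step" || { echo "failed: $step" >&2; return 1; }
    done
}

# Intended use, mirroring the four bootstrap steps:
#   run_steps "libtoolize --force --copy" \
#             "aclocal -I config" \
#             "automake --add-missing --force-missing --gnu --copy" \
#             "autoconf"
```

Stopping early avoids running the later tools on stale input from a step that failed.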

In-depth documentation on autoconf can be found in the manual located at:
http://www.gnu.org/software/autoconf/manual/index.html

Makefile.am

Every Makefile.am is processed by automake to produce a Makefile.in file. This is
subsequently used by the configure script to create a Makefile at every point in the
build tree. Typically, each Makefile.am contains a number of variable assignments that
describe which source files are to be compiled, whether the sources form a library or a
binary, which subdirectories lie beneath the current directory, and so on.

In the top-level directory, Makefile.am contains the following:

include $(top_srcdir)/config/am_global_include.mk
## Process this file with automake to produce Makefile.in
SUBDIRS = src
# Include bootstrap script and other folders in distribution
EXTRA_DIST = bootstrap include test
# Include files in config directory in distribution
AUX_DIST = $(ac_aux_dir)/config.guess \
    $(ac_aux_dir)/config.sub \
    $(ac_aux_dir)/install-sh \
    $(ac_aux_dir)/ltmain.sh \
    $(ac_aux_dir)/missing \
    $(ac_aux_dir)/mkinstalldirs \
    $(ac_aux_dir)/am_global_include.mk
MAINTAINERCLEANFILES = Makefile.in aclocal.m4 configure config-h.in $(AUX_DIST)
## Make sure config directory and files it contains are correctly
## added to distribution by 'make dist'
dist-hook:
	for file in $(AUX_DIST); do \
	    cp $$file $(distdir)/$$file; \
	done

This file is fairly basic, the most significant entry being the SUBDIRS variable, which
specifies the subdirectories that must be traversed from here during the build. The
remaining assignments mostly tell the build system about other files that are part of
the project but do not need to be compiled.

The Makefile.am for a program or a library looks like this:

include $(top_srcdir)/config/am_global_include.mk
SUBDIRS = scripts data models
bin_PROGRAMS = spectrum
spectrum_SOURCES = Spectrum.f
spectrum_LDADD = \
    ../../libraries/dp/libdp.a \
    ../../libraries/qub/libqub.a \
    ../../libraries/opk2/libopk2.a \
    ../../libraries/op/libop.a \
    ../../libraries/tap95/libtap95.a \
    ../../libraries/str/libstr.a \
    ../../libraries/chr/libchr.a \
    ../../libraries/rtf/librtf.a \
    ../../libraries/nr/libnr.a \
    ../../libraries/nr_d/libnr_d.a \
    ../../libraries/mth/libmth.a

Here, the name of the final program is specified along with its source files and the
libraries upon which it depends. The build system ensures that any such dependencies
are compiled before any attempt is made to compile the current program or library.

Further documentation on automake can be found at
http://www.gnu.org/software/automake/manual/automake.html

E.3 Installation Instructions

To install LTE-CODES from the source tarball as a non-root user to an arbitrary directory:

1. Unpack the archive: tar -xvzf lte-codes-x.x.tar.gz
2. cd lte-codes-x.x
3. ./configure --prefix=/path/to/install
4. make
5. make install
6. Set the shell environment variable LTECODES to point to the install location.
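
The six steps above can be collected into a single shell function. This is a minimal sketch, not a script shipped with LTE-CODES. It assumes that the tarball is named lte-codes-<version>.tar.gz and unpacks into a directory of the same name. The function names and the DRY_RUN switch, which prints each command instead of running it, are hypothetical.

```shell
#!/bin/sh
# Sketch of installation steps 1-6 as one function (illustrative only).
# Assumption: <name>.tar.gz unpacks into a directory called <name>.

run() {
    # With DRY_RUN=1 (hypothetical switch) print the command instead of running it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_lte_codes() {
    tarball=$1                                 # e.g. lte-codes-x.x.tar.gz
    prefix=$2                                  # install directory the user can write to
    src_dir=$(basename "$tarball" .tar.gz)     # e.g. lte-codes-x.x

    run tar -xvzf "$tarball" || return 1       # step 1
    (
        # The subshell keeps the caller's working directory unchanged.
        run cd "$src_dir" &&                   # step 2
        run ./configure --prefix="$prefix" &&  # step 3
        run make &&                            # step 4
        run make install                       # step 5
    ) || return 1

    LTECODES=$prefix                           # step 6
    export LTECODES
}
```

The export in step 6 affects only the current shell session. To make it permanent, the same assignment would normally be added to the user's shell start-up file.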