
Design, Implementation and Test 

of a New Feature Extractor 

for the IceCube Neutrino Observatory 

by

Marius Wallraff

Diploma thesis (Diplomarbeit) in Physics

submitted to the

Faculty of Mathematics, Computer Science and Natural Sciences

of RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen)

in

March 2010

prepared at the

III. Physikalisches Institut B

Prof. Dr. Christopher Wiebusch

NewFeatureExtractor

Abstract 

The IceCube Neutrino Observatory at the South Pole consists

of digital optical modules (DOMs) deep down in the 

ice equipped with photomultipliers to capture Čerenkov 

light induced by muons and other particles. These 

DOMs digitize the analogue photomultiplier signals and 

send the resulting waveforms to the surface. The large 

amount of information has to be condensed for later particle 

track and energy reconstructions. 

This thesis presents a new framework – the NewFeatureExtractor, 

NFE – to extract the arrival times and 

the number of photons. Four algorithms have been implemented 

in this framework to analyze different types 

of waveforms. Their performance is tested by comparison 

between simulated data and experimental data, and 

by comparison with earlier algorithms, which are also 

analyzed conceptually.

Contents

Abstract
List of Figures
List of Tables

1 Introduction

2 Neutrino Astrophysics
2.1 Neutrinos in Comparison with Other Messenger Particles
2.1.1 Photons
2.1.2 Cosmic Rays
2.1.3 Gravitational Waves
2.2 Neutrino Production Processes
2.2.1 Nuclear Processes
2.2.2 Thermal Cooling
2.2.3 Cosmic Ray Interactions
2.3 Air Showers
2.3.1 Atmospheric Muons
2.3.2 Atmospheric Neutrinos

3 Neutrino Detection
3.1 Neutrino Interactions
3.2 Lepton Propagation
3.2.1 Electron Propagation
3.2.2 Muon Propagation
3.2.3 Tau Propagation
3.3 Čerenkov Detectors

4 The IceCube Neutrino Observatory
4.1 Layout
4.2 Ice Properties
4.3 Digital Optical Modules
4.4 Signal Digitization
4.4.1 ATWD
4.4.2 FADC
4.4.3 SLC Chargestamps
4.5 Data Structure and Data Rate
4.6 Software
4.6.1 DOMcalibrator

5 Feature Extraction in IceCube
5.1 FeatureExtractor
5.2 PulseExtractor
5.3 SLCHitExtractor
5.4 NewFeatureExtractor
5.4.1 Main Characteristics
5.4.2 Program Structure
5.4.3 Time Offset Constants
5.4.4 Waveforms Without Pulses
5.4.5 Pulse Merger

6 Algorithms Implemented in NFE
6.1 Pre-evaluation Algorithm “Eva”
6.2 Extraction Algorithm “Simple”
6.3 Extraction Algorithm “BayesUnfold”
6.3.1 Differences of FE’s and PE’s Implementations
6.4 Extraction Algorithm “SLCHE”

7 Performance Optimization
7.1 Calibration Using Monte-Carlo Data
7.1.1 Pre-evaluation Algorithm “Eva”
7.1.2 Extraction Algorithm “Simple”
7.1.3 Extraction Algorithm “BayesUnfold”
7.1.4 Extraction Algorithm “SLCHE”
7.2 Verification Using Experimental Data

8 Performance Tests
8.1 Extraction of Simple Pulses with “Simple” and “BayesUnfold”
8.2 Extraction of Exotic Features
8.3 Comparison with Other Feature Extractors
8.3.1 FeatureExtractor in Multi-Pulse Mode
8.3.2 FeatureExtractor in Single-Pulse Mode
8.4 SLCHitExtractor
8.5 Runtime Performance

9 Summary And Outlook

A Bayesian Unfolding
A.1 Formal Approach
A.2 Adaption to IceCube’s Waveforms

B Cascade Pulse Tagging

C Specific Problems and Anomalies
C.1 ATWD FADC Time Offset Caused Double Extraction
C.2 Implementation of the Second Single-Pulse Extraction Algorithm in FeatureExtractor
C.3 Missing Simulation of the daq_baseline in DOMsimulator
C.4 Time Offset in SLCHitExtractor

Acknowledgements
Erklärung / Declaration
References

List of Figures

2.1 Differential particle flux for major cosmic ray components.
2.2 All-particle cosmic ray energy spectrum from air shower measurements.
2.3 Second and first order Fermi acceleration.
2.4 Cosmic ray deflection angles near the GZK cutoff.
2.5 Total number of neutrinos produced in stars as function of star mass.
2.6 Cosmic rays and neutrinos hitting the Earth.
2.7 Vertical atmospheric muon and muon neutrino differential fluxes.
2.8 Zenith distribution of atmospheric muons and atmospheric muon neutrinos.
3.1 Neutrino nucleon cross-sections as a function of energy.
3.2 Signatures of charged current interactions of neutrinos in Čerenkov media.
3.3 Depth development of electron initiated cascades.
3.4 Effective radiation length modified by the LPM effect.
3.5 Expected track length for muons and taus in Antarctic ice.
3.6 Emission of Čerenkov radiation.
4.1 IceCube surface geometry with deployment season information.
4.2 Three-dimensional sketch of IceCube with DeepCore.
4.3 Effective scattering length and absorption length in deep Antarctic ice.
4.4 Sketch of an IceCube Digital Optical Module (DOM).
4.5 Waveforms illustrating the toroidal droop effect for OT and NT DOMs.
4.6 Comparison of different ATWD and FADC SPE shape parametrizations.
5.1 Sketch illustrating FeatureExtractor’s two single-pulse extraction algorithms.
6.1 Sketch illustrating NFE’s pre-evaluation algorithm “Eva”.
6.2 Sketch illustrating NFE’s extraction algorithm “Simple”.
6.3 Sketch illustrating Bayesian Unfolding.
6.4 FE, PE, and NFE pulse definition algorithms for deconvoluted distributions.
6.5 Residual plot for the width parametrization used in “BayesUnfold”.
6.6 Sketch comparing NFE’s extraction algorithm “SLCHE” to SLCHitExtractor.
7.1 Two examples for waveforms with pulses extracted by NFE.
7.4 Time residuals of NFE extracted pulses for different “Simple” charge thresholds.
7.5 Charge-time correlation of pulses from NFE extraction algorithm “Simple”.
7.6 Effects of “BayesUnfold”’s variable number of iterations – time residuals.
7.7 Effects of “BayesUnfold”’s variable number of iterations – charges.
7.8 Effects of “BayesUnfold”’s variable number of iterations – numbers of pulses.
7.9 Effects of “BayesUnfold”’s optimized deconvolution starting distribution.
7.10 Time residuals and charges of “SLCHE” and NFE FADC pulses.
7.11 Time differences of ATWD and FADC NFE pulses for MC and experimental data.
7.12 Ratio of total FADC and ATWD waveform charge for MC and experimental data.
7.13 Charges of the first ATWD and FADC NFE pulses for MC and experimental data.
7.14 Differences between the total charges of ATWD and FADC pulses for MC and experimental data.
7.15 Comparison of ATWD and FADC pulses for MC and experimental data.
8.1 Time residuals for the first pulses from simple MC waveforms extracted by “Simple” and “BayesUnfold”.
8.2 Numbers of pulses from simple MC waveforms extracted by “Simple” and “BayesUnfold”.
8.3 Example for a waveform with a high baseline caused by incomplete simulation.
8.4 Effect of erroneous baselines on “Simple”’s and “BayesUnfold”’s numbers of pulses from simple MC waveforms.
8.5 Charges of the first pulses from simple MC waveforms extracted by “Simple” and “BayesUnfold”.
8.6 Differences between the total charges of all pulses from simple MC waveforms extracted by “Simple” and “BayesUnfold”.
8.7 Effect of erroneous baselines on “Simple”’s and “BayesUnfold”’s total charge from simple MC waveforms.
8.8 Numbers of pulses from simple experimental waveforms extracted by “Simple” and “BayesUnfold”.
8.9 Differences between the total charges of all pulses from simple experimental waveforms extracted by “Simple” and “BayesUnfold”.
8.10 Example waveforms to demonstrate NFE’s option EnforcePulse.
8.11 Example waveforms of exotic or difficult features.
8.15 Example of a bright waveform from Markus Voge’s catalog.
8.16 Time residuals of the first pulses extracted by FE and NFE from MC; multi-pulse online-filtering settings.
8.17 Differences between the times of the first pulses extracted by FE and NFE from MC and experimental data; multi-pulse online-filtering settings.
8.18 Charges of the first pulses extracted by FE and NFE from MC and experimental data; multi-pulse online-filtering settings.
8.19 Example waveforms for lower charges in NFE than in FE.
8.20 Differences of the total charges of the pulses extracted by FE and NFE from MC and experimental data; multi-pulse online-filtering settings.
8.21 Example of saturated waveforms in different calibrations with pulses from NFE and FE.
8.22 Numbers of pulses extracted by FE and NFE from MC and experimental data; multi-pulse online-filtering settings.
multi-pulse offline-processing settings.
data; multi-pulse offline-processing settings.
8.25 Numbers of pulses extracted by FE and NFE from MC and experimental data; multi-pulse offline-processing settings.
8.26 Charge per pulse ratio of simulated data and experimental data for FE and NFE.
single-pulse settings.
data; single-pulse settings.
8.29 Differences of the total charges of the pulses extracted by FE and NFE from MC and experimental data; single-pulse settings.
8.30 Time residuals of pulses extracted by SLCHitExtractor and “SLCHE” from MC.
8.31 Charges of pulses extracted by SLCHitExtractor and “SLCHE” from MC and experimental data.
B.1 Tagging of pulses caused by cascades during muon propagation.
C.1 Comparison of FeatureExtractor’s documented and implemented second single-pulse extraction algorithms.
C.2 Effect of the simulation of the daq_baseline on extracted pulses.

List of Tables

4.1 Overview of different digitizers and the respective I3Waveform source types.
7.1 Default values used for pre-evaluation algorithm “Eva”.
7.2 Default values used for extraction algorithm “Simple”.
7.3 Parameter values of the SPE pulse parametrizations employed.
7.4 Default values used for the extraction algorithm “BayesUnfold”.
7.5 Default values used for extraction algorithm “SLCHE”.
8.1 Runtimes for different feature extractors and datasets.

CHAPTER I 

Introduction


Astronomy has challenged and inspired mankind since prehistoric times. In its beginning,

it was strongly connected to astrology and served as a basis for cults and religious beliefs.

With the introduction of agricultural techniques and the rise of great civilizations, many 

practical applications emerged independently across the globe. Astronomy enabled the 

creation of reliable calendars to forecast seasonal climate changes, provided means to 

learn about the topography of Earth, allowed navigation on the open sea, and more generally

motivated mathematical and scientific progress. 

The ancient astronomers were limited to observations that could be undertaken with 

their bare eyes. This did not change until Hans Lippershey invented the first known 

telescope in 1608. Utilizing the light-gathering power of these new instruments, famous 

astronomers like Galileo Galilei and Johannes Kepler soon made new discoveries and laid 

the foundation for modern astronomy and astrophysics. During the following centuries 

constant improvements to the instruments were made, but it was not until the 19th century

that fundamentally new techniques were devised: Photography allowed precise long-exposure

observations, and spectrography made it possible to analyze the composition and radial velocity

of distant objects. While especially the latter provided valuable new observables to 

astrophysics, researchers still depended on visible light. This limitation was eliminated

in the middle of the 20th century with the establishment of radio astronomy. In 

the following decades, many parts of the electromagnetic spectrum were discovered to be 

valuable means for the exploration of the visible universe, such as far and near infrared, 

ultraviolet, X-rays, and gamma rays. 

In parallel to these later developments, Theodor Wulf, Domenico Pacini and Victor Hess

found the first strong evidence for extra-terrestrial particles entering the Earth’s atmosphere,

the so-called cosmic rays. In the following decades, many experiments were conducted 

to analyze the underlying mechanism, the composition and origins of these cosmic 

rays, and a new field of physics was born: particle physics. Today, most particle physics experimentalists 

work at accelerator experiments which allow high-precision measurements 

in controlled environments and at extremely high luminosities. However, cosmic rays remain

an interesting subject for both astrophysics and particle physics, as they are both

astronomical information carriers complementary to electromagnetic radiation and reach 

energies several orders of magnitude higher than those currently produced at particle 

accelerators. 

The advancement of detectors for the different branches of multi-messenger astronomy 

and astroparticle physics has proven to be an important aspect of the efforts made 

to understand the structure of matter and the Universe. This thesis aims to make a

small contribution to this endeavor by designing, implementing and finally

testing a new feature extractor for the IceCube Neutrino Observatory – the NewFeatureExtractor,

NFE. NFE is a package which analyzes the recorded photomultiplier signals and

extracts the physics information, namely the number and arrival times of photons hitting

the photomultipliers.


Chapter 2 gives a short motivation and explains the basics of neutrino astrophysics 

with its advantages and challenges. Chapter 3 goes into more detail and explains the

neutrinos’ interactions with baryonic matter, the propagation of the resulting particles 

through matter and how these properties can be exploited for neutrino detection. Chapter 

4 describes the IceCube Neutrino Observatory, with emphasis on its hardware and the 

data acquisition. Chapter 5’s topic is feature extraction in IceCube. The three most 

important existing feature extractor packages are introduced, and the design decisions for 

NewFeatureExtractor are discussed. Chapter 6 illustrates the different algorithms used 

by NFE; the corresponding performance optimization analyses are explained in chapter 

7. Performance tests are presented in chapter 8. Finally, chapter 9 gives a summary and

provides an outlook on future developments and tasks. 


CHAPTER II 

Neutrino Astrophysics


2.1 Neutrinos in Comparison with Other Messenger Particles 

Neutrino astronomy is one of the modern branches of observational astronomy. Most of 

the early neutrino detectors were originally designed for astrophysics (e.g., the Homestake 

experiment for measuring the solar neutrino flux) or particle physics (like KamiokaNDE 

for the investigation of proton decay). Supernova SN1987A proved neutrino telescopes 

to be valuable for both observational astronomy and astroparticle physics. Among other 

applications, extra-terrestrial neutrinos can be used to test astrophysical models and to

constrain values for neutrino properties which cannot be accessed directly by accelerator 

experiments. 

A big challenge is the very small cross-section of neutrino interactions (see section 3.1) 

combined with a strong background of atmospheric particles caused by air showers (see 

section 2.3). However, this small cross-section is also a major advantage of neutrino 

astronomy, as it allows probing astrophysical regions obstructed by either foreground

objects or by the object itself. Furthermore, because of different production processes 

neutrinos carry information that is complementary to that obtainable by other types of 

telescopes, making neutrinos a valuable part of multi-messenger astronomy. 

2.1.1 Photons 

The electromagnetic spectrum offers much diversity in the obtainable data. The wavelength 

λ is related to the photon’s energy E by

\[ \lambda = \frac{c}{\nu} = \frac{hc}{E} \approx \frac{1.2398\,\mathrm{eV\,\mu m}}{E}, \]

where ν denotes the frequency of the wave; because of this, photons of different energies 

interact with different objects. While for example visible light is absorbed by interstellar 

dust, the dust grains reemit this light brightly in infrared frequency bands. The HI line – 

also known as 21 centimeter line – is the spectral line corresponding to the hyperfine level 

transitions of atomic hydrogen and can be observed by microwave telescopes. Because of 

its small width, its Doppler shifts can be used to calculate velocity maps of the interstellar

medium. From roughly 200 keV upwards, gamma-ray observatories monitor the sky for

high-energy objects and events such as supernova remnants, gamma-ray bursts or active

galactic nuclei (AGNs). 
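The relation between wavelength and photon energy given above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the constant is the rounded value hc ≈ 1.2398 eV µm.

```python
# Rounded value of h*c in eV·µm, as used in the wavelength relation above.
HC_EV_UM = 1.2398


def wavelength_um(energy_ev):
    """Photon wavelength in micrometres for a photon energy in eV."""
    if energy_ev <= 0:
        raise ValueError("photon energy must be positive")
    return HC_EV_UM / energy_ev


def photon_energy_ev(wavelength_in_um):
    """Photon energy in eV for a wavelength in micrometres."""
    if wavelength_in_um <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_UM / wavelength_in_um


if __name__ == "__main__":
    # 21 cm line: 21 cm = 2.1e5 µm, i.e. a photon energy of a few µeV.
    print(f"21 cm line: {photon_energy_ev(2.1e5):.2e} eV")
    # Lower edge of the gamma-ray band quoted in the text (200 keV).
    print(f"200 keV photon: {wavelength_um(2.0e5):.2e} µm")
```

The two example calls span about eleven orders of magnitude in energy, which illustrates why such different detector types are needed across the electromagnetic spectrum.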

One major disadvantage of photons is their high scattering and absorption probability.

In many wavebands, parts of the sky are obstructed by interstellar matter, especially by

formations known as dark nebulae such as the Great Rift, or by the Galactic Center.

Furthermore, many parts of the spectrum are absorbed by the Earth’s atmosphere, raising

the need for extra-terrestrial detectors, which are however limited in mass and thereby

not suitable for ultra-high-energy surveys as the low flux at these energies requires large 

effective areas. 



2.1.2 Cosmic Rays 

Hadronic cosmic rays are massive particles originating from outer space. Their composition 

is energy dependent; at low energy about 90% of the particles are protons, 9% are 

helium nuclei and the rest consists of heavier nuclei, neutrons and anti-protons (see figure 

2.1).[3] Cosmic rays also have leptonic components; e.g., about one additional percent

consists of electrons. Depending on the definition used, neutrinos and gamma rays are

also included. For the rest of this section, “cosmic rays” is used synonymously for charged

hadronic cosmic rays unless stated otherwise. 

Cosmic rays can be divided into two categories: Primary cosmic rays are directly emitted 

from astronomical objects such as stars, supernovae or AGNs, while secondary cosmic 

rays emerge from collisions between primary ones and interstellar matter. Some elements

such as Li, Be, B, F, Sc, and V (which are rare in stars because no stable isotopes are 

generated during stellar nucleosynthesis) as well as anti-protons are unlikely to be primary 

cosmic ray particles. However, during propagation the composition changes because of 

nuclear interactions. Thus, the abundance of the aforementioned particles can be used 

to differentiate between the two categories and thereby to learn about the interstellar 

medium. 

The energy spectrum of cosmic rays above 10 GeV follows an at least twofold broken power law:

\[ \frac{dN}{dE} \propto E^{-\alpha}, \]

with different spectral indices α in different energy regions. Between about 10 GeV and the so-called knee at about 4·10^6 GeV, the spectral index is 2.68 ± 0.02[4]; between the knee and the disputed second knee near 3·10^8 GeV it is 3.02 ± 0.03[5]; from there up to the ankle at 5·10^9 GeV its value is 3.16 ± 0.08[5]; and at even higher energies it flattens to 2.81 ± 0.03[6]. Above the GZK cutoff energy of 5·10^10 GeV the spectrum is assumed to steepen dramatically because of interactions of the cosmic rays with cosmic microwave background photons,

\[ p + \gamma \to \Delta^{+} \to p + \pi^{0} \quad \text{and} \quad p + \gamma \to \Delta^{+} \to n + \pi^{+}. \]

The nucleons emerge with about 80% of the initial proton energy; the effective mean free path is of the order of 13 Mpc.[7] Recent HiRes and Auger results show strong evidence for a cutoff; however, it is unresolved whether this is caused by the GZK effect.[6]
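The broken power law described above can be written down as a small Python sketch. It uses only the break energies and central values of the spectral indices quoted in the text; the function names are illustrative, the normalization to the flux at 10 GeV is an arbitrary choice, and the steepening above the GZK cutoff is deliberately not modeled.

```python
import bisect
import math

E_MIN_GEV = 10.0                              # lower validity limit of the power law
BREAK_ENERGIES_GEV = [4e6, 3e8, 5e9]          # knee, second knee, ankle
SPECTRAL_INDICES = [2.68, 3.02, 3.16, 2.81]   # one index per energy region


def spectral_index(energy_gev):
    """Spectral index alpha of the region that contains energy_gev."""
    if energy_gev < E_MIN_GEV:
        raise ValueError("power law is only quoted above 10 GeV")
    region = bisect.bisect_right(BREAK_ENERGIES_GEV, energy_gev)
    return SPECTRAL_INDICES[region]


def relative_flux(energy_gev):
    """dN/dE relative to its value at E_MIN_GEV, continuous at the breaks."""
    if energy_gev < E_MIN_GEV:
        raise ValueError("power law is only quoted above 10 GeV")
    flux = 1.0
    lower = E_MIN_GEV
    upper_edges = BREAK_ENERGIES_GEV + [math.inf]
    for upper, alpha in zip(upper_edges, SPECTRAL_INDICES):
        # Apply this region's power law up to the energy or the next break.
        segment_end = min(energy_gev, upper)
        flux *= (segment_end / lower) ** (-alpha)
        if energy_gev <= upper:
            break
        lower = upper
    return flux


if __name__ == "__main__":
    for energy in (1e3, 1e7, 1e9, 1e10):
        print(f"E = {energy:.0e} GeV: alpha = {spectral_index(energy)}, "
              f"relative flux = {relative_flux(energy):.3e}")
```

Chaining the segments multiplicatively keeps the flux continuous at each break, so only the slope changes there.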

Figure 2.1: Differential particle flux for the major cosmic ray components.[1] Shown are the ten elements which have the highest abundance in the solar system.[2]

Figure 2.2: All-particle cosmic ray spectrum at high energies from air shower measurements, weighted with E^2.7. The shaded area was probed by direct measurements.[1] (Horizontal axis: E [eV]; vertical axis: E^2.7 F(E) [GeV^1.7 m^-2 s^-1 sr^-1]; the knee, the 2nd knee and the ankle are marked. Data sets: Grigorov, JACEE, MGU, TienShan, Tibet07, Akeno, CASA/MIA, Hegra, Fly’s Eye, Agasa, HiRes1, HiRes2, Auger SD, Auger hybrid, Kascade.)

Low energy cosmic rays up to about 10 GeV are modified by solar flares. The origin of high-energy cosmic rays is unknown. Top-down models propose them to be decay products of topological defects formed in the very early universe; however, these models are disfavored because they predict a photon flux that has already been ruled out by EGRET and Auger.[8] Bottom-up models on the other hand explain high-energy cosmic rays by acceleration of lower energy ones, e.g., by processes called Fermi acceleration.

Second order Fermi acceleration relies on the random elastic scattering of charged particles at moving magnetic inhomogeneities such as sparse plasma clouds, see figure 2.3. If the particle enters the inhomogeneity head-on and is reflected, its energy increases. Likewise, it loses energy if it is reflected at a receding inhomogeneity, but assuming a random velocity distribution this is less likely to happen. The resulting relative energy gain per reflection as a function of the entry angle θ_1, the exit angle θ_2 and the cloud’s speed v_c ≡ cβ_c for relativistic particles (β_p ≈ 1) is

\[ \xi := \frac{\Delta E}{E} = \frac{1 - \beta_c \cos\theta_1 + \beta_c \cos\theta_2 - \beta_c^2 \cos\theta_1 \cos\theta_2}{1 - \beta_c^2} - 1. \tag{2.1} \]

Averaging equation (2.1) over θ_1 and θ_2 leads to

\[ \xi = \frac{1 + \frac{1}{3}\beta_c^2}{1 - \beta_c^2} - 1 = \frac{4}{3}\beta_c^2 + O(\beta_c^4). \tag{2.2} \]

The speed of galactic plasma clouds is only of the order of β_c = 10^-4; therefore, second order Fermi acceleration is too inefficient to explain the highest energy cosmic rays.[3] This does not apply to first order Fermi acceleration: Instead of considering plasma clouds, first order Fermi acceleration takes place at large and sufficiently planar shock fronts such as those originating from supernovae (see figure 2.3). If a shock front traveling at β_s through resting unshocked plasma (upstream) accelerates the plasma to β_g < β_s (downstream), equation (2.1) holds for particles passing the shock front from the upstream side as their average speeds adapt to the average speed of the shocked plasma after enough elastic scattering processes, with β_c := β_g. As the particles are now on average comoving with the shocked plasma, they will gain the same amount of energy by a second crossing of the shock front, with the unshocked plasma approaching at β_g in the shocked plasma’s rest frame. Averaging over the angles yields

\[ \xi = \frac{1 + \frac{4}{3}\beta_g + \frac{4}{9}\beta_g^2}{1 - \beta_g^2} - 1 = \frac{4}{3}\beta_g + O(\beta_g^2). \tag{2.3} \]

First order Fermi acceleration is substantially more efficient than the second order variant not only because of ξ being proportional to the first order of β_g but also because of larger average speeds (β_g ≈ 10^-1) and higher repetition rates: the same shock front can be passed many times.

Figure 2.3: Left: Second order Fermi acceleration at magnetic plasma clouds. Right: First order Fermi acceleration at shock fronts. (Sketch labels: E, E + ΔE, θ_1, θ_2, v_c, v_s, v_g, plasma cloud, upstream (unshocked), downstream (shocked).)
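As a numerical cross-check of equations (2.1) to (2.3), the following Python sketch evaluates the energy gain at the mean entry and exit cosines. It relies on angular averages that are not spelled out in the text and are assumed here from the standard derivation: ⟨cos θ_1⟩ = −β_c/3 and ⟨cos θ_2⟩ = 0 for plasma clouds, and ⟨cos θ_1⟩ = −2/3 and ⟨cos θ_2⟩ = 2/3 for a plane shock front. Because equation (2.1) is linear in each cosine and the two angles are independent, inserting the mean cosines gives the averaged gain. All function names are illustrative.

```python
def energy_gain(beta, cos_theta_1, cos_theta_2):
    """Relative energy gain per reflection, equation (2.1)."""
    numerator = (1.0 - beta * cos_theta_1 + beta * cos_theta_2
                 - beta ** 2 * cos_theta_1 * cos_theta_2)
    return numerator / (1.0 - beta ** 2) - 1.0


def mean_gain_second_order(beta_c):
    """Averaged gain at plasma clouds; assumed means: -beta_c/3 and 0."""
    return energy_gain(beta_c, -beta_c / 3.0, 0.0)


def mean_gain_first_order(beta_g):
    """Averaged gain at a plane shock front; assumed means: -2/3 and +2/3."""
    return energy_gain(beta_g, -2.0 / 3.0, 2.0 / 3.0)


if __name__ == "__main__":
    beta_c = 1e-4   # typical speed of galactic plasma clouds (from the text)
    beta_g = 1e-1   # typical speed of shocked plasma (from the text)
    second = mean_gain_second_order(beta_c)
    first = mean_gain_first_order(beta_g)
    print(f"second order: {second:.3e} (leading term {4 / 3 * beta_c ** 2:.3e})")
    print(f"first order:  {first:.3e} (leading term {4 / 3 * beta_g:.3e})")
    print(f"ratio first/second: {first / second:.1e}")
```

With the speeds quoted in the text, the gain per cycle differs by roughly seven orders of magnitude between the two mechanisms, before the higher repetition rate at shock fronts is even taken into account.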

The escape probability can be estimated as P_esc = 4(β_s − β_g).[3] Using this, the expected spectral index for non-relativistic shocks at the source is

\[ \alpha = \frac{P_\mathrm{esc}}{\xi} + 1 \approx \frac{3(\beta_s - \beta_g)}{\beta_g} + 1. \tag{2.4} \]

For strong shocks in a monatomic gas or plasma this evaluates to α ≈ 2.[3] For highly 

relativistic shocks, the spectral index approaches 2.3.[9] The observed cosmic ray spectrum 

with α ≈ 2.7 can be understood with (first order) Fermi acceleration if the effect of the 

propagation is considered: Certain propagation models such as the leaky box model predict 

a softening of the spectrum because higher energy particles are more likely to escape the 

galaxy in which they are accelerated. 
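The value α ≈ 2 for strong shocks can be reproduced from equation (2.4) with a short Python sketch. Two textbook relations that are not part of the text are assumed: the compression ratio of a strong shock, r = (γ + 1)/(γ − 1), which gives r = 4 for a monatomic gas with adiabatic index γ = 5/3, and the resulting downstream plasma speed β_g = β_s (1 − 1/r). The function names are illustrative.

```python
def compression_ratio(adiabatic_index=5.0 / 3.0):
    """Strong-shock compression ratio (assumed textbook relation)."""
    return (adiabatic_index + 1.0) / (adiabatic_index - 1.0)


def downstream_speed(beta_s, ratio):
    """Speed of the shocked plasma for a shock front moving at beta_s
    (assumed textbook relation)."""
    return beta_s * (1.0 - 1.0 / ratio)


def source_spectral_index(beta_s, beta_g):
    """Spectral index at the source, approximation of equation (2.4)."""
    return 3.0 * (beta_s - beta_g) / beta_g + 1.0


if __name__ == "__main__":
    ratio = compression_ratio()            # 4 for a monatomic gas
    beta_s = 0.1                           # illustrative shock speed
    beta_g = downstream_speed(beta_s, ratio)
    alpha = source_spectral_index(beta_s, beta_g)
    print(f"r = {ratio:.2f}, beta_g = {beta_g:.3f}, alpha = {alpha:.2f}")
```

Under these assumptions the result does not depend on the shock speed itself, since β_s cancels in the ratio (β_s − β_g)/β_g.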

Cosmic rays with energies up to the knee are assumed to originate from galactic accelerators such as supernova remnants, while ultra-high energies above the ankle are attributed to extra-galactic sources which still have to be determined. The spectral softening in the “lower leg” energy region might be explained by propagation models or cut-off energies as mentioned above. The transition between galactic and extra-galactic cosmic rays remains unknown.

Figure 2.4: Fraction A of the sky with cosmic ray deflection angle δ > δ_th at 4·10^10 GeV, extrapolated to the propagation distance d = 500 Mpc under the assumption that A(δ_th, d) = A_0(δ_th · (d_0/d)^α). Between 70 Mpc and 110 Mpc, α equals 0.8; however, it might drop to lower values because the magnetic field’s orientation differs between galaxy filaments.[10]

Cosmic rays are interesting messenger particles because they carry information about 

the composition of their sources (primary cosmic rays) and the properties of interstellar 

matter (secondary cosmic rays). However, only ultra-high-energy cosmic rays can be

used to directionally locate the source objects: Primary cosmic rays are charged, thus

they are subject to electro-magnetic deflection. A particle’s gyroradius is given by

\[ r = \frac{p_\perp}{qB} \approx 3.3\,\mathrm{m}\;\frac{E}{\mathrm{GeV}}\,\frac{e}{q}\,\frac{\mathrm{T}}{B} \tag{2.5} \]

with p_⊥ being the particle’s momentum perpendicular to the magnetic field B⃗. For most

parts of the sky, the average deflection angle for particles near the GZK cutoff does not 

exceed a few degrees over at least 500 Mpc, see figure 2.4. 
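Equation (2.5) is easy to evaluate numerically. The Python sketch below does so for a proton near the GZK cutoff; the magnetic field strength of 10^-13 T (1 nG) is a purely hypothetical value chosen for illustration, and the function names are not taken from any existing package.

```python
METERS_PER_MPC = 3.0857e22  # one megaparsec in metres


def gyroradius_m(energy_gev, field_tesla, charge_e=1.0):
    """Gyroradius in metres from the approximation in equation (2.5).

    energy_gev: particle energy in GeV (relativistic particle whose
                momentum is perpendicular to the field)
    field_tesla: magnetic field strength in tesla
    charge_e: particle charge in units of the elementary charge
    """
    if field_tesla <= 0 or charge_e <= 0:
        raise ValueError("field strength and charge must be positive")
    return 3.3 * energy_gev / (charge_e * field_tesla)


def gyroradius_mpc(energy_gev, field_tesla, charge_e=1.0):
    """Same as gyroradius_m, converted to megaparsecs."""
    return gyroradius_m(energy_gev, field_tesla, charge_e) / METERS_PER_MPC


if __name__ == "__main__":
    # Proton at 4e10 GeV in a hypothetical 1e-13 T (1 nG) field.
    print(f"proton: {gyroradius_mpc(4e10, 1e-13):.0f} Mpc")
    # Iron nucleus (charge 26 e) with the same energy and field.
    print(f"iron:   {gyroradius_mpc(4e10, 1e-13, charge_e=26.0):.1f} Mpc")
```

The comparison between proton and iron shows how strongly the deflection depends on the charge of the primary particle at fixed energy.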



2.1.3 Gravitational Waves 

The carrier of the gravitational force, called the graviton, has not yet been directly detected

and might never be due to fundamental difficulties such as extremely small cross-sections, 

very low fluxes and a very strong neutrino background.[11] Still, gravitation can be used

for astronomical purposes by searching for gravitational waves, which are distortions of

spacetime traveling at the speed of light. They can only be emitted by sources with time-dependent

quadrupole or higher-order moments in their stress-energy tensor; candidates

are closely rotating or colliding objects such as neutron stars or black holes. 

Up to September 2009 no gravitational waves have been identified, but competitive 

upper limits have been published by various earth-bound detectors.[12] Space-based detectors 

such as LISA are planned to extend the observed frequency range within the next few 

decades. Compared to neutrino observatories, they are more limited concerning the possible 

source objects, but they can deliver complementary information and provide tests for 

general relativity and possibly other models of gravity. Both gravitational waves and neutrinos 

are able to propagate through regions that are not transparent to electro-magnetic 

radiation. 

2.2 Neutrino Production Processes 

2.2.1 Nuclear Processes 

Stars in their nuclear fusion phase continuously produce large numbers of neutrinos. For stars with masses from 2 up to at least 20 solar masses ($M_\odot$), the total number of neutrinos

produced by fusion until the end of the helium burning phase in the core of the star is 

approximately proportional to the star’s initial mass (figure 2.5). Later fusion phases 

that can occur in stars heavier than 4 solar masses contribute only about one additional 

tenth to this number as none of the main processes besides the silicon burning require 

proton-neutron-conversions or similar weak interactions.[14] 

Some neutrinos originate from processes with well-defined initial states and two-body final 

states such as electron capture. These neutrinos are emitted at discrete energies, while 

others are produced with many-body final states (e.g. by beta decay) and have continuous 

energy spectra. Independent of this, their highest energies are of the order of some MeV 

because this is the typical energy scale of nuclear processes. 

2.2.2 Thermal Cooling 

In addition to the neutrinos directly produced by nuclear fusion, the extreme temperatures and densities inside stars allow the emission of thermal or cooling neutrinos by various processes.[15] The resulting neutrinos’ energies directly depend on the temperature. Silicon burning is the last and hottest fusion phase and takes place at about $3.5\cdot10^{9}$ K; stars below about 4 $M_\odot$ never reach the last fusion phases but instead gravitationally collapse to white dwarfs when their fuel is burnt and heat up to well over $100\cdot10^{6}$ K.[16] Proto-neutron stars’ core temperatures can approach $1\cdot10^{12}$ K for a couple of seconds, quickly cooling down to below 100 GK. Using this information and the Boltzmann constant $k_B = 86.17$ eV/MK, the cooling neutrinos’ energies can be estimated to almost never exceed several MeV.

Figure 2.5: Total number of neutrinos produced until the ignition (dashed line) and until the end (solid line) of the helium burning phase in population I stars as a function of the initial star mass in solar masses.[13] (Axes: initial mass $M/M_\odot$, number of neutrinos; plot labels: pp chain, CNO cycle.)
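The estimate above can be retraced numerically. The following Python sketch converts some of the temperatures quoted in the text into the characteristic thermal energy $k_B T$; it is a minimal illustration under the assumption that $k_B T$ sets the energy scale of the cooling neutrinos, and the function and dictionary names are hypothetical.

```python
# Characteristic thermal energy k_B * T for temperatures quoted in the text.
# Assumption: k_B * T is only an order-of-magnitude scale, not a mean neutrino energy.

K_B_EV_PER_MK = 86.17  # Boltzmann constant in eV per megakelvin (value from the text)


def thermal_energy_mev(temperature_k: float) -> float:
    """Return k_B * T in MeV for a temperature given in kelvin."""
    energy_ev = K_B_EV_PER_MK * (temperature_k / 1e6)
    return energy_ev / 1e6


TEMPERATURES_K = {
    "hot white dwarf": 100e6,
    "proto-neutron star, first seconds": 1e12,
    "proto-neutron star, after cooling": 100e9,
}

if __name__ == "__main__":
    for label, temperature in TEMPERATURES_K.items():
        print(f"{label}: k_B*T = {thermal_energy_mev(temperature):.4g} MeV")
```

Only the short-lived proto-neutron star phase near $10^{12}$ K reaches thermal energies far beyond a few MeV, in line with the statement that several MeV are almost never exceeded.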

2.2.3 Cosmic Ray Interactions 

The most probable sources of high-energy (> 1 GeV) neutrinos are interactions of cosmic rays. Various reaction channels exist; among the most important are those with charged pion production,

$p + p \to X + \pi^{+}$

$p + p \to n + \Delta^{++} \to n + p + \pi^{+}$

$p + p \to p + \Delta^{+} \to p + n + \pi^{+}$

$p + n \to p + \Delta^{0} \to p + p + \pi^{-}$

$p + \gamma \to \Delta^{+} \to n + \pi^{+}$

and, for higher energies, those with kaon production.

The resulting mesons themselves decay quickly by

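$K^{+} \to \mu^{+} + \nu_\mu$ (63.55%)

$K^{+} \to \pi^{+} + \pi^{0}$ (20.66%)

$K^{+} \to \pi^{+} + \pi^{+} + \pi^{-}$ (5.59%)

$K^{+} \to \pi^{0} + e^{+} + \nu_e$ (5.07%)

$K^{0}_{S} \to \pi^{+} + \pi^{-}$ (69.20%)

$K^{0}_{L} \to \pi^{\pm} + e^{\mp} + \overset{(-)}{\nu}_e$ (40.55%)

$K^{0}_{L} \to \pi^{\pm} + \mu^{\mp} + \overset{(-)}{\nu}_\mu$ (27.04%)

$K^{0}_{L} \to \pi^{+} + \pi^{-} + \pi^{0}$ (12.54%)

$\pi^{+} \to \mu^{+} + \nu_\mu$ (99.99%).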

Only decay modes with likely neutrino output and a branching ratio ≥ 5% have been listed; modes for negatively charged particles exist correspondingly.[17] The neutrino production

efficiency is lower in regions of high density because the particles might interact with other 

particles before they decay. 

The decay of charged pions to electrons is highly suppressed in favor of the decay to muons because the latter’s helicity can be changed by boosts more easily. The muons almost exclusively decay into electrons by $\mu^{+} \to e^{+} + \nu_e + \bar{\nu}_\mu$. Tau neutrino production is dominated by the decay of $D_s^{\pm}$ mesons, which themselves have a low production cross-section. Therefore, the ratios between the expected neutrino flavors near the interaction point can be roughly estimated to be $(\overset{(-)}{\nu}_e : \overset{(-)}{\nu}_\mu : \overset{(-)}{\nu}_\tau) = (1 : 2 : 0)$.
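The flavor ratio follows from simply counting the neutrinos of the dominant decay chain. The sketch below tallies them for one charged pion and its daughter muon; it counts neutrinos and antineutrinos together and ignores kaons, $D_s$ mesons and all kinematics. The names used are hypothetical.

```python
from collections import Counter

# Neutrinos emitted per decay, counting neutrinos and antineutrinos together.
PION_DECAY = ["nu_mu"]           # pi+ -> mu+ + nu_mu
MUON_DECAY = ["nu_e", "nu_mu"]   # mu+ -> e+ + nu_e + anti-nu_mu


def flavor_counts_per_pion() -> Counter:
    """Tally the neutrino flavors produced by one pion and the subsequent muon decay."""
    counts = Counter({"nu_e": 0, "nu_mu": 0, "nu_tau": 0})
    counts.update(PION_DECAY)
    counts.update(MUON_DECAY)
    return counts


if __name__ == "__main__":
    counts = flavor_counts_per_pion()
    print(counts["nu_e"], ":", counts["nu_mu"], ":", counts["nu_tau"])
```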

2.3 Air Showers 

High-energy cosmic rays and gamma rays which enter the Earth’s atmosphere generate a large number of particles in the form of atmospheric cascades called air showers, see figure 2.6. Particularly important are atmospheric muons and atmospheric neutrinos. Their understanding is vital for all neutrino telescopes; they constitute the signal for some analyses and the major background source for the others.

The reactions discussed in section 2.2.3 can also occur in the upper atmosphere. The energy spectrum for atmospheric muons and neutrinos follows a power law of $E^{-3.7}$ at lower energies; the reason is that a meson’s probability to interact with the atmosphere instead of decaying increases with the meson’s lifetime $\tau = \gamma\tau_0 \propto E$. At high energies a prompt decay component from charmed mesons sets in, hardening the spectrum to the primary cosmic ray spectrum of $E^{-2.7}$ (figure 2.7).[19]

Figure 2.6: Two extraterrestrial muon neutrinos (dashed lines, entering from the left) and two cosmic rays hitting the Earth. One neutrino passes undetected, the other interacts inside the detector. The upper air shower produces an atmospheric neutrino which is detected and an atmospheric muon which is absorbed by the Earth; the lower air shower’s atmospheric muon is detected.

Figure 2.7: Vertical atmospheric muon (left) and muon neutrino (right) differential fluxes (including anti-particles) weighted with $E^{-3}$. Dashed lines indicate alternative models.[18] (Curve labels: π± and K± decay, secondary decay, prompt charmed meson decay.)


2.3.1 Atmospheric Muons 

Despite their short mean lifetime of about 2.2 µs, high-energy muons produced in the upper atmosphere at a height of typically 20 km can reach the Earth’s surface because of the time dilation or length contraction (depending on the frame of reference) explained by special relativity. With their high penetration potential, which is further discussed in section 3.2.2, they are a major component of the background for neutrino telescopes. This background can be significantly reduced by “looking downwards”, i.e. zenith angle cuts, as muons – in contrast to neutrinos – are not able to pass more than a few kilometers through the Earth; see figure 2.8.

Figure 2.8: Zenith distribution of atmospheric muons (solid) and atmospheric muon neutrinos (dashed), simulated for IceCube’s precursor AMANDA.[20] (Horizontal axis: cos(θ).)

2.3.2 Atmospheric Neutrinos

For most analyses, atmospheric neutrinos are background. Except for their energy spectrum, which differs from the expected signal spectrum, they are fundamentally indistinguishable from neutrinos of extraterrestrial origin. However, down-going atmospheric neutrinos can be tagged by searching for coincident muons in the detector.[21]

Analyses which involve atmospheric neutrinos as signal include neutrino oscillation studies. For those, atmospheric neutrinos are of special interest because of their long baselines (up to the diameter of the Earth) compared to reactor neutrinos or artificial neutrino beams from particle accelerators, making these measurements complementary.

CHAPTER III 

Neutrino Detection


Figure 3.1: Cross-sections for scattering of neutrinos (solid) and antineutrinos (dashed) from isoscalar nucleons¹ at energies between $10^{3}$ GeV and $10^{12}$ GeV. The results are based on HERA parton distribution function measurements and have been extrapolated above $10^{7}$ GeV (CTEQ4-DIS).[22][23]

3.1 Neutrino Interactions 

Neutrinos are electrically neutral leptons with spin $\frac{1}{2}$. They only interact via weak and gravitational forces and come in three weak eigenstates called flavors, named after the charged lepton with which the respective neutrino eigenstate can interact by a charged current interaction.

According to the Standard Model, neutrinos are massless particles; however, it is widely accepted that this is not accurate since many experiments verified the existence of neutrino oscillations, which can occur because the neutrino mass eigenstates are pairwise different and not congruent with the weak eigenstates.

Because of their low cross-sections, neutrinos can travel over huge distances through ordinary matter almost without attenuation up to very high energies (figure 3.1). Only at about $10^{6}$ GeV does the Earth’s core start to become opaque to neutrinos.

The relevant processes involving neutrinos are neutral current (NC) and charged current 

(CC) interactions. 

Neutral current interactions – i.e., interactions which are mediated by a $Z^{0}$ boson – are the same for the three flavors and constitute about 10% of the charged current cross-sections.[22] The most important NC interactions are those with the nucleons N because they have much larger cross-sections than interactions with the matter’s electrons:

$\nu_a + N \to \nu_a + X, \quad a \in \{e, \mu, \tau\}$

X is the hadronic remnant of the nucleon, which will trigger a hadronic cascade. The neutrino escapes with a large fraction of its initial energy.

Charged current interactions are mediated by $W^{\pm}$ bosons and differ for the different flavors mainly in the produced leptons $a^{\pm}$. As above, interactions with the nucleons make up the largest part:

$\nu_a + N \to a^{-} + X$

$\bar{\nu}_a + N \to a^{+} + X$

The initial direction of the lepton is well aligned with the neutrino’s track; the average angle mismatch can be estimated as $\psi = 0.7^{\circ} \cdot \left(\frac{10^{3}\,\mathrm{GeV}}{E_\nu}\right)^{0.7}$.[24]

For both NC and CC up to about $10^{7}$ GeV, the cross-sections for antineutrinos are below those of their neutrino counterparts because of the valence quark distribution. This effect becomes negligible for higher energies where the PDFs are dominated by sea quarks (and gluons, which, however, do not interact via the weak force).

Gravitational effects can largely be ignored. Highly relativistic neutrinos, independent of their mass, are subject to gravitational lensing just as photons are; in fact, they can even be lensed more strongly by certain objects as they are able to pass most electromagnetically opaque regions without absorption.[25] However, in most cases the angle of deflection is very small compared to the angular resolution of current neutrino telescopes.

The probability for a detector signal being caused by direct gravitational interactions is negligible in the Standard Model but increases under assumptions such as the existence of compact extra dimensions. In the latter case, one could expect an increased ratio of elastic scattering (at impact parameters b larger than the Schwarzschild radius $r_S$) and cascades without primary lepton through black-hole creation (at $b < r_S$).[26]

¹ The parton distribution function (PDF) of an isoscalar nucleon (a hadronic singlet state with isospin $I = I_3 = 0$) is the arithmetic mean of those of a proton ($I_3 = +\frac{1}{2}$) and a neutron ($I_3 = -\frac{1}{2}$), or half the PDF of a deuteron, which is the true isospin singlet. As we can neglect electrical charges, it is a good approximation of the average nucleon PDF.

3.2 Lepton Propagation

The charged lepton produced in charged current interactions is the most important signature

for most types of neutrino detectors. Common to all flavors is the hadronic cascade at 

the interaction point, which contains roughly 40% of the neutrino energy. The signatures 

of the primary leptons inside the detector vary for the different flavors (see figure 3.2). 


Figure 3.2: Sketch illustrating the signatures of charged current interactions of neutrinos in 

Čerenkov media as described in sections 3.2 and 3.3. Note that every hadronic 

cascade also has an electromagnetic component. 

Figure 3.3: Depth development of electron initiated cascades in numbers of particles vs. depth 

in units of radiation lengths in ice. Shown are simulated curves for the sum of 

electrons and positrons (blue), the excess of electrons (red), and theory predictions 

(dashed, Approximation-B) at 1, 10, and 100 TeV (down to up).[27] 

Figure 3.4: Effective radiation length $\lambda \cong X_{LPM}$ modified by the LPM effect.[28]

3.2.1 Electron Propagation 

Electrons in dense matter quickly lose energy through bremsstrahlung. The generated photons convert to new electrons and positrons via pair production, resulting in an exponential growth of the particle number until the photons’ energies fall below $2m_e$ and the leptons’ energy falls below the threshold at which ionization losses become dominant, called the critical energy. For ice, this critical energy is $E^{e}_{crit} = 81$ MeV.[29] Since the photon and electron mean free paths are of the same order of magnitude, the shower depth X at a given initial energy $E_0$ up to about 1 PeV can be approximated by

$X = X_0 \log_2\left(\frac{E_0}{E_{crit}}\right)$

with the radiation length $X_0$ (36.08 g/cm² in ice). As the cascade is highly boosted and therefore directed, its length can be estimated to be $L = 2X/\rho$, where ρ is the density of the material. For deep Antarctic ice (ρ = 0.924 g/cm³), this evaluates to $L = 0.78\,\mathrm{m} \cdot \log_2\left(\frac{E^{e}_{0}}{81\,\mathrm{MeV}}\right)$, which is of the order of 10 m.
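The cascade length estimate can be evaluated with a few lines of Python. The sketch below assumes that the column depth X (in g/cm²) is converted to a geometric length by dividing by the density, which gives a prefactor of about 0.78 m per doubling of the energy; the function names are hypothetical and the model is only the simple doubling picture described above, valid below about 1 PeV.

```python
import math

X0_ICE = 36.08       # radiation length in g/cm^2 (value from the text)
E_CRIT_MEV = 81.0    # critical energy for electrons in ice, in MeV
RHO_ICE = 0.924      # density of deep Antarctic ice in g/cm^3


def shower_depth_g_cm2(e0_mev: float) -> float:
    """Shower depth X = X0 * log2(E0 / E_crit) in g/cm^2."""
    return X0_ICE * math.log2(e0_mev / E_CRIT_MEV)


def cascade_length_m(e0_mev: float) -> float:
    """Cascade length L = 2 X / rho, converted from cm to m.

    Assumption: the column depth is turned into a length by dividing by the density.
    """
    return 2.0 * shower_depth_g_cm2(e0_mev) / RHO_ICE / 100.0


if __name__ == "__main__":
    for e0_tev in (1, 10, 100):
        print(f"E0 = {e0_tev} TeV: L = {cascade_length_m(e0_tev * 1e6):.1f} m")
```

Because the dependence on the energy is only logarithmic, the length stays of the order of 10 m over several decades in energy.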

Starting at about 1 PeV, the radiation length $X_0$ has to be replaced by the effective radiation length $X_{LPM}$ to account for the LPM effect: the formation length for bremsstrahlung and pair production interactions increases with the particle’s energy, resulting in the suppression of these interactions and therefore in significantly longer cascades (about 200 m at 10 EeV, see figure 3.4).[28]

3.2.2 Muon Propagation 

Because of the muon’s higher mass $m_\mu = 207\,m_e$, its energy loss due to bremsstrahlung is much smaller. Up to the critical energy ($E^{\mu}_{crit} \approx 500$ GeV in ice) ionization losses are most important; at higher energies electron pair production and radiation losses dominate. The total energy loss is approximately given by

$-\frac{dE}{dx} = a + bE$, (3.1)

where a(E) is the ionization loss given by the Bethe-Bloch equation and b(E)E a parametrization for the higher-energy processes; both parameters are fairly constant for $E > E^{\mu}_{crit}$.[30] For ice, this yields approximately

$-\frac{dE}{dx} = 0.25\,\mathrm{GeV/m} + 4.3\cdot10^{-4}\,E/\mathrm{m}$, [31]

leading to an expected track length of $L = 2300\,\mathrm{m} \cdot \ln\left(1 + \frac{E_0}{E^{\mu}_{crit}}\right)$ (figure 3.5), which is small compared to the length a muon can travel in its mean decay time at its critical energy, $l_\mu \approx 3$ Mm. Since the energy losses are stochastic, actual values vary, and often much energy is lost at once, triggering a cascade along the muon track. The scattering of high-energy muons is negligible.[31]

Figure 3.5: Expected track length for muons and tau particles in Antarctic ice according to the models explained in section 3.2.1 and section 3.2.2. (Axes: initial energy $E_0$/GeV, expected track length l/km.)
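The track length expression follows from integrating equation (3.1) with constant a and b from $E_0$ down to zero, which gives $L = \frac{1}{b}\ln\left(1 + \frac{b}{a}E_0\right)$. The Python sketch below evaluates this with the values quoted for ice; the function name is hypothetical, and the stochastic nature of the losses is ignored, so the result is only a mean expectation.

```python
import math

A_ION = 0.25     # ionization loss a in GeV/m (value from the text)
B_RAD = 4.3e-4   # radiative loss coefficient b in 1/m (value from the text)


def muon_track_length_m(e0_gev: float) -> float:
    """Mean track length from integrating -dE/dx = a + b*E between E0 and 0."""
    return math.log(1.0 + B_RAD * e0_gev / A_ION) / B_RAD


if __name__ == "__main__":
    print(f"1/b = {1.0 / B_RAD:.0f} m, a/b = {A_ION / B_RAD:.0f} GeV")
    for e0 in (1e2, 1e3, 1e4, 1e5, 1e6):
        print(f"E0 = {e0:.0e} GeV: L = {muon_track_length_m(e0) / 1000.0:.2f} km")
```

With these numbers the prefactor 1/b is about 2300 m, and the ratio a/b of roughly 580 GeV plays the role of the critical energy, of the same order as the value of about 500 GeV quoted above.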

3.2.3 Tau Propagation 

Much heavier than muons ($m_\tau = 16.8\,m_\mu$), tau particles lose even less energy when propagating through matter. Equation (3.1) holds for them as well, with a being very similar and $b \approx 3.6\cdot10^{-5}$/m.[31] Because of the tau’s short mean lifetime $\tau_\tau = 0.29$ ps, its detector signature is determined by the hadronic cascade at the neutrino interaction point and the decay of the tau, which usually triggers a second cascade. The branching ratio for $\tau^{-} \to \mu^{-} + \bar{\nu}_\mu + \nu_\tau$ is 17.4%; the remaining decays spawn at least one electron or charged meson.[17] The interaction points are separated by the tau decay length $l_\tau = 49\,\mathrm{m} \cdot \frac{E_0}{\mathrm{PeV}}$ (figure 3.5). Up to a few PeV, the resulting cascades overlap and are indistinguishable. For higher energies, both cascades are separated and connected by a faint Čerenkov track, forming a characteristic signature called double bang (illustrated in figure 3.2). If one of these cascades lies outside the detector volume, the resulting signature is called lollipop (resp. inverted lollipop).

Figure 3.6: Sketch illustrating the emission of Čerenkov radiation (blue) by a superluminal particle (track in red).
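The tau decay length is the time-dilated lifetime multiplied by the speed of light, $l_\tau = \gamma c \tau_\tau$. The following sketch reproduces the value of about 49 m per PeV from the lifetime and the mass ratios given in the text; the electron mass of 0.511 MeV is an added assumption (a standard value, not quoted in the text), energy losses before the decay are neglected, and the function name is hypothetical.

```python
C_LIGHT = 299_792_458.0      # speed of light in m/s
TAU_LIFETIME_S = 0.29e-12    # mean tau lifetime (value from the text)
ELECTRON_MASS_GEV = 0.511e-3 # assumption: standard electron mass, not from the text
# Mass ratios from the text: m_mu = 207 m_e and m_tau = 16.8 m_mu
TAU_MASS_GEV = 16.8 * 207 * ELECTRON_MASS_GEV


def tau_decay_length_m(e0_gev: float) -> float:
    """Mean decay length gamma * c * tau for a tau of energy E0, ignoring energy losses."""
    gamma = e0_gev / TAU_MASS_GEV
    return gamma * C_LIGHT * TAU_LIFETIME_S


if __name__ == "__main__":
    for e0_pev in (0.1, 1.0, 10.0):
        print(f"E0 = {e0_pev} PeV: l_tau = {tau_decay_length_m(e0_pev * 1e6):.1f} m")
```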

3.3 Čerenkov Detectors 

One of the most effective methods to detect neutrinos is the use of Čerenkov radiation. Čerenkov radiation is emitted whenever a charged particle travels through dielectric matter (the Čerenkov medium) at a speed $v = \beta c$ higher than the material’s phase speed of light $c_m = \frac{c}{n}$, where n denotes the material’s refractive index (n = 1.32 for ice). The particle travelling through the medium polarizes the surrounding atoms, which emit light when they return to equilibrium. If the speed of the particle exceeds $c_m$, the interference becomes constructive and a wave front of light is sent out at a distinctive angle $\theta_C = \arccos(n^{-1}\beta^{-1})$ called the Čerenkov angle (see figure 3.6). The light’s spectrum is continuous and its intensity is approximately linear in frequency, resulting in a blue appearance to the human eye. At x-ray frequencies the spectrum is cut off as the refractive index becomes smaller than one.

The rate of emitted photons can be calculated with the simplified Frank-Tamm formula:

$\frac{dN}{dx} = \int_{\lambda_{min}}^{\lambda_{max}} \frac{2\pi\alpha z^{2}}{\lambda^{2}} \left(1 - \frac{1}{n^{2}\beta^{2}}\right) d\lambda$, (3.2)

where $\alpha \approx \frac{1}{137}$ is the fine-structure constant and z the particle’s electric charge. The wavelength range given by $\lambda_{min} \ldots \lambda_{max}$ is determined by the optical properties of the Čerenkov medium.
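For a wavelength-independent refractive index, the integral in equation (3.2) can be solved in closed form, $\frac{dN}{dx} = 2\pi\alpha z^{2}\left(1 - \frac{1}{n^{2}\beta^{2}}\right)\left(\frac{1}{\lambda_{min}} - \frac{1}{\lambda_{max}}\right)$. The sketch below evaluates this together with the Čerenkov angle for ice. The constant index and the wavelength window of 300 nm to 600 nm are assumptions made for illustration, and the function names are hypothetical.

```python
import math

ALPHA = 1.0 / 137.036  # fine-structure constant
N_ICE = 1.32           # refractive index of ice (value from the text)


def cherenkov_angle_deg(n: float, beta: float = 1.0) -> float:
    """Cherenkov angle arccos(1 / (n * beta)) in degrees."""
    cos_theta = 1.0 / (n * beta)
    if cos_theta > 1.0:
        raise ValueError("particle is below the Cherenkov threshold")
    return math.degrees(math.acos(cos_theta))


def photons_per_metre(n: float, lam_min_m: float, lam_max_m: float,
                      beta: float = 1.0, z: float = 1.0) -> float:
    """Frank-Tamm photon yield per metre, assuming n does not depend on wavelength."""
    sin2_theta = 1.0 - 1.0 / (n * beta) ** 2
    return 2.0 * math.pi * ALPHA * z ** 2 * sin2_theta * (1.0 / lam_min_m - 1.0 / lam_max_m)


if __name__ == "__main__":
    print(f"Cherenkov angle in ice: {cherenkov_angle_deg(N_ICE):.1f} degrees")
    # Assumed wavelength window of 300 nm to 600 nm
    yield_per_m = photons_per_metre(N_ICE, 300e-9, 600e-9)
    print(f"photon yield: {yield_per_m / 100.0:.0f} photons per cm")
```

The yield of a few hundred photons per centimetre of track depends on the chosen window through the factor $1/\lambda_{min} - 1/\lambda_{max}$ and is therefore dominated by the short-wavelength cut-off.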

Čerenkov media have to be transparent for blue to ultraviolet light. Additionally, for 

neutrino detection they need to have reasonably high density and be available cheaply in 

large quantities. Air is not sufficiently dense, so neutrino interactions in air are rather 

unlikely; it can still be used for air shower detectors. H₂O, whether liquid or solid, is well suited for large detectors. Deep and clear lake or sea water generally involves less scattering (up to a factor of 10[32]) but higher absorption (about a factor of 2) than bubble-free ice. Additional challenges for water neutrino telescopes are the variable environment and a strong background of bioluminescence from various lifeforms as well as radioactive decays of unstable isotopes such as $^{40}$K.[33][34] The optical properties of deep Antarctic ice will be examined in section 4.2.

CHAPTER IV 

The IceCube Neutrino Observatory


4.1 Layout 

The IceCube Neutrino Observatory is a combined air shower and neutrino detector located 

at the Amundsen-Scott South Pole Station at the geographic South Pole. It consists of 

three main parts, InIce, DeepCore, and IceTop (figure 4.2): 

InIce is an underground Čerenkov detector which uses about one cubic kilometer of Antarctic ice as neutrino interaction and Čerenkov medium, making it the world’s largest neutrino detector at the time of writing. In its final configuration, it will consist of 80 long cables called strings with 60 digital optical modules (DOMs) each. The strings are lowered into hot-water-drilled holes in the ice sheet, with the lowest DOMs positioned around 2450 meters below the surface and a vertical spacing of about 17 m, adding up to 1000 m of instrumented length. After deployment of each string, the water inside the hole refreezes, making recovery and maintenance impossible but also optically coupling the DOMs to the surrounding ice. The horizontal spacing between the strings is approximately 125 m in a hexagonal pattern, covering an area of approximately one square kilometer (figure 4.1). Because of this geometry and the ice properties elaborated in section 4.2, InIce’s lower energy threshold is about 100 GeV.

Figure 4.1: IceCube surface geometry with distances between the strings; the lighter a circle’s 

color, the earlier the string has been deployed. Black circles are strings scheduled 

for deployment in summer 2010/2011, small dots are IceTop tanks.[35] 


Figure 4.2: Three-dimensional sketch of IceCube, with DeepCore marked in green. Also shown 

are the dust layer at (2000 . . . 2100) m depth (see section 4.2), the ground below 

the ice sheet at about 2850 m depth, and an image of the Eiffel Tower for size 

comparison.[36] 



DeepCore is a low-energy extension to this detector. Located at its lower center, it consists of 6 extra strings with 60 high-efficiency DOMs each (section 4.3). In contrast to the standard strings, they are more densely instrumented in two groups of 10 and 50 DOMs each with an inner-group spacing of 10 m and 7 m respectively, and a gap of 257 m between the two groups to take advantage of the clear ice in these regions (section 4.2). These strings have been deployed in a half-sized hexagon centered at standard string 36. The two standard strings 79 and 80 have not been deployed at their original positions and have been proposed to be installed inside the DeepCore volume to make it even more densely instrumented. The energy threshold of the combined InIce+DeepCore detector will be around 10 GeV. In its task as low-energy extension, DeepCore supersedes IceCube’s now decommissioned precursor, the Antarctic Muon And Neutrino Detector Array (AMANDA), which took data from 1994 to 2009 and which was integrated into IceCube in 2005.

IceTop is an air shower Čerenkov detector which in its final configuration will comprise 160 polyethylene tanks at the ice sheet’s surface, each of which encloses a volume of 2.5 m³ of bubble-free ice observed by two DOMs. It will be capable of reliably detecting air showers with a primary particle energy threshold of about 150 TeV.[37] Besides conducting interesting experiments by itself, it is planned to be used for background rejection in neutrino analyses.

For this thesis InIce and DeepCore form the relevant part of IceCube as IceTop uses 

its own feature extraction algorithms. Therefore “InIce+DeepCore” and “IceCube” will 

be used interchangeably in the following chapters if not stated otherwise. 

In the Antarctic winter, temperatures regularly fall below −65 °C, making the operation of most machines impossible. Because of this, IceCube had different string configurations for data-taking during the months in which deployment had to be suspended. Construction of IceCube began with one experimental string during 2005, followed by eight additional ones in the 2005-2006 Antarctic summer. This configuration – following the same naming scheme as the later ones – was called IC9. Together with the thirteen strings deployed during the summer of 2006-2007, it formed IC22. During the next summer eighteen strings were added, resulting in the detector configuration IC40. In the summer between 2008 and 2009 eighteen more standard strings and one DeepCore string were deployed (IC58+1 or IC59), followed by twenty more strings during the 2009-2010 deployment season (IC73+6 or IC79). With DeepCore already completed, the final detector will be fully available in 2011.
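The configuration names are simply running totals of the deployed strings. The short Python sketch below accumulates the numbers given above; the split of the twenty strings of 2009-2010 into 15 standard and 5 DeepCore strings is inferred from the name IC73+6, and the data structure is hypothetical.

```python
# (season, standard strings added, DeepCore strings added), as listed in the text.
# The 15/5 split of the twenty strings of 2009-2010 is inferred from "IC73+6".
DEPLOYMENTS = [
    ("2005", 1, 0),
    ("2005-2006", 8, 0),
    ("2006-2007", 13, 0),
    ("2007-2008", 18, 0),
    ("2008-2009", 18, 1),
    ("2009-2010", 15, 5),
]


def configurations(deployments):
    """Yield (season, standard total, DeepCore total) after each deployment season."""
    standard = deepcore = 0
    for season, n_standard, n_deepcore in deployments:
        standard += n_standard
        deepcore += n_deepcore
        yield season, standard, deepcore


if __name__ == "__main__":
    for season, standard, deepcore in configurations(DEPLOYMENTS):
        print(f"after {season}: IC{standard + deepcore} "
              f"({standard} standard + {deepcore} DeepCore)")
```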

4.2 Ice Properties 

IceCube’s and especially DeepCore’s geometry are heavily influenced by the properties of the Antarctic ice. The experiment’s location was chosen because of the depth and purity of the naturally available ice and because of the good infrastructure provided by the Amundsen-Scott South Pole Station.[38] The two most important optical properties are the effective scattering length $\lambda_e$ and the absorption length $\lambda_a$.

Figure 4.3: Inverse effective scattering length (left) and inverse absorption length (right) vs. depth in deep Antarctic ice measured with pulsed sources at different wavelengths. A, B, C and D denote major dust peaks. Values for deep ice (below D) might be overestimated, see section 4.2.[39]

The scattering length $\lambda_s$ for IceCube is defined as the mean free path between scattering processes. Scattering in ice is strongly forward-peaked: the average cosine of the angle between the unscattered path and the scattered path is $\langle\cos(\theta)\rangle \approx 0.94 \gg 0$. Because of this, the effective scattering length is defined as

$\lambda_e := \lambda_s \sum_{i=0}^{n} \langle\cos(\theta)\rangle^{i} \xrightarrow{n\to\infty} \frac{\lambda_s}{1 - \langle\cos(\theta)\rangle}$.

This is an estimate for the length after which a beam of light has become fairly isotropic.[39]
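The limit is that of a geometric series, which the following Python sketch checks numerically by comparing partial sums with the closed form. The scattering length of 2 m used in the example is a purely hypothetical value, and the function name is an assumption.

```python
MEAN_COS = 0.94  # average cosine of the scattering angle (value from the text)


def effective_scattering_length(lambda_s: float, mean_cos: float, n_terms=None) -> float:
    """Effective scattering length, either as partial sum up to i = n_terms or as the limit."""
    if n_terms is None:
        return lambda_s / (1.0 - mean_cos)  # closed form for n -> infinity
    return lambda_s * sum(mean_cos ** i for i in range(n_terms + 1))


if __name__ == "__main__":
    lambda_s = 2.0  # hypothetical scattering length in metres
    for n in (1, 10, 50, 200):
        print(f"n = {n:3d}: {effective_scattering_length(lambda_s, MEAN_COS, n):.2f} m")
    print(f"limit  : {effective_scattering_length(lambda_s, MEAN_COS):.2f} m")
```

With $\langle\cos(\theta)\rangle \approx 0.94$, the effective scattering length is about 17 times the scattering length itself.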

The scattering inside the upper ice layers is dominated by microscopic trapped air bubbles. Because of the enormous pressure at great depths, the bubbles are compressed steadily until at about 1400 m scattering on dust particles begins to prevail (figure 4.3). Below, there are regions of higher dust concentrations which have been designated as dust peaks A to D. The same structures have been identified by ice core measurements in other regions and correlate well with cold periods (stadials) during the last glacial period of the current ice age. The strongest of those peaks – peak D, also known as the dust layer – is approximately 65000 years old and features very short effective scattering lengths of less than 10 m; because of this, DeepCore was designed with DOMs above and below this region, but without DOMs within.

The absorption length – defined as the length after which the light intensity has dropped to a fraction $e^{-1}$ of its initial value – is significantly longer than the effective scattering length (about 100 m compared to about 30 m, see figure 4.3). At green wavelengths the main contribution is the absorption by the ice itself; at shorter wavelengths dust particles dominate and the dust peaks can be identified accordingly.


The deep ice below the dust layer, starting at a depth of 2100 m, has not yet been studied as well as the ice above because the ice properties were calculated mostly based on data measured with AMANDA. Only three AMANDA strings reached below the dust layer, the deepest being string 11 with a maximum depth of 2136 m. As well-defined light signals are needed for this type of measurement, various calibrated light sources have been used: AMANDA optical modules (OMs) had diffuser balls attached which were coupled to a frequency-doubled Nd:YAG laser at the surface; two nitrogen lasers were deployed near the center of the detector; eight OMs were equipped with flasher boards containing six blue LEDs each, and one string was equipped with ultraviolet LED flasher boards for every OM. In addition to these pulsed light sources, two continuous sources with a wider spectrum were deployed inside the ice.

All IceCube DOMs contain similar flasher boards with twelve LEDs each, see section 4.3. Recent IceCube measurements have shown that the deep ice might be clearer than previously thought, with $\lambda_e \approx 50$ m and $\lambda_a \approx 200$ m at blue wavelengths (405 nm).

The resulting propagation length $\lambda_p$ acts as damping constant in the fluence development of light sources in the diffuse regime, i.e., after a few scattering processes:

$\lambda_p := \sqrt{\frac{\lambda_e \lambda_a}{3}}$ with $F(d \gtrsim 5\lambda_e) \propto \frac{1}{d}\, e^{-d/\lambda_p}$.[39]
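For the deep-ice values quoted above, the propagation length can be evaluated directly. The Python sketch below computes $\lambda_p$ and the relative fluence in the diffuse regime; the distances chosen in the example are arbitrary, the fluence is only defined up to a constant factor, and the function names are hypothetical.

```python
import math


def propagation_length(lambda_e: float, lambda_a: float) -> float:
    """Propagation length sqrt(lambda_e * lambda_a / 3), in the units of the inputs."""
    return math.sqrt(lambda_e * lambda_a / 3.0)


def relative_fluence(d: float, lambda_p: float) -> float:
    """Fluence up to a constant factor, exp(-d / lambda_p) / d, for the diffuse regime."""
    return math.exp(-d / lambda_p) / d


if __name__ == "__main__":
    lam_p = propagation_length(50.0, 200.0)  # deep-ice values from the text, in metres
    print(f"lambda_p = {lam_p:.1f} m")
    reference = relative_fluence(250.0, lam_p)  # start at d = 5 * lambda_e
    for d in (250.0, 300.0, 400.0, 500.0):
        print(f"d = {d:.0f} m: relative fluence = {relative_fluence(d, lam_p) / reference:.3g}")
```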

4.3 Digital Optical Modules 

The main task of IceCube’s DOMs is to capture Čerenkov light and to convert it into 

digitized waveforms, which are then sent to the surface. They consist of a photomultiplier 

tube (PMT), a flasher board, a mainboard containing most of the electronics, a delay 

board, and a high-voltage board, housed inside a 32.5 cm diameter glass pressure sphere. 

The photomultiplier is embedded into RTV (room temperature vulcanizing) silicone gel 

to optically couple it to the glass, and it is shielded against Earth’s magnetic field by a 

cage of mu-metal (figure 4.4). 

The flasher board lies in the DOM’s upper hemisphere with six pairs of blue (405 nm) 

LEDs mounted outwards in 60° intervals. Each pair consists of one LED on the lower side 

of the board mounted horizontally and one LED on the upper side tilted approximately 

40° upwards. As pointed out in section 4.2, the flashers can be used to determine or 

validate ice properties and to calibrate the detector. 

Figure 4.4: Sketch of an IceCube Digital Optical Module (DOM).[40]

The high-voltage board is required to power the PMT. The PMT’s signal is decoupled by a toroidal transformer instead of a capacitor; the circuit’s low capacitance of about 5 pF compared to the about 1 µF needed for a conventional capacitor coupling reduces the risk of discharges damaging the mainboard and delivers a more stable long-term signal quality, as capacitors degrade over time. Furthermore, tests have shown that the transformer coupling filters out the power supply’s noise better. However, as the transformer acts as a high-pass filter, it causes droop: the signal’s voltage falls off right after the initial peak, distorting the waveform and making it undershoot the former baseline. This effect depends strongly both on the surrounding temperature T and on the toroid used. DOMs built before 2006 – called old toroid DOMs or OT – are highly affected by droop, while new toroid DOMs (NT) are almost unaffected (figure 4.5). The individual toroid types are known and the temperatures can be measured with an on-board sensor; therefore the droop can be corrected for during calibration (section 4.6.1), using the dual-tau parametrization[42] in which the DOM’s transient response $\tilde{\delta}(t)$ to a signal $\delta(t)$ is modelled as

$\tilde{\delta}(t) = \delta(t) - N\left(4.3\,e^{-t/\tau_1} - 3.3\,e^{-t/\tau_2}\right), \quad \tau_1(T) := t_1 + \frac{t_2}{1 + e^{-T/T_c}}, \quad \tau_2 := 0.75\,\tau_1.$

The parameters N, $t_1$, $t_2$, and $T_c$ have to be determined empirically. While the NT DOMs significantly reduce the droop problems, they also possess differently shaped, wider waveforms (figure 4.6), which is important for feature extraction, i.e., the analysis of the waveforms captured by the DOMs, see section 5.
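To make the structure of the dual-tau parametrization more tangible, the following Python sketch evaluates the two time constants and the droop term $N\left(4.3\,e^{-t/\tau_1} - 3.3\,e^{-t/\tau_2}\right)$ as read from the equation above. All parameter values in the example are hypothetical placeholders, since the real ones have to be determined empirically for each toroid type; the function names are assumptions as well.

```python
import math


def droop_time_constants(temperature: float, t1: float, t2: float, t_c: float):
    """Return (tau1, tau2) with tau1 = t1 + t2 / (1 + exp(-T / T_c)) and tau2 = 0.75 * tau1."""
    tau1 = t1 + t2 / (1.0 + math.exp(-temperature / t_c))
    return tau1, 0.75 * tau1


def droop_term(t: float, n: float, tau1: float, tau2: float) -> float:
    """Term subtracted from the signal: N * (4.3 exp(-t/tau1) - 3.3 exp(-t/tau2))."""
    return n * (4.3 * math.exp(-t / tau1) - 3.3 * math.exp(-t / tau2))


if __name__ == "__main__":
    # Hypothetical placeholder parameters, NOT calibration values
    tau1, tau2 = droop_time_constants(temperature=-40.0, t1=1.0, t2=10.0, t_c=20.0)
    print(f"tau1 = {tau1:.3f}, tau2 = {tau2:.3f} (arbitrary time units)")
    for t in (0.0, 1.0, 5.0):
        print(f"t = {t}: droop term = {droop_term(t, 1.0, tau1, tau2):.3f}")
```

At t = 0 the term reduces to N, because the two amplitudes 4.3 and 3.3 differ by exactly one.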

The delay board features a high-quality serpentine circuit that delays the through-going signal by 75 ns. Only two of the three signal paths are delayed; the third path leads to the trigger system to decide whether to launch this DOM, i.e., to digitize the waveform and to check for local coincidence, see section 4.4.3. The trigger threshold is typically set to 0.25 PE (photoelectron charges).

Of the remaining two paths, one is distributed among three amplifiers (×16, ×2 and ×0.25) which supply the analog transient waveform digitizer (ATWD, section 4.4.1) with data, and the other one supplies the fast analog-to-digital converter (FADC, section 4.4.2).

Figure 4.5: Waveforms illustrating the toroidal droop effect for OT (upper) and NT (lower) DOMs, showing an uncalibrated (raw) waveform on the left and the corresponding droop-corrected one on the right. For the OT graphs 100 waveforms have been averaged, the NT graphs show an individual waveform; both were taken at −55 °C.[41]

Last but not least, the PMT itself is relevant for feature extraction as it is the component 

which generates the signal. It is a Hamamatsu R7081-02 with a diameter of 25 cm 

and a peak quantum efficiency of 25% around 410 nm. It houses ten dynodes leading to 

a gain factor of about 10^7 and has a dark count rate below 400 Hz between −60 °C and

0 °C, mostly caused by ⁴⁰K decay in the glass. The DeepCore high-efficiency DOMs differ

in that they have an improved photocathode, leading to a higher quantum efficiency of 

up to 33%.[43] 

Three effects can lead to erroneous pulses: Photons may bypass the photocathode and directly hit the first dynode ahead of the electrons, triggering an early electron cascade from there on. Despite missing one multiplication process, these prepulses exceed a signal strength of 0.25 PE for about 0.5% of all 50 PE pulses at 25 °C according to measurements by Hamamatsu.[44] They typically occur 32 ns prior to the real pulse.[45]

Table 4.1: Overview of the different digitizers and the corresponding I3Waveform source types.

source      n_bins   T_bin / ns   T_total / ns   resolution / bit   U_satu / mV
ATWD ch0    128      3.3          422            10                 100
ATWD ch1    128      3.3          422            10                 800
ATWD ch2    128      3.3          422            10                 7500
FADC        256      25           6400           10                 70
SLC         3        25           75             10                 70

Late pulses can occur if photoelectrons are backscattered at the first dynode. If they hit 

the first dynode at the second approach, they will trigger a pulse which lags behind by up 

to 66 ns, which is twice the transit time between photocathode and first dynode, or even 

more for multiple backscattering. Their typical rate is 1.5% for pulses as above. 

Finally, afterpulses are caused by ionized molecules of the remaining gas or luminescence 

photons from the dynodes hitting the photocathode. They typically occur for 2.0% of 

the aforementioned pulses and are frequently found a few microseconds after the real 

pulse.[44][46] 

4.4 Signal Digitization 

The two digitizers ATWD and FADC mentioned in section 4.3 act as different sources for 

raw waveforms. These raw waveforms still show undesirable characteristics such as droop, 

and have to be calibrated before they can be used for feature extraction, see section 4.6.1. 

The most important parameters of these digitization circuits are listed in table 4.1. 

4.4.1 ATWD 

The analog transient waveform digitizer is an application-specific integrated circuit (ASIC) located on the DOM’s mainboard. Three amplifiers feed the PMT signals into three of the ATWD’s input channels, leading to different saturation voltages U_satu. If a trigger condition is met, channel 0, the one with the highest gain and hence the lowest saturation voltage (table 4.1), is digitized first. All channels feature a resolution of 10 bit, corresponding to 1024 counts, i.e., discrete values; if the maximum value of one of the first two channels exceeds 768 counts, the next higher channel is digitized as well. For channel 2, U_satu = 7.5 V is greater than the PMT’s maximum voltage of 5 V, so all ATWD channels together span the full dynamic range of the PMT. The combination of the different channels is performed by the IceCube software module DOMcalibrator either on a bin-to-bin or on a global waveform basis depending on the configuration; the three channels’ combined effective resolution is about 14 bit.

Figure 4.6: Comparison of ATWD and FADC single photoelectron (SPE) waveform shape parametrizations (charge q in PE per bin, plotted against the bin number) used in different feature extractors: FeatureExtractor, NFE ATWD OT, NFE ATWD NT, and NFE FADC. The three NFE parametrizations are based on a study by Christopher Wendt.[47] Note that the bin length differs for ATWD and FADC, so the shapes are not to scale time-wise.

The average waveform shape of a single photoelectron (SPE) signal depends on the DOM’s toroid, see figure 4.6.
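The cascading ATWD channel readout described in this section (channel 0 first, the next channel only if the previous one exceeds 768 counts) can be sketched as follows. This is an illustrative Python model, not DOM firmware; the function name and the input format are assumptions.

```python
def digitized_channels(max_counts, threshold=768):
    """Return the list of ATWD channels that get digitized.

    max_counts[ch] is the maximum count value channel ch would record.
    Channel 0 is always digitized; each of the first two channels causes
    the next channel to be digitized if its maximum exceeds the threshold.
    """
    channels = [0]
    for ch in (0, 1):
        if channels[-1] == ch and max_counts[ch] > threshold:
            channels.append(ch + 1)
    return channels


print(digitized_channels([1023, 900, 120]))  # [0, 1, 2]
print(digitized_channels([400, 50, 6]))      # [0]
```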

The conversion time amounts to 29 µs, during which the ATWD cannot capture further

waveforms. To compensate for these dead times, the mainboard contains two different

ATWD chips called ATWD-A and -B, which process the signal alternately. This significantly

reduces the dead time: When a DOM is triggered (launched), both the FADC and

one of the ATWD chips start capturing. The DOM cannot be retriggered until the FADC

has finished capturing; it is ready again after 6.45 µs (section 4.4.2). If the DOM is launched

again in the remaining 22.55 µs during which the ATWD chip is digitizing the waveform, 

the FADC will start capturing together with the other ATWD chip.[46] 

4.4.2 FADC 

The fast analog digital converter has a larger bin length resulting in a coarser but longer 

digitized waveform (table 4.1). Its main usage is to capture features which are too late or 

too long to appear in the ATWD waveform(s). 

The FADC’s time resolution is not sufficient to resolve SPE-like PMT features. Because of this, a 180 ns pulse shaper is used to broaden the input signals. The widening allows the feature’s arrival time to be estimated with sub-bin precision from the distribution of the captured charge over several bins. The FADC has negligible dead time (two clock cycles equaling 50 ns), and is the digitizer used to provide SLC charge stamps.[46]

4.4.3 SLC Chargestamps 

Prior to the IC59 data taking season in 2009, DOMs were only read out in case of multiple 

DOMs fulfilling a hard local coincidence (HLC) condition: To reduce data caused by noise, 

a DOM’s ATWD and FADC waveforms were only transmitted to the surface if one of the 

neighboring or – depending on the configuration – one of the next-to-neighboring DOMs 

also triggered within one microsecond. 
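The HLC condition can be illustrated with a short Python sketch that flags launches having a neighboring (span 1) or next-to-neighboring (span 2) launch within one microsecond. The data layout, the function name, and the symmetric treatment of the time window are assumptions made for this illustration; only DOMs on a single string are considered.

```python
def hlc_flags(launches, span=1, window_ns=1000.0):
    """launches: list of (dom_number, launch_time_ns) for DOMs on one string.

    A launch is flagged if a different DOM at most `span` positions away
    launched within `window_ns` of it.
    """
    flags = []
    for dom, time in launches:
        flags.append(any(
            0 < abs(other_dom - dom) <= span and abs(other_time - time) <= window_ns
            for other_dom, other_time in launches
        ))
    return flags


print(hlc_flags([(10, 0.0), (11, 500.0), (20, 100.0)]))  # [True, True, False]
```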

Since IC59, DOM launches that do not fulfill HLC are kept irrespective of whether other DOMs

triggered HLC during a small time window. To reduce the amount of data caused by these 

so-called soft local coincidence (SLC) launches, SLC charge stamps are stored instead of 

full ATWD and FADC waveforms. These charge stamps are condensed FADC waveforms; 

instead of 256 bins they contain only the highest of the first 16 bins and its two direct 

neighbors. In most cases, this will yield a concave waveform (in which the middle bin will 

be the highest one), but in some cases FADC bin 0 or 15 might hold the maximum. In 

these cases, the value of bin −1 (which can and will be accessed by the DAQ) or bin 16, respectively,

might be even larger, leading to an irregular SLC charge stamp in which the middle bin 

is not the highest.[48] 
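The construction of an SLC charge stamp from an FADC waveform, including the irregular case at the edge of the search range, can be sketched as follows. This is a simplified Python illustration; the function name, the separate argument for bin −1, and the tie-breaking rule (first maximum wins) are assumptions.

```python
def slc_charge_stamp(fadc, bin_before_first, search_bins=16):
    """Return (peak_index, [left, peak, right]) for an FADC waveform.

    fadc: list of FADC bin values (at least search_bins + 1 entries).
    bin_before_first: value of bin -1, used if the maximum is in bin 0.
    """
    # Highest of the first `search_bins` bins; the first maximum wins on ties.
    peak = max(range(search_bins), key=lambda i: fadc[i])
    left = bin_before_first if peak == 0 else fadc[peak - 1]
    return peak, [left, fadc[peak], fadc[peak + 1]]


regular = [0] * 256
regular[4:7] = [3, 10, 7]
print(slc_charge_stamp(regular, 0))  # (5, [3, 10, 7])

edge = [0] * 256
edge[14:17] = [2, 8, 12]             # bin 16 is larger than the maximum found in bins 0..15
print(slc_charge_stamp(edge, 0))     # (15, [2, 8, 12]), middle bin is not the highest
```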

The comparatively low saturation voltage of the FADC does not hamper SLC charge

stamps for two reasons: first, the brightness required to saturate the FADC would

almost inevitably trigger nearby DOMs and consequently HLC, and second, there is

a third trigger condition called self-local coincidence, fully launching every DOM with 

exceptionally strong PMT signals independently of HLC.[46] 

4.5 Data Structure and Data Rate 

The start times of DOMs fulfilling an HLC condition are sent to the IceCube Laboratory

on the surface and analyzed by triggers. If they meet the specified criteria of at least 

one trigger condition, the launches are aggregated into an event. All waveforms inside an 

event are calibrated, features are extracted and used by different algorithms to quickly 

reconstruct a muon track or a cascade, which is then assessed by different online-filters. 

If at least one of these filters decides that this event is worth keeping, it remains in the

data stream, which is written into binary .i3 files that are sent to northern hemisphere

collaboration members via satellite for offline-processing. Additionally, all data is recorded

on tapes. Up to several hundred successive .i3 files are grouped into a single detector

run normally lasting about 8 h, during which the detector configuration is assumed to be

constant. 



A DOM’s in-ice data rate for HLC launches is about 10 Hz, its SLC launch rate 

corresponds to the dark count rate of about 300 Hz. The rate of events passing the 

triggers (global trigger rate) is of the order of 4000 Hz, the global filter rate lies below 

100 Hz. 

4.6 Software 

The IceCube software is built within a highly modular framework called IceTray[49]. Data 

processing of all kinds is organized serially in one-way streams. IceTray itself provides 

base classes for modules (e. g. I3Module) and services (I3Service), frames (I3Frame) 

which are the containers in which all data belonging to an individual IceCube event is 

saved, a definition of physical units (I3Units), Python scripting support and much more. 

Various projects can be loaded to provide further functionality by adding modules and 

services, such as dataclasses as standardized interface for various IceCube specific objects, 

DOMcalibrator for waveform calibration (section 4.6.1), tools for filtering, visualization 

as well as feature extraction and reconstruction. The NewFeatureExtractor (NFE, I3NFE) 

presented in this thesis is one of these projects. 

IceTray and selected projects are bundled to form different meta-projects which facilitate

easy installation of software environments for specialized tasks such as online-filtering, 

simulation (IceSim), reconstruction (IceRec), or analysis (e. g. Offline-Software). 

The software is written almost exclusively in C++; however, since IceTray v3 was released

in 2009, modules can also be written in Python, which was previously only used for

steering scripts. 

4.6.1 DOMcalibrator 

Calibration encompasses many different processes in both the detector maintenance and 

data taking processes. This section deals with the waveform calibration conducted by the 

IceCube software module DOMcalibrator (I3DOMcalibrator). 

The DOMcalibrator’s tasks are to convert the raw waveforms (I3DOMLaunches) captured

by the PMT in units of counts to calibrated waveforms (I3Waveforms) in units of mV with 

a near-zero baseline, to combine the three ATWD channels, and to compensate for known 

signal distortions and time delays. 

Per-DOM values for the signal transit time, a baseline offset and the gain are obtained by 

monthly DOMCal detector runs and stored in an online database among other important 

information such as droop correction parameters (see section 4.3). For use in offline-processing,

GCD files (geometry, calibration, detector status) belonging to individual

runs (or simulated datasets) are used instead.[41] 


CHAPTER V 

Feature Extraction in IceCube


Feature extraction in IceCube refers to the analysis of digitized calibrated waveforms. 

It aims for the determination of the number and time distribution of photons 

observed by IceCube’s DOMs. The main challenge is to deconvolute the PMT’s and in 

case of FADC the pulse shaper’s smearing functions with robust and efficient algorithms. 

Dataclasses provides two classes for the feature extraction output: the I3RecoHit and 

the I3RecoPulse. 

An I3RecoHit stores information about time and source. It was designed to represent 

a single photon triggering the PMT and therefore corresponds to a deposited charge 

of exactly 1 PE. Similar hits were used in AMANDA’s reconstruction chain; however,

in IceCube hits are largely deprecated in favor of pulses; of the four feature extraction 

modules presented in this thesis, only FeatureExtractor and SLCHitExtractor support 

hits. 

I3RecoPulse also provides information about time and source, but additionally stores 

the deposited charge in units of PE and the width of the pulse, thereby offering more 

information, which depends less on design decisions; for example, I3RecoHit creation is

ambiguous in case of three distinct but close 0.7 PE features because their total charge

corresponds to only two hits, or in case of a single 1.5 PE feature.

Both classes offer a data member called hitID whose purpose is not well-defined. For 

all four feature extraction modules presented in this thesis, it redundantly reflects the 

hit’s or pulse’s position in their respective series.² This hitID is put to better use in

merged I3RecoPulseSeries as it can be used to trace back a pulse, see section 5.4.5. 

All of the series of one event are finally organized in an STL map using their DOM

and string numbers combined in OMKeys as key value. The resulting I3RecoHitSeriesMap/

I3RecoPulseSeriesMap can then be accessed by other modules and services further

down the data stream. 

For the following chapters, let w_i, i = 0 … L, be the values of the bins of a given

calibrated waveform, with L = 127 for ATWD, L = 255 for FADC and L = 2 for SLC.

5.1 FeatureExtractor 

The FeatureExtractor (I3FeatureExtractor, FE), mainly written by Dmitry Chirkin[50], is the feature extraction module currently used for IceCube ATWD and FADC waveforms. Development began in 2003 for fat-reader, an alternative IceCube software suite. A first release for IceTray under SVN³ control was made in 2005 for use with data from IceCube’s first deployed string. Since then, new algorithms and options have been added to the module. It has been used for practically all physics analyses up to early 2010.

² Series is IceCube’s name for standard C++ STL vectors that contain related objects such as all hits/pulses extracted from a single DOM for a distinct event.

³ SVN aka Subversion is the version-control system used for the central IceCube source code repository.

For ATWD the FeatureExtractor offers two single-pulse extraction algorithms that define at most one pulse per waveform as well as two more sophisticated multi-pulse extraction algorithms; for FADC it uses a fifth algorithm.[50]

Figure 5.1: Sketch illustrating FeatureExtractor’s first single-pulse extraction algorithm on the left and its second one on the right. Shown is a part of a calibrated waveform in arbitrary units; baseline correction has been omitted for reasons of clarity.

The first single-pulse algorithm extracts only one pulse given by the largest feature 

in the waveform, and its corresponding charge, see figure 5.1. It searches for the maximum of

the waveform’s slopes Δw_i := w_{i+1} − w_i from the beginning up to the waveform’s maximum

bin w_max (or to the first saturated bin if any). The intersection of the extrapolation of this

slope with the baseline is used to define the leading-edge time. The algorithm then fits a 

parabola through the waveform’s maximum and its two surrounding bins; the difference 

between the position of the maximum and the leading-edge time from above defines the 

width of the pulse. This width multiplied with the parabola’s maximum defines the charge 

of the pulse. 
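The steps of the first single-pulse algorithm can be sketched in Python as follows. This is a simplified reading of the description above and not the FeatureExtractor code: the baseline is assumed to be zero, saturation handling is omitted, times are given in units of bins, and the function name is hypothetical.

```python
def first_single_pulse(w):
    """Return (leading_edge, width, charge) in units of bins, or None.

    w: baseline-subtracted waveform values (baseline assumed to be 0).
    """
    i_max = max(range(len(w)), key=lambda i: w[i])
    if i_max == 0 or i_max == len(w) - 1:
        return None  # no rising edge, or no right neighbor for the parabola

    # Steepest slope between the start of the waveform and the maximum bin.
    i_s = max(range(i_max), key=lambda i: w[i + 1] - w[i])
    slope = w[i_s + 1] - w[i_s]
    if slope <= 0:
        return None
    leading_edge = i_s - w[i_s] / slope  # extrapolated line crosses the baseline here

    # Parabola through the maximum bin and its two neighbors.
    a, b, c = w[i_max - 1], w[i_max], w[i_max + 1]
    curvature = a - 2.0 * b + c
    offset = 0.0 if curvature == 0 else 0.5 * (a - c) / curvature
    peak_position = i_max + offset
    peak_value = b - 0.25 * (a - c) * offset

    width = peak_position - leading_edge
    return leading_edge, width, width * peak_value


print(first_single_pulse([0, 1, 4, 6, 4, 1, 0]))
```

For the symmetric example waveform the parabola maximum coincides with the maximum bin, so the charge is simply the width times the maximum value.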

This is a very fast algorithm, but it lacks robustness. For a given waveform, the maximum 

of the waveform’s slopes does not necessarily belong to the feature with the highest

amplitude; this might be due to overlapping features where one feature’s tail flattens 

the next feature’s leading edge, due to binning effects, or due to statistical fluctuations. 

In these cases, the extracted width and thereby the extracted charge can be severely

overestimated, and the time does not match the largest feature. Also, for most track 

reconstruction methods the first feature’s time is more valuable than the time of the 

largest feature, because it is least affected by scattering. Therefore it is disadvantageous 

to extract only the maximum pulse’s leading edge time as done by this algorithm. 

To make it more robust, the algorithm can be configured to use the second single-pulse 

algorithm’s charge estimate instead; however, the whole algorithm is largely superseded 

by the second single-pulse algorithm. 



The second single-pulse algorithm extracts the time of the first feature’s leading edge along with the waveform’s total integrated charge. Precisely, it searches for the first local maximum of the Δw_i for all consecutive pairs (w_i, w_{i+1}) above a configurable threshold and computes the position of the baseline crossing of its extrapolated slope. The pulse’s charge q is defined as the sum over all waveform bins, and the width is defined by const · q / w_max, as illustrated in figure 5.1 (in contradiction to the documentation[50], see section C.2).
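A minimal Python sketch of the second single-pulse algorithm, as described above, might look like this. It is an illustration under stated assumptions, not the FeatureExtractor implementation: the baseline is taken as zero, both bins of a pair are required to lie above the threshold, times are in units of bins, and the width constant is a placeholder.

```python
def second_single_pulse(w, threshold, width_const=1.0):
    """Return (leading_edge, charge, width) or None if no edge is found."""
    slopes = [w[i + 1] - w[i] for i in range(len(w) - 1)]
    for i, slope in enumerate(slopes):
        # Assumption: both bins of the pair have to exceed the threshold.
        if slope <= 0 or w[i] <= threshold or w[i + 1] <= threshold:
            continue
        previous = slopes[i - 1] if i > 0 else float("-inf")
        following = slopes[i + 1] if i + 1 < len(slopes) else float("-inf")
        if slope >= previous and slope > following:  # first local slope maximum
            leading_edge = i - w[i] / slope          # baseline crossing of the slope
            charge = sum(w)                          # total integrated charge
            return leading_edge, charge, width_const * charge / max(w)
    return None


print(second_single_pulse([0, 0.2, 1, 4, 6, 4, 1, 0], threshold=0.5))
```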

Like the first single-pulse extraction algorithm, this second one is fast; however, it is

not robust regarding prepulses (see section 4.3): As prepulses are rare and only have 

a small amplitude, they normally have little impact on reconstruction and might not 

even pass the trigger or extraction threshold. However, for many-PE features they can 

cause a comparatively tiny peak ahead of the large feature. In these cases, the algorithm

attributes the whole charge to the prepulse, potentially increasing its weight in the later 

track reconstruction heavily. 

Because of its speed and the low rate of prepulses the algorithm is nonetheless well-suited 

for online-filtering, for which it is currently used.[51] 

The first multi-pulse algorithm is the sophisticated but time-consuming ROOTfit 

algorithm. Using libraries of the eponymous ROOT framework[52], it fits increasing numbers 

of SPE-like pulses to the waveform until either the goodness of the fit as measured by χ²

does not improve anymore, or until the maximum number of pulses has been reached.

FeatureExtractor’s SPE pulse shape parametrization is 

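$$w(t) = q \left( \frac{\left(1 - e^{-\tau^2}\right) e^{-\tau}}{c_1} + w_d(t) \right), \tag{5.1}$$

$$w_d(t) = -c_2\, e^{-c_3 \tau} \left( 1 - e^{-\tau} - \frac{1}{2} \sqrt{\pi e^{1/2}} \left( \operatorname{erf}\!\left(\tau + \tfrac{1}{2}\right) - \operatorname{erf}\!\left(\tfrac{1}{2}\right) \right) \right), \qquad \tau = \frac{t - t_0}{\sigma},$$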

where q, t_0, and σ are fitting-parameters for the pulse’s charge, leading-edge time, and

width, respectively; erf denotes the error function, and c_1 … c_3 are constants. The term

w_d is an approach to deal with droop; if it is to be included, the droop correction in

DOMcalibrator has to be disabled. The parametrization is shown in figure 4.6 with the

values used in the second multi-pulse algorithm (q = 1 PE, t_0 = 0, σ = 2 ns), and with

the droop correction term disabled (i.e., c_2 = 0).

This algorithm is neither recommended nor maintained anymore, and it is too slow for 

large-scale reconstruction. 

The second multi-pulse algorithm is based upon the method of Bayesian Unfolding. 

Being reasonably fast and capable of determining photon arrival times in complex 

features, it is used for practically all analyses that rely on multi-pulse extraction, e. g., 

cascade reconstruction and offline-processing muon track reconstruction.[51] 


An algorithm using the same underlying method is implemented in the NewFeatureExtractor

(NFE) developed within this thesis. Therefore the method and the differences between

the two implementations are discussed in section 6.3. 

The FADC algorithm is a multi-pulse algorithm similar to the second single-pulse 

ATWD algorithm: It searches the waveform from start to end for a local slope maximum 

of pairs (w_i, w_{i+1}) with w_{i+1} above threshold and extrapolates the resulting line to the

baseline to define the pulse’s time. For the calculation of the charge it sums up the bin

contents from i + 1 on until either a bin is below the threshold or until a bin is higher than

its predecessor after the feature has fallen below half its previous maximum value, in both 

cases excluding the determining bin. The width is defined as the number of bins summed 

up. The algorithm then continues searching for the next pulse. Incomplete pulses at the 

end of waveforms are not extracted. 
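The FADC algorithm described above can be sketched in Python as follows. This is one possible reading of the description, not the FeatureExtractor code: the baseline is assumed to be zero, times and widths are given in units of bins, and the search for the next pulse is assumed to restart at the last summed bin.

```python
def fadc_pulses(w, threshold):
    """Return a list of (leading_edge, charge, width) tuples."""
    pulses = []
    n = len(w)
    i = 0
    while i < n - 1:
        slope = w[i + 1] - w[i]
        previous = w[i] - w[i - 1] if i > 0 else float("-inf")
        following = w[i + 2] - w[i + 1] if i + 2 < n else float("-inf")
        is_edge = (w[i + 1] > threshold and slope > 0
                   and slope >= previous and slope > following)
        if not is_edge:
            i += 1
            continue
        leading_edge = i - w[i] / slope  # extrapolation of the slope to the baseline
        charge, peak, j = 0.0, 0.0, i + 1
        while j < n:
            if w[j] < threshold:
                break  # bin below threshold ends the pulse (bin excluded)
            if peak > 0 and w[j - 1] < 0.5 * peak and w[j] > w[j - 1]:
                break  # rising again after falling below half maximum (bin excluded)
            charge += w[j]
            peak = max(peak, w[j])
            j += 1
        else:
            break  # waveform ended first: incomplete pulse, not extracted
        pulses.append((leading_edge, charge, j - (i + 1)))
        i = j - 1  # continue the search with the pair (w[j-1], w[j])
    return pulses


print(fadc_pulses([0, 1, 5, 8, 5, 2, 0, 0, 1, 6, 9, 4, 1, 0], threshold=0.5))
```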

Besides the obvious tasks of a feature extractor, FeatureExtractor also performs some 

waveform calibration tasks such as baseline finding, transit time correction, or droop 

correction. While the employed methods were superior to the ones used in DOMcalibrator 

at the time of their introduction, the good quality of recent DOMcalibrator versions 

caused most of these features to become obsolete. Their continued presence in the code impairs

its readability[53], adds maintenance overhead, and increases

the risk of misconfiguration. 

5.2 PulseExtractor 

PulseExtractor (I3PulseExtractor) is a rewritten and condensed version of FeatureExtractor. 

Created mainly by Dmitry Chirkin and Christopher Wendt in 2009, the only 

algorithm it includes is a modified variant of the original FeatureExtractor’s Bayesian 

Unfolding algorithm. This algorithm can be applied to both ATWD and FADC if the 

module is run in different instances. The module does not offer configuration options 

besides names of the input and output objects.[54] 

The PulseExtractor’s advantages are easy maintainability and high ease of use at the 

expense of adaptability and extraction performance (see section 6.3.1).

5.3 SLCHitExtractor 

The SLCHitExtractor (I3SLCHitExtractor) was written in 2009 by Andreas Groß to be 

the first module to extract hits from SLC charge stamps, which have been recorded since IC59.

The algorithm defines the pulse time as

$$t = t_0 + (i_{\max} - 1) \cdot T_{\mathrm{bin}} - c_1 \frac{w_{i_{\max}-1}}{w_{\max}} - c_2 ,$$

where t_0 denotes the charge stamp’s startTime and thereby the time corresponding to w_0, whereas i_max denotes the bin with the highest value, w_max; this is illustrated on the right side of figure 6.6. The charge is defined as q = w_max / c_3.

The constants c_1 and c_3 are obtained by comparison of SLCHitExtractor’s pulses for SLC

charge stamps generated from full FADC waveforms with FeatureExtractor’s pulses. The

time offset c_2 has been abandoned recently in the belief that it was caused by a general

ATWD-FADC time offset, which has been taken care of in DOMcalibrator; see appendix

C.4 for further discussion.[55] 
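The SLCHitExtractor formulas for time and charge translate directly into a few lines of Python. The sketch below is illustrative only: the function name and the values of c_1, c_2 and c_3 are placeholders, and the case of an irregular stamp with the maximum in the first bin, for which no preceding bin exists within the stamp, is simply rejected.

```python
def slc_pulse(stamp, start_time, t_bin=25.0, c_1=20.0, c_2=0.0, c_3=2.0):
    """Return (time, charge) for a three-bin SLC charge stamp.

    t = t_0 + (i_max - 1) * T_bin - c_1 * w[i_max - 1] / w_max - c_2
    q = w_max / c_3
    The constants are placeholders, not the calibrated values.
    """
    i_max = max(range(len(stamp)), key=lambda i: stamp[i])
    if i_max == 0:
        raise ValueError("maximum in first bin: no preceding bin in the stamp")
    w_max = stamp[i_max]
    time = start_time + (i_max - 1) * t_bin - c_1 * stamp[i_max - 1] / w_max - c_2
    return time, w_max / c_3


print(slc_pulse([3, 10, 7], start_time=1000.0))  # (994.0, 5.0)
```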

5.4 NewFeatureExtractor 

The NewFeatureExtractor (NFE) – written by the author of this thesis – is the latest of 

the four feature extractors covered in this document and constitutes its main topic. 

NFE is special in that it employs multiple algorithms for the same event, choosing an 

appropriate algorithm according to the waveform’s complexity; this concept is explained in

section 5.4.2. The algorithms themselves are discussed in chapter 6. 

5.4.1 Main Characteristics 

Well-documented code conforming to the IceCube Coding Standards⁴

Documentation for IceCube users and developers alike exists in form of wiki pages and 

an internal report created out of passages of this thesis. It contains general information 

about the project’s structure and the employed algorithms. Further in-detail information 

for developers is found in doxygen mark-up source code comments that explain class 

structures including purpose of methods and data members, and in standard C++ inline 

comments explaining programming decisions and purpose of small code passages. The 

project also provides example scripts to make the use of this project as easy as possible. 

Focus on the main task of feature extraction 

NFE relies on the waveforms being calibrated beforehand by DOMcalibrator or any other module.

It intentionally does not aim to improve for example the baseline level, but instead the 

author communicated potential shortcomings of other modules to the respective authors. 

This preserves IceTray’s data flow concept and prevents double treatment of problems or 

other unintended side-effects later in the reconstruction chain. This also results in simpler 

configuration of NFE. 

4 http://software.icecube.wisc.edu/OFFLINE-SOFTWARE-V02-02-03/codingstandards.html 



Ease of use 

To make the transition from other feature extractors to NFE as easy as possible without 

sacrificing configurability and flexibility, all NFE modules and services require only a minimum 

of configuration. This is implemented by reasonable default values which give good 

results under almost all circumstances. Thoroughly tested defaults also help to minimize

unintentional misconfigurations. 

Modular code organization 

The code is organized into IceTray modules and services with specific purposes to improve 

readability, expandability and possible bug tracking. Modular organization also allows the 

creation of unit tests to thoroughly check passages of code. While initially providing good

means to test the code for errors during development, the main advantage of unit tests 

is their ability to test the code for bugs introduced later by seemingly unrelated code 

changes. The tests together with the provided example scripts are run prior to releasing 

new versions of the project or of meta-projects containing it. 

High performance 

NFE is intended to succeed FeatureExtractor and SLCHitExtractor not only for offline-processing,

i.e., processing of stored data in the Northern Hemisphere, but also for online-processing,

for which performance is a major issue. Besides choosing efficient algorithms,

a new approach was taken in using different algorithms for different types of waveforms; 

this is elaborated in section 5.4.2. 

5.4.2 Program Structure 

Most of IceCube’s DOM launches are caused by single photons whose corresponding waveforms

are fairly regular. To take advantage of this, the NFE framework allows

different extraction algorithms to be used for waveforms of varying complexity. The time

saved during the extraction of simple waveforms can be spent on sophisticated and time-consuming

algorithms to improve the results for more complex waveforms. Additionally,

this allows the use of algorithms which excel in extracting certain waveforms but fail for other

waveform categories, or the use of the same algorithm with different settings for different categories.

To decide which algorithm to use for a given waveform, a pre-evaluation algorithm is initially called to quickly assess the waveform. Depending on the results, the NFE framework (i.e., the module I3NFE) passes the waveform and all relevant data to the extraction algorithm configured for this category. Currently, three categories have been implemented, but this number can be expanded; they are simple for ATWD or FADC waveforms containing only simple features, complex for all remaining ATWD and FADC waveforms, and slc for SLC charge stamps; the classification into these categories is the responsibility of the pre-evaluation algorithm (e. g. “Eva”, section 6.1).
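The dispatch concept described above can be sketched in a few lines of Python. All names here are hypothetical, and the toy pre-evaluation (counting bins above a threshold) is only a stand-in and does not represent the criteria used by “Eva”; the extraction algorithms are replaced by simple placeholder functions.

```python
def toy_pre_evaluate(waveform, source, threshold=0.5, max_bins=8):
    """Toy classification into the categories 'simple', 'complex' and 'slc'."""
    if source == "SLC":
        return "slc"
    bins_above = sum(1 for value in waveform if value > threshold)
    return "simple" if bins_above <= max_bins else "complex"


def extract(waveform, source, pre_evaluate, algorithms):
    """Classify the waveform and pass it to the algorithm configured for its category."""
    category = pre_evaluate(waveform, source)
    return category, algorithms[category](waveform)


# Placeholder "algorithms": each returns a label and the summed waveform.
ALGORITHMS = {
    "simple": lambda w: ("fast algorithm", sum(w)),
    "complex": lambda w: ("thorough algorithm", sum(w)),
    "slc": lambda w: ("charge stamp algorithm", max(w)),
}

print(extract([0, 1, 3, 1, 0], "ATWD", toy_pre_evaluate, ALGORITHMS))
```

Keeping the category-to-algorithm mapping in a configurable table is what allows new categories or algorithms to be added without touching the dispatching code.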

All algorithms are implemented as services and inherit from a common pre-evaluation or 

extraction algorithm base class. This modular design – inspired by the IceCube reconstruction 

framework gulliver by David Boersma[56] – simplifies maintenance and facilitates 

easy implementation of new algorithms without breaking existing code. 

NFE’s framework is also responsible for calculating the gain constant needed to translate 

waveforms in mV to waveforms in units of photoelectron charges (PE). Providing this 

gain constant to the algorithms allows thresholds to be defined independently of hardware or

firmware thresholds that might change between different firmware or simulation software

versions. This eliminates one reason for recalibration of configuration settings; in addition,

PE are more practical for setting thresholds as they are more closely related to the physics processes.

The same approach was taken in PulseExtractor, while FeatureExtractor defines 

thresholds in fractions of the trigger’s SPE discriminator threshold. 

The exact conversion factor for the translation of mV to PE is T_bin · (g · Z · e)⁻¹. The average

PMT gain g (SPEmean) and the DOM’s impedance Z (FrontEndImpedance), which 

is dominated by the toroid, are provided by either the GCD file or the online database. 
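As a worked illustration of the conversion factor T_bin · (g · Z · e)⁻¹, the following Python sketch converts a waveform in mV to a charge in PE. The gain and impedance values are placeholders chosen only to produce a plausible order of magnitude; the real SPEmean and FrontEndImpedance values come from the GCD file or the online database.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # in coulomb


def mv_to_pe_factor(t_bin_ns, gain, impedance_ohm):
    """PE per (mV * bin): T_bin / (g * Z * e), with mV and ns converted to SI units."""
    return (1e-3 * t_bin_ns * 1e-9) / (gain * impedance_ohm * ELEMENTARY_CHARGE)


def waveform_charge_pe(waveform_mv, t_bin_ns, gain, impedance_ohm):
    """Total charge of a calibrated waveform in PE."""
    return sum(waveform_mv) * mv_to_pe_factor(t_bin_ns, gain, impedance_ohm)


# Placeholder values: ATWD bin length, gain 1e7, impedance 50 ohm.
factor = mv_to_pe_factor(t_bin_ns=3.3, gain=1e7, impedance_ohm=50.0)
print(factor)  # roughly 0.04 PE per mV and bin for these placeholder values
```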

Another non-obvious task for the framework is to optionally add an extra-info series 

map and an algorithm-info series map to the frame. Those objects mirror the 

I3RecoPulseSeriesMap’s structure, which means that for every pulse of every DOM with 

entries in the pulse map these objects contain one integer each. The extra-info integer is 

a bit mask containing information about whether the pulse was cut off in the beginning, 

cut off in the end, or if it was saturated. The latter information is obtained from DOMcalibrator’s 

StatusCompounds which are added to calibrated waveforms but are largely 

ignored by other feature extractors. The algorithm-info contains the unique ID of the 

algorithm that was used to extract the corresponding pulse. 
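The extra-info bit mask can be illustrated with Python’s `enum.IntFlag`. The flag names and the bit positions below are assumptions made for this example; the source only states which three pieces of information the integer encodes.

```python
from enum import IntFlag


class ExtraInfo(IntFlag):
    """Hypothetical bit assignment for the per-pulse extra-info integer."""
    CUT_OFF_BEGIN = 1
    CUT_OFF_END = 2
    SATURATED = 4


def encode(cut_off_begin, cut_off_end, saturated):
    """Pack the three flags into a single integer."""
    mask = ExtraInfo(0)
    if cut_off_begin:
        mask |= ExtraInfo.CUT_OFF_BEGIN
    if cut_off_end:
        mask |= ExtraInfo.CUT_OFF_END
    if saturated:
        mask |= ExtraInfo.SATURATED
    return int(mask)


value = encode(False, True, True)
print(value)                                    # 6
print(ExtraInfo.SATURATED in ExtraInfo(value))  # True
```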

To keep the code simple and to offer more configurability when needed, all data sources 

(ATWD, FADC and SLC, or rather all different calibrated waveform series) are attached to

different instances of the I3NFE module. Algorithm instances (services) can be used by an 

unlimited number of modules. The resulting pulse series maps can optionally be merged 

into a single map by the pulse merger (section 5.4.5). 

5.4.3 Time Offset Constants 

One drawback of the application of different feature extraction algorithms within the 

same event is the need for the introduction of time offset calibration constants. Those are 

required to align the pulses to the same start time independently of the algorithm and 

data source used. The two major objections against these constants are that they might 

require maintenance and that they should ideally not be required. 

The first objection only arises for features of different sources, specifically features which 



occur in both ATWD and FADC. It can be mitigated by deducing the time offsets from 

well-calibrated waveforms so that they only compensate offsets introduced by the algorithms, 

but not by a potentially imperfect calibration. 

The second objection only holds under the assumption that SPE pulses are sufficiently 

similar for all sources. Even if a single method could be used for all sources, the different 

pulse shapes would require subsequent alignment (see figure 4.6). 

5.4.4 Waveforms Without Pulses 

Waveforms might not contain any features which pass a given threshold. NFE provides 

three alternatives to deal with this situation: DOMs without pulses can either be excluded 

from the pulse series map, or they can be added with an empty pulse series, or NFE can 

force the algorithms to extract at least one pulse, or, more precisely, NFE can pass the waveform

to an algorithm which is capable of finding at least one pulse if enforced.

The important difference between the first two alternatives is that they lead to different 

values for the per-event quantity NChan – i.e., the number of DOMs hit during the given 

event, often used for energy estimation and quality cuts –, because it is defined as the 

number of entries in the pulse series map. Therefore, the underlying question is whether one is

to trust the hardware trigger or the feature extraction to decide whether a DOM was hit.

The physical implications of this choice have to be discussed on a larger scale than is

possible in this thesis.
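The effect of the first two alternatives on NChan can be made concrete with a small Python sketch. The dictionary layout (DOM key mapped to a list of pulses) and the function names are assumptions for illustration and do not reflect the actual IceTray classes.

```python
def build_pulse_map(extracted, keep_empty):
    """extracted: dict mapping a DOM key (string, DOM number) to its list of pulses.

    With keep_empty=False, DOMs without pulses are excluded from the map;
    with keep_empty=True, they are kept with an empty pulse series.
    """
    return {dom: pulses for dom, pulses in extracted.items() if pulses or keep_empty}


def n_chan(pulse_map):
    """NChan as defined in the text: the number of entries in the pulse series map."""
    return len(pulse_map)


extracted = {(21, 30): [(100.0, 1.0)], (21, 31): []}
print(n_chan(build_pulse_map(extracted, keep_empty=False)))  # 1
print(n_chan(build_pulse_map(extracted, keep_empty=True)))   # 2
```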

5.4.5 Pulse Merger 

The pulse merger (I3NFEPulseMerger) is a stand-alone IceTray module designed to join 

up to three pulse series maps into a single one. Usually these maps correspond to the different 

pulse sources ATWD, FADC and SLC individually extracted by NFE, but differing 

usage is supported. 

The input maps are arranged by priority; by default, ATWD has the highest priority and 

SLC has the lowest. The primary map is copied completely, pulses from the other two 

maps are added successively if they don’t overlap with pulses from higher priority input 

maps – they may, however, overlap with pulses from the same input map. An extra time 

window can be configured to be required between the ending (defined as start time plus 

width) of one pulse and the start of another pulse to reduce the probability of double 

extraction, i.e., extraction of the same feature by different sources (compare appendix 

C.1). 
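The priority rule described above can be sketched in a few lines of Python. This is a minimal illustration for the pulse series of a single DOM, not the I3NFEPulseMerger code: the tuple layout (start, width, charge) and the names `overlaps`, `merge_series` and `extra_gap` are assumptions made for this example.

```python
def overlaps(pulse, other, extra_gap=0.0):
    """True if two (start, width, charge) pulses are not separated by extra_gap."""
    start_a, end_a = pulse[0], pulse[0] + pulse[1]
    start_b, end_b = other[0], other[0] + other[1]
    separated = start_a >= end_b + extra_gap or start_b >= end_a + extra_gap
    return not separated


def merge_series(series_by_priority, extra_gap=0.0):
    """Merge pulse series of one DOM, highest-priority series first."""
    merged = list(series_by_priority[0])  # the primary series is copied completely
    for series in series_by_priority[1:]:
        # Compare only against pulses from higher-priority inputs, so pulses
        # from the same input series may still overlap each other.
        kept = [p for p in series
                if not any(overlaps(p, m, extra_gap) for m in merged)]
        merged.extend(kept)
    return sorted(merged)


atwd = [(100.0, 10.0, 1.0)]
fadc = [(105.0, 30.0, 0.9), (200.0, 30.0, 1.2), (210.0, 30.0, 0.5)]
slc = [(230.5, 5.0, 0.3), (400.0, 5.0, 1.0)]
result = merge_series([atwd, fadc, slc])
```

In this example the FADC pulse at 105 ns is dropped because it overlaps the ATWD pulse, while the two mutually overlapping FADC pulses at 200 ns and 210 ns are both kept.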

The NFE pulse merger differs from similar modules in that it supports merging of maps 

with extra-info and algorithm-info. 


CHAPTER VI 

Algorithms Implemented in NFE


Figure 6.1: Sketch illustrating NFE’s pre-evaluation algorithm “Eva”. If a waveform only contains features like the first one, it is evaluated as simple; however, if it contains features which are too long, too high, or too close together, the whole waveform is classified as complex. (Labels in the sketch: maximum threshold, feature threshold.)

Up to now, one pre-evaluation algorithm (“Eva”) and three extraction algorithms have been implemented. Of the latter three, “Simple” was designed mainly for the waveform category of the same name, “BayesUnfold” handles all types of ATWD and FADC waveforms well, and “SLCHE” extracts SLC chargestamps exclusively; all four algorithms are part of the default configuration of the NewFeatureExtractor.

Currently “BayesUnfold” is the only algorithm capable of finding at least one pulse per 

waveform if enforced (see section 5.4.4), so it needs to be used if EnforcePulse is set; 

to accomplish this, a special EnforceAlgorithmServiceName can be set, which is only 

necessary if “BayesUnfold” is not used for either simple or complex waveforms. 

6.1 Pre-evaluation Algorithm “Eva” 

As pointed out in section 5.4.2, the purpose of a pre-evaluation algorithm is to quickly assess a waveform and to decide whether it belongs to the simple, complex, or slc category. The pre-evaluation algorithm “Eva” accomplishes this by four simple checks:

If a calibrated waveform’s source is SLC, it is directly assigned to the slc category. An ATWD or FADC waveform is scanned once; if its features are either too high, too long or too close together, the waveform is marked as complex (see figure 6.1). More precisely, if one waveform bin’s value $w_i$ exceeds the threshold $w_\mathrm{max}$, or if $l_\mathrm{max}$ successive bins exceed the threshold $w_\mathrm{feat}$, or if the gap between bins exceeding $w_\mathrm{feat}$ is smaller than the allowed minimum distance of $d_\mathrm{min}$ bins, then the scanning is stopped and the waveform is categorized as complex.
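The four checks can be sketched as a single pass over the waveform. This is an illustrative sketch, not the NFE implementation: the function name `classify_waveform` is hypothetical, the default values are the ATWD defaults of table 7.1, and the gap is assumed to be counted as the number of bins at or below the feature threshold between two features.

```python
def classify_waveform(source, bins, w_max=0.40, w_feat=0.08, l_max=6, d_min=4):
    """Return 'slc', 'simple' or 'complex' for one calibrated waveform (values in PE)."""
    if source == "SLC":
        return "slc"
    run = 0      # number of successive bins above w_feat in the current feature
    gap = None   # bins at or below w_feat since the last feature (None: no feature yet)
    for w in bins:
        if w > w_max:
            return "complex"                 # feature too high
        if w > w_feat:
            if run == 0 and gap is not None and gap < d_min:
                return "complex"             # features too close together
            run += 1
            if run >= l_max:
                return "complex"             # feature too long
            gap = 0
        else:
            run = 0
            if gap is not None:
                gap += 1
    return "simple"
```

The scan stops at the first violated criterion, so a complex waveform is usually recognized before its end is reached.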

The four parameters $w_\mathrm{max}$, $w_\mathrm{feat}$, $l_\mathrm{max}$, and $d_\mathrm{min}$ can be configured individually for ATWD and FADC to take into account the different characteristics of these two sources.

Figure 6.2: Sketch illustrating NFE’s extraction algorithm “Simple”. In the shown waveform, three pulses are identified. The charge compensation has been omitted for reasons of clarity. Note that “Simple” was not designed for complex features like the third. (Annotations in the sketch: quadratic interpolation for leading edge times; linear interpolation for trailing edge times; charge: sum over bins above feature threshold plus boundary bins; detection threshold $w_\mathrm{detect}$, rejects flat features; feature threshold $w_\mathrm{feat}$, defines feature bounds; widths.)

6.2 Extraction Algorithm “Simple” 

The extraction algorithm “Simple” is a fast threshold-based algorithm particularly suitable for simple features. To define a feature, a threshold $w_\mathrm{feat}$ is used. Starting at the waveform’s beginning, the algorithm searches for bins exceeding $w_\mathrm{feat}$. If it finds one in bin $i$, the potential pulse’s leading edge time $t$ is defined as the time $t_\mathrm{parab}$ where a parabola through the points $(i-1, w_{i-1})$ to $(i+1, w_{i+1})$ crosses the threshold $w_\mathrm{feat}$, corrected by a quadratic charge compensation term and a constant time offset:

$$t = t_\mathrm{parab} - t_q + t_\mathrm{offset} \quad\text{with}\quad t_q = \begin{cases} c_\mathrm{P0}\,(q - c_\mathrm{P1})^2 & \text{for } q < c_\mathrm{P1}, \\ 0 & \text{otherwise,} \end{cases} \tag{6.1}$$

with charge $q$ and configurable constants $c_\mathrm{P0}$ and $c_\mathrm{P1}$. While using a parabola for interpolation mainly improves the time resolution at a given charge level, the charge compensation term accounts for the fact that smaller pulses cross the threshold later, so the two corrections are not redundant (compare section 7.1).
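Equation (6.1) translates directly into code. The following sketch uses hypothetical function names and takes the parabola crossing time as an input; the numbers in the usage line are the ATWD defaults listed in table 7.2 ($c_\mathrm{P0}$ = 1.6 ns, $c_\mathrm{P1}$ = 1.7, new toroid offset −0.57 ns), with the charge given in PE.

```python
def charge_compensation(q, c_p0, c_p1):
    """Quadratic charge compensation term t_q of equation (6.1)."""
    if q < c_p1:
        return c_p0 * (q - c_p1) ** 2
    return 0.0


def leading_edge_time(t_parab, q, c_p0, c_p1, t_offset):
    """Leading edge time t = t_parab - t_q + t_offset."""
    return t_parab - charge_compensation(q, c_p0, c_p1) + t_offset


# A small pulse (q = 0.7 PE) crossing the threshold at t_parab = 100 ns:
t_small = leading_edge_time(100.0, 0.7, c_p0=1.6, c_p1=1.7, t_offset=-0.57)
```

For charges at or above $c_\mathrm{P1}$ the compensation vanishes and only the constant offset remains.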

If the waveform starts with a bin surpassing the threshold (i.e., $i = 0$), $t_\mathrm{parab}$ in equation (6.1) is not well-defined and gets replaced by $t_\mathrm{lin}$, which is the time of the intersection of $w_\mathrm{feat}$ with the linear extrapolation of the slope between the points $(0, w_0)$ and $(1, w_1)$. If $t_\mathrm{lin}$ is more than 10 ns ahead of the waveform’s start time $t_0$, it is replaced by $t_0$ because it has to be assumed that the waveform starts well within a feature and the extrapolation could be too far off. Analogously, if the threshold is passed by the last bin ($i = L$), $t_\mathrm{parab}$ is replaced by $t_\mathrm{lin}$ defined by the last two points, $(L-1, w_{L-1})$ and $(L, w_L)$. Lastly, this linear fallback method is used with the points $(i-1, w_{i-1})$ and $(i, w_i)$ if the parabola’s point of intersection lies more than one bin length away from bin $i$, i.e., if $|t_\mathrm{parab} - i\,T_\mathrm{bin}| > T_\mathrm{bin}$; if this happens, it must be due to numerical instability which arises for nearly collinear points (catastrophic cancellation). It remains an open task to rearrange the equation used to determine $t_\mathrm{parab}$ so that it becomes more stable; this should be possible using Viète’s formulae, but it will not improve the algorithm because the fallback method is well-suited in case of collinear points.
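The parabolic interpolation with its linear fallback can be sketched as follows. This is an assumption-laden illustration rather than the NFE code: the function name is hypothetical, positions are returned in units of bins (multiply by $T_\mathrm{bin}$ and add the waveform start time to obtain a time), and the textbook quadratic formula is used, which is exactly the form that can become unstable for nearly collinear points.

```python
import math


def leading_edge_position(w, i, w_feat):
    """Threshold crossing position (in bins) for the first bin i above w_feat.

    Requires 1 <= i <= len(w) - 2 and w[i-1] <= w_feat < w[i].
    """
    # Parabola a*x^2 + b*x + c through (-1, w[i-1]), (0, w[i]), (1, w[i+1]),
    # written relative to bin i and shifted down by the threshold.
    a = 0.5 * (w[i - 1] + w[i + 1]) - w[i]
    b = 0.5 * (w[i + 1] - w[i - 1])
    c = w[i] - w_feat
    # Linear fallback between (i-1, w[i-1]) and (i, w[i]).
    x_lin = -c / (w[i] - w[i - 1])
    if a == 0.0:
        return i + x_lin                      # exactly collinear points
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return i + x_lin
    roots = [(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)]
    # Accept only crossings within one bin length before bin i.
    candidates = [x for x in roots if -1.0 <= x <= 0.0]
    if not candidates:
        return i + x_lin                      # numerically unstable case
    return i + max(candidates)
```

Because the parabola passes below the threshold at bin $i-1$ and above it at bin $i$, a crossing inside that interval always exists mathematically, so the fallback is only reached for collinear points or through rounding errors, in line with the argument above.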

The time calculation based on the intersection with the threshold $w_\mathrm{feat}$ instead of the baseline $w = 0$ was chosen to reduce the impact of waveform binning effects; extrapolations down to the baseline have a longer lever arm and thereby a larger error potential than interpolations between two known points of a relatively smooth curve. A disadvantage is the crossing point’s obvious dependence on the pulse charge, and hence the need for two constants to compensate for it.

The trailing edge time $t_\mathrm{end}$ is required to calculate the width $t_\mathrm{end} - t$ of the pulse; “Simple” defines it as the point of threshold crossing determined by a linear interpolation between the points $(i+j-1, w_{i+j-1})$ and $(i+j, w_{i+j})$, where bin $i+j$ is the first one to fall below $w_\mathrm{feat}$ for $j > 0$. No further corrections are applied because the achieved precision for the width is sufficient. If the waveform ends prematurely, the waveform’s ending time is taken instead and the pulse is marked as cut off in the extra-info series.

The pulse’s charge is defined as the total charge contained in the bins $i-1$ to $i+j$, $q = \sum_{k=i-1}^{i+j} w_k$. A potential charge related to a non-zero baseline is not subtracted as NFE explicitly relies on a correct baseline from DOMcalibrator.

Whether a pulse is added to the pulse series depends on whether the feature passes a second threshold $w_\mathrm{detect}$ and whether $q \geq q_\mathrm{min}$. These two criteria can be disabled by setting $w_\mathrm{detect} = w_\mathrm{feat}$ and $q_\mathrm{min} = 0$, respectively. However, having at least one of them active allows $w_\mathrm{feat}$ to be set to a substantially lower value without erroneously extracting baseline fluctuations (see figure 6.2).
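The trailing edge, charge and acceptance steps can be combined into one short sketch. The function name and return layout are hypothetical; positions are given in units of bins, "passing" the detection threshold is assumed to mean that the highest bin of the feature exceeds $w_\mathrm{detect}$, and for a cut-off feature the index of the last bin is used as the waveform's end.

```python
def simple_feature(w, i, w_feat, w_detect, q_min):
    """Evaluate the feature starting at bin i (first bin above w_feat, i >= 1).

    Returns (trailing_position, charge, cut_off) or None if the feature is rejected.
    """
    end = i + 1
    while end < len(w) and w[end] > w_feat:
        end += 1                               # end: first bin not above w_feat
    cut_off = end == len(w)
    if cut_off:
        last = len(w) - 1
        trailing = float(last)                 # waveform ends inside the feature
    else:
        last = end
        # Linear interpolation between (end-1, w[end-1]) and (end, w[end]).
        trailing = (end - 1) + (w[end - 1] - w_feat) / (w[end - 1] - w[end])
    charge = sum(w[i - 1:last + 1])            # bins i-1 ... i+j, boundary bins included
    peak = max(w[i:end])
    if peak <= w_detect and w_detect != w_feat:
        return None                            # flat feature, detection threshold active
    if charge < q_min:
        return None
    return trailing, charge, cut_off
```

With the default setting $w_\mathrm{detect} = w_\mathrm{feat}$ the detection criterion is inactive and only the charge threshold decides.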

The two thresholds and the two charge compensation constants can be set individually 

for ATWD and FADC, the time offset constant can be set for ATWD OT, ATWD NT 

and FADC, and q min is the same for all sources, leading to a total of twelve configuration 

parameters. 

6.3 Extraction Algorithm “BayesUnfold” 

“BayesUnfold” (BU) uses the method of Bayesian Unfolding described by G. D’Agostini in his paper A multidimensional unfolding method based on Bayes’ theorem.[57] It is also employed in FeatureExtractor and PulseExtractor; this section explains the implementation in NFE, section 6.3.1 shows the differences from FE’s and PE’s implementations, and appendix A gives a more formal approach to the unfolding itself.



Figure 6.3: Sketch illustrating Bayesian Unfolding as employed in FE, PE and NFE; see text 

for description. Note that the deconvoluted waveform is not drawn to scale, the 

unfolding is charge conserving. 

The underlying idea behind unfolding techniques is to undo smearing and distortion effects caused by the experiment’s hardware. Afterwards, it is easier to reconstruct the arrival times of photons at the photocathode.

Besides the calibrated waveform to extract, the unfolding algorithm requires samples of generic SPE pulses in the same binning. It then deconvolutes the waveform iteratively, moving the charge into those bins in which SPE-like pulses with the given charge must have occurred to cause the actual waveform with maximum probability. An illustration for this is given in figure 6.3: The waveform on the left side is the superposition of five SPE-like features, the deconvoluted distribution on the right side is a histogram of the same length and total charge, but the charge has been moved to the bins at the beginnings of the individual features. For more details, see appendix A.
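To make the iterative deconvolution more concrete, the sketch below performs one step of the standard iterative Bayesian unfolding update for a one-dimensional waveform. It is a generic illustration under stated assumptions, not the NFE code: the name `unfold_step` is hypothetical, the SPE sample `spe` is assumed to be normalized to unit charge and to start at the hit bin, and all waveform bins are assumed to be non-negative.

```python
def unfold_step(u, w, spe):
    """One Bayesian unfolding iteration: returns the updated distribution."""
    n = len(w)
    # Fold the current estimate with the SPE sample: expected waveform.
    folded = [0.0] * n
    for i, u_i in enumerate(u):
        for d, s in enumerate(spe):
            if i + d < n:
                folded[i + d] += s * u_i
    new_u = []
    for i, u_i in enumerate(u):
        gain = 0.0
        efficiency = 0.0        # part of the SPE sample that fits into the waveform
        for d, s in enumerate(spe):
            j = i + d
            if j < n:
                efficiency += s
                if folded[j] > 0.0:
                    gain += s * w[j] / folded[j]
        new_u.append(u_i * gain / efficiency if efficiency > 0.0 else 0.0)
    return new_u


spe = [0.6, 0.3, 0.1]                      # toy SPE sample, unit charge
w = [0.0] * 10
w[3:6] = spe                               # one SPE-like feature starting in bin 3
u = [sum(w) / len(w)] * len(w)             # uniform starting distribution
for _ in range(30):
    u = unfold_step(u, w, spe)
```

In this toy example the charge accumulates in bin 3, the beginning of the feature, while the total charge stays the same.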


Starting Distribution One of the unfolding method’s parameters is the starting distribution for the deconvoluted distribution, $u_{i,0}$, $i = 0 \ldots L$. As D’Agostini points out, the method “gives the best results (in terms of its ability to reproduce the true distribution) if one makes a realistic guess about the distribution that the true values follow, but, in case of total ignorance, satisfactory results are obtained even starting from a uniform distribution”.[57]

The SPE samples are zero everywhere but in a small region behind the position of the hit. Because of this, a combination of a uniform distribution with the shifted waveform provides a good initial guess:

$$u_{i,0} = \frac{1}{2}\,\frac{w_\mathrm{tot}}{L} + \frac{1}{2}\,w_k, \qquad k := (i+1) \bmod L, \qquad w_\mathrm{tot} = \sum_i w_i \tag{6.2}$$

The shift of one bin length is introduced to minimize a possible bias to later pulse times and to speed up the unfolding, because the charge always needs to be shifted to the beginning of the features. The uniform term is preserved because the algorithm cannot increase bin contents which are zero at any time during the deconvolution process, and does so only slowly for very low bin contents.
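Equation (6.2) amounts to a one-line computation. In this sketch the function name is hypothetical and $L$ is taken to be the number of waveform bins.

```python
def starting_distribution(w):
    """Half uniform, half waveform shifted one bin towards earlier times (eq. 6.2)."""
    length = len(w)
    w_tot = sum(w)
    return [0.5 * w_tot / length + 0.5 * w[(i + 1) % length]
            for i in range(length)]


u0 = starting_distribution([0.0, 0.0, 1.0, 0.0])
```

For the four-bin example the single unit of charge in bin 2 yields 0.625 in bin 1 and 0.125 in every other bin, so the total charge is unchanged and no bin starts at zero.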



Number of Iterations An important parameter of the algorithm is the number of iterations $n_\mathrm{iter}$. Steering the number of iterations adaptively can both reduce computation time and improve the quality of extraction for “tricky” waveforms by giving them more CPU time and thereby more iterations.

Moreover, terminating the unfolding procedure at the right time suppresses the method’s inherent amplification of statistical fluctuations (positive feedback). D’Agostini suggests smoothing the deconvoluted distribution after every iteration step to both speed up the convergence and decrease the amount of artifacts. However, this does not seem to be applicable in this case since the target deconvolution should be spiky (see footnote 5).

The “BayesUnfold” algorithm in NFE employs a conjunction of two stopping conditions: Both the change of the highest bin and the change of the sum of all bins above half the pulse charge threshold $q_\mathrm{min}$ must be small for two successive iteration steps (with the constraint $n_\mathrm{iter} \geq 10$):

$$\frac{\max(\{u_{i,\tilde n}\})}{\max(\{u_{i,\tilde n-1}\})} < \Delta u_\mathrm{min} \;\wedge\; \frac{\sum_{i:\,u_{i,\tilde n} > 0.5\,q_\mathrm{min}} u_{i,\tilde n}}{\sum_{i:\,u_{i,\tilde n-1} > 0.5\,q_\mathrm{min}} u_{i,\tilde n-1}} < 0.3\,\Delta u_\mathrm{min} \quad \forall\, \tilde n \in \{n_\mathrm{iter}-1,\, n_\mathrm{iter}\} \tag{6.3}$$

Both conditions depend only on relatively few bins, which reduces the impact of the aforementioned fluctuations. The first condition ensures that the highest pulse is extracted precisely, and with it the other pulses as they received as many iterations. The second condition ensures that no charge associated with pulses is lost. The factor of 0.3 has been determined empirically during this thesis, using the unit test’s extra output for idealized pulses. Additionally, a maximum number of iterations $n_\mathrm{max}$ can be specified.
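A possible reading of the two stopping conditions is sketched below. It rests on an explicit assumption: "small change" is interpreted as the relative deviation of each ratio from one, which is suggested by the parameter name minRelativeChange but is not spelled out in the text. All function names are hypothetical.

```python
def small_change(u_now, u_prev, q_min, delta_u_min):
    """Both stopping criteria for one pair of successive iterations."""
    top_now, top_prev = max(u_now), max(u_prev)
    sum_now = sum(x for x in u_now if x > 0.5 * q_min)
    sum_prev = sum(x for x in u_prev if x > 0.5 * q_min)
    if top_prev <= 0.0 or sum_prev <= 0.0:
        return False
    # Assumption: "change" means |ratio - 1| of successive iterations.
    top_ok = abs(top_now / top_prev - 1.0) < delta_u_min
    sum_ok = abs(sum_now / sum_prev - 1.0) < 0.3 * delta_u_min
    return top_ok and sum_ok


def should_stop(history, q_min, delta_u_min, n_min=10):
    """history[n] is the distribution after n iterations (history[0]: start)."""
    n = len(history) - 1
    if n < max(n_min, 2):
        return False
    # The criteria must hold for the last two iteration steps.
    return all(small_change(history[k], history[k - 1], q_min, delta_u_min)
               for k in (n - 1, n))
```

Keeping the whole history is only done for clarity; in practice the last three distributions are sufficient to evaluate the criteria.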

Pulse Definition When the unfolding procedure ends after $n_\mathrm{iter}$ iterations, the features in the deconvoluted distribution $u_i := u_{i,n_\mathrm{iter}}$ have to be extracted to define the pulses. “BayesUnfold” uses three subsequent bins to define a pulse (figure 6.4). This is motivated by two facts: First, even in an ideal case, SPE-pulse-like features have to be unfolded into two subsequent bins in a deterministic ratio if they occurred at a different time than that of a bin, i.e., “between” two bins. Secondly, only a finite number of iterations (∼ 30) is conducted, so SPE-pulse-like features whose charge should have been moved into a single bin in fact end up in this main bin and the surrounding ones.

Figure 6.4: Sketch illustrating the algorithms used by the different feature extractors to define pulses out of the unfolded distribution; areas of the same color belong to a single pulse. Dark dashed lines indicate the charge thresholds, light dashed lines indicate the threshold boundaries in which the pulse definition does not change. Left: NFE; middle: FE/PE with the same threshold; right: FE/PE with a higher threshold; striped areas apply to FE only.

Footnote 5: It might be worthwhile to try out advanced smoothing methods, e.g. a multi-resolution Savitzky-Golay filter as described in http://www.er.ams.eng.osaka-u.ac.jp/Paper/2006/Norbert06a.pdf.

The deconvoluted distribution is scanned from its beginning to its end. If for a given $i$ the content $u_i$ exceeds $0.5\,q_\mathrm{min}$, “BayesUnfold” checks whether $u_{i+1} > u_i$; if so, $i$ is incremented by one. The three bins defining a pulse are $u_{i-1}$, $u_i$, and $u_{i+1}$. The algorithm then checks whether the third bin $u_{i+1}$ should be accounted completely to the current pulse’s charge $q$ or if it needs to be shared with a later pulse (see figure 6.4):

$$q = u_{i-1} + u_i + f\,u_{i+1} \quad\text{with}\quad f := \begin{cases} \dfrac{u_i}{u_i + u_{i+2}} & \text{for } u_{i+2} \geq \min(\{u_i, u_{i+1}\}), \\ 1 & \text{otherwise.} \end{cases}$$

If the total charge $q$ exceeds $q_\mathrm{min}$, the pulse is accepted (i.e., written to the series) and $f\,u_{i+1}$ is subtracted from bin $i+1$ to prevent the charge from being accounted twice.

The pulse’s leading edge time is defined according to the charge distribution:

$$t = t_i + T_\mathrm{bin} \left( -\frac{u_{i-1}}{u_{i-1} + u_i} + f\,\frac{u_{i+1}}{u_i + u_{i+1}} \right)$$
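The charge and time definitions for one pulse can be sketched as follows. The function is a hypothetical helper that evaluates the two formulas for a given centre bin; the scanning loop, the comparison with $q_\mathrm{min}$ and the special treatment at the waveform boundaries are deliberately left out, and $u_i > 0$ is assumed.

```python
def pulse_from_bins(u, i, t_i, t_bin):
    """Charge q, time t and sharing factor f for the bins i-1, i, i+1.

    Requires 1 <= i <= len(u) - 3 and u[i] > 0; t_i is the time of bin i.
    """
    if u[i + 2] >= min(u[i], u[i + 1]):
        f = u[i] / (u[i] + u[i + 2])      # share bin i+1 with a later pulse
    else:
        f = 1.0                           # bin i+1 belongs entirely to this pulse
    q = u[i - 1] + u[i] + f * u[i + 1]
    shift = -u[i - 1] / (u[i - 1] + u[i]) + f * u[i + 1] / (u[i] + u[i + 1])
    return q, t_i + t_bin * shift, f


# Charge split evenly between bins 1 and 2: the pulse lies half a bin before bin 2.
q, t, f = pulse_from_bins([0.0, 0.5, 0.5, 0.0, 0.0], 2, t_i=10.0, t_bin=2.0)
```

If all charge sits in the centre bin, the time equals the bin time $t_i$; charge in the neighbouring bins pulls the time towards them.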

The pulse width $T$ is obtained by a charge-dependent parametrization of the SPE pulse parametrization’s width at a height of 0.05 PE per bin:

$$T = T_\mathrm{bin}\,\max(\{1,\; c_\mathrm{T1} \ln(c_\mathrm{T2}\,q - c_\mathrm{T3})\}) \tag{6.4}$$

The constants $c_\mathrm{T1}$, $c_\mathrm{T2}$, and $c_\mathrm{T3}$ are hard-coded for ATWD OT, ATWD NT, and FADC respectively. The parametrization error is usually well below 1%, see figure 6.5. At the beginning and at the end of the waveform, the algorithm switches to linear calculations and marks pulses as cut off.
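Equation (6.4) can be evaluated as in the sketch below. The function name is hypothetical, and so are two choices made only for this example: the bin length of 3.3 ns, and the guard that returns one bin length when the argument of the logarithm is not positive. The constants are the ATWD NT values of table 7.3, where the product $T_\mathrm{bin} \cdot c_\mathrm{T1}$ is tabulated.

```python
import math


def pulse_width(q, t_bin, c_t1, c_t2, c_t3):
    """Pulse width T of equation (6.4) for a pulse of charge q (in PE)."""
    arg = c_t2 * q - c_t3
    if arg <= 0.0:
        return t_bin          # assumption: fall back to the minimum of one bin
    return t_bin * max(1.0, c_t1 * math.log(arg))


T_BIN = 3.3                   # ns, assumed bin length for illustration only
C_T1 = 6.46351 / T_BIN        # table 7.3 lists T_bin * c_T1 = 6.46351 ns (ATWD NT)
width_1pe = pulse_width(1.0, T_BIN, C_T1, c_t2=30.856, c_t3=4.8980)
```

The logarithm makes the width grow slowly with charge, and the maximum ensures that a pulse is never narrower than one bin.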

In summary, “BayesUnfold” offers three configuration parameters: the maximum number 

of iterations, the stopping parameter ∆u min , and the charge threshold q min . Other 

parameters, such as the SPE pulse samples, the width parametrization, and the time 

offsets are hard-coded, but can easily be changed. 

Figure 6.5: Residual plot comparing the width parametrizations used in “BayesUnfold” to the true SPE pulse parametrizations’ widths in dependence of the pulse’s charge. (Axes: relative error ΔT / T versus q / PE; curves: NFE ATWD OT, NFE ATWD NT, NFE FADC.)

6.3.1 Differences of FE’s and PE’s Implementations 

FeatureExtractor and PulseExtractor both use the same SPE pulse parametrization (shown 

in figure 4.6) which differs significantly from NFE’s; they do not discriminate between old 

toroid and new toroid DOMs. The unfolding and pulse definition algorithms of FE and 

PE are identical except for their handling of pulses below the threshold and the FeatureExtractor’s 

time refinement: FeatureExtractor uses the time obtained from one of its 

single-pulse extraction algorithms to replace the time of the nearest Bayesian Unfolding 

pulse. 

Both extractors conduct a fixed number of unfolding iterations (20) and define pulses 

by two bins compared to the three bins per pulse used by NFE. Because of this, they 

tend to split up features into many pulses and therefore require a comparatively high charge

threshold to reduce the splitting. The only difference with non-negligible impact between 

the FE and PE algorithms is how they handle charge in unfolded bins that fall below this 

threshold. While PulseExtractor omits them and thereby loses some of the pulses’ charge, 

FeatureExtractor compares the sum of all pulses’ charges to the charge computed by the 

selected single-pulse extraction algorithm (preferably the second one, as it computes the 

total charge which is needed here) and multiplies every pulse’s charge with this ratio to 

ensure that the waveform’s charge is conserved. The difference between these two methods 

is depicted as striped areas in figure 6.4. 

The advantage of FeatureExtractor’s method is that no pulse charge is lost; the disadvantage is an increased sensitivity to wrong baselines, e.g. those caused by faulty droop correction. A disadvantage common to FE and PE is that the high threshold causes a shift in the pulse’s time if only one bin of a pulse originally consisting of two bins is incorporated, because the time depends on the charge’s center of gravity.

Figure 6.6: Sketch comparing NFE’s extraction algorithm “SLCHE” (left) to SLCHitExtractor (right). Shown are three different digitizations of the same SPE-like feature (dashed brown lines), the third being an irregular chargestamp. (Annotations, left: parabola through all three points; time: point of maximum + offset; charge: const × area below parabola; width: parametrization T(q). Annotations, right: time: pre-maximum bin + const × ratio of maximum bin value to predecessor bin value; charge: const × maximum bin value; width: three bin lengths.)

6.4 Extraction Algorithm “SLCHE” 

“SLCHE”, named after the SLCHitExtractor on which it was based originally, is an exclusive SLC extraction algorithm. For regular charge stamps, i.e., those in which the middle bin $w_1$ is the highest (section 4.4.3), the algorithm computes the position of the maximum of the parabola defined by the three points $(0, w_0) \ldots (2, w_2)$, as well as the area $A$ enclosed by it and the baseline. To calculate the pulse time, the time $t_\mathrm{pmax}$ corresponding to the parabola’s maximum is shifted by a constant negative time offset $t_\mathrm{offset}$:

$$t = t_\mathrm{pmax} + t_\mathrm{offset}, \quad\text{with}\quad t_\mathrm{pmax} = t_0 + T_\mathrm{bin}\,\frac{-3w_0 + 4w_1 - w_2}{-2w_0 + 4w_1 - 2w_2}.$$

The charge is computed by $q = c_q A$, where $c_q$ is a configurable constant. It is used to estimate the original feature’s width, employing the parametrization in equation (6.4) and thereby exploiting that full FADC waveforms and SLC chargestamps have the same shape. Figure 6.6 visualizes this algorithm for the two leftmost chargestamps.
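For a regular charge stamp, the time and charge calculation can be sketched as follows. The function name is hypothetical, and one interpretation is assumed: the area $A$ is taken between the two zero crossings of the parabola, in units of bin length times amplitude, so that any unit conversion is absorbed into $c_q$.

```python
import math


def slche_regular(w0, w1, w2, t0, t_bin, t_offset, c_q):
    """Time and charge for a regular charge stamp (w1 highest, all w >= 0)."""
    # Parabola a*x^2 + b*x + w0 through (0, w0), (1, w1), (2, w2).
    a = 0.5 * (w0 - 2.0 * w1 + w2)               # negative for a peak
    b = 0.5 * (-3.0 * w0 + 4.0 * w1 - w2)
    x_max = (-3.0 * w0 + 4.0 * w1 - w2) / (-2.0 * w0 + 4.0 * w1 - 2.0 * w2)
    t = t0 + t_bin * x_max + t_offset
    # Area between the zero crossings: 2/3 * peak height * distance of the roots.
    height = w0 - b * b / (4.0 * a)
    root_distance = math.sqrt(b * b - 4.0 * a * w0) / abs(a)
    area = (2.0 / 3.0) * height * root_distance
    return t, c_q * area


t, q = slche_regular(0.0, 1.0, 0.0, t0=100.0, t_bin=25.0, t_offset=-10.0, c_q=1.0)
```

For the symmetric stamp (0, 1, 0) the maximum lies exactly on the middle bin and the enclosed area is 4/3; the numerical values of `t_bin`, `t_offset` and `c_q` in the usage line are placeholders.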

Irregular waveforms are not necessarily concave (see section 4.4.3), therefore the calculations 

of t pmax and A fail for them. Also, two or more photons hitting a DOM with 

a small delay could cause waveforms that are approximately flat, leading to erroneously 

high charge estimates. 

Because of this, a fallback method is implemented:

$$t = t_i + t_\mathrm{offset}, \quad\text{and}\quad q = c_q \left\langle \frac{A}{w_1} \right\rangle w_i,$$

with $i = 0$ if $w_0 > w_1$, and otherwise $i = 2$ if $w_2 > w_1$ or $w_0 + w_2 \geq 1.8\,w_1$. $\left\langle \frac{A}{w_1} \right\rangle$ denotes the mean $A$ to $w_1$ ratio for regular charge stamps, determined empirically.

This fallback charge calculation is equivalent to the regular charge calculation performed in SLCHitExtractor and could be expressed with a single constant $\tilde{c}_q = c_q \left\langle \frac{A}{w_1} \right\rangle$. However, the more complicated looking formulation with two multiplicative constants decouples these constants and thereby simplifies configuration. Apart from $c_q$, $\left\langle \frac{A}{w_1} \right\rangle$, and $t_\mathrm{offset}$, there are no other configuration constants.
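The selection rule and the fallback formulas can be sketched as follows. Function and parameter names are hypothetical, `mean_area_ratio` stands for the empirical mean ratio of $A$ to $w_1$, and the bin time is assumed to be $t_i = t_0 + i\,T_\mathrm{bin}$.

```python
def fallback_bin(w0, w1, w2):
    """Bin index used by the fallback method, or None for a regular stamp."""
    if w0 > w1:
        return 0
    if w2 > w1 or w0 + w2 >= 1.8 * w1:
        return 2
    return None


def slche_fallback(w, t0, t_bin, t_offset, c_q, mean_area_ratio):
    """Time and charge of an irregular charge stamp w = (w0, w1, w2)."""
    i = fallback_bin(*w)
    if i is None:
        raise ValueError("regular charge stamp: use the parabola method")
    t = t0 + i * t_bin + t_offset             # t = t_i + t_offset
    q = c_q * mean_area_ratio * w[i]          # q = c_q <A/w1> w_i
    return t, q
```

Keeping `c_q` and `mean_area_ratio` as separate arguments mirrors the decoupling of the two constants described above.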


CHAPTER VII 

Performance Optimization


Prior to the design of the algorithm, many waveforms were studied by eye (e.g., figure 

7.1, visualized using a custom IceTray Python module and matplotlib) to get a feeling 

for pulse shapes, baseline fluctuations, and potential problems arising with the feature 

extraction. Most of the hard-coded and default parameters were based on this experience 

and proved sustainable and robust under further investigation. 

Important performance observables for the feature extraction are the time difference 

∆t between the photon hitting the DOM and the extracted pulse, the charge per pulse, 

the number of pulses, the total charge per waveform, and the number of waveforms in 

which no pulse is found. Measurements of these observables have been conducted using 

custom Monte-Carlo datasets simulated for the calibration of NFE. These datasets (2595 

and 3071) include the Monte-Carlo hit information which is usually discarded after the 

waveform simulation. This allows the time resolution to be evaluated directly.[58][59] If not stated otherwise, plots are based on dataset 3071. Unfortunately, all simulated datasets

include erroneous data caused by bugs partly found as a consequence of this thesis; this 

includes high baselines for ATWD and wrong impedances in the GCD files, see appendix 

C.3. The impact of these bugs is discussed in the appropriate sections. Independently, 

the calibrations and tests presented here will be repeated when fixed datasets become 

available. 

After all parameters were fixed, the measurements were repeated with experimental data 

to verify the parameters’ correctness. 

7.1 Calibration Using Monte-Carlo Data 

The time offsets (section 5.4.3) of all algorithms have been adjusted to comply with pulses 

extracted by “BayesUnfold” from ATWD new toroid waveforms, as the author assumes 

them to be the most reliable pulses on average. A constant common time offset of all algorithms to Monte-Carlo hits is irrelevant as long as it is of the order of ten nanoseconds; it does not affect the track reconstructions, as these only depend on time differences between pulses, and it is far too small to affect later high-level analyses. The policy chosen is to extract the pulse start times instead of the Monte-Carlo hit times, which are about 11 ns too late in comparison and roughly tag an SPE pulse’s maximum, see figure 7.1, figure 7.2, and figure 7.3.

7.1.1 Pre-evaluation Algorithm “Eva” 

First, the parameters of the pre-evaluation algorithm “Eva” and thereby the categorization 

of waveforms were adjusted, as the performance of other algorithms depends on them. 

They were determined using individual waveforms such as those in figure 7.1; they have 

been verified using the SPE parametrizations employed in “BayesUnfold” (equation (7.1), 

figure 4.6). 

Figure 7.1: Two examples for calibrated ATWD (blue, narrow) and FADC (red, wide) waveforms, OMKey(5,41) and OMKey(12,33); only a part of the FADC waveform is shown. Pulses are indicated by dashed full-length vertical lines with the charge q given at the top of the lines in units of PE; horizontal bars indicate the pulse widths. Lines in the upper image each correspond to ATWD pulses (NFE_ATWDPulses), lines in the lower image to FADC pulses (NFE_FADCPulses). Large solid or dotted ticks at the bottom axis indicate Monte-Carlo hit information if available. (Axes: ATWD and FADC amplitudes / PE versus time / ns.)

Figure 7.2: [Waveform plots for OMKey(69,33) and OMKey(12,33); horizontal axis: time / ns. Of the caption, only the ending “if available.” is preserved.]

Figure 7.3: [Waveform plots for OMKey(11,54) and OMKey(87,37); horizontal axis: time / ns. Of the caption, only the ending “if available.” is preserved.]


Table 7.1: Default values used for pre-evaluation algorithm “Eva”.

parameter      name in code          value (ATWD)    value (FADC)
w_max          SimpleThreshold       0.40 PE         0.60 PE
w_feat         FeatureThreshold      0.08 PE         0.10 PE
l_max          FeatureMaxLength      6 bins          5 bins
d_min          FeatureMinDistance    4 bins          3 bins

Table 7.2: Default values used for extraction algorithm “Simple”.

parameter      name in code          value (ATWD)    value (FADC)
w_feat         FeatureThreshold      0.04 PE         0.08 PE
w_detect       DetectionThreshold    0.04 PE         0.08 PE
q_min          minCharge             0.15 PE         0.15 PE
c_P0           QTCorrelationP0       1.6 ns          17.722 ns
c_P1           QTCorrelationP1       1.7             1.2345
t_offset NT    DeltaTNewToroid       −0.57 ns        −10.03 ns
t_offset OT    DeltaTOldToroid       −1.83 ns        −10.03 ns

The values were adjusted in such a way that narrow and well-separated SPE-pulse-like features up to 1.5 PE in the spikiest possible binning are still considered to be simple (table 7.1). This is tailored to the extraction algorithm “Simple” as it is not capable of splitting features, for example to account a feature’s tail to the associated peak if it is superimposed by another feature. While SPE features with charges as high as 1.5 PE are quite common (e.g., figure 7.1, first pulse in DOM 12-33, or figure 7.2, DOM 69-33), very similar features can be caused by two or more coincident photons (third pulse in DOM 12-33); however, as long as those are sufficiently close in time, this does not contradict a pulse’s definition.

7.1.2 Extraction Algorithm “Simple” 

The feature thresholds $w_\mathrm{feat}$ of the extraction algorithm “Simple” were set sufficiently high not to be surpassed by baseline fluctuations. For lower feature thresholds, the charge threshold $q_\mathrm{min}$ would still reliably prevent individual random fluctuations from being extracted as pulses. However, fluctuations at the beginning of a real feature could impede the leading edge detection.

Figure 7.4: Distribution of the time residuals of NFE extracted pulses for different charge thresholds; shown is the time difference between the first Monte-Carlo hit and the first extracted pulse per waveform. Blue areas are NFE with default settings ($q_\mathrm{min}$ = 0.15 PE), green lines to the left have $q_\mathrm{min}$ = 0.2 PE for algorithm “Simple”, and green lines to the right have $q_\mathrm{min}$ = 0.25 PE. Red lines indicate Gaussian fits to all bins (excluding underflow and overflow), with their widths and means specified in the plots (both panels are annotated with width: 0.81, 0.81 and mean: 11.19, 11.20). (Axes: entries on a logarithmic scale versus $t_\mathrm{MC} - t_\mathrm{pulse}$ / ns.)



Tuning of the threshold $q_\mathrm{min}$ requires a decision about whether a few more false pulses outweigh a better detection capability for very small pulses. Generally, early falsely detected pulses (false pulses) can have a relatively high impact on the later track reconstruction; however, the number of false first pulses extracted using the current low $q_\mathrm{min}$ = 0.15 PE is deemed to be acceptable: For the about 4.1 million waveforms evaluated in figure 7.4, only about 200 first pulses precede the expected time by more than 10 ns. For this optimization it has to be considered that increasing the threshold only marginally reduces the number of false pulses, but has a relatively high impact on true pulses:

For the sample in figure 7.4, $q_\mathrm{min}$ = 0.2 PE eliminates about 50 early false pulses of about 23 million pulses in total, but almost triples the number of instances where the first hit was missed. This is seen in the underflow bin, which rises from about 11000 entries to about 31000 entries; $q_\mathrm{min}$ = 0.25 PE eliminates an additional 10 early false pulses at the cost of more than 40000 additional missed first hits. In general, a setting as low as possible is preferable.

In cases where only one feature is present inside the waveform, the option EnforcePulse 

in conjunction with “BayesUnfold” could be used to extract the pulse nonetheless if it 

is missed; however, the rising of the plateau at the top in figure 7.4 with increasing q min 

shows that often other features are extracted instead, preventing the extraction of the 

valuable first feature (possibly direct primary lepton Čerenkov light). 

This calibration will be repeated with future new Monte-Carlo data, because a significant 

fraction of the fake pulses seems to be caused by artifacts from an incomplete simulation, 

see figure C.2 in appendix C. 

By default the detection threshold is disabled ($w_\mathrm{detect} = w_\mathrm{feat}$). A different setting would hamper the detection of very flat features like those seen in figure 7.3 for DOM 11-54, and $q_\mathrm{min}$ suppresses noise hits sufficiently, as can be seen from the low fake pulse rate discussed above.

The parameters c P0 and c P1 for the charge-time correlation compensation term t q in 

equation (6.1) were obtained by a fit to the two-dimensional charge-time histograms in figure 

7.5. While the agreement is not perfect, especially for very low charges, the quadratic 

model is robust, and the fit lies within the natural scattering of the distribution. For the 

analyzed dataset, the time resolution measured by a Gaussian fit to the distribution seen 

in figure 7.4 (excluding the underflow bin) improves from 1.04 ns to 0.81 ns for ATWD 

and from 4.68 ns to 3.34 ns for FADC. 

The double structure in figure 7.5 also shows that the ATWD time distribution is heavily influenced by the erroneous GCD information (geometry, calibration, detector status; section 4.6.1), which prevents correct categorization of old toroid and new toroid DOMs. The time resolution will probably improve further as soon as this general calibration problem is fixed. A rough estimate of this bug’s impact can be obtained by extracting pulses exclusively from old toroid DOMs because far fewer NT DOMs are reported to have an old toroid than the other way around. The Gaussian width is 0.68 ns in this case.

The same test applied to FADC pulses yields no improvements in time resolution because the toroid’s effect is smeared out by the pulse shaper. Nevertheless, further tests will be undertaken to check if it can be worthwhile to differentiate between FADC OT and NT as soon as improved Monte-Carlo datasets become available.

Figure 7.5: Charge-time correlation of pulses from NFE extraction algorithm “Simple” only; left: uncorrected, right: default $c_\mathrm{P0}$ and $c_\mathrm{P1}$; top: ATWD, bottom: FADC. (Horizontal axis: $q_\mathrm{pulse1}$.)

7.1.3 Extraction Algorithm “BayesUnfold” 

As pointed out in section 6.3, “BayesUnfold” requires generic templates of SPE pulses in 

the same binning as the waveforms. The basis for these samples was Christopher Wendt’s

parametrization, calculated from flasher run data (figure 4.6):[47] 

$$ w_\mathrm{SPE}(t) = a_1 \left( e^{-a_2 t + a_3} + e^{a_4 t - a_5} \right)^{-8} , \qquad (7.1) $$

with the parameter values specified in table 7.3. With the given parametrization, the time 

offsets of both the binning and the parametrization itself are still arbitrary. The binning 

was chosen not to be shifted against the arbitrary offset of the parametrization. For 

ATWD NT/OT the samples start at bin 1 and end with bin 12 and 11, respectively; the 

FADC sample reaches from bin 2 to the end of bin 10. These selections include 99.47%, 99.43%, and 99.60% of the SPE pulses’ charges, yet they are sufficiently short to permit unfolding with high CPU efficiency.

Table 7.3: Parameter values of the SPE pulse parametrization employed, and of the pulse width parametrization, equations (7.1) and (6.4).

parameter       ATWD NT             ATWD OT             FADC
a_1             4.422419 mV         4.240862 mV         8.793769 mV
a_2             0.6537139 ns^-1     0.7588095 ns^-1     0.8369602 ns^-1
a_3             0.1049991           0.0743131           0.3809843
a_4             0.08669082 ns^-1    0.09235075 ns^-1    0.13469828 ns^-1
a_5             0.01385802          0.00904426          0.06131466
T_bin · c_T1    6.46351 ns          5.96069 ns          29.9302 ns
c_T2            30.856 PE^-1        31.983 PE^-1        53.063 PE^-1
c_T3            4.8980              4.5038              6.3877

Table 7.4: Default configuration values and hard-coded parameters used for the extraction algorithm “BayesUnfold”.

symbol          name                     value
n_max           maxIterations            40
∆u_min          minRelativeChange        0.012 PE
q_min           speThreshold             0.25 PE

symbol          name                     ATWD        FADC
L_SPE NT        NewToroid::speLength     11          8
L_SPE OT        OldTOroid::speLength     10          8
t_offset NT     NewToroid::timeOffset    0 ns        6.66 ns
t_offset OT     OldToroid::timeOffset    −1.82 ns    6.66 ns

Table 7.5: Default values used for extraction algorithm “SLCHE”.

symbol              name                value
c_q                 chargeCalibConst    1.1584
⟨A/w_1⟩ · T_bin     meanParabolaArea    53.09 ns
t_offset            slcDeltaT           −50.14 ns
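To make the role of the parametrization (7.1) concrete, the sketch below evaluates it for the ATWD NT values of table 7.3, computes the position of its maximum analytically, and estimates which fraction of the template integral falls into a chosen time window. It is an illustration only: the function names, the integration range and the step count are assumptions, and the time argument is taken in the unit implied by the tabulated a_2 and a_4.

```python
import math

# Table 7.3, column "ATWD NT".
ATWD_NT = {"a1": 4.422419, "a2": 0.6537139, "a3": 0.1049991,
           "a4": 0.08669082, "a5": 0.01385802}

def w_spe(t, p):
    """Equation (7.1): a1 * (exp(-a2*t + a3) + exp(a4*t - a5))**-8."""
    rise = math.exp(-p["a2"] * t + p["a3"])
    fall = math.exp(p["a4"] * t - p["a5"])
    return p["a1"] * (rise + fall) ** -8

def peak_time(p):
    """Maximum of w_spe: the bracket is minimal where
    a2*exp(-a2*t + a3) = a4*exp(a4*t - a5)."""
    return (math.log(p["a2"] / p["a4"]) + p["a3"] + p["a5"]) / (p["a2"] + p["a4"])

def charge_fraction(p, t_lo, t_hi, t_min=-20.0, t_max=200.0, steps=22000):
    """Fraction of the template integral inside [t_lo, t_hi] (trapezoidal rule).

    [t_min, t_max] stands in for the full time axis; the template is
    negligible outside this range for the tabulated parameters.
    """
    def integral(a, b, n):
        h = (b - a) / n
        total = 0.5 * (w_spe(a, p) + w_spe(b, p))
        total += sum(w_spe(a + i * h, p) for i in range(1, n))
        return total * h
    n_win = max(1, int(steps * (t_hi - t_lo) / (t_max - t_min)))
    return integral(t_lo, t_hi, n_win) / integral(t_min, t_max, steps)
```

A call such as `charge_fraction(ATWD_NT, t_lo, t_hi)` can be used to check how much charge a given template length retains; the window limits that correspond to the bin ranges quoted above depend on the binning offset and are not fixed by this sketch.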

The parameters for the termination condition in equation (6.3) for the number of 

iterations were obtained from idealized SPE pulses using the unit test’s extra output 

(option -as), and are listed in table 7.4. Comparisons between results with fixed and 

variable numbers of iterations show that the time resolution improves slightly (2%) for 

the latter, see figure 7.6; it also tends to extract more charge per pulse (figure 7.7) and in

total. In general, the extraction of more charge is not necessarily advantageous because

it might originate from a wrong baseline. However, in this case it can be assumed that

the extra charge belongs to the pulses and was lost because the unfolding was stopped

too early. Thus the adaptive stopping condition is considered to be worthwhile, see also

section 8.5. 
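The difference between a fixed and a variable number of iterations can be illustrated with a generic iterative Bayesian (Richardson-Lucy type) unfolding. The sketch below is not the NFE implementation: equation (6.3) is not reproduced in this section, so the stopping rule used here (stop once the largest per-bin change falls below ∆u_min, or after n_max iterations) is an assumption, as are the function and argument names. The default values are taken from table 7.4.

```python
def bayes_unfold(waveform, template, n_max=40, du_min=0.012, start=None):
    """Unfold `waveform` (PE per bin) with a binned SPE `template`.

    Returns the unfolded charge per bin and the number of iterations used.
    `start` is an optional starting distribution; a flat one is the default.
    """
    if template[0] <= 0.0:
        raise ValueError("template must start with a positive bin")
    n, m = len(waveform), len(template)
    norm = sum(template)
    s = [v / norm for v in template]              # unit-charge template
    w = [max(v, 0.0) for v in waveform]           # clip negative bins
    # eff[j]: share of a pulse starting in bin j that lies inside the waveform
    eff = [sum(s[: n - j]) for j in range(n)]
    u = [sum(w) / n] * n if start is None else list(start)
    iteration = 0
    for iteration in range(1, n_max + 1):
        # Fold the current estimate with the template.
        folded = [sum(s[i - j] * u[j] for j in range(max(0, i - m + 1), i + 1))
                  for i in range(n)]
        ratio = [w[i] / folded[i] if folded[i] > 0.0 else 0.0 for i in range(n)]
        # Multiplicative update of every unfolded bin.
        u_new = [u[j] / eff[j] * sum(s[i - j] * ratio[i]
                                     for i in range(j, min(n, j + m)))
                 for j in range(n)]
        change = max(abs(a - b) for a, b in zip(u_new, u))
        u = u_new
        if change < du_min:                       # assumed stopping rule
            break
    return u, iteration
```

Setting `du_min=0.0` reproduces a fixed number of iterations equal to `n_max`, which is how the two variants compared above could be emulated with this sketch.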

Nevertheless, the extraction results could probably be further improved by fine-tuning: As 

an example, the number of pulses per waveform shown in figure 7.8 decreases on average 

when using a fixed n iter = 20, and it decreases further for n iter = 30. This indicates that a 

significant fraction of the extra pulses which are extracted with the default settings might 

be caused by excessive splitting. 

After the termination conditions were optimized during development, the improvement 

gained by using the optimized starting condition instead of a flat one mostly vanished. 

The gain is only 0.8 iterations instead of 3 before optimization, see figure 7.9. Still, the 

optimized starting condition is kept as there are no drawbacks. 

7.1.4 Extraction Algorithm “SLCHE” 

For the calibration of the parameters of “SLCHE”, the algorithm was applied to Monte-Carlo SLC charge stamps for which full waveforms were available. The results were compared to pulses extracted with one of the other algorithms. Although ATWD waveforms provide a better time resolution, FADC pulses were chosen for the comparison in order to minimize systematic errors due to differences between ATWD and FADC, as SLC charge stamps are generated from FADC waveforms. Also, only waveforms from DOMs which did not fulfill the hard local coincidence condition were used to get a more specific sample for the calibration.

The resulting parameter values are listed in table 7.5. An illustration of the good agreement between FADC and SLC in both time and charge can be seen in figure 7.10.

[Figure 7.6: plots not reproduced. Vertical axis: entries (logarithmic). Fit results, top: width 0.81 / 0.83, mean 11.11 / 11.07; bottom: width 0.81 / 0.80, mean 11.11 / 11.12.]

Figure 7.6: Plots illustrating the effects of “BayesUnfold”’s variable number of iterations. Both simple and complex ATWD waveforms have been extracted together by BU. Shown are the distributions of the time residuals for the default BU (variable n iter , 〈n iter 〉 = 19.8; blue areas), and for BU with a fixed number of iterations (green lines; n iter = 20 on the top, n iter = 30 on the bottom). Red lines indicate Gaussian fits to all bins (excluding underflow and overflow).

[Figure 7.7: plots not reproduced. Horizontal axis: q pulse1; vertical axis: entries. Fit results, top: width 0.34 / 0.34, mean 0.91 / 0.90; bottom: width 0.34 / 0.34, mean 0.91 / 0.91.]

Figure 7.7: Shown are the distributions of the charges of first pulses extracted by the default BU (variable n iter , 〈n iter 〉 = 19.8; blue areas), and for BU with a fixed number of iterations (green lines; n iter = 20 on the top, n iter = 30 on the bottom). Red lines indicate Gaussian fits to all bins (excluding underflow and overflow).

[Figure 7.8: plots not reproduced. Horizontal axis: ∆n pulses; vertical axis: entries (logarithmic). Fit results, top: width 0.37, mean 0.04; bottom: width 0.36, mean 0.09.]

Figure 7.8: Shown are the distributions of the differences between the numbers of pulses extracted by the default BU (variable n iter , 〈n iter 〉 = 19.8) and BU with a fixed number of iterations (n iter = 20 on the top, n iter = 30 on the bottom). Positive values correspond to more pulses for default BU.

[Figure 7.9: plots not reproduced. Horizontal axis: number of iterations (10 to 40); vertical axis: entries.]

Figure 7.9: Effects of “BayesUnfold”’s optimized deconvolution starting distribution: Shown is the distribution of the number of iterations n iter for complex waveforms from ATWD (left) and FADC (right) for the default starting distribution (solid lines) and a uniform one (dashed lines).

7.2 Verification Using Experimental Data

After all parameters have been adjusted to work well with simulated data, most analyses 

were repeated using experimental data to verify both the correctness of these parameters 

and the correctness of the simulation itself. The dataset used is run 113587 (IC59, 2009- 

04-21), filtered to level 1 which means that it contains the events that passed the online 

filtering. 

The timing tests cannot be repeated without modification because the true hit information

is not available for experimental data. Instead, the relative time distribution 

for different sources (e.g., ATWD and FADC) can be compared to the one obtained from 

Monte-Carlo simulations. 

The results of this test can be seen in figure 7.11. The small time misalignment of

about 4 ns between the pulses from ATWD and FADC can be attributed to a time offset in 

the experimental dataset itself, caused by an old DOMcal version. This offset is compensated 

for in DOMcalibrator by manually shifting FADC waveforms by the recommended 

value of −15 ns. Still, the Monte-Carlo dataset already uses the new DOMcal version for 

which the problem was fixed; therefore, it is deemed to be more trustworthy. Final time 

offset calibration values for NFE will be obtained using future IC79 Monte-Carlo data, 

which can be verified by future IC79 experimental data. 

[Figure 7.10: plots not reproduced. Top (time residuals; vertical axis: entries, logarithmic): width 3.04 / 2.25, mean 10.70 / 10.69. Bottom (horizontal axis: q pulse1; vertical axis: entries): width 0.34 / 0.33, mean 0.78 / 0.78.]

Figure 7.10: Distributions of the time residuals and charges of “SLCHE” (green lines) and FADC (blue areas) pulses from the same waveforms, with the latter extracted by NFE with EnforcePulse set.

[Figure 7.11: plots not reproduced. Horizontal axis: ∆t pulse / ns; vertical axis: entries. Fit results, top: width 3.12, mean 0.39; bottom: width 3.68, mean −3.46.]

Figure 7.11: Distributions of the time differences of ATWD and FADC NFE pulses for Monte-Carlo (top) and experimental data (bottom). Positive values indicate earlier times in the FADC.

[Figure 7.12: plots not reproduced. Horizontal axis: FADC charge / ATWD charge; vertical axis: entries; curves: Old Toroid DOMs, New Toroid DOMs, total.]

Figure 7.12: Ratio of the total (integrated) charges of the ATWD waveform and the corresponding fraction of the FADC waveform. Left side: simulated dataset 3071; right side: experimental dataset L1 113587.

The distributions of the charge of the first pulse for ATWD and FADC disagree significantly 

between Monte-Carlo data and the experimental data (figure 7.13): Most importantly 

the mean values for Monte-Carlo differ by more than 0.1 PE while those for 

experimental data seem to be well aligned. This is likely a problem of the simulation: 

The ratio of the total (integrated) charge of the ATWD waveform and the corresponding 

fraction of the FADC waveform differs largely between Monte-Carlo and experimental 

data, see figure 7.12. A part of this disagreement can be attributed to a bug in the 

daq_baseline simulation, see appendix C.3 and figure C.2. 

Considered individually, the first ATWD and FADC pulses’ charges match well for experimental 

data. The tails in the FADC charge distributions towards values higher than 

2 PE can be explained by the FADC’s inferior time resolution; the separation of features 

that can be distinguished as SPE-like in ATWD is often rendered impossible in FADC, 

so multiple-PE features are more likely. 

The distributions of the differences between the cumulative charge of all ATWD and 

all FADC pulses per waveform in figure 7.14 support the hypothesis of a wrong simulation; 

while the differences peak at 0 PE for experimental data, Monte-Carlo contains on 

average about 0.14 PE more charge in ATWD than in FADC. 
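The per-waveform quantities compared in this section (difference in total charge, in the number of pulses, and in the time of the first pulse) can be written down compactly. The sketch below assumes that pulses are available as plain `(time, charge)` tuples; this data layout and the function name are hypothetical and do not reflect the NFE output classes. The sign conventions follow the captions of figures 7.11 and 7.14.

```python
def compare_pulses(atwd, fadc):
    """Compare ATWD and FADC pulses of one waveform.

    atwd, fadc: lists of (time_ns, charge_pe) tuples.
    Returns (dq, dn, dt); dt is None if either list is empty.
    """
    # dq > 0: more total charge in ATWD (convention of figure 7.14)
    dq = sum(q for _, q in atwd) - sum(q for _, q in fadc)
    # dn > 0: more pulses in ATWD
    dn = len(atwd) - len(fadc)
    dt = None
    if atwd and fadc:
        # dt > 0: first FADC pulse is earlier (convention of figure 7.11)
        dt = min(t for t, _ in atwd) - min(t for t, _ in fadc)
    return dq, dn, dt
```

Histogramming `dq` and `dt` over all waveforms of a dataset yields distributions of the kind discussed here.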

The distributions show tails towards high FADC charges similar to the tails already observed 

in the distributions of the charges of first pulses (see above); however, in the 

cumulative distributions they do not originate from inseparable pulses because it is irrelevant 

if the charge is distributed between multiple pulses. Instead, they are caused by late 

pulses that are present in the FADC waveform, but not in the ATWD waveform. Evidence 

for this is the shallow secondary peak whose distance to the main peak is approximately 

one mean FADC SPE charge for either Monte-Carlo or experimental data respectively. 

[Figure 7.13: plots not reproduced. Horizontal axis: q pulse1; vertical axis: entries. Fit results, top: width 0.36 / 0.36, mean 0.93 / 0.81; bottom: width 0.39 / 0.43, mean 0.90 / 0.94.]

Figure 7.13: Distributions of the charges of the first pulses of ATWD (blue areas) and FADC (green lines) NFE pulses for Monte-Carlo (top) and experimental data (bottom); red lines indicate Gaussian fits.

[Figure 7.14: plots not reproduced. Horizontal axis: ∆q pulses; vertical axis: entries. Fit results, top: width 0.11, mean 0.14; bottom: width 0.11, mean −0.02.]

Figure 7.14: Distributions of the differences between the total charges of all ATWD and FADC NFE pulses per waveform for Monte-Carlo (top) and experimental data (bottom). Positive values indicate higher charges in ATWD; red lines indicate Gaussian fits.

[Figure 7.15: plots not reproduced. Horizontal axis: n pulses; vertical axis: entries (logarithmic).]

Figure 7.15: Distributions of the numbers of ATWD (blue areas) and FADC (green lines) NFE pulses for Monte-Carlo (top) and experimental data (bottom).

The bottom plots in figure 7.15 show the distributions of the number of pulses per waveform; they cannot be compared directly because the datasets differ too much: The particle energy in the simulated dataset follows an E^−1 neutrino energy spectrum (although the dataset is untriggered, favoring low-energy events with few pulses), while the experimental dataset follows the atmospheric muon energy spectrum of E^−3.7.

Still the plots are plausible: The triple peak structure in the Monte-Carlo ATWD and 

the small break around bin 128 in FADC originate from fully illuminated waveforms, i.e., 

waveforms with “BayesUnfold”’s maximum number of pulses: 64 for ATWD and 128 for

FADC (see footnote 6). Higher entries and the third peak, respectively, originate from DOMs which were

launched twice because of ongoing events (see section 4.4.2). 

Despite the problems arising due to the insufficient simulation, the comparisons of 

ATWD data with FADC data are promising; the important features are qualitatively 

understood, and the agreements are expected to improve with the next release of the 

simulation software. 

Footnote 6: Technically, “BayesUnfold” can extract 128 pulses from ATWD and 256 from FADC. However, an appropriate waveform is extremely unlikely.

CHAPTER VIII 

Performance Tests


8.1 Extraction of Simple Pulses with “Simple” and BU 

The use of the algorithm “Simple” instead of “BayesUnfold” (BU) for pulses that have been 

classified as simple by the pre-evaluation algorithm (see section 6.1) not only improves 

NFE’s CPU efficiency (see section 8.5), but also improves the extraction results because 

“Simple” only needs to extract those waveforms for which it was designed. “Simple”’s 

lower charge threshold allows the algorithm to find more small true pulses while extracting 

slightly fewer false pulses than BU in the same simple sample.

The leftmost bin of figure 8.2 shows that “Simple” does not have ATWD waveforms 

without pulses compared to about 100000 for BU; the latter corresponds to 1.1% of all 

tested waveforms. In FADC, there are 221 compared to about 170000, which corresponds 

to 1.3% of all tested waveforms. Furthermore, the lower plateau to the left side in the 

distribution of the time residuals of the first pulses in figure 8.1 indicates that the number 

of first true features missed in favor of a later pulse is lower by about a factor of 5 in 

ATWD; it is at the same level for “Simple” and BU in FADC. The time resolution of 

“Simple” measured by a Gaussian fit to the peak (figure 8.1) is worse than that of BU by

4% (ATWD) and 8% (FADC); this trade-off is deemed to be acceptable because the decline is

small compared to a waveform bin length. 

The average number of pulses extracted by “Simple” is slightly larger compared to BU 

(figure 8.2). The reason is that while “Simple” is not able to split features into multiple 

pulses, its charge threshold is lower (0.15 PE vs. 0.20 PE). The inability to split can be 

considered advantageous in this case because simple pulses are SPE-like enough to not 

require splitting, and excessive splitting is not desired. 

The large number of waveforms with more than ten pulses from “Simple” is caused by the 

missing simulation of the daq_baseline (see appendix C.3). An example for a waveform 

with extraordinarily high baseline caused by this bug can be seen in figure 8.3. Figure 8.4 

shows the same distribution as figure 8.2, but with “Simple”’s detection threshold w detect 

set to “Eva”’s feature threshold w feat = 0.08 PE. Due to this change the excessive pulses 

are cut away at the cost of losing many true pulses from waveforms with a more normal 

baseline: The number of waveforms without pulses increases from 0 to almost 200000 

(leftmost bin). 

The charge of the first pulse extracted by “Simple” is about 2% higher for ATWD and 

about 2% lower for FADC on average than the charge of the first BU pulse, as can be 

seen in figure 8.5; this is considered to be good agreement. 

Finally, the difference of the total pulse charges extracted per waveform by both algorithms 

peaks near zero and features a small width of 0.1 PE if the pulses caused by

the incomplete simulation are cut away, compare figure 8.6 to figure 8.7. 

[Figure 8.1: plots not reproduced. Vertical axis: entries (logarithmic). Fit results, top: width 0.73 / 0.76, mean 10.99 / 11.09; bottom: width 2.85 / 3.08, mean 11.16 / 10.82.]

Figure 8.1: Distribution of the time residuals of the first pulses extracted from simple Monte-Carlo waveforms by “BayesUnfold” (blue areas) and “Simple” (green lines): Shown is the difference between the Monte-Carlo hit time and the time of the first pulse; red lines indicate Gaussian fits; top: ATWD, bottom: FADC.

[Figure 8.2: plots not reproduced. Horizontal axis: n pulses; vertical axis: entries (logarithmic).]

Figure 8.2: Distribution of the numbers of pulses extracted from simple Monte-Carlo waveforms by “BayesUnfold” (blue areas) and “Simple” (green lines); top: ATWD, bottom: FADC.

[Figure 8.3: waveform plots not reproduced. OMKey(9,55); panels: NFE_ATWDPulses_bu (top), NFE_ATWDPulses_simple (bottom); horizontal axis: time / ns.]

Figure 8.3: Example waveform for an erroneously high baseline caused by incomplete simulation. Dotted lines indicate ATWD pulses extracted by “BayesUnfold” (top) and “Simple” (bottom), with charge given in units of PE.

[Figure 8.4: plot not reproduced. Horizontal axis: n pulses; vertical axis: entries (logarithmic).]

Figure 8.4: Effect of erroneous baselines on the distributions of the number of pulses extracted from simple Monte-Carlo waveforms by “BayesUnfold” (blue areas) and “Simple” (green lines). The excessive pulses caused by high baselines were cut away by a higher feature threshold in “Simple”; compare to figure 8.2.

[Figure 8.5: plots not reproduced. Horizontal axis: q pulse1; vertical axis: entries. Fit results, top: width 0.32 / 0.35, mean 0.90 / 0.92; bottom: width 0.32 / 0.34, mean 0.81 / 0.79.]

Figure 8.5: Distribution of the charges of the first pulses extracted from simple Monte-Carlo waveforms by “BayesUnfold” (blue areas) and “Simple” (green lines); red lines indicate Gaussian fits; top: ATWD, bottom: FADC.

[Figure 8.6: plots not reproduced. Horizontal axis: ∆q pulses; vertical axis: entries (logarithmic). Fit results, top: width 0.07, mean −0.08; bottom: width 0.09, mean −0.03.]

Figure 8.6: Distribution of the differences between the total charges of all pulses extracted from simple Monte-Carlo waveforms by “BayesUnfold” and “Simple”: Positive values correspond to higher values in BU; red lines indicate Gaussian fits; top: ATWD, bottom: FADC.

[Figure 8.7: plot not reproduced. Horizontal axis: ∆q pulses; vertical axis: entries (logarithmic). Fit result: width 0.10, mean −0.02.]

Figure 8.7: Effect of erroneous baselines on the distribution of the difference between the total charges of all pulses extracted from simple Monte-Carlo waveforms by “BayesUnfold” and “Simple”; positive values indicate higher total charges for BU; the red line indicates a Gaussian fit. The excessive pulses caused by high baselines were cut away by a higher feature threshold in “Simple”; compare to figure 8.6.

[Figure 8.8: plot not reproduced. Horizontal axis: n pulses; vertical axis: entries (logarithmic).]

Figure 8.8: Distribution of the numbers of pulses extracted from simple experimental ATWD waveforms by “BayesUnfold” (blue areas) and “Simple” (green lines). Experimental data is unaffected by the daq_baseline bug, compare to figure 8.2.

[Figure 8.9: plot not reproduced. Horizontal axis: ∆q pulses; vertical axis: entries (logarithmic). Fit result: width 0.08, mean −0.08.]

Figure 8.9: Distribution of the differences between the total charges of all pulses extracted from simple experimental ATWD waveforms by “BayesUnfold” and “Simple”: Positive values correspond to higher values in “BayesUnfold”; the red line indicates a Gaussian fit. Experimental data is unaffected by the daq_baseline bug, compare to figure 8.6.

These results will change slightly for new Monte-Carlo datasets which incorporate the simulation of the daq_baseline; however, significant deviations from the predicted distributions are not expected because the distributions of total charge and number of pulses for experimental data (in which this bug does not occur) show virtually no excessive pulses; compare figure 8.2 to figure 8.8 and figure 8.6 to figure 8.9.

If some of the excessive pulses remain in the new data regardless, either “Simple”’s

detection threshold or “Eva”’s feature threshold will be adjusted; lowering the latter helps 

because then the affected waveforms will be marked as complex, and BU is more robust 

towards an increased baseline as can be seen in figure 8.3. 

8.2 Extraction of Exotic Features 

The extraction of exotic features can be examined best by manually checking individual 

waveforms; otherwise problems might be concealed by the higher statistics of normal 

waveforms. Also, certain tests are hard to automate, e.g., timing distributions of non-first

pulses, because the assignment of pulses to Monte-Carlo hits is not trivial.

Figure 8.10 shows an example for the performance of NFE for very small features: Normally 

NFE does not extract any pulses because only the middle bin of the ATWD feature 

exceeds “Simple”’s threshold w feat , therefore the extracted charge is about 0.12 PE < q min . 

With EnforcePulse set, the feature is then relayed to “BayesUnfold” which first tries to 

extract pulses normally. If this fails, BU defines a pulse centered at the deconvoluted 

distribution’s maximum bin and accepts it regardless of its charge. 
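The fallback described above can be sketched as follows. This is a simplified illustration rather than NFE code: the normal extraction stage is replaced by a plain per-bin threshold, and the function and argument names are hypothetical. Only the last step (one pulse at the maximum bin of the deconvoluted distribution, accepted regardless of its charge) is taken from the text.

```python
def pulses_with_fallback(unfolded, bin_times, q_min, enforce_pulse=False):
    """Return a list of (time_ns, charge_pe) pulses for one waveform.

    unfolded:  deconvoluted charge per bin (PE)
    bin_times: time assigned to each bin (ns)
    q_min:     charge threshold of the normal extraction (PE)
    """
    # Stand-in for the normal extraction: keep every bin above threshold.
    pulses = [(t, q) for t, q in zip(bin_times, unfolded) if q >= q_min]
    if pulses or not enforce_pulse or not unfolded:
        return pulses
    # Fallback: a single pulse at the maximum bin, whatever its charge.
    k = max(range(len(unfolded)), key=lambda i: unfolded[i])
    return [(bin_times[k], unfolded[k])]
```

With `enforce_pulse=False` a waveform whose bins all stay below `q_min` yields no pulse at all, which corresponds to the default behaviour described for the small feature in figure 8.10.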

Most of the time, this works well: The only difference between the blue distribution in 

figure 7.4 and green distribution in figure 8.16 is that for the latter EnforcePulse is set. 

The number of waveforms for which the first extracted pulse is more than 50 ns away 

from the Monte-Carlo hit is reduced from about 11000 to about 5000 (underflow bin). 

The drawback is an increased number of false early pulses; an example for this can be 

seen in figure 8.10: The ATWD feature is extracted successfully, but the FADC pulse (for 

which EnforcePulse was set independently) might be caused by noise. 

Final quantitative tests can only be conducted when simulations with proper baselines 

become available (see appendix C.3). 

Besides many randomly chosen waveforms like the ones shown in figure 7.1, a small 

catalogue of exotic features provided by Markus Voge[60] was examined, along with other 

waveforms from the same dataset; see figure 8.11 to figure 8.15. 

For comparison, FeatureExtractor’s pulses are also shown. 

[Figure 8.10: waveform plots not reproduced. OMKey(47,48); panels: NFE_MergedPulses and FE_Pulses; horizontal axis: time / ns.]

Figure 8.10: Example waveforms to demonstrate NFE’s option EnforcePulse. Shown is a waveform with pulses (dotted lines with charges in PE) extracted by NFE (upper image each) and FE (lower image each): Top: default NFE (EnforcePulse = False); bottom: same settings except for EnforcePulse = True for both ATWD and FADC.

[Figure 8.11: waveform plots not reproduced. Panels: OMKey(50,24) (top), OMKey(18,4) (bottom); horizontal axis: time / ns.]

Figure 8.11: Example waveforms of exotic or difficult features from IC59 experimental run 113912, partially from Markus Voge’s catalog of exotic waveforms. All waveforms are shown with default NFE merged pulses (ATWD+FADC; top) and FE pulses (IC59 multi-pulse online-filtering settings; bottom), indicated by dotted lines with the charge given in units of PE.

[Figure 8.12: waveform plots not reproduced; caption not recovered. Panels: OMKey(54,1) (top), OMKey(54,2) (bottom); horizontal axis: time / ns.]

[Figure 8.13: waveform plots not reproduced; caption not recovered. Panels: OMKey(47,51) (top), OMKey(65,13) (bottom); horizontal axis: time / ns.]

[Figure 8.14: waveform plots not reproduced; caption not recovered. Panels: OMKey(83,17) (top), OMKey(38,48) (bottom); horizontal axis: time / ns.]

Figure 8.11, top (DOM 50-24):

Noise artifacts such as this one appear in experimental data relatively often ( 1%); both feature extractors are robust concerning this effect; it might however interfere with FE’s charge calculation, as the single pulse’s charge seems to be overestimated: extraction by eye yields about 0.38 PE, NFE finds 0.35 PE, FE 0.60 PE.

Figure 8.11, bottom (DOM 18-04):

This waveform was included in Markus Voge’s initial catalog because the first feature was not extracted properly with the offline-reconstruction settings of FeatureExtractor (see section 8.3.1); in its online-filtering settings that are shown here, FE extracts this feature without problems, but tends to split SPE-like features excessively. NFE’s results are reasonable.

Figure 8.12, top (DOM 54-01):

In this example, both extractors miss the tiny feature at 10 100 ns. Extraction by eye yields about 0.1 PE.

Figure 8.12, bottom (DOM 54-02):

For this ATWD waveform, both extractors agree well. The ExclusionTime introduced in FeatureExtractor to prevent double extraction (see appendix C.1) prevents the extraction of the FADC feature that is present in NFE.

Figure 8.13, top (DOM 47-51):

Here both extractors recognize the first feature as a multi-PE hit with a small charge in the first peak (perhaps a prepulse). The middle feature is extracted by NFE exclusively despite the fact that it technically lies above FE’s charge threshold: FE splits the last feature into three parts of which one only contains 0.23 PE, whereas the missed feature contains about 0.28 PE.

Figure 8.13, bottom (DOM 65-13):

This is an example for a very small first feature. It is extracted by NFE, but not by FE. Also, NFE extracts more charge from the late ATWD feature (1.32 PE instead of 0.26 PE); extraction by eye yields about 1.2 PE.

Figure 8.14, top (DOM 83-17):

Another example for a small early feature. Again it is missed in FeatureExtractor, but here the missing charge is distributed among the extracted pulses.

Figure 8.14, bottom (DOM 38-48):

The first ATWD feature was too small to be recognized by either of the extractors; however, in contrast to FE, NFE’s PulseMerger is capable of appending FADC pulses at times where ATWD waveforms are available if they do not clash with ATWD pulses.
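The merging behaviour described for the last example can be sketched as below. The clash criterion is not specified in this section, so a simple time window around each ATWD pulse is assumed here; the function name, the `(time, charge)` tuple layout and both parameters are hypothetical and are not taken from NFE’s PulseMerger.

```python
def merge_pulses(atwd_pulses, fadc_pulses, atwd_end, clash_window):
    """Merge ATWD and FADC pulses of one DOM launch.

    Pulses are (time_ns, charge_pe) tuples; atwd_end is the end time of
    the ATWD readout and clash_window the assumed exclusion half-width (ns).
    """
    merged = list(atwd_pulses)
    for t, q in fadc_pulses:
        if t >= atwd_end:
            # Beyond the ATWD readout only FADC information exists.
            merged.append((t, q))
        elif all(abs(t - ta) > clash_window for ta, _ in atwd_pulses):
            # Inside the ATWD readout, but no ATWD pulse nearby.
            merged.append((t, q))
    return sorted(merged)
```

An extractor with a fixed exclusion time would correspond to dropping the `elif` branch, i.e. discarding every FADC pulse that lies inside the ATWD readout.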

Figure 8.15 shows a typical bright waveform, i.e., a waveform with high integrated

charge. The two feature extractors give roughly comparable results apart from FeatureExtractor’s

more pronounced pulse splitting. Noteworthy is the gap between the end of the

ATWD waveform and the first FADC pulse for FE; the amount of charge lost due to this

is estimated by eye to be about 7.6 PE, of which NFE extracts 5.6 PE.

One idea for a more quantitative analysis for bright waveforms is to compare the original 

waveform to a waveform generated based on the extracted pulses, for example with a 

96

8.3 Comparison with Other Feature Extractors 


[Plots of figure 8.15: waveform of OMKey(26,10) versus time / ns (15400 ns to 16200 ns), shown with NFE merged pulses (top) and FE_Pulses (bottom); the numbers printed along the pulses are charges in PE.] 

Figure 8.15: Example of a typical bright waveform from Markus Voge’s IC59 exotic waveform 

catalog. It is shown with default NFE merged pulses (ATWD+FADC; top) and 

FE pulses (IC59 multi-pulse online-filtering settings; bottom), indicated by dotted 

lines with the charge given in units of PE. 

Kolmogorov-Smirnov test, however this has yet to be done systematically. 
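The comparison suggested here could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Gaussian template is a stand-in for the real SPE pulse shape, and the returned quantity is a Kolmogorov-Smirnov-type distance between the normalized cumulative charge distributions, not a p-value.

```python
import numpy as np


def rebuild_waveform(bin_centers, pulse_times, pulse_charges, template):
    """Sum one template per extracted pulse, scaled by the pulse charge."""
    bin_centers = np.asarray(bin_centers, dtype=float)
    waveform = np.zeros_like(bin_centers)
    for t0, charge in zip(pulse_times, pulse_charges):
        waveform += charge * template(bin_centers - t0)
    return waveform


def ks_distance(waveform_a, waveform_b):
    """Largest difference between the normalized cumulative distributions.

    Negative bins are set to zero and both waveforms are normalized to unit
    area, so only the shapes are compared, not the total charge.
    """
    a = np.clip(np.asarray(waveform_a, dtype=float), 0.0, None)
    b = np.clip(np.asarray(waveform_b, dtype=float), 0.0, None)
    if a.shape != b.shape:
        raise ValueError("waveforms must share the same binning")
    if a.sum() == 0.0 or b.sum() == 0.0:
        raise ValueError("waveforms must contain positive charge")
    cdf_a = np.cumsum(a) / a.sum()
    cdf_b = np.cumsum(b) / b.sum()
    return float(np.max(np.abs(cdf_a - cdf_b)))


def toy_template(dt, width=8.0):
    """Hypothetical pulse template (a Gaussian), NOT the real SPE shape."""
    return np.exp(-0.5 * (np.asarray(dt, dtype=float) / width) ** 2)


# Toy usage: compare an "original" waveform with one rebuilt from pulses
# whose first pulse time is off by 20 ns.
times = np.arange(0.0, 400.0, 3.3)
original = rebuild_waveform(times, [100.0, 200.0], [1.0, 2.0], toy_template)
rebuilt = rebuild_waveform(times, [120.0, 200.0], [1.0, 2.0], toy_template)
distance = ks_distance(original, rebuilt)
```

A distance of zero means identical shapes; how large a distance is still acceptable would have to be calibrated on waveforms with known pulses.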

Reliable extraction of features where FeatureExtractor has difficulties bodes well; however, 
this is no guarantee of bug-free and unproblematic operation. Therefore, tests such as 
these must be continued in the future to identify problems of NFE or to discover unexpected 
changes in the low-level reconstruction chain. 


The comparison of NFE’s results to those of other feature extractors is informative concerning 
deficiencies of either of them. Knowledge of these deficiencies is important for 
further improvements and for the study of systematic errors in low-level reconstruction. 

8.3.1 FeatureExtractor in Multi-Pulse Mode 

FeatureExtractor is the extractor used for ATWD and FADC waveforms in the IC59 

online-filtering and offline-processing. The ability to test it with an independent project 

is one of the main benefits of having multiple feature extractors. 

The tests were conducted using both FE’s IC59 multi-pulse online and offline settings[51]. 

The former will be discussed in detail; afterwards, the differences of the offline-processing 

[Plot of figure 8.16: entries (logarithmic axis) versus time residual, range −40 to 40. Gaussian fit results (FE, NFE): width 1.23, 0.81; mean 0.53, 11.19.] 


Figure 8.16: Distributions of the time residuals of the first pulses extracted by FE (multi-pulse 

online-filtering settings, blue area) and NFE (with EnforcePulse set, green line) 

from Monte-Carlo data; red lines indicate Gaussian fits. 

settings will be explained. 

For better comparability, NFE’s option EnforcePulse was set because the analogous option is set 
for FE. Both extractors are configured to extract features in both ATWD and FADC. 

IC59 online-filtering settings 

The distribution of the time residuals in figure 8.16 shows that NFE has a better time 
resolution (Gaussian width of $\sigma_\mathrm{NFE} = 0.81$ ns instead of $\sigma_\mathrm{FE} = 1.23$ ns). NFE extracts 

19 fake early pulses instead of a single one for FeatureExtractor from about 4.1 million 

waveforms in total. This is balanced by the fact that for NFE only about 5500 first pulses 

do not match the first Monte-Carlo hit, while this happens in about 90000 waveforms for 

FeatureExtractor; this can be seen in the histogram entries in the lower plateau and the 

underflow bin to the left of figure 8.16. 

The time offset of about 11 ns between NFE’s pulses and Monte-Carlo hit times and 

the absence of such an offset in FeatureExtractor’s pulse times is explained by the fact 

that FeatureExtractor is used with its option PMTTransit set to 2 to have it subtract the 

PMT transit time. By design NFE relies on DOMcalibrator’s time calibration instead, 



see section 7.1. This time offset can clearly be seen in the figures in section 8.2. 

The distribution of the differences between the times of the first pulses for Monte-Carlo 
data (figure 8.17) has a Gaussian width of $\sigma_\Delta = 0.91$ ns. Assuming the individual times 
to follow Gaussian distributions, this yields a correlation coefficient 

\[ \rho = \frac{\sigma_\mathrm{FE}^2 + \sigma_\mathrm{NFE}^2 - \sigma_\Delta^2}{2\,\sigma_\mathrm{FE}\,\sigma_\mathrm{NFE}} \approx 0.67, \]

which is an encouraging result for both extractors because the distributions’ widths themselves 
are significantly smaller than the ATWD waveform bin length ($T_\mathrm{bin} = 3.3$ ns). 
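The estimate rests on the variance of the difference of two correlated Gaussian variables, $\sigma_\Delta^2 = \sigma_\mathrm{FE}^2 + \sigma_\mathrm{NFE}^2 - 2\rho\,\sigma_\mathrm{FE}\,\sigma_\mathrm{NFE}$, solved for $\rho$. A minimal Python sketch of this arithmetic (the function name is chosen here for illustration only):

```python
def correlation_from_widths(sigma_a, sigma_b, sigma_diff):
    """Correlation coefficient of two Gaussian variables a and b.

    Uses Var(a - b) = sigma_a^2 + sigma_b^2 - 2 * rho * sigma_a * sigma_b,
    where sigma_diff is the width of the distribution of a - b.
    """
    if sigma_a <= 0.0 or sigma_b <= 0.0:
        raise ValueError("widths must be positive")
    return (sigma_a**2 + sigma_b**2 - sigma_diff**2) / (2.0 * sigma_a * sigma_b)


# Widths in ns as quoted for FE, NFE and their difference (Monte-Carlo data).
rho = correlation_from_widths(1.23, 0.81, 0.91)
```

Because the fitted widths carry uncertainties of their own, the result can leave the interval [−1, 1] when the difference distribution is much narrower than the individual ones.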

In contrast to NFE, FeatureExtractor is not sensitive to the daq_baseline bug explained 

in section 7.2 and appendix C.3 because of its own implementation of baseline 

correction. Therefore, the mean of the charges of the first pulse extracted by FeatureExtractor 

is the same for Monte-Carlo and experimental data, although it is further away 

from 1 PE compared to NFE, see figure 8.18. 

Moreover, FE’s distributions of the first charges are closer to a Gaussian distribution for 

very low pulse charges. However, in a random sample of 37 waveforms for which NFE extracted 

less than 0.2 PE for the first pulse, 11 of these pulses were caused by features that 

were missed by FE in favor of a later feature. The remaining 26 pulses were extracted 

from the only distinct feature of each waveform, triggering the extractors’ methods to 

extract at least one pulse. For almost all of these features FE’s charges were between 

50% and 100% higher than those of NFE, while the latter approximately agreed with the 

charges extracted by eye; see figure 8.19 for an extreme example. The reason for this 

effect might be that FE often raises the baseline and then scales up the pulse charge to 

match the waveform’s integrated charge as described in section 6.3.1. Thus, despite the 

strange shape of the distribution in figure 8.18, the charges of the first pulses extracted 

by NFE are plausible. 

The distributions of the differences of the cumulative charge of all pulses per waveform 

between FE and NFE in figure 8.20 show a higher correlation between the extractors (i. 

e., smaller width) for simulated data than for experimental data. The estimation of the 

correlation coefficient using the fitted Gaussian distributions fails as it yields ρ = 1.02 

and ρ = 1.01, respectively. 

The distributions peak near 0 PE, but show tails with extreme values in both directions. 

These deviations are partly due to problems with the calibration of saturated ATWD waveforms; 

the IC59 online-filtering settings of DOMcalibrator have the SaturationLevel set 

to 1022; this means that DOMcalibrator switches to higher ATWD channels only for bins 
that reach the digitizer’s maximum value. However, because of noise, waveforms often fluctuate 

to slightly lower values even if the channel is saturated. This leads to very uneven 

waveforms of which one can be seen in figure 8.21. Setting the level lower (for example to 

a value of 900) solves the problem; this was already done for the IC59 offline-processing 

and is planned for future uses as well.[61] 
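The effect of the saturation level can be illustrated with a toy model. The sketch below is not DOMcalibrator’s code: the two channels, their gains (16 and 2), the neglected baseline and the convention that a bin counts as saturated when its raw value is at or above the level are all assumptions made for illustration.

```python
import numpy as np


def merge_atwd_channels(raw, gains, saturation_level):
    """Pick, per bin, the highest-gain channel that is not flagged as saturated.

    raw   -- shape (n_channels, n_bins), raw ADC counts ordered from the
             highest to the lowest gain (baseline neglected in this toy model)
    gains -- relative gain of each channel (hypothetical values)
    A bin is treated as saturated if raw >= saturation_level (assumption);
    the lowest-gain channel serves as fallback.
    """
    raw = np.asarray(raw, dtype=float)
    gains = np.asarray(gains, dtype=float)
    n_channels, n_bins = raw.shape
    chosen = np.full(n_bins, n_channels - 1, dtype=int)
    # Walk from low to high gain so that the highest usable gain wins.
    for channel in range(n_channels - 1, -1, -1):
        usable = raw[channel] < saturation_level
        chosen[usable] = channel
    merged = raw[chosen, np.arange(n_bins)] / gains[chosen]
    return merged, chosen


# The high-gain channel clips at 1023 counts; noise lets one clipped bin
# read slightly lower (1015).
raw = [[80, 960, 1023, 1015, 160],   # gain 16, partly clipped
       [10, 120, 200, 180, 20]]      # gain 2
gains = [16.0, 2.0]

online, online_channels = merge_atwd_channels(raw, gains, saturation_level=1022)
offline, offline_channels = merge_atwd_channels(raw, gains, saturation_level=900)
```

With the level at 1022, the bin reading 1015 is still taken from the clipped high-gain channel and comes out too low, which produces an uneven waveform; with the level at 900 all three bright bins are taken from the lower gain.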

Besides these calibration problems, the tails in figure 8.20 towards higher charges from 

[Plots of figure 8.17: entries versus time difference, range −16 to −4. Gaussian fit results, top: width 0.91, mean −10.59; bottom: width 1.15, mean −10.16.] 


Figure 8.17: Distributions of the differences between the times of the first pulses extracted by FE (multi-pulse online-filtering 
settings) and NFE (with EnforcePulse set). Positive values correspond 
to later times for FE; red lines indicate Gaussian fits. 
Top: Monte-Carlo data; bottom: experimental data. 

[Plots of figure 8.18: entries versus q_pulse1, range 0.0 to 4.0. Gaussian fit results (FE, NFE), first panel: width 0.29, 0.36; mean 0.81, 0.93; second panel: width 0.31, 0.39; mean 0.81, 0.89.] 

Figure 8.18: Distributions of the charges of the first pulses extracted by FE (multi-pulse online-filtering 
settings, blue area) and NFE (with EnforcePulse set, green line); red 
lines indicate Gaussian fits. 


[Plots of figure 8.19: waveforms of OMKey(3,16) and OMKey(47,40) versus time / ns, each shown with NFE pulses (upper image) and FE_Pulses (lower image); the printed numbers are pulse charges.] 

Figure 8.19: Examples of waveforms in which NFE (upper image each) extracts significantly 
less charge than FE (lower image each). Pulses are indicated by dotted lines with 
the charge given in units of PE. 


[Plots of figure 8.20: entries (logarithmic axis) versus ∆q_pulses. Gaussian fit results, first panel: width 0.12, mean −0.09; second panel: width 0.26, mean 0.03.] 

Figure 8.20: Distributions of the differences between the total charges of all pulses extracted 

by FE (multi-pulse online-filtering settings) and NFE (with EnforcePulse set). 

Positive values correspond to higher charges for FE; red lines indicate Gaussian 

fits. 


[Plots of figure 8.21: saturated waveforms of OMKey(37,26) versus time / ns (12000 ns to 12400 ns), each shown with NFE pulses (upper image) and FE_Pulses (lower image); the printed numbers are pulse charges in PE.] 

Figure 8.21: Example of saturated waveforms with pulses from NFE (upper image each) and 

FE (lower image each). Pulses are indicated by dotted lines with the charge 

given in units of PE. The waveforms have been calibrated with the IC59 

online-filtering settings for DOMcalibrator (top) and its offline-processing settings 

(SaturationLevel = 900, bottom), respectively. 



NFE (negative values) are mostly caused by saturated waveforms; for those, FeatureExtractor’s 

option TinyThreshold causes pulses below 5% of the charge of the highest pulse 

to be ignored. This can be seen in figure 8.21, where FE does not accept pulses during 

most of the waveform. TinyThreshold is set to zero in the IC59 offline-reconstruction 

settings.[51] 
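The rejection described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (pulses below 5% of the charge of the highest pulse are ignored), not FeatureExtractor’s implementation; the function name and the example charges are hypothetical.

```python
def apply_tiny_threshold(charges, tiny_threshold=0.05):
    """Drop pulses whose charge is below tiny_threshold times the largest charge."""
    charges = list(charges)
    if not charges:
        return []
    limit = tiny_threshold * max(charges)
    return [q for q in charges if q >= limit]


# Illustrative pulse charges in PE for one very bright waveform.
charges = [23.41, 331.57, 9.02, 1.78, 41.98, 17.79]

kept_online = apply_tiny_threshold(charges, 0.05)  # pulses below 5% of the maximum are dropped
kept_offline = apply_tiny_threshold(charges, 0.0)  # threshold disabled, all pulses kept
lost_charge = sum(charges) - sum(kept_online)      # charge of the rejected pulses
```

In a waveform dominated by one very large pulse, the limit easily exceeds several PE, so genuine multi-PE pulses are rejected together with droop artifacts and prepulses.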

The tails towards higher charges from FE are probably caused by FADC waveforms that 

are heavily affected by droop. FE successfully employs its own methods to correct for 

this effect. This has to be investigated further with the aim to include these methods in 

DOMcalibrator. 

FeatureExtractor often produces many more pulses in ATWD waveforms because it 

defines a pulse from two bins or from one bin of its deconvoluted distribution while NFE 

uses three bins each, see section 6.3.1: FE extracts up to 127 pulses per ATWD waveform, 

while NFE usually extracts 64 at most. For FADC on the other hand FeatureExtractor 

extracts comparatively few pulses because its FADC algorithm is not capable of splitting 

long features, while NFE usually extracts up to 128 pulses from a saturated FADC waveform. 

This can be verified in figure 8.22, where FE’s distribution for Monte-Carlo has its 

first bend at about 150 pulses per waveform and then quickly falls off, whereas NFE has 

a first bend at about 50 pulses per waveform, but does not fall off quickly because of high 

numbers of FADC pulses. Very high numbers of pulses in Monte-Carlo data are caused 

by DOMs which were launched twice; in current experimental data, first launch cleaning 

cuts away waveforms from later launches. 

In summary, FeatureExtractor finds fewer pulses on average because of its FADC algorithm, 
and also because TinyThreshold rejects low pulses in very bright waveforms. 

IC59 offline-processing settings 

The recommended settings for IC59 offline-processing have been used in connection 

with the recommended settings for DOMcalibrator, i. e., SaturationLevel = 900.[62][51] 

For FeatureExtractor, the changes are: 

• ADCThreshold = 1.1 −→ ADCThreshold = 1.8 

The charge threshold to decide whether to accept a pulse or not was increased to 
account for a recent change in the DOMs’ discriminator thresholds. The effect 

of this change is that for pulses that were previously split excessively now many 

fractions fall below the threshold; this results in a higher charge per pulse because 

the remaining pulses are rescaled (see section 6.3.1). The drawbacks are a higher 

number of missed pulses and more unphysical redistribution of charge between pulses 

of the same waveform. 

[Plots of figure 8.22: entries (logarithmic axis) versus n_pulses, two panels.] 

Figure 8.22: Distributions of the numbers of pulses extracted by FE (multi-pulse online-filtering 
settings, blue area) and NFE (with EnforcePulse set, green line). 


[Plot of figure 8.23: entries (logarithmic axis) versus time residual, range −40 to 40. Gaussian fit results (FE, NFE): width 0.90, 0.81; mean 10.18, 11.19.] 


Figure 8.23: Distributions of the time residuals of the first pulses extracted by FE (multi-pulse 
offline-processing settings, blue area) and NFE (with EnforcePulse set, 
green line) from Monte-Carlo data; red lines indicate Gaussian fits. 

[Plots of figure 8.24: entries versus q_pulse1, range 0.0 to 4.0. Gaussian fit results (FE, NFE), first panel: width 0.32, 0.36; mean 0.88, 0.93; second panel: width 0.35, 0.39; mean 0.96, 0.90.] 

Figure 8.24: Distributions of the charges of the first pulses extracted by FE (multi-pulse offline-processing 
settings, blue area) and NFE (with EnforcePulse set, green line); red 
lines indicate Gaussian fits. 



[Plots of figure 8.25: entries (logarithmic axis) versus n_pulses, two panels.] 

Figure 8.25: Distributions of the numbers of pulses extracted by FE (multi-pulse offline-processing 
settings, blue area) and NFE (with EnforcePulse set, green line). 




• ExclusionSize = 5 −→ ExclusionSize = 1 

This parameter governs the deadtime between ATWD and FADC in units of FADC 
bin lengths during which no FADC pulses are accepted because their ATWD counterpart 
might already have been extracted (double extraction, see appendix C.1); 
it was greatly overestimated in the old settings. The consequence of this change is 
higher total charge and more accurate extraction performance. 

• TinyThreshold = 0.05 −→ TinyThreshold = 0.00 

TinyThreshold was originally introduced to prevent the extraction of pulses from 

droop artifacts and prepulses. However, it hampers the accurate extraction of waveforms 

(see figure 8.21) and can cause direct hits to be ignored. This was judged to 

be more important, so the threshold was disabled. The results are more accurate 

extraction and higher total charge, because the charge attributed to pulses which 

do not pass TinyThreshold is forfeit. 

• PMTTransit = 2 −→ PMTTransit = -1 

This parameter influences the timing distribution by adding a time offset and correcting 
for a correlation between the PMT voltage and the pulse time; however, 
taken on its own, disabling the correction improves the time resolution measured 
by the Gaussian width from 1.23 ns (not shown) to 0.90 ns (figure 8.23). 

The offline-processing settings increase FeatureExtractor’s average charge per pulse 

towards 1 PE (figure 8.24: 0.88 PE for Monte-Carlo and 0.96 PE for experimental data 

instead of 0.81 PE for each in figure 8.18). However, the new settings also approximately 

double the number of missed first pulses to about 177000 in 4.1 million waveforms (4.3%), 

see figure 8.23. In addition, the time resolution improves from 1.23 ns to 0.90 ns. 

The total number of pulses (figure 8.25) decreases slightly because of the higher ADCThreshold, 
and there are more waveforms with exceptionally high numbers of pulses (> 100) because 
of the disabled TinyThreshold. 

In total, the changes were carefully compiled by many people and provide for an improvement 

upon the older online-filtering settings, yet they also increase the number of 

missed pulses and do not generally improve the charge per pulse: 

A related test originally conducted by Juanan Aguilar[63] compares the agreement of the 

ratio of charge per pulse between realistically simulated data (CORSIKA) and current 

experimental data. The results can be seen in figure 8.26. The simulated data uses the 

old discriminator thresholds, so it has to be extracted with ADCThreshold = 1.1 (online-filtering 
settings). In contrast, the experimental data was taken with the new thresholds, 

so the offline-processing settings are recommended; the ratio for experimental data is 

shown for both FeatureExtractor settings. 

As expected, the agreement is better for the offline-processing settings of FeatureExtractor; 
however, it is worse for ratios near 0.3 PE per pulse. 

For NFE, the agreement is good for regions of high abundance. The disagreement for 

[Plots of figure 8.26: entries (logarithmic axis) versus log10(q_tot PE^−1 / n_pulses), range −1.0 to 2.0. Legend of the first panel: exp data (Run114060), onl.; exp data (Run114060), offl.; corsika (1628), onl. Legend of the second panel: exp data (Run114060); corsika (1628).] 

Figure 8.26: Charge per pulse ratio for simulated data (dotted line) and experimental data 

(solid lines) for FE (top, online and offline settings) and NFE (bottom, default 

settings). 

[Plot of figure 8.27: entries (logarithmic axis) versus time residual, range −40 to 40. Gaussian fit results (FE, NFE): width 1.23, 0.81; mean 0.55, 11.19.] 


Figure 8.27: Distributions of the time residuals of the first pulses extracted by FE (single-pulse 
settings, blue area) and NFE (with EnforcePulse set, green line) from 
Monte-Carlo data; red lines indicate Gaussian fits. 

low charges is probably caused by the missing simulation of the daq_baseline; the disagreement 

for ratios higher than 3 PE per pulse has to be examined. Both datasets have 

been extracted using the same (default) settings; NFE is not directly influenced by DOM 

discriminator threshold changes because all thresholds are defined in units of PE. 

8.3.2 FeatureExtractor in Single-Pulse Mode 

NFE’s results were also compared to FeatureExtractor’s IC59 single-pulse extraction results 

because this was the mode of operation for the IC59 online-filtering muon track 

reconstruction; FE is configured to use its second single-pulse extraction algorithm and 

to only extract ATWD waveforms.[51] For best comparability, EnforcePulse was set for 

NFE and no FADC pulses were extracted; still NFE’s distributions shown in this section 

closely resemble those in section 8.3.1 because the missing FADC pulses barely affect the 

first pulse. 

The distribution of the time residuals of the first pulses (figure 8.27) for FE is similar 

to those obtained by FE in its multi-pulse mode; this is not surprising because FE uses 

the single-pulse extraction time to replace the time of the Bayesian Unfolding pulse closest 



to it (section 6.3.1), and the single-pulse extraction algorithm extracts the first feature it 
can find. The distribution of the time residuals shows more early fake pulses; however, 

the shapes and widths of the distributions of the time differences between FE and NFE 

are effectively the same (±0.01 ns, not shown). 

In figure 8.28, the distributions of the charges of the first pulses have a strong tail 

towards high values because of the algorithm’s inability to split features. Furthermore 

they are less stable regarding the transition from Monte-Carlo to experimental data: Their 

mean increases from 0.87 PE to 1.04 PE, and the tail becomes significantly stronger for 

experimental data; this is probably caused by the wrong charge simulation (figure 7.12) 

and the datasets’ different energy spectra. 

The distribution of the differences of the total charge per waveform for simulated data 

in figure 8.29 reveals an on average higher total charge for NFE, despite the fact that 

FE calculates the integrated charge; this is due to the wrong baseline simulation, which 

is often fixed by FeatureExtractor’s own baseline correction algorithm. For experimental 

data, FE’s integrated charge is higher than NFE’s charge because the latter is extracted 

from features only. 


SLCHitExtractor is currently used for the IC59 SLC charge stamp extraction, which 

motivates tests of its performance and its use as a benchmark for NFE; see appendix C.4. 

SLCHitExtractor’s time resolution is better than “SLCHE”’s with a Gaussian width 

of 1.91 ns instead of 2.25 ns, see figure 8.30; both of them have better resolutions than the 

(more flexible) NFE default combination of FADC algorithms (∼ 3 ns, figure 7.10), and 

both resolutions are well below even one ATWD waveform bin length, so both are deemed 

to be excellent. 

SLCHitExtractor’s pulse times match those of FE’s ATWD pulses (if FE’s transit time 

correction is activated), “SLCHE”’s instead match the time of NFE’s FADC pulses (see 

section 7.1.4). 

The charge distribution of SLCHitExtractor is less stable than that of “SLCHE” (figure 

8.31); it shifts by 0.07 PE between Monte-Carlo and experimental data. It also shows an 

unexplained excess in its 0.8 PE bin. Compared to “SLCHE” the mean of the distribution 

is closer to 1 PE because it was calibrated to match the distribution of FeatureExtractor 

in its offline-reconstruction settings.[64] Correspondingly, “SLCHE” matches the average 

FADC charge extracted by NFE from simulated data. 

[Plots of figure 8.28: entries versus q_pulse1, range 0.0 to 4.0. Gaussian fit results (FE, NFE), first panel: width 0.38, 0.36; mean 0.87, 0.93; second panel: width 0.51, 0.39; mean 1.04, 0.90.] 

Figure 8.28: Distributions of the charges of the first pulses extracted by FE (single-pulse settings, 

blue area) and NFE (with EnforcePulse set, green line); red lines indicate 

Gaussian fits. 


[Plots of figure 8.29: entries (logarithmic axis) versus ∆q_pulses. Gaussian fit results, first panel: width 0.13, mean −0.06; second panel: width 0.22, mean 0.14.] 

Figure 8.29: Distributions of the differences between the total charges of all pulses extracted 

by FE (single-pulse settings) and NFE (with EnforcePulse set). Positive values 

correspond to higher charges for FE; red lines indicate Gaussian fits. 


[Plot of figure 8.30: entries (logarithmic axis) versus time residual, range −80 to 60. Gaussian fit results (SLCHitExtractor, “SLCHE”): width 1.91, 2.25; mean −0.03, 10.70.] 


Figure 8.30: Distributions of the time residuals of the first pulses extracted by SLCHitExtractor 
(blue area) and NFE’s “SLCHE” (green line) from Monte-Carlo data; red 
lines indicate Gaussian fits. 


[Plots of figure 8.31: entries versus q_pulse1, range 0.0 to 4.0. Gaussian fit results (SLCHitExtractor, “SLCHE”), first panel: width 0.43, 0.33; mean 0.96, 0.78; second panel: width 0.46, 0.35; mean 1.03, 0.79.] 

Figure 8.31: Distributions of the charges of the first pulses extracted by SLCHitExtractor 

(blue area) and NFE’s “SLCHE” (green line); red lines indicate Gaussian fits. 




8.5 Runtime Performance 

NFE was designed to be viable for real-time online processing. Since computing capacity 

at the South Pole is limited, efficient algorithms and data structures were used to generate 

fast code. 

The profiler callgrind/KCacheGrind was used to analyze the code. As expected, 

the most CPU time consuming part is “BayesUnfold”; the fraction of the time spent 

at this algorithm for ATWD with default NFE settings is 94.5% for $E^{-1}$ Monte-Carlo 

data, and 89.6% for experimental data. Of this time, “BayesUnfold”’s unfolding method 

GetUnfolding() takes the highest share by far (93.9% of the total time for Monte-Carlo, 

89.0% for experimental data). Thus, the focus in code optimization was on this unfolding 

method. 

Most of the remaining time (4.8% and 9.8%, respectively) is spent in I3Frame::Get() to read in 
the input data, where calibration information accounts for the largest share of the time 
(> 80%). 

For “BayesUnfold”’s GetUnfolding(), the number of calls to expensive functions or 
operators such as floating-point division (double::/) was minimized and temporary objects were avoided if possible. 

The use of the iterative stopping condition saves roughly 10 iterations on average 

(see section 7.1.3) while it costs less than 10% of the CPU time to check the breaking 

conditions for every iteration. Another tweak for BU was the introduction of templated 

structs containing the source specific parameters from table 7.3 in form of static constants; 

this allows the compiler to optimize the code more effectively and led to a gain in 

speed of about 19%. 

In comparison with other feature extractors, NFE shows superior runtime performance, 

mostly because of the use of multiple algorithms, see table 8.1. The systematically shorter 
times for experimental data are caused by its softer energy spectrum, since low-energy 
events offer fewer waveforms to extract. 

Switching from FeatureExtractor to NFE for online processing would reduce the average 
time required per event to about $\frac{3.76}{8.47 + 2.24} \approx 35\%$ and hence would free up many of the 
20 to 25 CPUs currently needed for feature extraction at the South Pole. If future muon 

track reconstruction scripts still employ algorithms that depend on integrated charge, a 

small and fast module can easily be created to join all pulses of each waveform. 
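The quoted fraction follows directly from the runtimes for experimental data in table 8.1: NFE with ATWD+FADC replaces both FE passes (multi-pulse and single-pulse). A short Python check of this arithmetic, with dictionary keys chosen here for illustration:

```python
# Runtimes per event in ms for experimental data (Run 113587, L1), table 8.1.
runtime_ms = {
    "FE multi-pulse (ATWD+FADC)": 8.47,
    "FE single-pulse (ATWD)": 2.24,
    "NFE (ATWD+FADC)": 3.76,
}

# Online processing currently runs both FE modes; NFE would replace both.
fe_total = runtime_ms["FE multi-pulse (ATWD+FADC)"] + runtime_ms["FE single-pulse (ATWD)"]
ratio = runtime_ms["NFE (ATWD+FADC)"] / fe_total

print(f"NFE needs {100.0 * ratio:.0f}% of the FE time per event")
```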

Table 8.1: Runtimes per event for different feature extractors and datasets; errors are about ±3%. 

Module              Sources      Simulated data,   Experimental data, 
                                 Dataset 3071      Run 113587, L1 
                                 t / ms            t / ms 
FE multi-pulse      ATWD+FADC    33.89             8.47 
FE single-pulse     ATWD         7.89              2.24 
PE                  ATWD         18.49             5.39 
PE                  FADC         37.89             10.66 
SLCHitExtractor     SLC          < 0.50            < 0.27 
NFE                 ATWD         4.19              1.21 
NFE                 FADC         9.06              2.55 
NFE                 ATWD+FADC    13.25             3.76 
NFE                 SLC          < 0.50            < 0.27 
NFE PulseMerger     -            ≪ 0.50            ≪ 0.27 
NFE EnforcePulse    ATWD         4.21              1.19 
NFE EnforcePulse    FADC         9.45              2.61 
NFE EnforcePulse    ATWD+FADC    13.87             3.77 
NFE BU only         ATWD         20.13             5.70 
NFE BU only         FADC         30.21             8.30 


CHAPTER IX 

Summary And Outlook


Within this thesis a new feature extraction package for recorded photomultiplier signals 

in the IceCube Neutrino Observatory at South Pole was developed. Its task is to search 

the waveforms captured by the digital optical modules for signals caused by photons. 

This information is made available to other software modules by extracting it from the 

waveforms’ features into pulses. 

Existing feature extractor algorithms have been analyzed conceptually and with respect 

to their performances. A concept for a new feature extractor was designed and key 

characteristics were defined: The new feature extractor – called NFE – should have modular, 

maintainable and well-documented code, it should be easy to use, flexible enough 

to cover all feature extraction demands, it should be reasonably fast, and provide good 

extraction performance in terms of miss rate, noise rate, and charge and time resolutions. 

A new technique of dynamically choosing an appropriate algorithm according to the 

waveform’s complexity was designed and implemented. The implemented algorithms are 

“Eva” to quickly decide which extraction algorithm to use on the waveform, “Simple” 

to extract pulses from waveforms with exclusively SPE-like features, “BayesUnfold” to 

extract complex features, and “SLCHE” to extract pulses from SLC chargestamps. The 

free parameters of these algorithms were calibrated using simulated datasets. 

The performance of the new feature extractor was tested under various conditions. 

The technique of dynamically choosing an appropriate algorithm proved to be successful; 
it improves the extraction quality and drastically speeds up the process. Testing NFE 
with individual waveforms yields good results; in many cases NFE seems to perform better 
than the currently used FeatureExtractor, especially for small features. 

Several characteristic distributions of resulting quantities for different feature extractors 

were both tested individually and compared to each other. NFE’s distributions are promising 

and support the positive impression gained from the individual waveform checks. Finally, 

the CPU efficiency was found to be significantly superior to that of other feature 

extractors. 

On March 1st, 2010, NFE passed the collaboration’s code review with positive 

remarks. As a consequence the project will be officially released soon after all comments 

have been incorporated. Fortunately, a new version of the simulation meta-project with 

various bugfixes will be released soon and can be used to verify or recalibrate the free 

parameters of NFE’s algorithms; this is required as errors in the current simulated datasets 

affect feature extraction. 

For now, some open issues remain: More tests have to be conducted concerning saturated 

waveforms. In particular, the effect of transistor droop in the optical modules has to be 
addressed; droop correction is already performed by the software module DOMcalibrator, 
but sometimes droop effects remain. The original FeatureExtractor seems to employ 
a powerful method to further repair waveforms, which could probably be implemented 
in DOMcalibrator after exhaustive tests. 


Another open issue is testing with flasher runs. Usually, experimental data does not offer 
information about true hit times, so one has to rely on simulated datasets, accepting systematic 
errors. Flasher runs offer an alternative because the time at which the flasher – 

i. e., a light source inside the detector – was activated is well-known. Therefore they can 

be used to verify the parameter settings and to check the extraction performance. 

Furthermore, NFE lacks pybindings, i. e., an interface to directly access the algorithms 

from a possibly interactive Python session. Pybindings can be useful for the verification 

of the low-level reconstruction and will be part of a future release. 

With its release, NFE will be open for analyses by the collaboration to decide whether it 

should replace or complement the existing feature extractors in the official reconstruction 

data chain. Regardless of this decision, much was learned about IceCube’s low-level 

reconstruction, and some improvements were made. 


APPENDIX A 

Bayesian Unfolding 

A.1 Formal Approach 

Formally, one considers as given a histogram whose $n_E$ entries $n_{E_i}$ are interpreted as
numbers of effects $E_i$ which are caused by a not necessarily equal number $n_C$ of causes $C_j$.
With $n_\mathrm{tot} := \sum_i n_{E_i}$ one can define the probability for effect $E_i$ to occur as

$$P(E_i) := \frac{n_{E_i}}{n_\mathrm{tot}}.$$

Furthermore, the probability for a certain cause $C_j$ to produce each of the effects is given by
$P(E_i|C_j)$.

By applying Bayes’ Theorem, one obtains

$$P(C_j|E_i) = \frac{P(E_i|C_j) \cdot P(C_j)}{\sum_k P(E_i|C_k) \cdot P(C_k)}.$$

Using this and the pairwise disjointness of the $E_i$, one can compute the probabilities

$$P(C_j)_{t+1} = \sum_i P(C_j|E_i)_t \cdot P(E_i) = \sum_i \frac{P(E_i|C_j) \cdot P(C_j)_t}{\sum_k P(E_i|C_k) \cdot P(C_k)_t} \, P(E_i) \quad \forall\, t \in \mathbb{N}$$

iteratively, which then can easily be transformed into the most probable numbers of causes
that occurred, $n_{C_j} = P(C_j) \cdot n_\mathrm{tot}$.
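The iteration above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions and not the code used in NFE or FeatureExtractor: the function name `bayesian_unfold`, the flat starting prior, and the fixed number of iterations are choices made for this example, and every column of the response matrix is assumed to sum to one.

```python
def bayesian_unfold(n_effects, response, iterations=10):
    """Iteratively estimate the numbers of causes from observed effect counts.

    n_effects: observed histogram entries n_E_i.
    response:  response[i][j] = P(E_i | C_j); each column is assumed to sum
               to one (every cause ends up in exactly one effect bin).
    """
    n_tot = sum(n_effects)
    if n_tot <= 0:
        raise ValueError("histogram must contain at least one entry")
    n_causes = len(response[0])
    p_effect = [n / n_tot for n in n_effects]   # P(E_i)
    p_cause = [1.0 / n_causes] * n_causes       # flat prior P(C_j)_0 (assumption)

    for _ in range(iterations):
        updated = [0.0] * n_causes
        for i, p_e in enumerate(p_effect):
            # Denominator of Bayes' theorem: sum_k P(E_i|C_k) * P(C_k)_t
            norm = sum(response[i][k] * p_cause[k] for k in range(n_causes))
            if norm == 0.0:
                continue  # effect i cannot be produced by any cause
            for j in range(n_causes):
                updated[j] += response[i][j] * p_cause[j] / norm * p_e
        p_cause = updated

    # n_C_j = P(C_j) * n_tot
    return [p * n_tot for p in p_cause]


# Two causes smeared into two effects (illustrative numbers).
response = [[0.8, 0.3],
            [0.2, 0.7]]
print(bayesian_unfold([50.0, 50.0], response))
```

With normalized columns, each step redistributes the observed entries among the causes, so the sum of the returned cause counts stays equal to $n_\mathrm{tot}$.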

A.2 Adaption to IceCube’s Waveforms 

The entries of the waveform (with negative values set to zero) are the numbers of effects
$n_{E_i}$; the numbers of causes $n_{C_j}$ correspond to the charge (actually $\Delta t$ times charge, but
$\Delta t$ is constant and can be considered later on) belonging to a pulse originating in bin $j$,
using the same binning for both effects and causes and therefore $n_C = n_E$.

$P(E_i|C_j)$ denotes the normalized single photo electron (SPE) pulse shape, which for the ATWD
is given by Christopher Wendt’s parametrization[47] (see section 7.1.3)

$$f : \mathbb{R}_{\geq 0} \to [0, 1], \quad x \mapsto c \left( e^{-\frac{x - x_0}{b_1}} + e^{\frac{x - x_0}{b_2}} \right)^{-8}.$$


According to this parametrization, the first L bins contain over 99.4% of the pulse’s 

charge, therefore the computation can be significantly sped up by omitting most addends 

of both sums: 

$$P(C_j)_{t+1} = \sum_{i=j}^{j+L-1} \frac{P(E_i|C_j) \cdot P(C_j)_t}{\sum_{k=i-L+1}^{i} P(E_i|C_k) \cdot P(C_k)_t} \, P(E_i)$$

Computation can further be accelerated by saving $S_m := P(E_m|C_0) \equiv P(E_{m+j}|C_j)$
instead of calculating $P(E_i|C_j)$ for all values of $j$; the resulting equation is

$$P(C_j)_{t+1} = \sum_{i=j}^{j+L-1} \frac{S_{i-j} \cdot P(C_j)_t}{\sum_{k=i-L+1}^{i} S_{i-k} \cdot P(C_k)_t} \, P(E_i)$$

Finally, as we are interested in $n_{C_j}$, we cancel out $n_\mathrm{tot}$:

$$n_{C_j,\,t+1} = n_{C_j,\,t} \sum_{i=j}^{j+L-1} \frac{S_{i-j} \cdot n_{E_i}}{\sum_{k=i-L+1}^{i} S_{i-k} \cdot n_{C_k,\,t}}$$
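The final update rule lends itself to a compact sketch. The following Python function is an illustration under assumptions, not the implementation in NFE or FeatureExtractor: the name `unfold_waveform`, the flat starting estimate, and the iteration count are hypothetical, and the kernel values in the example are made up rather than taken from the SPE parametrization.

```python
def unfold_waveform(waveform, kernel, iterations=20):
    """Unfold a waveform with a pulse shape truncated to L = len(kernel) bins.

    waveform: bin entries n_E_i (negative values are set to zero).
    kernel:   S_m = P(E_m | C_0) for m = 0 .. L-1.
    Returns the estimated n_C_j for every bin j.
    """
    n_e = [max(w, 0.0) for w in waveform]
    n_bins, length = len(n_e), len(kernel)
    if n_bins == 0:
        return []
    n_c = [sum(n_e) / n_bins] * n_bins   # flat starting estimate (assumption)

    for _ in range(iterations):
        # Inner sum for every i: sum_{k=i-L+1}^{i} S_{i-k} * n_C_k
        folded = [
            sum(kernel[i - k] * n_c[k]
                for k in range(max(0, i - length + 1), i + 1))
            for i in range(n_bins)
        ]
        # Outer sum for every j: sum_{i=j}^{j+L-1} S_{i-j} * n_E_i / folded_i
        n_c = [
            n_c[j] * sum(kernel[i - j] * n_e[i] / folded[i]
                         for i in range(j, min(j + length, n_bins))
                         if folded[i] > 0.0)
            for j in range(n_bins)
        ]
    return n_c


# A single pulse of charge 10 starting in bin 2, smeared by a 3-bin kernel.
kernel = [0.5, 0.3, 0.2]
waveform = [0.0, 0.0, 5.0, 3.0, 2.0, 0.0, 0.0, 0.0]
print(unfold_waveform(waveform, kernel))
```

Both sums are clipped at the waveform boundaries, and the cost per iteration scales with the number of bins times L rather than with the square of the number of bins.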


APPENDIX B 

Cascade Pulse Tagging 

This thesis’ initial topic was to tag pulses originating from cascades to reconstruct locations 

of strong stochastic energy losses during muon propagation, with the aim of improving 

both energy reconstruction and track reconstruction (see figure B.1). 

This turned out to be impractical because of low Čerenkov luminosity, prepulses, and 

too much scattering, but also demonstrated some limitations of FeatureExtractor such as 

excessive pulse splitting (section 6.3.1), which were among the motivations for the creation 

of NFE. The low-level work also revealed two of the bugs found during the work on this 

thesis. 

[Figure B.1: sketch showing a particle track, its Čerenkov cone, cascade light, and idealized DOM waveforms (U versus t).]

Figure B.1: This thesis’ initial topic and one of the motivations to create NFE: Tagging of
pulses caused by cascades during muon propagation. If the waveforms were as
clear as the idealized ones in this illustration, cascades could be located and used
for direction and energy reconstruction refinement.


APPENDIX C 

Specific Problems and Anomalies 

Many aspects of this thesis required checks of low-level observables or single waveforms, 

owing to which several previously unknown problems or anomalies were found in the 

projects involved. Some of those relevant to this thesis are explained below; this is by no 

means meant to be offensive. 

C.1 ATWD FADC Time Offset Caused Double Extraction 

Up to very recent versions of the calibration tool DOMcal, there was a time offset between 

the ATWD and FADC waveforms of about 15 ns for experimental data and 34 ns 

for simulated data. This not only corrupted the late (i. e. FADC) pulse times, but also 

caused pulses extracted by FE just before the end of the ATWD waveform to 

be extracted again in the FADC waveform, effectively doubling their impact because FE 

automatically merged all ATWD pulses with all FADC pulses found outside the ATWD 

waveform’s timespan. 

This double extraction problem was solved by the respective projects’ authors by introducing 

a time shift in DOMcalibrator and an exclusion time window in FE (25 ns by 

default), during which no FADC pulses are extracted right after the end of the ATWD 

waveform. 
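The effect of such an exclusion window can be illustrated with a short Python sketch. It is not FE's actual code: the function `merge_pulses`, the representation of pulses as `(time in ns, charge)` tuples, and all numbers in the example except the 25 ns default are assumptions made for illustration.

```python
def merge_pulses(atwd_pulses, fadc_pulses, atwd_end, exclusion=25.0):
    """Merge (time_ns, charge) pulses from the ATWD and FADC waveforms.

    FADC pulses are only accepted after the end of the ATWD waveform plus
    an exclusion window, so that a pulse close to the ATWD edge is not
    counted a second time.
    """
    first_allowed = atwd_end + exclusion
    late_fadc = [pulse for pulse in fadc_pulses if pulse[0] >= first_allowed]
    return sorted(list(atwd_pulses) + late_fadc)


# Illustrative numbers: the ATWD waveform ends at 420 ns, and the FADC sees
# the same pulses 15 ns later because of the uncorrected time offset.
atwd = [(100.0, 1.0), (415.0, 2.0)]
fadc = [(115.0, 1.0), (430.0, 2.0), (600.0, 0.5)]

print(merge_pulses(atwd, fadc, atwd_end=420.0, exclusion=0.0))  # 415 ns pulse counted twice
print(merge_pulses(atwd, fadc, atwd_end=420.0))                 # duplicate suppressed
```

Without the window, the pulse at 415 ns reappears at 430 ns and its charge enters the merged list twice; with the 25 ns window only the genuinely late FADC pulse at 600 ns is added.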

C.2 Implementation of the Second Single-Pulse Extraction Algorithm in FeatureExtractor 

The implementation of the second single-pulse extraction algorithm in FeatureExtractor 

(section 5.1) differs from the documentation that ships with the source code[50] and from 

its depiction in presentations[53] (figure C.1). 

The pulse width is not determined by the number of bins that pass half of the first pulse’s 

amplitude, but by a constant multiplied by the extracted charge and divided by the 

maximum entry of the waveform. 

More importantly, the charge is not defined as charge above threshold, but as total integrated
charge, which includes baseline fluctuations: In the IC59 settings[51], the baseline
is estimated by taking the average of the first three bins, and the resulting value is subtracted
from all bins if it does not exceed a hard-coded threshold; still, time-dependent
baseline shifts caused by droop or bad estimations due to the low statistics of the first
three bins can cause deviations of the total charge.

Figure C.1: Sketch illustrating the differences between the documentation (left) and implementation
(right) of FeatureExtractor’s second single-pulse extraction algorithm.
Baseline detection has been omitted for reasons of clarity.

This behaviour might be responsible for the overestimation of the charge of low pulses 

which is described in section 8.3.1 and illustrated in figure 8.19, because this algorithm is 

used in FeatureExtractor’s multi-pulse settings to obtain the total charge that is used to 

rescale the pulses of its Bayesian Unfolding algorithm (see section 6.3.1). 
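The difference between the two charge definitions can be made concrete with a small Python sketch. It only mirrors the behaviour described in this section and is not FeatureExtractor's source code: all function names, the baseline limit, the threshold, the width constant, and the waveform values are hypothetical, and the conversion from bin entries to charge units is left out.

```python
def subtract_baseline(waveform, baseline_limit):
    """Subtract the mean of the first three bins unless it exceeds the limit."""
    baseline = sum(waveform[:3]) / 3.0
    if baseline > baseline_limit:
        return list(waveform)
    return [w - baseline for w in waveform]


def total_integrated_charge(samples):
    """Charge as implemented: sum over all bins, baseline shifts included."""
    return sum(samples)


def charge_above_threshold(samples, threshold):
    """Charge as documented: sum over the bins above the threshold only."""
    return sum(s for s in samples if s > threshold)


def pulse_width(charge, samples, width_constant):
    """Width as implemented: constant * charge / maximum entry."""
    peak = max(samples)
    return width_constant * charge / peak if peak > 0 else 0.0


# A small pulse followed by a residual baseline shift of 0.5 per bin.
waveform = [0.0, 0.0, 0.0, 1.0, 4.0, 2.0, 0.5, 0.5, 0.5, 0.5]
samples = subtract_baseline(waveform, baseline_limit=1.0)

integrated = total_integrated_charge(samples)                # 9.0
above = charge_above_threshold(samples, threshold=0.75)      # 7.0
width = pulse_width(integrated, samples, width_constant=1.0) # 2.25
print(integrated, above, width)
```

In this toy waveform the shifted tail adds 2.0 to the integrated charge, which is a sizeable fraction for a low pulse and matches the direction of the overestimation discussed above.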


C.3 Missing Simulation of the daq_baseline in DOMsimulator 

If available, DOMcalibrator uses the daq_baseline stored in the calibration data to calculate 

a waveform’s average baseline. Up to its current release however, DOMsimulator 

does not simulate this baseline. This leads to an overcompensation by DOMcalibrator 

and thereby to a wrong baseline (e. g., figure 8.3), which in turn affects the extracted 

charge (figure C.2). Moreover, the pulse times are affected by the increased baseline, and 

more fake pulses are extracted. 

After being informed about an apparent mismatch between simulated and experimental 

data baselines, the DOMsimulator’s and DOMcalibrator’s maintainer Stijn Buitink 

quickly tracked and fixed the bug for the next release. 

C.4 Time Offset in SLCHitExtractor 

In its initial release, SLCHitExtractor had a hard-coded time offset calibration parameter 

c 2 (see section 6.4), which was used to align SLCHitExtractor’s pulse times with FE’s 

ATWD pulse times. It was later abandoned when the ATWD FADC time offset was 

fixed (appendix C.1), because the times matched well without this offset. However, this is 

only true for FeatureExtractor’s FE59 online-filtering settings, or more precisely only for 

pulses extracted with PMTTransit = 2 (section 8.3.1), and furthermore it is a coincidence:
There is no reason why the time

$$t = t_0 + (i_{\max} - 1) \cdot 25\,\mathrm{ns} - c_1 \frac{w_{i_{\max}-1}}{w_{\max}}$$

(section 6.4) should match the true leading edge, because the term $c_1 \, w_{i_{\max}-1} / w_{\max}$
only compensates for the correlation (slope) between the ratio of the first two bins and the
pulse time (which is only well-defined for regular charge stamps). This time offset can be seen
in the sketch in figure 6.6, which is drawn to scale.

Figure C.2: Effect of the simulation of the daq_baseline on pulses extracted with NFE; custom
simulation using 500 non-relativistic monopoles as light sources, the original
scripts were provided by Thorsten Glüsenkamp.
Left: data without baseline simulation; right: data with proper simulation.
Top row: ratio between the total ATWD integrated charge and the corresponding
FADC charge;
middle row: the first pulse’s charge per waveform for ATWD (blue area) and
FADC (green line);
bottom row: the first pulse’s time per waveform; the ATWD FADC time offset
was not applied.

The total time offset of SLC pulses compared to FE’s ATWD pulses in IC59 offline-processing 

data is composed of about 11 ns between FE ATWD pulses for PMTTransit 

= 2 and PMTTransit = -1, and the 15 ns caused by the original ATWD FADC time offset, 

which was erroneously not applied to SLC charge stamps. 
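As a worked illustration, the SLC pulse time from section 6.4, read here as t = t0 + (i_max − 1) · 25 ns − c1 · w(i_max − 1) / w_max, and the offset budget above can be written as a short Python sketch. The function name, the argument names, and the numbers in the example call (including the value used for c1) are hypothetical; only the 25 ns bin width and the 11 ns and 15 ns contributions are taken from the text.

```python
FADC_BIN_NS = 25.0  # width of one FADC bin


def slc_pulse_time(t0, i_max, w_before_max, w_max, c1):
    """Pulse time of an SLC charge stamp.

    t0:           start time of the charge stamp in ns
    i_max:        1-based index of the bin with the largest entry
    w_before_max: entry of the bin preceding the maximum
    w_max:        entry of the maximum bin
    c1:           calibration constant in ns (value assumed in the example)
    """
    return t0 + (i_max - 1) * FADC_BIN_NS - c1 * w_before_max / w_max


# Example with made-up numbers: maximum in the second bin, c1 = 10 ns.
print(slc_pulse_time(t0=1000.0, i_max=2, w_before_max=50.0, w_max=100.0, c1=10.0))

# Offset of SLC pulses relative to FE's ATWD pulses in IC59 offline processing.
PMT_TRANSIT_SHIFT_NS = 11.0   # PMTTransit = 2 versus PMTTransit = -1
ATWD_FADC_OFFSET_NS = 15.0    # original ATWD FADC offset, not applied to SLC
TOTAL_SLC_OFFSET_NS = PMT_TRANSIT_SHIFT_NS + ATWD_FADC_OFFSET_NS
print(TOTAL_SLC_OFFSET_NS)
```

With these contributions the total offset comes to about 26 ns.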


Acknowledgements 

Without contributions of many people, writing this thesis would not have been possible. 

First I would like to thank Prof. Dr. Christopher Wiebusch, who gave me the opportunity 

to work on this highly interesting project. He invited me into his workgroup and 

provided new ideas as well as never-ending enthusiasm. 

Special thanks go to Dr. David Boersma, whose great advice and invaluable experience 

especially in software development and IceCube reconstruction made this project possible. 

Moreover, I would like to thank Prof. Dr. Martin Erdmann for reviewing this thesis 

as second referee. 

Many thanks go to Anne Schukraft, Sebastian Euler, Matthias Schunck, Thomas 

Krings, and Jan-Patrick Hülß for their extensive proofreading, and to them and the 

rest of the IceCube Aachen workgroup for many interesting discussions, a great working 

atmosphere and generally a good time. This particularly includes the Kinderzimmer in 

both its former and its current lineup. 

Special credits go to Thomas Krings, who shared his initial LaTeX templates to start 

a common framework for future theses in Aachen. I would also like to thank Matthias 

Schunck for the simulation of specialized Monte-Carlo datasets needed for tests and calibration, 

and Thorsten Glüsenkamp for providing me with his simulation scripts. 

Furthermore, many thanks go to the rest of the IceCube Collaboration, in particular 

to Dmitry Chirkin, Christopher Wendt, Alex Olivas, Stijn Buitink, Andreas Groß, Cécile 

Portello-Roucelle, Dennis Diederix, Markus Voge, and Fabian Clevermann for many interesting 

discussions, and also to Thorsten Stezelberger from LBNL for his investigation 

on the SLC firmware. 

Finally I want to express my gratitude towards my family and my friends for their 

enduring support! 


Statement 

I hereby declare that I have written this thesis independently and have used no sources 

or aids other than those indicated. 

Aachen, 2 March 2010 

Declaration 

I hereby certify that this document has been composed by myself, and describes my own 

work, unless otherwise acknowledged in the text. 

Aachen, March 2nd, 2010 


References 

[1] Amsler, C. et al.: “Review of Particle Physics – Astrophysics and Cosmology”. 

Physics Letters B, vol. 667(1-5), pp. 212–260, 2008. ISSN 0370-2693. 

doi:10.1016/j.physletb.2008.07.028. Review of Particle Physics. 

URL http://dx.doi.org/10.1016/j.physletb.2008.07.028 

[2] Simpson, J. A.: “Elemental and Isotopic Composition of the Galactic Cosmic Rays”. 

Annual Review of Nuclear and Particle Science, vol. 33(1), pp. 323–382, 1983. 

doi:10.1146/annurev.ns.33.120183.001543. 

URL http://arjournals.annualreviews.org/doi/abs/10.1146/annurev.ns.33.120183.001543 

[3] Gaisser, T. K.: Cosmic Rays and Particle Physics. Cambridge Univ. Press, Cambridge, 

1990. 

[4] Amenomori, M. et al.: “The cosmic-ray energy spectrum around the knee measured 

by the Tibet-III air-shower array”. Nuclear Physics B - Proceedings Supplements, 

vol. 175-176, pp. 318–321, 2008. ISSN 0920-5632. doi:10.1016/j.nuclphysbps.2007.11.021. 

Proceedings of the XIV International Symposium on Very High Energy 

Cosmic Ray Interactions. 

URL http://www.sciencedirect.com/science/article/B6TVD-4RJ49K5-26/2/6859b5dcee40103ac6be83912a4b9f55 

[5] Nagano, M., Teshima, M., Matsubara, Y., Dai, H. Y., Hara, T., Hayashida, N., 

Honda, M., Ohoka, H., and Yoshida, S.: “Energy spectrum of primary cosmic rays 

above 10^17 eV determined from extensive air shower experiments at Akeno”. Journal 

of Physics G: Nuclear and Particle Physics, vol. 18(2), pp. 423–442, 1992. 

URL http://stacks.iop.org/0954-3899/18/423 

[6] Cherry, M. L.: “An abrupt slowdown for particles on the fast track”. Physics, vol. 1, 

9, Aug 2008. doi:10.1103/Physics.1.9. 

URL http://physics.aps.org/articles/v1/9 

[7] Greisen, K.: “End to the Cosmic-Ray Spectrum?” Phys. Rev. Lett., vol. 16(17), pp. 

748–750, Apr 1966. doi:10.1103/PhysRevLett.16.748. 

URL http://prl.aps.org/abstract/PRL/v16/i17/p748_1 

[8] Drees, M.: “The Top-Down Interpretation of Ultra-High Energy Cosmic Rays”. Journal 

of the Physical Society of Japan, vol. 77SB(Supplement B), pp. 16–18, 2008. 

doi:10.1143/JPSJS.77SB.16. 

URL http://jpsj.ipap.jp/link?JPSJS/77SB/16/ 


[9] Kachelrieß, M. and Semikoz, D.: “Reconciling the ultra-high energy cosmic ray 

spectrum with Fermi shock acceleration”. Physics Letters B, vol. 634(2-3), pp. 143–147, 

2006. ISSN 0370-2693. doi:10.1016/j.physletb.2006.01.009. 

URL http://www.sciencedirect.com/science/article/B6TVN-4J5D6G3-8/2/f2184636c2d405bc6da9f940becf5269 

[10] Dolag, K., Grasso, D., Springel, V., and Tkachev, I.: “Mapping deflections of extragalactic 

ultrahigh-energy cosmic rays in magnetohydrodynamic simulations of the 

local universe”. JETP Letters, vol. 79(12), pp. 583–587, June 2004. ISSN 0021-3640 

(Print) 1090-6487 (Online). doi:10.1134/1.1790011. 

URL http://www.springerlink.com/content/wn2117762165xg21/ 

[11] Rothman, T. and Boughn, S.: “Can gravitons be detected?” Found. Phys., vol. 36, 

pp. 1801–1825, 2006. doi:10.1007/s10701-006-9081-9. 

URL http://arxiv.org/abs/gr-qc/0601043 

[12] The LIGO Scientific Collaboration and The Virgo Collaboration: “Searches for gravitational waves 

from known pulsars with S5 LIGO data”, September 2009. 

URL http://arxiv.org/abs/0909.3583v1 

[13] Brocato, E., Castellani, V., Degl’Innocenti, S., Fiorentini, G., and Raimondo, G.: 

“Stars as galactic neutrino sources”. Astron. Astrophys., vol. 333, p. 910, 1998. 

URL http://arxiv.org/abs/astro-ph/9711269 

[14] Woosley, S. E., Heger, A., and Weaver, T. A.: “The evolution and explosion of 

massive stars”. Rev. Mod. Phys., vol. 74(4), pp. 1015–1071, Nov 2002. doi:10.1103/ 

RevModPhys.74.1015. 

URL http://rmp.aps.org/abstract/RMP/v74/i4/p1015_1 

[15] Chamel, N. and Haensel, P.: “Physics of Neutron Star Crusts”. Living Reviews in Relativity, 

vol. 11(10), 2008. 

URL http://www.livingreviews.org/lrr-2008-10 

[16] Hansen, B.: “The astrophysics of cool white dwarfs”. Physics Reports, vol. 399(1), 

pp. 1–70, 2004. ISSN 0370-1573. doi:10.1016/j.physrep.2004.07.001. 

URL http://www.sciencedirect.com/science/article/B6TVP-4D3B39C-1/2/8143f54436eb55b9ee72bf541e93349e 

[17] Amsler, C. et al.: “Review of Particle Physics”. Phys. Lett., vol. B667, p. 1, 2008. 

doi:10.1016/j.physletb.2008.07.018. And 2009 partial update for the 2010 edition. 

URL http://pdglive.lbl.gov/listings1.brl?quickin=Y 

[18] Pasquali, L., Reno, M. H., and Sarcevic, I.: “Secondary decays in atmospheric charm 

contributions to the flux of muons and muon neutrinos”. Astroparticle Physics, 

vol. 9(3), pp. 193–202, 1998. ISSN 0927-6505. doi:10.1016/S0927-6505(98)00019-X. 

URL http://dx.doi.org/10.1016/S0927-6505(98)00019-X 

[19] Thunman, M., Ingelman, G., and Gondolo, P.: “Charm production and high energy 

atmospheric muon and neutrino fluxes”. Astroparticle Physics, vol. 5(3-4), pp. 309–332, 

1996. ISSN 0927-6505. doi:10.1016/0927-6505(96)00033-3. 

URL http://www.sciencedirect.com/science/article/B6TJ1-3VPSFK6-C/2/d99a97b5b0b4a7e077c1dbf6ebe646e8 

[20] Ahrens, J. et al.: “IceCube Preliminary Design Document”. Tech. Rep., The IceCube 

Collaboration, Oct 2001. 

URL http://www.icecube.wisc.edu/science/publications/pdd/pddwhole.php 

[21] Schonert, S., Gaisser, T. K., Resconi, E., and Schulz, O.: “Vetoing atmospheric 

neutrinos in a high energy neutrino telescope”. Physical Review D, vol. 79, p. 043009, 

2009. 

doi:10.1103/PhysRevD.79.043009. 

[22] Gandhi, R., Quigg, C., Reno, M. H., and Sarcevic, I.: “Neutrino interactions at 

ultrahigh energies”. Phys. Rev. D, vol. 58(9), p. 093009, Sep 1998. doi:10.1103/ 

PhysRevD.58.093009. 

URL http://prd.aps.org/abstract/PRD/v58/i9/e093009 

[23] Reno, M. H.: “High energy neutrino cross sections”. Nuclear Physics B - Proceedings 

Supplements, vol. 143, p. 407, 2005. doi:10.1016/j.nuclphysbps.2005.01.137. 

URL http://arxiv.org/abs/hep-ph/0410109 

[24] Neunhöffer, T.: Die Entwicklung eines neuen Verfahrens zur Suche nach kosmischen 

Neutrino-Punktquellen mit dem AMANDA-Neutrinoteleskop. Shaker, 2004. 

URL http://icecube.berkeley.edu/manuscripts/ 

[25] Escribano, R., Frère, J. M., Monderen, D., and Elewyck, V. V.: “Insights on neutrino 

lensing”. Physics Letters B, vol. 512(1-2), pp. 8–17, 2001. ISSN 0370-2693. 

doi:10.1016/S0370-2693(01)00686-4. 

URL http://www.sciencedirect.com/science/article/B6TVN-43CTFPP-3/2/2777c795ad8bb667d593d14700f378a9 

[26] Illana, J. I., Masip, M., and Meloni, D.: “Probing TeV gravity at neutrino telescopes”. 

In Proc. of the First Workshop on Exotic Physics with Neutrino Telescopes (edited 

by de los Heros, C.). Uppsala, Sep 2006. 

URL http://arxiv.org/abs/hep-ph/0612305 

[27] Alvarez-Muniz, J. and Zas, E.: “Calculations of radio pulses from High Energy Showers”. 

AIP CONF.PROC., vol. 579, p. 117, 2001. doi:10.1063/1.1398165. 

URL http://arxiv.org/abs/astro-ph/0103369 


[28] Voigt, B.: Sensitivity of the IceCube detector for ultra-high energy electron-neutrino 

events. Ph.D. thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche 

Fakultät I, Nov 2008. 

URL http://edoc.hu-berlin.de/docviews/abstract.php?id=29421 

[29] Besson, D. and The Rice Collaboration: “Modeling of high-energy electromagnetic 

showers in ice”. In International Cosmic Ray Conference, vol. 3 of International 

Cosmic Ray Conference, pp. 1179–+, 2001. 

URL http://adsabs.harvard.edu/abs/2001ICRC....3.1179B 

[30] Amsler, C. et al.: “Review of Particle Physics – Experimental Methods and Colliders”. 

Physics Letters B, vol. 667(1-5), pp. 261–315, 2008. ISSN 0370-2693. 

doi:10.1016/j.physletb.2008.07.029. Review of Particle Physics. 

URL http://dx.doi.org/10.1016/j.physletb.2008.07.029 

[31] Albuquerque, I., Burdman, G., and Chacko, Z.: “Neutrino Telescopes as a Direct 

Probe of Supersymmetry Breaking”. Phys. Rev. Lett., vol. 92(22), p. 221802, Jun 

2004. doi:10.1103/PhysRevLett.92.221802. And corresponding presentation Looking 

for SUSY in the Ice at TeV Particle Astrophysics conference. 

URL http://www-astro-theory.fnal.gov/Conferences/TeV/Albuquerque.pdf 

[32] Katz, U. et al.: “KM3NeT – Conceptual Design Report for a Deep Sea Research Infrastructure 

Incorporating a Very Large Volume Neutrino Telescope in the Mediterranean 

Sea”. Tech. Rep., The KM3NeT Consortium, April 2008. 

URL http://www.km3net.org/CDR/CDR-KM3NeT.pdf 

[33] Halzen, F.: “Status of Neutrino Astronomy: The Quest for Kilometer-Scale Instruments”. 

COMMENTS NUCL.PART.PHYS., vol. 22, p. 155, 1997. 

URL http://lanl.arxiv.org/abs/astro-ph/9701029 

[34] Montaruli, T.: “Neutrino Astronomy in the Ice”. Nuclear Physics B – Proceedings 

Supplements, vol. 188, pp. 239–244, March 2009. Proceedings of the Neutrino 

Oscillation Workshop. 

URL http://arxiv.org/abs/0901.2664 

[35] Woschnagg, K.: “IC79 Geometry Figure”. IceCube Internal Wiki, 2010. 

URL http://wiki.icecube.wisc.edu/index.php/Geometry_figures 

[36] Vevea, D.: “Array-PublicationDL”. IceCube Internal Gallery, 2009. 

URL http://gallery.icecube.wisc.edu/internal/v/graphics/sketchup 

[37] Ruzybayev, B., Hussain, S., Xu, C., Gaisser, T., and the IceCube Collaboration: 

“Small air showers in IceTop”. In Proceedings of the 31st ICRC. Łódź, July 2009. 

URL http://www.srl.utu.fi/AuxDOC/kocharov/ICRC2009/pdf/icrc0737.pdf 


[38] The IceCube Collaboration: “Live at the South Pole”. IceCube Public Website, June 

2009. 

URL http://www.icecube.wisc.edu/info/life.php 

[39] Ackermann, M. et al.: “Optical properties of deep glacial ice at the South Pole”. 

Journal of Geophysical Research - Atmospheres, vol. 111(D13), pp. D13203+, July 

2006. ISSN 0148-0227. doi:10.1029/2005JD006687. 

URL http://dx.doi.org/10.1029/2005JD006687 

[40] The IceCube Collaboration: “Anatomy of a DOM”. IceCube Public Gallery, Nov 

2006. 

URL http://gallery.icecube.wisc.edu/external/4-cons-doms/DOM-Picture.png.html 

[41] Portello-Roucelle, C.: “DOMCalibrator”. IceCube Virtual Meeting 2009, July 2009. 

URL http://wiki.icecube.wisc.edu/index.php/Agenda_Day_1%2C_Session_2 

[42] Wendt, C.: “Droop Correction – Dual τ Model”. IceCube Collaboration Meeting, 

Oct 2006. 

URL https://docushare.icecube.wisc.edu/dsweb/Get/Document-30244/Droop-dual-tau-Zeuthen2006_wendt.pdf 

[43] Wiebusch, C. and the IceCube Collaboration: “Physics Capabilities of the IceCube 

DeepCore Detector”. In Proceedings of the 31st ICRC. Łódź, July 2009. 

URL http://arxiv.org/PS_cache/arxiv/pdf/0907/0907.2263v1.pdf 

[44] Hamamatsu: Photomultiplier Tube R7081-02 Data Sheet, Nov 2003. 

URL https://docushare.icecube.wisc.edu/dsweb/Get/Document-6637/R7081-02%20data%20sheet.pdf 

[45] The IceCube Collaboration: “Prepulse Data”. IceCube Internal Wiki, May 2007. 

Based on Chris Wendt’s measurements. 

URL http://wiki.icecube.wisc.edu/index.php/Prepulse_Data 

[46] The IceCube Collaboration: “The IceCube data acquisition system: Signal capture, 

digitization, and timestamping”. Nuclear Instruments and Methods in Physics 

Research Section A, vol. 601(3), pp. 294–316, 2009. ISSN 0168-9002. 

doi:10.1016/j.nima.2009.01.001. Revision 1.3. 

URL http://www.sciencedirect.com/science/article/B6TJM-4VBMNCJ-4/2/282b1e53516b6eab577fa652971a8fd9 

[47] Wendt, C.: “DOM SPE Waveform Shape”. IceCube Internal Wiki, 2009. 

URL http://icecube.wisc.edu/~chwendt/dom-spe-waveform-shape/ 

[48] Stezelberger, T.: “private conversation”, Aug 2009. Lawrence Berkeley National 

Laboratory. 


[49] The IceCube Collaboration: “IceTray”, Feb 2010. 

URL http://software.icecube.wisc.edu/offline-software.trunk/projects/icetray/index.html 

[50] Chirkin, D., Klein, S. et al.: “FeatureExtractor V02-03-00 Source Code”. IceCube 

SVN Source Code Repository, January 2010. 

URL http://code.icecube.wisc.edu/projects/icecube/browser/projects/FeatureExtractor/releases/V02-03-00 

[51] The IceCube Collaboration: “Standard Processing Scripts, V10-01-00”. IceCube 

SVN Source Code Repository, Jan 2010. 

URL http://code.icecube.wisc.edu/projects/icecube/browser/meta-projects/std-processing/releases/10-01-00/scripts/IC59/level1_CalibrateAndExtractPulses.py 

[52] ROOT Development Team: “ROOT”. http://root.cern.ch/drupal/, 2010. 

[53] Panknin, S.: “The Feature Extractor”. IceTray Seminar, Oct 2008. 

URL http://nuastro-zeuthen.desy.de/e13/e63159/e27/e689/e693/infoboxContent722/fe.pdf 

[54] Chirkin, D. and Wendt, C.: “PulseExtractor Source Code”. IceCube SVN Source 

Code Repository, January 2010. 

URL http://code.icecube.wisc.edu/projects/icecube/browser/sandbox/PulseExtractor 

[55] Groß, A.: “SLCHitExtractor V00-01-00 Source Code”. IceCube SVN Source Code 

Repository, January 2010. 

URL http://code.icecube.wisc.edu/projects/icecube/browser/projects/SLCHitExtractor/releases/V00-01-00 

[56] Boersma, D. J.: “Gulliver”. IceCube Internal Wiki, Oct 2009. 

URL http://wiki.icecube.wisc.edu/index.php/Gulliver 

[57] D’Agostini, G.: “A multidimensional unfolding method based on Bayes’ theorem”. 

Nuclear Instruments and Methods in Physics Research Section A, vol. 362(2-3), pp. 

487–498, Mar 1995. ISSN 0168-9002. doi:10.1016/0168-9002(95)00274-X. 

URL http://www.sciencedirect.com/science/article/B6TJM-3YRNX0H-5K/2/3e3a92555a7955c7f4ab989fa99baef7 

[58] The IceCube Collaboration: “IC59 NuMu E^{-1} dataset 2595”. Simulation Production, 

Sep 2009. 

URL http://internal.icecube.wisc.edu/simulation/dataset/2595 

[59] The IceCube Collaboration: “IC59 NuMu E^{-1} dataset 3071”. Simulation Production, 

Feb 2010. 

URL http://internal.icecube.wisc.edu/simulation/dataset/3071 


[60] Voge, M.: “IC59 Waveforms”, Dec 2009. 

URL https://docushare.icecube.wisc.edu/dsweb/Get/Document-52517/MarkusVoge_09-12-15.pdf 

[61] Merck, M.: “DOMCalibrator Problems”. IceCube Internal Wiki, Jul 2009. 

URL http://wiki.icecube.wisc.edu/index.php/DOMCalibrator_Problems 

[62] Groß, A.: “IC59 L2 processing”. IceCube Internal Wiki, Oct 2009. 

URL http://wiki.icecube.wisc.edu/index.php/IC59_L2_processing 

[63] Aguilar, J.: “IC59 L2 processing/FADC usage”. IceCube Internal Wiki, Dec 2009. 

URL http://wiki.icecube.wisc.edu/index.php/IC59_L2_processing/FADC_usage 

[64] Groß, A.: “SLCHitExtraction”. IceCube Internal Wiki, Apr 2009. 

URL http://wiki.icecube.wisc.edu/index.php/SLCHitExtraction 

