Feature eXtraction from sparse time series data - FreiDok - Albert ...

Cover

Feature eXtraction

A tool for analyzing sparse time series data

Dissertation

submitted to the Albert-Ludwigs-Universität Freiburg im Breisgau

Dirk Steenpaß

March 2002

Picture

Perception and interpretation

(Mount Kalmberg, Ramsauer mountains.)

Abstract

We present a computational methodology for the qualitative analysis of sparse and noisy time series. Information about the changes of the signal level within a time series and about the number of distinguishable signal levels is extracted and condensed into a pattern string. The qualitative analysis of a time series can be done at several levels of detail. The resulting pattern strings either encode only the sequence of changes in the signal level, or also include the time points at which these changes took place, or additionally include information about the relative strength of the signal levels.

The extraction of the qualitative features is based on standard statistical techniques. One-way analysis of variance is used to detect changes of the signal level and to determine the number of signal levels within a time series. For this reason every pattern string has a well-defined statistical significance.
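To make the idea concrete, here is a minimal Python sketch. It is not the dissertation's algorithm, which is developed in section 4.1. The sketch assumes replicate measurements at each time point and compares each pair of consecutive time points with a one-way ANOVA F statistic. For each interval it emits a hypothetical symbol: "+" for a significant increase, "-" for a significant decrease, "0" otherwise.

The critical value `f_crit` is an illustrative assumption. Its default of 7.71 is the 5% point of the F distribution with 1 and 4 degrees of freedom, which applies to two groups of three replicates. The function names are also assumptions, and no correction for multiple testing is applied.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of replicate groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of the replicates around their group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within

def pattern_string(series, f_crit=7.71):
    """Hypothetical encoding: one symbol per interval between consecutive time points."""
    symbols = []
    for before, after in zip(series, series[1:]):
        if one_way_anova_f([before, after]) > f_crit:
            symbols.append("+" if mean(after) > mean(before) else "-")
        else:
            symbols.append("0")
    return "".join(symbols)

# Three replicates at each of four time points (made-up illustrative values).
series = [[1.0, 1.1, 0.9], [3.0, 3.2, 2.8], [3.1, 2.9, 3.0], [1.0, 0.8, 1.2]]
timed = pattern_string(series)          # keeps the position of each change
changes_only = timed.replace("0", "")   # keeps only the sequence of changes
print(timed, changes_only)              # prints: +0- +-
```

The two strings loosely mirror the levels of detail described above. One records when the changes occur; the other records only their order.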

Sparse time series data cannot be used to study the kinetics of a process because they provide qualitative information only. Microarray time series are a typical example of sparse and noisy time series. A single microarray time series experiment provides simultaneous measurements of tens of thousands of gene expression levels over time. The algorithms presented here give a reliable and compact description of the qualitative features of each individual time series. The feature extraction algorithms should facilitate a systematic analysis and use of microarray time series data, but they can also be used for time series analysis in general. We illustrate the potential of the methodology by applying it to two microarray time series experiments.


Acknowledgements

I would like to thank Prof. Dr. Hanns-Christof Spatz for his help and support.

I am also indebted to András Aszódi, who was my teacher at the Novartis Forschungsinstitut in Vienna and without whom this would not have been possible.

I owe thanks to the biologists who did the experiments. Henrietta D Moore provided the inflammatory bowel disease data and Andreas Billich the MIF-mice data. Special thanks go to Frank Kalthoff, who invested much work into a time series experiment only to learn that the mRNA samples had failed to hybridize to the chip due to technical problems. He immediately began to repeat the experiment, but unfortunately these data cannot be presented here.

Adrienne James did her best to improve my English.

Torsten Schindler shared my office and thus could not avoid becoming involved in many helpful discussions.


Contents

Cover

Picture

Abstract

Acknowledgements

Contents

1 Introduction

1.1 Biological regulatory networks

2 Experimental technology

2.1 From blots to microarrays

2.2 cDNA microarrays

2.2.1 Fabrication

2.2.2 Sample preparation and hybridization

2.2.3 Data extraction and image analysis

2.3 Oligonucleotide arrays

2.3.1 Fabrication

2.3.2 Sample preparation and hybridization

2.3.3 Data extraction and image analysis

2.4 Other technologies

2.4.1 Microarray variants

2.4.2 RT-PCR

2.4.3 EST data

2.4.4 SAGE

2.4.5 Proteomics approaches

3 Gene expression analysis

3.1 Noise and RNA quantification

3.2 Analysis by ad-hoc thresholding

3.3 Approaches to gene expression analysis made

3.4 Sparse time series data and Feature eXtraction

4 Development of the tools

4.1 Feature eXtraction

4.1.1 Identification of significant jumps

4.1.2 Feature eXtraction algorithms

4.2 Application of Feature eXtraction

4.2.1 Inflammatory bowel disease data

4.2.2 MIF mice data

5 Discussion

5.1 The one-way ANOVA model

5.2 Power and experimental design

5.3 Grouping by qualitative patterns

5.4 Feature eXtraction and microarray data

5.4.1 Inflammatory bowel disease data

5.4.2 MIF mice data

5.4.3 Microarray time series in general

5.5 Conclusion

6 References

7 Appendices

7.1 Affymetrix analysis metrics

7.1.1 Probe cell intensity correction

7.1.2 Empirical metrics

7.1.3 Statistical metrics

7.2 Grouping of the inflammatory bowel disease data

7.2.1 Low-resolution pattern

7.2.2 High-resolution pattern

7.2.3 Extended low-resolution pattern

7.2.4 Extended high-resolution pattern

List of figures

List of tables

Abbreviations


1 Introduction


Biology as a scientific discipline aims to understand the processes of life. This is done at different levels of detail, ranging from interactions between populations to interactions of molecules at the cellular level. The collection of data and the description of a system are necessary steps towards its understanding.

This work is concerned with biological research that aims to understand biological systems at the molecular level. In the next section we clarify what we mean by "understanding a biological system at the molecular level of detail", and what kind of data is needed to do so. In this work we present algorithms to analyze biological time series data. The algorithms are especially useful when large data sets have to be analyzed, such as gene expression time series measured in microarray experiments. Microarrays are a relatively new technology that can provide information about the expression of (basically) any gene of an organism in parallel. Though microarray data alone are not sufficient to understand a biological system in all its detail, valuable insights can be obtained by the use of this technology.

As the algorithms presented here are designed with microarray technology in mind, we introduce the technology in some detail in section 2. Section 2.4 adds a brief summary of other technologies that provide information about a biological system in a high-throughput manner.

1.1 Biological regulatory networks

We will use the lysis-lysogeny decision-circuit of phage lambda to illustrate what we mean by "understanding a biological system at the molecular level of detail", and to put microarray time series data into perspective. Compared with other biological systems, the decision-circuit is simple, and it is one of the biological systems that are understood in considerable detail. We will not discuss the decision-circuit in detail here, but use it to highlight some features of biological systems.

The infectious form of the lambda phage consists of a protein capsule and a double-stranded DNA genome. After the phage docks to an Escherichia coli host cell, the genome is injected into the cell. At this stage the phage is reduced to its double-stranded DNA, and it now has two options for development:

- Lysogeny: its genome may be integrated into the host genome, to be activated later.
- Lysis: the cellular machinery of the host replicates the genome, the coat proteins are synthesized, and newly assembled phages lyse the host cell.

The development always starts the same way. First the so-called early genes of lambda are expressed. Among the early genes, two play a central role, namely cI and cro. The decision between lysogeny and lysis is the outcome of a very complex process. Essentially, however, it amounts to a critical race in the buildup of the two gene products CI and Cro. CI and Cro are both repressor proteins that bind as dimers to the phage DNA. When one of the competing repressors reaches its active concentration, it switches off the production of the other repressor. If CI wins the race, the phage genome is integrated into the host genome. If Cro wins the race, the phage replicates and lyses the host cell.
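The winner-takes-all character of this race can be caricatured with a two-variable Boolean model, in the spirit of the qualitative logical models cited later in this section. The sketch below is a deliberate simplification and not the McAdams and Shapiro circuit. It assumes that each repressor gene is ON exactly when the other repressor is absent, and it lists the states that do not change under this update rule. The function name `update` is hypothetical.

```python
from itertools import product

def update(state):
    """Mutual repression: each gene is expressed only if the other repressor is absent."""
    ci, cro = state
    return (int(not cro), int(not ci))

# Fixed points are states that map onto themselves under the update rule.
fixed_points = [s for s in product((0, 1), repeat=2) if update(s) == s]
print(fixed_points)  # [(0, 1), (1, 0)]
```

The two fixed points correspond to the two outcomes described above. State (1, 0), with CI present and Cro absent, is lysogeny; state (0, 1), with Cro present and CI absent, is lysis. Under this synchronous update the remaining states (0, 0) and (1, 1) map onto each other and never settle. Which fixed point is reached depends on timing asymmetries that this Boolean caricature does not represent.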

This brief description ignores the detailed knowledge about the circuit that has accumulated over a research period of roughly 25 years. Ptashne started research on the decision-circuit in the late sixties with the isolation of the CI repressor (Ptashne, 1967). An overview of the decision-circuit as understood today can be found in "A Genetic Switch: Phage λ and Higher Organisms" (Ptashne, 1992).

The extensive research done on lambda makes it possible to construct detailed models of the lysis-lysogeny decision-circuit, as shown in Figure 1-1. This figure illustrates an electric circuit model presented by McAdams and Shapiro (McAdams and Shapiro, 1995). The description of a biological system as a regulatory network, as shown in Figure 1-1, is based on qualitative and quantitative information. Qualitative information is used to define the components of the regulatory network and to describe the mechanisms by which these components interact. Quantitative data are necessary to obtain the parameters that define the behavior of these mechanisms in time.

Figure 1-1 Schematic illustration of the phage lambda lysis-lysogeny decision circuit. The figure is taken from (McAdams and Shapiro, 1995). The right-to-left orientation of operons does not necessarily correspond to the orientation on the chromosome.

In the case of the lambda example presented here, the complete genome is characterized. All coding regions of the genome are known, and a function can be assigned to every encoded protein. In addition, the non-coding nucleotide sequences that are important for the expression of genes, such as the promoters, are well understood. Thus the molecules and nucleotide sequences that form the components of the regulatory network are known: the qualitative information about the components of the regulatory network is complete.

A second qualitative aspect of a regulatory network is the mechanism by which these components interact; we will call this the 'wiring' of the network. The wiring describes the protein-protein, protein-promoter and other interactions occurring in the system, together with the mechanisms of interaction with host cell components. In the case of lambda, the mechanisms of interaction among the viral components are well characterized. The mechanisms by which the viral components interact with host cell components, by contrast, are not understood in every detail. However, the qualitative information available about the lysis-lysogeny decision-circuit is sufficient to construct a very detailed model of the wiring.

To understand the working of the decision-circuit, its behavior in time must be simulated. To do so, functions have to be defined that model the interaction mechanisms in time. The model shown in Figure 1-1 uses electric gates for this purpose. The gates act as Boolean or real-valued functions that take input from one or several components and have an output effect on other components. To tune these functions precisely, parameters such as the association and dissociation constants of protein-protein and protein-DNA interactions must be known. Even for the well-understood lambda decision-circuit this kind of quantitative information is frequently not available. Some of the gates shown in Figure 1-1 are marked with the label "kinetic model required". These gates were simulated on the basis of sensible assumptions; the data necessary to characterize them exactly are still missing. In general, the quantitative data needed to simulate the behavior of a biological system in time are difficult to obtain.

The lysis-lysogeny decision-circuit can be used to illustrate another important aspect of biological systems. If sufficient qualitative and quantitative information is available to construct a regulatory network model, one of two approaches is usually chosen to model the behavior of the interactions in time.

The model shown in Figure 1-1 uses an analogy between electrical circuits and macroscopic chemical kinetics in a well-stirred solution (McAdams and Shapiro, 1995). Chemical kinetics, however, is not ideal for the lambda switching circuit considered here, because the well-stirred tank reactor model is not always appropriate for biological systems. The chemical kinetics model assumes that the amount of each reactant can be approximated as a continuously varying quantity (the concentration) that changes deterministically in time. This assumption holds only in systems where:

(i) the number of reactant molecules is large and spatial variations in concentration are negligible, and

(ii) the number of elementary reactions (collisions between reacting molecules) in an observation interval is large.

In biological systems, however, many regulatory molecules are present in such small numbers that the concept of concentration becomes physically meaningless (Guptasarma, 1995). Furthermore, rates of genetic reactions are frequently slow; minutes may separate the activation of a promoter from successful transcription initiation events. Gillespie has shown that the behavior of such small-number, low-rate systems in time is a stochastic process (Gillespie, 1992).

An alternative, and more appropriate, approach to the simulation of such systems can be based on two algorithms provided by Gillespie (Gillespie, 1976; Gillespie, 1977). These algorithms were indeed applied to the lysis-lysogeny decision-circuit by McAdams in a later publication together with Ross and Arkin (Arkin et al., 1998).
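For illustration, here is a minimal Python sketch of the direct-method variant of Gillespie's stochastic simulation algorithm. It is not the lambda model of Arkin et al. (1998). The reaction system is a hypothetical one chosen for simplicity: a single mRNA species is produced at a constant rate `k_prod` and degraded with first-order rate `k_deg`.

At each step, the waiting time to the next reaction is drawn from an exponential distribution whose rate is the total propensity. The reaction that fires is then chosen with probability proportional to its propensity.

```python
import random

def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=None):
    """Simulate one stochastic trajectory of an mRNA copy number.

    Reactions (assumed for illustration):
      0 -> mRNA   with propensity k_prod
      mRNA -> 0   with propensity k_deg * n
    Returns the reaction times and the copy number after each reaction.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod
        a_deg = k_deg * n
        a_total = a_prod + a_deg
        if a_total == 0:
            break  # no reaction can fire any more
        # The waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=1000.0, seed=1)
```

For this birth-death process the long-run mean copy number is k_prod/k_deg, which is 10 molecules in the usage line. An individual trajectory, however, fluctuates around that mean, and this is the behavior a deterministic concentration model averages away.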

In the light of the decision-circuit example, "understanding a biological system at the molecular level" refers to a situation where the scientist is able to construct a detailed model of a biological system and to simulate the behavior of the system in time. The model of a biological system can be seen as a regulatory network. Three crucial steps can be distinguished in the construction of such a model:

- the components of the system must be identified;
- the wiring of the system must be understood;
- parameters must be obtained that allow for quantitative modeling of the interaction mechanisms between the system's components.

Given that the system components of interest are known, their behavior can be monitored by time series experiments. In a biological system, the "concentration" of molecules such as mRNAs, proteins, metabolites or second messengers could be measured at several different time points. We distinguish between two basic situations in which time series experiments are done: (i) the wiring of the system is known, or (ii) it is not known.

Consider the former situation first. If the wiring of the system is known, time series experiments are done to study the kinetics of the system, so the information provided by the experiment depends crucially on its resolution in time. The time resolution is high enough when the interval between two successive measurements is shorter than the characteristic time of the underlying processes. In that case kinetic parameters can be inferred from the measurements, and quantitative modeling becomes possible (e.g., enzyme kinetics simulation). If the resolution in time is low, the necessary kinetic parameters cannot be inferred from the measurements, and only qualitative features of the system's kinetics can be analyzed.

Many important aspects of the behavior of a system in time can still be captured by a qualitative description. In the case of phage lambda, simplified qualitative models have been developed that describe the lysis-lysogeny circuit using two kinds of data (Thomas et al., 1976; Thieffry and Thomas, 1995; Thomas et al., 1995):

- the relative velocity of processes (e.g., process A is faster than process B);
- the relative strength of interaction effects (e.g., protein A has a strong repressing effect on the expression of some gene, while the competitive inducer B has only a weak effect).
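Here is a hedged illustration of what "inferring a kinetic parameter" can mean in the simplest case. The sketch assumes a first-order decay x(t) = x0·exp(-k t); this model is an assumption, not something the text specifies. The decay is sampled densely relative to the characteristic time 1/k, and k is recovered as the negative slope of a least-squares line through ln x(t). The function name `estimate_decay_rate` and the numbers are hypothetical.

```python
import math

def estimate_decay_rate(times, values):
    """Least-squares estimate of k for x(t) = x0 * exp(-k t), fitted on ln x."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx

# Noise-free samples of a process with k = 0.5 per hour, taken every 0.5 h over 6 h,
# i.e. the sampling interval is well below the characteristic time 1/k = 2 h.
times = [0.5 * i for i in range(13)]
values = [100.0 * math.exp(-0.5 * t) for t in times]
print(round(estimate_decay_rate(times, values), 3))  # 0.5
```

Real measurements are noisy. If they are taken at only a few time points spaced much further apart than 1/k, such a fit is poorly constrained, which corresponds to the low-resolution situation in which only qualitative features can be analyzed.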

When the wiring of the system is not known, or only partially known, time series experiments are performed to gain insight into the interactions of the system's components. Time series experiments can help to propose hypotheses about the interactions in a system. Proposing such hypotheses and validating them is a difficult task, especially if the system under study is complex and little is known about it. Even if hypotheses about interactions can be established despite these difficulties, the mechanism by which these interactions operate still remains hidden.

Consider, for illustrative purposes, a hypothetical regulatory network consisting of only three components (A, B, C), where the interactions between these components are unknown. As shown in Figure 1-2, the system is monitored during three distinct time periods:

1. It is first observed for some time in a "natural" state.
2. The signal of A is then clamped to a low level. This level is kept fixed (as indicated by the gray box) and released after some time.
3. The system is monitored under "natural" conditions again.

The time resolution is not very important in this experiment; basically, one time point before and one after the clamping are sufficient. This experiment clearly gives an impression of what is going on in the system, and it could be used to propose hypotheses about interactions of the system components.

[Figure 1-2: signal of components A, B and C plotted against time]

Figure 1-2 Hypothetical time series experiment. The signal strength of three components (A, B, C) is measured. The system is observed for some time, then the expression level of A is artificially fixed at a low level for a certain time (gray box), to be released again later. The lines represent an experiment with high resolution in time, while the dots represent an experiment with low time resolution.

Consider, for example, component C. The clamping of A somehow causes an increase in the signal level of C, while the signal level of B changes to a lower value. Several hypotheses for the regulation of C by the two other system components are compatible with these observations. For example, A or B could act as a repressor of C, or the cooperative action of A and B could be necessary to repress C. Furthermore, a self-regulatory effect of C cannot be excluded. Additional experiments are needed to establish the causal relations that account for the signal level of C. In the worst case, namely if the signal level of C does indeed depend on A, B and C, a complete perturbation experiment as shown in Figure 1-3 must be done.

This artificial example illustrates several aspects of time series experiments used to gain insight into the interactions between system components:

- Even if a complete perturbation experiment is performed and the interactions occurring in the system are thus well understood, the time series data give no clue about the mechanisms by which these interactions operate. They do not reveal the physical processes behind the observed interactions (i.e., it remains unclear by which mechanism the components in the hypothetical example interact).
- If not all system components are known, models of interactions are restricted to the known components. When new components become known, the existing interaction models may need to be revisited.
- In contrast to our hypothetical example, a system component will in general have more than two significant states, which increases the complexity of a complete perturbation experiment immensely.
- The number of hypotheses compatible with a single time series experiment (done on all system components in parallel) grows exponentially with the number of components. So does the number of experiments needed to verify or reject these hypotheses by the complete perturbation approach.

Input → Output

a b c → a b ?

a b C → a b C *

a B c → a B ?

a B C → a B ?

A b c → A b ?

A b C → A b ?

A B c → A B c *

A B C → A B ?

Figure 1-3 A complete perturbation experiment for the hypothetical system A, B, C. The system components switch between two possible states: "high", a strong signal (capital letters), and "low", a weak signal (small letters). Thus only eight system states are possible. The gray boxes indicate that the signal level is clamped. In a first step, the system is put into a certain state by clamping all three components. Then the clamping of C is released, and the effect of the input state on the signal level of C is monitored, giving the output state. Two stable states (marked with *) were already observed in the hypothetical time series experiment (Figure 1-2); in these cases the input state equals the output state.
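The input states of Figure 1-3 are simply all combinations of low and high levels for the three components. The small sketch below enumerates them in the figure's order. It also shows how the number of clamped input states grows with system size: for n components with k distinguishable levels each, there are k to the power n input states. The function names are hypothetical.

```python
from itertools import product

def input_states(components):
    """All clamped input states for two-level components (small = low, capital = high)."""
    levels = [(c.lower(), c.upper()) for c in components]
    return [" ".join(state) for state in product(*levels)]

def n_input_states(n_components, n_levels):
    """Number of input states in a complete perturbation experiment."""
    return n_levels ** n_components

print(input_states("ABC"))
for n, k in [(3, 2), (3, 3), (10, 2), (10, 3)]:
    print(n, "components,", k, "levels:", n_input_states(n, k), "input states")
```

Allowing three levels instead of two raises the count for three components from 8 to 27. For ten components it rises from 1,024 to 59,049, before any replicates are considered.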

Where do these considerations leave us with respect to an "understanding of biological systems at the molecular level of detail"? If the interaction mechanisms in a system are known, time series experiments are a method of choice for understanding their kinetics. If the mechanisms of interaction between system components are to be studied, time series experiments alone will not be sufficient, and they are frequently not even feasible. The complexity of a system under study will render the complete perturbation approach useless in many cases. Furthermore, the approach requires controlling the signal level of every component under study, which is not possible in general.

Microarray technology cannot circumvent these fundamental problems and, in addition, provides information about mRNA levels only. As the lambda example illustrates, this information is not sufficient: information about the protein composition of a biological system is absolutely necessary. Although the protein composition depends on gene expression, protein levels cannot in general be correlated directly with mRNA levels. Studies that compared mRNA abundance with the abundance of the respective proteins found only weak to moderate correlation. This holds for human liver (Anderson and Seilhamer, 1997) and for yeast (Futcher et al., 1999; Gygi et al., 1999), with 10- to 20-fold variation in protein abundance among genes with the same mRNA level.

Even if the protein composition could be directly associated with mRNA levels, data about protein-protein and protein-DNA interactions would still be missing. Several additional factors complicate the picture further. A detailed model must take into account the transcription and translation machinery, splicing, differential splicing, modulation of translation and posttranslational modifications. Other factors that can be important include molecule fluxes, transport, compartmentalization, spatial arrangement, and intra- and intercellular signaling.

Though microarray experiments definitely do not provide sufficient information to model a biological system in detail, important insights into the genetic regulatory network can still be obtained from microarray data. For example, microarrays can be used to set up differential expression experiments, and time series experiments are especially useful for identifying differentially expressed genes. Regulation of gene expression may occur directly after a treatment has been applied, but it can also occur hours or even days later. Such late regulation may involve cascades of regulatory events at several different sites. The reliable identification of differentially regulated genes from such data could make it possible to focus research on the interactions of (hopefully) small subsets of genes.

2 Experimental technology


Microarray technology enables the quantitative measurement of mRNA expression levels in biological systems, such as bacterial cultures, cell cultures or tissue samples, in a highly parallel fashion.

2.1 From blots to microarrays

The starting point for the development of the microarray was probably the observation that single-stranded DNA binds strongly to nitrocellulose membranes. The DNA binds in such a way that re-association of the double-stranded molecule is prevented, but complementary RNA can still hybridize to the bound DNA. This observation was used to set up a first quantitative assay to measure DNA-RNA binding (Gillespie and Spiegelman, 1965). Fundamentally important data were produced using this method even before cloning was invented. It was used, for example, to measure the change in the number of copies of a gene during gene amplification in Drosophila melanogaster (Ritossa et al., 1971).

Once it became possible to clone DNA fragments, the technique was used to identify specific sequences. In Southern blotting, DNA fragments that have been separated by gel electrophoresis are transferred to a nitrocellulose membrane. Radioactively labeled RNA fragments, frequently called probes, are then used to identify the DNA bands on the membrane, and thus in the gel, that contain the DNA fragments of interest (Southern, 1975).

The first technique that closely resembles modern-day microarrays is dot blotting. DNA from different clones is arrayed on a nitrocellulose filter, and hybridization with radioactively labeled RNA probes allows the identification of clones that include specific DNA fragments (Kafatos et al., 1979). Dot blotting is the basis for cDNA libraries on nitrocellulose membranes. The technique was subsequently automated and miniaturized, and it was even applied to very large genomic cDNA libraries (Lennon and Lehrach, 1991).

Two technical innovations had a crucial impact on the development of microarrays: the introduction of solid support materials and the in situ synthesis of oligonucleotides. The use of non-porous solid supports, such as glass, has made miniaturization and fluorescence-based (instead of radioactivity-based) detection possible. Protocols pioneered by Patrick Brown and colleagues (Schena et al., 1995) allow over 30,000 different cDNAs to be robotically spotted onto a conventional microscope slide. The spots are typically 100-300 µm in diameter and are spaced about the same distance apart. The resulting microarray is commonly called a "cDNA microarray".

The development of methods for high-density spatial synthesis of oligonucleotides is the basis for

the second type of microarray, the so-called "oligonucleotide microarray", that is in use today.

Steve Fodor and colleagues (Fodor et al., 1991) have adapted photolithographic masking techniques

used in semiconductor manufacturing to produce microarrays that have a dense lawn of

oligonucleotides on their surface. The microarray is divided into distinct areas; each of these has

several hundred thousand oligonucleotide molecules attached to it. An array can be subdivided into

up to 40,000 such areas, each presenting an oligonucleotide complementary to a specific fragment of an expressed sequence tag. The oligonucleotides typically consist of 20 to 25 nucleotides.

The nucleic acids (cDNA, oligonucleotides) that are attached to the surface of the array are commonly called the "probes". The material that is used for hybridization is called the "sample" or the "target" material. The following two sections present the cDNA microarray and the oligonucleotide microarray in more detail. Figure 2-1 gives a preliminary overview of the similarities and differences of the two technologies. For both technologies three aspects are discussed: the fabrication of the microarray, the hybridization of target material to the chip and the extraction of data from the hybridized chip.


Figure 2-1 Schematic overview of probe array and target preparation for cDNA and oligonucleotide microarrays. The figure is taken from (Schulze and Downward, 2001).

a, cDNA microarrays. Array preparation: inserts from cDNA libraries are amplified using either vector-specific or gene-specific primers. PCR products are printed at specific sites on the chip surface using a robot. The DNA is covalently linked to the glass surface. Target preparation: RNA from two different tissues or cell populations is used to synthesize single-stranded cDNA in the presence of nucleotides labeled with two different dyes (here Cy3 and Cy5). Both samples are mixed in a small volume of hybridization buffer and hybridized to the array surface. During the hybridization the differently labeled cDNAs bind competitively to corresponding elements on the array surface. Fluorescence scanning of the array at two different wavelengths (corresponding to the dyes) provides the signal intensity for the two dyes. The ratio of the signal intensities is computed and represented in pseudo-color on the image of the scanned array.

b, oligonucleotide arrays. Array preparation: sequences of 4 to 20 short oligonucleotides (20mers-25mers) are chosen from the mRNA reference sequence of each gene. The probes on the array are synthesized in situ. Target preparation: polyA+ RNA from different tissues or cell populations is used to generate double-stranded cDNA carrying a transcriptional start site for T7 RNA polymerase. During in vitro transcription, biotin-labeled nucleotides are incorporated into the synthesized cRNA molecules. Each target sample is hybridized to a separate probe array. Target binding is detected by staining with a fluorescent dye coupled to streptavidin. Signal intensities of probe array element sets of the different arrays are normalized to target intensity. The normalized intensity values are then used to calculate the relative mRNA abundance of the genes represented on the array.

2.2 cDNA microarrays


The production of cDNA microarrays requires less technical expertise than the production of oligonucleotide microarrays, and the technological know-how can be obtained from public sources. The laboratory of Patrick Brown, one of the pioneers in the field, provides a recipe for building a robot for printing microarrays at http://cmgm.stanford.edu/pbrown/mguide/. In contrast to this publicly available technology, the dominant oligonucleotide microarray technology is proprietary; Affymetrix sells the chips as commercial products. The "standard" version of a cDNA microarray is described here, but many variations of this basic concept exist.

2.2.1 Fabrication

The first step in the production of a microarray is the selection of an appropriate set of probes to be printed onto the chip surface. This step is heavily influenced by the question the experimenter wants to answer with the chip experiment. Regardless of the source of the cDNA fragments that represent the genes of interest, and of the means by which they are obtained, they are amplified by PCR. The PCR products are typically partially purified by precipitation, gel filtration, or both, to remove salts, detergents, PCR primers and proteins present in the reaction mixture.

A robot is then used to print the array of probes onto the chip surface. The robot collects several samples to be spotted from a microtitre plate, each in a separate pen. The samples are spotted in a serial process onto several chips. The pens are then cleaned and a new set of samples is collected to start the next round of spotting. The design of the pens used for spotting has a large impact on the quality of a chip. The first spotting robots relied on contact printing using a device not unlike a fountain pen. Many variations of this original design are now available. Non-contact printing methods using either piezoelectric or ink-jet devices are also in use.

Most chips are glass based. Ordinary microscope slides are commonly used as they have a low

inherent fluorescence. The glass may be coated with poly-lysine, amino silanes or amino-reactive

silanes (Schena et al., 1995), which enhance both the hydrophobicity of the slide and the adherence

of the deposited DNA. The spots can also be printed on nylon membranes.

The density of spots that can be printed onto the chip surface is strongly influenced by the spread of the droplet. This in turn depends mostly on the probe volume that is spotted. Typically a few nanoliters per probe are spotted. The concentration of the PCR product in this volume is typically 100-500 µg/ml. These spots have a diameter of around 100 to 300 µm and are spaced about the same distance apart. Up to 30,000 spots may be printed onto a conventional microscope slide.
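The spot count follows from simple grid arithmetic. The sketch below assumes a square grid whose center-to-center pitch equals the spot diameter plus the gap between spots. It also assumes a printable area of roughly 20 × 60 mm on a standard 25 × 75 mm slide. Both the area and the function name are illustrative assumptions, not values taken from the cited protocols.

```python
def spots_per_slide(area_width_um: int, area_height_um: int,
                    spot_diameter_um: int, gap_um: int) -> int:
    """Number of spots that fit on a square grid inside a rectangular printable area."""
    pitch_um = spot_diameter_um + gap_um        # center-to-center distance
    columns = area_width_um // pitch_um         # whole spots per row
    rows = area_height_um // pitch_um           # whole rows
    return columns * rows

# Assumed printable area of 20 mm x 60 mm (hypothetical), all lengths in micrometres.
print(spots_per_slide(20_000, 60_000, 100, 100))  # 30000
print(spots_per_slide(20_000, 60_000, 300, 300))  # 3300
```

With the smallest spots the count reaches the "up to 30,000" quoted above. Density scales with the inverse square of the pitch, so tripling the spot size and spacing cuts the count roughly ninefold.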

The probe DNA is cross-linked to the chip surface using either ultraviolet irradiation or chemical linkers. A certain percentage is then rendered single-stranded by treatment with alkali or heat. It is important to realize that the DNA is bound to the chip surface in an ill-defined state. It may be double-stranded, cross-linked within a strand, or linked to the chip surface several times. This means that DNA bound to a chip is not as well suited for hybridization as single-stranded DNA in solution.

2.2.2 Sample preparation and hybridization

A labeled representation of the cellular mRNA pool is the target material for chip hybridization.

This can be obtained in different ways. The entire mRNA pool, or a part of it that is selected by

oligo-dT primers, is frequently used.

The grid of spots produced by the robot is of limited accuracy. The spots vary slightly in form, size and position, as well as in the amount of DNA they contain. This technical limitation makes it very difficult to compare the signal strength from two different cDNA microarrays in a precise and quantitative way (Schulze and Downward, 2001). This difficulty is circumvented by a hybridization approach that simultaneously presents a treated sample and a control sample to the chip. The mRNA of a treated sample and of a control sample are prepared and reverse transcribed using differently labeled nucleotides. The most commonly used nucleotides are Cy3-dUTP and Cy5-dUTP. These have high incorporation efficiency with reverse transcriptase and give fluorescence signals that are widely separated in the excitation and emission spectra. Reverse transcription of the treated sample mRNA and the control sample mRNA is done separately, but the labeled cDNA solutions are mixed before hybridization to the chip. Hybridization is carried out under conditions of a large excess of immobilized probes relative to the labeled targets. The kinetics of hybridization is therefore assumed to be pseudo-first order. Competition between the control sample and the treated sample is insignificant under these conditions. The hybridization time is chosen so that virtually all target DNA will bind to the chip.
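Under pseudo-first-order kinetics the fraction of a target species that has hybridized after time t is 1 − exp(−k·t). Here k is an observed rate constant that already absorbs the (approximately constant) probe concentration. The sketch below uses this textbook relation to estimate how long hybridization must run for a chosen fraction to bind. The rate constant is a hypothetical value, not a measured one.

```python
import math

def fraction_hybridized(k_per_hour: float, t_hours: float) -> float:
    """Fraction of target bound after t hours under pseudo-first-order kinetics."""
    return 1.0 - math.exp(-k_per_hour * t_hours)

def time_to_fraction(k_per_hour: float, fraction: float) -> float:
    """Hybridization time needed for the given fraction of target to bind."""
    return -math.log(1.0 - fraction) / k_per_hour

k = 0.5  # hypothetical observed rate constant in 1/h
print(f"{time_to_fraction(k, 0.99):.1f} h for 99% binding")
print(f"{fraction_hybridized(k, 15):.4f} bound after 15 h")
```

The bound fraction approaches 1 only asymptotically. In practice, "virtually all" therefore means choosing t several multiples of 1/k.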

The purity of the RNA is important for the measurement quality of fluorescence detection. Cellular

protein, lipid and carbohydrate may mediate significant non-specific binding of fluorescently

labeled cDNAs to slide surfaces.

To produce fluorescence of sufficient strength, a certain amount of target mRNA is needed. Most protocols require between 25 and 100 µg of mRNA. This amount of mRNA is often not available; therefore protocols have been developed that amplify the target mRNA by in vitro transcription (Phillips and Eberwine, 1996). For these protocols it is crucial to make sure that no bias is introduced by the amplification process. If in vitro transcription is combined with cDNA synthesis, amplification can be further enhanced (Luo et al., 1999). Brady developed a protocol that combines in vitro transcription and cDNA amplification, and makes it possible to profile the transcripts of a single cell (Brady, 2000).

Radioactively labeled target DNA has a lower detection limit than fluorescently labeled target DNA. A frequently used protocol was developed in (Eickhoff et al., 2000), where [α-³³P]dCTP is used as the labeled nucleotide. In contrast to the fluorescence labeling technique, a competitive hybridization approach is not possible if radioactively labeled target DNA is used.

2.2.3 Data extraction and image analysis

The intensity of the signal (fluorescence or radioactivity) given by a spot on the chip is directly proportional to the amount of target bound to the chip. Thus the intensity of the signal of a spot can be used as a direct measure of the amount of target DNA. Figure 2-2 shows the evaluation steps of a cDNA microarray experiment using fluorescence labeling. The spots on the chip are excited at different wavelengths using two separate lasers, and the resulting fluorescence of the dyes is measured. The scanning device typically does this simultaneously for both dyes. During this process every spot is digitized and represented by a certain number of pixels, each pixel representing the fluorescence of one dye. The excitation and fluorescence scanning step produces two digitized grayscale images, one for each dye. These images are then pseudo-colored (e.g., one red and the other green) and merged such that the resulting color represents the ratio of the intensities of the two signals. This pseudo-colored ratio representation is used for further data analysis. How this is done in detail may vary between different cDNA array systems.
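The ratio representation can be sketched numerically. The example below computes, per spot, the log2 ratio of the two channel intensities (here called red and green, following the pseudo-coloring convention). It then maps that ratio to a simple display color. The log transform, the floor that guards against zero intensities, and the ratio-only color scale are illustrative choices, not the procedure of any particular scanner software. Unlike an overlay of the two pseudo-colored images, this scale ignores absolute brightness.

```python
import math

def log2_ratio(red: float, green: float, floor: float = 1.0) -> float:
    """log2(red / green) for one spot; intensities below `floor` are clipped to it."""
    return math.log2(max(red, floor) / max(green, floor))

def pseudo_color(ratio: float, saturation: float = 2.0) -> tuple:
    """Map a log2 ratio to an (R, G, B) triple: red if > 0, green if < 0, black if 0."""
    level = min(abs(ratio) / saturation, 1.0)  # clip at +/- saturation
    value = round(255 * level)
    return (value, 0, 0) if ratio > 0 else (0, value, 0)

# Hypothetical background-corrected (red, green) intensities per spot.
spots = {"geneA": (4000.0, 1000.0), "geneB": (500.0, 500.0), "geneC": (250.0, 1000.0)}
for name, (red, green) in spots.items():
    r = log2_ratio(red, green)
    print(name, r, pseudo_color(r))
```

On the log scale, a four-fold induction and a four-fold repression are symmetric (+2 and −2). That symmetry is one common reason for working with log ratios rather than raw ratios.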

If radioactive labeling is used, this procedure is slightly different. After hybridization the chip is exposed to a detector film that is darkened according to the radiation emitted by the labeled targets. The film can be digitized or used directly for image analysis. Details can be found, for example, in (Eickhoff et al., 2000).

As soon as several experiments need to be compared, the result of each microarray experiment must be normalized. The normalization steps try to compensate for local and global intensity differences. The exact way this is done is highly variable. Brachat and coworkers used Incyte microarray technology in their apoptosis study, applying the simultaneous hybridization and fluorescence labeling approach to the control sample and the treated sample (Brachat et al., 2000). The area surrounding a single spot was scanned to determine a value for the local background. Every intensity value was corrected by subtracting the local background value, and a global intensity correction was also applied. The intensity values for each dye were summed over the whole chip, and the ratio of the overall sums was used to normalize the intensity values for the different dyes. This approach assumes that the overall signal intensity is the same for both dyes. That is equivalent to assuming that there are only insignificant differences between the treated and the control sample.
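The two correction steps described for this study can be sketched as follows, assuming per-spot foreground and local-background values for both dyes. The code subtracts the local background. It then rescales the Cy5 channel so that both channel totals are equal, which encodes exactly the equal-overall-intensity assumption stated above. Clipping negative corrected values to zero is an added assumption, and the `Spot` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    # Hypothetical record: foreground and local background for each dye.
    cy3: float
    cy5: float
    cy3_bg: float
    cy5_bg: float

def normalize(spots):
    """Subtract local background per channel, then scale Cy5 so both channel sums match."""
    cy3 = [max(s.cy3 - s.cy3_bg, 0.0) for s in spots]  # local background correction
    cy5 = [max(s.cy5 - s.cy5_bg, 0.0) for s in spots]
    factor = sum(cy3) / sum(cy5)  # global correction from the ratio of overall sums
    return cy3, [v * factor for v in cy5]

spots = [Spot(1100, 2100, 100, 100), Spot(600, 900, 100, 100), Spot(300, 700, 100, 100)]
cy3, cy5 = normalize(spots)
print(cy5)                                           # rescaled Cy5 values
print([round(a / b, 3) for a, b in zip(cy3, cy5)])   # per-spot Cy3/Cy5 ratios
```

If most genes really are unchanged, the normalized ratios cluster around 1. If the treatment shifts the expression of many genes in one direction, this global correction hides part of the real effect.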

Figure 2-2 Data extraction from a cDNA array experiment using fluorescence labeling. The figure is taken from (Duggan et al., 1999). The dyes are excited at different wavelengths and the resulting fluorescence is measured. The intensity for every spot is stored in a grayscale image corresponding to each dye. These images are then pseudo-colored by a computer program. Typically one dye is pseudo-colored red, the other one green. The two images are then merged into a single image that is used for further analysis.

Other experimenters use a specialized microarray design to be able to measure background intensities and intensities resulting from target solutions that contain a "constant" amount of mRNA. The array can be subdivided into blocks, each of which holds spots that do not contain probes (empty spots), spots that are used to measure the "constant" signal, and spots that are used to measure the signals corresponding to the genes the experimenter wants to monitor. Such a chip design can be found in (Herwig et al., 2001). As the chip is designed to handle normalization issues, normalization can be done more effectively than in the case of the Incyte technology. The cited experiment used cDNA arrays spotted onto a nylon membrane, together with radioactively labeled nucleotides. This introduces difficulties in data acquisition and normalization. If the spots are arrayed on a membrane, the flexibility of the material induces slight non-linear shifts of the spotted grid. This makes it difficult to develop a highly accurate grid that specifies the spot locations. The use of radioactively labeled target material complicates the situation further. The image produced by radioactive exposure is composed of sections at many focal planes, each with a smooth transition from highest signal intensity to background intensity. This makes it very difficult to reconstruct the spot precisely (Duggan et al., 1999).
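How such a block layout could support normalization can be sketched as follows. The sketch assumes that, within each block, the mean of the empty spots estimates the local background. The background-corrected mean of the "constant" spots then serves as a per-block reference, and dividing each gene signal by that reference makes blocks (and chips) comparable. This is a generic illustration of the idea, not the procedure of Herwig et al. (2001). The data layout and the function name are hypothetical.

```python
from statistics import mean

def normalize_block(block):
    """Normalize gene signals of one block against its empty and constant spots.

    block: dict with lists 'empty' and 'constant' and a dict 'genes' of raw intensities.
    Returns background-corrected gene signals relative to the constant spots.
    """
    background = mean(block["empty"])                 # spots without probes
    reference = mean(block["constant"]) - background  # signal of constant-amount target
    if reference <= 0:
        raise ValueError("constant spots are not above background")
    return {gene: (value - background) / reference
            for gene, value in block["genes"].items()}

block = {"empty": [48, 52, 50],
         "constant": [1040, 1060, 1050],
         "genes": {"g1": 2050, "g2": 550}}
print(normalize_block(block))  # {'g1': 2.0, 'g2': 0.5}
```

Because every block carries its own empty and constant spots, local intensity differences across the membrane are corrected block by block rather than by a single chip-wide factor.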

2.3 Oligonucleotide arrays

While cDNA arrays have cellular cDNA linked to their surface, oligonucleotide arrays have synthetic oligonucleotides, typically 20-25mers, attached to the chip surface. The oligonucleotides can be synthesized first and then linked to defined areas of the chip surface, or they can be synthesized in situ directly on the chip. In situ synthesis is at present the most commonly used approach. The oligonucleotide arrays produced by Affymetrix (http://www.affymetrix.com) are currently the most widely used oligonucleotide arrays. The oligonucleotide technology as now used by Affymetrix is based on work presented in (Fodor et al., 1991; Southern et al., 1992; Fodor et al., 1993; Pease et al., 1994; Fodor, 1997; Wodicka et al., 1997).

2.3.1 Fabrication

The fabrication of an oligonucleotide array starts the same way as the fabrication of a cDNA array.

At the onset a set of genomic sequences is selected to serve as the probes of the chip. Each of these

sequences is represented by what Affymetrix calls a "probe set" on the chip. The different "probe

cells" of a probe set contain oligonucleotides that are complementary to certain parts of a gene's

mRNA reference sequence. The basic design of a probe set is shown in Figure 2-3.

Figure 2-3 Probe set for the detection of an mRNA reference sequence on an Affymetrix oligonucleotide array. The figure is taken from (Lipshutz et al., 1999). Several regions (typically 20, but sometimes as few as four) are selected from the reference mRNA to be represented by a probe pair. A probe pair consists of two probe cells: (i) the perfect match probe cell, which displays oligonucleotides that perfectly match the selected region, and (ii) the mismatch probe cell, which has a point mutation with respect to the perfect match.

The probe set is a set of multiple independent detectors whose combined power is used to detect a transcript in the target solution. The reference mRNA is represented by a number of oligonucleotides (at least four, typically 20) that do not overlap, or overlap only slightly if necessary. An additional level of redundancy is introduced by building the probe set from pairs of probe cells. A probe pair consists of a perfect match probe cell and a mismatch probe cell. The perfect match cell has a dense lawn of identical oligonucleotides attached to its surface that perfectly match a region of the reference mRNA sequence. These oligonucleotides should hybridize to the target mRNA with high affinity. The mismatch probe cell presents oligonucleotides that carry a single point mutation with respect to the oligonucleotides of the perfect match probe. The mismatch probe cell should therefore produce a weaker hybridization signal than the perfect match probe cell. One should be aware that the mismatch probe cells interact not only with the unspecific background but also with the target sequence. This is intuitive and has been shown experimentally (Chudin et al., 2001). Nevertheless, the perfect match/mismatch chip design provides a framework for statistical analysis. That analysis decides whether or not a signal for a specific target sequence can be detected, and to what extent the target sequence was present in the hybridization sample.
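As a minimal illustration of such a statistical decision, the sketch below applies a one-sided sign test to the probe pairs of one probe set. Under the null hypothesis that the target is absent, a perfect match cell is assumed to be equally likely to be brighter or dimmer than its mismatch partner. The sign test is a deliberately simple stand-in chosen for this example. It is not the detection algorithm implemented in the Affymetrix software, and the intensities are invented.

```python
from math import comb

def sign_test_detection(pm, mm):
    """One-sided sign test on the probe pairs of a probe set.

    pm, mm: background-corrected perfect match and mismatch intensities, pair by pair.
    Returns (pairs with PM > MM, informative pairs, p-value, mean PM - MM difference).
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equally many, and at least one, PM and MM values")
    diffs = [p - m for p, m in zip(pm, mm)]
    informative = [d for d in diffs if d != 0]  # ties carry no sign information
    n = len(informative)
    k = sum(1 for d in informative if d > 0)
    # P(X >= k) for X ~ Binomial(n, 0.5): chance of this many positive pairs by luck.
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_value, sum(diffs) / len(diffs)

pm = [820, 640, 910, 300, 770, 560, 690, 880]  # hypothetical intensities
mm = [310, 420, 350, 320, 400, 390, 300, 410]
print(sign_test_detection(pm, mm))  # (7, 8, 0.03515625, 333.75)
```

With seven of eight pairs positive, the one-sided p-value is 9/256 ≈ 0.035. Because mismatch cells also pick up target signal, as noted above, real PM − MM differences can be smaller than the true specific signal, which weakens the evidence.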

The regions of the target sequence to be represented by probes are selected according to several criteria. The number of nucleotides used for a probe has a crucial impact on its sensitivity and on its ability to discriminate between similar sequences. On the one hand, the probe must be of a certain length to enable the formation of a stable duplex. On the other hand, the number of mutations needed to destabilize the mismatch-probe-target duplex grows as the sequence gets longer. In addition, the hybridization behavior becomes more complex with increasing probe length. Affymetrix uses a probe length of 25 nucleotides. The probes are chosen to be as unique as possible with respect to gene family members and other genes. Probes that are not complementary to abundant RNAs (e.g., rRNAs, tRNAs, alu-like sequences and actin mRNA) are preferred (Lipshutz et al., 1999). Furthermore, the nucleotide composition of the probe is also considered. Short oligonucleotides may be extremely biased in their purine to pyrimidine ratio. This in turn has a strong effect on the hybridization properties of the oligonucleotide (Lockhart et al., 1996; Wodicka et al., 1997), as A:T pairs are less stable than G:C pairs. Clearly, an optimal solution with respect to all these considerations cannot be found for every probe. The redundant approach of a probe set consisting of a number of probe pairs therefore certainly has its merits.
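The composition criteria can be checked with a few lines of code. The sketch below reports the GC fraction of a candidate probe, a proxy for duplex stability since G:C pairs are more stable than A:T pairs. It also reports the purine fraction, and it keeps only candidates whose GC fraction lies inside a window. The window bounds and the example sequences are hypothetical, not Affymetrix selection rules.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def purine_fraction(seq: str) -> float:
    """Fraction of purines (A, G) in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq)

def acceptable(seq: str, gc_min: float = 0.35, gc_max: float = 0.65) -> bool:
    """Keep probes whose GC fraction lies inside an assumed window."""
    return gc_min <= gc_fraction(seq) <= gc_max

candidates = ["ATGCGTACGTTAGCCGATCGATGCA",   # balanced composition
              "GGGCCGCGGCCCGGGCGCCGGCGCC",   # GC-rich: very stable duplex
              "ATATTTAAATTATAAATTTATATAA"]   # AT-rich: weak duplex
for probe in candidates:
    print(probe, round(gc_fraction(probe), 2), round(purine_fraction(probe), 2),
          acceptable(probe))
```

A filter like this only removes extreme candidates. Uniqueness with respect to other genes, as described above, has to be checked separately against the sequence databases.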

The oligonucleotide arrays produced by Affymetrix use glass as the hybridization surface. The

nucleotides are not bound to the surface directly but to a linker that is coupled to the surface. For

example oligo-ethylene glycol can be used as linker (Maskos and Southern, 1992). Figure 2-4

shows oligonucleotides that are tethered to the chip surface with linkers of different length.

Figure 2-4 Tethered oligonucleotides with linkers of different length. The density shown reflects the situation on a chip. The figure is taken from (Southern et al., 1999).

The length of the linker helps to deal with steric effects that can hamper duplex formation during hybridization of the target sequence to the probe (Shchepinov et al., 1997). Tethering one end of an oligonucleotide to a surface is expected to affect formation of the duplex with a target sequence in solution, because the bases nearer to the surface are less accessible than more remote bases. In addition, there is steric hindrance between the oligonucleotides, as they are extremely densely packed on the chip surface. Duplex yield can be optimized by choosing a linker of appropriate length and an appropriate probe density.

Affymetrix uses a light-directed in situ synthesis for the construction of the probe arrays, as explained in Figure 2-5. Other possible approaches have been tested but are not widely used, such as ink-jet delivery of nucleotide precursors to the surface (Blanchard et al., 1996) or confining chemicals physically by masks or physical barriers (Maskos and Southern, 1993). The glass chip with the synthetic linker coupled to it is the starting point for the in situ light-directed synthesis of the oligonucleotides. The hydroxyl group of the linker is protected by a photolabile group that can be removed by light. A photolithographic mask is used to direct light to selected areas of the chip, resulting in the activation of the linkers in these areas. Protected nucleotides then couple to the activated linkers. In the next step other areas on the chip are activated, and more protected nucleotides couple to the newly activated sites. By repetition of this process arbitrary DNA sequences can be synthesized.

Figure 2-5 Light-directed oligonucleotide synthesis. The figure is taken from (Lipshutz et al., 1999). a, A linker molecule is covalently bound to a solid support. The linker molecule is terminated by a photolabile protecting group. Light is directed through a mask to activate selected sites, and protected nucleotides couple to the activated sites. The process is repeated, activating different sets of sites and coupling different bases, thus allowing arbitrary DNA probes to be constructed at each site. b, Schematic representation of the lamp, mask and array.

Highly efficient strategies can be used to synthesize the DNA sequences with a minimal number of coupling steps (Fodor et al., 1991). For example, the complete set of 4^N polydeoxynucleotides of length N can be synthesized in only 4N cycles.
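The 4N bound can be checked with a small simulation. The sketch below models each coupling cycle as one mask plus one base. Cycling through A, C, G and T once per position, each mask illuminates exactly the sites whose next required base is the current one. Cycles with no site to activate are skipped, so arbitrary probe sets may need fewer than 4N cycles. The function name and the fixed base order are illustrative assumptions.

```python
from itertools import product

def light_directed_synthesis(targets, base_order="ACGT"):
    """Simulate masked, light-directed synthesis of the given probe sequences.

    Returns the sequences grown at each site and the list of masks used; a mask is
    the set of site indices illuminated (deprotected) in one coupling cycle.
    """
    length = max(len(t) for t in targets)
    grown = [""] * len(targets)
    masks = []
    for position in range(length):
        for base in base_order:
            # Illuminate only the sites whose next required base is `base`.
            mask = {i for i, t in enumerate(targets)
                    if position < len(t) and t[position] == base}
            if not mask:
                continue  # nothing to couple: skip this cycle
            for i in mask:
                grown[i] += base  # protected nucleotide couples at activated sites
            masks.append(mask)
    return grown, masks

N = 3
all_probes = ["".join(p) for p in product("ACGT", repeat=N)]
grown, masks = light_directed_synthesis(all_probes)
print(len(all_probes), len(masks))  # 64 12
print(grown == all_probes)          # True
```

For the complete set, every base is needed at every position, so no cycle can be skipped and exactly 4N cycles are used.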

In practice several chips are produced in parallel. Multiple arrays are synthesized simultaneously on

a large glass wafer. The wafers are then diced, and individual probe arrays are packaged in

injection-molded plastic cartridges that serve as hybridization chambers.

2.3.2 Sample preparation and hybridization

The target mRNA is extracted from tissues or cell populations. Routinely, polyA RNA is used to generate double-stranded cDNA carrying a transcription start site for T7 RNA polymerase. The double-stranded cDNA is then transcribed in vitro by T7 RNA polymerase to produce the target cRNA that is actually hybridized to the chip. In vitro transcription is done in the presence of biotin-labeled nucleotides. The target RNA therefore carries biotin labels, which are detected after hybridization by staining with a fluorescent dye coupled to streptavidin. As the target RNA is produced by in vitro transcription, the standard procedure of sample preparation for Affymetrix chips always includes an amplification step. The standard procedure requires at least 5 µg RNA as starting material. If this amount of RNA cannot be isolated from the cell population or the tissue, protocols have been developed to amplify the starting material. Several of these protocols have been established (Eberwine et al., 1992; Ohyama et al., 2000; Baugh et al., 2001; Luzzi et al., 2001). It is crucial to make sure that no bias is introduced by the amplification. One also has to establish whether quantitative measurements from amplified targets can be compared to measurements from unamplified targets. The protocol developed in (Eberwine et al., 1992), for example, does not allow such comparisons.

As in the cDNA array situation a small amount of target is hybridized to a large excess of

immobilized probes for typically about 15 hours, ensuring that virtually all target sequences bind to

the chip. Purity of the target solution is crucial to avoid artificial fluorescence signals.

In contrast to the cDNA array approach, the target solutions of a comparative experiment are hybridized to two different oligonucleotide arrays (see Figure 2-1). Thus in the evaluation of this experiment the results from two different hybridization procedures are compared.

2.3.3 Data extraction and image analysis

Data extraction and image analysis are done in a manner very similar to the approach used in a cDNA microarray setting. A laser excites the fluorescent label bound to the probe cells, and the resulting fluorescence is scanned, providing a direct measure of the amount of target RNA bound to the chip. As only a single label is used, excitation and fluorescence scanning can be done with slightly less effort. Just as for cDNA microarrays, a digitized image is produced that represents all probe cells on the microarray. Comparison of the intensity values produced by different arrays is necessary even for the simplest comparative experiment. Normalization of the data obtained by the fluorescence measurement is therefore an important issue.

Affymetrix computes the intensity of a probe cell as follows. A standard probe cell is represented by 8 × 8 pixels. The bordering pixels are excluded and the intensity distribution of the remaining pixels is computed. The value at the 75th percentile of this distribution is used as the intensity of the probe cell. This value is corrected by subtracting a background noise term. To compute the background, the image is divided into a number of sectors (by default 4 horizontal by 4 vertical, giving 16 sectors). For every sector the lowest two percent of the probe cell intensity values are averaged (for a standard chip this amounts to 430 background cells). This average is used as the background for the corresponding sector; it is, basically, subtracted from all probe cell intensity values in this sector (see 7.1.11.1 for more details).
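The per-cell intensity and the sector background described above can be sketched as follows. The sketch uses 8 × 8 pixels per cell with the border dropped, the 75th percentile, 4 × 4 sectors, and the mean of the lowest two percent subtracted. The percentile interpolation method and the plain per-sector subtraction are assumptions of this sketch. The text notes that the software's actual subtraction involves further details, which are omitted here, and the function names are hypothetical.

```python
import statistics

def cell_intensity(pixels):
    """Intensity of one probe cell from its 8x8 block of pixel values.

    Bordering pixels are dropped; the 75th percentile of the inner 6x6 pixels is
    returned (the 'inclusive' interpolation method is an assumption).
    """
    inner = [value for row in pixels[1:-1] for value in row[1:-1]]
    return statistics.quantiles(inner, n=4, method="inclusive")[2]

def subtract_sector_background(cells, sectors=4, lowest=0.02):
    """Subtract a per-sector background from a 2D grid of probe-cell intensities.

    The grid is split into sectors x sectors blocks; in each block the mean of the
    lowest `lowest` fraction of cell intensities is that block's background.
    """
    n_rows, n_cols = len(cells), len(cells[0])
    corrected = [row[:] for row in cells]
    for si in range(sectors):
        rows = range(si * n_rows // sectors, (si + 1) * n_rows // sectors)
        for sj in range(sectors):
            cols = range(sj * n_cols // sectors, (sj + 1) * n_cols // sectors)
            values = sorted(cells[r][c] for r in rows for c in cols)
            k = max(1, int(len(values) * lowest))  # use at least one cell per sector
            background = sum(values[:k]) / k
            for r in rows:
                for c in cols:
                    corrected[r][c] = cells[r][c] - background
    return corrected

# Example cell: bright border pixels (excluded) around inner values 1..36.
values = iter(range(1, 37))
pixels = ([[1000] * 8]
          + [[1000] + [next(values) for _ in range(6)] + [1000] for _ in range(6)]
          + [[1000] * 8])
print(cell_intensity(pixels))  # 27.25
```

Using a high percentile of the inner pixels rather than their mean makes the cell value robust against dim pixels at a cell's edge, while dropping the border avoids mixing in signal from neighboring cells.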

Given the corrected intensity values for all probe cells on the array, metrics can be defined that use

the perfect match and the mismatch probe cells of a probe set to give qualitative and quantitative

estimates about the abundance of a target sequence. (See 7.11.1 for some details about metrics

defined by Affymetrix.) The Affymetrix software provides a quantitative metric that is directly

related to the abundance of a target sequence. To be able to compare the metrics for a gene given by different arrays, the values are scaled to a target intensity. This step is intended to deal with global intensity differences between arrays. A target intensity is defined, and then the average overall intensity of all probe cells (after background subtraction) is computed. The factor needed to scale the overall intensity of an array to the target intensity, namely the ratio of the target intensity to this average, is thus easily obtained.
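Taking the description above literally, the scaling step can be sketched as follows. The target intensity of 500 and the intensity values are hypothetical, and the function name is illustrative.

```python
def scale_to_target(intensities, target):
    """Scale background-corrected intensities so that their average equals `target`."""
    average = sum(intensities) / len(intensities)
    factor = target / average
    return factor, [value * factor for value in intensities]

array_1 = [120.0, 340.0, 80.0, 260.0]  # hypothetical background-corrected intensities
array_2 = [60.0, 170.0, 40.0, 130.0]   # same sample on a dimmer array
f1, scaled_1 = scale_to_target(array_1, 500.0)
f2, scaled_2 = scale_to_target(array_2, 500.0)
print(f1, f2)                # 2.5 5.0
print(scaled_1 == scaled_2)  # True
```

A purely global brightness difference between two arrays disappears after scaling. Any difference that remains is either biological or a local artifact that a single factor cannot correct.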

2.4 Other technologies

Besides the two basic types of microarrays described above other technologies have been used to

monitor the state of different components of a biological system in parallel. A few examples are

presented here.

2.4.1 Microarray variants


Rosetta Inpharmatics, a company owned by Merck & Co, has developed a proprietary microarray technology. These microarrays are also produced by in situ synthesis of oligonucleotides, but use an ink-jet technology to deliver the nucleotides to the area of synthesis (Blanchard et al., 1996). Several experiments have made use of the Rosetta technology; two examples are (Hughes et al., 2000; Hughes et al., 2001). A list of publications related to this technology can be found at http://www.rii.com/publications/default.htm.

Microarrays that have long oligonucleotides (50-100mers) bound to their surface are also in use

(Kane et al., 2000). The oligonucleotides for these arrays are pre-synthesized and then coupled to

defined areas of the chip. Arrays of optical fibers, each tipped with a microsphere carrying

oligonucleotide probes, have also been developed (Steemers et al., 2000).

2.4.2 RT-PCR

RT-PCR is the acronym for Reverse Transcriptase-Polymerase Chain Reaction. Retroviruses encode a reverse transcriptase that transcribes their single-stranded genomic RNA into double-stranded DNA, which may integrate into the host genome. Important early work on retroviruses and transposons includes (Hu and Tremin, 1990; Luan and Korman, 1993; Lauerman and Boeke, 1994). To measure the expression level of a gene, reverse transcriptase is used to transcribe the cellular mRNA into double-stranded DNA that is then amplified via the polymerase chain reaction. RT-PCR protocols achieve a rather high degree of accuracy in amplification. However, the high error rate of reverse transcriptase (around 2 × 10⁻⁴) and the different amplification rates for different sequences make exact quantitative measurements by PCR difficult. A semi-automated PCR approach has been used to analyze the expression of 112 genes at nine different time points during rat cervical spinal cord development (Wen et al., 1998). RNA was extracted from spinal cord tissue; for every gene of interest a specific primer was used for the expression measurement. This approach is not inherently parallel and becomes infeasible if the expression of thousands of genes is to be measured.
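Small differences in amplification rate matter because of the standard exponential PCR model. In that model, N0 template molecules amplified with per-cycle efficiency E (0 ≤ E ≤ 1) yield N0·(1 + E)^c molecules after c cycles. The sketch below compares two hypothetical efficiencies. The values are illustrative, not measurements from the cited study, and the model ignores the plateau phase of real reactions.

```python
def pcr_copies(n0: float, efficiency: float, cycles: int) -> float:
    """Copies after `cycles` of exponential PCR with per-cycle efficiency in [0, 1]."""
    return n0 * (1.0 + efficiency) ** cycles

cycles = 30
ideal = pcr_copies(1000, 1.0, cycles)   # perfect doubling in every cycle
slower = pcr_copies(1000, 0.9, cycles)  # hypothetical 90% efficiency
print(f"{slower / ideal:.2f}")  # 0.21
```

A 10% drop in efficiency thus shrinks the yield roughly five-fold after 30 cycles. Equal starting amounts of two sequences therefore need not give equal amounts of product.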

2.4.3 EST data

Other sources of information on mRNA expression levels are EST databases. EST stands for Expressed Sequence Tag and simply designates a cDNA complementary to part of an mRNA that is expressed in a cell or tissue type. The term EST first appeared in a paper by M. D. Adams (Adams et al., 1991). EST libraries are designed such that their composition should reflect the mRNA population of the biological sample.

A common approach is to reverse transcribe the total mRNA population and clone 3' fragments of the cDNA (200-300 bp) into Escherichia coli. Clones from the library are then picked at random and sequenced. The number of tags with a specific sequence can be used as an estimate of the abundance of the corresponding mRNA species in the cell.
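The abundance estimate described above amounts to counting. This is a minimal sketch. It assumes each randomly picked clone has already been reduced to a read or a cluster identifier, and that identical reads come from the same mRNA species. The gene names are placeholders.

```python
from collections import Counter


def relative_abundance(reads: list[str]) -> dict[str, float]:
    """Estimate relative mRNA abundance from randomly picked, sequenced
    clones: the share of reads that carry each tag."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical reads (already assigned to genes) from ten random clones.
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
print(relative_abundance(reads))  # {'geneA': 0.6, 'geneB': 0.3, 'geneC': 0.1}
```

With only ten clones the estimates are coarse. Large libraries are sequenced because a rare transcript needs many reads before it is seen at all.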

This technique, in essentially this form, was used to produce large databases of expressed sequence tags for different cell lines and tissues from several organisms. Kousaku Okubo and coworkers published the first large-scale application of this approach (Okubo et al., 1992).

Large amounts of EST data have been and are still being generated, mainly through the efforts of the IMAGE (Integrated Molecular Analysis of Genomes and their Expression) consortium (Lennon et al., 1996). The data are deposited in dbEST, a division of GenBank, in which an automated process called UniGene compares ESTs and assembles overlapping sequences into clusters. GenBank can be found at http://www.ncbi.nlm.nih.gov. In addition, a large number of ESTs identified by The Institute for Genomic Research (TIGR, http://www.tigr.org) are now publicly accessible. Several commercial vendors offer clone sets and EST data for purchase.

2 Experimental technology: Other technologies 23

Work published by Rob M. Ewing and coworkers gives a good example of how publicly available data can be used to gain insight into the regulatory networks of biological systems at the level of RNA expression (Ewing et al., 1999). EST data of Arabidopsis thaliana and Oryza sativa were scanned to identify genes with identical expression patterns in these plants. Although a large amount of EST data is available, these data present a static picture of cell lines and tissues rather than time courses.

2.4.4 SAGE

SAGE is a technique that has been used extensively to characterize the mRNA expression profile of a biological sample by short expressed sequence tags. SAGE is the acronym for Serial Analysis of Gene Expression (Velculescu et al., 1995). Figure 2-6 illustrates the SAGE process schematically.

Figure 2-6 Schematic illustration of SAGE. The figure is taken from (Stollberg et al., 2000). A, Poly-(A)-RNA is extracted and transcribed into double-stranded cDNA, primed by biotinylated oligo-(dT) (black circles), and digested with a type II restriction endonuclease. B, The 3'-most fragments are isolated by binding them to streptavidin beads (gray ellipses). C, The fragments are divided into separate fractions and ligated to different linkers (L1, L2). D, The isolated linker tags are blunt ended. E, The linker tags are ligated to linker-ditag-linker constructs and amplified by PCR. F, The ditags are isolated, ligated into concatemers, cloned and sequenced.

SAGE provides short tags, double-stranded nucleotide sequences of nine to ten base pairs, from a unique position within each mRNA species in an mRNA preparation. The tags are amplified by the polymerase chain reaction, concatenated and sequenced. The number of times a tag occurs gives a quantitative estimate of the abundance of the corresponding mRNA species. This approach allows mRNA expression to be identified without the sequences being known in advance. It also uses technology readily available to any lab, with the exception of the facilities needed for the extensive sequencing.
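Extracting SAGE tags in silico makes the "unique position" concrete. The sketch assumes the anchoring enzyme is NlaIII (recognition site CATG), as in the original protocol to our knowledge. It also assumes the tag is the ten bases immediately 3' of the 3'-most site on the sense strand. Real tags depend on the tagging enzyme used, and the sequences below are made up.

```python
from collections import Counter
from typing import Optional

ANCHOR_SITE = "CATG"  # assumption: NlaIII recognition site as anchoring enzyme
TAG_LENGTH = 10       # the text quotes tags of nine to ten base pairs


def sage_tag(cdna: str) -> Optional[str]:
    """Return the tag immediately 3' of the 3'-most anchoring site of a
    cDNA given 5'->3' on the sense strand, or None if there is no full tag."""
    pos = cdna.rfind(ANCHOR_SITE)
    if pos == -1:
        return None
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def tag_counts(cdnas: list[str]) -> Counter:
    """Count tags over many transcripts; the counts estimate abundance."""
    return Counter(tag for tag in map(sage_tag, cdnas) if tag is not None)


transcripts = [
    "AACATGTTTTCATGACGTACGTACGGG",  # 3'-most CATG yields ACGTACGTAC
    "CATGACGTACGTACTTTT",           # same tag, although a different transcript
    "GGGCATGTTT",                   # site too close to the 3' end
    "GGGGGGGG",                     # no anchoring site
]
print(tag_counts(transcripts))  # Counter({'ACGTACGTAC': 2})
```

Two different transcripts that share a tag are counted as one species here. This is the non-uniqueness problem discussed in the following paragraphs.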

SAGE has been used to characterize the transcriptome of yeast during the cell cycle (Velculescu et al., 1997), detecting a total of 4665 genes. It was also used to analyze 19 normal and diseased tissues (Velculescu et al., 1997; Velculescu et al., 1999). These studies found that as many as 43,500 genes can be expressed in a single human cell type. Other work that uses SAGE to analyze transcriptomes includes (Madden et al., 1997; Zhang et al., 1997; Chen et al., 1998).

However, difficulties arise in two situations: when the quantitative estimates that SAGE gives for different mRNA species are compared directly, and when genes are identified from the tags the method produces (Stollberg et al., 2000). Factors that bias the quantitative estimate for an mRNA species are sampling error, sequencing error, non-unique tag sequences and the inherent non-randomness of RNA and DNA sequences. A tag may fail to be mapped correctly to a gene for several reasons. These include the non-uniqueness of the short tags, sample contamination, differential RNA splicing, DNA polymorphism, incomplete sequence data and possibly others.
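Sampling error alone already limits the precision of low tag counts. Treat the sequenced tags as a simple random sample (an assumption). A tag seen k times among n tags then has relative frequency p = k/n and binomial standard error sqrt(p(1 - p)/n). The sketch below evaluates this for a hypothetical library of 50,000 tags.

```python
import math


def tag_frequency_with_se(count: int, total: int) -> tuple[float, float]:
    """Relative frequency of a tag and its binomial standard error,
    treating the `total` sequenced tags as a simple random sample."""
    p = count / total
    se = math.sqrt(p * (1.0 - p) / total)
    return p, se


for count in (2, 20, 200):
    p, se = tag_frequency_with_se(count, 50_000)
    print(f"{count:4d} tags: frequency {p:.1e}, relative error {se / p:.0%}")
```

The relative error falls roughly as 1/sqrt(k). A rare transcript seen twice is therefore known only to within about 70 percent.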

2.4.5 Proteomics approaches

Macromolecular interactions (protein/protein, protein/nucleic acid, etc.) are of major importance for the function of a biological system. The dynamics of a biological system can only be understood if quantitative and kinetic insight into its protein composition can be gained. The term 'proteomics' is used with different meanings in this context. Some scientists prefer a wider meaning (Pandey and Mann, 2000), in which proteomics designates any experimental technique that can produce quantitative and/or kinetic data about the protein composition of a biological system. We prefer to use the term in a more restricted sense: to designate experimental techniques that produce data about the protein expression profile (the proteome) of a biological system in a parallel manner.

Proteomics in this more restricted sense has traditionally been associated with separating a large number of proteins on two-dimensional polyacrylamide gels (Wilkins et al., 1996; Celis et al., 1996; Wilkins et al., 1997; Anderson and Anderson, 2001). The use of two-dimensional gel electrophoresis dates back to 1975 (O'Farrell, 1975). This technique is used to characterize the protein composition of a biological system. Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) separates proteins on a sheet of gel, first in one direction based on their isoelectric point and then in the other direction based on their molecular weight. After this process the gel displays a large number of protein spots, each consisting of proteins of a specific type. The intensity of a spot on the gel is a direct measure of the abundance of a protein. Every spot on the gel has to be identified separately, usually by mass spectrometry. Biological mass spectrometry has been in use since the 1990s. It has developed into a powerful analytical method that makes it possible to identify many of the spots on a 2D-PAGE gel rapidly.

However, the analysis of a proteome remains a difficult task. 2D-PAGE gels have traditionally been hard to reproduce because of their sensitivity to operating conditions and a host of possible artifacts. With increasing experience, standardized protocols and higher-accuracy techniques have alleviated these problems somewhat (Bjellqvist et al., 1982). In spite of these improvements, the resolution of a gel is limited to about 1000 proteins in a crude cell extract (Futcher et al., 1999; Gygi et al., 2000). This is nowhere near enough to characterize the proteome of most human cell types, as these express far more proteins. Given this limited resolution, it is difficult to detect low-abundance proteins in a whole protein complement. This is an important problem, because the dynamic range of protein expression is as high as 10⁶ (Pandey and Mann, 2000). Affinity-based purification strategies are usually used to enrich low-abundance proteins. Not all proteins can be separated equally well using 2D-PAGE; for example, hydrophobic and large proteins tend not to enter readily into the second dimension of the gel.

Once a two-dimensional gel has been run, its spots have to be identified by mass spectrometry. The 'peptide-mass' approach of Henzel and coworkers relies on digesting the proteins in the spots with a sequence-specific protease, for example trypsin (Henzel et al., 1993). The peptides are eluted from the gel and subjected to MALDI (Matrix Assisted Laser Desorption/Ionization) time-of-flight mass spectrometry. This relatively simple method distinguishes between peptides of different mass in the mixture. A peptide fingerprint is thus constructed for a spot on the 2D-PAGE gel, and this fingerprint is used to identify proteins in sequence databases (Shevchenko et al., 1996). The database sequence of the protein must be complete for the fingerprint to be unambiguous. Some advances have been made in automating the MALDI identification procedure (Jensen et al., 1997; Berndt et al., 1999).
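The peptide-mass idea can be sketched in a few lines. Each database protein is digested in silico, its peptide masses are computed, and the number of observed peaks they explain is counted. The sketch makes several assumptions:

- a simplified trypsin rule: cleave after K or R, not before P, with no missed cleavages
- monoisotopic residue masses
- singly protonated [M+H]+ ions
- a hypothetical 0.5 Da tolerance

Modifications and the scoring schemes of real search programs are ignored.

```python
# Monoisotopic amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free termini)
PROTON = 1.00728   # charge carrier of a singly protonated ion


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified
    trypsin rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_mass(peptide: str) -> float:
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def fingerprint_score(protein: str, observed: list[float], tol: float = 0.5) -> int:
    """Number of observed peaks matched by some theoretical peptide mass."""
    theoretical = [mh_mass(p) for p in tryptic_digest(protein)]
    return sum(1 for m in observed if any(abs(m - t) <= tol for t in theoretical))


def best_match(database: dict[str, str], observed: list[float]) -> str:
    """Name of the database entry whose digest explains most peaks."""
    return max(database, key=lambda name: fingerprint_score(database[name], observed))


database = {"protein_1": "AKPGRSEK", "protein_2": "GGGGK"}  # toy sequences
peaks = [mh_mass("SEK")]                                     # simulated spectrum
print(tryptic_digest("AKPGRSEK"))  # ['AKPGR', 'SEK']
print(best_match(database, peaks))  # protein_1
```

A protein that is missing from the database, or stored only partially, cannot be found this way. This is why complete database sequences matter.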

A second mass spectrometric approach, called 'tandem mass spectrometry', is used to characterize the peptide mixture from a 2D-PAGE gel spot. The peptides are ionized by electrospray ionization and sprayed into a tandem mass spectrometer. This device accomplishes two tasks, hence its name. It resolves the peptides in the mixture, isolating one species at a time, and it dissociates each isolated peptide into amino- or carboxy-terminal-containing fragments. The amino acid sequence of the analyzed peptide is thus provided. This additional information makes it easier to identify proteins, but the approach is technically more complex and less scalable than MALDI fingerprinting. Work on tandem mass spectrometry and on the identification of proteins from the resulting mass spectra includes (Mann and Wilm, 1994; Eng et al., 1994; Wilm and Mann, 1996; Wilm et al., 1996; Mann, 1996; Yates, 1998; Link et al., 1999; Yates, 2000).

Biological mass spectrometry is evolving rapidly. A promising new technology is the 'MALDI quadrupole time-of-flight' instrument (Shevchenko et al., 2000).

Another way to monitor the protein expression profile of a biological system is the protein chip approach. A variety of "bait" proteins, such as antibodies, can be immobilized in array format on specially treated surfaces. The surface is then probed with the sample of interest, and only proteins that bind to the immobilized antibodies remain bound to the chip (Lueking et al., 1999). This approach depends on specific antibodies, and a number of technical problems still need to be overcome (Pandey and Mann, 2000). Other protein chips immobilize peptides, protein fragments or whole proteins on the surface. One approach couples this technique to MALDI mass spectrometry to identify the protein bound to a specific "bait" (Nelson, 1997; Davies et al., 1999).

In summary, proteomics is at an even earlier stage than genomics. The tremendous complexity of the protein composition of a biological system makes it difficult to obtain data about it in a unified, parallel, high-throughput fashion. Nevertheless, techniques like 2D-PAGE yield valuable data, as the protein composition is of crucial importance for understanding a biological system.

3 Gene expression analysis

Several experimental techniques, such as microarrays, SAGE and 2D-PAGE, are used in parallel to obtain information about the components of a biological system. Measurements at the RNA level are currently by far the most productive, because microarrays allow rapid measurements in a high-throughput fashion. This section presents some of the most popular methodologies developed for analyzing the huge amounts of microarray data.

Microarrays make it possible to monitor how a treatment of a biological system affects the expression of virtually all its genes. The term 'treatment' is intended here to cover a wide variety of experimental setups. Some examples:

- Two cell populations, one treated with a drug and the other untreated, may be compared.
- A wild-type bacterial population may be compared to a mutant population.
- Cell populations under different growth conditions may be compared.
- The expression profiles of different types of tumor cells can be studied.

A time series experiment is another kind of treatment in this general sense. In such an experiment the gene expression profile of a biological system is monitored at different time points. For example, gene expression during the sporulation of yeast was measured in one of the early microarray experiments (Chu et al., 1998). Time can be regarded as the treatment in time series experiments.

The experimental designs vary strongly in detail according to the question the experiment is meant to investigate. However, the analysis of most experiments must answer the question: "Is a gene differentially expressed?" Anyone answering it must be aware that noise enters a microarray experiment from a number of sources. The following subsection addresses this in a qualitative way.

3.1 Noise and RNA quantification

There are a number of noise sources in a microarray experiment. Measurement noise is introduced by the technical procedure itself. In addition, the biological system under observation is inherently noisy due to the stochastic nature of the molecular regulatory networks (see 1.1). This noise will henceforth be called biological noise. Figure 3-1 shows that there is significant variation in microarray experiments.

[Figure 3-1 plot: signal (0 to 1000) versus days (0 to 25)]

Figure 3-1 Three time series from the inflammatory bowel disease data (see 4.2.11.1). Every time point is the average of four samples. The error bars indicate the standard deviations.

From the researcher's standpoint there is a big difference between measurement noise and biological noise. Measurement noise is undesirable because it masks the biological signal of interest. Biological noise, on the other hand, is an important feature of the system under study, as many processes at the transcriptional level display stochastic behavior. Variation will thus occur even in cell populations that are "perfectly" synchronized and genetically identical. Consequently, insight into the natural variability of the biological system under study is indispensable for understanding it.

Possible sources of noise are found throughout the measurement process. The probe cell of an oligonucleotide array, or the spot in the case of a cDNA array, will not be exactly identical from array to array. Oligonucleotide arrays should perform better than cDNA arrays in this respect. The in situ synthesis of the oligonucleotides is well defined, while the fixation of the cDNA probes to the surface leaves the probe in an ill-defined state. Lee and coworkers demonstrated directly that differences between the spots on cDNA microarrays cause differences in the hybridization signal (Lee et al., 2000). The authors printed a set of 288 genes in triplicate onto cDNA microarrays and hybridized them with a total RNA extract from a human tissue specimen obtained during surgery. The triplicate spots on the chip showed considerable variation. This experiment demonstrates local variation due to measurement noise only. In addition to the local variation, global intensity differences were observed from chip to chip. The normalization techniques applied to correct for local and global variation on the chip reflect this fact (see 2.2.3, 2.3.3 and 7.11.1). These algorithms are rather ad hoc and should not be expected to produce perfectly corrected hybridization signals (Lee et al., 2000). Additional sources of experimental noise include variation in the hybridization conditions and in the scanning of the chip after hybridization. Noise can also be introduced during image processing. For cDNA chips that use a nylon membrane as the hybridization surface, it is frequently difficult to align a regular grid with the target locations (Duggan et al., 1999). Slightly incorrect grids in the image analysis then cause the measured signal intensity to vary from chip to chip.
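Global chip-to-chip intensity differences are commonly handled by rescaling each chip to a common reference. The sketch below scales every chip so that its trimmed mean equals a target value. The 2 percent trim and the target of 1000 are arbitrary assumptions. The procedure corrects neither local (spot-to-spot) variation nor gene-specific effects, which fits the caution above that such corrections are ad hoc.

```python
from statistics import mean


def trimmed_mean(values: list[float], trim: float = 0.02) -> float:
    """Mean after discarding the lowest and highest `trim` fraction."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    return mean(ordered[k:len(ordered) - k])


def global_scale(chip: list[float], target: float = 1000.0,
                 trim: float = 0.02) -> list[float]:
    """Multiply all intensities of one chip by a single factor so that
    its trimmed mean equals `target`."""
    factor = target / trimmed_mean(chip, trim)
    return [x * factor for x in chip]


chip_a = [100.0, 200.0, 300.0, 400.0]
chip_b = [200.0, 400.0, 600.0, 800.0]  # same profile, twice as bright
print(global_scale(chip_a))  # [400.0, 800.0, 1200.0, 1600.0]
print(global_scale(chip_b))  # [400.0, 800.0, 1200.0, 1600.0]
```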

Figure 3-2 Oligonucleotide array experiments. Log-log plot of the hybridization intensity (average of the perfect match minus mismatch differences of the probes in a probe set) versus an exactly known target concentration of ten different target mRNAs. The background was provided for each measurement by a known concentration of T10 RNA. The figure is taken from (Lockhart et al., 1996).

Sample RNA is hybridized to chips with a large excess of surface-bound probes relative to the sample sequences. This allows the hybridization to be treated as a pseudo-first-order reaction. The hybridization time is chosen so that virtually all sample sequences bind to the chip (usually at least overnight). Suppose the amount of label (fluorescence, radioactivity) incorporated into the sample sequences is linear in the amount of sample sequences. Then the relation between signal strength and the amount of sample sequences is also linear (see Figure 3-2).
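The linearity argument can be written out as follows.

1. With probes in large excess, the probe concentration P0 stays essentially constant.
2. The binding rate dT/dt = -k P0 T is therefore first order in the free target T, with an observed rate constant k_obs = k P0.
3. The bound amount after time t is T0 (1 - exp(-k_obs t)).
4. For a fixed hybridization time, the bound fraction is the same for every starting amount T0.

The signal is therefore proportional to T0 whenever label incorporation is proportional too. The rate constant and time below are hypothetical.

```python
import math


def bound_target(t0: float, k_obs: float, hours: float) -> float:
    """Amount of target hybridized after `hours` under pseudo-first-order
    kinetics (probe in large excess): T0 * (1 - exp(-k_obs * t))."""
    return t0 * (1.0 - math.exp(-k_obs * hours))


LABEL_PER_UNIT = 1.0  # assumption: label incorporation proportional to amount


def signal(t0: float, k_obs: float = 0.5, hours: float = 16.0) -> float:
    """Detected signal for a starting target amount `t0`."""
    return LABEL_PER_UNIT * bound_target(t0, k_obs, hours)


for t0 in (1.0, 10.0, 100.0, 1000.0):
    # The ratio signal/T0 is the bound fraction, identical for every T0.
    print(f"T0={t0:7.1f}  signal={signal(t0):9.3f}  signal/T0={signal(t0) / t0:.4f}")
```

A 16-hour hybridization with k_obs t = 8 gives a bound fraction of about 0.9997, so virtually all target is bound. This matches the choice of hybridization time described above.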

However, comparing the signals of different genes can be difficult, because cross-hybridization behavior differs from sequence to sequence. Some sequences cross-hybridize strongly, while others do so only weakly. This is reflected in the process Affymetrix uses to design the probes for the different cells of a probe set. The selection of the probes (see 2.3.1) takes into account the length of the oligonucleotides, the base composition of the probes, and the uniqueness of each probe with respect to other genomic sequences. Optimal hybridization clearly cannot be ensured for every sequence in a solution that contains thousands of different sample sequences. Cross-hybridization effects may therefore influence the signal strength for certain genes.

The situation is even more complicated for cDNA arrays to which the control sample and the treated sample are hybridized simultaneously, using different fluorescent labels to discriminate between them. There is evidence that the incorporation of the different dyes is sequence-dependent (Kerr and Churchill, 2001).

To answer the question "Is a gene differentially expressed?", the noise in the experiment has to be taken into account. The next subsection focuses on the most popular approach to answering this question.

3.2 Analysis by ad-hoc thresholding

The traditional approach to dealing with the noise in microarray data is ad-hoc thresholding. Consider a single treatment experiment. In an oligonucleotide array experiment, a control sample and a treated sample are hybridized to two separate chips. In a cDNA array experiment using two different dyes, they are hybridized to a single chip. Whether the control signal differs from the treated sample signal cannot be decided on statistical grounds, because estimating the variability requires at least two samples per treatment condition. The decision is therefore made ad hoc, based on experience with similar experiments.

Affymetrix provides an algorithm suite that is recommended for the analysis of data produced with Affymetrix chips. At the time of writing, Affymetrix had announced a new version of this algorithm suite. It uses statistics to evaluate a probe set and to compare probe sets from different chips. The new algorithms, however, will not be in use before 2002. The algorithms of the old suite are based on experiments presented by Lockhart and colleagues (Lockhart et al., 1996; Wodicka et al., 1997). Affymetrix measures the abundance of a transcript by the average difference between the perfect match probes of a probe set and their corresponding mismatch probes. The metric is thus an average value obtained from the probe pairs that make up a probe set. However, the old algorithms do not estimate the variation across the probe pairs, so no confidence limit can be given for the so-called 'AverageDifference'.

In a comparative experiment the transcript levels from two different hybridizations must be compared, using a metric called 'FoldChange'. The FoldChange is basically computed as the ratio of the average intensities of two corresponding probe sets (see 7.1.2 for more details). Consider a probe set X of the treated sample and a probe set Y of the control sample. If X has a higher signal intensity than Y, the transcript is considered to be (X/Y)-fold up-regulated. If Y has a higher signal intensity than X, the transcript is considered to be (Y/X)-fold down-regulated. Lockhart and coworkers published test experiments that established thresholds for deciding whether an x-fold regulation reflects a real regulation effect (Lockhart et al., 1996). Usually a two- to three-fold change is considered to be a real change, while smaller changes are assumed to be due to measurement noise. Figure 3-3 illustrates the use of these threshold values.
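The old-style metrics described above can be sketched as follows. This is a simplified reading of the text, not Affymetrix's implementation. The average difference is the mean of the PM minus MM intensities, the fold change is the signed ratio of two such values, and a hypothetical two-fold threshold makes the call. The real software must also handle negative or near-zero average differences; that is omitted here, and inputs are assumed to be positive.

```python
from statistics import mean


def average_difference(pm: list[float], mm: list[float]) -> float:
    """Mean of perfect-match minus mismatch intensities over the probe
    pairs of one probe set (no estimate of their spread is kept)."""
    return mean([p - m for p, m in zip(pm, mm)])


def fold_change(treated: float, control: float) -> float:
    """+X/Y if treated X exceeds control Y, otherwise -(Y/X).
    Both values are assumed to be positive."""
    return treated / control if treated >= control else -(control / treated)


def call_change(treated: float, control: float, threshold: float = 2.0) -> str:
    """Ad-hoc call: a change of at least `threshold`-fold counts as real."""
    fc = fold_change(treated, control)
    if fc >= threshold:
        return "up"
    if fc <= -threshold:
        return "down"
    return "no change"


x = average_difference([900.0, 950.0, 1000.0], [300.0, 250.0, 400.0])  # treated
y = average_difference([500.0, 420.0, 480.0], [200.0, 180.0, 220.0])   # control
print(f"X={x:.0f}, Y={y:.0f}, fold change={fold_change(x, y):.2f}, call={call_change(x, y)}")
```

The decision rests entirely on the fixed threshold. Nothing in the calculation uses the spread of the probe pairs or of repeated hybridizations.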

The new, statistical algorithms use Tukey's Biweight method to estimate the abundance of a transcript (for more details see 7.1.31.1). The quantitative estimate is still computed from the difference between perfect match probes and their corresponding mismatch probes, but the Biweight methodology provides a confidence limit for the estimate. Transcript levels from different hybridizations are compared in a similar manner, by calculating the ratio of the estimated transcript levels (for more details see 7.1.31.1). Here too, the Biweight methodology provides confidence limits for the resulting estimate.

It must be emphasized that these new algorithms do not remove the need for multiple experiments. The algorithms exploit the redundant probe set design of the Affymetrix chip. This allows the variation within a probe set to be assessed, basically answering the question: "How big is the variation in the measurement across the different probe pairs of a probe set?" The new algorithms thus provide valuable information, but they do not account for the full variation present in a microarray experiment. Differences between probe sets on different chips, or global differences between chips, are examples of measurement noise sources that are not accounted for. Most importantly, this approach cannot assess biological noise. Multiple measurements are necessary to estimate the full variability of a microarray hybridization under a given treatment condition.
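A one-step Tukey biweight estimate can be sketched as below. Each value is weighted by its distance from the median, measured in units of the median absolute deviation (MAD). Distant values receive small or zero weight, so a single aberrant probe pair barely moves the estimate. The tuning constant c = 5 and the small eps that guards against a zero MAD are assumptions. The confidence limit mentioned in the text is not computed here.

```python
from statistics import median


def tukey_biweight(values: list[float], c: float = 5.0, eps: float = 1e-4) -> float:
    """One-step Tukey biweight estimate of location.

    Each value gets weight (1 - u**2)**2 with u = (x - median) / (c * MAD + eps),
    and weight 0 when |u| >= 1; the estimate is the weighted mean.
    """
    m = median(values)
    mad = median([abs(x - m) for x in values])
    weights = []
    for x in values:
        u = (x - m) / (c * mad + eps)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical PM - MM differences of one probe set with one outlier.
diffs = [10.0, 11.0, 12.0, 13.0, 100.0]
print(f"mean={sum(diffs) / len(diffs):.1f}, biweight={tukey_biweight(diffs):.1f}")
# mean=29.2, biweight=11.6
```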

Figure 3-3 Illustration of the ad-hoc threshold approach for differential expression. Three experiments are shown. (A) Two RNA samples of yeast grown in minimal medium and (B) two RNA samples of yeast grown in rich medium are hybridized independently, and their signal intensities are plotted against each other. (C) The minimal medium intensities are plotted against the rich medium intensities. The solid lines indicate a two-fold regulation; the dashed lines a three-fold regulation. A number of genes are expressed differently by a factor of two or three in yeast grown in minimal medium compared to yeast grown in rich medium. The figure is taken from (Wodicka et al., 1997) and is used in a different context than in the original publication.

Ad-hoc thresholding is done in a very similar way in cDNA array experiments. If a single type of marker (dye, radioactive label) is used in the experiments, the situation is analogous to an oligonucleotide experiment. However, a special array design is needed to account for even a part of the measurement noise, for example duplicate spots for each probe sequence (Herwig et al., 2001). The most popular approach with cDNA microarrays is to hybridize a treated and a control sample simultaneously to a single chip. Two different dyes are used to compare the signal strengths produced by the two samples directly (see 2.2.2). Whether or not a gene is regulated is decided by ad-hoc thresholding, based on the signal intensities obtained from a single spot on the chip.

Although the thresholding approaches vary in detail, they all share a common basis. Without differential expression, the ratio of the signal intensities for the treated and the control sample is expected to be 1.0 (if no transformations are applied). An upper and a lower threshold value surround this expected ratio, usually symmetrically. Every gene whose expression ratio lies outside these boundaries is considered to be differentially expressed. DeRisi and coworkers identified differentially expressed genes using a three-fold change as the cutoff for the logarithmic ratios of the fluorescence intensities (DeRisi et al., 1996). The cutoff was standardized with respect to the mean and the standard deviation of a set of 90 genes believed not to be differentially expressed ("housekeeping genes").

A paper published by Brachat and coworkers (Brachat et al., 2000) can be regarded as representative of the use of ad-hoc thresholding. The experiment monitored the expression levels of mouse pro-B cells at 0, 0.5, 1, 4, 8 and 24 hours after two different apoptotic stimuli. Each cDNA array was hybridized simultaneously with the control sample obtained at time point 0 and a sample obtained after the apoptotic stimulus. All genes that showed at least a two-fold change in their expression ratio were included in the subsequent analysis.
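The common basis of two-color thresholding can be sketched as follows. None of the assumptions below is taken from a specific paper:

- Intensities come from one spot per gene.
- Log ratios are centered on the mean log ratio of genes believed not to change, a simple stand-in for the housekeeping-gene standardization mentioned above.
- Symmetric bounds of plus or minus log2(fold) are applied.

Gene names and values are invented.

```python
import math
from statistics import mean


def classify_spots(red: dict[str, float], green: dict[str, float],
                   housekeeping: list[str], fold: float = 2.0) -> dict[str, str]:
    """Ad-hoc two-color thresholding (sketch).

    red/green: treated/control intensities per gene from a single spot.
    Log ratios are centered on the mean log ratio of the housekeeping
    genes, then compared with symmetric bounds +/- log2(fold).
    """
    log_ratio = {g: math.log2(red[g] / green[g]) for g in red}
    offset = mean([log_ratio[g] for g in housekeeping])
    bound = math.log2(fold)
    calls = {}
    for gene, lr in log_ratio.items():
        centered = lr - offset
        if centered >= bound:
            calls[gene] = "up"
        elif centered <= -bound:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


red = {"hk1": 200.0, "hk2": 220.0, "g1": 1000.0, "g2": 50.0, "g3": 260.0}
green = {"hk1": 100.0, "hk2": 110.0, "g1": 100.0, "g2": 100.0, "g3": 100.0}
calls = classify_spots(red, green, ["hk1", "hk2"])
print(calls)
```

Every call rests on a single spot, so nothing in this procedure estimates how large a ratio pure noise can produce. The threshold encodes experience, not a measured variance.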

Regardless of the technology used, whether the level of a transcript changes from one treatment condition to another is best decided using several parallel measurements. These allow the full variation present in the measurements to be taken into account, so that the decision about differential expression can be made in a sound statistical manner.
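As a sketch of what replicates make possible, the exact permutation test below compares treated and control replicates without distributional assumptions. It is used here only because it is short and self-contained. It is not the analysis method developed in this work, and the intensity values are invented. A single measurement per condition can never yield evidence: with one value per group, every relabeling is as extreme as the observed one.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control: list[float], treated: list[float]) -> float:
    """Exact two-sided permutation test for a difference in means.

    All ways of splitting the pooled replicates into groups of the original
    sizes are enumerated; the p-value is the fraction of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = control + treated
    n = len(control)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(mean(group_b) - mean(group_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


control = [100.0, 110.0, 95.0, 105.0]
treated = [200.0, 210.0, 190.0, 205.0]
print(permutation_p_value(control, treated))  # 2/70, about 0.029
print(permutation_p_value([100.0], [300.0]))  # 1.0: one sample per condition
```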

3.3 Approaches to gene expression analysis made

Before introducing our approach to gene expression analysis, we briefly review the most productive approaches taken by other groups. Microarray experiments are especially well suited to serve as descriptors of a cellular or tissue state. Here the expression profile of a cell or a tissue is used to discriminate between different global states, not to analyze single genes. This approach is most widely used in cancer research. Microarray expression profiles can distinguish clinical subtypes of, for example, leukemia (Golub et al., 1999), lymphoma (Alizadeh et al., 2000), melanoma (Bittner et al., 2000) and breast cancer (Perou et al., 2000). This is very important for diagnostic and therapeutic purposes. Using an expression profile as a descriptor implies that profiles must be compared and their similarity or dissimilarity measured.

[Figure 3-4 plot: idealized expression patterns across the AML and ALL sample classes]

Figure 3-4 Idealized single-gene expression pattern for the AML-ALL classifier. All genes (five are shown here) are highly expressed in the case of AML and weakly expressed in the case of ALL.

One of the goals of the work published by Golub et al. was to establish a systematic distinction between two known types of leukemia, based on gene expression profiles measured with Affymetrix chips. ALL (Acute Lymphoblastic Leukemia), arising from lymphoid precursors, was to be discriminated from AML (Acute Myeloid Leukemia), arising from myeloid precursors. The class predictor was established on a set of 38 tumor samples of known type (27 ALL, 11 AML). Two idealized single-gene expression patterns, as illustrated in Figure 3-4, were defined for the ALL and the AML tumor classes. In each pattern, expression is uniformly high for the class under consideration and uniformly low for the alternative class.

In the next step, single genes whose expression correlated well with an idealized single-gene expression pattern were identified. The correlation had to be higher than would be expected by chance. To determine the correlation values expected by chance, a set of permuted idealized single-gene expression patterns was used. The correlation was not measured by the Pearson correlation coefficient. Instead, the authors defined an ad hoc measure that reflects the difference between the two tumor classes relative to the variation within the classes.
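The class-separation score and the permutation check can be sketched as below. We assume the score has the signal-to-noise form: the difference of the two class means divided by the sum of the class standard deviations. This matches the description above of a class difference measured against within-class variation. The permutation step is a simplified per-gene version of the chance-correlation check, and the expression values are invented. Each class needs at least two samples for the standard deviations to exist.

```python
import random
from statistics import mean, stdev


def separation_score(x: list[float], labels: list[str],
                     a: str = "AML", b: str = "ALL") -> float:
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene across samples."""
    xa = [v for v, lab in zip(x, labels) if lab == a]
    xb = [v for v, lab in zip(x, labels) if lab == b]
    return (mean(xa) - mean(xb)) / (stdev(xa) + stdev(xb))


def permutation_fraction(x: list[float], labels: list[str],
                         n_perm: int = 2000, seed: int = 0) -> float:
    """Fraction of random relabelings whose |score| reaches the observed
    |score|; small values mean the separation is unlikely to be chance."""
    rng = random.Random(seed)
    observed = abs(separation_score(x, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # class sizes are preserved by shuffling
        if abs(separation_score(x, shuffled)) >= observed:
            hits += 1
    return hits / n_perm


expression = [9.0, 8.5, 9.2, 8.8, 2.0, 2.5, 1.8, 2.2]
classes = ["AML"] * 4 + ["ALL"] * 4
print(f"score={separation_score(expression, classes):.1f}, "
      f"chance fraction={permutation_fraction(expression, classes):.3f}")
```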

The 50 genes with the highest correlation to the idealized single-gene expression patterns were then chosen for the class predictor. Each of these 50 genes casts a weighted vote in the prediction process. The weight depends on the expression level of the gene and on its correlation with the idealized single-gene expression pattern. The predictor was then used to analyze an independent set of 34 tumor samples of known type. For 29 of these the predictor made a strong prediction, and the accuracy in these cases was 100 percent. Only the correlation between the single-gene expression patterns and the idealized patterns is used to construct the classifier. It is not necessary to decide whether a gene is differentially expressed between the two tumor types.
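The weighted vote can be sketched as follows. We assume each selected gene g carries a weight a_g (its separation score) and a midpoint b_g halfway between the two class means, and votes a_g (x_g - b_g). Positive votes favor one class and negative votes the other. The prediction strength (V_win - V_lose) / (V_win + V_lose) and the 0.3 cutoff for a "strong" prediction are assumptions about how such a predictor can be scored. All numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class GeneVoter:
    weight: float    # a_g: e.g. the gene's separation score
    midpoint: float  # b_g: halfway between the two class means


def weighted_vote(voters: list[GeneVoter], sample: list[float],
                  positive: str = "AML", negative: str = "ALL") -> tuple[str, float]:
    """Sum the votes weight * (x - midpoint); positive votes favor `positive`.
    Returns the winning class and a prediction strength between 0 and 1."""
    votes = [g.weight * (x - g.midpoint) for g, x in zip(voters, sample)]
    v_pos = sum(v for v in votes if v > 0)
    v_neg = -sum(v for v in votes if v < 0)
    total = v_pos + v_neg
    if total == 0:
        return "undecided", 0.0
    winner = positive if v_pos >= v_neg else negative
    return winner, abs(v_pos - v_neg) / total


voters = [GeneVoter(2.0, 5.0), GeneVoter(1.0, 3.0), GeneVoter(-1.5, 4.0)]
label, strength = weighted_vote(voters, [7.0, 2.0, 3.0])
STRONG = 0.3  # hypothetical cutoff for a "strong" prediction
print(label, round(strength, 3), "strong" if strength > STRONG else "weak")
# AML 0.692 strong
```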

This work illustrates some principles in the use of microarray experiments as descriptors of the cellular state. First, a training set of samples is needed to construct a classifier; replicated experiments are used to estimate the variation for a given gene. Golub and coworkers used a handcrafted algorithm to construct the classifier, but a large selection of automatic machine learning methods is also available. These methods use a correctly classified data set to establish (learn) the functions that are then applied to new data sets. Methods that use this approach are called supervised learning methods, because they must be trained on correctly classified examples. Once the classifier is built, a single experiment is enough to decide which class a sample belongs to. This makes classifiers based on microarray data useful diagnostic tools.

A single gene, however, does not behave consistently enough to support a prediction from a single experiment. The classifier must be composed of several genes to compensate for the variation in the behavior of single genes. Different supervised methods for class prediction are in use for gene expression analysis, and basically any machine learning technique can be applied to classify microarray data. Brown et al. trained support vector machines to recognize five functional protein classes in gene expression data. They also compared the performance of support vector machines with that of other machine learning techniques (Brown et al., 2000). Neural networks are a popular machine learning technique and have also been applied to classify gene expression data (Khan et al., 2001).

In contrast to supervised methods, unsupervised methods use predefined functions or criteria to identify distinct classes in data sets. As no training set is needed, these methods are well suited for exploratory data analysis that tries to establish classes and relations within the data. A recent example of such an exploratory analysis is the work of Hughes and coworkers (Hughes et al., 2000). In this work, two kinds of gene expression profiles were stored in a database: hundreds of profiles of yeast treated with various drugs, and profiles of yeast strains with mutations in genes of known function. These stored profiles were then compared to the expression profiles of yeast with mutations in genes of unknown function. In many cases, the profile of an uncharacterized mutant closely resembled the stored profile of a well-characterized mutant, which suggested a function for the uncharacterized gene. Hughes and coworkers judge the difference between two expression profiles simply by the number of genes that are differentially expressed from one profile to the other. To allow a reliable decision about the differential expression of a given gene, its variability was measured in over 60 control experiments with a wild-type strain.

By far the most popular unsupervised technique used in exploratory analysis of microarray data is clustering of single-gene expression patterns according to their pairwise similarity. If several

3 Gene expression analysis: Approaches to gene expression analysis made 32

expression profiles have been obtained from a sample, e.g., by a time series experiment or a multi-conditional treatment experiment, clustering techniques can be used to identify genes that show similar expression patterns over time or across the treatment conditions. Under the basic assumption that genes with similar expression patterns may be controlled by the same regulatory mechanisms, these groups are then analyzed in more detail. For example, advances have been made in yeast, where it has been possible to identify common DNA motifs in the promoter regions of such coregulated groups of genes (Cho et al., 1998; Spellman et al., 1998; Roth et al., 1998; Tavazoie et al., 1999). Groups of genes with similar expression patterns can be constructed with different clustering methods. However, all these methods face the same problem: a large number of data vectors (the single-gene expression patterns) must be partitioned into a number of subgroups such that the members of a subgroup (called a "cluster") are as similar as possible and the subgroups are as well separated as possible. With increasing numbers of data vectors it becomes impossible in practice to evaluate all possible partitionings; for example, $8.57 \times 10^{43}$ partitionings are possible for 50 data vectors, and microarray experiments provide at least 5,000 data vectors that must be partitioned. Any clustering method that addresses this problem will therefore not evaluate all possible partitionings, but rather adopt a heuristic strategy to find a solution in reasonable time. This, however, means that the solution will usually be suboptimal. A concise introduction to clustering techniques can be found, for example, in a book by D. Steinhausen and K. Langer (Steinhausen and Langer, 1977). As clustering is one of the fundamental tasks in science, and the clustering problem cannot be solved optimally in practice, a large number of approximate clustering methods have been developed. Practically every standard method available has already been tried on clustering single-gene expression patterns, and some new clustering methods have been designed especially for gene expression data (for example, Ben Dor et al., 1999). The most widely used standard clustering methods are k-means clustering (Tavazoie et al., 1999), hierarchical clustering approaches (Spellman et al., 1998) and self-organizing maps (Toronen et al., 1999). Standard clustering techniques use a (dis)similarity measure to evaluate the similarity of two data vectors in a pairwise comparison. The choice of this (dis)similarity measure determines the result of a standard clustering method. Once the (dis)similarity measure is defined, the clustering method will always produce a result for a given data set. It is an art to define the (dis)similarity measure such that the resulting clustering captures some of the "natural structure" of the data and is not just an artificial structure imposed on the data by the clustering process.

Various (dis)similarity measures have been developed; probably the most popular of them is the Pearson correlation coefficient. It is used in many studies; a few examples are (Schena et al., 1996; Spellman et al., 1998; Eisen et al., 1998; Ewing et al., 1999; Iyer et al., 1999). The statistical significance of the Pearson correlation coefficient can be assessed: if the two data vectors are drawn from a bivariate Normal distribution, the null hypothesis that they are uncorrelated can be tested, and confidence limits can be computed for the correlation coefficient. For a given value of the correlation coefficient, the probability of chance correlation decreases with increasing length of the data vectors. On the other hand, a large number of pairwise comparisons is made during clustering, and with an increasing number of pairwise comparisons the probability of a strong correlation arising from chance effects increases.
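As an illustration of how the length L of the data vectors enters the significance assessment, the following Python sketch computes the Pearson correlation coefficient r and the textbook test statistic $t = r\sqrt{(L-2)/(1-r^2)}$, which follows Student's t-distribution on $L-2$ degrees of freedom under the null hypothesis of zero correlation for bivariate Normal data. The choice of this particular test is an assumption made for illustration, and the function names are hypothetical, not part of the FX tools.

```python
import math
from typing import Sequence

# Illustrative sketch; function names are hypothetical.

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long data vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r: float, length: int) -> float:
    """t = r * sqrt((L - 2) / (1 - r^2)); Student's t on L - 2 dof under H0 (|r| < 1)."""
    return r * math.sqrt((length - 2) / (1.0 - r * r))

# The same correlation r = 0.6 is much stronger evidence for longer vectors.
for length in (5, 10, 50):
    print(length, round(t_statistic(0.6, length), 2))
```

For r = 0.6 the statistic grows from about 1.3 for L = 5 to about 5.2 for L = 50, which is why a given correlation becomes less plausible as a chance effect the longer the vectors are.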

A simple artificial example illustrates the relation between the length of the data vectors, the number of data vectors and chance correlation (Claverie, 1999). Consider single-gene expression patterns of length L. We make the (completely unrealistic) assumptions that every measurement point of a pattern can adopt one of two possible states, "on" or "off", and that both states are equiprobable. Thus the probability that two expression patterns are identical just by chance is $(1/2)^L$. Now consider a chip with N genes, where $N(N-1)/2$ pairwise comparisons are possible; then

$$\frac{N(N-1)}{2} \cdot \left(\frac{1}{2}\right)^{L}$$

perfect identities due to chance effects can be expected. Assume we want to make sure that not a single pairwise identity is due to chance effects; thus we impose the constraint

$$\frac{N(N-1)}{2} \cdot \left(\frac{1}{2}\right)^{L} < 1.$$

Approximating $(N-1)$ …
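The arithmetic of this example can be checked with a few lines of Python. The sketch below evaluates the expected number of chance identities, $N(N-1)/2 \cdot (1/2)^L$, and searches for the smallest pattern length L that satisfies the constraint. The function names are hypothetical, and the two-state model is the same idealization as in the text.

```python
# Illustrative sketch of the two-state example; function names are hypothetical.

def expected_chance_identities(n_genes: int, length: int) -> float:
    """Expected number of gene pairs with identical on/off patterns by chance."""
    pairs = n_genes * (n_genes - 1) / 2   # N(N-1)/2 pairwise comparisons
    return pairs * 0.5 ** length          # each pair identical with probability (1/2)^L

def min_pattern_length(n_genes: int) -> int:
    """Smallest L for which fewer than one chance identity is expected."""
    length = 0
    while expected_chance_identities(n_genes, length) >= 1.0:
        length += 1
    return length

print(min_pattern_length(5000))  # 24
```

For N = 5,000 genes the smallest such length is L = 24, many more time points than the sparse series discussed below provide.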

3 Gene expression analysis: Sparse **time** **series** and **Feature** **eXtraction** 34

A key feature of most time series experiments done with microarray technology is their sparseness. The time resolution is usually coarser than the characteristic time scale of genetic regulation: the mRNA expression level is measured at a few time points spread over a long observation interval. Figure 3-5 shows an example of a typical microarray time series experiment. This example illustrates that microarray time series experiments provide only a few snapshots of the kinetics. Such sparse time series data cannot be used to model the process under study quantitatively; thus the analysis is restricted to qualitative aspects. We wanted the analysis of a time series to answer three basic qualitative questions, namely: "Does the signal level change during the time course?", "At which time points do the changes occur?" and "How many signal levels can be detected?". The basic idea of the algorithms was to extract these qualitative features of a time series measurement and to condense them into an easy-to-use pattern string; hence we name the approach 'Feature eXtraction', or 'FX' for short.

We are interested in evaluating how the signal strength changes during the experiment. Basically three events can happen: the signal can stay constant, increase or decrease. Of course, the signal of any noisy process will go up and down all the time due to random effects; thus the correct question to ask is whether the change observed between two (not necessarily consecutive) time points is statistically significant or not. Furthermore, we are interested in identifying the signal levels in a given time course. Figure 3-6 shows an artificially constructed example time series and the qualitative features the FX algorithms are intended to extract.

[Figure 3-6: signal versus time points]

Figure 3-6 Basic qualitative features of a time series running over ten time points, 0 … 9. The boxes indicate three distinct signal levels. The arrows indicate the shortest statistically significant jumps between these signal levels. The decrease between time point 6 and time point 8 is a slow decrease, as the difference between time points 6 and 7 is not significant. In this artificial example the time points are equidistant, but this is not a prerequisite for the FX analysis.

The example time series in Figure 3-6 displays three significant changes and three signal levels. The significant changes that occur directly between a time point and its successor are obviously interesting features of the example time series. The third change of the example time series is a slow decrease between points #6→#8, where the changes #6→#7 and #7→#8 are not significant if considered alone. This information can easily be coded into a string to produce a low-resolution pattern that encodes only the order of the statistically significant upward and downward changes, but does not store information about when these changes occur.
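A low-resolution pattern of this kind could be produced as sketched below. This is a minimal illustration, not the FX implementation: the `Jump` record, the function name and the symbols 'U' (significant increase) and 'D' (significant decrease) are hypothetical choices, and the significant jumps are assumed to have been found already. Of the three example jumps, only the slow decrease #6→#8 corresponds to Figure 3-6 as described in the text; the first two are made up.

```python
from typing import List, NamedTuple

# Illustrative sketch; Jump, the function name and the U/D symbols are hypothetical.

class Jump(NamedTuple):
    start: int  # time point index at which the significant change begins
    end: int    # time point index at which it ends (end > start)
    up: bool    # True for a significant increase, False for a decrease

def low_resolution_pattern(jumps: List[Jump]) -> str:
    """Encode only the order of the significant changes ('U' up, 'D' down)."""
    ordered = sorted(jumps, key=lambda jump: (jump.start, jump.end))
    return "".join("U" if jump.up else "D" for jump in ordered)

# Two made-up jumps plus the slow decrease #6 -> #8 described for Figure 3-6.
example = [Jump(6, 8, False), Jump(1, 2, True), Jump(4, 5, True)]
print(low_resolution_pattern(example))  # UUD
```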


We can include the information about the time points to produce a high-resolution pattern. Storing the time points at which changes take place implicitly introduces a third qualitative feature into the encoding string, namely "stays constant between #x and #y". In order to make the definition of the high-resolution patterns unambiguous, we decided that changes of the signal level are to be kept as short as possible; this implies that the constant regions of the time series will be as long as possible.
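One hypothetical way to store this positional information is one symbol per interval between consecutive time points, as in the Python sketch below: 'U' or 'D' inside a significant change and '-' where the signal stays constant. The symbols and names are assumptions for illustration; the sketch also assumes that the jumps are already the shortest possible ones and do not overlap, in line with the convention above.

```python
from typing import List, NamedTuple

# Illustrative sketch; Jump, the function name and the symbols are hypothetical.

class Jump(NamedTuple):
    start: int  # time point index at which the significant change begins
    end: int    # time point index at which it ends (end > start)
    up: bool    # True for a significant increase, False for a decrease

def high_resolution_pattern(n_points: int, jumps: List[Jump]) -> str:
    """One symbol per interval (i, i+1): 'U'/'D' inside a change, '-' if constant."""
    symbols = ["-"] * (n_points - 1)
    for jump in jumps:
        for i in range(jump.start, jump.end):   # every interval covered by the jump
            symbols[i] = "U" if jump.up else "D"
    return "".join(symbols)

jumps = [Jump(1, 2, True), Jump(4, 5, True), Jump(6, 8, False)]
print(high_resolution_pattern(10, jumps))  # -U--U-DD-
```

With ten time points there are nine intervals, so the slow decrease #6→#8 occupies two positions ('DD'), while the constant stretches are kept as long as possible.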

Furthermore, it is interesting to detect the number of discrete signal levels in a time series, as every discernible signal level may correspond to a distinct regulatory state. The FX algorithms are able to extract this information, although in the case of gene expression one would expect only two signal levels, "high" and "low", corresponding to the intuitive assumption that a gene is either expressed or not expressed. Both the low- and the high-resolution pattern can be extended to include information about the number of signal levels present in a time series.

4 Development of the tools

4.1 **Feature** **eXtraction**

4.1.1 Identification of significant jumps


The first step of Feature eXtraction is the identification of significant jumps in the data. One must decide when a time series "goes up" or "goes down" in a statistical sense. We apply a one-way analysis of variance (ANOVA) approach to detect significant changes of the signal level in a time series. We treat a time series as a single-factor experiment, whereby the time points in a given time series are regarded as the ANOVA "groups", and the sole "treatment" that causes differences between the "groups" is the effect of time. A prerequisite for this approach is that at least two parallel measurements are available for every point of a time series; otherwise the variance in the data cannot be estimated at all.

Analysis of variance is a widely applied concept in statistical inference. The classical work in this field was written by Henry Scheffe and published in 1959 (Scheffe, 1959). Material about analysis of variance and related topics can be found in almost every textbook on applied statistics. Besides the work of Scheffe, we have been guided in the outline of the concepts presented in this chapter by other books on applied statistics (Winer, 1971a; Ledermann, 1980; Stuart and Keith, 1987; Sachs, 1999).

4.1.1.1 ANOVA basics

The analysis of variance can be based on different models to account for the systematic ("the treatment effect") and unsystematic ("the noise") variation in an experiment. Suppose the signal of a process is measured at k different points in time. At every time point j the signal strength is sampled $n_j$ times. We assume that the measurement procedure at a given time point is equivalent to sampling from a population that has a Normal distribution,

$$X_j : N(\mu_j, \sigma), \qquad j = 1, 2, \ldots, k.$$

The $n_j$ measurements $(x_{1j}, x_{2j}, \ldots, x_{ij}, \ldots, x_{n_j j})$, $i = 1, 2, \ldots, n_j$, at time point j constitute a random sample from the population $X_j$, which has Normal distribution with mean $\mu_j$ and standard deviation σ. We assume that the mean may vary in time, while the standard deviation is common to the measurements taken at different time points. Henceforth we refer to σ as the common standard deviation of the measurement and to $\sigma^2$ as the common variance of the measurement.

If the treatment (the effect of time) induces changes in the signal level, these are reflected by the mean of the corresponding measurement point. If there is no treatment effect, all group means will be equal. The null hypothesis "there is no treatment effect" can be stated more formally as

$$\mu_1 = \mu_2 = \ldots = \mu_j = \ldots = \mu_k.$$

The experimenter will usually intend the number of parallel measurements to be equal for all time points, but in practice it may vary slightly. For example, microarray data frequently lack values for some or even all of the parallel measurements of a given time point.
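In code, such a time series with unequal and partly missing replicates can be held as one list of values per time point (a column of Table 4-1). The following Python sketch, with a hypothetical function name, drops missing values (None or NaN) and enforces the prerequisite stated above that every time point retains at least two parallel measurements; how the FX tools actually store and filter their input is not specified here.

```python
import math
from typing import List, Optional, Sequence

# Illustrative sketch; prepare_groups is a hypothetical helper.

def prepare_groups(series: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Return one list of valid replicate values per time point.

    Missing values (None or NaN) are dropped; a ValueError is raised if a time
    point keeps fewer than two parallel measurements.
    """
    groups: List[List[float]] = []
    for j, replicates in enumerate(series):
        values = [float(x) for x in replicates if x is not None and not math.isnan(x)]
        if len(values) < 2:
            raise ValueError(f"time point {j} has fewer than two valid replicates")
        groups.append(values)
    return groups

print(prepare_groups([[1.0, None, 2.0], [3.0, float("nan"), 4.0, 5.0]]))
```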

Table 4-1 shows the layout of a general time series experiment with several parallels, together with the quantities used in the analysis of variance of the data. The columns of this table represent the time points of a time series; thus every data row can be seen as a single time series. The last three rows define quantities that are useful in the analysis of variance.

4 Development of the tools: **Feature** **eXtraction** 37

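Table 4-1 Basic quantities used in ANOVA. The table is adapted from (Ledermann, 1980).

| Treatment no. | 1 | 2 | ··· | k | Total |
|---|---|---|---|---|---|
| Data | $x_{11}$ | $x_{12}$ | ··· | $x_{1k}$ | |
| | $x_{21}$ | $x_{22}$ | ··· | $x_{2k}$ | |
| | ⋮ | ⋮ | | ⋮ | |
| | $x_{i1}$ | $x_{i2}$ | ··· | $x_{ik}$ | |
| | ⋮ | ⋮ | | ⋮ | |
| | $x_{n_1 1}$ | $x_{n_2 2}$ | ··· | $x_{n_k k}$ | |
| Total | $C_1 = \sum_{i=1}^{n_1} x_{i1}$ | $C_2 = \sum_{i=1}^{n_2} x_{i2}$ | ··· | $C_k = \sum_{i=1}^{n_k} x_{ik}$ | $G = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}$ |
| Number of observations | $n_1$ | $n_2$ | ··· | $n_k$ | $n = \sum_{j=1}^{k} n_j$ |
| Average | $c_1 = \bar{x}_{\cdot 1} = C_1/n_1$ | $c_2 = \bar{x}_{\cdot 2} = C_2/n_2$ | ··· | $c_k = \bar{x}_{\cdot k} = C_k/n_k$ | $g = \bar{x}_{\cdot\cdot} = G/n$ |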

An important statistic for the analysis of variance is the sample sum of squares. It is in general defined as

$$SS = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

where $y_1, y_2, \ldots, y_n$ is a sample from some population and $\bar{y}$ is the mean of the sample. The sample sum of squares is a quantity that reflects the deviation of the sample observations from their sample mean. Using the terminology defined in Table 4-1, the total sample sum of squares, $SS_t$, for a time series experiment as illustrated in Table 4-1 can be written as

$$SS_t = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{\cdot\cdot})^2.$$

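It can be shown that the total sample sum of squares can be partitioned as follows:

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{\cdot\cdot})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{\cdot j})^2 + \sum_{j=1}^{k} n_j (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})^2,$$

$$SS_t = SS_w + SS_b.$$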

The total sum of squares can be written as the sum of the within-group sample sum of squares, $SS_w$, and the between-groups sample sum of squares, $SS_b$. The sampling expectation of $SS_w$ is $(n-k)\sigma^2$, where $\sigma^2$ is the common variance of the measurement. It is intuitive that the within-group sample sum of squares reflects unsystematic variation only, as the treatment conditions are identical for the parallel measurements of a group (e.g., the parallel measurements for a given time point). If there are no treatment effects, the sampling expectation of the between-groups sample sum of squares, $SS_b$, is $(k-1)\sigma^2$; all group means are considered to be equal in this case. However, if there is a treatment effect, the sampling expectation of $SS_b$ increases with the size of the treatment effect, while the within-group sample sum of squares is not affected. Thus the quantity

$$f = \frac{SS_b / (\sigma^2 (k-1))}{SS_w / (\sigma^2 (n-k))} = \frac{SS_b / (k-1)}{SS_w / (n-k)}$$

is an appropriate test statistic, large values of which indicate a real treatment effect.
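The partition and the resulting test statistic can be checked numerically. The Python sketch below computes $SS_t$, $SS_w$ and $SS_b$ directly from their definitions for a list of groups (one list of parallel measurements per time point) and forms f; the function names are hypothetical, and the toy data are made up for illustration.

```python
from typing import Sequence, Tuple

# Illustrative sketch; function names are hypothetical.

def sums_of_squares(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (SS_t, SS_w, SS_b) from their definitions; groups[j] holds time point j."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_t = sum((x - grand_mean) ** 2 for x in all_values)
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    return ss_t, ss_w, ss_b

def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """f = (SS_b / (k - 1)) / (SS_w / (n - k))."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    _, ss_w, ss_b = sums_of_squares(groups)
    return (ss_b / (k - 1)) / (ss_w / (n - k))

print(sums_of_squares([[1, 2, 3], [4, 5, 6]]))  # (17.5, 4.0, 13.5)
```

For this toy series $SS_t = SS_w + SS_b$ holds exactly, and f = 13.5 on 1 and 4 degrees of freedom.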

The sums of squares used in this statistic are normalized by the between-groups degrees of freedom, $dof_b = k - 1$ (the number of time points minus 1), and by the within-group degrees of freedom, $dof_w = n - k$ (the total number of observations minus the number of time points), to obtain the between-groups mean square, $MS_b$, and the within-group mean square, $MS_w$, respectively. Thus the f-statistic can be written as

$$f = \frac{MS_b}{MS_w}.$$

The f-statistic is employed by the F-test (see 4.1.1.4) and by Scheffe's S-method (see 4.1.1.7). In practice it is convenient to compute the quantities needed for this statistic as follows. In addition to the quantities defined in Table 4-1, we define the sum of squares of all observations,

$$S = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}^2.$$

Then, using the terminology from Table 4-1, we can compute the within-group sample sum of squares and the between-groups sample sum of squares as

$$SS_w = S - (C_1 c_1 + C_2 c_2 + \ldots + C_k c_k)$$

and

$$SS_b = C_1 c_1 + C_2 c_2 + \ldots + C_k c_k - G \cdot g.$$
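The following Python sketch implements these computational formulas using only the column totals $C_j$, the averages $c_j$, the grand total G, the grand average g and S, and forms f from the mean squares. It is an illustration under our own naming (the function name is hypothetical). One design remark: formulas of the type $S - \sum_j C_j c_j$ subtract two large, nearly equal numbers and can lose precision in floating point when the signal level is large compared with its variation; the definitional form avoids this at the cost of a second pass over the data.

```python
from typing import Sequence, Tuple

# Illustrative sketch of the Table 4-1 shortcut formulas; the name is hypothetical.

def shortcut_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (SS_w, SS_b, f) from C_j, c_j, G, g and S."""
    k = len(groups)
    n = sum(len(grp) for grp in groups)                # total number of observations
    C = [sum(grp) for grp in groups]                   # column totals C_j
    c = [Cj / len(grp) for Cj, grp in zip(C, groups)]  # column averages c_j
    G = sum(C)                                         # grand total
    g = G / n                                          # grand average
    S = sum(x * x for grp in groups for x in grp)      # sum of squares of all observations
    weighted = sum(Cj * cj for Cj, cj in zip(C, c))    # C_1 c_1 + ... + C_k c_k
    ss_w = S - weighted
    ss_b = weighted - G * g
    f = (ss_b / (k - 1)) / (ss_w / (n - k))            # MS_b / MS_w
    return ss_w, ss_b, f

print(shortcut_anova([[1, 2, 3], [4, 5, 6]]))  # (4.0, 13.5, 13.5)
```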

The model we use for the analysis of variance makes two fundamental assumptions: first, the measurements are assumed to have Normal distribution; second, the distribution of the measurement error is assumed to be homogeneous over all treatment groups. The next two subsections briefly deal with these two assumptions.

4.1.1.2 Test for normality

R.A. Fisher commented on the importance of Normality from a practical point of view: "Departures from Normal form, unless very strongly marked, can only be detected in large samples; conversely, they make little difference to statistical tests on other questions". Although this rather general statement cannot be used as an argument against testing for Normality, it indicates that large samples are needed to do so. As the microarray data available to us provide a maximum of around five samples per time point, it is not very sensible to test the samples for Normality. For the data used here we assume that the measurement values have Normal distribution.

Since testing for Normality is of great interest, many tests are available. The tests approach the problem from different directions. For a sample from a Normal distribution, the sample skewness and the sample excess kurtosis are (asymptotically) Normally distributed with mean zero and known standard deviation; this property can be used to test for Normality. Another way to test for Normality is to test the goodness of fit of the empirical distribution to the Normal distribution. Either the $\chi^2$ or the Kolmogorov-Smirnov test for goodness of fit (both in various modifications) is used in this case.
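The moment-based approach can be sketched as follows in Python. The sketch computes the sample skewness $g_1$ and excess kurtosis $g_2$ and divides them by their large-sample standard errors $\sqrt{6/n}$ and $\sqrt{24/n}$. These asymptotic standard errors are a textbook approximation assumed here; they are unreliable for samples as small as the five or so replicates per time point mentioned above, which is consistent with the decision not to test for Normality. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

# Illustrative sketch; uses asymptotic standard errors (an assumption for small n).

def moment_normality_z(sample: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (g1, g2, z_skew, z_kurt) for a sample."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in sample) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in sample) / n   # fourth central moment
    g1 = m3 / m2 ** 1.5                             # sample skewness
    g2 = m4 / m2 ** 2 - 3.0                         # sample excess kurtosis
    return g1, g2, g1 / math.sqrt(6.0 / n), g2 / math.sqrt(24.0 / n)

print(moment_normality_z([1, 2, 3, 4, 5]))
```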

4.1.1.3 Test for homogeneity of variance

Our model of the time series assumes that only the group means are affected by the treatment; thus, if a treatment effect is present (i.e., the time series changes its signal level), the mean of a group is shifted up or down. The variance of the groups is assumed to be unaffected even if the treatment has an effect on the group means. This property is usually referred to as 'homogeneity of variance'.

As we have no knowledge about the effect of the treatments upon the variance, we obviously want to check this assumption. There are several tests available to test for homogeneity of variance, and depending on the exact situation one or another is recommended. The most popular test, probably because it is based on an easy-to-use $\chi^2$ statistic, is Bartlett's test. However, this test is only recommended for data that are known to have Normal distribution, as the test is very sensitive to departures from Normality. In addition, this test is recommended only in situations where the number of samples per group exceeds ten. Neither condition will be fulfilled in many cases of sparse time series data; thus we did not implement Bartlett's test.

A test that is quite robust to departures from Normality, but is likewise recommended for sample sizes that exceed ten, is Levene's test. The calculation of the significance of this statistic is straightforward in our case, as it is done via the F-ratio distribution (4.1.1.4).
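Levene's statistic is, in its original form, a one-way ANOVA f computed on the absolute deviations of the observations from their group means, and it is referred to the F-distribution on k-1 and n-k degrees of freedom. The Python sketch below follows that textbook form. Whether the FX implementation centers on the mean or on the median (the Brown-Forsythe variant) is not stated here, so the choice of the mean is an assumption, as is the function name.

```python
from typing import Sequence

# Illustrative sketch of Levene's statistic (mean-centered form); name is hypothetical.

def levene_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA f on |x_ij - mean_j|; approx. F(k-1, n-k) under equal variances."""
    deviations = []
    for g in groups:
        m = sum(g) / len(g)
        deviations.append([abs(x - m) for x in g])   # absolute deviations per group
    k = len(deviations)
    n = sum(len(d) for d in deviations)
    grand = sum(x for d in deviations for x in d) / n
    means = [sum(d) / len(d) for d in deviations]
    ss_b = sum(len(d) * (m - grand) ** 2 for d, m in zip(deviations, means))
    ss_w = sum((x - m) ** 2 for d, m in zip(deviations, means) for x in d)
    return (ss_b / (k - 1)) / (ss_w / (n - k))

print(levene_statistic([[1, 2, 3], [1, 3, 5]]))
```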

A test that is recommended by several authors (Winer, 1971a; Sachs, 1999) in cases of small sample sizes is Cochran's test. This test is especially useful in cases where the variance of a single group is much bigger than the other variances. The statistic for k treatment groups is

$$q_{Cochran} = \frac{s_{max}^2}{\sum_{j=1}^{k} s_j^2}.$$

The maximal within-group variance, $s_{max}^2$, is divided by the sum of all within-group variances, $s_j^2$. A transformation of the statistic can be used to assess the significance of the null hypothesis of equal variances via the F-ratio distribution (see 4.1.1.4). Using this transformation we compute the significance level, SL, as

( ( ) )

( ) ( )

1

SL = Pk

⋅ F k −1

⋅ n,

n >

.

qCochran

−1

⋅ k −1

Due to the numerical properties of this transformation and the limited precision of the calculation on a computer, the significance level may slightly exceed one. In this case we adjust the significance level by subtracting the computed value from 2, thus producing a significance level that is slightly smaller than 1. In this expression n denotes the harmonic mean of the sample sizes of the groups.
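The statistic itself is easy to compute. The Python sketch below returns $q_{Cochran}$ from the unbiased within-group variances and provides the harmonic mean of the group sizes, which is defined next. For the F-scale it uses the relation commonly given in the literature for Cochran's critical values, $q = 1/(1 + (k-1)/F)$, i.e. $F = (k-1)/(1/q - 1)$, with F referred to the F-distribution on $n-1$ and $(k-1)(n-1)$ degrees of freedom; this relation is an assumption here and may differ in detail from the transformation used above. All function names are hypothetical.

```python
from typing import Sequence

# Illustrative sketch; names are hypothetical and the F relation is an assumption.

def cochran_q(groups: Sequence[Sequence[float]]) -> float:
    """q_Cochran = s_max^2 / sum_j s_j^2, with unbiased within-group variances."""
    variances = []
    for g in groups:
        mean = sum(g) / len(g)
        variances.append(sum((x - mean) ** 2 for x in g) / (len(g) - 1))
    return max(variances) / sum(variances)

def harmonic_mean(sizes: Sequence[int]) -> float:
    """n_H = k / sum_j (1 / n_j) for positive group sizes n_j."""
    return len(sizes) / sum(1.0 / n_j for n_j in sizes)

def cochran_f_value(q: float, k: int) -> float:
    """F-scale value (k - 1) / (1/q - 1) from the assumed relation q = 1 / (1 + (k - 1)/F).

    Requires 1/k <= q < 1.
    """
    return (k - 1) / (1.0 / q - 1.0)

q = cochran_q([[0, 2], [1, 3], [0, 4]])
print(q, cochran_f_value(q, 3), harmonic_mean([2, 2, 2]))
```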

In general the harmonic mean, $n_H$, is defined as

$$n_H = \frac{k}{\sum_{j=1}^{k} \frac{1}{n_j}} \qquad \text{with } n_j > 0.$$

4.1.1.4 The F-test

In order to find significant changes in a time series, the first step is to test whether a time series can be considered constant or not. This is done by a test that is based on the so-called F-ratio distribution, the F-test.

The weighted ratio of two mutually independent chi-squared variables, $K_a$ and $K_b$, on a and b degrees of freedom respectively, induces a random variable

$$F(a, b) = \frac{K_a / a}{K_b / b}$$

that has the so-called variance-ratio or F-distribution on a and b degrees of freedom. The chi-squared variables are normalized by their degrees of freedom for convenience. With this scaling the expectation of F(a, b) is close to unity; to be precise, it is $b/(b-2)$ for $b > 2$ (note that the expectation depends only on b).

It can be shown that, in case of no treatment effect, the between-groups sample sum of squares, $SS_b$, and the within-group sample sum of squares, $SS_w$, divided by the common variance of the measurement, are independent chi-squared variables on k-1 and n-k degrees of freedom respectively (Scheffe, 1959). Thus the ratio

$$f = \frac{MS_b}{MS_w} = \frac{SS_b / (k-1)}{SS_w / (n-k)} = \frac{SS_b / (\sigma^2 (k-1))}{SS_w / (\sigma^2 (n-k))} \qquad \text{(Eq. 1)}$$

has F-distribution. The sampling distribution of f is the F(k-1, n-k) distribution, and the significance level of the data as an indicator against the truth of the null hypothesis "there is no treatment effect, thus all group means are equal" is

$$SL = P\{F(k-1, n-k) > f\}.$$

The significance level, SL, is the probability content of the F-distribution (with the appropriate degrees of freedom) for all values greater than f. This means that a small value of SL discredits the null hypothesis, while large values of SL are consistent with the null hypothesis. Table 4-2 shows conventional interpretations of the significance level.

Table 4-2 Conventional interpretation of significance levels as indicators for the truth of the null hypothesis H. The table is taken from (Ledermann, 1980).

| Significance level | Interpretation |
|---|---|
| > 0.10 | Data consistent with H. |
| 0.05 | Possibly significant. Some doubt cast on the truth of H. |
| 0.02 | Significant. Rather strong evidence against H. |
| 0.01 | Highly significant. H is almost certainly invalidated. |
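The significance level can be computed without a statistics library through the regularized incomplete beta function $I_x(a, b)$, using the identity $P\{F(d_1, d_2) > f\} = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$. The Python sketch below evaluates $I_x$ with the standard continued-fraction expansion (in the style of Numerical Recipes). It is a self-contained illustration with hypothetical function names, not the routine used by the FX tools; in practice a library function such as SciPy's `scipy.stats.f.sf` serves the same purpose.

```python
import math

# Illustrative sketch; function names are hypothetical.

def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-15) -> float:
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))   # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):        # continued fraction converges fast here
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b   # use the symmetry relation

def f_significance_level(f: float, dof_b: int, dof_w: int) -> float:
    """SL = P{F(dof_b, dof_w) > f}."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(dof_w / 2.0, dof_b / 2.0, dof_w / (dof_w + dof_b * f))

print(f_significance_level(5.9525, 3, 12))
```

With $dof_b = 3$ and $dof_w = 12$ (four time points with four parallels each), an f-value of about 5.95 yields SL ≈ 0.01, the "highly significant" boundary of Table 4-2.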

If the F-test indicates that the time series is non-constant, we want to figure out which group means differ significantly from each other. We do "post hoc" or "a posteriori" testing in this case. These tests are suggested by the results of the experiment, in contrast to tests that are used to evaluate the results of an experiment no matter how these results may look. A posteriori tests are conditional on the observed data.

An obvious way to find out which group means are significantly different is to compare them pairwise. However, one has to be careful if this is done. Suppose we use some kind of test to decide whether two group means differ significantly or not. The null hypothesis for this test is $\mu_r = \mu_s$. We define some low significance level p (see Table 4-2) as the critical level. If the test returns a significance level below the critical level, the null hypothesis is considered to be false. This means that with probability p we reject our null hypothesis although it is correct; we operate with a rate of rejecting a true null hypothesis that is exactly p. This false-positive behavior of a statistical test is called the Type I error of the test. If we repeat a test that is designed for a single pairwise comparison many times, there is a probability of p for every test to reject the null hypothesis although it is true. Thus the overall probability that all the comparisons are correct decreases; or, to put it the other way round, the overall probability of false positives increases. This has to be avoided.
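How quickly the overall error grows can be illustrated with the familiar formula $1 - (1 - p)^m$ for the probability of at least one false rejection among m tests at level p. The Python sketch below applies it to all $k(k-1)/2$ pairwise comparisons of k time point averages. It assumes independent tests, which pairwise comparisons of means sharing the same data are not, so the numbers are only indicative. Function names are hypothetical.

```python
# Illustrative sketch; assumes independent tests, names are hypothetical.

def familywise_error(p: float, m: int) -> float:
    """Probability of at least one false rejection in m independent tests at level p."""
    return 1.0 - (1.0 - p) ** m

def n_pairs(k: int) -> int:
    """Number of pairwise comparisons among k time point averages."""
    return k * (k - 1) // 2

for k in (4, 6, 10):
    m = n_pairs(k)
    print(k, m, round(familywise_error(0.05, m), 3))
```

With p = 0.05, the six pairwise comparisons of four time points already give an overall false-positive probability of about 0.26.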

If the F-test indicates that the time series is non-constant, we compare all possible pairs of time point averages to establish which pairs are significantly different from each other. To do so we use methods that explicitly allow for a posteriori testing and multiple comparisons of means in an ANOVA setting. A variety of such methods is available (Winer, 1971b), of which we have implemented the two most conservative ones, the Tukey A test (4.1.1.6) and the Scheffe S-method (4.1.1.7).

4.1.1.5 Power of the F-test

In the previous section we briefly discussed the Type I error of a statistical test: if the null hypothesis is rejected although it is true in reality, a Type I error occurs. The choice of the critical level determines the probability of a Type I error; the critical level is usually chosen small enough that the probability of rejecting a true null hypothesis is very small.

A Type II error occurs if a false null hypothesis is accepted. Obviously it is desirable to have a statistical test that rejects a false null hypothesis with high probability. In the time series analysis context this means that if there was a significant change of the signal level, we would like to detect this change with high probability. Such a test is called sensitive and has high power. Sensitivity and power are statistical concepts that come from different fields of statistical theory, and although their numerical result (i.e., the probability that a significant change of the signal level is detected) is identical for a given critical value, there is a somewhat philosophical difference between these concepts which we are not going to discuss further (for a reference see (Ledermann, 1980)). We use the term 'power' from here on, as it is more widely used in applied statistics.

Given the null hypothesis that the signal level of the time series (as modeled in 4.1.1.1) does not change during its course, the test statistic f (Eq. 1) is distributed according to the F-distribution. If the time series does display significant changes during its course, the distribution of the test statistic is different. We will show in the following that it has the so-called non-central F-distribution (as opposed to the central F-distribution of the former case).

Let W be the sum of a squared Normals $V_j \sim N(\mu_j, 1)$, $j \in [1, a]$, with means $\mu_j$ and unit standard deviation,

$$W = V_1^2 + V_2^2 + \ldots + V_j^2 + \ldots + V_a^2;$$

then W has the so-called non-central chi-squared distribution on a degrees of freedom with non-centrality parameter λ,

$$\lambda = \frac{1}{2} \sum_{j=1}^{a} \mu_j^2.$$

Let Z (independent of W) have ordinary chi-squared distribution on b degrees of freedom; then the ratio of W and Z, both normalized by their respective degrees of freedom,

$$F(a, b, \lambda) = \frac{W / a}{Z / b},$$

has non-central F-distribution with a and b degrees of freedom and non-centrality parameter λ. If λ equals zero, then W is a sum of squared Standard Normals; thus it has (just as Z) ordinary chi-squared distribution, and the non-central F-distribution becomes the central F-distribution on a and b degrees of freedom (see 4.1.1.4).

The difference between the total sample sum of squares and the within-group sample sum of squares, which equals the between-groups sample sum of squares, can be written as a sum of squared independent Normals. It can be shown that

$$SS_t - SS_w = \sum_{j=1}^{k} n_j z_j^2,$$

where the $z_j$ are independently distributed as Normal($\xi_j$, $\sigma^2/n_j$) (Scheffe, 1959); here $\sigma^2$ is the common variance of the measurement and $n_j$ is the number of samples in group j. If we normalize the $z_j$ to unit standard deviation, the expression

$$\sum_{j=1}^{k} \frac{n_j z_j^2}{\sigma^2}$$

is distributed as non-central chi-squared on k-1 degrees of freedom with non-centrality parameter λ:

$$\chi^2_{k-1,\lambda}, \qquad \lambda = \frac{1}{2} \sum_{j=1}^{k} \frac{n_j \xi_j^2}{\sigma^2}.$$

Thus, in the case of a significant treatment effect, the between-groups sample sum of squares (divided by $\sigma^2$) is non-central chi-squared distributed with k-1 degrees of freedom and non-centrality parameter λ.

Now consider the F-ratio. It is intuitive that the treatment effect has no effect on the within-group sample sum of squares, as it represents the variance due to unsystematic variation only. It follows that, in the case of a significant treatment effect, the sampling distribution of the test statistic f (Eq. 1) is the non-central F-distribution

$$F(k-1, n-k, \lambda) = \frac{MS_{b,\lambda}}{MS_w},$$

where the between-groups sample sum of squares has non-central chi-squared distribution on k-1 degrees of freedom with non-centrality parameter λ, and the within-group sample sum of squares has (central) chi-squared distribution on n-k degrees of freedom.

The central F-distribution is the null distribution of the test statistic f (Eq. 1), corresponding to the null hypothesis of no treatment effect. In the case of a treatment effect, the test statistic f (Eq. 1) is distributed according to an alternative distribution, the non-central F-distribution. The power of the test is its capability to detect that an f-value does indeed come from the alternative distribution. For a certain fraction of the f-values, the size of which depends on the size of the treatment effect, this is not possible. Figure 4-1 illustrates this. It shows the probability density functions of the central (red) and a non-central (blue) F-distribution for a given size of the treatment effect. The parameters in this example are chosen to be realistic for time series obtained from microarray experiments.

[Figure 4-1 plot: pdf(x) of the central and the non-central F-distribution; the critical level for p=0.01 is marked.]

Figure 4-1 The probability density function of the central (red) and non-central (blue) F-ratio distribution. The parameters are chosen to represent a likely situation for biological time series (4 time points, 4 parallels). The degrees of freedom are dof_b=3 and dof_w=12; the non-centrality parameter is chosen as λ^{1/2}=4. The gray line indicates the critical level; any value bigger than this will reject the null hypothesis of no treatment effect. See text for details.

The two distributions clearly overlap. The red line represents the null hypothesis of no treatment effect. With respect to this distribution, any f-value bigger than ~6 (marked by the gray line) has a probability of at most p=0.01 of coming from this distribution; we reject the null hypothesis if such an f-value is encountered. But the probability density function of the alternative, non-central F-distribution (blue line) has a considerable probability content to the left of the gray line. This means that there is a quite high probability that an f-value smaller than or equal to ~6 comes from the non-central F-ratio distribution. In other words, there is a rather high probability of accepting the null hypothesis although it is false: Type II errors will occur.

The probability of a Type II error, P_TypeII, with respect to a given critical level p can be illustrated graphically as the area under the blue line (the probability content) to the left of the gray line. The power of the test is defined as 1 − P_TypeII. Thus, to compute the power we first compute the inverse cumulative distribution value of the central F-distribution at the specified significance threshold (≅6 for p=0.01 in our example). The probability of a Type II error then equals the cumulative distribution function of the non-central F-distribution at this value. Let

$$C(x) = P\{F_{(k-1,\,n-k)} < x\}$$

be the cumulative distribution function of the central F-ratio distribution. Then the probability of a Type II error that is induced by the choice of the significance threshold p is

$$P_{TypeII} = P\{F_{(k-1,\,n-k,\,\lambda)} < C^{-1}(1-p)\},$$

and the power of the F-test is

$$\text{Power} = 1 - P_{TypeII}.$$

The example above makes clear that the power of the F-test (and the same holds for any statistical test in general) depends on the choice of the significance threshold. Increasing the significance threshold increases the probability of a Type I error while simultaneously decreasing the probability of a Type II error, and vice versa. Thus the power of the F-test depends on the choice of the significance threshold, on the degrees of freedom of the between-groups and the within-group sample sums of squares, and on the non-centrality parameter λ.

The non-centrality parameter is a rather unintuitive quantity. However, if we use its square root

$$\phi = \sqrt{\lambda} = \frac{\sqrt{\sum_{j=1}^{k} n_j \cdot \xi_j^2}}{\sigma},$$

we can interpret it more intuitively. The parameter φ represents the size of the treatment effect in units of the common standard deviation of the measurement or, more succinctly, the size of the treatment effect in units of the background error. Given a constant background error, the non-centrality parameter increases with the treatment effect. The bigger the non-centrality parameter, the smaller the overlap between the null and the alternative distribution; thus the probability of a Type II error decreases and the power of the test increases.

The relations between the two degrees of freedom and φ can help to plan an experiment. Suppose a time series experiment with five time points is planned and an estimate of the measurement error has already been obtained: how many data points are needed to detect a treatment effect of a certain size? Conversely, if the experiment has already been performed, for example with five time points and four parallel measurements, how big must a treatment effect be to be detected with decent power? We have implemented several small programs that illustrate these relations and thus give quick answers to questions like these when planning and analyzing a time series experiment.
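Planning questions of this kind can also be explored by simulation. The sketch below is not one of the programs mentioned above; it estimates the power of the F-test under assumptions of our own: σ=1, the same number r of parallels at every time point, and a hypothetical step-shaped effect in which the first half of the time points lies at 0 and the rest at `delta` (in units of σ). The critical value is taken from a simulated null distribution rather than from an F-table.

```python
import random

def f_statistic(groups):
    """One-way ANOVA test statistic f = MS_b / MS_w for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_b / (k - 1)) / (ss_w / (n - k))

def mc_power(k, r, delta, p=0.01, reps=5000, seed=1):
    """Simulated (critical value, power) of the F-test for k time points with
    r parallels each, sigma = 1 and a hypothetical step of height delta."""
    rng = random.Random(seed)
    # assumed effect shape: first half of the time points at 0, the rest at delta
    step = [0.0] * (k // 2) + [delta] * (k - k // 2)

    def simulate(true_means):
        groups = [[mu + rng.gauss(0.0, 1.0) for _ in range(r)] for mu in true_means]
        return f_statistic(groups)

    null = sorted(simulate([0.0] * k) for _ in range(reps))
    crit = null[int((1.0 - p) * reps)]  # empirical (1 - p) quantile under H0
    hits = sum(simulate(step) > crit for _ in range(reps))
    return crit, hits / reps

# Power for a fixed step of 1.5 sigma as the number of parallels grows.
for r in (2, 4, 6):
    print(r, mc_power(5, r, 1.5, reps=2000)[1])
```

Because the power depends only on λ, any other effect shape with the same λ would give the same answer up to simulation noise; the step shape is merely a convenient choice for time series.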

In the following it will be shown that Scheffe's S-method has basically the same power as the F-test. This makes the considerations in this chapter especially useful, as they can be applied directly to the detection of jumps in a time series by the S-method. The Tukey A test is known to have more power than the S-method; however, to our knowledge its exact power cannot be computed analytically.

4.1.1.6 The Tukey A test

The Tukey A test is one of the two tests we implemented to detect which group means differ significantly from each other. The test is based on the Studentized range distribution.

The range r_n of a sample of observations (x_1, x_2, …, x_n) on a random variable X is the difference between the largest observation, x_L, and the smallest observation, x_S. Forming the standardized range r_n/σ_p requires the population standard deviation σ_p to be known, which is almost never the case. Thus σ_p is replaced by a suitable estimate. If a statistic v is available that is independent of r_n, and if v/σ_p² is a chi-squared variable on b degrees of freedom, then the quantity

$$\sqrt{v/b}$$

is a suitable estimate for σ_p, and the statistic

$$q = \frac{r_n}{\sqrt{v/b}}$$

is called the Studentized range.

We want to compare the group means of a time series. In this setting the distribution of the statistic

$$q = \frac{c_L - c_S}{\sqrt{MS_w / n_L}},$$

with n_L = n_S, can be approximated by the Studentized range distribution Q with parameters k, the number of treatment groups, and dof_w, the degrees of freedom of MS_w (the within-group mean square). Here c_L and c_S are the largest and the smallest group mean, respectively (using the notation of Table 4-1), and n_L, n_S designate the number of samples in their respective treatment groups. The statistic assumes equal sample sizes for all treatment groups. The significance level returned by this statistic is

$$SL = P\{Q_{(k,\,dof_w)} > q\}.$$

The group means of a time series experiment can be arranged in increasing order of magnitude as shown in Table 4-3.

Table 4-3 The group means of a time series experiment with k time points arranged in increasing order. c(1) designates the smallest group mean, c(k) designates the largest group mean.

Order 1 2 3 … j … k
Mean c(1) c(2) c(3) … c(j) … c(k)

In this arrangement the number of steps between two group means can be regarded as their distance. For example, the distance between c(1) and c(4) is four steps, and the distance between c(2) and c(6) is five steps. In general, the number of steps between two group means c(a) and c(b) (with b>a) in an ordered setting is defined as b−a+1. The sampling distribution of the difference between any two of the ordered group means is a truncated Studentized range distribution. This statistic is called the q_r statistic, where r designates the number of steps between the two group means. There is only one q_{r=k} statistic,

$$q_{r=k} = \frac{c_{(k)} - c_{(1)}}{\sqrt{MS_w/n_L}},$$

and this corresponds to the ordinary Studentized range statistic with k treatment groups and dof_w degrees of freedom of the within-group mean square. For any other number of steps there are two or more q_r statistics, which have the sampling distribution of a truncated Studentized range distribution. For example, there are two q_{r=k−1} statistics,

$$q_{r=k-1} = \frac{c_{(k-1)} - c_{(1)}}{\sqrt{MS_w/n_L}} \quad\text{and}\quad q_{r=k-1} = \frac{c_{(k)} - c_{(2)}}{\sqrt{MS_w/n_L}}.$$

The sampling distribution of a q_r statistic can be approximated by the Studentized range distribution on r treatment groups and dof_w degrees of freedom of the within-group mean square.

The difference between two group means that is needed to give a low significance level of, say, p becomes smaller with a decreasing number of steps between the group means. R.A. Fisher proposed the use of the Studentized range statistic of the difference between the largest and the smallest group mean, q_{r=k}, for all the pairwise comparisons of treatment means that can be made in a single-factor experiment. This approach was studied and extended by Tukey, who showed that this statistic can be used even if the collection of all pairwise comparisons is regarded as a single test. Thus we can be sure that no fewer than 100(1−p) percent of all the statements made are indeed true with respect to the Type I error. This statistic is called the Tukey A statistic or the honestly significant difference (hsd). The Tukey A statistic is defined as

$$q_T = \frac{|c_a - c_b|}{\sqrt{MS_w/n}},$$

where c_a and c_b are two arbitrary treatment means (a≠b), and n = n_a = n_b is the number of samples per treatment group. The statistic is approximated by a Studentized range distribution on k treatment groups and dof_w degrees of freedom of the within-group mean square. (The use of the absolute difference simply makes sure that we do not have to worry about the order of the group means with respect to their size.)

In real situations some measurement values are frequently missing. The Tukey A method as described so far assumes an equal sample size for every treatment group. However, if the sample sizes of the two groups that are compared do not differ too much, the harmonic mean of the sample sizes of the treatment groups can be used. For the comparison of two groups the harmonic mean (see 4.1.1.3) simplifies to

$$n_H = \frac{2}{\frac{1}{n_a} + \frac{1}{n_b}},$$

where n_a and n_b are the sample sizes of the first and the second group, respectively. The Tukey A statistic then becomes

$$q_T = \frac{|c_a - c_b|}{\sqrt{MS_w/n_H}}.$$
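A minimal sketch of the Tukey A statistic with the harmonic-mean correction follows. The function names are ours, and MS_w is computed from raw replicate values with dof_w = n − k. The resulting q_T still has to be referred to the Studentized range distribution Q(k, dof_w), which this sketch does not evaluate.

```python
import math

def ms_within(groups):
    """Within-group mean square MS_w = SS_w / (n - k)."""
    n = sum(len(g) for g in groups)
    ss_w = 0.0
    for g in groups:
        m = sum(g) / len(g)
        ss_w += sum((x - m) ** 2 for x in g)
    return ss_w / (n - len(groups))

def harmonic_n(n_a, n_b):
    """Harmonic mean of two sample sizes, n_H = 2 / (1/n_a + 1/n_b)."""
    return 2.0 / (1.0 / n_a + 1.0 / n_b)

def tukey_q(groups, a, b):
    """Tukey A statistic q_T = |c_a - c_b| / sqrt(MS_w / n_H) for groups a and b."""
    c_a = sum(groups[a]) / len(groups[a])
    c_b = sum(groups[b]) / len(groups[b])
    n_h = harmonic_n(len(groups[a]), len(groups[b]))
    return abs(c_a - c_b) / math.sqrt(ms_within(groups) / n_h)

# Hypothetical replicate values for three time points (one value missing at t2).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 9.0]]
print(tukey_q(groups, 0, 2))
```

For equal sample sizes n_H reduces to n, so the same function covers the balanced case.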

The Tukey A method is known to be less conservative than the Scheffe S-method; in other words, it has more power than the Scheffe S-method. However, as far as we know, the exact power of the Tukey A method cannot be computed analytically.

4.1.1.7 The Scheffe test

The Scheffe S-method is the second test we implemented to identify group means that differ significantly. Scheffe developed the S-method for situations in which it is difficult to specify in advance which comparisons of a single-factor experiment in an ANOVA setting are of interest (Scheffe, 1959).

We want to analyze a time series. If there is no effect of time on the time point means (see 4.1.1.1),

$$\mu_1 = \mu_2 = \dots = \mu_j = \dots = \mu_k$$

holds. Thus for any pairwise comparison of treatment means

$$\mu_a - \mu_b = 0,$$

where μ_a and μ_b designate any two treatment means of a time series. The term contrast among the parameters μ_1, μ_2, …, μ_j, …, μ_k designates any linear function of the μ_j, say

$$\theta = \sum_{j=1}^{k} f_j \cdot \mu_j,$$

where the f_j are known constants with the property that they sum to zero,

$$\sum_{j=1}^{k} f_j = 0.$$

If there is no treatment effect, the expected value of the contrast is zero. We can state this null hypothesis more formally as

$$H_0: \theta = 0.$$

We estimate the μ_j using the group averages c_j (in the notation of Table 4-1). Any possible contrast between the treatment group means can thus be estimated by the statistic

$$s = \sum_{j=1}^{k} f_j \cdot c_j .$$

For example, the contrast between the first and the second time point of a time series can be written as

$$(1)\cdot c_1 + (-1)\cdot c_2 + (0)\cdot c_3 + \dots + (0)\cdot c_j + \dots + (0)\cdot c_k = c_1 - c_2 .$$

Here f_1=1, f_2=−1, and all other f_j equal zero. This kind of contrast can be used to evaluate the difference between two group means. Other contrasts are readily constructed and allow for all kinds of comparisons. Scheffe invented the S-method to provide a simultaneous confidence limit for the collection of all possible contrasts. In other words, if the S-method is used to evaluate the significance of every possible contrast against the null hypothesis "The contrast equals zero (i.e., there is no treatment effect)", then the collection of the contrasts, considered as a single test, will have a distinct significance level of, say, p. The interpretation is the same as for the Tukey A test: we can be sure that no fewer than 100(1−p) percent of the statements are true with respect to the Type I error. Scheffe has shown that if the significance level of a single contrast is computed as

$$SL = P\left\{ \left[ (k-1)\cdot F_{(k-1,\,n-k)} \cdot \left(\sum_{j=1}^{k} \frac{f_j^2}{n_j}\right) \cdot MS_w \right]^{1/2} > |s| \right\},$$

and the null hypothesis for every contrast is rejected at the identical critical level, say p, it follows that the collection of all contrasts can be considered to have a significance level of p (in the above sense) (Scheffe, 1959). The equation can be written as

$$SL = P\left\{ (k-1)\cdot F_{(k-1,\,n-k)} > \frac{\left(\sum_{j=1}^{k} f_j \cdot c_j\right)^2}{\sum_{j=1}^{k} f_j^2/n_j} \cdot \frac{1}{MS_w} \right\},$$

or, in short,

$$SL = P\left\{ (k-1)\cdot F_{(k-1,\,n-k)} > \frac{SS_C}{MS_w} \right\}.$$

The statistic used in this equation is the so-called component of variation, SSC. The component of

variation is an important quantity in the analysis of variance. We introduce it here solely because it

provides us with an easy-to-use statistic to test the null hypothesis that a certain contrast is zero (i.e.,

there is no treatment effect on the group means under study).

The F-test can be interpreted in terms of the S-method if we ask how the constant coefficients of a contrast must be chosen to make the component of variation a maximum. These coefficients can always be found, and it can be shown that for the maximal component of variation, SS_Cmax, there holds (Scheffe, 1959)

$$SS_{C\max} = SS_b .$$

This is an intuitive result: the maximal component of variation equals the between-groups sample sum of squares. Thus the contrast that defines the maximal component of variation accounts for the overall treatment effect. From this point of view the F-test can be interpreted as a test evaluating the significance of the data against the null hypothesis that the contrast defining the maximal component of variation is zero. The following formula illustrates this:

$$SL = P\left\{ F_{(k-1,\,n-k)} > f = \frac{MS_b}{MS_w} = \frac{SS_b/(k-1)}{SS_w/(n-k)} = \frac{SS_{C\max}/(k-1)}{MS_w} \right\}.$$

The fact that the maximal component of variation always accounts for the overall treatment effect implies that, if the F-test indicates a significant overall difference at the critical level p, there must be a contrast for which the null hypothesis of being zero is rejected at the critical level p. However, this contrast is not necessarily a contrast of practical interest.
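The identity SS_Cmax = SS_b can be verified numerically. The sketch below uses the closed form of the maximizing coefficients, f_j ∝ n_j(c_j − c̄) with c̄ the sample-size-weighted grand mean; this closed form is not stated in the text above but follows from the Cauchy-Schwarz inequality. The group means and sizes are hypothetical.

```python
def component_of_variation(f, means, sizes):
    """SS_C = (sum f_j c_j)**2 / sum(f_j**2 / n_j) for contrast coefficients f."""
    s = sum(fj * cj for fj, cj in zip(f, means))
    return s * s / sum(fj * fj / nj for fj, nj in zip(f, sizes))

def between_ss(means, sizes):
    """Between-groups sample sum of squares SS_b = sum n_j (c_j - c_bar)**2."""
    grand = sum(nj * cj for nj, cj in zip(sizes, means)) / sum(sizes)
    return sum(nj * (cj - grand) ** 2 for nj, cj in zip(sizes, means))

def maximizing_contrast(means, sizes):
    """Coefficients f_j = n_j (c_j - c_bar); they sum to zero by construction."""
    grand = sum(nj * cj for nj, cj in zip(sizes, means)) / sum(sizes)
    return [nj * (cj - grand) for nj, cj in zip(sizes, means)]

means = [1.0, 2.5, 4.0, 3.0]   # hypothetical group means c_j
sizes = [4, 3, 4, 4]           # hypothetical numbers of parallels n_j
f_max = maximizing_contrast(means, sizes)
print(component_of_variation(f_max, means, sizes), between_ss(means, sizes))
# Any other contrast, e.g. the pairwise contrast of t0 and t2, gives a smaller SS_C.
print(component_of_variation([1, 0, -1, 0], means, sizes))
```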

The FX algorithms need to evaluate the significance of the data against the null hypothesis that a certain contrast equals zero (i.e., displays no difference between its group means). The S-method does this using the component of variation, SS_C, that is defined by the contrast,

$$SL = P\left\{ (k-1)\cdot F_{(k-1,\,n-k)} > \frac{SS_C}{MS_w} \right\},$$

or, equivalently,

$$SL = P\left\{ F_{(k-1,\,n-k)} > \frac{SS_C/(k-1)}{MS_w} \right\}. \qquad \text{(Eq. 2)}$$

The last formulation shows that the S-method operates, in a certain sense, with the same power as the F-test. The null hypothesis of a zero contrast will be rejected at a critical level of, say, p if the component of variation accounts for a treatment effect that is big enough to cause the F-test to reject the null hypothesis of no treatment effect.

If the F-test indicates a significant overall treatment effect, the S-method can be used to evaluate which pairwise contrasts are significantly different from zero. The statistic of choice is (see Eq. 2)

$$f_S = \frac{SS_C/(k-1)}{MS_w}. \qquad \text{(Eq. 3)}$$

Thus we have to compute the component of variation that is defined by the contrast under study. In general, a component of variation is defined as

$$SS_C = \frac{\left(\sum_{j=1}^{k} f_j \cdot c_j\right)^2}{\sum_{j=1}^{k} f_j^2/n_j}.$$

The contrast for a single pairwise comparison can easily be specified as

$$s = (1)\cdot c_a + (-1)\cdot c_b ,$$

where c_a and c_b designate the means of the treatment groups that are compared. The component of variation due to this contrast is (with n_a and n_b the number of samples in the two treatment groups)

$$SS_C = \frac{(c_a - c_b)^2}{\frac{(1)^2}{n_a} + \frac{(-1)^2}{n_b}}. \qquad \text{(Eq. 4)}$$

Using Eq. 3 and Eq. 4, the statistic to test whether there is a significant difference between two group means becomes

$$f_S = \frac{(c_a - c_b)^2}{\left(\frac{1}{n_a} + \frac{1}{n_b}\right)\cdot(k-1)\cdot MS_w},$$

and it is obvious that the f_S statistic can be applied straightforwardly to unequal sample sizes.
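As a sketch (function names are ours, inputs hypothetical), the pairwise f_S statistic can be computed either directly from the last formula or from the general component of variation with coefficients f_a = 1, f_b = −1 (Eq. 3); both routes must agree. Whether a pair is called significant then depends on the significance level of f_S under F(k−1, n−k), which this sketch does not compute.

```python
def fs_pairwise(c_a, c_b, n_a, n_b, k, ms_w):
    """Scheffe pairwise statistic f_S = (c_a - c_b)**2 / ((1/n_a + 1/n_b)(k - 1) MS_w)."""
    return (c_a - c_b) ** 2 / ((1.0 / n_a + 1.0 / n_b) * (k - 1) * ms_w)

def fs_contrast(f, means, sizes, ms_w):
    """General f_S = SS_C / ((k - 1) MS_w) for contrast coefficients f (Eq. 3)."""
    k = len(means)
    s = sum(fj * cj for fj, cj in zip(f, means))
    ss_c = s * s / sum(fj * fj / nj for fj, nj in zip(f, sizes))
    return ss_c / ((k - 1) * ms_w)

means = [1.0, 2.5, 4.0, 3.0]   # hypothetical group means
sizes = [4, 3, 4, 4]           # unequal numbers of parallels are allowed
ms_w = 0.8                     # hypothetical within-group mean square
print(fs_pairwise(means[0], means[1], sizes[0], sizes[1], k=4, ms_w=ms_w))
print(fs_contrast([1, -1, 0, 0], means, sizes, ms_w))
```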

Furthermore, we want to detect the number of discrete signal levels in a time series. Consider a time series with four time points. Suppose there is good reason to believe that the example time series changes its signal level between the second and the third time point. Then the first and the second, and the third and the fourth time point should represent two discrete signal levels. To test this hypothesis we want to compare two meta-groups by a contrast: the meta-group consisting of the first and the second time point, and the meta-group consisting of the third and the fourth time point. The contrast

$$(1)\cdot\frac{n_1}{n_1+n_2}\cdot c_1 + (1)\cdot\frac{n_2}{n_1+n_2}\cdot c_2 + (-1)\cdot\frac{n_3}{n_3+n_4}\cdot c_3 + (-1)\cdot\frac{n_4}{n_3+n_4}\cdot c_4$$

does this. The n_j are the numbers of parallel samples at time point j. This contrast weights each group mean according to its relative sample size within the meta-group it belongs to.
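The weighting can be written down for any partition into two meta-groups. The sketch below (helper names ours, sample sizes and means hypothetical) builds the coefficients n_j/n_ma and −n_j/n_mb, sets all other coefficients to zero, and checks that they sum to zero and that Σ f_j²/n_j reduces to 1/n_ma + 1/n_mb.

```python
def metagroup_contrast(sizes, group_a, group_b):
    """Coefficients f_j = n_j / n_ma for j in group_a, -n_j / n_mb for j in
    group_b, and 0 for all other time points."""
    n_ma = sum(sizes[j] for j in group_a)
    n_mb = sum(sizes[j] for j in group_b)
    f = [0.0] * len(sizes)
    for j in group_a:
        f[j] = sizes[j] / n_ma
    for j in group_b:
        f[j] = -sizes[j] / n_mb
    return f, n_ma, n_mb

def component_of_variation(f, means, sizes):
    """SS_C = (sum f_j c_j)**2 / sum(f_j**2 / n_j)."""
    s = sum(fj * cj for fj, cj in zip(f, means))
    return s * s / sum(fj * fj / nj for fj, nj in zip(f, sizes))

sizes = [4, 4, 3, 4]            # parallels per time point (one value missing at t2)
means = [1.1, 0.9, 2.8, 3.2]    # hypothetical group means
f, n_ma, n_mb = metagroup_contrast(sizes, [0, 1], [2, 3])
print(f)                        # weights of the two meta-groups
print(component_of_variation(f, means, sizes))
```

With single-member meta-groups the coefficients reduce to 1 and −1, i.e. to the pairwise contrast.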

The contrast of any two meta-groups in a time series with an arbitrary number of measurement points can be constructed in a similar fashion. In general, let c_j^a be a group mean of the first meta-group, c_j^b a group mean of the second meta-group, and c_j^c any group mean that belongs to neither the first nor the second meta-group (every group belongs to exactly one meta-group). The coefficients for the c_j^c are set to zero. Let n_ma be the overall number of samples in the first meta-group and n_mb the overall number of samples in the second meta-group; then the coefficients for the means of the first and the second meta-group can be expressed as

$$f_j^a = \frac{n_j}{n_{ma}} \quad\text{and}\quad f_j^b = (-1)\cdot\frac{n_j}{n_{mb}}.$$

Then the contrast of the meta-groups is

$$s = \sum_j f_j^a \cdot c_j^a + \sum_j f_j^b \cdot c_j^b .$$

The component of variation defined by this contrast is

$$SS_C = \frac{\left(\sum_j f_j^a \cdot c_j^a + \sum_j f_j^b \cdot c_j^b\right)^2}{\frac{1}{n_{ma}} + \frac{1}{n_{mb}}}, \qquad \text{(Eq. 5)}$$

and by the use of Eq. 3 and Eq. 5 the statistic to evaluate the significance of the difference between any two meta-groups becomes

$$f_S = \frac{\left(\sum_j f_j^a \cdot c_j^a + \sum_j f_j^b \cdot c_j^b\right)^2}{\left(\frac{1}{n_{ma}} + \frac{1}{n_{mb}}\right)\cdot(k-1)\cdot MS_w}.$$

4.1.1.8 Calculating the probabilities

Hypothesis testing is usually carried out by calculating the test statistic and then looking up the critical value belonging to a predefined significance level (such as p=0.05 or p=0.01) in a precompiled statistical table. In our case it was more convenient to compute the significance level of the data directly, which was then compared to a prescribed critical level to decide whether the null hypothesis is to be rejected.

The F-distribution is used to compute the significance level in the case of the F-test and of Scheffe's S-method. The f-statistic used for the F-test may be regarded as a special case of the f_S statistic employed by the S-method (see 4.1.1.7). We calculate the significance level as

$$SL = P\{F_{(dof_b,\,dof_w)} > f_S\} = I_B(x, a, b), \qquad x = \frac{dof_w}{dof_w + dof_b \cdot f_S},\quad a = \frac{dof_w}{2},\quad b = \frac{dof_b}{2},$$

where dof_b and dof_w are the degrees of freedom of the between-groups sample sum of squares (in the case of the F-test) or of the component of variation under study (k−1, see Eq. 2; in the case of the Scheffe test), and the degrees of freedom of the within-group sample sum of squares, respectively. I_B(x, a, b), 0 ≤ x ≤ 1, is the regularized incomplete beta function defined by

$$I_B(x, a, b) = \frac{\int_0^x t^{a-1}(1-t)^{b-1}\,dt}{\int_0^1 t^{a-1}(1-t)^{b-1}\,dt}.$$

The regularized incomplete beta function is approximated by a continued fraction expansion as described in Numerical Recipes (Press et al., 1992).

To compute the power of the F-test, the cumulative non-central F-ratio distribution must be computed and the inverse of the cumulative central F-ratio distribution approximated. We use functions provided by the freely available DCDFLIB package for these calculations (Brown et al., 1994).
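The following is a minimal pure-Python sketch of these computations, assuming no statistics library is available. The regularized incomplete beta function follows the continued-fraction scheme of Numerical Recipes; the non-central F cumulative distribution is evaluated with the standard Poisson-mixture series, and the inverse central F distribution by bisection. These are our own substitutes for the DCDFLIB routines used in the text, not ports of them, and all function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_B(x, a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_significance(f, dof_b, dof_w):
    """SL = P{F(dof_b, dof_w) > f} = I_B(dof_w / (dof_w + dof_b f), dof_w/2, dof_b/2)."""
    return betai(dof_w / 2.0, dof_b / 2.0, dof_w / (dof_w + dof_b * f))

def ncf_cdf(x, dof_b, dof_w, lam):
    """P{F(dof_b, dof_w, lam) <= x} as a Poisson(lam/2) mixture of beta CDFs."""
    y = dof_b * x / (dof_b * x + dof_w)
    weight = math.exp(-lam / 2.0)  # Poisson weight for i = 0
    total, cum, i = 0.0, 0.0, 0
    while True:
        total += weight * betai(dof_b / 2.0 + i, dof_w / 2.0, y)
        cum += weight
        if (i > lam / 2.0 and 1.0 - cum < 1e-14) or i > 2000:
            return total
        i += 1
        weight *= (lam / 2.0) / i

def f_critical(p, dof_b, dof_w):
    """Critical value x with P{F(dof_b, dof_w) > x} = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_significance(hi, dof_b, dof_w) > p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_significance(mid, dof_b, dof_w) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_power(p, dof_b, dof_w, lam):
    """Power = 1 - P{F(dof_b, dof_w, lam) < critical value for threshold p}."""
    return 1.0 - ncf_cdf(f_critical(p, dof_b, dof_w), dof_b, dof_w, lam)

# Parameters of Figure 4-1: dof_b = 3, dof_w = 12, lambda^(1/2) = 4, p = 0.01.
# The first value is about 5.95, the "~6" critical level discussed in the text.
print(f_critical(0.01, 3, 12), f_power(0.01, 3, 12, 4.0 ** 2))
```

A useful sanity check is that the power at λ=0 equals the significance threshold p itself, since the non-central distribution then coincides with the central one.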

The Tukey A method is based on the Studentized range distribution. The statistic q_r is used to compute the significance level for k groups and dof degrees of freedom by the expression

$$SL = P\{Q_{(k,\,dof)} > q_r\} = 1 - \frac{dof^{\,dof/2}}{\Gamma(dof/2)\, 2^{dof/2-1}} \int_0^\infty x^{dof-1} \exp\left(-\frac{dof\cdot x^2}{2}\right) w_k(q_r\cdot x)\,dx,$$

where Γ is the Gamma function, and w_k(·) is the cumulative distribution function of the sample range (the difference between the largest and the smallest element in the sample) of a k-element sample taken from a standard Normal distribution. The Studentized range probability integral cannot be calculated exactly, but an approximation algorithm is fortunately available (Lund, 1983). This approximation returns a small constant value (4.59×10⁻⁷ in our implementation) when the mean differences are very large, but this does not present a problem in practice, as this value is much smaller than the significance thresholds we used (on the order of 10⁻² to 10⁻⁴).
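Lund's algorithm is not reproduced here. As a rough cross-check only, the significance level P{Q(k, dof) > q} can be estimated by Monte Carlo simulation directly from the definition of the Studentized range: the range of k standard Normal draws divided by an independent scale estimate sqrt(χ²_dof/dof). This simulation is our own substitute with accuracy limited by the number of draws, not the method used in the FX tools.

```python
import random

def studentized_range_sl(q, k, dof, reps=20000, seed=3):
    """Monte Carlo estimate of SL = P{Q(k, dof) > q}."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        draws = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sample_range = max(draws) - min(draws)
        # independent scale estimate sqrt(chi-squared(dof) / dof), where a
        # chi-squared(dof) variable is Gamma(shape = dof/2, scale = 2)
        s = (rng.gammavariate(dof / 2.0, 2.0) / dof) ** 0.5
        if sample_range / s > q:
            exceed += 1
    return exceed / reps

# Tukey A setting with k = 4 time points and dof_w = 12 (hypothetical q value)
print(studentized_range_sl(4.5, k=4, dof=12))
```

For k=2 and very large dof, Q approaches sqrt(2)·|Z|, so P{Q > 1.96·sqrt(2)} should come out close to 0.05; this gives a quick plausibility check of the simulation.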

4.1.2 Feature eXtraction algorithms

4.1.2.1 Detection of changes and the Feature eXtraction matrix

The F-test, the Tukey A test, and the Scheffe S-method are the statistical tools for Feature eXtraction. The F-test is used first to decide whether a significant overall treatment effect can be detected in a time series. If the null hypothesis of no overall treatment effect is rejected by the F-test, either the Tukey A method or the Scheffe S-method is applied to identify significant differences between the group means of a time series.

[Figure 4-2 plot: signal (0 to 4) versus time points (0 to 9), with error bars.]

Figure 4-2 Example time series running over 10 time points 0…9. In this example the time points are equally spaced, but this is not required in general. The error bars represent standard deviations coming from 4 parallels.

A "synthetic" example time series (not from real experiments) is shown in Figure 4-2. In this example the F-test rejects the null hypothesis. We will use the example to illustrate the Feature eXtraction algorithms. Feature eXtraction produces a string that describes the qualitative "up-and-down" pattern of a time series, so changes of the signal level must be detected. If changes of the signal level always took place between two consecutive time points only, they would be very easy to detect: one would simply compare the group mean at time t_j to the group mean at time t_{j+1}. In general, however, the change of the signal level may be a slow process spanning several time points, say from t_j to t_{j+k} (where k≥1). The differences between all the consecutive group means may be insignificant in such a case, while the difference between the first and the last time point of the interval is significant. Because of this, it may be necessary (in the worst case) to compare all group means pairwise in order to detect all the significant changes of the signal level in a time series. Thus the first step in Feature eXtraction is the construction of a matrix that stores in its (i,j)th element the significance level for the null hypothesis "The contrast of the ith and jth group means under consideration equals zero (i.e., the contrast reflects no treatment effect)". The Feature eXtraction matrix may be computed by the use of the Tukey A test or the S-method. As the Tukey A method and the S-method have different power, we expect slight differences in the Feature eXtraction matrices these two methods produce for a given time series.

time point  0         1         2         3         4         5         6         7         8         9
1           5.46e-03
2           2.09e-02  1.00e-00
3           2.34e-08  5.57e-03  1.35e-03
4           9.25e-12  4.14e-07  1.06e-07  8.36e-02
5           1.38e-10  1.30e-05  3.10e-06  6.39e-01  9.80e-01
6           3.49e-10  4.15e-05  9.70e-06  8.56e-01  8.88e-01  1.00e-00
7           2.02e-08  4.76e-03  1.15e-03  1.00e-00  9.50e-02  6.75e-01  8.80e-01
8           1.85e-03  1.00e-00  9.98e-01  1.57e-02  1.20e-06  3.98e-05  1.27e-04  1.35e-02
9           6.99e-04  1.00e-00  9.79e-01  3.73e-02  3.13e-06  1.07e-04  3.43e-04  3.23e-02  1.00e-00

Figure 4-3 The Feature eXtraction matrix of the example time series in Figure 4-2 as generated by the S-method.

time point  0         1         2         3         4         5         6         7         8         9
1           1.97e-04
2           1.11e-03  1.00e-00
3           4.59e-07  2.02e-04  3.60e-05
4           4.59e-07  4.63e-07  4.60e-07  7.51e-03
5           4.59e-07  6.29e-07  4.94e-07  2.42e-01  8.66e-01
6           4.59e-07  1.08e-06  5.81e-07  5.15e-01  5.79e-01  1.00e-00
7           4.59e-07  1.65e-04  2.96e-05  1.00e-00  9.06e-03  2.75e-01  5.62e-01
8           5.25e-05  1.00e-00  9.79e-01  7.60e-04  4.72e-07  1.05e-06  2.68e-01  6.23e-04
9           1.66e-05  9.95e-01  8.64e-01  2.41e-03  4.94e-07  2.28e-06  6.45e-06  1.98e-03  1.00e-00

Figure 4-4 The Feature eXtraction matrix of the example time series in Figure 4-2 as generated by the Tukey A test. Due to the limited accuracy of the numerical approximation used, the smallest possible probability is about 4.59×10⁻⁷, as can be seen in several entries in column 0.

The example time series shown in Figure 4-2 was carefully chosen so that the differences between the FX-matrices produced by the two methods are even reflected in the qualitative pattern constructed from the matrices. Figure 4-3 and Figure 4-4 show the FX-matrices for the example time series as computed by the S-method and the Tukey A test, respectively. The matrices are symmetric, and the main diagonal is meaningless, as entries on it would be self-comparisons; thus only the entries below the main diagonal are shown.

The FX-matrix provides all the information needed to construct a string giving a qualitative description of a time series. We first use the FX-matrix to extract a raw pattern, which is then refined in additional steps to yield a low- or high-resolution pattern. The low- and the high-resolution patterns are strings reflecting the qualitative features of a time series on a coarse- or fine-grained level, respectively.

4.1.2.2 The raw pattern

The raw pattern assigns to every interval of a time series one of three possible attributes describing the behavior of the signal level. During the interval from time point t_j to the consecutive time point t_{j+1} the signal level may stay constant, increase, or decrease. The raw pattern is a simple string that reflects the qualitative changes of the signal level using the '+', '-', and '=' characters; a time series of n time points is described by a raw pattern of n−1 characters. Figure 4-5 shows an example raw pattern for a hypothetical time series of five time points.

0 1 2 3 4

= + + -

Figure 4-5 An example raw pattern. The time points are shown in the first row, the raw pattern in the second row.

-01- compute the Feature eXtraction matrix Fxmatrix[][]
-02- initialize the raw pattern RawPattern[] to '='
-03- initialize the significance threshold Sig
-04- initialize Start to 0
-05- initialize End to the number of measurement points
-06- initialize the diagonal specifier Diag to 1
-07- call create_rp(Start, End, Diag)
-08- while Start+Diag < End
-09-     initialize Col to Start
-10-     initialize Row to Start+Diag
-11-     while Row < End
-12-         if Fxmatrix[Col][Row] < Sig
-13-             initialize I to Col
-14-             while I < Row
-15-                 if the time series goes up from Col to Row
-16-                     set RawPattern[I] to '+'
-17-                 else
-18-                     set RawPattern[I] to '-'
-19-                 increase I by 1
-20-             call create_rp(Start, Row, Diag+1)
-21-             set Start to Row
-22-         increase Col and Row by 1
-23-     increase Diag by 1

Figure 4-6 The raw pattern algorithm. Variables are indicated in italic letters. Arrays are indicated by [], matrices by [][]. Matrices are assumed to be triangular and stored in column-major order. The raw pattern is created by the recursive method create_rp(…); its definition is given from line -08- to line -23-. The method is assumed to have access to the storage of the raw pattern and to the FX-matrix that holds the information necessary to create the raw pattern.
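The following Python sketch is our reading of Figure 4-6, not the original implementation. The FX-matrix is stored as a list of rows holding the entries below the main diagonal (fx[row][col] with row > col), exactly as printed in Figures 4-3 and 4-4. Two details are our own interpretation: the recursive call covers the segment from Start up to and including Col (the figure passes Row as the end), and scanning of the current subdiagonal resumes at the end point of a jump; for the two example matrices both readings yield the patterns reported in the text. The direction of a jump is decided by comparing group means; the `means` list below is a hypothetical placeholder whose only role is to fix the up/down direction of the visited jumps, since the actual values behind Figure 4-2 are not given.

```python
def raw_pattern(fx, means, sig):
    """Raw pattern ('+', '-', '=' per interval) from a lower-triangular FX-matrix.

    fx[row][col] (row > col) holds the significance level of the contrast between
    time points col and row; means[j] is used only for the jump direction."""
    n = len(means)
    pattern = ["="] * (n - 1)

    def create_rp(start, end, diag):
        # scan subdiagonals diag, diag+1, ... of the segment start..end-1
        while start + diag < end:
            col = start
            while col + diag < end:
                row = col + diag
                if fx[row][col] < sig:
                    sign = "+" if means[row] > means[col] else "-"
                    for i in range(col, row):
                        pattern[i] = sign
                    # longer SSJs are searched for only before this jump ...
                    create_rp(start, col + 1, diag + 1)
                    # ... and the current subdiagonal continues after it
                    start = col = row
                else:
                    col += 1
            diag += 1

    create_rp(0, n, 1)
    return "".join(pattern)

FX_S = [[],  # Figure 4-3 (S-method)
        [5.46e-03], [2.09e-02, 1.0], [2.34e-08, 5.57e-03, 1.35e-03],
        [9.25e-12, 4.14e-07, 1.06e-07, 8.36e-02],
        [1.38e-10, 1.30e-05, 3.10e-06, 6.39e-01, 9.80e-01],
        [3.49e-10, 4.15e-05, 9.70e-06, 8.56e-01, 8.88e-01, 1.0],
        [2.02e-08, 4.76e-03, 1.15e-03, 1.0, 9.50e-02, 6.75e-01, 8.80e-01],
        [1.85e-03, 1.0, 9.98e-01, 1.57e-02, 1.20e-06, 3.98e-05, 1.27e-04, 1.35e-02],
        [6.99e-04, 1.0, 9.79e-01, 3.73e-02, 3.13e-06, 1.07e-04, 3.43e-04, 3.23e-02, 1.0]]
FX_TUKEY = [[],  # Figure 4-4 (Tukey A test)
            [1.97e-04], [1.11e-03, 1.0], [4.59e-07, 2.02e-04, 3.60e-05],
            [4.59e-07, 4.63e-07, 4.60e-07, 7.51e-03],
            [4.59e-07, 6.29e-07, 4.94e-07, 2.42e-01, 8.66e-01],
            [4.59e-07, 1.08e-06, 5.81e-07, 5.15e-01, 5.79e-01, 1.0],
            [4.59e-07, 1.65e-04, 2.96e-05, 1.0, 9.06e-03, 2.75e-01, 5.62e-01],
            [5.25e-05, 1.0, 9.79e-01, 7.60e-04, 4.72e-07, 1.05e-06, 2.68e-01, 6.23e-04],
            [1.66e-05, 9.95e-01, 8.64e-01, 2.41e-03, 4.94e-07, 2.28e-06, 6.45e-06, 1.98e-03, 1.0]]
# Hypothetical placeholder means: only their up/down ordering matters here.
means = [0.5, 1.5, 1.5, 3.0, 3.5, 3.3, 3.2, 3.0, 1.5, 1.5]

print(raw_pattern(FX_S, means, 0.01))      # +=+===--=  (as stated in the text)
print(raw_pattern(FX_TUKEY, means, 0.01))  # +=++----=  (as stated in the text)
```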

Changes of the signal level may happen abruptly between two consecutive time points, but may also happen across an interval spanning several time points. The possibility of slow changes presents a general problem for the construction of the raw pattern, as there may be several ways to describe a slow change in a time series. Consider for example the constant decrease between time point #4 and time point #8 of the example time series in Figure 4-2. There are several possible slow changes that account for the difference observed in this interval. For example, the slow change could span the complete interval from #4 to #8 (4-8), or the time series could be constant from #4 to #6 and then decrease from #6 to #8 (4=6-8). In order to obtain a well-defined description of the time series, we decided to keep the intervals across which the signal level changes as short as possible, thus implicitly maximizing the length of the constant intervals. In the following we use the abbreviation SSJ (Shortest Significant Jump) to refer to the concept of keeping changes of the signal level as short as possible.

The raw pattern algorithm extracts SSJs by scanning the subdiagonals of an FX-matrix. A subdiagonal entry is considered to represent an SSJ when its value is below a significance threshold set by the user. Thus the probability of a Type I error is directly controlled by the choice of the significance threshold. The kth subdiagonal contains the significance levels of the contrasts between the group means at the time points t_j and t_{j+k}. Starting with the first subdiagonal, SSJs of length 1 are identified. Then, moving on to the second subdiagonal, SSJs of length 2 are looked for only in those positions where no length-1 SSJs were previously found. The procedure continues until all SSJs are found. Figure 4-6 presents the recursive raw pattern algorithm in more detail.

0 1 2 3 4 5 6 7 8 9
= = = = = = = = =

time point  0         1         2         3         4         5         6         7         8         9
1           5.46e-03
2           2.09e-02  1.00e-00
3           2.34e-08  5.57e-03  1.35e-03
4           9.25e-12  4.14e-07  1.06e-07  8.36e-02
5           1.38e-10  1.30e-05  3.10e-06  6.39e-01  9.80e-01
6           3.49e-10  4.15e-05  9.70e-06  8.56e-01  8.88e-01  1.00e-00
7           2.02e-08  4.76e-03  1.15e-03  1.00e-00  9.50e-02  6.75e-01  8.80e-01
8           1.85e-03  1.00e-00  9.98e-01  1.57e-02  1.20e-06  3.98e-05  1.27e-04  1.35e-02
9           6.99e-04  1.00e-00  9.79e-01  3.73e-02  3.13e-06  1.07e-04  3.43e-04  3.23e-02  1.00e-00

0 1 2 3 4 5 6 7 8 9
+ = + = = = - - =

Figure 4-7 Illustration of the raw pattern algorithm using the FX-matrix in Figure 4-3 (S-method) and a significance threshold of p=0.01. The algorithm starts with a raw pattern that indicates "stays constant" for all time point intervals. All matrix entries that are visited by the algorithm are highlighted. Entries that do not cause any changes in the raw pattern are highlighted in light gray. Significant entries on the first subdiagonal, corresponding to length-1 SSJs, are highlighted dark gray (0→1, 2→3). Only one significant entry on the second subdiagonal is visited; it is highlighted black and corresponds to the length-2 SSJ 6→8. Below the matrix the resulting raw pattern is shown. Characters that were changed by the algorithm due to significant entries are highlighted in the appropriate colors.

It is not necessary to construct the complete FX-matrix to run the raw pattern algorithm; the subdiagonals could be computed on the fly as needed. This saves the memory needed to store

4 Development of the tools: **Feature** **eXtraction** 54

the FX-matrix, as well as CPU time, because in most cases not all of the subdiagonals need to be computed. However, it is sometimes useful to be able to look at the complete FX-matrix, and the overhead imposed by precomputing the complete matrix is negligible in practice.

A graphical illustration of the raw pattern algorithm is given in Figure 4-7. The FX-matrix (Figure 4-3) of the example time series in Figure 4-2 is processed using a significance threshold of p=0.01. For this example the raw pattern +=+===--= is obtained. Using the same significance threshold (p=0.01), the raw pattern +=++----= is obtained from the Tukey A matrix (Figure 4-4) (details not shown). As mentioned previously, the example time series was constructed so that the two statistical methods deliver different raw patterns. The S-method is more stringent than the Tukey A test, so the Tukey A matrix contains more entries that are significant at the chosen threshold than the S-method matrix does. This is directly reflected in the raw patterns. The raw pattern is postprocessed by two simple algorithms to generate the final low- or high-resolution patterns. In general, different raw patterns may yield identical low-resolution patterns, but they always yield different high-resolution patterns.

4.1.2.3 The low-resolution pattern

Though the raw pattern captures all the relevant features of a time series, we found it necessary in some cases, and useful in general, to postprocess the raw pattern into the low-resolution pattern. The low-resolution pattern is obtained in two simple steps: first, all identical consecutive characters are merged; second, the '=' characters are dropped. The low-resolution pattern thus consists of '+' and '-' characters only. It reflects the sequence of upward and downward changes of a time series, but it carries no information about the time points at which the changes take place, or about the length and location of the constant intervals. Note that two identical consecutive characters (++ or --) may appear in a low-resolution pattern; this indicates that the signal level changed twice in the same direction and that these changes are separated by a constant interval. A time series without changes in the signal level has the low-resolution pattern =. Figure 4-8 presents the simple algorithm used to construct the low-resolution pattern from the raw pattern.

If this algorithm is applied to the raw patterns obtained from the Tukey A matrix and the S-method matrix of the example time series in Figure 4-2, the resulting low-resolution patterns are identical (++-).

-01- initialize the low-resolution pattern Lores to empty
-02- initialize the variable that remembers the last entry, Last, to '!'
-03- for every entry Entry of the raw pattern RawPattern
-04-     if Entry is equal to '-' AND Last is not equal to '-'
-05-         append '-' to Lores
-06-         set Last to '-'
-07-     else if Entry is equal to '+' AND Last is not equal to '+'
-08-         append '+' to Lores
-09-         set Last to '+'
-10-     else if Entry is equal to '='
-11-         set Last to '='
-12- if Lores is still empty
-13-     set Lores to '='

Figure 4-8 Construction of the low-resolution pattern from the raw pattern. For notation cf. Figure 4-6.
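Figure 4-8 translates almost line by line into C++. The sketch below is an illustration under our own naming (`lowResolution` is a hypothetical function name); it keeps a '+' or '-' only when it differs from the previous raw entry, which merges runs, and it ignores '='.

```cpp
#include <cassert>
#include <string>

// Low-resolution pattern from a raw pattern, following Figure 4-8:
// a '+' or '-' is appended only if it differs from the previous raw entry
// (this merges runs), '=' entries are dropped, and "=" is returned when
// no change remains.
std::string lowResolution(const std::string& raw) {
    std::string lores;
    char last = '!';  // sentinel: no previous entry yet
    for (char entry : raw) {
        assert(entry == '+' || entry == '-' || entry == '=');
        if (entry != '=' && entry != last) lores += entry;
        last = entry;
    }
    return lores.empty() ? std::string("=") : lores;
}
```

Applied to the two raw patterns of the example time series, +=+===--= and +=++----=, the sketch returns ++- in both cases.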

4.1.2.4 The high-resolution pattern

The high-resolution pattern contains all the information of the raw pattern, but it is more compact and directly displays the time points at which changes of the signal level take place. It is obtained from the raw pattern by the simple algorithm shown in Figure 4-9.

The high-resolution pattern is derived from the raw pattern as easily as the low-resolution pattern. We merge all identical consecutive entries of the raw pattern into a single entry and include the time point of the first of these identical entries in the high-resolution pattern. As the high-resolution pattern contains all the information of the raw pattern, the example time series (Figure 4-2) is described by different high-resolution patterns depending on whether it is analyzed with the Tukey A test or with Scheffé's S-method. The raw pattern from the Tukey A matrix is +=++----=, while the raw pattern from the S-method is +=+===--=. The corresponding high-resolution patterns are 0+1=2+4-8=9 and 0+1=2+3=6-8=9, respectively. As a high-resolution pattern always starts with the index of the first valid time point and ends with the index of the last valid time point, a constant time series in general gets a pattern of the form i=j, where i and j are these two indices (for example 0=9 for a constant series with the ten time points of the example). Note that these indices will not be equal to the indices of the first and the last time point of the time series if these time points do not have sufficient parallel measurements to be evaluated.

-01- initialize the high-resolution pattern Hires to empty
-02- initialize the variable that remembers the last entry, Last, to '!'
-03- for every entry Entry of the raw pattern RawPattern
-04-     if Entry is equal to '-' AND Last is not equal to '-'
-05-         append the index of the entry to Hires
-06-         append '-' to Hires
-07-         set Last to '-'
-08-     else if Entry is equal to '+' AND Last is not equal to '+'
-09-         append the index of the entry to Hires
-10-         append '+' to Hires
-11-         set Last to '+'
-12-     else if Entry is equal to '=' AND Last is not equal to '='
-13-         append the index of the entry to Hires
-14-         append '=' to Hires
-15-         set Last to '='
-16- append the index of the last entry + 1 to Hires

Figure 4-9 Construction of the high-resolution pattern from the raw pattern. Line -16- ensures that the pattern always ends with the last time point of the measurement. For notation cf. Figure 4-6.
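A C++ sketch of Figure 4-9 follows. The function name `highResolution` is hypothetical, and the optional `timePoints` argument is our own addition: it maps raw pattern positions to the indices of the valid time points, so that leading or missing time points show up in the pattern as described above. Without it, the entry indices 0, 1, ... are used exactly as in the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// High-resolution pattern from a raw pattern, following Figure 4-9.
// timePoints (hypothetical extension) lists the indices of the valid time
// points, one more than the raw pattern has entries; if it is empty,
// 0, 1, ..., raw.size() are used.
std::string highResolution(const std::string& raw, std::vector<int> timePoints = {}) {
    if (timePoints.empty())
        for (std::size_t i = 0; i <= raw.size(); ++i)
            timePoints.push_back(static_cast<int>(i));
    assert(!raw.empty() && timePoints.size() == raw.size() + 1);

    std::string hires;
    char last = '!';  // sentinel: no previous entry yet
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != last) {  // first entry of a run: record its time point
            hires += std::to_string(timePoints[i]);
            hires += raw[i];
            last = raw[i];
        }
    }
    hires += std::to_string(timePoints.back());  // corresponds to line -16-
    return hires;
}
```

For the example time series this reproduces 0+1=2+3=6-8=9 (S-method) and 0+1=2+4-8=9 (Tukey A); a constant series whose time point 0 could not be evaluated, `highResolution("==", {1, 2, 3})`, gives 1=3.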

4.1.2.5 Identification of discrete signal levels

The low- and high-resolution patterns reflect the sequence of upward and downward changes of the signal level over time. This approach thus implicitly divides a time series into constant and non-constant segments. The constant segments may consist of a single time point or of several time points and can be regarded as signal level candidates. It has to be decided whether these candidate levels are indeed well separated, or whether some of them may represent the same signal level and could therefore be grouped together. This is a typical clustering problem: all candidates that end up in the same cluster can be considered to be on the same signal level from a statistical point of view.

The comparison of candidate levels translates directly into the meta-group contrast approach of the S-method discussed above. We compute the contrast between two meta-groups, each representing a candidate level, and thus obtain the significance level of the null hypothesis "The level candidates belong to the same signal level." The Tukey A test is intended for pairwise comparisons of group means only; therefore, the FX-matrix has to be constructed with the S-method if signal levels are to be identified.

Figure 4-10 shows the candidate signal levels for the example time series. Note that the last time point is not included, because time points at the beginning and the end of a time series pose a problem for the identification of signal levels. Consider the last time point of this example. As we do not know how the time series continues, there is no way to decide whether the last time point is a member of a constant segment or belongs to a non-constant segment representing a slow change between signal levels (i.e., an SSJ spanning several time points). Therefore only the innermost time point of the constant segment at the end of a time series is used in the identification of signal levels. The same holds for the beginning of a time series. In the example shown in Figure 4-10, the constant segment at the beginning of the time series consists of one point only, so no time points are excluded from signal level identification at the beginning. In general, we use only the innermost point of each of the constant segments at the beginning and the end of a time series for the identification of signal levels, while all time points of the other constant segments are used.
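The selection of candidate time points can be sketched in C++ from the raw pattern alone. The function name `candidateLevels` is hypothetical, and one rule is our inference from Figure 4-10 and the level-extended patterns rather than a statement of the text: a time point whose raw entries on both sides are the same '+' or '-' lies inside a change and belongs to no constant segment. The first and last segments are then reduced to their innermost point, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Candidate signal levels (constant segments) from a raw pattern.
// Time points are 0 .. n-1 with n = raw.size() + 1.
std::vector<std::vector<int>> candidateLevels(const std::string& raw) {
    assert(!raw.empty());
    const std::size_t n = raw.size() + 1;
    std::vector<std::vector<int>> segments;
    for (std::size_t i = 0; i < n; ++i) {
        // Assumption: a point flanked by two identical change characters
        // lies inside a (multi-step) change and is not a candidate.
        const bool insideChange =
            i > 0 && i < n - 1 && raw[i - 1] == raw[i] && raw[i] != '=';
        if (insideChange) continue;
        // Points joined to the previous point by '=' extend its segment.
        if (i > 0 && raw[i - 1] == '=' && !segments.empty())
            segments.back().push_back(static_cast<int>(i));
        else
            segments.push_back({static_cast<int>(i)});
    }
    if (segments.size() > 1) {
        segments.front() = {segments.front().back()};  // innermost point of first segment
        segments.back() = {segments.back().front()};   // innermost point of last segment
    }
    return segments;
}
```

For the S-method raw pattern +=+===--= of the example, the sketch returns the candidates {0}, {1, 2}, {3, 4, 5, 6} and {8}: time point 7 lies inside the SSJ 6→8, and the last time point 9 is dropped.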

[Figure 4-10: plot of signal versus time points for the example time series]

Figure 4-10 The candidate signal levels (boxed time points) determined for the example time series (Figure 4-2) using the S-method and a significance threshold of p=0.01.

We use an agglomerative hierarchical approach that resembles complete linkage clustering to identify discrete signal levels from the signal level candidates. Signal level candidates are treated as meta-groups of an S-method contrast, and two meta-groups are merged into a single meta-group if the significance level of their contrast is larger than or equal to a predefined significance threshold. This approach uses the significance level of a contrast as a distance measure between two candidate signal levels (the larger the significance level, the smaller the distance), and the significance threshold defines a critical maximal distance up to which candidates are treated as belonging to the same signal level. The basic steps of the algorithm are presented in Figure 4-11.

-01- Compute the distance matrix for all pairwise combinations of meta-groups
     using the S-method approach for meta-group contrasts.
-02- Find the two meta-groups i and j that have minimal distance. Stop the
     procedure if this distance exceeds the maximal distance.
-03- Merge the two meta-groups i and j: copy the group means of j to i,
     then delete j.
-04- Recompute the entries of the distance matrix in row and column i using
     the S-method approach for meta-group contrasts. Delete row and column j of
     the distance matrix.
-05- Continue at -02-.

Figure 4-11 The hierarchical agglomerative algorithm used for the identification of discrete signal levels from candidate signal levels. See text for additional information.

This agglomerative approach is not well defined if two or more pairs of meta-groups have the same distance at the same iteration step. In this case we use a fallback strategy proposed by Williams and coworkers (Williams, 1971): we join the two meta-groups whose overall distances to all other meta-groups differ the least. If this still leaves more than one pair of meta-groups, we merge the first of the remaining pairs. Figure 4-12 shows the three discrete signal levels identified in the example time series by clustering the signal level candidates.
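The merge loop of Figure 4-11, including the tie rule, can be sketched in C++ as follows. Several points are assumptions: the S-method meta-group contrast is abstracted into a caller-supplied function returning a significance level, the distance is taken as 1 - p (so the closest pair is the one with the largest p), the whole matrix is recomputed in each round instead of only row and column i, and the tie rule is our reading of the Williams (1971) criterion. The names `MetaGroup`, `Significance` and `clusterLevels` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// A meta-group is represented by the group means of the time points it contains.
using MetaGroup = std::vector<double>;
// Significance level of the contrast between two meta-groups (in the text an
// S-method contrast; here supplied by the caller).
using Significance = std::function<double(const MetaGroup&, const MetaGroup&)>;

// Merge the closest pair (largest p) while its significance level is >= alpha.
std::vector<MetaGroup> clusterLevels(std::vector<MetaGroup> groups,
                                     const Significance& sig, double alpha) {
    while (groups.size() > 1) {
        const std::size_t m = groups.size();
        // For brevity the whole matrix is recomputed in every round.
        std::vector<std::vector<double>> p(m, std::vector<double>(m, 0.0));
        double best = -1.0;
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = i + 1; j < m; ++j) {
                p[i][j] = p[j][i] = sig(groups[i], groups[j]);
                if (p[i][j] > best) best = p[i][j];
            }
        if (best < alpha) break;  // closest pair already differs significantly

        // Ties: prefer the pair whose summed distances (1 - p) to the other
        // meta-groups differ least; the first such pair wins remaining ties.
        std::size_t bi = 0, bj = 0;
        double bestDiff = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = i + 1; j < m; ++j) {
                if (p[i][j] != best) continue;
                double di = 0.0, dj = 0.0;
                for (std::size_t k = 0; k < m; ++k)
                    if (k != i && k != j) { di += 1.0 - p[i][k]; dj += 1.0 - p[j][k]; }
                const double diff = std::fabs(di - dj);
                if (diff < bestDiff) { bestDiff = diff; bi = i; bj = j; }
            }
        assert(bi < bj);
        // Merge j into i: copy the group means of j to i, then delete j.
        groups[bi].insert(groups[bi].end(), groups[bj].begin(), groups[bj].end());
        groups.erase(groups.begin() + static_cast<std::ptrdiff_t>(bj));
    }
    return groups;
}
```

Any real use would plug in the S-method contrast for `sig`; a toy stand-in such as exp(-|difference of meta-group means|) is enough to exercise the merging logic, but it is not a significance level in the statistical sense.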

[Figure 4-12: plot of signal versus time points for the example time series]

Figure 4-12 The signal levels (boxed) identified in the example time series (Figure 4-2) by candidate level clustering using a significance threshold of p=0.01 to define the maximal distance.

Once the candidate signal levels are clustered, we obtain N meta-groups that can be considered different signal levels from a statistical point of view. The meta-group means are computed and ranked in ascending order. The meta-group with the smallest average signal strength gets the level index $L_0$, while the meta-group with the highest average signal strength gets the level index $L_{N-1}$. We are now able to associate a level index with every '+' and '-' of a low- or high-resolution pattern, and can thus easily extend the low- and high-resolution patterns to include information about the signal levels. The level-extended high-resolution pattern of the example time series (Figure 4-2) is L0_0+_L1_1=2+_L2_3=6-_L1_8=9, and the corresponding level-extended low-resolution pattern is L0+L1+L2-L1. Qualitatively, the time series starts at level L0, rises to level L1 and later to L2, and finally returns to the intermediate level L1.
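The ranking step and the construction of the level-extended low-resolution pattern can be sketched in C++ as below. The function names are hypothetical, and the sketch assumes that the level of each constant segment (in time order) is already known from the clustering, so that the i-th character of the low-resolution pattern describes the change between segments i and i+1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Level index of each meta-group: rank of its mean in ascending order (L0 = lowest).
std::vector<int> levelIndices(const std::vector<double>& metaMeans) {
    std::vector<std::size_t> order(metaMeans.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return metaMeans[a] < metaMeans[b]; });
    std::vector<int> level(metaMeans.size());
    for (std::size_t r = 0; r < order.size(); ++r) level[order[r]] = static_cast<int>(r);
    return level;
}

// Level-extended low-resolution pattern: segmentLevels[s] is the level of the
// s-th constant segment, lowRes[s] the change between segments s and s+1.
std::string levelExtendedLowRes(const std::vector<int>& segmentLevels,
                                const std::string& lowRes) {
    if (lowRes == "=") return "=";  // constant series: nothing to extend
    assert(segmentLevels.size() == lowRes.size() + 1);
    std::string out = "L" + std::to_string(segmentLevels[0]);
    for (std::size_t s = 0; s < lowRes.size(); ++s)
        out += lowRes[s] + ("L" + std::to_string(segmentLevels[s + 1]));
    return out;
}
```

For the example, the segments {0}, {1, 2}, {3..6} and {8, 9} carry the levels 0, 1, 2 and 1, and together with the low-resolution pattern ++- the sketch produces L0+L1+L2-L1.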

This approach makes it possible to identify discrete levels in time series data in a statistically rigorous way. This is useful for time series analysis in general, and it is of particular interest to see whether more than two statistically distinguishable expression levels can be found in gene expression time series data.

4.1.2.6 Grouping by qualitative patterns

Once a number of time series are encoded as qualitative patterns, detecting similarities between them becomes straightforward. Time series that share the same pattern can be considered similar with respect to the features captured by the qualitative pattern and can be grouped together. The low- and high-resolution patterns and their level-extended versions offer different degrees of stringency in the qualitative description of a time series. Equality of the high-resolution patterns of two time series requires a higher degree of similarity than equality of their low-resolution patterns, and the most detailed representation of a time series is the level-extended high-resolution pattern. The different patterns thus make it possible to cluster time series data at whatever level of detail the researcher is interested in.
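Grouping by pattern amounts to using the pattern string as a key. The C++ sketch below (the function name `groupByPattern` is hypothetical) works with any of the pattern types; which one is passed in determines the stringency of the grouping.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Groups time series identifiers by their qualitative pattern. Any pattern
// type (low-/high-resolution, level-extended or not) can serve as the key.
std::map<std::string, std::vector<std::string>> groupByPattern(
    const std::vector<std::pair<std::string, std::string>>& idAndPattern) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& entry : idAndPattern)
        groups[entry.second].push_back(entry.first);  // key: pattern, value: ids
    return groups;
}
```

The size of each group then corresponds directly to the group sizes reported in the tables of Section 4.2.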

4.1.2.7 Implementation


The algorithms described above are implemented in standard ANSI C++ under UNIX. The code has been tested on Linux and Silicon Graphics workstations and should be portable to other platforms with minimal effort. The programs run very fast: a full analysis of more than 12,000 time series with 4 time points and 4 parallel measurements took a few seconds on a standard PC.

4.2 Application of Feature eXtraction

We applied the FX algorithms to two different data sets. One data set was obtained from an experiment that monitored gene expression during severe colitis in mice, the other from a comparative experiment with wild-type mice and MIF-deficient mice.

4.2.1 Inflammatory bowel disease data

This experiment was carried out by Henrietta Moore at the Novartis Forschungsinstitut in Vienna and provides unequally spaced gene expression time series with four time points and four parallel measurements. Figure 4-13 shows the power of the S-method to detect a significant change of the signal level within these time series at a significance threshold of p=0.01. The probability of detecting a significant change increases with the relative treatment effect: for a treatment effect five times as large as the measurement error, the corresponding contrast will be considered different from zero in about 80 percent of all cases. Analysis by the Tukey A test is even more powerful.

[Figure 4-13: power versus "treatment effect/background error"]

Figure 4-13 Power of the S-method for the analysis of a time series with four time points and four parallel measurements at a significance threshold of p=0.01. The non-centrality parameter φ is shown on the x-axis and designated as "treatment effect/background error".

In this experiment, severe combined immunodeficient (SCID) mice were transplanted with a subset of T cells, CD4+CD45RBhi cells. This causes a severe, chronic (lethal) colitis. Maximal severity, as assessed histologically, is reached at around four weeks after transfer of the T cells. Three, seven and 28 days after the T cell transfer, four mice were sacrificed and a piece of colon was isolated from each. In addition, colon pieces of four non-transferred SCID mice were isolated. mRNA was extracted from all colon pieces.

4 Development of the tools: Application of **Feature** **eXtraction** 59

The mRNA samples were sent to the Novartis Institute for Functional Genomics in San Diego, and

hybridized to Affymetrix murine MG_U74A gene chips by John Walker. The MG_U74A gene chip

contains 12,654 different probe sets.

The scaled (target intensity 200) AverageDifference values of the experiment were extracted from a Novartis database for analysis by Feature eXtraction. A problem with this data set is that negative AverageDifference values are not stored in the database; a "missing" placeholder is used instead. Presumably this is done because negative values are problematic in the traditional analysis approach based on ratios of transcript levels (see 1.1): what, for example, does a ratio of 5/1 mean compared to a ratio of 5/-1? The AverageDifference is negative if the mismatch signal dominates the perfect match signal within a probe set (see 7.1.2.2); this is especially likely if the level of the target transcript is low relative to the background. The chip designers tried to minimize the likelihood of such a situation, but given the complex mixture of the sample solution they were, as the negative values prove, not successful in all cases. However, if the level of the target transcript increases relative to the background of other transcripts, the AverageDifference increases too: first to less negative values and eventually to positive values. Measurements that give negative AverageDifference values can thus be regarded as starting from a boundary below the ideal lower boundary of zero, while being perfectly ordinary measurements otherwise. The ANOVA approach used in Feature eXtraction detects differences between group means; whether these group means are positive or negative is irrelevant. Eliminating the negative values therefore unfortunately filtered out valuable information about the level of the transcript represented by a given probe set. Furthermore, it reduced the number of parallel measurements available for a given time point, which has a crucial impact on the power of Feature eXtraction.

4.2.1.1 General observations

Although there are 12,654 different probe sets on the MG_U74A chip, not all of the time series could be used for analysis. For 6,402 probe sets, all 16 values were specified as "missing" in the original data file. For a further 2,182 probe sets the amount of data was insufficient, as at least 2 time points with at least 2 parallel measurements each are needed for the ANOVA methodology. This left 4,070 genes for which usable measurement data were available, ~32.2 percent or roughly one third of the maximal number of time series possible with this chip.
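The usability criterion can be sketched in C++. The names are hypothetical, "missing" values are assumed to be encoded as NaN, and we assume that time points with fewer than two parallels are simply left out, which is consistent with the later remark that some time points can be missing in an analyzed series.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// measurements[t] holds the parallel measurements at time point t; NaN marks
// a "missing" value. Returns the time points with >= minParallels values.
std::vector<std::size_t> evaluableTimePoints(
    const std::vector<std::vector<double>>& measurements, std::size_t minParallels = 2) {
    std::vector<std::size_t> kept;
    for (std::size_t t = 0; t < measurements.size(); ++t) {
        std::size_t present = 0;
        for (double v : measurements[t])
            if (!std::isnan(v)) ++present;
        if (present >= minParallels) kept.push_back(t);
    }
    return kept;
}

// A time series can be analyzed if at least two time points are evaluable.
bool isAnalysable(const std::vector<std::vector<double>>& measurements) {
    return evaluableTimePoints(measurements).size() >= 2;
}
```

With four time points and four parallels, a probe set whose 16 values are all missing fails immediately; one with two well-measured time points passes, even if the other two time points are dropped.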

One assumption of the ANOVA methodology used here is homogeneity of variance across the groups of a time series. We checked this assumption in a preprocessing step using Cochran's test. One hundred and thirty of the remaining time series did not pass this test at a significance level of p=0.01 and were therefore removed from the data set, leaving 3,940 time series for the analysis. At the significance threshold of p=0.01, we would expect one percent of the time series to be flagged as inhomogeneous in variance due to Type I errors alone. We found, however, that 130 of 4,070, or ~3.2 percent, of the time series are not homogeneous in variance.
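Cochran's test is based on the statistic C, the largest group variance divided by the sum of all group variances. The C++ sketch below (hypothetical function names) computes only C; the decision itself compares C with a critical value that depends on the number of groups and the common group size, taken from tables or derived from the F distribution, which is not reproduced here. The classic form of the test assumes groups of equal size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Unbiased sample variance of one group (requires n >= 2).
double sampleVariance(const std::vector<double>& x) {
    assert(x.size() >= 2);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return ss / static_cast<double>(x.size() - 1);
}

// Cochran's C: largest group variance / sum of group variances.
// Values close to 1 indicate that one group variance dominates.
double cochranC(const std::vector<std::vector<double>>& groups) {
    double maxVar = 0.0, sumVar = 0.0;
    for (const auto& g : groups) {
        const double s2 = sampleVariance(g);
        maxVar = std::max(maxVar, s2);
        sumVar += s2;
    }
    return sumVar > 0.0 ? maxVar / sumVar : 0.0;
}
```

For three groups with variances 1, 4 and 1/3, C = 4 / (16/3) = 0.75.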

4.2.1.2 Grouping by qualitative patterns

The 3,940 analyzable time series can easily be grouped using the qualitative patterns provided by the FX algorithms. We grouped the time series at a significance level of p=0.01 according to the low- and high-resolution patterns and the level-extended low- and high-resolution patterns. Grouping according to the level-extended patterns is possible only if Scheffé's S-method is used for the construction of the FX-matrix, as the Tukey A test allows only pairwise comparisons of group means. A more complete overview of the grouping results can be found in 7.2; some illustrative observations are presented here in brief.

Grouping by the low-resolution pattern produced eight different groups, as shown in Table 4-4. The table compares the group sizes obtained with the two statistical procedures used to construct the FX-matrix.


Table 4-4 The low-resolution pattern groups of the inflammatory bowel disease data. The group sizes according to the statistical procedure used for the construction of the FX-matrix are given. The significance threshold for the analysis was p=0.01.

Low-resolution pattern   Tukey A test   S-method
=                        3353           3418
+                        240            210
-                        326            297
++                       1              1
--                       3              1
+-                       6              5
-+                       9              6
-+-                      2              2

As the Tukey A test is more powerful than Scheffé's S-method, one expects it to identify more significant differences between group means, and the results agree with this expectation. Whether the Tukey A test or the S-method is used, the majority of the genes (3,353 and 3,418, i.e., ~82.4% and ~84.0% of the 4,070 usable probe sets, respectively) are considered constant (=), and among the non-constant time series the simple patterns (+ and -) prevail. Only a very small fraction of the genes show more complicated behavior. Patterns reflecting more complicated behavior of the expression time series could, in theory, have more than two discrete signal levels: in general, if a low-resolution pattern has c characters, the time series can have at most c+1 discrete signal levels. If Feature eXtraction is based on the S-method, we can include the information about the signal levels in the pattern, as shown in Table 4-5.

Table 4-5 The level-extended low-resolution pattern group sizes of the inflammatory bowel disease data. Groups with unexpected signal level patterns are highlighted. The high-resolution pattern for the highlighted groups is shown in addition; for more explanations see text.

Low-resolution pattern   S-method   High-resolution pattern
=                        3418
L0+L1                    210
L1-L0                    297
L0+L1+L1                 1          L0_0+_L1_1=2+_L1_3
L1-L0-L0                 1          L1_0-_L0_1=2-_L0_3
L0+L1-L0                 4
L0+L2-L1                 1          L0_0+_L2_1-_L1_2=3
L1-L0+L1                 6
L1-L0+L1-L0              2

The simple patterns (=, +, -) naturally have the same group sizes as in the case of the low-resolution patterns; the signal levels in some of the more complicated patterns, however, come as a surprise. Consider first the two patterns highlighted in light gray. One would expect the low-resolution patterns ++ and -- to display exactly three signal levels, as the signal level of the time series changes twice in the same direction, but the level-extended patterns show only two signal levels, with changes from L1 to L1 and from L0 to L0, respectively. Changes from one signal level to the identical signal level are obviously not meaningful. The two time series causing these patterns are shown in Figure 4-14. Both time series have a constant segment comprising the time points #1 and #2. It is this constant segment and the following time point #3 that cause the strange level-extended patterns.

While the pairwise comparison of group means detects a significant difference between the measurements at time points #2 and #3 in both time series, the meta-group clustering algorithm merges the meta-group of the constant segment {#1, #2} with the meta-group containing only the last time point {#3} into a single discrete signal level {#1, #2, #3}. This is possible because the mean of the meta-group {#1, #2} is closer to the mean of {#3} than the single group mean of time point #2 is. Both descriptions are therefore correct in their own right: interpreted by pairwise comparisons only, the time series change their signal level twice in the same direction, but according to the meta-group comparison the last three time points are on the same signal level. Unless the "true" time series is known, there is no way to resolve this dilemma. However, the level-extended patterns make it easy to recognize the rare cases in which it shows up.

[Figure 4-14: panels a and b, each plotting signal versus days (0 to 28)]

Figure 4-14 Two time series from the inflammatory bowel disease data; the corresponding probe sets are not annotated. a, Time series 95086_at: the level-extended low-resolution pattern is L0+L1+L1, the level-extended high-resolution pattern is L0_0+_L1_1=2+_L1_3. b, Time series 97826_at: the level-extended low-resolution pattern is L1-L0-L0, the level-extended high-resolution pattern is L1_0-_L0_1=2-_L0_3.

The pattern L0+L2-L1 is highlighted in dark gray in Table 4-5. This pattern indicates that there are three different signal levels in the time series. The data of this time series are shown in Figure 4-15, and the three signal levels proposed by the qualitative patterns correspond to human intuition for these data. However, this does not necessarily mean that this gene is indeed expressed at three different levels of strength. First, the data are very sparse, and there is always a statistical chance of a false positive difference. Second, the mRNA samples in this experiment were taken from colon tissue during several stages of an inflammatory disease. The cellular composition of the tissue changes during the disease, so this time series could reflect these changes.

[Figure 4-15: plot of signal versus days (0 to 28)]

Figure 4-15 Time series 100061_f_at of the inflammatory bowel disease data. The level-extended low-resolution pattern of this time series is L0+L2-L1, the level-extended high-resolution pattern is L0_0+_L2_1-_L1_2=3.

Grouping the time series by the high-resolution pattern gave 37 clusters when the S-method was used for the construction of the FX-matrix, and 38 when the Tukey A test was used (for details see 7.2.2). In some cases the clusters do not reflect biological differences but result from the fact that we analyzed every time series with at least two time points having two parallel measurements each. Some time points can therefore be missing in a time series, and this is reflected in the resulting high-resolution pattern. An illustrative example of this effect is the group of genes whose expression level stays constant during the experiment. The corresponding low-resolution pattern '=' splits into six different high-resolution patterns (0=1, 0=2, 0=3, 1=2, 1=3, 2=3) solely because measurement points are missing. Time series that do not have at least two parallels for every time point could of course easily be removed from the data set to avoid this kind of splitting.

4.2.1.3 Functional annotation of the small low-resolution clusters

Only very few genes in the inflammatory bowel disease experiment show more than one change in the signal level. For illustrative purposes, Table 4-6 presents the functional annotation (where available) of the genes in the low-resolution groups +-, -+ and -+- resulting from Feature eXtraction with the Tukey A test. Affymetrix recently made the annotations for the genes represented by its probe sets publicly available on the so-called NetAffx web page (www.netaffx.com).

Table 4-6 Functional assignment of the 17 genes in the low-resolution groups +-, -+ and -+- (according to Tukey A). The Affymetrix annotation for the genes is publicly available on www.netaffx.com. For many genes no annotation is available: genes marked by an asterisk were annotated using a protein function prediction tool developed by our group. GenBank EST identifiers are given (in italics) if no annotation was available.

Group +-
100061_f_at   Potential kallikrein
99701_f_at    Small Pro-rich Type 2B
98938_at      1500026D16Rik protein*
103658_r_at   PHD finger (DNA binding, putative transcription factor)
100778_at     CD38 antigen
104112_at     AI854033

Group -+
94716_f_at    Type C EGF-binding protein
101386_at     Pentaxin*
94041_at      K homology domain (RNA binding)
98333_at      MHC locus Class II region
103222_at     Eps8
94976_at      AI845877
98678_f_at    AV371861
92539_at      Calpactin I light chain
99093_at      Ribosomal protein S10*

Group -+-
101025_f_at   Small Pro-rich Type 2A
101551_s_at   testin

4.2.2 MIF mice data

This experiment was carried out by Andreas Billich at the Novartis Forschungsinstitut in Vienna. It is a comparative experiment in which time series from wild-type mice are compared to time series from mice deficient in macrophage migration inhibitory factor (MIF). MIF has been postulated to play a central role in the regulation of inflammatory processes. It has been reported that MIF-deficient mice are resistant to endotoxic shock, and that macrophages from these animals produce reduced levels of pro-inflammatory cytokines (e.g., TNF-α) in response to lipopolysaccharide (LPS) (Bozza et al., 1999). However, A. Billich and others (Honma et al., 2000) have been unable to reproduce these published data in the endotoxic shock model or when measuring cytokine production from stimulated macrophages. A. Billich therefore wanted to investigate whether any relevant differences between wild-type and MIF knock-out animals exist at all with respect to the upregulation of genes involved in inflammation following LPS stimulation.

In the experiment, stimulated macrophages were harvested from three wild-type mice (MIF+/+, controls) and three MIF-deficient mice (MIF-/-). The samples were split into three equal parts, and mRNA was extracted from these after different durations of LPS treatment: without LPS treatment, after 1 hour of treatment with LPS, and after 5 hours of treatment with LPS. Thus, for both the wild-type and the MIF-deficient mice, a time series with three time points in three parallels was obtained.

In more detail, the experiment was done as follows. Three wild-type C57BL/6 mice (MIF+/+, controls) and three MIF-deficient mice (MIF-/-) were injected on day 1 with thioglycolate to elicit macrophage extravasation. On day 4, the peritoneum of the animals was washed with HBSS (Hanks' Balanced Salt Solution). Cells were harvested by centrifugation, washed with RPMI medium (named after the Roswell Park Memorial Institute, where it was developed) containing 10 % FCS (fetal calf serum), and taken up in this medium at a density of 1.5⋅10⁶ cells/ml. Cells from each mouse were seeded in three Costar 6-well plates (3⋅10⁶ cells/well in 2 ml medium). Well #1 received no LPS (incubation time: 1 hour); cells in well #2 were stimulated by addition of LPS (100 ng/ml) for 1 hour; well #3 was treated as well #2, but for 5 hours. After the incubation period, total RNA was extracted (using the Trizol method) and purified (RNeasy method). Samples were sent to Genetic Therapy Inc. (Gaithersburg; C. Lavedan), where mRNA was isolated and transcribed into cDNA templates. Samples were hybridized to the murine MG_U74Av2 chip from Affymetrix, which contains 12,488 different probe sets.

[Figure 4-16 plot: power (0 to 1) versus "treatment effect/background error" (0 to 10); curves for three parallels and two parallels.]

Figure 4-16 Power of the S-method for the MIF-experiment. The MIF+/+ experiment has two parallel measurements, the MIF-/- experiment has three parallel measurements. The significance threshold is p=0.01 for both experiments. The non-centrality parameter φ is shown on the x-axis and designated as "treatment effect/background error".

Unfortunately, chip hybridization failed for one of the parallel mRNA samples from the MIF+/+ mice. Thus the power to detect significant jumps differs between the wild-type and the MIF-deficient time series. This is illustrated in Figure 4-16. In contrast to the inflammatory bowel disease experiment, we could obtain all AverageDifference values for this experiment, including the negative ones.

4.2.2.1 General observations


There are 12,488 different probe sets on the chip. All of the data points could be analyzed because, in contrast to the inflammatory bowel disease experiment, we were able to obtain the negative AverageDifference values as well. We checked for homogeneity of variance using Cochran's test with a significance threshold of p=0.01. In the MIF+/+ experiment, 267 time series (~2.14 percent) did not satisfy the assumption of homogeneous variance; in the MIF-/- experiment, 361 time series (~2.89 percent) did not satisfy it. These time series were excluded from further analysis.
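Cochran's test compares the largest of the k group variances of a time series with their sum. The following is a minimal, dependency-free Python sketch of that screening step. It assumes equal numbers n of parallels per time point and uses the common F-based formula C_crit = 1 / (1 + (k − 1)/F). Here F is the upper α/k quantile of the F distribution with (n − 1, (k − 1)(n − 1)) degrees of freedom. The function names and the example series are illustrative, and the thesis may have used tabulated critical values instead. The F quantile is found by bisection on the regularized incomplete beta function. SciPy's `scipy.stats.f.ppf` would do the same job.

```python
import math


def _nz(v, tiny=1e-300):
    """Guard against division by (almost) zero in Lentz's algorithm."""
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with Lentz's method."""
    c, d = 1.0, 1.0 / _nz(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 / _nz(1.0 + aa * d)
            c = _nz(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the central F distribution with (d1, d2) degrees of freedom."""
    return 0.0 if f <= 0.0 else betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_ppf(q, d1, d2):
    """Quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < q else (lo, mid)
    return 0.5 * (lo + hi)


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cochran_c(groups):
    """Cochran's C: largest group variance divided by the sum of all group variances."""
    variances = [sample_variance(g) for g in groups]
    return max(variances) / sum(variances)


def cochran_critical(alpha, k, n):
    """Upper critical value of C for k groups with n parallels each."""
    f = f_ppf(1.0 - alpha / k, n - 1, (k - 1) * (n - 1))
    return 1.0 / (1.0 + (k - 1) / f)


# Hypothetical series: three time points with three parallels each.
series = [[102.0, 98.0, 100.0], [250.0, 240.0, 265.0], [180.0, 120.0, 240.0]]
c_value = cochran_c(series)
c_crit = cochran_critical(0.01, k=3, n=3)
print(f"C = {c_value:.3f}, critical = {c_crit:.3f}, homogeneous: {c_value <= c_crit}")
```

In this made-up series the variance of the third time point dominates. C is about 0.957, against a critical value of about 0.942 at p = 0.01, so the series would be excluded from further analysis.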

4.2.2.2 Differential regulation in the MIF-experiment

We compared the FX-patterns from the MIF+/+ and the MIF-/- time series. Table 4-7 shows, for illustrative purposes, the numbers of differentially regulated genes for the low-resolution pattern groups, based on the Tukey A method. In total, 1,429 of the MIF+/+ time series that have a counterpart of homogeneous variance show a different low-resolution pattern in the MIF-/- time series.

Table 4-7 Low-resolution patterns (based on Tukey A) for the MIF+/+ and MIF-/- time series. The 'Different patterns' column gives the number of MIF+/+ probe sets whose low-resolution pattern differs from that of the corresponding MIF-/- probe set.

Pattern   MIF+/+ group size   MIF-/- group size   Different patterns
+         279                 416                 179
+-        43                  34                  32
-         312                 515                 217
-+        13                  9                   13
=         11,574              11,153              988
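Table 4-7 is essentially a cross-tabulation of two dictionaries that map probe sets to low-resolution patterns. A minimal Python sketch of that bookkeeping follows. The probe-set identifiers and patterns are hypothetical. Probe sets excluded in either genotype (for example by Cochran's test) are simply missing from the corresponding dictionary, so only probe sets present in both are compared.

```python
from collections import Counter


def compare_patterns(wild_type, knockout):
    """Cross-tabulate low-resolution patterns of two genotypes.

    wild_type, knockout: dicts mapping probe-set ID -> pattern string.
    Returns {pattern: (wild-type group size, knockout group size,
    wild-type probe sets whose knockout counterpart has another pattern)}.
    """
    wt_sizes = Counter(wild_type.values())
    ko_sizes = Counter(knockout.values())
    # Only probe sets that are analyzable in both genotypes can be compared.
    shared = wild_type.keys() & knockout.keys()
    different = Counter(wild_type[p] for p in shared if wild_type[p] != knockout[p])
    patterns = sorted(set(wt_sizes) | set(ko_sizes))
    return {pat: (wt_sizes[pat], ko_sizes[pat], different[pat]) for pat in patterns}


# Hypothetical example data (not taken from the experiment).
mif_wt = {"probe_a": "+", "probe_b": "=", "probe_c": "-", "probe_d": "+-"}
mif_ko = {"probe_a": "+", "probe_b": "+", "probe_c": "=", "probe_e": "-"}

table = compare_patterns(mif_wt, mif_ko)
for pattern, (wt_n, ko_n, diff_n) in table.items():
    print(f"{pattern:>3} {wt_n:>5} {ko_n:>5} {diff_n:>5}")
print("total different:", sum(row[2] for row in table.values()))
```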

The signal intensities for TNF-α in the MIF+/+ and MIF-/- mice are shown in Figure 4-17.

[Figure 4-17 plot: signal (0 to 10000) versus time in hours (0 to 5) for MIF+/+ and MIF-/-.]

Figure 4-17 The time series for TNF-α in MIF+/+ and MIF-/- mice. The MIF+/+ time series has two parallel measurements; the MIF-/- time series has three parallel measurements. The error bars represent standard deviations.


The low-resolution pattern for both the MIF+/+ and the MIF-/- time series is +, regardless of whether it is based on Tukey A or the S-method. Similarly, the high-resolution pattern for both time series is 0+1=2, again regardless of whether it is based on Tukey A or the S-method. The patterns show that TNF-α is regulated identically in wild-type and MIF-deficient mice. However, the patterns reflect only the sequence of changes within the time series. TNF-α could therefore be expressed at a quantitatively lower (overall) level in MIF-deficient mice and still have the same FX-patterns as observed for wild-type mice. Figure 4-17 shows that this is not the case.

5 Discussion

5.1 The one-way ANOVA model


Feature eXtraction is a useful method for the analysis of time series data in general. It condenses the qualitative features of a time series into a well-defined, simple pattern string that is associated with a significance level. The first step of Feature eXtraction is to identify the significant changes of the signal level in a given time series. This can be done only if some kind of model is used that represents the basic properties of the data. Whatever model is used, it will be based on some fundamental assumptions about the data, and it is these assumptions that make a statistical interpretation of the data possible. We use a simple model here that allows the data to be interpreted by one-way ANOVA. This simple model will be valid for time series obtained from many different sources, but it is certainly not sophisticated enough in all cases. However, any model for the data can be used, provided that statistical methods for identifying significant changes of the signal level can be derived from it. Given such methods, the FX-matrix and subsequently the low-resolution and high-resolution pattern strings can be constructed. The level-extended pattern strings can be constructed only if the statistical methods also allow meta-groups to be compared.

One-way ANOVA makes it possible to discriminate between systematic and unsystematic variation across the measurement groups of a time series. The systematic variation reflects "true" changes of the signal level ("treatment effect" in ANOVA terminology), while the unsystematic variation is the result of effects that are considered random. As the name "one-way ANOVA" indicates, this methodology is intended to detect significant systematic variation due to a single treatment effect against a background of unsystematic variation. The experimental setup must therefore ensure that there are no additional sources of systematic variation. Otherwise it cannot be decided which source of systematic variation caused a significant effect.
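The decomposition into systematic and unsystematic variation can be sketched for a single time series in a few lines of plain Python, treating each time point as one group. This is the textbook one-way ANOVA, not code from the thesis, and the function name and numbers are hypothetical. The F ratio of the two mean squares is compared with a critical value of the F distribution with (k − 1, N − k) degrees of freedom.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of groups (one group per time point).

    Returns (ms_between, ms_within, f_ratio, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Systematic variation: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Unsystematic variation: spread of the parallels around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within, df_between, df_within


# Hypothetical time series: three time points, three parallels each.
result = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print("MS_between = {:.1f}, MS_within = {:.1f}, F = {:.1f} on ({}, {}) df".format(*result))
```

Here F = 27 on (2, 6) degrees of freedom. This is well above the tabulated F_0.01(2, 6) ≈ 10.92, so the group means differ significantly at p = 0.01.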

Consider, as an example, a microarray time series experiment. Suppose that the parallel measurements at time point t are done with a set of chips from one production batch, and the measurements at time point t+k with a set of chips from a different batch. If there is a systematic difference in the signal produced by the two chip sets, the measurement will be biased by this additional systematic variation. In microarray experiments, the preprocessing, normalization and background-correction steps that are needed to obtain the final mRNA expression measure should cope with such other sources of systematic variation. However, if there is doubt about the success of these steps, this can be checked only with an appropriate experimental design. The so-called multi-way ANOVA approach can handle several sources of systematic variation simultaneously. If an experimental setup is used that allows for multi-way ANOVA, unwanted sources of systematic variation can be identified and accounted for in the analysis.

Furthermore, in many cases additional sources of systematic variation are not unwanted but are introduced on purpose. One frequently wants to compare the time series obtained under one experimental condition with its equivalent obtained under one or several other experimental conditions. If the FX-patterns obtained from such an experiment differ, this implies that the different treatment conditions have an effect on the process under study. But this effect will only be detected if it is reflected in the sequence of upward and downward changes of the time series. If the patterns obtained under the different treatment conditions are identical, this does not imply that there is no treatment effect at all. It only means, at a certain significance level, that the temporal sequence of upward and downward changes is identical under the treatment conditions. The time series may still differ in signal strength. Whether two or more time series with identical FX-patterns differ significantly in signal strength can easily be answered within a multi-way ANOVA setting by testing for differential main effects (see for example (Winer, 1971a)).
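The multi-way setting mentioned above can be sketched as a balanced two-way ANOVA with time point and experimental condition (for example genotype) as factors. The interaction term corresponds to a difference in the temporal profile between conditions. The condition main effect captures an overall difference in signal strength. The sketch assumes the same number of parallels in every cell, which the MIF data (two versus three parallels) would not satisfy. The function name and data are hypothetical.

```python
def two_way_anova(data):
    """Balanced two-way ANOVA (condition x time point) with interaction.

    data[i][j] holds the parallels for condition i at time point j; every
    cell must contain the same number n >= 2 of parallels.
    Returns ({"condition": (F, df), "interaction": (F, df)}, df_within).
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(c) / n for c in row] for row in data]
    grand = sum(sum(row) for row in cell) / (a * b)
    cond_means = [sum(row) / b for row in cell]
    time_means = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    ss_cond = b * n * sum((m - grand) ** 2 for m in cond_means)
    ss_time = a * n * sum((m - grand) ** 2 for m in time_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell for m in row)
    # Interaction: differences in the temporal profile between conditions.
    ss_inter = max(0.0, ss_cells - ss_cond - ss_time)
    ss_within = sum((x - cell[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in data[i][j])
    df_cond, df_inter, df_within = a - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_within = ss_within / df_within
    return {
        "condition": (ss_cond / df_cond / ms_within, df_cond),
        "interaction": (ss_inter / df_inter / ms_within, df_inter),
    }, df_within


# Hypothetical data: identical temporal profile, knockout shifted up by 5 units.
wild_type = [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0], [30.0, 31.0, 32.0]]
knockout = [[v + 5.0 for v in parallels] for parallels in wild_type]
effects, df_within = two_way_anova([wild_type, knockout])
print(effects, "df_within =", df_within)
```

In this example the two conditions share the same temporal profile (interaction F = 0) but differ in level (condition F = 112.5 on (1, 12) degrees of freedom). A comparison of FX-patterns alone cannot reveal this kind of difference.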


The one-way ANOVA approach is perfectly suited to detect the changes of the signal level within a single time series. The inflammatory bowel disease data indicate that the model is suited for the analysis of microarray data. We used the inflammatory bowel disease data as a test case because this was the experiment with the highest number of replications we had access to. (We could not find any publicly available gene expression time series with more replicated measurements.) Given the small number of parallel measurements in the inflammatory bowel disease experiment, it is not sensible to test the treatment groups for normality. We simply assumed that the measurements at a given time point are normally distributed. The second assumption of the model is constant variance across the treatment groups, and this assumption was checked with Cochran's test. This test is quite robust against departures from normality and is recommended in cases where the number of measurements per group is below ten. Approximately 97 percent of all analyzable time series were accepted by this test as homogeneous in variance at a significance level of p=0.01. Thus the majority of the time series conform to the model. At this significance level one would expect the null hypothesis of homogeneous variance to be rejected in 1 percent of all cases. But we find that the null hypothesis is rejected for approximately 3 percent of the time series. This means that there is definitely a small fraction of time series with inhomogeneous variance across the groups. The situation is similar in the MIF mice data: 1 percent false positives are expected, but rejection rates of 2.14 percent and 2.89 percent are found in the MIF+/+ and the MIF-/- time series, respectively. The reasons for this are not clear, and the experimental design does not allow us to investigate whether this inhomogeneity is due to biological or artifactual effects. However, considering the complexity of the measurement procedure and the number of signal correction and signal scaling steps that are necessary, artifactual effects cannot be excluded. We therefore conclude that, despite the small inhomogeneous fraction, the simple model is well supported by the data.

5.2 Power and experimental design

Feature eXtraction identifies significant changes of the signal level using either the Tukey A test or Scheffé's S-method. These methods detect significant changes by discriminating systematic variation from a background of unsystematic variation. Intuitively, changes due to systematic variation cannot be detected as long as they are not clearly larger than the changes expected from the unsystematic background variation. Obviously one wants to know how sensitive the method is to systematic variation of a certain size. The statistical concept of 'power' describes the capability of a test to detect a difference of a certain magnitude, and was discussed in more detail in section 4.1.1.5. The power of the S-method is well understood. While the exact power of the Tukey A test cannot be computed analytically, the test is known to be somewhat more powerful than the S-method. Thus the S-method may be used as a "worst case" scenario for the power of Feature eXtraction.
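For pairwise comparisons of group means, Scheffé's S-method declares time points i and j different when |ȳ_i − ȳ_j| exceeds √((k − 1)·F_α(k − 1, N − k)·MS_within·(1/n_i + 1/n_j)). The Python sketch below computes this critical difference. It is a textbook formulation, not necessarily the thesis's implementation, and the function names are hypothetical. The F quantile is obtained by bisection on the regularized incomplete beta function to avoid external dependencies; scipy.stats.f.ppf would be the usual library call.

```python
import math


def _nz(v, tiny=1e-300):
    """Guard against division by (almost) zero in Lentz's algorithm."""
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with Lentz's method."""
    c, d = 1.0, 1.0 / _nz(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 / _nz(1.0 + aa * d)
            c = _nz(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the central F distribution with (d1, d2) degrees of freedom."""
    return 0.0 if f <= 0.0 else betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_ppf(q, d1, d2):
    """Quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < q else (lo, mid)
    return 0.5 * (lo + hi)


def scheffe_critical_difference(alpha, k, n_total, ms_within, n_i, n_j):
    """Smallest |mean_i - mean_j| declared significant by Scheffe's S-method.

    k groups (time points), n_total measurements in all, ms_within the
    within-group mean square of the one-way ANOVA, n_i and n_j the group sizes.
    """
    f_crit = f_ppf(1.0 - alpha, k - 1, n_total - k)
    return math.sqrt((k - 1) * f_crit * ms_within * (1.0 / n_i + 1.0 / n_j))


# Hypothetical design like the inflammatory bowel disease data:
# four time points with four parallels each, background variance set to 1.
for alpha in (0.05, 0.01):
    d = scheffe_critical_difference(alpha, k=4, n_total=16, ms_within=1.0, n_i=4, n_j=4)
    print(f"alpha = {alpha}: critical difference = {d:.2f} (in units of sigma)")
```

This is a threshold on the observed difference of means. The true effect that is detected with, say, 80 percent power is larger, and that relationship is what the power curves describe.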

Consider as an example the setup of the inflammatory bowel disease data, where the time series of each gene has four time points, with four parallels for each of these time points. How big must a treatment effect be to be detected by the S-method with an acceptable probability? The answer depends on the significance threshold used in the detection of significant differences of group means. As illustrated in Figure 5-1, the power increases if a less stringent significance threshold is chosen. At a significance threshold of p=0.01, a systematic variation causing a significant change must be ~5 times as big as the standard error of the background to be detected with a probability of ~80 percent. If the less stringent significance level p=0.05 is chosen, a significant change need only be ~4 times as big as the standard error of the background to be detected with a probability of ~80 percent.

Thus, when trying to detect significant changes of the signal level over time, a compromise must be found. On the one hand, one wants to detect all significant changes that are present; on the other hand, choosing a less stringent significance level increases the probability of a Type I error. This is not specific to Feature eXtraction, but applies in general to any statistical method that tries to discriminate between systematic and unsystematic variation.

[Figure 5-1 plot: power (0 to 1) versus "treatment effect/background error" (0 to 10) for p=0.05 and p=0.01.]

Figure 5-1 Power of the S-method for the analysis of a time series with four time points and four parallel measurements. Two different significance levels are shown. The non-centrality parameter φ is shown on the x-axis and designated as "treatment effect/background error".

The most direct way to increase the sensitivity in the analysis of time series data is to reduce the background variation. Intuitively, the power depends on the overlap between two distributions of the test statistic. The first is its distribution when the null hypothesis is true ('null distribution'); the second is its distribution under a given treatment effect ('alternative distribution'). The non-centrality parameter, φ, specifies the alternative distribution relative to the null distribution. It can be thought of as the size of the treatment effect in relation to the background error (the common standard deviation of the measurements).

Suppose a process without any inherent variability is measured. In this case the background variation is due to measurement errors only. Thus, if the measurement error is reduced, smaller changes in the signal level can be detected as significant. If there were no measurement error at all (which is never true in practice), any change of the signal level would be detected as significant. However, many processes, especially biological processes, are inherently variable; even in the absence of measurement error, background variation will be encountered. The inherent variability is an important property of a process, as it effectively defines the smallest change of the signal level that can be considered significant under ideal experimental conditions. If a good estimate of the inherent variability of a process is at hand, one knows which treatment effect must, in the worst case, be detectable with adequate power.

Suppose an inherently variable system is to be analyzed, and we know that treatment effects at least four times as large as the background error are relevant and should be detected. Increasing the number of parallel measurements will increase the probability of detecting such a treatment effect against this background error, as shown in Figure 5-2. The example shows that the gain from additional parallel measurements diminishes as the number of parallels grows. For four time points and a significance threshold of p=0.05, around ten parallel measurements can be considered optimal to detect a treatment effect that is four times as big as the standard error of the background. Any further increase will not result in a gain of power that is of practical relevance. Given a significance level of p=0.05 and ten parallel measurements, the S-method operates with a power of roughly 90 percent. Note that at the more stringent significance level of p=0.01, this power can only be achieved with an impracticable number of parallels (>>20). This is an important observation with respect to the relation between power and stringency of the S-method discussed above. If a more stringent significance level is chosen for the analysis, the loss of power with respect to a given treatment effect cannot easily be compensated for by an increase in the number of parallel measurements. Though a power of 100 percent can theoretically be approached for any significance level and any treatment effect, in practice the power of the S-method is limited with respect to the treatment effect that can be discriminated from the background variation.

[Figure 5-2 plot: power (0 to 1) versus samples per group (0 to 20) for p=0.05 and p=0.01.]

Figure 5-2 Power of the S-method for the analysis of a time series with four time points. The "treatment effect per measurement error" that is to be detected is set to φ=4. An increase of the samples per group (the number of parallel measurements) increases the power.
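The qualitative behavior shown in Figures 5-1 and 5-2 can be explored with the power of the overall one-way ANOVA F-test. Power rises with the number of parallels and falls with a more stringent threshold. The F-test power is an upper bound for the probability that the S-method flags at least one pairwise difference. The sketch assumes a non-centrality parameter λ = n·Σ(τ_j/σ)². Here τ_j are the deviations of the true group means from their average and σ is the background standard deviation. How λ relates to the φ plotted in the figures depends on the definition in section 4.1.1.5, so the numbers are not meant to reproduce those curves. The non-central F distribution is evaluated as a Poisson mixture of central F distributions, and all function names and effect sizes are hypothetical.

```python
import math


def _nz(v, tiny=1e-300):
    """Guard against division by (almost) zero in Lentz's algorithm."""
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with Lentz's method."""
    c, d = 1.0, 1.0 / _nz(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 / _nz(1.0 + aa * d)
            c = _nz(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the central F distribution with (d1, d2) degrees of freedom."""
    return 0.0 if f <= 0.0 else betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_ppf(q, d1, d2):
    """Quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < q else (lo, mid)
    return 0.5 * (lo + hi)


def ncf_cdf(f, d1, d2, lam, max_terms=1000, tol=1e-12):
    """CDF of the non-central F distribution as a Poisson mixture of central F CDFs."""
    if lam <= 0.0:
        return f_cdf(f, d1, d2)
    if f <= 0.0:
        return 0.0
    x = d1 * f / (d1 * f + d2)
    half = lam / 2.0
    total = weight_sum = 0.0
    for j in range(max_terms):
        # Poisson(lam / 2) weight, computed in log space to avoid overflow.
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += w * betainc_reg(d1 / 2.0 + j, d2 / 2.0, x)
        weight_sum += w
        if j > half and 1.0 - weight_sum < tol:
            break
    return total


def anova_power(alpha, k, n, lam):
    """Power of the one-way ANOVA F-test for k groups of n parallels each."""
    d1, d2 = k - 1, k * (n - 1)
    return 1.0 - ncf_cdf(f_ppf(1.0 - alpha, d1, d2), d1, d2, lam)


def noncentrality(n, effects):
    """lambda = n * sum(tau_j^2), with group effects tau_j in units of sigma."""
    mean = sum(effects) / len(effects)
    return n * sum((e - mean) ** 2 for e in effects)


# Hypothetical effect pattern: the last of four time points is shifted by 2 sigma.
EFFECTS = [0.0, 0.0, 0.0, 2.0]
for n in (2, 4, 8, 12, 20):
    lam = noncentrality(n, EFFECTS)
    print(f"n = {n:2d}: power {anova_power(0.05, 4, n, lam):.3f} (p=0.05), "
          f"{anova_power(0.01, 4, n, lam):.3f} (p=0.01)")
```

With these illustrative settings the power grows with n and levels off as it approaches 1. For the same effect it is always lower at p = 0.01 than at p = 0.05.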

Suppose the non-centrality parameter is fixed at φ=4. If the number of time points is increased while the number of parallels is kept constant, the power to detect significant differences between group means decreases, as shown in Figure 5-3. The number of parallel measurements is four in this example. For the minimal length of two time points, the power of the S-method is at its maximum (in fact it equals the power of Student's t-test), but it decreases as the number of time points increases.

[Figure 5-3 plot: power (0 to 1) versus number of groups (0 to 20) for p=0.05 and p=0.01.]

Figure 5-3 Power of the S-method for the analysis of a time series with 4 parallel measurements. The "treatment effect per measurement error" that is to be detected is set to φ=4. The power decreases with an increasing number of time points.


One must be aware that, in general, the analysis of longer time series requires more parallel measurements to detect a given treatment effect with sufficient power.

5.3 Grouping by qualitative patterns

For a time series of a given length there is a finite number of possible qualitative patterns. For example, for a time series with four time points there are nine possible low-resolution patterns (+, +-, ++, +-+, =, -, -+, --, -+-). Every time series that is analyzed is assigned to one of these pattern categories. Grouping thus becomes trivial, as all time series with the same qualitative pattern can be grouped together.

The probability that a time series ends up in the wrong category due to a Type I error can be controlled directly by the choice of the significance level. Given n time series and a significance level p, the expected number of misclassifications due to a Type I error is n ⋅ p. We analyzed the inflammatory bowel disease data using a significance threshold of p=0.01. The number of analyzable time series in this data set is 3,940, so we expect roughly 40 of the time series to be misclassified due to a Type I error. Choosing a more stringent significance level will reduce the number of misclassified time series, but at the same time reduces the power of the method to detect significant differences between group means. Figure 5-4 illustrates this effect for the setup of the inflammatory bowel disease experiment.
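Applying the n ⋅ p estimate to the thresholds shown in Figure 5-4 gives the expected number of Type I misclassifications among the 3,940 analyzable inflammatory bowel disease time series. This is plain arithmetic, and the function name is illustrative.

```python
def expected_misclassifications(n_series, p):
    """Expected number of series assigned a wrong pattern through Type I errors (n * p)."""
    return n_series * p


N_SERIES = 3940  # analyzable time series in the inflammatory bowel disease data
for p in (0.01, 0.001, 0.0001, 0.00001):
    print(f"p = {p:g}: about {expected_misclassifications(N_SERIES, p):.2f} misclassified series")
```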

[Figure 5-4 plot: power (0 to 1) versus "treatment effect/background error" (0 to 10) for p=0.01, p=0.001, p=0.0001 and p=0.00001.]

Figure 5-4 Power of the S-method for the analysis of a time series with four time points and four parallel measurements. Different significance levels are shown. The non-centrality parameter φ is shown on the x-axis and designated as "treatment effect/background error".

The relation between power and stringency discussed in the previous section is encountered here again. As discussed, the loss in power with respect to a given treatment effect cannot easily be compensated for by increasing the number of parallel measurements. Thus there is no simple recipe for the optimal choice of the significance threshold.

5.4 Feature eXtraction and microarray data

5.4.1 Inflammatory bowel disease data

Microarray time series experiments are frequently performed in a rather exploratory way. The researcher wants to get a first impression of the transcriptional changes that are associated with a specific situation. Such experiments are usually approached with preconceived ideas about the behavior of particular genes. One may want to confirm these ideas and to detect links to genes with potentially related functions. When working with the raw data directly, this exploratory approach is severely hindered by the sheer amount of data. The pattern strings provided by Feature eXtraction compress the data and provide a concise picture of the changes in the expression level of every single gene. The inflammatory bowel disease experiment is a typical example of such an exploratory analysis. The expression of roughly 12,000 genes was measured at four time points over a period of 28 days. With the help of the FX-patterns, the signal level changes of every gene of interest can be looked up easily.

The functional annotation of the 17 genes in the smallest regulation groups (+-, -+, -+-) is given in Table 4-6. Based on their functions, the genes in these groups could be expected to show non-constant expression profiles. For example, pentaxins are serum proteins that include the acute phase proteins serum amyloid protein and C-reactive protein. Haptoglobin, a mouse acute phase protein, is first detected 14 days after T cell transfer and is maximal after 28 days; early upregulation of the gene may thus be expected. Three of the genes in the -+ group, MHC Class II region, CD38 antigen and Eps8, are all likely to be expressed during ongoing inflammation. MHC Class II is involved in antigen presentation, CD38 antigen in adhesion between lymphocytes and endothelial cells, and Eps8 is a signal transducer involved in cell growth and differentiation.

However, the inflammatory bowel disease data alone are not sufficient to show that the regulation of gene expression is due to colitis. Because an appropriate control is missing, one cannot distinguish between genes that are regulated due to colitis and genes that would be regulated anyway during the experimental time period.

5.4.2 MIF mice data

The MIF mice data were obtained from a comparative experiment designed to identify genes that are regulated differentially after LPS treatment of stimulated macrophages (see 4.2.2). The aim was to verify whether some pro-inflammatory cytokines, especially TNF-α, are differentially regulated in MIF-deficient mice compared with wild-type mice, as was postulated in the literature (Bozza et al., 1999). The FX-patterns for TNF-α in MIF+/+ and MIF-/- mice are identical and show that gene expression is significantly upregulated immediately after LPS treatment (see 4.2.2.2). Though no difference in regulation is detected by the FX-patterns, TNF-α could still be expressed at an (overall) lower level in MIF-deficient mice. Figure 4-17 shows that this is not the case (no statistical test is needed in this obvious case). Thus TNF-α cannot be regarded as differentially expressed in the MIF mice experiment.

The comparative MIF mice experiment illustrates that differential regulation can easily be detected using the FX-patterns. However, the signal strengths of time series cannot be compared directly using the FX-patterns; as discussed in 1.1, this can easily be done by testing for differential main effects. Comparing two or more FX-patterns is equivalent to grouping by FX-patterns, so misclassification due to Type I errors is to be expected (see 5.3).

TNF-α is not differentially regulated in this experiment. However, according to the low-resolution patterns (based on Tukey A), 1,429 other genes are differentially regulated. These observations cannot be interpreted directly in terms of what happens in stimulated macrophages after LPS treatment. They become useful, however, if knowledge or hypotheses about the system (such as "TNF-α is regulated differentially in MIF+/+ and MIF-/- mice") allow us to ask well-defined questions.

5.4.3 Microarray time series in general

The MIF mice experiment illustrates an important aspect of microarray time series experiments. A long list of differentially regulated genes is obtained, but the biological meaning of these observations is not clear. This is due to the complexity of biological systems and our still limited knowledge about them. To put it in terms of "understanding a biological system at the molecular level of detail": the wiring of the regulatory network is, with the exception of very few interaction mechanisms, not known. Time series experiments are in general of limited use for revealing this wiring. Though interactions can be identified by perturbation experiments, the mechanisms by which these interactions operate cannot be understood from time series experiments alone. Furthermore, the combinatorial complexity of the perturbation approach grows exponentially with the size of the system. It is therefore not very promising to study large numbers of system components simultaneously. If small subsets of components that are differentially regulated under certain experimental conditions can be isolated, research can be focussed on the interaction mechanisms between these components. The contribution of microarray technology to an understanding of interaction mechanisms in biological systems is limited, as it only provides information about mRNA levels.

5.5 Conclusion

Sparse time series provide qualitative information, and Feature eXtraction is a useful tool for the analysis of such data in general. Today's microarray time series experiments generate large numbers of sparse time series. These data should not be expected to give direct insight into the wiring of biological systems. They will, however, contribute to the identification of small sets of potentially interacting components, and thus facilitate the planning of experiments that aim at an understanding of the wiring. Feature eXtraction is a valuable tool in this setting. It condenses relevant qualitative information into a simple, easy-to-use pattern string of well-defined significance. The FX-patterns are useful in the exploratory analysis of microarray time series and in systematic approaches to the identification of differentially regulated genes.

The interaction mechanisms in most biological systems are still unknown today. But if the wiring of a regulatory network is known, the FX-patterns can be applied straightforwardly in a qualitative, kinetic analysis of its integrated behavior over time. Though a detailed understanding of a regulatory network requires a quantitative analysis, a qualitative analysis can nevertheless be very useful. It is frequently very hard to obtain the parameters needed for a quantitative analysis, especially in biological systems. Furthermore, if the system under study is complex, a qualitative analysis can provide a clearer view of the crucial aspects of its integrated behavior over time.

Microarray time series are noisy. The noise in the data is due to measurement errors and the inherent variability of biological systems. A complete assessment of the noise present in microarray time series is possible only with replicated experiments. Our efforts to assess the noise in microarray time series were hampered by the fact that we had no access to experiments with a sufficient number of replications. One reason for this may be that microarray technology is still in its infancy: the experiments are time-consuming and relatively expensive, and they are plagued by certain technological difficulties. We expect that, as the technology improves, experiments with more replicates will become feasible. Only such experiments allow significant features to be detected sensitively against the background variation.

Experiments with a sufficient number of replicates will probably suggest alternatives to the simple model for microarray time series data used here. It is even worthwhile to set up experiments solely for the purpose of developing a model of the time series data that are expected from future experiments. The analysis of time series data depends on an appropriate model that allows for statistical inference. Thus the contribution of microarray time series data to an understanding of the genetic network depends on the quality of both the technology and the models used in the analysis of the data.

6 References


Adams,M.D., Kelley,J.M., Gocayne,J.D., Dubnick,M., Polymeropoulos,M.H., Xiao,H., Merril,C.R., Wu,A., Olde,B., and Moreno,R.F. (1991). Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 1651-1656.

Affymetrix (2001). New Statistical Algorithms for Monitoring Gene Expression on GeneChip® Probe Arrays. Technical Note. Affymetrix.

Affymetrix (2002). Statistical Algorithms Reference Guide. Technical Note. Affymetrix.

Alizadeh,A.A., Eisen,M.B., Davis,R.E., Ma,C., Lossos,I.S., Rosenwald,A., Boldrick,J.C., Sabet,H., Tran,T., Yu,X., Powell,J.I., Yang,L., Marti,G.E., Moore,T., Hudson,J., Jr., Lu,L., Lewis,D.B., Tibshirani,R., Sherlock,G., Chan,W.C., Greiner,T.C., Weisenburger,D.D., Armitage,J.O., Warnke,R., and Staudt,L.M. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503-511.

Alter,O., Brown,P.O., and Botstein,D. (2000). Singular value decomposition for genome-wide

expression data processing and modeling. Proceedings of the National Academy of Sciences of the

United States of America 97, 10101-10106.

Anderson,L. and Seilhamer,J. (1997). A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 18, 533-537.

Anderson,N.G. and Anderson,N.L. (2001). Twenty years of two-dimensional electrophoresis: past,

present and future. Electrophoresis 17, 443-453.

Arkin,A., Ross,J., and McAdams,H.H. (1998). Stochastic Kinetic Analysis of Developmental

Pathway Bifurcation in Phage λ-Infected Escherichia coli Cells. Genetics 149, 1633-1648.

Baugh,L.R., Hill,A.A., Brown,E.L., and Hunter,C.P. (2001). Quantitative analysis of mRNA

amplified by in vitro transcription. Nucleic Acids Res. 29, e29.

Ben Dor,A., Shamir,R., and Yakhini,Z. (1999). Clustering gene expression patterns. Journal of

Computational Biology 6, 281-297.

Berndt,P., Hobohm,U., and Langen,H. (1999). Reliable automatic protein identification from

matrix-assisted laser desorption/ionization mass spectrometric peptide fingerprints. Electrophoresis

20, 3521-3526.

Bittner,M., Meltzer,P., Chen,Y., Jiang,Y., Seftor,E., Hendrix,M., Radmacher,M., Simon,R.,

Yakhini,Z., Ben Dor,A., Sampas,N., Dougherty,E., Wang,E., Marincola,F., Gooden,C., Lueders,J.,

Glatfelter,A., Pollock,P., Carpten,J., Gillanders,E., Leja,D., Dietrich,K., Beaudry,C., Berens,M.,

Alberts,D., and Sondak,V. (2000). Molecular classification of cutaneous malignant melanoma by

gene expression profiling. Nature 406, 536-540.

Bjellqvist,B., Ek,K., Righetti,P.G., Gianazza,E., Goerg,A., Westermeier,R., and Postel,W. (1982).

Isoelectric focussing in immobilized pH gradients: principle, methodology and some applications. J.

Biochem. Biophys. Methods 6, 317-339.


Blanchard,A.P., Kaiser,R.J., and Hood,L.E. (1996). Synthetic DNA arrays. Biosensors and

Bioelectronics 11, 687-690.

Bozza,M., Satoskar,A.R., Lin,G., Lu,B., Humbles,A.A., Gerard,C., and David,J.R. (1999). Targeted

disruption of migration inhibitory factor gene reveals its critical role in sepsis. J. Exp. Med. 189,

341-346.

Brachat,A., Pierrat,B., Brungger,A., and Heim,J. (2000). Comparative microarray analysis of gene

expression during apoptosis-induction by growth factor deprivation or protein kinase C inhibition.

Oncogene 19, 5073-5082.

Brady,G. (2000). Expression profiling of single mammalian cells - small is beautiful. Yeast 17, 211-217.

Brown,B.W., Lovato,J., and Russel,K. (1994). DCDFLIB, version 1.0. Computer program. (Department of Biomathematics, University of Texas).

Brown,M.P., Grundy,W.N., Lin,D., Cristianini,N., Sugnet,C.W., Furey,T.S., Ares,M., Jr., and

Haussler,D. (2000). Knowledge-based analysis of microarray gene expression data by using support

vector machines. Proceedings of the National Academy of Sciences of the United States of America

97, 262-267.

Celis,J.E., Gromov,P., Otergaard,M., Madsen,P., Honore,B., Deijgaard,K., Olsen,E., Vorum,H.,

Kristensen,D.B., and Gromova,I. (1996). Human 2-D PAGE databases for proteome analysis in

health and disease: http://biobase.dk/cgi-bin/celis. FEBS Letters 398, 129-134.

Chen,H., Centola,M., Altschul,S.F., and Metzger,H. (1998). Characterization of gene expression in

resting and activated mast cells. J. Exp. Med. 188, 1657-1668.

Cho,R.J., Campbell,M.J., Winzeler,E.A., Steinmetz,L., Conway,A., Wodicka,L., Wolfsberg,T.G.,

Gabrielian,A.E., Landsman,D., Lockhart,D.J., and Davis,R.W. (1998). A genome-wide

transcriptional analysis of the mitotic cell cycle. Molecular Cell 2, 65-73.

Chu,S., DeRisi,J., Eisen,M., Mulholland,J., Botstein,D., Brown,P.O., and Herskowitz,I. (1998). The

transcriptional program of sporulation in budding yeast. [erratum appears in Science 1998 Nov

20;282(5393):1421]. Science 282, 699-705.

Chudin,E., Walker,R., Kosaka,A., Wu,S.X., Rabert,D., Chang,T.K., and Kreder,D.E. (2001). Assessment of the relationship between signal intensities and transcript concentration for Affymetrix GeneChip® arrays. Genome Biology 3(1), research0005.1-research0005.10 (published 14 December 2001).

Claverie,J.M. (1999). Computational methods for the identification of differential and coordinated

gene expression. Human Molecular Genetics 8, 1821-1832.

Davies,H., Lomas,L., and Austen,B. (1999). Profiling of amyloid beta peptide variants using SELDI

protein chip arrays. Biotechniques 27, 1261.

DeRisi,J., Penland,L., Brown,P.O., Bittner,M.L., Meltzer,P.S., Ray,M., Chen,Y., Su,Y.A., and

Trent,J.M. (1996). Use of a cDNA microarray to analyse gene expression patterns in human cancer.

Nature Genetics 14, 457-460.


Duggan,D.J., Bittner,M., Chen,Y., Meltzer,P., and Trent,J.M. (1999). Expression profiling using

cDNA microarrays. Nature Genetics 21, 10-14.

Eberwine,J., Yeh,H., Miyashiro,K., Cao,Y., Nair,S., Finnel,R., Zettel,M., and Coleman,P. (1992).

Analysis of gene expression in single live neurons. Proc. Natl. Acad. Sci. USA 89, 3010-3014.

Eickhoff,H., Schuchardt,J., Ivanov,I., Meier-Ewert,S., O'Brien,J., Malik,A., Tandon,N., Wolski,E.,

Rohlfs,E., and Nyarsik,L. (2000). Tissue gene expression analysis using arrayed normalized cDNA

libraries. Genome Res. 10, 1230-1240.

Eisen,M.B., Spellman,P.T., Brown,P.O., and Botstein,D. (1998). Cluster analysis and display of

genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United

States of America 95, 14863-14868.

Eng,J., McCormack,A.L., and Yates,J.R. (1994). An approach to correlate MS/MS data to amino

acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976-989.

Ewing,R., Kahla,A.B., Poirot,O., Lopez,F., Audic,S., and Claverie,J. (1999). Large-scale statistical

analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res. 9, 950-959.

Fellenberg,K., Hauser,N.C., Brors,B., Neutzner,A., Hoheisel,J.D., and Vingron,M. (2001).

Correspondence analysis applied to microarray data. Proceedings of the National Academy of

Sciences of the United States of America 98, 10781-10786.

Fodor,S.P.A. (1997). Massively parallel genomics. Science 393-395.

Fodor,S.P.A., Rava,R.P., Huang,X.C., Pease,A.C., Holmes,C.P., and Adams,C.L. (1993).

Multiplexed biochemical assays with biological chips. Nature 364, 555-556.

Fodor,S.P.A., Read,J.L., Pirrung,M.C., Stryer,L., Lu,A.T., and Solas,D. (1991). Light-directed,

spatially addressable parallel chemical synthesis. Science 251, 767-773.

Futcher,B., Latter,G.I., Monardo,P., McLaughlin,C.S., and Garrels,J.I. (1999). A sampling of the

yeast proteome. Molecular and Cellular Biology 19, 7357-7368.

Gillespie,D. and Spiegelman,S. (1965). A quantitative assay for DNA-RNA hybrids with DNA

immobilised on a membrane. J. Mol. Biol. 12, 829-842.

Gillespie,D.T. (1976). A general method for numerically simulating stochastic time evolution of

coupled chemical reactions. J. Comput. Phys. 22, 403-434.

Gillespie,D.T. (1977). Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem.

81, 2340-2361.

Gillespie,D.T. (1992). A rigorous derivation of the chemical master equation. Physica A 188, 404-425.

Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J.P., Coller,H.,

Loh,M.L., Downing,J.R., Caligiuri,M.A., Bloomfield,C.D., and Lander,E.S. (1999). Molecular

classification of cancer: class discovery and class prediction by gene expression monitoring.

Science 286, 531-537.

Guptasarma,P. (1995). Does replication-induced transcription regulate synthesis of the myriad low

copy number proteins of Escherichia coli? BioEssays 17, 987-997.


Gygi,S.P., Rochon,Y., Franza,B.R., and Aebersold,R. (1999). Correlation between protein and

mRNA abundance in yeast. Molecular and Cellular Biology 19, 1720-1730.

Gygi,S.P., Rochon,Y., Franza,B.R., and Aebersold,R. (2000). Evaluation of two-dimensional gel

electrophoresis-based proteome analysis technology. Proc. Natl. Acad. Sci. USA 97, 9390-9395.

Henzel,W.J., Billeci,T.M., Stults,J.T., and Wong,S.C. (1993). Identifying proteins from two-dimensional

gels by molecular mass searching of peptide fragments in protein sequence databases.

Proc. Natl. Acad. Sci. USA 90, 5011-5015.

Herwig,R., Aanstad,P., Clark,M., and Lehrach,H. (2001). Statistical evaluation of differential

expression on cDNA nylon arrays with replicated experiments. Nucleic Acids Res. 29, 117-126.

Hoaglin,D.C., Mosteller,F., and Tukey,J.W. (1983). Understanding Robust and Exploratory Data

Analysis. (New York: Wiley & Sons).

Holter,N.S., Mitra,M., Maritan,A., Cieplak,M., Banavar,J.R., and Fedoroff,N.V. (2000).

Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proceedings

of the National Academy of Sciences of the United States of America 97, 8409-8414.

Honma,N., Koseki,H., Akasaka,T., Nakayama,T., Taniguchi,M., Serizawa,I., Akahori,H.,

Osawa,M., and Mikayama,T. (2000). Deficiency of the macrophage migration inhibitory factor

gene has no significant effect on endotoxaemia. Immunology 100, 84-90.

Hu,W.S. and Temin,H.M. (1990). Retroviral recombination and reverse transcription. Science 250,

1227-1233.

Hughes,T.R., Mao,M., Jones,A.R., Burchard,J., Marton,M.J., Shannon,K.W., Lefkowitz,S.M.,

Ziman,M., Schelter,J.M., Meyer,M.R., Kobayashi,S., Davis,C., Dai,Y., He,Y.D., Stephaniants,S.B.,

Cavet,G., Walker,W.L., West,A., Coffey,E., Shoemaker,D.D., Stoughton,R., Blanchard,A.P.,

Friend,S.H., and Linsley,P.S. (2001). Expression profiling using microarrays fabricated by an inkjet

oligonucleotide synthesizer. Nature Biotechnol. 19, 342-347.

Hughes,T.R., Marton,M.J., Jones,A.R., Roberts,C.J., Stoughton,R., Armour,C.D., Bennett,H.A.,

Coffey,E., Dai,H., He,Y.D., Kidd,M.J., King,A.M., Meyer,M.R., Slade,D., Lum,P.Y.,

Stepaniants,S.B., Shoemaker,D.D., Gachotte,D., Chakraburtty,K., Simon,J., Bard,M., and

Friend,S.H. (2000). Functional discovery via a compendium of expression profiles. Cell 102, 109-126.

Iyer,V.R., Eisen,M.B., Ross,D.T., Schuler,G., Moore,T., Lee,J.C., Trent,J.M., Staudt,L.M.,

Hudson,J., Jr., Boguski,M.S., Lashkari,D., Shalon,D., Botstein,D., and Brown,P.O. (1999). The

transcriptional program in the response of human fibroblasts to serum. Science 283, 83-87.

Jensen,O.N., Mortensen,P., Vorm,O., and Mann,M. (1997). Automation of matrix assisted laser

desorption/ionization mass spectrometry using fuzzy logic feedback control. Anal. Chem. 69, 1706-1715.

Kafatos,F.C., Jones,C.W., and Efstratiadis,A. (1979). A Determination of nucleic acid sequence

homologies and relative concentrations by a dot hybridization procedure. Nucleic Acids Res. 24,

1541-1552.

Kane,M.D., Jatkoe,T.A., Stumpf,C.R., Lu,J., Thomas,J.D., and Madore,S.J. (2000). Assessment of

the sensitivity and specificity of oligonucleotide (50mers) microarrays. Nucleic Acids Res. 4552.


Kerr,M.K. and Churchill,G.A. (2001). Bootstrapping cluster analysis: assessing the reliability of

conclusions from microarray experiments. Proceedings of the National Academy of Sciences of the

United States of America 98, 8961-8965.

Khan,J., Wei,J.S., Ringner,M., Saal,L.H., Ladanyi,M., Westermann,F., Berthold,F., Schwab,M.,

Antonescu,C.R., Peterson,C., and Meltzer,P.S. (2001). Classification and diagnostic prediction of

cancers using gene expression profiling and artificial neural networks. Nature

Medicine 7, 673-679.

Lauerman,V. and Boeke,J.D. (1994). The primer tRNA sequence is not inherited during Ty1

retrotransposition. Proc. Natl. Acad. Sci. USA 91, 9847-9851.

Ledermann,W. (1980). Handbook of Applicable Mathematics. W.Ledermann,

R.F.Churchhouse, H.Cohn, P.Hilton, E.Lloyd, S.Vajda, and C.Jenkins, eds. (Chichester: John Wiley

& Sons).

Lee,M.L., Kuo,F.C., Whitmore,G.A., and Sklar,J. (2000). Importance of replication in microarray

gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations.

Proceedings of the National Academy of Sciences of the United States of America 97, 9834-9839.

Lennon,G., Auffray,C., Polymeropoulos,M., and Soares,M.B. (1996). The I.M.A.G.E. Consortium:

an integrated molecular analysis of genomes and their expression. Genomics 33, 151-152.

Lennon,G.G. and Lehrach,H. (1991). Hybridization analyses of arrayed cDNA libraries. Trends

Genet. 7, 314-317.

Link,A.J., Eng,J., Schieltz,D.M., Carmack,E., Mize,G.J., Morris,D.R., Garvik,B.M., and Yates,J.R.

(1999). Direct analysis of protein complexes using mass spectrometry. Nature Biotechnol. 17, 676-682.

Lipshutz,R.J., Fodor,S.P.A., Gingeras,T.R., and Lockhart,D.J. (1999). High density synthetic

oligonucleotide arrays. Nature Genetics 21, 20-24.

Lockhart,D.J., Dong,H., Byrne,M.C., Follettie,M.T., Gallo,M.V., Chee,M.S., Mittmann,M.,

Wang,C., Kobayashi,M., Horton,H., and Brown,E.L. (1996). Expression monitoring by

hybridization to high-density oligonucleotide arrays. Nature Biotechnology 14, 1675-1680.

Luan,D.D. and Korman,M.H. (1993). Reverse transcription of R2Bm RNA is primed by a nick at

the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595-605.

Lueking,A., Horn,M., Eickhoff,H., Lehrach,H., and Walter,G. (1999). Protein microarrays for gene

expression and antibody screening. Anal. Biochem. 270, 103-111.

Lund,J.R. (1983). Probabilities and Upper Quantiles for the Studentized Range. J. Roy. Stat. Soc.

(Appl. Statistics) 32, 204-210.

Luo,L., Salunga,R.C., Guo,H., Bittner,A., Joy,K.C., Galindo,J.E., Xiao,H., Rogers,K.E., Wan,J.S.,

Jackson,M.R., and Erlander,M.G. (1999). Gene expression profiles of laser-captured adjacent

neuronal subtypes. Nature Med. 5, 117-122.

Luzzi,V., Holtschlag,V., and Watson,M.A. (2001). Expression profiling of ductal carcinoma in situ

by laser capture microdissection and high-density oligonucleotide arrays. American Journal of

Pathology 158, 2005-2010.


Madden,S., Galella,E., Zhu,J., Bertelsen,A., and Beaudry,G. (1997). SAGE transcript profiles for

p53-dependent growth regulation. Oncogene 15, 1079-1085.

Mann,M. (1996). A shortcut to interesting human genes: peptide sequence tags, ESTs and

computers. Biochem. Sci. 21, 494-495.

Mann,M. and Wilm,M. (1994). Error tolerant identification of peptides in sequence databases by

peptide sequence tags. Anal. Chem. 66, 4390-4399.

Maskos,U. and Southern,E.M. (1992). Oligonucleotide hybridizations on glass supports: a novel

linker for oligonucleotides in situ. Nucleic Acids Res. 20, 1679-1684.

Maskos,U. and Southern,E.M. (1993). A novel method for the analysis of multiple sequence

variants by hybridisation to oligonucleotide arrays. Nucleic Acids Res. 21, 2267-2268.

McAdams,H.H. and Shapiro,L. (1995). Circuit simulation of genetic networks. Science 269, 650-656.

Nelson,R.W. (1997). The use of bioreactive probes in protein characterization. Mass Spectrom. 16,

353-376.

O'Farrell,P.H. (1975). High-resolution two-dimensional electrophoresis of proteins. J. Biol. Chem.

250, 4007-4021.

Ohyama,H., Zhang,X., Kohno,Y., Alevizos,I., Posner,M., Wong,D.T., and Todd,R. (2000). Laser

capture microdissection-generated target sample for high-density oligonucleotide array

hybridization. Biotechniques 29, 530-536.

Okubo,K., Hori,N., Matoba,R., Niiyama,T., Fukushima,A., Kojima,Y., and Matsubara,K. (1992).

Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene

expression. Nature Genetics 2, 173-179.

Pandey,A. and Mann,M. (2000). Proteomics to study genes and genomes. Nature 405, 837-846.

Pease,A.C., Solas,D., Sullivan,E.J., Cronin,M.T., Holmes,C.P., and Fodor,S.P.A. (1994). Light-generated

oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. USA 91,

5022-5026.

Perou,C.M., Sorlie,T., Eisen,M.B., van de Rijn,M., Jeffrey,S.S., Rees,C.A., Pollack,J.R., Ross,D.T.,

Johnsen,H., Akslen,L.A., Fluge,O., Pergamenschikov,A., Williams,C., Zhu,S.X., Lonning,P.E.,

Borresen-Dale,A.L., Brown,P.O., and Botstein,D. (2000). Molecular portraits of human breast

tumours. Nature 406, 747-752.

Phillips,J. and Eberwine,J.H. (1996). Antisense mRNA amplification: a linear amplification method

for analyzing the mRNA population from single living cells. Methods 10, 283-286.

Press,W.H., Flannery,B.P., Teukolsky,S.A., and Vetterling,W.T. (1992). Numerical Recipes.

(Cambridge: Cambridge University Press).

Ptashne,M. (1967). Isolation of the λ phage repressor. Proc. Natl. Acad. Sci. USA 57, 306-313.

Ptashne,M. (1992). A Genetic Switch: Phage λ and Higher Organisms. (Cambridge: Cell Press and

Blackwell Scientific Publications).


Ritossa,F., Malva,C., Bonicelli,E., Graziani,F., and Polito,L. (1971). The first steps of

magnification of DNA complementary to ribosomal RNA in Drosophila melanogaster. Proc. Natl.

Acad. Sci. USA 68, 1580-1584.

Roth,F.P., Hughes,J.D., Estep,P.W., and Church,G.M. (1998). Finding DNA regulatory motifs

within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnology 16, 939-945.

Sachs,L. (1999). Angewandte Statistik. (Berlin: Springer).

Scheffe,H. (1959). The Analysis of Variance. (New York: Wiley & Sons).

Schena,M., Shalon,D., Davis,R.W., and Brown,P.O. (1995). Quantitative monitoring of gene

expression patterns with a complementary DNA microarray. Science 270, 467-470.

Schena,M., Shalon,D., Heller,R., Chai,A., Brown,P.O., and Davis,R.W. (1996). Parallel human

genome analysis: microarray-based expression monitoring of 1000 genes. Proceedings of the

National Academy of Sciences of the United States of America 93, 10614-10619.

Schulze,A. and Downward,J. (2001). Navigating gene expression using microarrays--a technology

review. Nature Cell Biology 3, E190-E195.

Shchepinov,M.S., Case-Green,S.C., and Southern,E.M. (1997). Steric factors influencing

hybridisation of nucleic acids to oligonucleotide arrays. Nucleic Acids Res. 22, 1365-1367.

Shevchenko,A., Jensen,O.N., Podtelejnikov,A.V., Sagliocco,F., Wilm,M., Vorm,O., Mortensen,P.,

Shevchenko,A., Boucherie,H., and Mann,M. (1996). Linking genome and proteome by mass

spectrometry: large scale identification of yeast proteins from two dimensional gels. Proc. Natl.

Acad. Sci. USA 93, 14440-14445.

Shevchenko,A., Loboda,A., Shevchenko,A., Ens,W., and Standing,K.G. (2000). MALDI

quadrupole time-of-flight mass spectrometry: powerful tool for proteomic research. Anal. Chem.

72, 2132-2141.

Southern,E.M. (1975). Detection of specific sequences among DNA fragments separated by gel

electrophoresis. J. Mol. Biol. 98, 503-517.

Southern,E.M., Maskos,U., and Elder,R. (1992). Hybridization with oligonucleotide arrays.

Genomics 13, 1008-1017.

Southern,E.M., Mir,K., and Shchepinov,M.S. (1999). Molecular interactions on microarrays. Nature

Genetics 21, 5-9.

Spellman,P.T., Sherlock,G., Zhang,M.Q., Iyer,V.R., Anders,K., Eisen,M.B., Brown,P.O.,

Botstein,D., and Futcher,B. (1998). Comprehensive identification of cell cycle-regulated genes of

the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9,

3273-3297.

Steemers,F.J., Ferguson,J.A., and Walt,D.R. (2000). Screening unlabeled DNA targets with

randomly ordered fiber-optic gene arrays. Biotechnology 18, 91-94.

Steinhausen,D. and Langer,K. (1977). Clusteranalyse. (Berlin: Walter de Gruyter).


Stollberg,J., Urschitz,J., Urban,Z., and Boyd,C.D. (2000). A quantitative evaluation of SAGE.

Genome Research 10, 1241-1248.

Stuart,A. and Keith,J. (1987). Order-Statistics. In Kendall's Advanced Theory of Statistics (London: Ch. Griffin & Co. Ltd.), p. 445.

Tavazoie,S., Hughes,J.D., Campbell,M.J., Cho,R.J., and Church,G.M. (1999). Systematic

determination of genetic network architecture. Nature Genetics 22, 281-285.

Thieffry,D. and Thomas,R. (1995). Dynamical behaviour of biological regulatory networks--II.

Immunity control in bacteriophage lambda. Bulletin of Mathematical Biology 57, 277-297.

Thomas,R., Gathoye,A.M., and Lambert,L. (1976). A complex control circuit. Regulation of

immunity in temperate bacteriophages. European Journal of Biochemistry 71, 211-227.

Thomas,R., Thieffry,D., and Kaufman,M. (1995). Dynamical behaviour of biological regulatory

networks--I. Biological role of feedback loops and practical use of the concept of the loop-characteristic

state. Bulletin of Mathematical Biology 57, 247-276.

Toronen,P., Kolehmainen,M., Wong,G., and Castren,E. (1999). Analysis of gene expression data

using self-organizing maps. FEBS Letters 451, 142-146.

Velculescu,V.E., Madden,S.L., Zhang,L., Lash,A.E., Yu,J., Rago,C., Lal,A., Wang,C.,

Beaudry,C.J., Ciriello,K.M., Cook,B.P., Dufault,M.R., Ferguson,A.T., Gao,Y., He,T.C.,

Hermeking,H., Hiraldo,S.K., Hwang,P.M., Lopez,M.A., Luderer,H.F., Mathews,B.,

Petroziello,J.M., Polyak,K., Zawel,L., Zhang,W., Zhang,X., Zhou,W., Haluska,F.G., Jen,J.,

Sukumar,S., Landes,G.M., Riggins,G.J., Vogelstein,B., and Kinzler,K.W. (1999). Analysis of the

human transcriptome. Nature Genetics 23, 387-388.

Velculescu,V.E., Zhang,L., Vogelstein,B., and Kinzler,K.W. (1995). Serial analysis of gene

expression. Science 270, 484-487.

Velculescu,V.E., Zhang,L., Zhou,W., Vogelstein,J., Basrai,M.A., Basset,D.E., Hieter,P.,

Vogelstein,B., and Kinzler,K.W. (1997). Characterization of the yeast transcriptome. Cell 88, 243-251.

Wen,X., Fuhrman,S., Michaels,G.S., Carr,D.B., Smith,S., Barker,J.L., and Somogyi,R. (1998).

Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl.

Acad. Sci. USA 95, 334-339.

Wilkins,M.R., Pasquali,C., Appel,R.D., Ou,K., Golaz,O., Sanchez,J.C., Yan,J.X., Gooley,A.A.,

Walsh,B.J., Hughes,G., Humphery-Smith,I., Williams,K.L., and Hochstrasser,D.F. (1996). From

proteins to proteomes: large scale protein identification by two dimensional electrophoresis and

amino acid analysis. Biotechnology 14, 61-65.

Wilkins,M.R., Williams,K.L., Appel,R.D., and Hochstrasser,D.F. (1997). Proteome Research: New

Frontiers in Functional Genomics. (Berlin: Springer).

Williams,W.T. (1971). Computer J. 14, 162-165.

Wilm,M. and Mann,M. (1996). Analytical properties of the nanoelectrospray source. Anal. Chem.

68, 1-8.


Wilm,M., Shevchenko,A., Houthaeve,T., Breit,S., Schweigerer,L., Fotsis,T., and Mann,M. (1996).

Femtomole sequencing of proteins from polyacrylamide gels by nano electrospray mass

spectrometry. Nature 379, 466-469.

Winer,B.J. (1971a). Statistical Principles in Experimental Design. B.J.Winer,

N.Garmezy, R.L.Solomon, L.V.Jones, and H.W.Stevenson, eds. (New York: McGraw-Hill Book

Company).

Winer,B.J. (1971b). Design and Analysis of Single-Factor Experiments. (New York: McGraw-Hill

Book Company).

Wodicka,L., Dong,H., Mittmann,M., Ho,M.H., and Lockhart,D.J. (1997). Genome-wide expression

monitoring in Saccharomyces cerevisiae. Nature Biotechnology 15, 1359-1367.

Yates,J.R. (1998). Database searching using mass spectrometry data. Electrophoresis 19, 893-900.

Yates,J.R. (2000). Mass spectrometry. From genomics to proteomics. Trends Genet. 16, 5-9.

Zhang,L., Zhou,W., Velculescu,V.E., Kern,S., Hruban,R., Hamilton,S., Vogelstein,B., and

Kinzler,K.W. (1997). Gene expression profiles in normal and cancer cells. Science 276, 1268-1272.

7 Appendices

7.1 Affymetrix analysis metrics


Affymetrix's array analysis metrics have evolved over the last few years, but the basics of the methodology have remained unchanged. At the time of this writing, however, a new statistical algorithm suite has been announced. Any work prior to 2002 that used Affymetrix's algorithm suite to extract data from chip hybridization experiments did so by means of the "old" empirical algorithm suite. Both algorithm suites define metrics that are based on the corrected intensity values of the probe cells. Before the metrics can be computed, the signal intensity is adjusted by subtracting the local background (see 7.1.1.1) and correcting for measurement noise (see 7.1.1.2). The algorithms for correcting the probe cell intensity remained essentially unaltered during the transition from the old, empirical algorithm suite to the new, statistical algorithm suite (Affymetrix support helpline, personal communication).

7.1.1 Probe cell intensity correction

7.1.1.1 Background correction

Affymetrix computes the intensity of a probe cell as follows. A standard probe cell is represented by eight times eight pixels. The border pixels are excluded, and the intensity distribution of the remaining pixels is computed. The 75th percentile of this distribution is used as the AverageIntensity of the probe cell. This value is corrected by subtracting a background noise term. To compute the background, the image is divided into a number of sectors (by default 4 horizontal by 4 vertical, giving 16 sectors). For every sector, the lowest 2 percent of the probe cell intensity values are averaged (for a standard chip this amounts to 430 background cells); the resulting value is the SectorBackground.
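The per-cell intensity and the SectorBackground described above can be sketched in Python as follows. This is a minimal illustration, not Affymetrix's implementation: it assumes that the excluded border is one pixel wide, uses a linearly interpolated percentile, and rounds the 2 percent cut-off down while keeping at least one cell. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile with linear interpolation between closest ranks (assumed definition)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def cell_intensity(pixels: List[List[float]]) -> float:
    """75th percentile of a probe cell's pixels after dropping a one-pixel border."""
    inner = [v for row in pixels[1:-1] for v in row[1:-1]]
    return percentile(inner, 75.0)


def sector_background(cell_intensities: Sequence[float], fraction: float = 0.02) -> float:
    """Mean of the lowest `fraction` of the probe cell intensities in one sector."""
    xs = sorted(cell_intensities)
    k = max(1, int(len(xs) * fraction))  # round down, but keep at least one cell
    return sum(xs[:k]) / k


# Usage: an 8 x 8 cell whose pixel values are 0..63 in row-major order.
cell = [[float(8 * r + c) for c in range(8)] for r in range(8)]
print(cell_intensity(cell))                               # 43.25
print(sector_background([float(v) for v in range(100)]))  # 0.5
```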

If the SectorBackground were simply subtracted from the AverageIntensity of every probe cell in the corresponding sector, abrupt changes could occur at the boundaries between sectors. To avoid this, a smoothing adjustment is made during background correction. The distances of the probe cell at position (x, y) to all sector centers are computed. The background for a probe cell is then computed as a distance-squared weighted sum with a contribution from every SectorBackground: the greater the distance of a sector center from the probe cell, the smaller the contribution of that SectorBackground to the weighted sum. The weighted sum is then subtracted from the intensity value. In addition, the resulting value is corrected by an estimate of the measurement noise.
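A minimal sketch of the distance-squared smoothing follows, under stated assumptions: the weight of sector k is taken to be 1/(d_k² + smooth), where d_k is the distance from the probe cell to the sector center and smooth is a hypothetical constant that keeps the weight finite at a center, and the weighted sum is normalized by the sum of the weights so that a uniform background is reproduced exactly. The exact weighting used by Affymetrix may differ, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def smoothed_background(x: float, y: float,
                        centers: Sequence[Tuple[float, float]],
                        sector_values: Sequence[float],
                        smooth: float = 100.0) -> float:
    """Distance-squared weighted average of per-sector values at position (x, y).

    Sector k gets weight 1 / (d_k**2 + smooth); `smooth` is an assumed constant
    that keeps the weight finite when (x, y) coincides with a sector center.
    """
    if len(centers) != len(sector_values) or not centers:
        raise ValueError("need one value per sector center")
    weights = [1.0 / ((x - cx) ** 2 + (y - cy) ** 2 + smooth) for cx, cy in centers]
    weighted = sum(w * v for w, v in zip(weights, sector_values))
    return weighted / sum(weights)  # normalize so a uniform background is reproduced


# Usage: two sectors centered at (0, 0) and (10, 0).
centers = [(0.0, 0.0), (10.0, 0.0)]
backgrounds = [100.0, 200.0]
print(round(smoothed_background(5.0, 0.0, centers, backgrounds), 6))  # 150.0 (equidistant)
print(smoothed_background(0.0, 0.0, centers, backgrounds) < 150.0)    # True
```

The corrected intensity of a probe cell would then be its AverageIntensity minus the value returned for that cell's position; the same function can smooth any other per-sector quantity.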

7.1.1.2 Noise correction

The background probe cells are used to compute a measurement noise term. This term is intended to

correct for small variations of the digitized signal obtained by the scanner as it samples the chip

surface. The SectorNoise is computed as:

$$\mathrm{SectorNoise} = \frac{1}{N}\sum_{i=1}^{N}\frac{\mathrm{stdev}_i}{\sqrt{\mathrm{pixel}_i}},$$

where N is the number of background cells in the corresponding sector, stdev_i is the standard deviation of the pixel intensities in background cell i, and pixel_i is the total number of pixels in background cell i.

To provide a smooth transition between the sectors, distance-squared smoothing is performed (as for the SectorBackground). The weighted sum obtained by smoothing is then subtracted from the intensity value of a probe cell to give the corrected intensity value.
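The SectorNoise formula can be checked with a short Python function. It assumes that each background cell is summarized by the standard deviation of its pixel intensities and its pixel count, and takes the per-cell term to be stdev_i/√pixel_i, i.e. the standard error of that cell's mean intensity. The function name is hypothetical; smoothing across sectors would reuse the same distance-squared weighting as for the SectorBackground.

```python
import math
from typing import Sequence


def sector_noise(stdevs: Sequence[float], pixel_counts: Sequence[int]) -> float:
    """SectorNoise = (1/N) * sum_i stdev_i / sqrt(pixel_i) over the N background cells."""
    if len(stdevs) != len(pixel_counts) or not stdevs:
        raise ValueError("need one pixel count per background cell")
    terms = [s / math.sqrt(p) for s, p in zip(stdevs, pixel_counts)]
    return sum(terms) / len(terms)


# Usage: two background cells with 36 pixels each.
print(sector_noise([12.0, 6.0], [36, 36]))  # 1.5
```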

7.1.2 Empirical metrics

7.1.2.1 The 'AbsoluteCall'


Affymetrix refers to the algorithms that evaluate the probe sets of a single chip experiment as the 'absolute analysis algorithms'. Several metrics are combined to give the AbsoluteCall, a qualitative metric of gene expression with three categories: a gene is either present, marginal or absent.

The AbsoluteCall is computed as follows. Let PM designate the corrected signal intensity of a perfect match probe cell, and MM the corrected signal intensity of the corresponding mismatch probe cell. Then a probe pair is positive if

$$\mathrm{PM} - \mathrm{MM} \ge \mathrm{DT} \quad \text{and} \quad \mathrm{PM}/\mathrm{MM} \ge \mathrm{RT}.$$

A probe pair is negative if

$$\mathrm{MM} - \mathrm{PM} \ge \mathrm{DT} \quad \text{and} \quad \mathrm{MM}/\mathrm{PM} \ge \mathrm{RT}.$$

Here DT and RT designate the difference threshold and the ratio threshold, respectively. These thresholds have default values that were established experimentally by Affymetrix, but the user of the software may modify them to control the stringency of the analysis. The difference threshold is related to the SectorNoise by

$$\mathrm{DT} = \mathrm{SectorNoise} \cdot \mathrm{DT}_{\mathrm{multiplier}},$$

so the difference threshold can be set indirectly via the DT multiplier.
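The probe-pair rules and the DT relation translate directly into code. In this sketch (hypothetical function names), a pair that satisfies neither rule is labelled 'neutral', a term introduced here for illustration only, and both corrected intensities are required to be positive so that the ratios are defined; how Affymetrix treats non-positive corrected intensities is not covered here.

```python
def difference_threshold(sector_noise: float, dt_multiplier: float) -> float:
    """DT = SectorNoise * DTmultiplier."""
    return sector_noise * dt_multiplier


def classify_pair(pm: float, mm: float, dt: float, rt: float) -> str:
    """Label one probe pair 'positive', 'negative' or 'neutral' (neither rule holds).

    Requires positive corrected intensities so that PM/MM and MM/PM are defined.
    """
    if pm <= 0 or mm <= 0:
        raise ValueError("corrected intensities must be positive in this sketch")
    if pm - mm >= dt and pm / mm >= rt:
        return "positive"
    if mm - pm >= dt and mm / pm >= rt:
        return "negative"
    return "neutral"


# Usage with illustrative thresholds: SectorNoise 10, DTmultiplier 3, RT 1.5.
dt = difference_threshold(10.0, 3.0)
pairs = [(200.0, 50.0), (50.0, 200.0), (100.0, 90.0)]
print([classify_pair(pm, mm, dt, 1.5) for pm, mm in pairs])
# ['positive', 'negative', 'neutral']
```

Both conditions must hold for a pair to count, so a large absolute difference with a small ratio (or the reverse) leaves the pair unclassified.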

The PositiveFraction is a metric computed directly from the number of positive probe pairs, PosPair, and the total number of probe pairs, TotPair, by

$$\mathrm{PositiveFraction} = \mathrm{PosPair} / \mathrm{TotPair}.$$

The ratio of positive and negative probe pairs, PosNegR, is calculated as

$$\mathrm{PosNegR} = \mathrm{PosPair} / \mathrm{NegPair},$$

where NegPair is the number of negative probe pairs.

The last metric used in the calculation of the AbsoluteCall is LogAvgR, ten times the average logarithm of the ratio of the perfect match and mismatch probe signals:

$$\mathrm{LogAvgR} = 10 \cdot \frac{1}{\mathrm{PairNum}} \sum_{i=1}^{\mathrm{PairNum}} \log\frac{\mathrm{PM}_i}{\mathrm{MM}_i}.$$

Here PairNum is the number of probe pairs used in the computation. Certain algorithms can exclude probe pairs from the calculation; Affymetrix calls this process Superscoring. We do not consider Superscoring in detail, as it is not important in our setting.
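A minimal sketch of the three metrics follows. It assumes no Superscoring (so PairNum equals TotPair), base-10 logarithms in LogAvgR, and positive PM and MM values; PosNegR is reported as infinity when there are no negative pairs. These choices and the function name are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def absolute_metrics(pm: Sequence[float], mm: Sequence[float],
                     pos_pairs: int, neg_pairs: int) -> Tuple[float, float, float]:
    """Return (PositiveFraction, PosNegR, LogAvgR) for one probe set.

    Assumes every pair is used (no Superscoring), so PairNum == TotPair.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("PM and MM must be non-empty and paired")
    tot_pairs = len(pm)
    positive_fraction = pos_pairs / tot_pairs
    pos_neg_r = pos_pairs / neg_pairs if neg_pairs else math.inf
    log_avg_r = 10.0 * sum(math.log10(p / m) for p, m in zip(pm, mm)) / tot_pairs
    return positive_fraction, pos_neg_r, log_avg_r


# Usage: two pairs, both with PM ten times MM, both counted as positive.
print(absolute_metrics([100.0, 1000.0], [10.0, 100.0], pos_pairs=2, neg_pairs=0))
# (1.0, inf, 10.0)
```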

Each of the three metrics discussed (PositiveFraction, PosNegR, LogAvgR) makes a weighted contribution to the AbsoluteCall of a given probe set. For each of the three metrics, a minimal and a maximal threshold value delimit the range that corresponds to the category marginal. The defaults for these thresholds were established experimentally by Affymetrix (but can be modified by the user). The AbsoluteCall is given by the following decision rules. If one of the three metrics exceeds the maximal threshold and at least one of the metrics exceeds the minimal threshold, the gene represented by the probe set is called present. If one of the metrics exceeds the maximal threshold and one is below the minimal threshold, the gene is called marginal. If all metrics are below the maximal threshold and one is below the minimal threshold, the gene is called absent.

7.1.2.2 The 'AverageDifference'


The AverageDifference is the metric we use as input to Feature eXtraction. It serves as a quantitative estimate of the expression level of a transcript. As its name indicates, it is defined as the average difference between the perfect match cells of a probe set and their corresponding mismatch cells:

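$$\mathrm{AverageDifference} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{PM}_i - \mathrm{MM}_i\right),$$

where N is the number of probe pairs in the probe set.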

Note that the AverageDifference may be negative if the mismatch signal dominates. This happens frequently in real experiments, especially if the transcript level is low relative to the complex background of other transcripts in an mRNA preparation.
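As a small illustration (hypothetical function name, made-up intensities), the AverageDifference of a probe set in which the mismatch cells dominate comes out negative:

```python
from typing import Sequence


def average_difference(pm: Sequence[float], mm: Sequence[float]) -> float:
    """AverageDifference = (1/N) * sum_i (PM_i - MM_i) over the N probe pairs."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("PM and MM must be non-empty and paired")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Usage: the mismatch cells dominate, so the result is negative.
print(round(average_difference([120.0, 80.0, 95.0], [100.0, 110.0, 105.0]), 2))  # -6.67
```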

7.1.2.3 Comparison Analysis Algorithms

Differences in the overall signal intensity of microarrays present a problem when the results of several hybridization experiments are to be compared. To deal with these differences, the overall signal intensity of each array is scaled to a target intensity. The overall signal intensity of an array is calculated as the average of all probe sets' AverageDifference values after excluding the lowest and highest 2 percent of the values (a trimmed mean).
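The scaling step can be sketched as a trimmed mean followed by a multiplicative rescaling. The sketch assumes that the number of values dropped at each end is 2 percent of the probe sets rounded down, that scaling means multiplying every AverageDifference by target / trimmed mean, and that the trimmed mean is positive; the target value and the function names are hypothetical.

```python
from typing import List, Sequence


def trimmed_mean(values: Sequence[float], trim: float = 0.02) -> float:
    """Mean after dropping the lowest and highest `trim` fraction of the values."""
    xs = sorted(values)
    k = int(len(xs) * trim)  # values dropped at each end (rounded down: an assumption)
    kept = xs[k:len(xs) - k]
    if not kept:
        raise ValueError("trim fraction leaves no values")
    return sum(kept) / len(kept)


def scale_to_target(avg_diffs: Sequence[float], target: float) -> List[float]:
    """Multiply every AverageDifference so that the array's trimmed mean equals `target`."""
    overall = trimmed_mean(avg_diffs)
    if overall <= 0:
        raise ValueError("overall intensity must be positive to scale")
    factor = target / overall
    return [v * factor for v in avg_diffs]


# Usage: 100 probe sets with values 1..100; the 2 lowest and 2 highest are ignored.
values = [float(v) for v in range(1, 101)]
print(trimmed_mean(values))                # 50.5
print(scale_to_target(values, 101.0)[:3])  # [2.0, 4.0, 6.0]
```

Trimming keeps a handful of saturated or failed probe sets from shifting the scale factor of the whole array.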

Once the signal intensities are scaled to a target intensity, several metrics are defined to compare the results from different hybridizations. In analogy to the AbsoluteCall, the DifferenceCall is a qualitative metric that assigns one of several categories to the difference between corresponding probe sets from two different experiments: it decides whether the expression of a transcript increased, decreased, marginally increased, marginally decreased or did not change from one experiment to the other. The DifferenceCall is defined similarly to the AbsoluteCall; four different metrics contribute to the final decision. Like the AbsoluteCall, the DifferenceCall is not used in the work presented here, and we do not discuss its definition in detail.

The FoldChange is the metric used to evaluate quantitatively the difference between the probe cell intensities of two different hybridizations. It is essentially the ratio of the AverageDifference values of two corresponding scaled probe sets, say A and B. Suppose the change of the expression level of A with respect to the expression level of B is to be evaluated. If A is greater than B, the FoldChange is computed as

$$\mathrm{FoldChange} = \frac{A}{\max(B,\ c \cdot \mathrm{SectorNoise})};$$

if B is greater than A, the FoldChange is computed as

$$\mathrm{FoldChange} = -1 \cdot \frac{B}{\max(A,\ c \cdot \mathrm{SectorNoise})}.$$

The multiplication by −1 indicates that A is down-regulated with respect to B. The SectorNoise is taken into account in the calculation of the FoldChange: if the AverageDifference of a probe set is smaller than c · SectorNoise, the fold change represents the up- or down-regulation with respect to the SectorNoise, a situation in which the biological meaning of the fold change is unclear. This situation frequently occurs if a transcript is absent under one experimental condition but present under the other. The constant factor c has a default value that was established experimentally by Affymetrix; depending on the type of array in use, it is at least about 2.

Approaches to gene expression analysis traditionally use the FoldChange or a similar metric that expresses the ratio of the signal strengths from two different experimental conditions. In contrast, Feature eXtraction uses the AverageDifference value directly.

7.1.3 Statistical metrics


The "new" statistical algorithm suite provides analogs of the metrics defined in the empirical algorithm suite. Table 7-1 shows the terminology for the statistical metrics and their corresponding empirical metrics.

Table 7-1 Terminology comparison of the empirical algorithm metrics and the statistical algorithm metrics.

Empirical algorithms   Statistical algorithms
AverageDifference      Signal
AbsoluteCall           Detection
DifferenceCall         Change
FoldChange             SignalLogRatio

The metrics 'Signal' and 'Detection' are used in the analysis of a single hybridization experiment, while the metrics 'Change' and 'SignalLogRatio' allow for the comparison of hybridization experiments. At the time of writing, no detailed description of the statistical algorithms was available to us. Affymetrix has published two technical notes, which can be found at www.affymetrix.com; these technical notes were used to compile the short overview of the statistical metrics given here (Affymetrix, 2001; Affymetrix, 2002).

7.1.3.1 Single array analysis

The Detection algorithm uses probe pair intensities to generate a Detection significance value and to assign a qualitative present, marginal or absent call. For every probe pair a discrimination score is calculated, which is used in the subsequent statistical test. The discrimination score of a probe pair is defined as the intensity difference relative to the overall intensity of the probe pair:

$$\mathrm{Dscore} = \frac{PM - MM}{PM + MM}.$$

A probe pair is excluded from the analysis if either of its probe cells reaches the saturation limit of signal intensity.
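A minimal Python sketch of the discrimination score, including the exclusion of saturated probe pairs. The saturation limit of 46000 is a hypothetical value chosen for illustration, and all intensities are assumed to be positive.

```python
def discrimination_scores(pm, mm, saturation=46000.0):
    """Dscore = (PM - MM) / (PM + MM) for every non-saturated probe pair.

    `saturation` is a HYPOTHETICAL intensity limit; intensities are assumed positive.
    """
    scores = []
    for p, m in zip(pm, mm):
        if p >= saturation or m >= saturation:
            continue  # pair excluded: one of its cells reaches the saturation limit
        scores.append((p - m) / (p + m))
    return scores

print(discrimination_scores([300.0, 100.0, 50000.0], [100.0, 100.0, 200.0]))
# → [0.5, 0.0]
```

For positive intensities the score lies strictly between −1 and 1, so bright and dim probe pairs contribute on a common scale.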

Wilcoxon's signed rank test is applied to the discrimination scores of a probe set to compute a significance level against the null hypothesis that the perfect match signal is not different from the mismatch signal. A high significance level indicates that no difference between the perfect match and the mismatch signal of a probe set can be detected, while a low significance level indicates that the two signals differ.

Two significance thresholds, a smaller value Tlow and a greater value Tup, are used to make the absent, marginal or present call. If the significance level SL is below Tlow, the probe set is called present. If Tlow ≤ SL ≤ Tup, the probe set is called marginal, and if SL is greater than Tup, the probe set is called absent.
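The test and the three-way call can be sketched in plain Python. The sketch computes an exact one-sided p-value of the Wilcoxon signed rank test for the null hypothesis that the median discrimination score is zero, against the alternative that it is positive; ties share their average rank, and zero scores are dropped. The thresholds t_low = 0.04 and t_up = 0.06 are illustrative values, not taken from the text, and all function names are hypothetical.

```python
def abs_ranks(values):
    """1-based ranks of |values|; tied magnitudes share their average rank."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of equal magnitudes starting at position i
        while j + 1 < len(order) and abs(values[order[j + 1]]) == abs(values[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_greater(scores):
    """Exact one-sided p-value of the signed rank test, H0: median score = 0."""
    d = [s for s in scores if s != 0]  # zero scores carry no sign
    if not d:
        return 1.0
    doubled = [round(2 * r) for r in abs_ranks(d)]  # doubled ranks are integers
    observed = sum(r for r, x in zip(doubled, d) if x > 0)
    counts = {0: 1}  # null distribution of the (doubled) positive rank sum
    for r in doubled:
        updated = dict(counts)
        for s, c in counts.items():
            updated[s + r] = updated.get(s + r, 0) + c
        counts = updated
    tail = sum(c for s, c in counts.items() if s >= observed)
    return tail / 2 ** len(d)

def detection_call(p, t_low=0.04, t_up=0.06):
    """Present (P), marginal (M) or absent (A) call from the significance level."""
    if p < t_low:
        return "P"
    if p <= t_up:
        return "M"
    return "A"

p = wilcoxon_greater([0.4, 0.3, 0.5, 0.2, 0.6])
print(p, detection_call(p))  # → 0.03125 P
```

With five positive scores the observed rank sum is the largest possible, which happens for only one of the 2^5 equally likely sign patterns, hence p = 1/32.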

The quantitative estimate of the expression strength of a transcript is obtained with Tukey's Biweight method. More information on this statistical technique can be found, for example, in "Understanding Robust and Exploratory Data Analysis" (Hoaglin et al., 1983). The estimate given by this technique is associated with a confidence interval that is based on the variation encountered across the probe pairs of a probe set. Every probe pair contributes its so-called RealSignal, with a weight, to the biweight estimate. The RealSignal of a probe pair is calculated as the perfect match intensity minus the stray intensity, where the stray intensity is based on the intensity of the probe pair's mismatch cell. If all mismatch intensities of a probe set are smaller than the corresponding perfect match intensities, they are used directly as the stray intensities. If only a few mismatch intensities exceed the corresponding perfect match intensities, an estimate based on the biweight mean of the perfect match to mismatch ratio is used as the stray intensity for these probe pairs. If most of the mismatch intensities are greater than the perfect match intensities, a threshold value that is smaller than the perfect match intensity is used as the stray intensity of a probe pair.
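A minimal Python sketch of a one-step Tukey biweight mean, applied to simplified RealSignal values. It follows a common description of the biweight (median as starting point, distances scaled by the median absolute deviation, weights (1 − u²)² inside the cutoff and 0 outside); the tuning constants c = 5 and eps = 1e-4 are assumptions, not values from the text. For the stray intensity, only the case MM < PM follows the description above; all other probe pairs use a HYPOTHETICAL fallback of half the PM intensity instead of the Affymetrix rules.

```python
from statistics import median

def biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean; c and eps are assumed tuning constants."""
    m = median(values)
    mad = median([abs(x - m) for x in values])
    weights = []
    for x in values:
        u = (x - m) / (c * mad + eps)  # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def real_signals(pm, mm, fallback_fraction=0.5):
    """PM minus stray intensity per probe pair (simplified, see lead-in).

    MM is the stray intensity when MM < PM; otherwise this sketch uses the
    HYPOTHETICAL threshold fallback_fraction * PM.
    """
    return [p - (m if m < p else fallback_fraction * p) for p, m in zip(pm, mm)]

print(real_signals([100.0, 100.0], [40.0, 150.0]))  # → [60.0, 50.0]
print(biweight_mean([2.0, 3.0, 4.0, 3.0, 1000.0]))  # close to 3; the outlier gets weight 0
```

The zero weight for the value 1000 illustrates why the biweight suits probe sets with a few aberrant probe pairs.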

7.1.3.2 Comparison analysis

To allow for comparison between several microarray experiments, the overall intensity is scaled to a target intensity, just as in the empirical algorithm suite. A perturbation factor is applied after the scaling, and two additional scaling factors are computed from it. We have no detailed information on how the perturbation factor is used to compute the additional scaling factors, but one of them is slightly lower and the other slightly higher than the original scaling factor. When signal intensities from two different experiments are compared, one of these three scaling factors is used for each signal intensity; the factors are chosen to minimize the difference between the two signal intensities. This ad hoc approach is intended to account for the variation observed between equivalent but separate microarray experiments. The default value of the perturbation factor (and its sensible range) has been established experimentally by Affymetrix.
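The selection rule can be sketched in Python. Because the text gives no details on how the two additional factors are derived from the perturbation factor, the sketch uses the HYPOTHETICAL candidates sf/pf, sf and sf·pf with pf > 1; only the rule of choosing, for each intensity, the factor that minimizes the difference is taken from the description above.

```python
from itertools import product

def perturbed_difference(a, b, sf, pf):
    """Smallest |a*fa - b*fb| over three candidate scaling factors per intensity.

    The candidates (sf/pf, sf, sf*pf) are a HYPOTHETICAL construction; the text
    only states that one is slightly lower and one slightly higher than sf.
    """
    candidates = (sf / pf, sf, sf * pf)
    fa, fb = min(product(candidates, repeat=2),
                 key=lambda pair: abs(a * pair[0] - b * pair[1]))
    return fa, fb, a * fa - b * fb

fa, fb, diff = perturbed_difference(110.0, 100.0, sf=1.0, pf=1.1)
print(diff)  # essentially zero, compared with 10.0 without perturbation
```

The example shows the intended effect: a 10 percent difference between two arrays can be absorbed entirely by the perturbation, so only larger differences survive into the comparison.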

The Change call gives a qualitative estimate of whether the expression of a transcript increased, decreased, marginally increased, marginally decreased or did not change from one experiment to the other. The Change call is also based on Wilcoxon's signed rank test, so the resulting call is associated with a significance level.

The SignalLogRatio estimates the magnitude and direction of the change of a transcript. It is computed as the mean of the base-2 logarithms of the ratios of corresponding probe pair intensities from two different arrays. As with the Signal, Tukey's Biweight method is used in this calculation, so confidence limits can be given. Note that with this approach the negative values that can arise for the AverageDifference are no longer possible.
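A minimal Python sketch of a SignalLogRatio-like estimate: a biweight mean of the per-pair log2 ratios. It assumes positive, background-adjusted probe pair intensities from the two arrays; the biweight constants c = 5 and eps = 1e-4 are assumptions, and the function names are hypothetical.

```python
import math
from statistics import median

def biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean (constants c and eps are assumptions)."""
    m = median(values)
    mad = median([abs(x - m) for x in values])
    weights = []
    for x in values:
        u = (x - m) / (c * mad + eps)  # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def signal_log_ratio(exp, base):
    """Biweight mean of log2(exp_i / base_i) over corresponding probe pairs."""
    if any(v <= 0 for v in exp) or any(v <= 0 for v in base):
        raise ValueError("intensities must be positive")  # log2 needs ratios > 0
    return biweight_mean([math.log2(e / b) for e, b in zip(exp, base)])

print(signal_log_ratio([200.0, 400.0, 50.0, 100.0], [100.0, 200.0, 25.0, 1000.0]))
```

Three of the four pairs double (log2 ratio 1) and one pair disagrees strongly; the biweight gives that pair zero weight, so the estimate is a twofold increase. A value of +1 corresponds to doubling and −1 to halving, which makes the scale symmetric for up- and down-regulation.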

7.2 Grouping of the inflammatory bowel disease data

The inflammatory bowel disease data provides time series with four time points (see 4.2.1). These time series were analyzed with Feature eXtraction using a significance threshold of p=0.01. The resulting FX-patterns are presented in the following tables.

7.2.1 Low-resolution pattern

Table 7-2 Grouping of the inflammatory bowel disease data by the low-resolution pattern. Feature eXtraction is based on Scheffe's S-method or the Tukey A test.

Pattern    Group size (Scheffe)   Group size (Tukey A)
+          210                    240
++         1                      1
+-         5                      6
-          297                    326
-+         6                      9
-+-        2                      2
--         1                      3
=          3418                   3353

7.2.2 High-resolution pattern

Table 7-3 Grouping of the inflammatory bowel disease data by the high-resolution pattern. Feature eXtraction is based on Scheffe's S-method or the Tukey A test.

Pattern    Group size (Scheffe)   Group size (Tukey A)
0+1-2      2                      2
0+1-2=3    3                      3
0+1=2      1                      1
0+1=2+3    1                      1
0+1=3      142                    158
0+2        1                      1
0+2=3      31                     40
0+3        20                     20
0-1        3                      3
0-1+2      1                      1
0-1+2-3    2                      2
0-1+2=3    3                      5
0-1+3      1                      2
0-1=2      6                      7
0-1=2-3    1                      3
0-1=3      217                    247
0-2        5                      4
0-2=3      13                     14
0-3        41                     36
0=1        63                     63
0=1+2-3    0                      1
0=1+2=3    6                      8
0=1+3      3                      5
0=1-2+3    1                      1
0=1-2=3    2                      2
0=2        332                    332
0=2+3      5                      6
0=2-3      2                      5
0=3        2677                   2612
1+2        1                      1
1-2        1                      1
1-2=3      4                      4
1-3        1                      1
1=2        85                     85
1=2-3      1                      1
1=3        193                    193
2-3        1                      1
2=3        68                     68

7.2.3 Extended low-resolution pattern

Table 7-4 Grouping of the inflammatory bowel disease data by the level-extended low-resolution pattern.

Pattern        Group size (Scheffe)
L0+L1          210
L0+L1+L1       1
L0+L1-L0       4
L0+L2-L1       1
L1-L0          297
L1-L0+L1       6
L1-L0+L1-L0    2
L1-L0-L0       1
=              3418
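Comparing Table 7-2 with Tables 7-3 and 7-4 suggests that a low-resolution pattern is the ordered sequence of '+' and '-' symbols of the corresponding high-resolution or level-extended pattern, with '=' standing for series without any significant change. The following Python sketch is built on that observation, which is an inference from the tables rather than a definition quoted from the text; applied to the Scheffe column of Table 7-3 it reproduces the Scheffe column of Table 7-2. The function names are hypothetical.

```python
from collections import Counter

def low_resolution(pattern):
    """Reduce a high-resolution or level-extended pattern to its low-resolution form.

    Keeps only the '+' and '-' symbols in order; a pattern without any
    significant change maps to '='.
    """
    signs = "".join(ch for ch in pattern if ch in "+-")
    return signs or "="

def regroup(groups):
    """Aggregate {pattern: group size} into low-resolution group sizes."""
    totals = Counter()
    for pattern, size in groups.items():
        totals[low_resolution(pattern)] += size
    return dict(totals)

print(regroup({"0+1=3": 142, "0=1+3": 3, "0-1=3": 217}))
# → {'+': 145, '-': 217}
```

The same reduction maps the level-extended patterns of Table 7-4, for example L0+L1-L0 and L0+L2-L1, onto the low-resolution group +-.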

7.2.4 Extended high-resolution pattern

Table 7-5 Grouping of the inflammatory bowel disease data by the level-extended high-resolution pattern.

Pattern                   Group size (Scheffe)
L0_0+_L1_1-_L0_2          2
L0_0+_L1_1-_L0_2=3        2
L0_0+_L1_1=2              1
L0_0+_L1_1=2+_L1_3        1
L0_0+_L1_1=3              142
L0_0+_L1_2                1
L0_0+_L1_2=3              31
L0_0+_L1_3                20
L0_0+_L2_1-_L1_2=3        1
L0_0=1                    63
L0_0=1+_L1_2=3            6
L0_0=1+_L1_3              3
L0_0=2                    332
L0_0=2+_L1_3              5
L0_0=3                    2677
L0_1+_L1_2                1
L0_1=2                    85
L0_1=3                    193
L0_2=3                    68
L1_0-_L0_1                3
L1_0-_L0_1+_L1_2          1
L1_0-_L0_1+_L1_2-_L0_3    2
L1_0-_L0_1+_L1_2=3        3
L1_0-_L0_1+_L1_3          1
L1_0-_L0_1=2              6
L1_0-_L0_1=2-_L0_3        1
L1_0-_L0_1=3              217
L1_0-_L0_2                5
L1_0-_L0_2=3              13
L1_0-_L0_3                41
L1_0=1-_L0_2+_L1_3        1
L1_0=1-_L0_2=3            2
L1_0=2-_L0_3              2
L1_1-_L0_2                1
L1_1-_L0_2=3              4
L1_1-_L0_3                1
L1_1=2-_L0_3              1
L1_2-_L0_3                1
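Table 7-5 appears to use the notation of Table 7-3 with a level tag (L0, L1, L2) attached to each group of time points. Removing the tags recovers the high-resolution patterns, and the Scheffe group sizes agree with Table 7-3 once patterns that collapse onto the same string are added; for example, L0_0+_L1_1-_L0_2=3 (2) and L0_0+_L2_1-_L1_2=3 (1) together give the 3 series of 0+1-2=3. The following Python sketch implements this tag removal; the regular expression encodes the tag format as inferred from the table and is an assumption, as is the function name.

```python
import re

# A level tag such as "L0_" at the start or "_L1_" after a '+', '-' or '=' sign
# (format inferred from Table 7-5).
LEVEL_TAG = re.compile(r"_?L\d+_")

def strip_levels(pattern):
    """Turn a level-extended high-resolution pattern into a high-resolution pattern."""
    return LEVEL_TAG.sub("", pattern)

for p in ["L0_0+_L1_1-_L0_2", "L1_0=1-_L0_2+_L1_3", "L0_0=3"]:
    print(p, "->", strip_levels(p))
```

The optional leading underscore also makes the function tolerant of tags written without it, such as L1_0-L0_1=2.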

List of figures

Figure 1-1 Schematic illustration of the phage lambda lysis-lysogeny decision circuit.
Figure 1-2 Hypothetical time series experiment.
Figure 1-3 A complete perturbation experiment for the hypothetical system A, B, C.
Figure 2-1 Schematic overview of probe array and target preparation for cDNA and oligonucleotide microarrays.
Figure 2-2 Data extraction from a cDNA array experiment using fluorescence labeling.
Figure 2-3 Probe set for the detection of a mRNA reference sequence on an Affymetrix oligonucleotide array.
Figure 2-4 Tethered oligonucleotides with linkers of different length.
Figure 2-5 Light-directed oligonucleotide synthesis.
Figure 2-6 Schematic illustration of SAGE.
Figure 3-2 Oligonucleotide array experiments.
Figure 3-3 Illustration of the ad-hoc threshold approach for differential expression.
Figure 3-4 Idealized single-gene expression pattern for the AML-ALL classifier.
Figure 3-5 Three time series from the inflammatory bowel disease data (see 4.2.11.1).
Figure 3-6 Basic qualitative features of a time series running over ten time points, 0 … 9.
Figure 4-1 The probability density function of the central (red) and non-central (blue) F-ratio distribution.
Figure 4-2 Example time series running over 10 time points 0…9.
Figure 4-3 The Feature eXtraction matrix of the example time series Figure 4-2 as generated by the S-method.
Figure 4-4 The Feature eXtraction matrix of the example time series Figure 4-2 as generated by the Tukey A test.
Figure 4-5 An example raw pattern. The time points are shown as the first row, the raw pattern is the second row.
Figure 4-6 The raw pattern algorithm.
Figure 4-7 Illustration of the raw pattern algorithm using the FX-matrix in Figure 4-3 (S-method) and a significance threshold of p=0.01.
Figure 4-8 Construction of the low-resolution pattern from the raw pattern.
Figure 4-9 Construction of the high-resolution pattern from the raw pattern.
Figure 4-10 The candidate signal levels (boxed time points) determined for the example time series (Figure 4-2) using the S-method and a significance threshold of p=0.01.
Figure 4-11 The hierarchical agglomerative algorithm used for the identification of discrete signal levels from candidate signal levels. See text for additional information.
Figure 4-12 The signal levels (boxed) identified in the example time series (Figure 4-2) by candidate level clustering using a significance threshold of p=0.01 to define the maximal distance.
Figure 4-13 Power of the S-method for the analysis of a time series with four time points and four parallel measurements and a significance threshold of p=0.01.
Figure 4-14 Two time series from the inflammatory bowel disease data.
Figure 4-15 Time series 100061_f_at of the inflammatory bowel disease data.
Figure 4-16 Power of the S-method in case for the MIF-experiment.
Figure 4-17 The time series for TNF-α in case of MIF +/+ - and MIF -/- -mice.
Figure 5-1 Power of the S-method for the analysis of a time series with four time points and four parallel measurements. Two different significance levels are shown.
Figure 5-2 Power of the S-method for the analysis of a time series with four time points.
Figure 5-3 Power of the S-method for the analysis of a time series with 4 parallel measurements.
Figure 5-4 Power of the S-method for the analysis of a time series with four time points and four parallel measurements. Different significance levels are shown.

List of tables

Table 4-1 Basic quantities used in ANOVA. The table is adapted from (Ledermann, 1980).
Table 4-2 Conventional interpretation of significance levels as indicators for the truth of the null hypothesis H
Table 4-3 The group means of a time series experiment with k time points arranged in increasing order.
Table 4-4 The low-resolution pattern groups of the inflammatory bowel disease data.
Table 4-5 The level-extended low-resolution pattern group sizes of the inflammatory bowel disease data.
Table 4-6 Functional assignment of the 17 genes in the 3 smallest low-resolution groups (according to Tukey A).
Table 4-7 Low-resolution patterns (based on Tukey A) for the MIF +/+ and MIF -/- time series.
Table 7-1 Terminology comparison of the empirical algorithm metrics and the statistical algorithm metrics.
Table 7-2 Grouping of the inflammatory bowel disease data by the low-resolution pattern.
Table 7-3 Grouping of the inflammatory bowel disease data by the high-resolution pattern.
Table 7-4 Grouping of the inflammatory bowel disease data by the level-extended low-resolution pattern.
Table 7-5 Grouping of the inflammatory bowel disease data by the level-extended high-resolution pattern.

Abbreviations

ANOVA: ANalysis Of VAriance

cDNA: complementary DNA

DNA: DeoxyriboNucleic Acid

EST: Expressed Sequence Tag

FCS: Fetal Calf Serum

FX: Feature eXtraction

HBSS: Hanks' Balanced Salt Solution

MIF: macrophage Migration Inhibitory Factor

mRNA: messenger RNA

PCR: Polymerase Chain Reaction

RNA: RiboNucleic Acid

RPMI: Roswell Park Memorial Institute (this solution was developed at the named institute)

SAGE: Serial Analysis of Gene Expression

SSJ: Shortest Significant Jump

TNF-α: Tumor Necrosis Factor α