MOLECULAR IMAGING IN BIOINFORMATICS - Pattern Recognition ...
Literature Study

MOLECULAR IMAGING IN BIOINFORMATICS
Exploring Interdisciplinary Connections

February 11, 2008

Bioinformatics
Information and Communication Theory Group
Delft Technical University

Laboratory for Clinical and Experimental Image Processing (LKEB)
Radiology
Leiden University Medical Center

Author:
Martin Wildeman (1047973)

Supervisors:
Prof. dr. ir. M. J. T. Reinders
Dr. ir. B. P. F. Lelieveldt
Contents

1 Introduction
2 Molecular Imaging
2.1 About Molecular Imaging
2.2 Novel contrast mechanisms
2.2.1 About Reporter Genes
2.2.2 Direct and Indirect Protein Detection
2.2.3 Reporter Gene Applications
2.2.4 Current Limitations on Reporter Genes
2.3 Molecular Imaging Modalities
2.3.1 Nuclear Imaging
2.3.2 Computed Tomography
2.3.3 Magnetic Resonance Imaging
2.3.4 Optical Imaging
2.3.5 Ultrasound Imaging
2.4 Acquisition Challenges
2.4.1 Quantification of BLT and FMT
2.4.2 Combining Information: Multi-modality fusion
2.4.3 Combining Information: Follow Up Registration
2.4.4 Current Limitations in Molecular Imaging
3 Molecular Imaging as extra data source for model generation
3.1 Acquisition of Spatiotemporal Gene Expression Data
3.2 Inferring a Quantitative Model using Spatiotemporal Protein Expression
3.3 Quantitative vs. Qualitative Network Models
3.4 Modeling pathways using time series expression data, using conventional micro-array data
3.5 Discussion
3.5.1 General
3.5.2 Creating models for whole body imaging data
4 Molecular Imaging as a means for hypothesis testing
4.1 Gene Tracking
4.2 Cell Tracking
4.3 General signal detection and limitations
4.4 Discussion
5 Discussion
5.1 Advantages of MI for the field of bioinformatics
5.2 Current Issues and Challenges
5.3 Conclusion
Abbreviations

Many abbreviations are used in this paper. For readability, they are listed here:
• AFP - Autofluorescent Protein
• BLI - Bioluminescence Imaging
• BLT - Bioluminescence Tomography
• BRET - Bioluminescence Resonance Energy Transfer
• (C)CCD - (Cooled) Charge-Coupled Device
• CRET - Chemiluminescence Resonance Energy Transfer
• CT - Computed Tomography
• (D)BN - (Dynamic) Bayesian Network
• ES Cell - Embryonic Stem cell
• FMI - Fluorescence Molecular Imaging
• FMT - Fluorescence Molecular Tomography
• FRET - Fluorescence Resonance Energy Transfer
• GFP - Green Fluorescent Protein
• GOI - Gene of Interest
• MI - Molecular Imaging
• MRI - Magnetic Resonance Imaging
• NMR - Nuclear Magnetic Resonance
• PET - Positron Emission Tomography
• SNR - Signal-to-Noise Ratio
• SPECT - Single Photon Emission Computed Tomography
• WT - Wild Type
• YAC - Yeast Artificial Chromosome
CHAPTER 1
Introduction

This literature study presents the results of research into possible connections between two fields of research: bioinformatics and molecular imaging. To study such connections, the possibilities, limitations and pitfalls of both fields were examined, and existing techniques from each field were then interpreted in terms of possible applications in the other.

To study the two fields, it is first necessary to define them as they are used in this paper.

First, the term bioinformatics is narrowed down in this study to computational biology, as defined by the NIH: computational biology is “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems” [1].

Second, the term molecular imaging is defined in this study as “the in vivo characterization and measurement of biological processes at a cellular and molecular level in a noninvasive manner”. In this paper the term mainly refers to the field of small animal whole body molecular imaging.

Recent developments in molecular imaging have made it possible to visualize gene expression in vivo, and thereby to acquire data sets that cover gene expression in both time and space. Such data could be useful for computational biology, but how it can be used is still a topic of research. Conversely, analytical tools from computational biology could aid current molecular imaging research, turning the mostly qualitative interpretations of data that are given today into statistically sound quantitative measurements.

This paper is divided into five chapters, including this introduction. Chapter 2 presents the background knowledge needed to study possible connections between the two fields. After the basics of biology and molecular imaging have been covered, Chapter 3 presents a study of existing techniques from computational biology, including possible applications to the field of molecular imaging. Chapter 4 examines current visualizations in molecular imaging, including a review of how statistical tests can be applied to them. The last chapter presents a discussion of global concepts and challenges.
CHAPTER 2
Molecular Imaging

2.1 About Molecular Imaging

Molecular imaging can be defined as the in vivo characterization and measurement of biological processes at a cellular and molecular level in a noninvasive manner. It is a relatively new imaging paradigm that sheds light on biological processes instead of macroscopic physical processes. The field has its roots in nuclear medicine, where images are acquired with Positron Emission Tomography (PET) using radiolabeled tracers, which are injected into patients to visualize components of interest. The main advantage of molecular imaging over other techniques, such as cryosectioning, is that biological processes can be measured in the same animal throughout the whole study. In follow-up studies over time it is thus certain that the same process is observed, and no correction for anatomical differences between organisms is needed. Furthermore, fewer animals are sacrificed than in invasive studies, which is an improvement from an ethical point of view.

Two developments have made the emergence of molecular imaging possible. First, new contrast agents have been developed that allow existing medical imaging modalities to detect molecular processes; these are covered in section 2.2. Second, imaging devices have been miniaturized, which allows for small animal research and thus brings molecular imaging into pre-clinical and research laboratories; this is discussed in section 2.3.
2.2 Novel contrast mechanisms

The field of molecular imaging has been boosted by the advent of new, specific contrast agents. Based on new, advanced biological insights, it has become possible to construct probes that bind to specific biomarkers: proteins that are specific for some type of tissue or disease. Contrast agents can be fused to proteins directly, for instance to monoclonal antibodies that bind to receptors uniquely expressed by certain tissue cells. Methods also exist to encapsulate contrast agents in carrier proteins. In molecular imaging, specific molecules, cells or tissues are visualized by means of these contrast agents. To do so, four basic criteria must always be met [2]:
• The affinity of the molecular probe must be high and specific enough to discriminate between different cell types.
• The probe must be able to cross all kinds of barriers, such as the blood-brain barrier, so that it diffuses homogeneously throughout the body; or at least the ‘spread function’ of the diffusion must be known, so that it can be corrected for.
• The contrast agent must be amplifiable.
• The acquisition devices must be sensitive enough to measure the low concentrations of the contrast agent.
In recent decades it has become possible to visualize gene expression in vivo by the use of reporter genes, which in effect act as contrast enhancers for a specific modality. Reporter genes are used in nuclear imaging and optical imaging, but techniques have also been developed for magnetic resonance and ultrasound. These new contrast agents enable the study of gene expression in both space and time. This is an advance over micro-arrays, which are currently used to measure gene expression but only yield temporal expression profiles. Micro-arrays cannot provide a spatial component, because they measure RNA concentrations in a solution extracted from animal tissue, which basically yields an average expression level. The only way to incorporate some qualitative spatial expression profile in micro-array experiments is to use sectioned tissue profiling [3]. This literature study mainly focuses on reporter gene expression and its measurement in molecular imaging.
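The averaging argument can be made concrete with a toy example. In the sketch below, two hypothetical spatial expression patterns with the same total expression give exactly the same bulk (homogenized) read-out, while a spatially resolved read-out distinguishes them. The profiles and the function names `bulk_measurement` and `spatial_measurement` are invented for illustration; they are not a model of a real micro-array or imaging pipeline.

```python
# Hypothetical 1D "tissue" of 10 positions with expression values (arbitrary units).
# Two very different spatial patterns for the same gene:
localized = [0, 0, 0, 0, 10, 10, 0, 0, 0, 0]   # expression confined to one region
uniform = [2] * 10                             # same total expression, spread evenly

def bulk_measurement(profile):
    """Mimic a micro-array on homogenized tissue: only the mean level survives."""
    return sum(profile) / len(profile)

def spatial_measurement(profile):
    """Mimic an imaging read-out: the value at every position is kept."""
    return list(profile)

print(bulk_measurement(localized), bulk_measurement(uniform))          # 2.0 2.0
print(spatial_measurement(localized) == spatial_measurement(uniform))  # False
```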
2.2.1 About Reporter Genes

The purpose of reporter genes is to make invisible gene expression visible. Substrate-protein and protein-protein interactions, or other molecular events that are normally invisible, may also become detectable in an indirect manner. When using reporter genes, it is important to keep in mind that the detected gene products are not the compounds of interest themselves; the measurements are only expected to be directly correlated with these compounds. In this way, information on otherwise undetectable processes can still be acquired. In bright field microscopy and (laser scanning) confocal microscopy it was already possible to view gene expression directly by tagging proteins with autofluorescent protein (AFP) genes. Much research has been done on these AFPs, and a range of dyes with emission wavelengths between 500 and 950 nm is currently available.
Another gene used as a reporter is luciferase, found in the North American firefly (Photinus pyralis). Luciferase produces light by catalyzing a chemical reaction of its substrate luciferin with ATP. It was first used as a reporter for measuring the concentration of ATP in samples in spectroscopic experiments [4].
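A luciferase-based ATP assay of this kind is typically read out against a calibration (standard) curve. The sketch below assumes that, within the working range, light output is linear in ATP concentration, fits that line by ordinary least squares, and inverts it for an unknown sample. The standards, readings and units are invented numbers for illustration, not data from [4].

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical standards: known ATP concentrations (uM) and measured light (counts/s).
atp_standards = [0.0, 1.0, 2.0, 4.0, 8.0]
light_standards = [50.0, 1050.0, 2050.0, 4050.0, 8050.0]

slope, intercept = fit_line(atp_standards, light_standards)

def atp_from_light(counts):
    """Invert the calibration line to estimate the ATP concentration of a sample."""
    return (counts - intercept) / slope

print(atp_from_light(3050.0))  # 3.0
```

In practice the linear range has to be checked experimentally; outside it (for example at substrate depletion) the inversion is not valid.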
Reporter genes can be used to report on genes whose expression is otherwise invisible. To do so, the reporter gene is expressed at the same time and rate as the gene of interest; the behavior of the reporter gene is then studied and the results are transferred to the gene of interest. If a reporter gene is expressed, it is very likely that the gene of interest is also expressed, provided that both share the same promoter (region).

Fig. 2.1: A. The transcription of a gene is regulated by its promoter, to which all kinds of regulating transcription factors bind with a certain affinity. B. If the same promoter is placed upstream of a reporter gene, the reporter gene will be regulated by the same transcription factors as the gene of interest, and thus in parallel.
Because reporter genes are heterologous, i.e. they do not occur naturally in the host organism, they can be toxic to the host carrying them or, in less severe cases, affect biological processes so that quantitative measurements are no longer reliable. To minimize these effects, regulated gene expression is desirable. Alfke et al. gave a proof of concept in which reporter genes were only synthesized at the times that measurements were needed [5].
2.2.2 Direct and Indirect Protein Detection

A reporter gene construct can be made by cutting the reporter gene out of source DNA using restriction enzymes. If the same promoter as that of the gene of interest (GOI) is placed upstream of the reporter gene, the likely effect is that the reporter gene is transcribed in the same way as the GOI (see Fig. 2.1). When a copy of the promoter is placed upstream of the reporter gene, however, the only thing that can be said about the GOI is that it is transcribed. Nothing can be said about post-transcriptional effects (for instance splicing), or about whether the gene is translated into an active enzyme. Caution should also be taken when trying to predict the amount of active gene product (protein) that is formed, because the transcription of a gene and its translation into protein are not always related one to one.

It is also possible to construct fusion proteins, in which a reporter gene is fused to the gene of interest, so that the product of the gene of interest can be observed directly [6]. Such fusion constructs are inserted into the genome using standard recombination techniques. GFP proteins are considered to be non-toxic, but it has to be mentioned that fusing a GFP to a protein may alter its functionality or influence its post-translational modifications.
A gene can be copied using a technique called Polymerase Chain Reaction (PCR). To do this, suitable primers have to be designed. Primers are short, complementary single-stranded DNA oligonucleotides that anneal to the template with sufficient binding energy at a certain temperature, providing a starting point from which DNA polymerase synthesizes the new strand. Once enough copies of the insert and vector DNA have been produced, the insert can be ligated into the vector, and the resulting construct can be transfected into host cells. It is also possible to insert the DNA directly into undifferentiated embryonic stem cells (ES cells) and apply recombination. In this way specific genes can be replaced with (non)functional genes, or they can be deleted (knockout).
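The primer logic described above can be illustrated with a toy sketch. It assumes a known template written 5'→3', takes the forward primer from the start of the template and the reverse primer as the reverse complement of its end, and estimates each primer's melting temperature with the Wallace rule of thumb (2 °C per A/T plus 4 °C per G/C, a rough guide for short oligonucleotides only). The ideal copy number after n cycles is computed as N0·(1+E)^n, with efficiency E between 0 and 1. The template, primer length and efficiency are hypothetical; real primer design also checks specificity, secondary structure and buffer conditions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (5'->3' in, 5'->3' out)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def wallace_tm(primer):
    """Wallace rule-of-thumb melting temperature (deg C) for short primers."""
    at = sum(primer.count(b) for b in "AT")
    gc = sum(primer.count(b) for b in "GC")
    return 2 * at + 4 * gc

def design_primers(template, length=20):
    """Forward primer = first `length` bases; reverse primer anneals to the 3' end."""
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse

def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Ideal amplification: each cycle multiplies the copy number by (1 + efficiency)."""
    return initial_copies * (1 + efficiency) ** cycles

# Hypothetical template sequence (made up for this example).
template = "ATGACCGGTTACGCTAGCAAGGATCCGTTAGCCTAGGCATTCGAAGCTTGGCGTAA"
fwd, rev = design_primers(template, length=18)
print(fwd, wallace_tm(fwd))
print(rev, wallace_tm(rev))
print(pcr_copies(10, 30))  # 10 * 2**30 copies in the ideal (100% efficient) case
```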
It is important to emphasize that most reporter genes provide an indirect measuring technique. Their detection is thus not the detection of a functional gene of interest, but merely an indication that the genes downstream of the same promoter as the measured reporter (among them the GOI) are transcribed.
2.2.3 Reporter Gene Applications

With the ability to synthesize measurable gene constructs, the question arises what we want to measure. Reporter genes can measure two things: first, the presence and number of cells of a certain genotype, and second, the expression level of a certain gene.

In the first case, the reporter gene is placed in a construct downstream of an ‘always on’ promoter, mostly a viral promoter such as SV40 or CMV, so that it is constantly synthesized in the cell. If the rate of synthesis within the cell is known, and thereby the concentration of reporter protein per cell and the number of photons emitted per cell per second, then the number of observed cells can be determined quantitatively. This can be exploited, for instance, to determine how fast a tumor grows over time and if, when and where it metastasizes. Infection processes of viruses, bacteria or parasites can also be studied, as will be discussed in Chapter 4. This technique requires the ability to introduce gene constructs into cell lines.
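The quantification in the first case amounts to dividing the measured photon flux by the calibrated photon output of a single cell. The sketch below makes that explicit under strong, hypothetical assumptions: every cell emits at the same constant rate (as with an always-on promoter), the fraction of emitted photons reaching the detector is known, and depth-dependent attenuation is ignored. A second helper fits an exponential growth rate to such estimates, as one might do for the tumor growth example; all numbers are invented for illustration.

```python
import math

def estimate_cell_count(detected_photons_per_s, photons_per_cell_per_s,
                        detection_efficiency=1.0, background_per_s=0.0):
    """Estimate the number of reporter cells from a total photon flux.

    Assumes a constant per-cell emission rate and a known detection efficiency
    (fraction of emitted photons that reach the detector).
    """
    signal = detected_photons_per_s - background_per_s
    return signal / (photons_per_cell_per_s * detection_efficiency)

def growth_rate(times, counts):
    """Least-squares slope of ln(count) versus time (exponential growth model)."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t, mean_l = sum(times) / n, sum(logs) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

# Hypothetical follow-up study: detected flux on days 0, 7 and 14.
days = [0, 7, 14]
flux = [2.0e4, 8.0e4, 3.2e5]
cells = [estimate_cell_count(f, photons_per_cell_per_s=100.0,
                             detection_efficiency=0.01) for f in flux]
rate = growth_rate(days, cells)
print(cells)
print(math.log(2) / rate)  # estimated doubling time in days
```

In vivo, the tissue-dependent attenuation of light means the effective detection efficiency varies with the depth of the cells, which is one reason the quantification of BLT and FMT is treated as a separate challenge.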
In the second case, the reporter gene is placed downstream of the same promoter as a gene of interest, which makes it possible to study gene regulation within an organism. In high-throughput studies, this would allow for spatiotemporal gene expression studies, which could act as a data source for inferring gene regulatory networks, as will be discussed in Chapter 3. Measuring gene expression profiles requires the ability to generate transgenic model organisms.
There are several techniques for introducing foreign DNA into animal cells. In cultured cells, micro-injection can be applied; in vivo, DNA can be introduced by particle bombardment. Both methods are called direct DNA transfer. Transfection is also possible, and the last method of introducing foreign DNA is transduction, using retroviruses; gene therapy, for instance, is based on this transduction method. The most used technique for producing transgenic mice is to inject DNA into the pronucleus of a fertilized egg [7]. A targeting vector with an inserted promoter and reporter gene is transferred to the DNA of the recipient cells, and a small percentage of these cells will incorporate the new gene into their genome. The number of inserted copies is not always the same and varies from a few to hundreds of inserted pieces of DNA. YAC vectors are also used, because they can carry larger strands of DNA and are thus able to express larger, more complex proteins; for GFP and luciferase, however, SV40 vectors suffice [8]. For the generation of genetically altered mice, ES cells are most commonly micro-injected into blastocysts. This first yields chimeric mice, because the blastocysts then contain both original and transformed ES cells. If the transformed cells contribute to the germ line, the chimeras pass the construct on to their offspring, and by crossing these offspring, mice that are homozygous for the construct can be obtained. A schematic overview is given in Fig. 2.2.
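The breeding step towards homozygous animals follows ordinary Mendelian segregation. The sketch below computes offspring genotype frequencies for a single locus with a Punnett square, assuming the construct behaves as one allele ('T') at one insertion site next to the wild-type allele ('w'); chimerism itself and multiple independent insertion sites are not modeled. The allele labels and the function name `cross` are hypothetical.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Offspring genotype frequencies for a single locus (Punnett square).

    Each parent is a two-letter genotype, e.g. 'Tw' = one transgene allele (T)
    and one wild-type allele (w). Assumes Mendelian segregation.
    """
    offspring = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}

# Heterozygous carrier x wild type, then an intercross of two heterozygous carriers.
print(cross("Tw", "ww"))  # {'Tw': 0.5, 'ww': 0.5}
print(cross("Tw", "Tw"))  # {'TT': 0.25, 'Tw': 0.5, 'ww': 0.25}
```

Only a quarter of the offspring of a heterozygous intercross is expected to be homozygous, which is one reason generating such lines takes several generations.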
Fig. 2.2: Constructed genes are purified and inserted into oocytes, after which a selection is made among the mice that are born [9].

There is a difference between transient and stable transfection. When the inserted genes are integrated into the genome, for example by means of a recombinase, they are expressed stably. When new DNA is inserted extra-chromosomally, however, it is lost over time, because it is not replicated. For temporal gene expression measurements, stable transfection is needed, also to be certain that each cell contains the same genome.
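The loss of non-integrated DNA can be illustrated with a deliberately simple model. Assuming that extra-chromosomal copies are never replicated, are split evenly between the two daughter cells at each division, and are additionally degraded at a constant first-order rate (all hypothetical simplifications, with invented parameter values), the expected number of copies per cell falls geometrically with every division, whereas an integrated copy is replicated along with the chromosome and stays constant.

```python
import math

def episomal_copies_per_cell(initial_copies, divisions, hours_per_division,
                             degradation_rate_per_h=0.0):
    """Expected non-integrated (transient) copies per cell after some divisions.

    Dilution: copies are not replicated, so each division halves the copies per cell.
    Degradation: first-order loss with the given rate constant (per hour).
    """
    dilution = 0.5 ** divisions
    degradation = math.exp(-degradation_rate_per_h * divisions * hours_per_division)
    return initial_copies * dilution * degradation

def integrated_copies_per_cell(initial_copies, divisions):
    """Integrated copies are replicated with the genome: constant per cell."""
    return initial_copies

for n in range(0, 11, 2):
    print(n, episomal_copies_per_cell(100, n, 24.0, 0.01), integrated_copies_per_cell(1, n))
```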
2.2.4 Current Limitations on Reporter Genes

Gene Transfer Reliability

Transfection is not always effective or efficient. The undetermined number of inserted gene copies, mentioned before, makes a quantitative analysis of gene expression impossible: when multiple copies are present, more transcription and thus more gene expression will result. To make things worse, copy number and expression level are not always related one to one [10]. With most DNA transfer techniques it is also difficult to predict side effects caused by the location at which the DNA is inserted. For example, many non-coding RNAs (ncRNAs) have an unknown function, and many ncRNAs are expected to be still unknown. The size of ncRNAs varies from about 20 nucleotides (microRNAs) to thousands of nucleotides [11]. Random insertions can therefore give unpredicted results.
With a technique called Flp-In from Invitrogen, it becomes easier to insert genes into a genome. The problem to be solved for this technique is to produce a stable cell line that contains only one Flp recombination target (FRT) site and that behaves like a normal cell line (the long-term side effects of DNA insertion cannot be predicted). Once such a cell line is generated, virtually any gene can be inserted at this site by Flp recombinase-mediated recombination [12]. A Southern blot can be used to detect whether there is one and only one copy of the inserted FRT site [13].

This technique is mostly used to generate genetically altered cell lines on demand. When cell lines carrying this Flp-In site are transfected with an always-on promoter and a reporter gene, the cells become trackable with fluorescence imaging (FLI), BLI or any other reporter gene. Note that it is only possible to track the cells and keep track of their number (quantification); no gene regulation can be monitored with this ‘always on’ technique. Such tracking is important for temporal studies of, for example, tumor growth and metastasis, or of infectious agents such as viruses or bacteria, as will be discussed later.

As long as the regulatory effect of non-coding elements is not completely understood, it cannot be guaranteed that an insertion has no effect. However, if a stable cell line with a Flp insertion is used, it is relatively certain that new insertions at that site have no side effects on the normal functioning of the studied organism or cell line.
Diffusion Coefficient

When measuring reporter gene concentration, it is important to keep in mind that the reporter and the gene of interest probably have the same rate of synthesis, because they share the same promoter region, but that they are unlikely to have the same degradation rate. From a basic conservation law it follows that a protein with a faster degradation rate will appear at a lower concentration than a protein with the same rate of synthesis but a lower degradation rate.
The general formula for protein formation can be stated as follows:

$$\begin{pmatrix}\text{time rate of change}\\ \text{of protein conc.}\end{pmatrix} = \text{Regulation} + \text{Diffusion} + \text{Decay} \tag{2.1}$$
The only part of this equation that is equal for the gene of interest and the reporter gene is the regulation term; the decay rate and the diffusion coefficient differ. As a consequence, the protein concentration of the gene of interest cannot be determined from the measured protein concentration of the reporter gene. Something qualitative can be said about upregulation or downregulation, but quantitative statements about up- or downregulation are not possible if the diffusion and decay parameters are unknown.
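To make the argument around Eq. (2.1) concrete, the sketch below writes the three terms in one common form, assuming first-order decay and Fickian diffusion in one dimension: dc/dt = r(x) + D·d²c/dx² − λ·c. It integrates this with an explicit finite-difference scheme. The two hypothetical proteins share the same regulation term r(x) (same promoter) but have different D and λ; all parameter values are invented. With zero-flux boundaries the diffusion term only redistributes protein, so at steady state the total amount is Σr/λ: the reporter's total signal differs from that of the GOI by a factor set entirely by the (unknown) decay rates, and its spatial profile differs as well because of D.

```python
def simulate_protein(regulation, diffusion, decay, dx=1.0, dt=0.05, steps=4000):
    """Explicit finite-difference integration of dc/dt = r(x) + D d2c/dx2 - lambda c.

    regulation: synthesis rate per grid cell (identical for GOI and reporter)
    diffusion:  diffusion coefficient D; decay: first-order rate constant lambda
    Zero-flux boundaries; dt must satisfy D*dt/dx**2 <= 0.5 for stability.
    """
    n = len(regulation)
    c = [0.0] * n
    for _ in range(steps):
        new_c = []
        for i in range(n):
            left = c[i - 1] if i > 0 else c[i]        # mirrored value: no flux out
            right = c[i + 1] if i < n - 1 else c[i]
            laplacian = (left - 2.0 * c[i] + right) / dx ** 2
            new_c.append(c[i] + dt * (regulation[i] + diffusion * laplacian - decay * c[i]))
        c = new_c
    return c

# Same promoter => same regulation term, active only in cells 20-29 (hypothetical).
regulation = [1.0 if 20 <= i < 30 else 0.0 for i in range(50)]
goi = simulate_protein(regulation, diffusion=0.2, decay=0.1)
reporter = simulate_protein(regulation, diffusion=1.0, decay=0.5)

print(round(sum(goi), 3), round(sum(reporter), 3))  # totals approach sum(r) / decay
print(round(max(goi), 3), round(max(reporter), 3))  # peak concentrations differ too
```

Here the ratio of the two totals equals the ratio of the decay constants (0.5/0.1), which is exactly the parameter that is usually unknown in an experiment.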
Post Translational Effects

In addition to these unknown diffusion parameters, it should be taken into consideration that the transcription of a gene does not guarantee that the protein is actually formed or, if it is formed, that it is in a functional shape. In eukaryotes, transcribed pre-mRNA is spliced into mature messenger RNA (mRNA), whose coding sequence determines the number and order of amino acids in a protein. A single pre-mRNA can be spliced in different ways, so that different isoforms of the same gene, and thus different forms of the protein, can appear. With reporter genes it is not possible to identify different protein isoforms. Alternative splicing is thought to be one of the most important contributors to the functional complexity of the human genome. Given that different isoforms may serve different regulatory functions, and that a single gene can code for up to 40,000 protein isoforms, at least some caution should be taken when interpreting gene expression data [14]. Different forms of splicing are shown in Fig. 2.3.
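The combinatorial growth behind such numbers can be sketched by enumerating the isoforms of a hypothetical gene in which some exons are constitutive and others are cassette exons that are independently included or skipped (only one of the mechanisms in Fig. 2.3). With k independent cassette exons there are 2^k possible transcripts, all driven by the same promoter, so a promoter-based reporter yields a single signal for all of them. The exon names are invented for the example.

```python
from itertools import product

def enumerate_isoforms(exons, cassette):
    """List all transcripts obtained by independently including/skipping cassette exons.

    exons:    ordered exon names of a hypothetical gene
    cassette: set of exon names that may be skipped; all others are constitutive
    """
    optional = [e for e in exons if e in cassette]
    isoforms = []
    for choice in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choice))
        isoforms.append([e for e in exons if e not in cassette or included[e]])
    return isoforms

exons = ["E1", "E2", "E3", "E4", "E5"]
isoforms = enumerate_isoforms(exons, cassette={"E2", "E4"})
for iso in isoforms:
    print("-".join(iso))
print(len(isoforms))  # 4
```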
Fig. 2.3: Different splicing effects are possible. a: Exons can be included or excluded, and splice sites can be altered. b: Translation initiation or stop signals can be altered, and in-frame deletions or insertions are possible [14].

Fig. 2.4: Many modalities from clinical imaging have been miniaturized for use in molecular imaging [16].
Protein Tagging

When protein tagging is possible, it is relatively certain that the visualized molecule is the product of the gene of interest. The reporter genes mainly used for tagging are the GFP family proteins. Although these are thought to be non-toxic, it should be taken into account that tagging may alter the functionality of proteins and may thereby alter biological regulation and functioning in the studied organisms [15]. Biological processes are based on equilibria, and minor distortions may cause large effects.
2.3 Molecular Imaging Modalities

Besides the emergence of in vivo gene reporters, another trend in the field of molecular imaging is the miniaturization of detection devices. These micro devices are cheaper than their clinical counterparts and allow for small animal whole body imaging [16]. Because these acquisition devices are smaller, some scaling problems need to be tackled, for instance how much resolution is needed to obtain meaningful information and how large the measured volume must be [2].

In short, commonly used reporter genes can be divided over three imaging modalities: radionuclide imaging, optical imaging and magnetic resonance imaging. Each category has its own advantages and disadvantages in terms of resolution, sensitivity, acquisition time and substrate administration [16]. CT and echography can also be used in molecular imaging, but because they can hardly or not at all be used to visualize gene expression, they are discussed in less detail in this literature study. It should be noted, though, that CT may give much extra information as an underlying modality if extra resolution or spatial context is required. To use this information, image registration is needed, as discussed in section 2.4.2.

Most imaging modalities used in medical imaging can be used in molecular imaging with appropriate contrast agents. The modalities nuclear imaging, radiography (CT), magnetic resonance imaging, optical imaging and ultrasound imaging are described briefly below. For each modality, a reporter gene (if applicable) and a short description of the acquisition are given. The same argument holds for all modalities: if a contrast enhancer can be bound to a molecular probe, it is suitable as an (indirect) reporter for gene expression, provided that it is non-toxic and can pass all necessary barriers. A short overview of the different modalities and their general specifications is given in Table 2.1.
2.3.1 Nuclear Imaging

Nuclear imaging is based on unstable nuclei (radioisotopes) that emit positrons or γ-rays and thereby fall into a more stable energy state. Two modalities are used in molecular imaging, namely PET and SPECT. The isotopes most used in PET are ¹⁵O, ¹³N, ¹¹C and ¹⁸F, which emit positrons. When an emitted positron collides with an electron, the pair annihilates into two γ-rays that travel in opposite directions, at ∼180° to each other. In PET, these γ-rays are collected by a ring of gamma detectors and converted into an image. Coinciding photons in the detector ring originate from the same source (see Fig. 2.5), and because the two γ-rays travel along one line, each coincidence defines a line on which the source must lie. By combining many such lines, and correcting for attenuation in the different tissue types, the location of the positron-emitting source can be reconstructed in 3D space [16].
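The coincidence principle can be sketched in two steps under idealized, hypothetical assumptions: a single 2D ring of point-like detectors, a fixed timing window of a few nanoseconds, and no random, scattered or multiple coincidences. First, single detection events whose timestamps fall within the window are paired; each pair defines a line of response (LOR) between the two detectors. Second, the point closest (in the least-squares sense) to all LORs is computed, which is where a single point source must lie. The ring size, window and event list are invented; real PET reconstruction uses many LORs and iterative or filtered back-projection methods rather than this shortcut.

```python
import math

def detector_position(index, n_detectors=64, radius=40.0):
    """Centre of detector `index` on an idealized ring (hypothetical geometry, mm)."""
    angle = 2.0 * math.pi * index / n_detectors
    return (radius * math.cos(angle), radius * math.sin(angle))

def find_coincidences(events, window=4e-9):
    """Pair consecutive single events that fall within the timing window (seconds).

    events: list of (time_s, detector_index); unpaired singles are discarded.
    """
    events = sorted(events)
    pairs, i = [], 0
    while i < len(events) - 1:
        (t1, d1), (t2, d2) = events[i], events[i + 1]
        if t2 - t1 <= window and d1 != d2:
            pairs.append((d1, d2))
            i += 2
        else:
            i += 1
    return pairs

def locate_source(pairs):
    """Point with the smallest summed squared distance to all lines of response (2D)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for d1, d2 in pairs:
        (x1, y1), (x2, y2) = detector_position(d1), detector_position(d2)
        length = math.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length
        # Projector onto the line normal, P = I - u u^T; accumulate sum(P) and sum(P p).
        p11, p12, p22 = 1.0 - ux * ux, -ux * uy, 1.0 - uy * uy
        a11 += p11
        a12 += p12
        a22 += p22
        b1 += p11 * x1 + p12 * y1
        b2 += p12 * x1 + p22 * y1
    det = a11 * a22 - a12 * a12  # zero if all lines are parallel
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

events = [(0.0, 8), (2e-9, 24), (1e-6, 3), (3e-6, 16), (3.000001e-6, 48)]
pairs = find_coincidences(events)
print(pairs)                 # [(8, 24), (16, 48)]
print(locate_source(pairs))  # close to (0, 28.28), where the two lines cross
```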
The isotopes used in SPECT, such as ¹²³I and ⁹⁹ᵐTc, emit single γ-rays [19], which, unlike the photon pairs in PET, do not travel simultaneously in opposite directions. It is thus not possible to use a detector ring to pinpoint the location of the emission source. Instead, γ-rays are detected by special cameras that consist of a collimator (a lead plate with pinholes, or with holes separated by septa), a scintillating crystal and a photodetector. The scintillating crystal converts γ-rays into photons in the visible range, which are then detected by the photodetectors. Because of the collimator, only photons travelling along a line parallel to the holes/septa are detected. Knowing that captured γ-rays can only come directly from the source, a line on which the source must lie is known (Fig. 2.6). By rotating the camera around the sample, 2D projection images are acquired from many directions. SPECT is therefore comparable to CT, but uses photons of a different energy that are emitted from within the sample. The multiple 2D images acquired with SPECT can be reconstructed into a 3D model in the same way as in CT, as will be seen later.
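The geometric role of the septa can be sketched with the standard textbook approximations for a parallel-hole collimator (septal penetration ignored): geometric resolution R ≈ d·(L + b)/L and geometric efficiency g ≈ K²·(d/L)²·d²/(d + t)², with hole diameter d, hole length L, septal thickness t, source-to-collimator distance b and a hole-shape constant K (assumed 0.26 here). Pinhole collimators, which are common in small animal SPECT, follow different expressions, so this is only an illustration of the trade-off; the collimator dimensions are hypothetical.

```python
def geometric_resolution(hole_diameter, hole_length, source_distance):
    """Approximate parallel-hole resolution R = d (L + b) / L (same length unit as inputs).

    Septal penetration is ignored (effective hole length = hole_length).
    """
    return hole_diameter * (hole_length + source_distance) / hole_length

def geometric_efficiency(hole_diameter, hole_length, septal_thickness, k=0.26):
    """Approximate fraction of emitted photons passing the collimator.

    g = K^2 (d/L)^2 * d^2 / (d + t)^2, with an assumed hole-shape constant K.
    """
    d, length, t = hole_diameter, hole_length, septal_thickness
    return k ** 2 * (d / length) ** 2 * d ** 2 / (d + t) ** 2

# Hypothetical collimator: 1.5 mm holes, 0.2 mm septa, source 100 mm from the face.
for length in (20.0, 30.0, 40.0):
    r = geometric_resolution(1.5, length, 100.0)
    g = geometric_efficiency(1.5, length, 0.2)
    print(f"L = {length:4.0f} mm  resolution = {r:5.2f} mm  efficiency = {g:.2e}")
```

Longer holes give a smaller resolution value (sharper images) but pass fewer photons, which is the resolution/sensitivity trade-off discussed in the following paragraph.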
Sensitivity of SPECT is of an order of magnitude lower than what can be achieved with<br />
16 Martin Wildeman
Chapter 2. Molecular Imaging<br />
Fig. 2.5: PET tracers are injected into organism. A PET tracers contain atoms that are unstable<br />
and emit positrons. If these positrons collide with electrons, they annihilate into two<br />
γ-rays traveling in opposite direction. To measure gene expression, reporter genes are<br />
used that can accumulate PET tracers in a cell, so that these cells become visible.[17,<br />
18]<br />
Fig. 2.6: SPECT is based on pinhole detection. PET is based on coincidence events.[19]<br />
PET. This is because in SPECT the γ-rays have to pass through the channels between<br />
septa in a lead barrier, so that only rays travelling in a straight line are detected. The<br />
longer the septa, the higher the resolution of SPECT, but the lower its sensitivity, because<br />
more rays are blocked and fewer are detected. An advantage of SPECT over PET is that<br />
its tracers have a longer half-life, which allows slower or longer-lasting biological<br />
processes to be studied. Its main disadvantage is the lower (but still good) sensitivity<br />
compared to PET.<br />
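The two trade-offs in this paragraph can be put into numbers with textbook approximations. The sketch below assumes an idealized parallel-hole collimator (geometric resolution about d(L + b)/L, efficiency proportional to (d/L)², septal penetration ignored) and uses approximate literature half-lives; ¹⁸F is added only as a typical PET isotope for comparison, and all collimator dimensions are hypothetical.

```python
def collimator_tradeoff(d_mm, septa_length_mm, distance_mm, septa_thickness_mm, k=0.26):
    """Approximate geometric resolution (FWHM, mm) and efficiency of a
    parallel-hole collimator, ignoring septal penetration.

    d_mm: hole diameter; distance_mm: source-to-collimator distance;
    k: hole-shape constant (roughly 0.24-0.28, assumed here).
    """
    L = septa_length_mm
    resolution = d_mm * (L + distance_mm) / L
    efficiency = k**2 * (d_mm / L) ** 2 * (d_mm / (d_mm + septa_thickness_mm)) ** 2
    return resolution, efficiency


def remaining_fraction(t_hours, half_life_hours):
    """Fraction of the initial activity left after t_hours: 2 ** (-t / T_half)."""
    return 2.0 ** (-t_hours / half_life_hours)


# Longer septa: sharper projections, but fewer detected gamma-rays.
for L in (20.0, 40.0):                              # hypothetical septa lengths (mm)
    res, eff = collimator_tradeoff(1.5, L, 50.0, 0.2)
    print(f"L = {L:.0f} mm: FWHM ~ {res:.2f} mm, efficiency ~ {eff:.1e}")

# Approximate half-lives: 18F (a common PET isotope) ~1.83 h, 99mTc ~6.0 h, 123I ~13.2 h.
for name, t_half in (("18F", 1.83), ("99mTc", 6.0), ("123I", 13.2)):
    print(f"{name}: {remaining_fraction(12.0, t_half):.3f} of activity left after 12 h")
```

Doubling the septa length improves the resolution but cuts the efficiency by a factor of four, which is the resolution/sensitivity trade-off described above.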
PET reporter genes encode proteins with a high binding specificity for certain radiolabeled<br />
biological molecules. These probes are normal substrates labeled with positron-emitting<br />
isotopes. To meet the overall criteria, in particular the crossing of biological barriers, it is<br />
important either to use a molecular target that is expressed on the cell surface (a<br />
so-called cell surface protein) or to use a molecular probe that can freely pass the cell<br />
membrane (for an example, see [20]). If the probe passes the membrane, it should<br />
become 'trapped' inside the cell after some chemical reaction, so that it accumulates<br />
there. This accumulation of the radioactive compound increases the signal, but it must<br />
not kill the cell (toxicity). Monoclonal antibodies can also be used to detect specific cell<br />
types [21].<br />
2.3.2 Computed Tomography<br />
CT uses x-rays. Because the attenuation of x-rays depends strongly on the atomic number<br />
of the atoms in the tissue, heavy atoms such as calcium can be detected.<br />
By rotating the sample or the scanner, multiple projections of the sample can be obtained<br />
(see Fig. 2.7). The scanned sample can then be reconstructed slice by slice, by<br />
backprojecting multiple projections of each slice to obtain a 2D image. The projections<br />
can be filtered before backprojection to emphasize or suppress certain spatial frequencies.<br />
Because heavy atoms attenuate more than light atoms, CT is sensitive to differences in<br />
(average) atomic composition between tissues, and the positions of heavy atoms or<br />
contrast agents can be reconstructed with this backprojection algorithm. The resolution<br />
of CT is limited by the ionizing effect of x-rays, which causes direct radiation damage<br />
and, in the longer term, DNA damage. A higher resolution requires more rays per voxel,<br />
which increases this damage, and the damage needs to be minimized.<br />
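The slice-by-slice reconstruction described above can be sketched as filtered backprojection in a few lines of NumPy. This is a minimal illustration under simplifying assumptions: parallel-beam geometry, projections that are already line integrals (for CT, after taking the logarithm of the measured intensities), a single point-like object standing in for a heavy-atom contrast agent, and a plain ramp filter without zero-padding or windowing.

```python
import numpy as np


def point_sinogram(points, thetas, n_bins, fov):
    """Parallel-beam projections of point sources: each point adds one count to the
    detector bin containing s = x cos(theta) + y sin(theta)."""
    edges = np.linspace(-fov / 2, fov / 2, n_bins + 1)
    sino = np.zeros((len(thetas), n_bins))
    for i, th in enumerate(thetas):
        for x, y in points:
            j = np.searchsorted(edges, x * np.cos(th) + y * np.sin(th)) - 1
            if 0 <= j < n_bins:
                sino[i, j] += 1.0
    return sino


def ramp_filter(sino):
    """Multiply every projection by |f| in the Fourier domain (the ramp filter)."""
    freqs = np.fft.fftfreq(sino.shape[1])
    return np.real(np.fft.ifft(np.fft.fft(sino, axis=1) * np.abs(freqs), axis=1))


def backproject(sino, thetas, n_pix, fov):
    """Smear every projection back over the image along its projection direction."""
    n_bins = sino.shape[1]
    centers = (np.arange(n_bins) + 0.5) / n_bins * fov - fov / 2
    coords = (np.arange(n_pix) + 0.5) / n_pix * fov - fov / 2
    X, Y = np.meshgrid(coords, coords)            # rows index y, columns index x
    img = np.zeros((n_pix, n_pix))
    for i, th in enumerate(thetas):
        s = X * np.cos(th) + Y * np.sin(th)
        img += np.interp(s, centers, sino[i], left=0.0, right=0.0)
    return img * np.pi / len(thetas)


thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
sino = point_sinogram([(5.5, -3.5)], thetas, n_bins=128, fov=64.0)
fbp = backproject(ramp_filter(sino), thetas, n_pix=64, fov=64.0)   # filtered
plain = backproject(sino, thetas, n_pix=64, fov=64.0)             # unfiltered, blurred
row, col = np.unravel_index(np.argmax(fbp), fbp.shape)
print("peak at x =", (col + 0.5) - 32, "y =", (row + 0.5) - 32)
```

Comparing `plain` with `fbp` shows why the projections are filtered: unfiltered backprojection smears every point into a 1/r-shaped halo, and the ramp filter, which boosts high spatial frequencies, largely removes it.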
To be detectable with CT, gene reporter probes need to contain heavy atoms. The effects<br />
of large quantities of such substrates are unknown, and CT is therefore not used to<br />
measure gene expression. X-ray imaging, and especially computed tomography (CT), is<br />
currently used mainly as a structural modality in MI. Through modality fusion, expression<br />
data can be placed in a high-resolution spatial context.<br />
2.3.3 Magnetic Resonance Imaging<br />
Nuclei are brought into alignment by a strong magnetic field. For nuclei with a positive<br />
gyromagnetic ratio, such as ¹H, a spin aligned parallel to the field is in the low-energy<br />
state and a spin aligned antiparallel to it is in the high-energy state. Every nucleus with an<br />
unpaired proton and/or neutron, i.e., with an odd number of protons and/or an odd number<br />
of neutrons, has a non-zero nuclear spin and can therefore be used for Nuclear Magnetic<br />
Resonance and MRI; this includes ²H and ¹⁴N, whose total number of nucleons is even.<br />
The nuclei most commonly used are ¹H, ²H, ³¹P, ²³Na, ¹⁴N, ¹³C and ¹⁹F. Once the spins<br />
are aligned with the static magnetic field, an RF pulse is generated by passing a current<br />
through a coil around the sample. This pulse tips the nuclei out of alignment with the<br />
static magnetic field. Afterwards, the spins return into alignment with the static field, and<br />
the time needed for this realignment, characterized by the spin relaxation times, is<br />
measured. This can be done with the same coil or with an additional coil.<br />
Fig. 2.7: Multiple 2D x-ray images of a body are acquired at different rotation angles. From<br />
a set of these images, a 3D volume can be reconstructed. (kabayim.com/images/spiralCT.jpg)<br />
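The rule for which nuclei are usable can be written as a one-line check. The sketch below encodes the usual rule of thumb that a nucleus has zero spin only when both its proton and its neutron numbers are even; ¹²C and ¹⁶O are added as NMR-silent counter-examples that are not mentioned in the text.

```python
def has_nonzero_spin(protons: int, mass_number: int) -> bool:
    """Rule of thumb: the ground-state spin is zero only if both the proton number Z
    and the neutron number N = A - Z are even; otherwise the nucleus is NMR-active."""
    neutrons = mass_number - protons
    return protons % 2 == 1 or neutrons % 2 == 1


# (Z, A) for the nuclei listed in the text, plus 12C and 16O as spin-0 examples.
nuclei = {"1H": (1, 1), "2H": (1, 2), "13C": (6, 13), "14N": (7, 14),
          "19F": (9, 19), "23Na": (11, 23), "31P": (15, 31),
          "12C": (6, 12), "16O": (8, 16)}
for name, (z, a) in nuclei.items():
    print(f"{name:>5}: {'NMR-active' if has_nonzero_spin(z, a) else 'spin 0'}")
```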
The location of the molecules can be determined by superimposing a gradient on the<br />
static magnetic field. This works because the precession frequency of the spins depends<br />
on the local magnetic field strength, as shown in Eq. 2.2:<br />
$$\omega_0 = \gamma B_0 \tag{2.2}$$<br />
Here B₀ is the magnetic field strength in tesla and γ is the gyromagnetic ratio, a property<br />
specific to each nucleus. Only nuclei whose frequency ω₀ matches that of the RF signal<br />
respond to it, which is why the technique is called magnetic resonance.<br />
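Eq. 2.2 and the gradient-based localization can be illustrated numerically. The sketch uses γ/2π (in MHz per tesla) instead of γ, so it yields ordinary frequencies rather than angular ones; the constants are approximate literature values, and the 3 T field and 10 mT/m gradient are hypothetical example settings. It assumes that the gradient adds G·x to the field, so that f(x) = (γ/2π)(B₀ + G·x) can be inverted for x.

```python
# gamma / (2 pi) in MHz per tesla (approximate literature values).
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "13C": 10.708}


def larmor_mhz(nucleus: str, b0_tesla: float) -> float:
    """Precession frequency f0 = (gamma / 2 pi) * B0, i.e. Eq. 2.2 divided by 2 pi."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * b0_tesla


def position_from_frequency(f_mhz: float, nucleus: str, b0_tesla: float,
                            gradient_t_per_m: float) -> float:
    """Invert f(x) = gamma_bar * (B0 + G * x) to find the position x in metres."""
    return (f_mhz / GAMMA_BAR_MHZ_PER_T[nucleus] - b0_tesla) / gradient_t_per_m


f0 = larmor_mhz("1H", 3.0)                          # protons at 3 T: about 127.7 MHz
G = 0.010                                           # hypothetical gradient of 10 mT/m
f_x = GAMMA_BAR_MHZ_PER_T["1H"] * (3.0 + G * 0.02)  # a proton 2 cm from the centre
x = position_from_frequency(f_x, "1H", 3.0, G)      # recovers 0.02 m
print(f0, (f_x - f0) * 1e6, x)                      # offset is about 8.5 kHz
```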
There are two relaxation processes, characterized by the time constants T₁ and T₂, which<br />
relate to the magnetization along the z-axis and in the x-y plane, respectively. Although<br />
this difference is quite fundamental, it is considered out of scope for this study.<br />
The measured relaxation times are determined mainly by the physicochemical environment<br />
of the nuclei. The combined signal of all relaxing spins forms an NMR signal in the time<br />
domain, which can be converted to the frequency domain by applying a Fourier transform<br />
[16, 22]. MR is very sensitive to differences between soft tissues. Contrast agents such as<br />
gadolinium or dysprosium can be used to enhance the MR signal in regions of interest.<br />
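The step from the time-domain NMR signal to a spectrum can be illustrated with a simulated free induction decay. In this hypothetical sketch, two groups of spins precess at 50 Hz and 120 Hz (offsets relative to the reference frequency) and decay with different T₂ values; the sampling rate, frequencies and T₂ values are made up. A Fourier transform turns the superposition of decaying oscillations back into two separate peaks.

```python
import numpy as np

fs = 1000.0                                   # sampling rate (Hz), hypothetical
t = np.arange(1000) / fs                      # 1 s acquisition
# Hypothetical components: (offset frequency in Hz, T2 in s, amplitude)
components = [(50.0, 0.2, 1.0), (120.0, 0.1, 1.0)]
fid = sum(a * np.exp(2j * np.pi * f * t) * np.exp(-t / t2) for f, t2, a in components)

spectrum = np.abs(np.fft.fft(fid))
freqs = np.fft.fftfreq(len(t), d=1.0 / fs)

# Local maxima of the magnitude spectrum; keep the two strongest.
is_peak = (spectrum > np.roll(spectrum, 1)) & (spectrum > np.roll(spectrum, -1))
peak_idx = np.flatnonzero(is_peak)
top_two = peak_idx[np.argsort(spectrum[peak_idx])[-2:]]
print(sorted(freqs[top_two].tolist()))
```

The component with the shorter T₂ (faster decay) gives a broader, lower peak, which is why relaxation times also shape the MR spectrum.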
MR is not yet widely used for imaging gene expression, because it is not sensitive enough<br />
to detect small amounts of reporter gene product. With appropriate amplification<br />
strategies, however, enough signal can be obtained, and MR can achieve very high<br />
resolution. Louie et al. developed a shielding cage that can 'switch off' gadolinium. In the<br />
presence of β-galactosidase (β-Gal), the enzyme encoded by the LacZ gene, this cage is<br />
cleaved such that a coordination site of the Gd³⁺ becomes free and the Gd is 'activated'<br />
(see Fig. 2.8). The activated Gd generates a roughly twofold stronger signal than the<br />
inactive Gd. Furthermore, MR does not suffer from the limitations of the spatial<br />
reconstruction algorithms needed in optical imaging [23].<br />
Fig. 2.8: The gadolinium encapsulation is cleaved by β-galactosidase at the red bond shown<br />
in (A). In this way the Gd³⁺ becomes detectable by MRI once it comes into contact with<br />
water. Left is the intact cage and right is the cleaved cage in which the gadolinium is free.<br />
(A) shows the chemical structural formula and (B) shows the same molecules as a<br />
space-filling model. The purple atom visible on the right in (B) is the free gadolinium atom<br />
[23].<br />
MRI is still used in MI mainly as an additional structural modality for modality fusion.<br />
Combined PET-MRI scanners also exist, but combined PET-CT scanners are more common.<br />
2.3.4 Optical Imaging<br />
Optical imaging uses the visible and near-infrared part of the spectrum, and images are<br />
acquired with CCD cameras. In the clinical field, photography was mainly used to show<br />
phenotypic effects of diseases or injuries, often for educational purposes, but with the<br />
advent of optical contrast agents it can now be used as a molecular imaging modality. An<br />
important enabling development is the availability of more sensitive cameras. These work<br />
like normal CCD cameras but are cooled down, which reduces thermally generated noise<br />
(dark current). This technique, called CCCD (Cooled Charge-Coupled Device), allows<br />
light sources of very low intensity to be detected.<br />
Fig. 2.9: Schematic overview of different acquisition techniques. (a) and (b) show planar<br />
imaging; (c) shows the principle of tomography; (d) is a reconstructed optical tomography<br />
result, from which the emission source has yet to be calculated [25].<br />
Fluorescence Molecular Imaging<br />
The most common autofluorescent proteins (AFPs) are the eGFPs (enhanced Green<br />
Fluorescent Proteins). These proteins must be excited by an external light source, the<br />
excitation beam. The excitation light must have a higher photon energy (shorter<br />
wavelength) than the light the AFP emits. Therefore, the excitation light can be removed<br />
with appropriate filters, so that only light originating from the AFPs is recorded. Filtering<br />
is also needed because other, homologous AFPs with overlapping spectra might otherwise<br />
interfere. With FMI, images can be acquired in planar form, resulting in a 2D image, or<br />
with a technique called optical tomography, which yields a 3D image. Tomography<br />
reaches a much greater penetration depth than planar imaging, but planar imaging allows<br />
much higher throughput [24]. A schematic overview of different acquisition techniques is<br />
given in Fig. 2.9.<br />
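The energy argument behind the filtering can be checked with E = hc/λ. The sketch uses the approximate excitation and emission maxima of EGFP (about 488 nm and 507 nm) and a hypothetical ideal long-pass filter with a 500 nm cut-off; real set-ups use band-pass filters with finite edges.

```python
PLANCK_EV_NM = 1239.84  # h * c expressed in eV * nm (approximate)


def photon_energy_ev(wavelength_nm: float) -> float:
    """E = h c / lambda: shorter wavelengths carry more energy per photon."""
    return PLANCK_EV_NM / wavelength_nm


def passes_long_pass(wavelength_nm: float, cutoff_nm: float) -> bool:
    """Idealized long-pass emission filter: transmits only wavelengths above the cut-off."""
    return wavelength_nm > cutoff_nm


excitation_nm, emission_nm = 488.0, 507.0    # approximate EGFP maxima
cutoff_nm = 500.0                            # hypothetical filter cut-off
print(f"{photon_energy_ev(excitation_nm):.2f} eV vs {photon_energy_ev(emission_nm):.2f} eV")
print(passes_long_pass(excitation_nm, cutoff_nm), passes_long_pass(emission_nm, cutoff_nm))
```

Because the emitted photons have less energy, a filter placed between the two wavelengths blocks the excitation light but transmits the fluorescence.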
Bioluminescence Imaging<br />
When bioluminescent proteins, of which luciferase is the most common, are present in an<br />
organism, an image of gene expression can also be made with a cooled CCD camera. This<br />
is called bioluminescence imaging (BLI). Although the emitted light intensity in BLI is<br />
much lower than in FMI, BLI has a much higher sensitivity, because there is less<br />
background signal: no excitation light is needed, so the only light sources are the proteins<br />
themselves [25]. Bioluminescent sources can be detected with a very sensitive camera in<br />
a dark chamber, in which no photons are present other than those of the bioluminescent<br />
protein. A schematic overview of the steps needed for BLI is shown in Fig. 2.10.<br />
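Why a weak bioluminescent source can still beat a brighter fluorescent one, and why cooling the camera matters, follows from a standard per-pixel CCD noise model. The sketch below assumes independent shot noise on signal and background, dark-current noise and read noise; all rates (photoelectrons per second), the exposure time and the read noise are hypothetical numbers chosen only to illustrate the comparison.

```python
import math


def ccd_snr(signal_e_per_s, background_e_per_s, dark_e_per_s, read_noise_e, exposure_s):
    """Per-pixel SNR of a CCD exposure: S*t / sqrt(S*t + B*t + D*t + R**2).

    Shot noise of signal and background, thermal dark-current noise and read-out
    noise are assumed independent. A background can be subtracted, but its shot
    noise remains, which is why it still appears in the denominator.
    """
    s = signal_e_per_s * exposure_s
    noise = math.sqrt(s + (background_e_per_s + dark_e_per_s) * exposure_s + read_noise_e**2)
    return s / noise


t = 60.0  # hypothetical exposure time (s)
fmi = ccd_snr(100.0, 2000.0, 0.01, 5.0, t)        # bright signal, strong autofluorescence
bli_cooled = ccd_snr(10.0, 0.0, 0.01, 5.0, t)     # weak signal, no background, cooled CCD
bli_warm = ccd_snr(10.0, 0.0, 50.0, 5.0, t)       # same source, uncooled CCD
print(f"FMI {fmi:.1f}, BLI cooled {bli_cooled:.1f}, BLI uncooled {bli_warm:.1f}")
```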
Protein-protein interaction with FRET, BRET and the yeast two-hybrid system<br />
GFP and luciferase can also be used to measure protein-protein interaction, by making<br />
use of phenomena called FRET (Förster resonance energy transfer) and BRET<br />
(bioluminescence resonance energy transfer) [27, 28]. It is currently possible to visualize<br />
protein-protein interaction [29]. This is done with fusion proteins: copies of genes are<br />
inserted into the organism of interest. With FRET, two GFPs (and with BRET, a luciferase<br />
and a GFP) are fused to genes X and Y, and the fusion constructs are placed downstream<br />
of a promoter. When proteins X and Y bind, the two fused reporters come into close<br />
proximity, so that resonance energy transfer becomes possible (Fig. 2.11). Besides<br />
protein-protein interaction, protease activity can also be visualized, for instance with two<br />
fused GFP proteins joined by a linker peptide that contains a cleavage site for the<br />
protease: cleavage separates the two GFPs and abolishes the energy transfer. These<br />
signals can be acquired with a CCCD camera. Another method of visualizing<br />
protein-protein interaction is the yeast two-hybrid (Y2H) system. In a proof of concept<br />
[30], the interaction of MyoD and ID was visualized. Y2H is an indirect measuring<br />
technique: the interaction of the two proteins of interest induces the transcription of<br />
luciferase, which in turn is translated and can be visualized with a cooled CCD camera.<br />
The reporter gene can be chosen freely. For the mechanism, see Fig. 2.12.<br />
Fig. 2.10: Schematic of bioluminescence imaging. (A) BLI genes are inserted into cell lines<br />
or DNA constructs, (B) which are then inserted into an animal model, and (C) images are<br />
captured. (D) The acquired data are then quantified and visualized [26].<br />
Fig. 2.11: Principles of FRET. (a, b) If the proteins are in close proximity (less than 60 Å),<br />
the emission of the acceptor GFP is measured; otherwise, only the emission of the donor<br />
GFP, at a different wavelength, is measured. (c) shows some techniques involving FRET<br />
[29].<br />
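The strong distance dependence that makes FRET (and BRET) useful as a proximity sensor comes from the Förster relation E = 1/(1 + (r/R₀)⁶), where R₀ is the distance at which half of the energy is transferred. The sketch assumes R₀ = 50 Å purely for illustration; the real value depends on the donor/acceptor pair.

```python
def fret_efficiency(r_angstrom: float, r0_angstrom: float = 50.0) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    R0 (the distance at which E = 0.5) depends on the donor/acceptor pair;
    50 Å is an assumed, illustrative value.
    """
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)


for r in (30.0, 50.0, 60.0, 80.0, 100.0):
    print(f"r = {r:5.1f} Å  ->  E = {fret_efficiency(r):.3f}")
```

Because of the sixth power, the efficiency drops from nearly complete transfer to almost none over a few nanometres, so a measurable acceptor signal indicates that the two fused proteins are actually in contact.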
Fig. 2.12: The yeast two-hybrid system. Genes X and Y are fused to GAL4 and VP16, which,<br />
when X and Y interact, form an active transcription factor [31] for a luciferase gene placed<br />
downstream of a GAL4 binding site [30].<br />
2.3.5 Ultrasound Imaging<br />
Ultrasound imaging is based on echoes. To obtain an image with ultrasound, short,<br />
high-frequency sound pulses are generated. At each interface between different tissues,<br />
a portion of the signal is reflected and can be detected by the scanner. The time it takes<br />
for an echo to return to the source is proportional to the distance it has travelled, i.e.,<br />
twice the depth of the reflecting interface. Ultrasound contrast agents are used to enhance<br />
the signal. The most common agents are small air or gas bubbles, called micro-bubbles.<br />
Not only do they form a strongly reflective interface (blood/gas), they also resonate,<br />
which makes them even more reflective [32]. Micro-bubbles are quantifiable. Although<br />
the resolution of conventional ultrasound is modest, ultrasound biomicroscopy can<br />
achieve resolutions of about 40 µm, and scanning acoustic microscopy, which uses even<br />
higher frequencies (200 MHz and above), can achieve resolutions of 3 µm. Note, however,<br />
that penetration depth decreases as frequency increases. New micro-bubble contrast<br />
agents can bind to specific surfaces and thereby enhance contrast there: the<br />
micro-bubbles are encapsulated in a protein and coupled to specific antibodies. This is<br />
used, for instance, to image inflammatory cells, and such targeted contrast agents open<br />
the door to molecular imaging. Ultrasound is not used to image gene expression, mainly<br />
because of the lack of suitable reporter genes, but also because of the trade-off between<br />
resolution and penetration depth. The technique may nevertheless provide useful<br />
information on concentration flows, as will be discussed briefly in Chapter 3.<br />
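The echo timing and the resolution/penetration trade-off can be illustrated with a few back-of-the-envelope formulas. The sketch assumes a speed of sound of 1540 m/s (a common soft-tissue average), a frequency-proportional attenuation of 0.5 dB/(cm·MHz) and a 60 dB usable dynamic range; these are typical textbook assumptions, not values from the cited work. The wavelength is used only as a rough proxy for the achievable resolution.

```python
SPEED_OF_SOUND_M_S = 1540.0        # typical soft-tissue average (assumption)
ATTENUATION_DB_PER_CM_MHZ = 0.5    # rough soft-tissue value (assumption)


def echo_depth_m(round_trip_s: float) -> float:
    """The pulse travels to the interface and back, so depth = c * t / 2."""
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0


def wavelength_um(freq_mhz: float) -> float:
    """lambda = c / f; (m/s) divided by MHz gives micrometres directly."""
    return SPEED_OF_SOUND_M_S / freq_mhz


def max_depth_cm(freq_mhz: float, dynamic_range_db: float = 60.0) -> float:
    """Depth at which the round-trip loss 2 * alpha * f * d uses up the dynamic range."""
    return dynamic_range_db / (2.0 * ATTENUATION_DB_PER_CM_MHZ * freq_mhz)


print(f"echo after 13 us -> depth {echo_depth_m(13e-6) * 100:.2f} cm")
for f in (5.0, 40.0, 200.0):       # clinical, biomicroscopy, acoustic microscopy (MHz)
    print(f"{f:5.0f} MHz: wavelength {wavelength_um(f):6.1f} um, "
          f"max depth ~{max_depth_cm(f):.2f} cm")
```

Going from 5 MHz to 200 MHz shrinks the wavelength from about 0.3 mm to under 10 µm, but also shrinks the usable depth by the same factor of forty.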
2.4 Acquisition Challenges<br />
2.4.1 Quantification of BLT and FMT<br />
Forward and Inverse Problem<br />
In contrast to PET, BLT and FMT require a scattering and absorption model to be able to<br />
solve the inverse problem. Finding the right parameters for this model is here called the<br />
forward problem: given the source of emission, what must the parameters of the model<br />
be to generate the observed data? Once these parameters are estimated, one can try to<br />
solve the inverse problem: given a model with known parameters and an observation,<br />
what are the shape, location and density of the emission source? For FMT it is possible<br />
to approximate the forward model, because a known input light source is available<br />
whose output can be measured. From the attenuation model obtained with this known<br />
laser light source, it is then possible to start solving the inverse problem for a fluorescent<br />
source. With BLT the forward problem cannot be solved in this way, as no known light<br />
source is available for estimating the parameters of the model. A priori anatomical<br />
information therefore has to be incorporated [33]. To do that, a second modality, such as<br />
MRI or CT, is needed to provide anatomical details of the animal model. A priori model<br />
information can also be obtained from mouse atlas databases, see Fig. 2.13 [34]. The<br />
problem with multi-modality, though, is that registering these modalities onto each other<br />
is not straightforward, and errors are introduced because of differences between the<br />
animal model and the atlas.<br />
Table 2.1: Short list of specifications of different modalities. Source: Molecular Imaging in<br />
Living Subjects, Massoud<br />
When registration is complete and successful, the different tissues in the animal model<br />
can be segmented, and with those segments the inverse problem can be solved. For the<br />
optical parameters, mean values from the literature can be used. To approximate the<br />
photon propagation, the following equation can be used [35]:<br />
$$\begin{cases} -\nabla\cdot\left(D(x)\nabla\Phi(x)\right)+\mu_a(x)\Phi(x)=S(x) \\ D(x)=\left(3\left(\mu_a(x)+(1-g)\mu_s(x)\right)\right)^{-1} \end{cases} \qquad x\in\Omega \tag{2.3}$$<br />
In this equation S(x) is the unknown source density and Φ(x) is the photon density at<br />
location x. μₐ, μₛ and g are optical parameters: the absorption coefficient, the scattering<br />
coefficient and the anisotropy factor, respectively. In the paper of Cong [35], Eq. 2.3 is<br />
solved using a modified Newton method, but a MAP approach is also possible [33]. It has<br />
been proven that this inverse problem has a unique solution [36], provided that the model<br />
is sufficiently well defined.<br />
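To show what solving the forward model of Eq. 2.3 involves, the sketch below discretizes it in one dimension with finite differences and solves for the photon density of a point source. It is a toy version under stated assumptions: a 1D slab instead of a 3D domain Ω, zero photon density just outside the slab instead of realistic boundary conditions, and illustrative (not measured) optical parameters. An inverse solver would repeat such forward solves for many candidate sources. In a homogeneous medium the result can be checked against the effective attenuation coefficient μ_eff = √(μₐ/D).

```python
import numpy as np


def diffusion_coefficient(mu_a, mu_s, g):
    """D = 1 / (3 (mu_a + (1 - g) mu_s)) from Eq. 2.3 (mm, if mu_a and mu_s are in 1/mm)."""
    return 1.0 / (3.0 * (mu_a + (1.0 - g) * mu_s))


def solve_diffusion_1d(mu_a, mu_s, g, source, h):
    """Finite-difference solution of -(D phi')' + mu_a phi = S on a 1D grid.

    mu_a, mu_s, g and source are arrays sampled at grid nodes with spacing h.
    phi = 0 is imposed just outside both ends (a simple Dirichlet boundary).
    """
    D = diffusion_coefficient(mu_a, mu_s, g)
    D_half = 0.5 * (D[:-1] + D[1:])                 # D midway between neighbours
    D_left = np.concatenate(([D[0]], D_half))       # coupling to the left neighbour
    D_right = np.concatenate((D_half, [D[-1]]))     # coupling to the right neighbour
    A = np.diag((D_left + D_right) / h**2 + mu_a)
    A -= np.diag(D_half / h**2, k=1) + np.diag(D_half / h**2, k=-1)
    return np.linalg.solve(A, source)


h = 0.05                                   # grid spacing (mm)
x = np.arange(0.0, 40.0 + h / 2, h)        # a 40 mm slab
n = len(x)
mu_a = np.full(n, 0.05)                    # illustrative absorption (1/mm)
mu_s = np.full(n, 10.0)                    # illustrative scattering (1/mm)
g = np.full(n, 0.9)                        # illustrative anisotropy
S = np.zeros(n)
S[n // 2] = 1.0 / h                        # point source in the middle (x = 20 mm)
phi = solve_diffusion_1d(mu_a, mu_s, g, S, h)

# For a homogeneous medium, phi decays approximately as exp(-mu_eff * |x - x0|):
mu_eff = np.sqrt(mu_a[0] / diffusion_coefficient(mu_a[0], mu_s[0], g[0]))
i, j = n // 2 + 100, n // 2 + 120          # 5 mm and 6 mm from the source
print(mu_eff, np.log(phi[i] / phi[j]) / (x[j] - x[i]))
```

Because μₐ, μₛ and g are arrays, the same function also accepts a segmented, heterogeneous tissue model, which is where the anatomical prior from CT, MRI or an atlas would enter.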
Resolution Improvement<br />
A problem that adds to the ill-posedness of BLT is that the optical parameters of body<br />
tissue are temperature dependent [37]. This temperature dependency can be modeled, but<br />
at the cost of an even more complex model and thus of extra computational power.<br />
Adding this temperature dependency could yield a higher resolution and a more accurate<br />
result. Note, however, that the temperature would have to be measured for every tissue,<br />
which will likely introduce a new inverse problem in the infrared spectrum.<br />
Chaudhari et al. [38] propose using spectral information to reconstruct a BLI source.<br />
Because attenuation in body tissue depends on wavelength, the detected spectrum is<br />
shifted. By capturing hyper-spectral (around 100 spectral bins) or multi-spectral (around<br />
10 bins) data, these attenuation differences can be taken into account. In this way, two<br />
sources that overlap in a 2D image, one superficial and one deeper, can be distinguished.<br />
Note that an individual inverse problem has to be solved for each spectral band.<br />
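The idea that spectral information reveals depth can be illustrated with a deliberately simplified two-band model in which each wavelength is attenuated as I = I₀·exp(−μ_eff(λ)·d). This is not the reconstruction method of Chaudhari et al.; it assumes a single straight path, a known source spectrum and hypothetical attenuation coefficients, under which the red/green ratio alone determines the depth.

```python
import math

# Hypothetical effective attenuation coefficients (1/mm): green light is attenuated
# more strongly in tissue than red light.
MU_EFF_PER_MM = {560: 0.6, 620: 0.2}


def surface_intensity(i0: float, wavelength_nm: int, depth_mm: float) -> float:
    """Simplified single-path model: I = I0 * exp(-mu_eff(lambda) * depth)."""
    return i0 * math.exp(-MU_EFF_PER_MM[wavelength_nm] * depth_mm)


def depth_from_two_bands(i_560: float, i_620: float, i0_560: float, i0_620: float) -> float:
    """Solve I620/I560 = (I0_620/I0_560) * exp((mu_560 - mu_620) * d) for the depth d."""
    ratio = (i_620 / i_560) * (i0_560 / i0_620)
    return math.log(ratio) / (MU_EFF_PER_MM[560] - MU_EFF_PER_MM[620])


i0 = {560: 1.0, 620: 0.8}          # hypothetical emission spectrum of the source
for depth in (1.0, 4.0):           # a superficial and a deeper source (mm)
    measured = {wl: surface_intensity(i0[wl], wl, depth) for wl in i0}
    est = depth_from_two_bands(measured[560], measured[620], i0[560], i0[620])
    print(f"true depth {depth} mm: red/green ratio {measured[620] / measured[560]:.2f}, "
          f"estimated depth {est:.2f} mm")
```

The deeper source appears redder, which is the spectral shift mentioned above; with real tissue the attenuation is not a single exponential, which is why a full inverse problem per band is needed.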
Backprojection<br />
It remains to be seen whether solving these complex optimization problems is worthwhile.<br />
The optical properties of the different tissues in small animal models are unknown, and<br />
simplifying assumptions are used to reconstruct the BLT energy source [39]. The most<br />
important question for combining BLT (or fluorescence tomography, for that matter) with<br />
the field of Systems Biology will be: how much resolution in space and time is needed,<br />
for cell-specific and dynamic process behavior respectively, to make molecular imaging a<br />
feasible way of tracking gene expression in the organism? In the paper of Kok [39], a<br />
relatively straightforward algorithm is used to reconstruct the bioluminescent source.<br />
Scattering is not taken into account and the tissue is assumed to be homogeneous, which<br />
is clearly not the case. Despite these simplifications, a good estimate of the source<br />
location is achieved for superficial lesions. Since the authors only want to draw attention<br />
to a location in the accompanying CT (or other structural data set), the algorithm can be<br />
seen as an efficient and simple reconstruction method. The authors backproject eight<br />
planar images, each rotated by a known angle, onto a '3D' structural data set. This<br />
method provides good resolution for superficial BLI sources, but has lower resolving<br />
power for deeper tissues. However, [40] shows that interesting new information can be<br />
obtained from gene expression data even at coarse-grained resolution.<br />
2.4.2 Combining Information: Multi-modality fusion<br />
Because different modalities contain different information, it is useful to combine them.<br />
CT, for example, is sensitive to elements with a high atomic number, such as calcium,<br />
which is found in bone and calcifications. Heavy atoms such as iodine can be injected into<br />
the bloodstream as contrast agents, making veins and blood-rich organs detectable. MRI,<br />
on the other hand, is very powerful for visualizing different soft tissues. When these two<br />
modalities are correctly combined, they complement each other, each revealing tissue<br />
differences that the other cannot detect.<br />
Bioluminescence and fluorescence planar images by themselves give little detail on the<br />
location of gene expression. This is due to diffusion and scattering inside the body before<br />
the photons reach the surface from which the picture is taken (e.g. the skin of the mouse).<br />
As a result, only a rough indication of the location (on the order of millimeters) can be<br />
given based on the set of 2D images. A major advantage of BLI and FMI, though, is that<br />
they are much more sensitive to abnormalities than existing medical imaging modalities.<br />
It is therefore possible to detect diseases well before morphological changes become<br />
observable. If a detection is made with BLI or FMI, other modalities can be used to study<br />
morphological changes in detail at the specific sites of interest [39].<br />
Fig. 2.13: Mouse atlas with a surface rendering of skeleton and different organs [34].<br />
How to align different modalities<br />
The position of the mouse during the acquisition of different modalities most likely<br />
differs. If the modalities are combined, a reconstruction of the source becomes possible,<br />
but combining them requires alignment by image registration. This 3D alignment is not a<br />
straightforward procedure [16]. If all modalities can be aligned to a standard atlas, they<br />
can be fused through that atlas. In the paper of Baiker [41], the skeleton is registered<br />
automatically by an optimization that minimizes the differences between a mouse<br />
skeleton atlas and a skeleton extracted from a CT scan. By extending this work, it is also<br />
possible to register marks on the mouse skin and, combined with the skeleton<br />
information, to interpolate where the organs of the mouse are located. A 3D image of the<br />
animal's surface can also be generated from planar images using structured light. By<br />
combining these models, it should be possible to estimate where the different tissues of<br />
the animal are located.<br />
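A basic building block for such alignment is a least-squares rigid fit between corresponding landmarks, for example marks on the skin or points on the skeleton. The sketch below uses the Kabsch (SVD) algorithm; it is a swapped-in, simpler technique, not the articulated atlas registration of Baiker [41], and it assumes that the landmark correspondences are known and that the body moves rigidly. The landmarks and the transformation in the example are synthetic.

```python
import numpy as np


def rigid_align(source_pts, target_pts):
    """Least-squares rotation R and translation t with R @ p + t ~ q (Kabsch algorithm).

    source_pts, target_pts: (n, 3) arrays of corresponding landmarks.
    """
    p_mean = source_pts.mean(axis=0)
    q_mean = target_pts.mean(axis=0)
    P = source_pts - p_mean
    Q = target_pts - q_mean
    H = P.T @ Q                                     # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


rng = np.random.default_rng(0)
landmarks = rng.uniform(-10.0, 10.0, size=(8, 3))   # synthetic marks (mm)
a, b = np.deg2rad(30.0), np.deg2rad(-15.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true, t_true = Rz @ Rx, np.array([2.0, -1.0, 0.5])
moved = landmarks @ R_true.T + t_true               # same marks in a later session
R_est, t_est = rigid_align(landmarks, moved)
print(np.round(R_est, 3), np.round(t_est, 3))
```

Soft-tissue deformation breaks the rigid assumption, which is one reason why atlas-based approaches add articulated or non-rigid models on top of this step.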
Note that a mapping to an atlas is needed for both qualitative and quantitative gene<br />
expression measurements [42]. To tell in which organ gene expression occurs, for<br />
instance, one first has to know where the organs are located in the 3D space of the<br />
organism. A whole range of mouse atlas databases is currently available [34]. Only a few<br />
of them also contain spatiotemporal gene expression data (the Mouse Atlas Project,<br />
developed at the University of Edinburgh, and DigiMouse), to which new measurements<br />
can be correlated [43, 42, 34].<br />
2.4.3 Combining Information: Follow Up Registration<br />
Although in vivo imaging allows continuous measurements over time without moving the<br />
animal, most if not all diseases that are studied progress over weeks rather than hours.<br />
It is therefore infeasible to keep the animal in exactly the same position throughout, and<br />
images of the same animal acquired in separate experiments must be registered.<br />
For follow-up registration, the same atlas approach can be used as for multi-modality<br />
fusion. Once a modality can be registered to an atlas, it is a small step to register a<br />
'time series' of the same modality to this atlas.<br />
To overcome or prevent some of the registration problems, multiple modalities can also<br />
be combined during acquisition [38]. This ensures that both modalities are in exactly the<br />
same position in x,y,z space. Ray et al. [20] have done much work on multi-modal<br />
acquisition by constructing multi-modal reporter genes, so that FMT, BLT and PET data<br />
can be acquired with one and the same reporter gene construct. A combined micro<br />
PET-CT scanner has also been used to obtain high-resolution anatomical images together<br />
with gene expression data [44].<br />
Ideally, the lab assistant should not need to worry about how to position the animal for<br />
measurements, but positioning the animal in the same way in each experiment makes<br />
registration a lot easier. An effective way to fix the organism in a spatial context is to<br />
use animal holders. Positioning animals in the same way at each acquisition reduces the<br />
degrees of freedom and thereby simplifies the registration problem.<br />
2.4.4 Current Limitations in Molecular Imaging<br />
To obtain useful gene expression data with molecular imaging, multiple measurements<br />
have to be made and the results have to be combined into one data set. These<br />
measurements contain noise, which introduces inaccuracies, and the registration steps<br />
introduce further inaccuracies that decrease the achievable resolution. Different kinds of<br />
noise are discussed below.<br />
General Noise<br />
Every modality suffers from its own noise problems. The basic problem with noise is<br />
that it can overlap with the signal, especially when the signal-to-noise ratio (SNR) is not<br />
high enough. To overcome some of these SNR problems, the signal of the reporter<br />
contrast agents can be amplified, but if gene expression levels need to be quantified, the<br />
amount of amplification must be known.<br />
Attenuation<br />
Solving the inverse problem is a difficult task. Using the anatomical information from an<br />
atlas introduces an error due to the difference between the organism under study and the<br />
reference organism. Moreover, the optical parameters of body tissue are temperature<br />
dependent [37]. This temperature dependency can be modeled, but at the cost of an even<br />
more complex model and thus of extra computational power. In addition, the temperature<br />
in an organism is not homogeneous but varies in space and over time, which will likely<br />
affect reconstruction accuracy.<br />
Multi-modality and Follow-up registration<br />
A problem with BLI and FMI is that they are based on 2D images that only show the<br />
surface. It is possible to register CT data to a 3D mouse atlas, and also to register 2D<br />
BLI data to 3D CT data [39], but both registration steps introduce errors. Moreover, rigid<br />
transformations are relatively easy to model, whereas soft tissue deformations are much<br />
more difficult. If BLI sources are located in soft tissue, the reconstruction of the source<br />
therefore becomes less accurate. Ideally, small animal models are used because they can<br />
mimic human diseases, but if sufficiently high resolution cannot be obtained with them,<br />
smaller, simpler and transparent organisms can be explored, in which the light sources<br />
can be seen directly, so that reconstruction of the light source, if still needed, becomes<br />
straightforward.<br />
CHAPTER 3<br />
Molecular Imaging as extra data source for model generation<br />
With the ability to visualize gene expression, the question arises what can be done with<br />
the acquired data. To answer this question, we look at the field of bioinformatics, where<br />
gene expression data are already being analysed.<br />
One reason to strive for an understanding of the underlying cellular processes in an<br />
organism is to be able to predict its behavior and to change or correct that behavior if<br />
needed. To do this, it is not always necessary to understand the full functioning of the<br />
system.<br />
There are two approaches to gaining insight into cellular processes: first, doing<br />
experiments at a low level, and second, simulating (high-level) processes to mimic<br />
observed data. For large, complex biological networks, possibly only the latter approach<br />
is feasible for obtaining a 'full' understanding [45, 40].<br />
In an attempt to relate the field of molecular imaging to the field of bioinformatics, some<br />
examples from bioinformatics are studied and related to MI in this chapter. First, some<br />
studies are highlighted in which spatiotemporal data are acquired using high-throughput<br />
techniques. Second, some findings on mathematical models for network inference are<br />
presented. Third, a short concept is given for translating these mathematical models from<br />
quantitative to qualitative models, because data quality is not always good enough for<br />
quantitative model construction. Finally, a concept for statistical model inference based<br />
on time-series microarray experiments is given.<br />
Some findings are then discussed and questions are posed in the discussion section.<br />
Chapter 3. Molecular Imaging as extra data source for model generation<br />
3.1 Acquisition of Spatiotemporal Gene Expression Data<br />
In a spatiotemporal gene expression study on Drosophila melanogaster, Seroude et al.<br />
identified a set of age-related genes whose expression changes with age [46]. For the<br />
measurements, extraction and cryosectioning were used to obtain temporal and spatial<br />
expression profiles, respectively. Genes were visualized using the Flytrap system and<br />
staining of β-galactosidase. In this way, a 3D+t gene expression profile was obtained.<br />
Note that this experiment was not an in vivo measurement, but the ability of Flytrap to<br />
express GFP [47] could open the door to non-invasive molecular imaging. In situ images<br />
of Drosophila melanogaster can be clustered using pattern recognition techniques. In<br />
[3], embryo images were studied using a Gaussian Mixture Model, an eigenvector basis<br />
and a discrete Haar wavelet as feature space. All pictures were aligned so that the dorsal<br />
side of the embryos was on top and the anterior on the left. Similar spatial gene<br />
expression patterns were clustered using graph partitioning. In this way the authors were<br />
able to cluster the embryos into different developmental stages (temporal) and into<br />
co-regulated spatial expression profiles within those stages (spatial correlation). Genes<br />
with similar expression profiles are thought to be involved in the same pathway. With this<br />
procedure, a staging overlap of 99.55% was obtained, i.e., the developmental stages<br />
annotated by the algorithm overlapped with expert annotation by 99.55%. This overlap<br />
suggests that automated gene expression measurements are feasible. Indeed, [48] states<br />
that automatic high-throughput measurement of in situ hybridization (ISH) is feasible;<br />
the authors created a mouse atlas containing spatial gene expression data and also<br />
clustered the gene expression profiles.<br />
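Graph partitioning of expression patterns can be sketched with spectral bipartitioning: build a similarity graph between images, then split it using the sign of the Fiedler vector (the eigenvector of the graph Laplacian with the second-smallest eigenvalue). This is a generic illustration, not the exact procedure of [3]; the feature vectors stand in for, e.g., wavelet coefficients and are synthetic, and the Gaussian width `sigma` is an assumed parameter.

```python
import numpy as np


def spectral_bipartition(features, sigma=1.0):
    """Split samples into two groups using the Fiedler vector of the graph Laplacian.

    features: (n_samples, n_features) array, one feature vector per image.
    """
    sq_dist = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma**2))        # Gaussian similarity graph
    np.fill_diagonal(W, 0.0)                        # no self-loops
    L = np.diag(W.sum(axis=1)) - W                  # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                         # second-smallest eigenvector
    return (fiedler > 0).astype(int)


rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.3, size=(10, 2))        # synthetic feature vectors
group_b = rng.normal(4.0, 0.3, size=(10, 2))
labels = spectral_bipartition(np.vstack([group_a, group_b]))
print(labels)
```

Applying the split recursively to each half gives more than two clusters, which is one common way to use such a bipartition for grouping many expression patterns.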
Besides providing spatial information, spatiotemporal expression measurements are sensitive to gene expression in small clusters of cells, which is their main strength. In microarray data, such expression profiles would be averaged out by larger cell populations with different expression levels [48]. As a purely hypothetical example: if a developing embryo shows upregulation in the anterior and downregulation in the posterior, a microarray experiment might detect no net regulation, whereas a spatial measurement would reveal this ‘expression gradient’.
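The hypothetical gradient can be made concrete with a few lines of arithmetic. The numbers below are invented purely for illustration, and the "microarray" is modelled simply as pooling the signal of two equally sized halves of the embryo.

```python
def fold_change(treated: float, control: float) -> float:
    """Ratio of treated to control expression (1.0 means no change)."""
    return treated / control


# Invented expression levels (arbitrary units) for two equally sized halves.
control = {"anterior": 100.0, "posterior": 100.0}
treated = {"anterior": 150.0, "posterior": 50.0}  # up anteriorly, down posteriorly

# Spatially resolved measurement: one fold change per region.
regional = {region: fold_change(treated[region], control[region]) for region in control}

# Pooled (microarray-like) measurement: only the total signal is observed.
pooled = fold_change(sum(treated.values()), sum(control.values()))

print(regional)  # {'anterior': 1.5, 'posterior': 0.5}
print(pooled)    # 1.0
```

The regional fold changes of 1.5 and 0.5 cancel exactly in the pooled sample. With unequal region sizes or different magnitudes, the cancellation would only be partial, but the regional signal would still be diluted.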
Dupuy et al. acquired a spatiotemporal gene expression profile using in vivo imaging [49]. Their spatiotemporal in vivo imaging techniques may be extendable to whole-body molecular imaging, so their publication is covered in extra detail here.
Dupuy et al. performed a high-throughput analysis of about 900 gene promoters, using the technique visualized in Fig. 2.1. Each of these promoters drove expression of GFP, and together they covered about 5% of the protein-coding genes in C. elegans. Because the measurements were part of a developmental study, the authors needed a way to incorporate a temporal component into their spatial gene expression profiles.
Temporal arrangement using COPAS
The authors measured gene expression using GFP as a reporter gene and recorded expression profiles along the longitudinal axis of Caenorhabditis elegans. Instead of measuring expression profiles directly over time, they used the body length of the organism as an indication of age.
30 Martin Wildeman
Chapter 3. Molecular Imaging as extra data source for model generation
Fig. 3.1: a Images as captured and converted into a one-dimensional GFP intensity bar. b The bars are aligned with respect to orientation and length to obtain a chronogram c. The chronograms are then normalized in time d so that correlations can be calculated [49].
Body length could be measured and sorted automatically by a device called COPAS (‘complex object parametric analysis and sorter’, produced by Union Biometrica). The device is based on flow cytometry, which essentially separates particles by size: larger particles have a longer time of flight than smaller ones. Images were acquired with a CCD camera and a confocal microscope. The COPAS system can automatically generate fluorescence emission profiles along the anterior-posterior axis of C. elegans.
Chronograms
With the large number of gene expression profiles measured this way, the authors created a set of what they call chronograms. A chronogram is a two-dimensional expression profile containing a spatial and a temporal component. As can be seen in Fig. 3.1, the expression data were converted into intensity bars based on the COPAS intensity measurements. These intensity bars were then aligned and stacked on top of each other according to size (Fig. 3.1 c). To make chronograms of different genes comparable, they were normalized to a standard chronogram size containing one line for each size. If no measurements are available for a certain size, an empty line appears in the normalized chronogram. When multiple measurements are available, they are averaged onto one line (Fig. 3.1 d).
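The normalization step can be sketched as below. This is a minimal sketch under our own assumptions. Each COPAS intensity bar is assumed to be resampled to a common length and oriented anterior to posterior, and each animal's body length is assumed to be mapped to an integer size bin. How Dupuy et al. performed those steps is not described here. Empty lines are represented by `None`.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def normalize_chronogram(
    measurements: Sequence[Tuple[int, Sequence[float]]],
    n_sizes: int,
) -> List[Optional[List[float]]]:
    """Build a normalized chronogram with exactly one line per size bin.

    measurements: (size_bin, intensity_bar) pairs, one per imaged animal;
    the body-length-to-size_bin mapping is assumed to be done beforehand.
    Lines without measurements are None ("empty line"); lines with several
    measurements hold the element-wise mean of the intensity bars.
    """
    if len({len(bar) for _, bar in measurements}) > 1:
        raise ValueError("all intensity bars must have the same length")

    by_size: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for size_bin, bar in measurements:
        if not 0 <= size_bin < n_sizes:
            raise ValueError(f"size bin {size_bin} outside [0, {n_sizes})")
        by_size[size_bin].append(bar)

    chronogram: List[Optional[List[float]]] = []
    for size_bin in range(n_sizes):
        bars = by_size.get(size_bin)
        if not bars:
            chronogram.append(None)  # no animal of this size was measured
        else:
            # average all animals of this size, position by position
            chronogram.append([sum(column) / len(bars) for column in zip(*bars)])
    return chronogram


chrono = normalize_chronogram(
    [(0, [1.0, 3.0]), (0, [3.0, 5.0]), (2, [4.0, 0.0])], n_sizes=3)
```

In this example, two animals share size bin 0 and are averaged, bin 1 has no animals and stays empty, and bin 2 holds a single bar.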
The acquired chronograms report the activity of the proximal promoters of 1,610 unique predicted loci. That is, measurements were obtained for these promoters, and each of these 1,610 chronograms maps to a single chromosomal locus containing that promoter region. Roughly 900 measurements contained an average signal above background noise. Most of the remaining ~700 chronograms had too low an intensity, probably because the promoter::GFP construct remained extrachromosomal, a consequence of the gene-transfer limitations discussed earlier in this study.
Spatial prior knowledge
The chronograms can be related to tissue-specific expression profiles. For example, a gene expressed only in the pharynx has a different ‘fingerprint’ than a gene expressed only in the gonad sheath. These fingerprints were generated as follows. Qualitative tags indicating locations of gene expression, obtained from microscopy and microarray experiments, were clustered. The chronograms of all genes known to be expressed in the same (qualitative) region were then averaged into one chronogram. The authors warn that this procedure only gives robust fingerprints when many measurements share the same tag. Many genes are expressed in multiple regions, and with few chronograms to average over, these extra locations may show up as signal in fingerprints where they do not belong. The resulting fingerprint chronograms allow qualitative statements about the expression location in newly obtained chronograms.
Temporal prior knowledge
The same approach was applied to expression profiles known from microarray data to be highly correlated. Most of the time, these microarray-derived expression clusters did not give clear patterns in the averaged chronograms. This indicates that co-expression in time, as measured in microarray data, does not necessarily mean co-expression in space. Some clusters, such as ‘neurons’, ‘germ line’ and ‘intestine’, did however show a clear expression pattern, consistent with their high correlation in the microarray data.
The chronogram promoter activity measurements can be correlated with each other. Chronograms with high correlation can be clustered and are likely to be functionally related. To obtain even better spatial localization, the authors predict that COPAS will in the near future be able to generate 3D-aligned expression profiles. They expect this to give more accurate four-dimensional chronograms in which overlapping organs no longer cause inaccuracies.
To summarize the paper of Dupuy et al.: age (developmental stage) is defined as the temporal element of the measurements. This makes high-throughput measurements feasible, with the measurements aligned automatically. Combining time and spatial expression yields a so-called chronogram (Fig. 3.1). After normalization, chronograms can be correlated. When high correlation is seen, the measured proteins are likely to be involved in the same cellular process.
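The sketch below shows how two normalized chronograms might be correlated. The text does not specify the exact measure, so this is an assumption. We use the Pearson correlation over all positions of the size lines that are non-empty in both chronograms, with empty size lines represented as `None`.

```python
import math
from typing import List, Optional, Sequence

Chronogram = Sequence[Optional[Sequence[float]]]


def chronogram_correlation(a: Chronogram, b: Chronogram) -> float:
    """Pearson correlation over lines non-empty in both chronograms (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("chronograms must have the same number of size lines")
    xs: List[float] = []
    ys: List[float] = []
    for line_a, line_b in zip(a, b):
        if line_a is None or line_b is None:
            continue  # an empty line carries no information for this pair
        if len(line_a) != len(line_b):
            raise ValueError("size lines must have the same length")
        xs.extend(line_a)
        ys.extend(line_b)
    n = len(xs)
    if n < 2:
        raise ValueError("too few overlapping values to correlate")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant chronogram has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

Pairs whose correlation exceeds a chosen threshold could then be grouped as candidate functionally related promoters. The threshold itself would be a tuning choice.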
Because Caenorhabditis elegans is a transparent organism, the measurements are direct and precise. For whole-body imaging of mice this could be a problem, because for each gene the location of expression first has to be estimated.
3.2 Inferring a Quantitative Model using Spatiotemporal Protein Expression
Reinitz et al. state that high detail is not needed to model processes. If only lower-detail, lower-resolution data are available, the model will simply be less detailed [40]. In their work they use low-resolution spatial gene expression profiles to study regulatory effects on eve stripe formation. A lack of detailed data forced a few simplifications, but they were still able to construct a model capable of simulating eve stripe formation. Reinitz et al. used only the longitudinal protein gradients for their model, whereas Krul et al. take the geometrical complexity of reality into account [50]. They do this by defining cells as point-shaped objects and the extracellular space as the space around them, with this space having the shape of the organism, Drosophila. Krul et al. also simplified the model by considering only a small selection of known regulatory proteins. With this simplification they were still able to mimic the system's behavior, although the simplifications caused deviations. These deviations became larger when the processes were studied in a two-dimensional space. The model they used consists of the following equations, which distinguish between intra- and extracellular concentrations and between diffusion and non-diffusion.
The change over time is described by:
$$\frac{d g_{ij}(t)}{dt} = \frac{\phi(h_{ij})}{k_j + \phi(h_{ij})} - \lambda_j\, g_{ij}(t), \qquad h_{ij} = \sum_{k=1}^{N_g} W_{jk}\, g_{ik} + h_j, \quad i = 1,\dots,N_c,\ j = 1,\dots,N_g \tag{3.1}$$
The extracellular protein concentrations are modeled by:
$$\frac{\partial c_j(x,t)}{\partial t} = D_j\, \nabla^2 c_j(x,t) - \lambda_j\, c_j(x,t) \tag{3.2}$$
And equations 3.1 and 3.2 are constrained by:
$$g_{ij}(t) = c_j(x_i,t) \tag{3.3}$$
The symbols in these equations represent:
- $g_{ij}$: concentration of the product of gene j in cell i.
- $c_j$: extracellular concentration of the product of gene j.
- $x_i$: position of cell i.
- $\lambda_j$: degradation rate of gene j.
- $k_j$: formation rate of gene j.
- $h_j$: activation threshold for gene j.
- $D_j$: diffusion coefficient of gene j.
- $W_{jk}$: regulatory effect of gene k on gene j. Its real-valued entries are positive, negative or zero for upregulation, downregulation and no regulation, respectively.
- $N_c$: number of cells in the model.
- $N_g$: number of genes incorporated in the model.
Clearly, W is the parameter matrix we want to estimate, because these regulation parameters define the gene regulatory network. Its entries model the positive or negative feedback loops of each gene relation. In addition, λ, k, h and D are parameters that need to be set.
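To make equation 3.1 concrete, the sketch below integrates it with a forward-Euler scheme for a few cells. It is deliberately limited and rests on assumptions not taken from the text:
- the sigmoid φ is taken to be the logistic function;
- the diffusion of equation 3.2 and the coupling of equation 3.3 are left out;
- the time step is a free choice.

`W[j][m]` is read as the effect of gene m on gene j, matching the sum that defines $h_{ij}$.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def phi(u: float) -> float:
    """Logistic sigmoid: an assumed choice, since eq. 3.1 leaves phi unspecified."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)  # this branch avoids overflow for very negative u
    return e / (1.0 + e)


def euler_step(g: Matrix, W: Matrix, h: Sequence[float], k: Sequence[float],
               lam: Sequence[float], dt: float) -> List[List[float]]:
    """Advance g[i][j] (gene j in cell i) by one forward-Euler step of eq. 3.1."""
    n_genes = len(h)
    new_g: List[List[float]] = []
    for cell in g:
        new_cell = []
        for j in range(n_genes):
            # h_ij = sum_m W_jm * g_im + h_j: total regulatory input to gene j
            h_ij = sum(W[j][m] * cell[m] for m in range(n_genes)) + h[j]
            production = phi(h_ij) / (k[j] + phi(h_ij))
            new_cell.append(cell[j] + dt * (production - lam[j] * cell[j]))
        new_g.append(new_cell)
    return new_g


def simulate(g0: Matrix, W: Matrix, h: Sequence[float], k: Sequence[float],
             lam: Sequence[float], dt: float, n_steps: int) -> List[List[float]]:
    """Repeat euler_step n_steps times, starting from g0."""
    g = [list(cell) for cell in g0]
    for _ in range(n_steps):
        g = euler_step(g, W, h, k, lam, dt)
    return g
```

As a sanity check, a single unregulated gene (`W = [[0.0]]`, `h = [0.0]`, `k = [1.0]`, `lam = [1.0]`) settles at φ(0)/(1 + φ(0)) = 1/3. An explicit scheme like this needs a small enough `dt` to remain stable.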
Krul et al. tuned the parameters by hand so that the model mimicked the observed behavior. Reinitz et al. used an optimization algorithm called simulated annealing. Other optimization algorithms, such as a genetic algorithm, can be used as well. Their cost function (equation 3.4) is the squared difference between the model and the measurements:
$$E = \sum_{\substack{\text{all } a,\, i,\, t \text{ and genotypes}\\ \text{for which data exists}}} \left( g_i^a(t)_{\text{model}} - g_i^a(t)_{\text{data}} \right)^2 + \text{(penalty terms)} \tag{3.4}$$
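The sketch below minimizes the data term of equation 3.4 with simulated annealing, the optimizer Reinitz et al. used. It is a toy, not their setup:
- the "model" is a single exponentially decaying protein whose degradation rate is the only free parameter;
- the data are synthetic;
- the Gaussian proposal, geometric cooling schedule and search bounds are our own assumptions.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def squared_error(model: Sequence[float], data: Sequence[float]) -> float:
    """Data term of eq. 3.4: sum of squared model-minus-data differences."""
    return sum((m - d) ** 2 for m, d in zip(model, data))


def simulated_annealing(cost: Callable[[float], float], x0: float, step: float,
                        t0: float, cooling: float, n_iter: int,
                        bounds: Tuple[float, float],
                        rng: random.Random) -> Tuple[float, float]:
    """Minimize a 1-D cost; returns the best (x, cost) pair that was visited."""
    x, fx = x0, cost(x0)
    best, f_best = x, fx
    temperature = t0
    for _ in range(n_iter):
        candidate = x + rng.gauss(0.0, step)  # assumed Gaussian proposal
        if bounds[0] <= candidate <= bounds[1]:  # reject infeasible candidates
            f_cand = cost(candidate)
            # always accept improvements; accept worse moves with Boltzmann probability
            if f_cand <= fx or rng.random() < math.exp(-(f_cand - fx) / temperature):
                x, fx = candidate, f_cand
                if fx < f_best:
                    best, f_best = x, fx
        temperature *= cooling  # assumed geometric cooling schedule
    return best, f_best


# Synthetic data: a protein decaying with a hypothetical true rate of 0.7.
times = [0.5 * i for i in range(10)]
observed = [2.0 * math.exp(-0.7 * t) for t in times]


def cost_for_rate(rate: float) -> float:
    predicted = [2.0 * math.exp(-rate * t) for t in times]
    return squared_error(predicted, observed)


best_rate, best_cost = simulated_annealing(
    cost_for_rate, x0=3.0, step=0.1, t0=0.1, cooling=0.995,
    n_iter=3000, bounds=(0.0, 5.0), rng=random.Random(0))
print(round(best_rate, 2))
```

In a real gene-circuit fit, `cost_for_rate` would be replaced by a cost that runs the full model for a whole parameter vector. The annealing loop itself would stay the same apart from proposing vectors instead of scalars.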
These penalty terms can take many forms. Their purpose is to steer the search faster or more accurately toward the optimal solution, and they can even be used to avoid local optima. An example of the latter is the so-called niche penalty. Genetic algorithms use it to prevent a local optimum from becoming dominant over other, lower-scoring populations in the search space [51]. Other terms can penalize infeasible solutions; for example, a protein concentration may not exceed its solubility limit. Penalty terms that reduce model complexity, e.g. the number of regulatory connections, can also be included [52]. Reinitz et al. used a reduction of the search space as a penalty term. They incorporated a term Λ which, together with a given penalty function, ensures that the maximum saturation of u is limited to (1 − Λ). In the paper of Reinitz et al., $u^a$ denotes the total regulatory effect on the promoter of gene a. The regulatory effects thus cannot become too large, so this also reduces the search space of the optimization algorithm.
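Two of the penalty types mentioned above can be sketched as extra terms added to the squared error: a penalty on infeasible solutions and a complexity penalty on the number of regulatory connections. The weights, the solubility cap and the tolerance `eps` are hypothetical tuning parameters. The Λ-based saturation limit of Reinitz et al. and the niche penalty are not reproduced here.

```python
from typing import Sequence


def penalty_terms(concentrations: Sequence[float], W: Sequence[Sequence[float]],
                  max_concentration: float, w_infeasible: float = 100.0,
                  w_complexity: float = 0.1, eps: float = 1e-6) -> float:
    """Hypothetical penalty terms for eq. 3.4 (weights are assumptions)."""
    # Infeasibility: quadratic penalty for every concentration above the cap.
    excess = sum(max(0.0, c - max_concentration) ** 2 for c in concentrations)
    # Complexity: number of regulatory connections with a non-negligible weight.
    n_connections = sum(1 for row in W for w in row if abs(w) > eps)
    return w_infeasible * excess + w_complexity * n_connections


def penalized_cost(squared_error: float, concentrations: Sequence[float],
                   W: Sequence[Sequence[float]], max_concentration: float) -> float:
    """Total cost: data term plus penalty terms (default weights)."""
    return squared_error + penalty_terms(concentrations, W, max_concentration)
```

Raising `w_complexity` pushes an optimizer toward sparser W matrices, at the price of fitting the data less closely.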
It should be noted that equations 2.1, 3.1, 3.2 and 3.3 are based on the conservation law, which can be written as [53]:
$$\frac{d}{dt}\int_{x_a}^{x_b} c(x,t)\,dx = -\int_{x_a}^{x_b} \frac{\partial}{\partial x} J(x,t)\,dx + \int_{x_a}^{x_b} f\big(x,t,c(x,t)\big)\,dx \tag{3.5}$$
where J is the flux (or transport rate) of the component and f is its production rate.
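A discretized form of equation 3.5 shows how the conservation law leads to a scheme such as equation 3.2. The sketch below is a one-dimensional finite-volume update under assumptions that are ours, not the text's:
- the flux follows Fick's law, $J = -D\,\partial c/\partial x$;
- the only production term is degradation, $f = -\lambda c$;
- both ends of the domain are closed (zero flux).

Each interior flux leaves one cell and enters its neighbour, so the total amount is conserved (up to rounding) when $\lambda = 0$.

```python
from typing import List, Sequence


def finite_volume_step(c: Sequence[float], D: float, lam: float,
                       dx: float, dt: float) -> List[float]:
    """One explicit step of the 1-D conservation law with diffusion and decay.

    Stable for dt <= dx**2 / (2 * D) (explicit-scheme restriction).
    """
    n = len(c)
    flux = [0.0] * (n + 1)  # flux[i] lies between cells i-1 and i; both ends stay 0
    for i in range(1, n):
        flux[i] = -D * (c[i] - c[i - 1]) / dx  # Fick's law (assumed flux model)
    # net inflow minus outflow per cell, plus degradation as the production term
    return [c[i] + dt * (-(flux[i + 1] - flux[i]) / dx - lam * c[i]) for i in range(n)]


def total_amount(c: Sequence[float], dx: float) -> float:
    """Integral of c over the domain (midpoint rule)."""
    return sum(c) * dx
```

Running this from a single peak with `lam=0.0` spreads the peak out while `total_amount` stays constant. With `lam > 0`, the total decays by a factor (1 − dt·lam) per step.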
In more recent work, a more advanced model predicted eve stripe formation correctly. Based on cis-regulatory elements, also known as enhancers, it predicted the activation of expression correctly, including the effect of mutations in the regulatory DNA [54].
A more recent paper by Fomekong-Nanfack et al. also addresses parameter estimation [55]. It investigates how to optimize the parameters of the eve stripe formation model to fit the observed data. The paper states that brute-force global optimization is still the most widely used method for parameter estimation problems. The reason is that in most cases the fitness landscape of the parameters is unknown, so the parameter search space is assumed to be unrestricted. An effective optimization algorithm therefore needs to be found and applied for each optimization problem. The authors chose an evolution strategy (ES) and studied its performance. An island evolutionary algorithm was used and achieved good results: 62% of the solutions found were considered ‘good’ solutions. They further stress that a good search algorithm is mandatory for a three-dimensional reaction-diffusion model, because a one-dimensional model is already difficult (time-consuming) to solve. The authors conclude that an ES algorithm is very effective for estimating an initial guess for local search algorithms. These local search algorithms should then be used to fine-tune the parameter estimates.
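The two-stage strategy the authors recommend can be sketched as follows: a global evolutionary search produces an initial guess, and a local search fine-tunes it. This is not the island evolutionary algorithm of Fomekong-Nanfack et al. It uses a plain (1+1)-ES with the 1/5th success rule, followed by a simple compass (pattern) search, applied to a hypothetical two-parameter cost function.

```python
import random
from typing import Callable, List, Sequence, Tuple

Cost = Callable[[Sequence[float]], float]


def one_plus_one_es(f: Cost, x0: Sequence[float], sigma: float,
                    n_iter: int, rng: random.Random) -> Tuple[List[float], float]:
    """(1+1)-ES: mutate the parent, keep the child if it is not worse."""
    x, fx = list(x0), f(x0)
    for _ in range(n_iter):
        child = [xi + rng.gauss(0.0, sigma) for xi in x]
        f_child = f(child)
        if f_child <= fx:
            x, fx = child, f_child
            sigma *= 1.5           # success: take bigger steps
        else:
            sigma *= 1.5 ** -0.25  # failure: smaller steps; balances at 1/5 success
    return x, fx


def compass_search(f: Cost, x0: Sequence[float], step: float,
                   tol: float) -> Tuple[List[float], float]:
    """Local fine-tuning: try +/- step along each axis, halve the step when stuck."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return x, fx


def toy_cost(p: Sequence[float]) -> float:
    """Hypothetical cost with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


rng = random.Random(42)
coarse, _ = one_plus_one_es(toy_cost, [5.0, 5.0], sigma=1.0, n_iter=200, rng=rng)
refined, refined_cost = compass_search(toy_cost, coarse, step=0.1, tol=1e-8)
```

The ES tolerates a rugged, unknown landscape but converges slowly at the end. The compass search is cheap and precise but only locally reliable, which is the division of labour the authors argue for.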
3.3 Quantitative vs. Qualitative Network Models
In theory it could be possible to generate a quantitative network model of spatiotemporal gene expression, but current gene expression measurements are not precise enough. Moreover, quantitative values of kinetic parameters and molecular concentrations are largely unknown [56]. This is the case for microarray data, and the missing information will not be available from whole-body optical imaging either. For large networks, a qualitative model therefore has to be inferred instead of a quantitative one.
De Jong et al. [57] describe a method for qualitatively describing a gene regulatory network. Each protein concentration change can be modeled by an equation of the generic form:
$$\dot{x}_i = f_i(x) - g_i(x)\,x_i, \qquad x_i \ge 0,\ 1 \le i \le n \tag{3.6}$$
In vector notation this becomes
$$\dot{x} = f(x) - g(x)\,x, \qquad f = (f_1,\dots,f_n)', \quad g = \mathrm{diag}(g_1,\dots,g_n) \tag{3.7}$$
where $f_i$ defines how the rate of synthesis of protein i is influenced by the concentrations $x$ of all proteins:
$$f_i(x) = \sum_{l \in L} \kappa_{il}\, b_{il}(x) \tag{3.8}$$
Here $\kappa_{il}$ is a reaction rate parameter and $b_{il}: \mathbb{R}^n_{\ge 0} \to \{0,1\}$ is a regulation function.
L is a set of regulation function indices; if a protein has no regulators, L is empty. The degradation function $g_i(x)$ is built in a similar way, except that its outcome must be strictly positive: degradation cannot be negative, whereas synthesis can be subject to negative feedback regulation. In the following equations, γ denotes degradation rates and κ denotes synthesis rates.
The functions $b_{il}$ describe the underlying logic of gene regulation. An example is $b_{il}(x) = s^+(x_j, \theta_j)$, which equals 1 if $x_j$ is above threshold $\theta_j$ and 0 otherwise. The complementary step function $s^-(x_j, \theta_j) = 1 - s^+(x_j, \theta_j)$ equals 1 if $x_j$ is below the threshold. These binary conditions are based on an observation about gene expression levels. Their changes usually behave like steep, switch-like sigmoid functions, so a gene is either regulated or not regulated by a certain other gene (still in relation to some rate κ).
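The step-function logic can be written down directly. The sketch follows de Jong et al.'s convention that $s^+(x, \theta)$ is 1 above the threshold and $s^-$ is its complement. The value at exactly $x = \theta$, which the piecewise-linear formalism leaves undefined, is arbitrarily set to 0 for $s^+$. The regulation function in the example is hypothetical: a logical AND of two conditions, written as a product of step functions.

```python
from typing import Callable, Dict, Sequence, Tuple

State = Dict[str, float]
Term = Tuple[float, Callable[[State], int]]  # (kappa_il, b_il)


def s_plus(x: float, theta: float) -> int:
    """1 if x is above the threshold, else 0 (x == theta mapped to 0 by choice)."""
    return 1 if x > theta else 0


def s_minus(x: float, theta: float) -> int:
    """Complementary step function: 1 below the threshold."""
    return 1 - s_plus(x, theta)


def synthesis_rate(x: State, terms: Sequence[Term]) -> float:
    """f_i(x) = sum over l of kappa_il * b_il(x), as in eq. 3.8."""
    return sum(kappa * b(x) for kappa, b in terms)


# Hypothetical: gene a is synthesized at rate 5 while x_a < 2 AND x_b > 1.
terms_a: Sequence[Term] = [
    (5.0, lambda x: s_minus(x["a"], 2.0) * s_plus(x["b"], 1.0)),
]
```

Because every $b_{il}$ is 0 or 1, the synthesis rate is constant inside each region delimited by the thresholds. This is what makes the model piecewise linear.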
What follows is a simple example from the paper of de Jong et al. of two genes that regulate themselves and each other. Fig. 3.2 shows a regulation scheme, how it translates into a quantitative model, and how the same model translates into a qualitative model. In a quantitative model, each parameter is given a fixed, observed value. In a qualitative model, these values are specified only by inequality constraints.
There are two kinds of constraints. Threshold inequalities state that the thresholds $\theta_1,\dots,\theta_n$ of protein a must lie between 0 and the maximum possible concentration of protein a ($\max_a$). Equilibrium inequalities state that a threshold must lie below some equilibrium value. In the example of Fig. 3.2 this translates to
$$\theta_a^2 < \frac{\kappa_a}{\gamma_a} < \max_a$$
where $\theta_a^2$ is the second threshold of protein a. This means that the threshold must be lower than the target equilibrium $\kappa_a/\gamma_a$, because otherwise the observed negative autoregulation cannot be explained by the model. The term $\kappa_a\, s^-(x_a, \theta_a^2)$ switches synthesis on and off. While the protein concentration $x_a$ is below threshold $\theta_a^2$ ($s^- = 1$), protein A is synthesized with rate $\kappa_a$. While it is above this threshold, it is synthesized with rate 0.
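Why the equilibrium inequality matters can be checked numerically. The sketch below integrates the single-gene piece $\dot{x}_a = \kappa_a s^-(x_a, \theta_a^2) - \gamma_a x_a$ with forward Euler; the parameter values are hypothetical.
- When $\theta_a^2 < \kappa_a/\gamma_a$, the concentration is held near the threshold; in a discrete simulation the switch chatters around it.
- When the inequality is violated, the concentration settles at $\kappa_a/\gamma_a$ and the negative autoregulation never becomes visible.

```python
def simulate_autorepression(kappa: float, gamma: float, theta: float,
                            x0: float = 0.0, dt: float = 1e-3,
                            t_end: float = 20.0) -> float:
    """Forward-Euler integration of dx/dt = kappa * s_minus(x, theta) - gamma * x."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        synthesis = kappa if x < theta else 0.0  # kappa * s_minus(x, theta)
        x += dt * (synthesis - gamma * x)
    return x


# Hypothetical values. Inequality satisfied: theta (1.0) < kappa / gamma (2.0).
held = simulate_autorepression(kappa=2.0, gamma=1.0, theta=1.0)
# Inequality violated: kappa / gamma (0.5) < theta (1.0).
free = simulate_autorepression(kappa=0.5, gamma=1.0, theta=1.0)
```

`held` ends up within about one step size of the threshold 1.0, and `free` converges to 0.5. Only the first case shows the concentration being capped by its own repression.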
Fig. 3.2: A: A schematic model of gene regulation translates into piecewise-linear equations (B). In a quantitative model, the values of κ and θ are known and are put into the model as a priori knowledge. C gives the qualitative model of the same situation, in which the unknown parameters are optimized along with the gene regulation relations [57].
3.4 Modeling pathways using time series expression data, using conventional micro-array data
Unlike metabolic networks, signaling networks and gene networks are not well studied, and their structures are largely unknown. The standard analytical tools for metabolic networks therefore cannot be used to study gene networks [45]. It is, however, possible to estimate models of gene regulation using statistical approaches. To determine whether molecular imaging is suitable for these approaches, we look at how statistical model inference is applied to microarray data. As with molecular imaging, expression data can be obtained over time by taking multiple samples of a culture, or of tissue, at successive time points. Time series experiments are most feasible for single-cell organisms such as yeast or bacteria, while the conditions are changed over time.
Bayesian Networks
One way of analyzing such microarray data is to use Bayesian Networks to model the regulatory effects of genes on each other. A basic example of a Bayesian Network is shown in Fig. 3.3. With Bayesian Networks, co-regulated genes can be associated with each other with a certain probability. For instance, given that gene A is upregulated, gene B has a 95% chance of also being upregulated (see Fig. 3.3). Standard Bayesian Networks, however, cannot model regulation effects over time or model a regulatory pathway. Because Bayesian Networks must be acyclic, autoregulation and feedback loops cannot be modeled.
Fig. 3.3: Example of a Bayesian Network. The left side is a network with only observable data; the right side contains hidden nodes that are estimated to explain the observed data [58].
Fig. 3.4: A DBN can model feedback loops by introducing a time component.
Prior knowledge can be incorporated into Bayesian Networks. If, for instance, genes A and B are located on the same operon (in prokaryotes), they will automatically be expressed at the same time. Their co-regulation is then not due to a regulatory effect between A and B, but to a common hidden (i.e. unmeasured) parent (see Fig. 3.3, right part).
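The hidden-parent situation can be illustrated by exact inference in a three-node network H → A, H → B. Here H stands for the unmeasured common cause, for instance activity of the shared operon. All probabilities are invented for illustration. Inference is done by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product
from typing import Dict

# Invented conditional probability tables (1 = active/upregulated).
P_H1 = 0.5                        # P(H = 1)
P_A1_GIVEN_H = {1: 0.9, 0: 0.1}   # P(A = 1 | H = h)
P_B1_GIVEN_H = {1: 0.9, 0: 0.1}   # P(B = 1 | H = h)


def bernoulli(p_one: float, value: int) -> float:
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(h: int, a: int, b: int) -> float:
    """P(H, A, B) factorized along the DAG H -> A, H -> B."""
    return (bernoulli(P_H1, h) * bernoulli(P_A1_GIVEN_H[h], a)
            * bernoulli(P_B1_GIVEN_H[h], b))


def prob(query: Dict[str, int], evidence: Dict[str, int]) -> float:
    """P(query | evidence) by enumerating all eight joint states."""
    numerator = denominator = 0.0
    for h, a, b in product((0, 1), repeat=3):
        state = {"H": h, "A": a, "B": b}
        if all(state[name] == value for name, value in evidence.items()):
            p = joint(h, a, b)
            denominator += p
            if all(state[name] == value for name, value in query.items()):
                numerator += p
    return numerator / denominator
```

Observing A raises the probability of B from 0.5 to 0.82, although there is no edge between them. Once H is known, A adds nothing: P(B = 1 | A = 1, H = 1) = 0.9 = P(B = 1 | H = 1). The apparent co-regulation is explained entirely by the common parent.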
Dynamic Bayesian Networks
Unlike BNs, Dynamic Bayesian Networks (DBNs), also called Temporal Bayesian Networks, are able to model dynamic systems, including feedback mechanisms [59]. Ong et al. use a DBN for pathway modeling because a DBN can handle prior knowledge, hidden variables, time series data and stochasticity [58]. A DBN is in fact a BN in which each node refers to an ‘object’ at a given time point. An object can therefore occur multiple times in a DBN (Fig. 3.4). Using an expectation-maximization algorithm, the most likely regulatory pathway can be estimated with such DBNs.
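A DBN can represent feedback because the cycle is unrolled over time. The sketch below checks acyclicity with Kahn's topological sort for a hypothetical two-gene feedback loop with autoregulation. The static graph A → B → A contains a cycle. The unrolled graph, in which every edge points from slice t to slice t + 1, does not. Only the graph structure is shown; learning the conditional probabilities (e.g. with expectation maximization) is not.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Graph = Dict[Hashable, List[Hashable]]


def topological_order(graph: Graph) -> Optional[List[Hashable]]:
    """Kahn's algorithm: a topological order, or None if there is a directed cycle."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    indegree = {node: 0 for node in nodes}
    for targets in graph.values():
        for v in targets:
            indegree[v] += 1
    queue = deque(sorted((n for n in nodes if indegree[n] == 0), key=str))
    order: List[Hashable] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for v in graph.get(node, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return order if len(order) == len(nodes) else None


def unroll(edges: Sequence[Tuple[str, str]], n_slices: int) -> Graph:
    """Copy each edge u -> v (u at time t regulates v at t + 1) across time slices."""
    graph: Graph = {}
    for t in range(n_slices - 1):
        for u, v in edges:
            graph.setdefault((u, t), []).append((v, t + 1))
    return graph


# Hypothetical feedback loop between A and B, plus autoregulation of A.
feedback = [("A", "B"), ("B", "A"), ("A", "A")]
static_graph = {"A": ["B", "A"], "B": ["A"]}
unrolled = unroll(feedback, n_slices=3)
```

`topological_order(static_graph)` returns `None` because of the cycles. The unrolled graph has a valid order over its six time-indexed nodes, so it satisfies the acyclicity requirement of a Bayesian Network.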
A Bayesian approach to top-down modeling is feasible and suitable, because intracellular networks tend to be sparse and scale-free [45]. In [58] the authors had only a small number of data points available. By incorporating prior knowledge into the model, they were still able to reconstruct the biological mechanism. Wild-type (WT) time series expression data reveal the set of genes that function in a system and the temporal order of their expression. To study gene regulatory networks, however, individual knockout experiments are needed [60].
Data quality
When microarray experiments are used to obtain time-course expression data, it is difficult to obtain a continuous representation of gene expression profiles. Several factors contribute:
- background noise;
- missing data points;
- unsynchronized cell cycles;
- different phases and amplitudes of expression;
- differences in cycle lengths, which might in turn cause aliasing if the signal is undersampled.

The sparsity of the data also makes clustering of expression data difficult. Finding correlations in an experiment with 10 time samples is not a trivial task, especially when interpreting causality (e.g. high correlation, but time-shifted).
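The time-shift problem can be illustrated with a lagged correlation on a short series. The sketch uses ten invented time samples, and the target profile is the regulator's profile delayed by two samples. The zero-lag correlation badly underestimates the relation, and the maximum appears at lag 2. Each extra lag removes one overlapping pair, which is one reason why causal interpretation with only ten samples is fragile.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def best_lag(regulator: Sequence[float], target: Sequence[float],
             max_lag: int) -> Tuple[int, float]:
    """Lag (in samples) at which target[t + lag] correlates best with regulator[t]."""
    best = (0, -math.inf)
    for lag in range(max_lag + 1):
        n = len(regulator) - lag  # overlapping pairs shrink as the lag grows
        r = pearson(regulator[:n], target[lag:lag + n])
        if not math.isnan(r) and r > best[1]:
            best = (lag, r)
    return best


regulator = [0.0, 1.0, 3.0, 7.0, 4.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # invented
target = [0.0, 0.0, 0.0, 1.0, 3.0, 7.0, 4.0, 2.0, 1.0, 0.0]     # delayed by 2
lag, r = best_lag(regulator, target, max_lag=3)
```

With noisy, unsynchronized real data, the peak would be far less clean than the perfect r = 1 of this constructed example.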
Data amount
Each microarray sample costs about $300 [61], whereas an extra bioluminescence imaging (BLI) snapshot comes at virtually no additional cost. Oversampling is therefore inexpensive. This is an important advantage given the curse of dimensionality: the more dimensions there are, the more data points are needed. With a microarray containing, say, a thousand gene probes, a dozen samples is not much to work with; robust classification generally needs a sample-per-feature ratio of 5 to 10 [62]. When BLI is used to observe a steady-state process, additional snapshots do not yield completely independent data points. They come from the same source and process, so they add no extra information about the studied process. The measurements do become more reliable with more samples, however, because random noise is averaged out. For time series expression data, an in vivo mouse model would therefore be a very suitable data source.
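The figures in this paragraph can be combined into a quick back-of-the-envelope calculation. It uses only the numbers quoted above: about 300 dollars per microarray sample, 1,000 probes, and a sample-per-feature ratio of 5 to 10. It adds one assumption of ours: the measurement noise of repeated snapshots is independent with equal variance σ², so averaging n snapshots reduces its standard deviation by a factor √n.

$$n_{\text{required}} = (5 \text{ to } 10) \times 1000 = 5{,}000 \text{ to } 10{,}000 \text{ arrays}, \qquad \text{cost} \approx \$300 \times (5{,}000 \text{ to } 10{,}000) = \$1.5\text{M to } \$3\text{M}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \qquad \text{e.g. } n = 100 \text{ snapshots} \;\Rightarrow\; \sigma_{\bar{x}} = 0.1\,\sigma$$

A dozen arrays thus falls short of this rule of thumb by more than two orders of magnitude.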
The sparsity of available data sets causes another problem. Once classifiers or models are built, there is no way to determine whether they are really robust or correct, because there are simply not enough data sets available to test their robustness.
Pathway selection
Microarray data can be used in the search for a high-level model. Using Bayesian inference, it is possible to construct the most likely model given the data. Perturbing the network then allows dependencies to be modeled further. Often, especially when many genes are involved in the studied network, many candidate solutions fit the data about equally well. The top-scoring pathway can be selected as the correct one, but there is no way to be certain whether this pathway is actually correct. The only way to gain more certainty is to use extra data, by performing additional experiments.
If ambiguous pathways are found, the genes that best discriminate between those pathways can be selected for additional knockout experiments [63] (see Fig. 3.5). The ‘most discriminant’ genes can be found in different ways: [63] uses mutual information, but random selection or hub-based selection can also be used. Mutual-information-based selection chooses the hypothetical knockout experiment that, given the estimated model, is expected to yield the maximal information gain (i.e. reduction in ambiguity). When experiments with these high-scoring genes are designed first, the number of ambiguous pathways decreases quickly. Single-knockout experiments have one limitation: they do not detect cases where multiple genes independently regulate the same gene (multiple inbound interactions). Multiple-gene knockout experiments are therefore needed to obtain a fully unambiguous regulatory pathway.
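The mutual-information criterion can be sketched for the simplest case. As a simplification not taken from [63], we assume that each candidate pathway predicts a single, noise-free outcome for each knockout, for example whether a reporter goes up or down. The expected information gain of a knockout is then the entropy of the predicted-outcome distribution, weighted by the pathways' posterior probabilities. Gene names and predictions are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence


def entropy_bits(probabilities: Iterable[float]) -> float:
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def expected_information_gain(predicted: Sequence[Hashable],
                              weights: Sequence[float]) -> float:
    """Mutual information between pathway identity and the knockout outcome.

    predicted[m]: outcome candidate pathway m predicts (assumed deterministic)
    weights[m]:   (unnormalized) posterior probability of pathway m
    With deterministic predictions, I(pathway; outcome) = H(outcome).
    """
    total = sum(weights)
    outcome_mass: Dict[Hashable, float] = defaultdict(float)
    for outcome, w in zip(predicted, weights):
        outcome_mass[outcome] += w / total
    return entropy_bits(outcome_mass.values())


def rank_knockouts(predictions: Dict[str, Sequence[Hashable]],
                   weights: Sequence[float]) -> List[str]:
    """Order candidate knockouts from most to least informative."""
    return sorted(predictions,
                  key=lambda gene: expected_information_gain(predictions[gene], weights),
                  reverse=True)


# Four equally likely candidate pathways; hypothetical predicted outcomes per knockout.
predictions = {
    "gene_x": ["up", "up", "down", "down"],
    "gene_y": ["up", "up", "up", "down"],
    "gene_z": ["up", "up", "up", "up"],
}
print(rank_knockouts(predictions, [1.0, 1.0, 1.0, 1.0]))  # ['gene_x', 'gene_y', 'gene_z']
```

Knocking out `gene_x` splits the four pathways evenly and yields one full bit, so it would be scheduled first. `gene_z` cannot discriminate between the candidates at all.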
With in vivo imaging, once a discriminant gene has been found, a knockout model could easily be created using the Flp-In system of Invitrogen. With this method, genes of interest can be overexpressed or silenced using Flp recombinase. The Flp-In technique ensures that only one insertion is made in the genome. It also places this insertion in a non-functional but actively transcribed part of the DNA. For example, a key gene can be eliminated to knock down a pathway and study redundancy in its functionality. Alternatively, kinetics can be studied by regulating certain network components [64].
Model Validation
In their paper on model testing, de Jong et al. [65] state that it is infeasible to check the validity of a large (inferred) network model manually. The model is too complex and has too many free parameters. The only way to check the validity of a network is to use even more data and to check how well the model's behavior matches the observed data. High-throughput measurements are therefore needed for network validation, which immediately raises questions about the feasibility of such studies with whole-body molecular imaging.
3.5 Discussion
3.5.1 General
In most, if not all, cases of spatial gene expression measurement, no model inference has been done yet. However, databases with spatiotemporal gene expression data have been made available, which should open the door to network inference. Suppose the registration problems can be solved and spatial gene expression over time can be registered accurately. Then there is no reason why network model inference could not be done. As discussed in this section, this does not mean it will be an easy or straightforward task.
With 3D gene expression atlases such as genepaint.org, it is possible to obtain gene expression data from in situ hybridization. Genepaint.org only contains a single time snapshot of the developing mouse embryo (E14.5) [48]. A regulatory network therefore cannot be inferred directly from these data. It is, however, possible to cluster the data and thereby create groups of genes that are likely to be part of the same network module, because they share the same spatial expression profile at that developmental stage. A big problem with this approach concerns genes silenced by another gene, and thus directly regulated by it: they are not clustered with that gene, because their spatial expression profiles do not match. With temporal observations, such negative regulatory relations are more likely to be captured, because mixed (positive and negative) correlations can be used. In the paper of Visel, only co-expressed genes are marked as candidates for a perturbation study. A WT and a Pax6-deficient mouse strain are studied at time point E15.5. The expression profiles of the genes of interest (i.e. the genes that shared the same spatial expression profile at E14.5) are compared with those at E14.5 and with each other. If expression differs between Pax6-deficient and WT mice, then these genes are directly or indirectly regulated by Pax6. This conclusion is valid, but it will be very difficult to obtain a gene regulatory network if this approach leaves all negative feedback loops out of scope.
Fig. 3.5: By running knockout experiments on the top-priority scoring genes first, the actual network can be found [63].
The EMAP database does contain temporal information on mouse embryo development and is therefore preferable for gene network inference. A module called Emage contains gene expression data mapped onto an anatomical mouse atlas. A text-based gene expression database (GXD), containing the annotation information, is also available. Note that this latter information is qualitative: it names the organs where expression is observed, not coordinates inside the mouse atlas. Emage is also accessible through a programmatic SOAP/WSDL interface, which allows for data mining [66].
3.5.2 Creating models for whole body imaging data
The resolution that can feasibly be obtained in small animals is not as high as, for example, in Drosophila. Models built from small-animal data will therefore automatically describe processes at a lower resolution as well. A lower-resolution model has less explanatory power, and results obtained from it are not necessarily biologically meaningful.
Whole-body imaging does not easily allow for a large quantitative model, but it does generate new information. A 3D reconstruction of the location of gene expression gives much more information than one-dimensional microarray data alone. Microarrays allow many gene expression levels to be probed, and thus allow large networks to be inferred. Whole-body imaging, in contrast, only allows a few expression profiles to be observed at a time. Keep in mind that each visualized gene requires a genetic modification of the organism.
The power of small animal in vivo imaging is that processes can be followed in time.<br />
More samples are needed to reduce the degrees of freedom of the network that is modeled.<br />
With current techniques it is possible to visualize multiple gene expression profiles in the same animal by using multiple fluorescent proteins with different emission spectra. BD Living Colors™ fluorescent proteins are an example of fluorescent proteins suitable for this purpose [67]. Using fluorophores of different wavelengths introduces new attenuation problems, because tissue attenuates light differently at different wavelengths; provided these problems can be solved, around 5 to 6 different probes can be measured simultaneously. Despite high spectral overlap between the fluorophores, the different reporters can still be separated by using multispectral imaging and multiplexing [68]. Caution is needed when using multiple fluorophores at the same time, because not all fluorophores can be detected with the same sensitivity. This could falsely suggest that the more sensitive fluorophores are expressed earlier than the less sensitive ones, simply because they become detectable earlier [69].
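Linear spectral unmixing, one common way of separating overlapping reporters, can be sketched as follows. This is a minimal illustration, assuming that each pixel's measured spectrum is a linear, non-negative combination of known reference emission spectra. The two reference spectra below are made-up Gaussian peaks, not those of any real fluorophore, and the method used in [68] may differ. Ordinary least squares followed by clipping is used here as a crude stand-in for a proper non-negative least-squares solver.

```python
import numpy as np

def unmix(measured, references):
    """Estimate reporter abundances from one measured pixel spectrum.

    measured:   shape (n_channels,), intensity per spectral channel
    references: shape (n_channels, n_reporters), one reference spectrum per column
    Assumes the measured spectrum is a linear mix of the reference spectra.
    """
    coeffs, *_ = np.linalg.lstsq(references, measured, rcond=None)
    # Abundances cannot be negative; clipping only approximates true NNLS.
    return np.clip(coeffs, 0.0, None)

wavelengths = np.linspace(500, 700, 21)  # hypothetical channel centres (nm)

def peak(center, width=40.0):
    """Hypothetical Gaussian-shaped emission spectrum."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Two strongly overlapping hypothetical reporters
refs = np.column_stack([peak(580.0), peak(620.0)])
true_abundance = np.array([3.0, 1.5])
measured = refs @ true_abundance          # noise-free synthetic pixel
estimate = unmix(measured, refs)
```

With noise-free synthetic data the two abundances are recovered despite the overlap; with real, noisy spectra the estimates degrade as the overlap increases.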
The possibility of tagging, and thus visualizing, multiple genes, combined with aligning separate measurements to an atlas using registration techniques, also makes it possible to apply a network inference algorithm similar to that of Reinitz and Krul [40, 50] to small-animal whole-body molecular imaging.
The model would need some changes to overcome the scaling problems encountered in molecular imaging. Below, Equations 3.1 and 3.2 are discussed, together with some caveats and the changes needed to apply them to whole-body imaging.
Since gene expression cannot be observed at cellular resolution, something else has to be defined as a ‘cell’. The most logical solution is to define a voxel in the 3D image as a ‘cell’. $g_{ij}$ in Equation 3.1 would then refer not to cell i but to voxel i, and $N_c$ would be the number of voxels inside the animal body. This immediately raises the correspondence problem: the voxels have to be numbered so that, after each registration, every voxel receives exactly the same number. It also requires the model to contain the same number of voxels for every measurement. These problems can be overcome by discretizing the mouse atlas, onto which the measurements are already registered, into a fixed number of voxels. The registered measurements can then be interpolated, so that each voxel receives an averaged value.
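The voxel bookkeeping described above can be sketched as follows. The sketch assumes that each measurement has already been registered and resampled onto the fine atlas grid (the registration itself is not shown). Block averaging onto a coarser fixed grid, combined with a fixed flattening order, ensures that voxel index i refers to the same atlas location in every measurement. The function name, grid sizes and block factor are hypothetical.

```python
import numpy as np

def to_model_voxels(registered, body_mask, factor):
    """Average a registered volume onto a fixed, coarser atlas grid.

    registered: 3D array on the fine atlas grid (already registered)
    body_mask:  boolean 3D array on the same grid, True inside the body
    factor:     integer block size; each coarse voxel averages factor**3 fine voxels
    Returns (g, coarse_mask): g holds one value per coarse voxel inside the
    body, always in the same C order, so g[i] is the same location every time.
    """
    z, y, x = (s // factor for s in registered.shape)
    crop = (slice(0, z * factor), slice(0, y * factor), slice(0, x * factor))
    # Reshape into (coarse_z, block, coarse_y, block, coarse_x, block)
    blocks = registered[crop].reshape(z, factor, y, factor, x, factor)
    coarse = blocks.mean(axis=(1, 3, 5))
    mask_blocks = body_mask[crop].reshape(z, factor, y, factor, x, factor)
    # A coarse voxel counts as inside the body if most of its fine voxels are
    coarse_mask = mask_blocks.mean(axis=(1, 3, 5)) > 0.5
    return coarse[coarse_mask], coarse_mask

rng = np.random.default_rng(0)
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 2:6, 2:6] = True                      # toy 'body'
g1, m = to_model_voxels(rng.random((8, 8, 8)), mask, factor=2)
g2, _ = to_model_voxels(rng.random((8, 8, 8)), mask, factor=2)
# Both measurements yield the same N_c = m.sum() model voxels
```

Because the atlas grid and the mask are fixed, every registered measurement produces a vector of the same length $N_c$ with the same voxel ordering, which is exactly what the $g_{ij}$ indexing requires.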
Concentration model<br />
Equation 3.2 was used to model the diffusion of proteins that can cross the cell membrane. These extracellular proteins can have a signaling function, whereas intracellular proteins that cannot cross the cell membrane do not. The paracrine proteins, as these diffusing proteins are called, are likely to have a smoother distribution than proteins that stay inside the cells. Paracrine signaling acts on cells in close proximity to each other and is therefore likely to contribute to the formation and survival of clusters of differentiated cells.
When looking at whole-body models, however, endocrine signals should also be taken into account. Endocrine signals are produced in the endocrine glands and commonly consist of hormones. The activation of receptors and glands can be visualized by using multiple fluorophores [69, 4, 68], but no literature on direct in vivo visualization of endocrine signaling molecules was found. It is also doubtful whether reporter genes can be used to visualize hormone synthesis, because hormones are very small molecules compared to reporter gene products. Hormone levels can, however, be measured directly in the blood, where they circulate as endocrine signaling compounds, although it is questionable whether their concentrations are homogeneous.
It might, however, also be possible to incorporate endocrine signaling into the model as unknown (‘invisible’) regulatory factors, without measuring them. The difference from paracrine signaling is that endocrine molecules can pass the endothelial barrier, so that they can travel through the blood circulation.
In the model this translates into a third equation, comparable to Equation 3.2, in which the organs most likely act as the cells and the bloodstream acts as the extracellular region. Diffusion through the bloodstream will be faster than in the extracellular region, but the rest of the equation remains the same.
With endocrine signaling incorporated into the model, the steep protein concentration gradients that are most likely to be observed at the boundaries of organs, or more generally at the boundaries between clusters of different cell types, can be explained. The equation for endocrine signaling will be of the form:
$$\frac{\partial b_j(x,t)}{\partial t} = D_{2,j}\,\nabla^2 b_j(x,t) - \lambda_j\, b_j(x,t) \tag{3.9}$$
where $b_j(x,t)$ is the concentration of the product of gene j (or of compound j, since it is most likely a hormone) at position x and time t, and $D_{2,j}$ is its diffusion coefficient in the bloodstream. As before, $\lambda_j$ is the degradation rate.
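To make the behavior of Equation 3.9 concrete, the sketch below integrates it in one spatial dimension with an explicit finite-difference scheme. It is a minimal illustration only: the 1D domain, the no-flux boundaries, the parameter values and the function name are assumptions made for the example, not values from the literature. With no-flux boundaries, diffusion only redistributes the compound, so the total amount decays purely through the degradation term.

```python
import numpy as np

def simulate_endocrine(b0, D2, lam, dx, dt, steps):
    """Explicit Euler integration of db/dt = D2 * d2b/dx2 - lam * b in 1D.

    Uses reflecting (no-flux) boundaries. The explicit scheme is only
    stable when D2 * dt / dx**2 <= 0.5.
    """
    if D2 * dt / dx**2 > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    b = np.asarray(b0, dtype=float).copy()
    for _ in range(steps):
        padded = np.pad(b, 1, mode="edge")      # ghost cells give zero flux
        laplacian = (padded[:-2] - 2.0 * b + padded[2:]) / dx**2
        b = b + dt * (D2 * laplacian - lam * b)
    return b

b0 = np.zeros(101)
b0[20] = 1.0                 # hypothetical secretion pulse from one 'gland'
b = simulate_endocrine(b0, D2=1.0, lam=0.1, dx=1.0, dt=0.2, steps=200)
```

After 200 steps the total amount equals $(1 - \lambda_j \Delta t)^{200}$ times the initial pulse, while the profile has spread out around the source; a larger $D_{2,j}$ spreads it faster, which is the intended difference from the paracrine equation.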
Differences in the solubility of proteins in different cell types (and in the blood) might also need to be taken into account, although these are probably negligible because all of them are aqueous environments. Moreover, endocrine molecules are secreted directly into the bloodstream, which makes it difficult to distinguish between the concentration in the bloodstream and that in the secreting cell. The relation between the voxel concentration and the bloodstream concentration will therefore probably be of the form:
$$g_{ij}(t) = b_j(x_i, t) \tag{3.10}$$
where $g_{ij}(t)$ is the concentration of gene j in voxel i at time t and $b_j(x_i,t)$ is the concentration of gene j at the location $x_i$ of voxel i at time t.
Location model<br />
If different expression profiles can be related to different organs, extra insight into the function of the proteins is gained. This is, however, not necessary for the model to work.
It may also be necessary to know the direction of blood flow near endocrine glands to correctly predict the concentration gradients of the endocrine signals. With ultrasound, high-frequency Doppler flow mapping makes it possible to determine parameters such as blood velocity, blood flow and blood volume [70].
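The velocity estimate behind Doppler flow mapping follows from the standard Doppler equation, v = c Δf / (2 f0 cos θ), where f0 is the transmitted frequency, Δf the measured frequency shift and θ the angle between the ultrasound beam and the flow. The sketch below assumes a speed of sound of 1540 m/s (the conventional soft-tissue value) and illustrative input numbers; it is not the processing chain of the system used in [70].

```python
import math

SPEED_OF_SOUND_TISSUE = 1540.0  # m/s, conventional soft-tissue assumption

def doppler_velocity(f0_hz, shift_hz, angle_deg, c=SPEED_OF_SOUND_TISSUE):
    """Blood velocity (m/s) from a measured Doppler frequency shift.

    f0_hz:     transmitted ultrasound frequency
    shift_hz:  measured Doppler shift
    angle_deg: angle between the beam and the flow direction
    """
    cos_theta = math.cos(math.radians(angle_deg))
    if abs(cos_theta) < 1e-6:
        raise ValueError("flow perpendicular to the beam gives no Doppler shift")
    return c * shift_hz / (2.0 * f0_hz * cos_theta)

# Illustrative high-frequency (40 MHz) small-animal example
v = doppler_velocity(f0_hz=40e6, shift_hz=1000.0, angle_deg=0.0)
```

The cos θ term is why the beam angle relative to the vessel must be known: the same shift measured at 60° corresponds to twice the velocity measured along the flow.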
Again, if endocrine signaling is modeled as an invisible or free parameter, this extra data is not needed. Endocrine signaling can then be seen as a way to explain steep concentration gradients in spatial expression profiles together with strong temporal relationships between seemingly spatially unconnected regions; that is, it explains how a gene can be expressed in, for example, the liver and the kidneys, but not in between.
For a full understanding of spatial and temporal regulation, anatomical data must be registered to the gene expression data. In that way, steep concentration gradients can be explained by, for example, the boundary of an organ.
It should be kept in mind, though, that steep gradients in protein concentration can also be caused by paracrine signaling, as seen in the formation of the eve (even-skipped) stripes in Drosophila. Co-regulation in spatially non-contiguous regions, however, cannot be explained by paracrine signaling alone.
CHAPTER 4<br />
Molecular Imaging as a means for hypothesis testing<br />
Molecular imaging has the potential to generate data for inferring regulatory network models. As was shown in Chapter 3, however, it has some major limitations, such as the lack of high-throughput possibilities, of direct protein measurements and of direct expression detection (a reconstruction is needed). Nevertheless, it generates new information that is not available with current techniques. The most important new aspect is probably the possibility to study processes over time.
This new aspect of the data is useful not only for model inference; it also enables researchers to study (morphological) processes over time. Although molecular imaging techniques such as BLI, FMI and PET lack high contrast, they are much more sensitive and specific than their clinical counterparts, so processes that could not be detected with other techniques can now be visualized and studied.
Being able to see and study processes over time enables researchers to test new or existing hypotheses. Two fields of study emerge from molecular imaging: gene tracking and cell tracking. The differences between them are explained below.
4.1 Gene Tracking<br />
Reporter genes make it possible to visualize a range of processes. The effect of repressors and enhancers can be studied, and predicted pathways can be validated by knocking out or upregulating gene expression, provided this is not lethal. Gene activity can also be measured during events in the body such as growth, degradation, apoptosis and the circadian cycle. All these processes can be studied using the techniques discussed in Chapter 2. Examples found in the literature are the inhibition of the Cdk2 gene [71], transcriptional regulation of the CYP3A4 gene [72], visualization of active estrogen receptors [73] and responses to bacterial and viral infections [26].
Currently, these processes are mainly inspected visually and qualitatively. It is, however, possible to design statistical tests to compare gene expression levels. In the CYP3A4 study, the authors used post hoc t-tests to compare mean expression differences over time within one group, and multivariate analysis of variance (MANOVA) to compare control groups with injected groups for different injections, as well as male with female mice [72].
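A comparison of mean expression at two time points within one group can be illustrated with a paired t statistic. The sketch below is a minimal hand-written version, assuming that the same animals are imaged at both time points; the data are invented. `scipy.stats.ttest_rel` computes the same statistic and also returns a p-value. The sketch does not reproduce the exact post hoc procedure or the MANOVA used in [72].

```python
import numpy as np

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for repeated measurements.

    before, after: signal of the same animals at two time points
    """
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Hypothetical log10 photon flux of five mice at day 0 and day 7
day0 = [6.1, 5.9, 6.3, 6.0, 6.2]
day7 = [6.8, 6.4, 7.0, 6.5, 6.9]
t, df = paired_t(day0, day7)
```

Pairing removes the between-animal variation from the comparison, which is what makes follow-up imaging of the same animals statistically efficient.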
Combining a two-dimensional BLI/FMI image with a three-dimensional anatomical atlas would also make it possible to attach qualitative expression tags to the BLI image, describing where the expression is located. From the 2D image and the registered 3D anatomical atlas, statements such as ‘the chance of this gene being expressed in the liver is 50%, in the stomach 30% and in the kidneys 20%’ could then be made.
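A naive way of producing such location statements can be sketched as follows. It assumes (hypothetically) that the 3D atlas is already registered so that its first axis is the camera's viewing direction, and that light is equally likely to originate from any tissue voxel along a pixel's line of sight. Photon attenuation and scattering, which in practice strongly favor superficial sources, are ignored. The function and the toy atlas are illustrative only.

```python
import numpy as np

def organ_probabilities(atlas, image, organ_labels):
    """Distribute a 2D planar signal over the organs of a registered 3D atlas.

    atlas:        integer label volume (z, y, x); 0 = background,
                  z is assumed to be the viewing direction
    image:        2D signal (y, x), e.g. background-corrected BLI counts
    organ_labels: dict mapping atlas label -> organ name
    Returns the fraction of the total signal attributed to each organ.
    """
    tissue = (atlas > 0).sum(axis=0)            # tissue voxels along each ray
    valid = tissue > 0
    weights = {}
    for label, name in organ_labels.items():
        counts = (atlas == label).sum(axis=0)   # voxels of this organ per ray
        frac = np.zeros(image.shape, dtype=float)
        frac[valid] = counts[valid] / tissue[valid]
        weights[name] = float((image * frac).sum())
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Toy atlas: 4 slices deep, 1 x 2 pixels wide
atlas = np.zeros((4, 1, 2), dtype=int)
atlas[:, 0, 0] = [1, 1, 2, 0]   # ray 1 passes liver (2 voxels) and stomach (1)
atlas[:, 0, 1] = [3, 3, 0, 0]   # ray 2 passes only kidney
image = np.array([[3.0, 1.0]])
p = organ_probabilities(atlas, image, {1: "liver", 2: "stomach", 3: "kidney"})
```

For this toy input the signal is split 50% / 25% / 25% over liver, stomach and kidney. A realistic version would weight voxels by a depth-dependent light-propagation model instead of uniformly.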
Some genes are expected to have a function in the development of organs; for such genes, expression is expected to be visible before the organ is formed. To test whether this expression is located significantly more at the site of organ formation, one must first be able to indicate where the organ is formed. This can be done by an analysis over time in which the gene expression is registered to another modality in which the morphological formation of the organ can be detected. If the location of organ formation is known, and the genes of interest are expected to be functional for the formation of that organ, then those genes are expected to be expressed at higher levels at these locations than elsewhere.
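One way of formalizing this expectation is sketched below, assuming that the organ-formation region has already been delineated (via registration to the anatomical modality) in each of several animals. For each animal, the contrast between mean expression inside and outside that region is computed, and an exact sign-flip permutation test checks whether the contrasts are consistently positive. The choice of test, the function names and the numbers are illustrative assumptions. Animals, rather than pixels, are used as the unit of analysis, because neighboring pixels are correlated and would violate the test's exchangeability assumption.

```python
import itertools
import numpy as np

def region_contrast(expression, region_mask):
    """Mean expression inside the organ-formation region minus mean elsewhere."""
    return expression[region_mask].mean() - expression[~region_mask].mean()

def sign_flip_pvalue(contrasts):
    """Exact one-sided sign-flip test of H0: contrasts are symmetric around 0.

    Each animal contributes one contrast; under H0 its sign is arbitrary,
    so all 2**n sign patterns are equally likely. Only practical for small n.
    """
    d = np.asarray(contrasts, dtype=float)
    observed = d.mean()
    hits = 0
    for signs in itertools.product((1.0, -1.0), repeat=d.size):
        if (np.array(signs) * d).mean() >= observed - 1e-12:
            hits += 1
    return hits / 2 ** d.size

# Toy image of one animal: the top row is the (registered) forming organ
expression = np.array([[2.0, 2.0], [1.0, 1.0]])
organ_region = np.array([[True, True], [False, False]])
first = region_contrast(expression, organ_region)

# Hypothetical contrasts for six animals (the first from the toy image)
contrasts = [first, 0.3, 0.5, 0.2, 0.6, 0.35]
p = sign_flip_pvalue(contrasts)
```

With six animals that all show a positive contrast, the smallest attainable p-value is 1/64, which also shows how many animals are needed before such a test can reach significance at all.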
4.2 Cell Tracking<br />
When no transgenic animals are used in the research, molecular imaging can still be useful: it is possible to generate xenografts that are detectable by molecular imaging techniques. The most commonly used reporter genes for this are luc and GFP. Examples of what can be tracked are labeled bacteria and viruses, for instance to determine their pathogenicity. The effectiveness of antibiotic therapies can also be studied this way [4].
Much work has been done on tracking cancer cells. Cell lines carrying an ‘always on’ luc reporter gene are constructed and injected into model organisms. The Flp-In system can be used to easily knock out or upregulate specific genes in a (tumor) cell, whose effect can afterwards be measured using the ‘always on’ luc gene.
Note that what is followed in this case is the proliferation and location of the tumor cells: what is seen is not gene regulation, but the number and location of active (living) tumor cells, or of any other xenograft under study. When comparing tumor growth in follow-up studies, a t-test could be used to look for statistically significant differences in tumor growth.
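Because tumor growth tends to vary considerably between animals, the sketch below uses Welch's version of the two-sample t-test, which does not assume equal variances in the two groups; this choice, and the invented data, are assumptions of the example. `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.

```python
import numpy as np

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    va = a.var(ddof=1) / a.size          # squared standard error of mean a
    vb = b.var(ddof=1) / b.size          # squared standard error of mean b
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    return t, df

# Hypothetical fold-change in tumor BLI signal between two imaging days
control = [4.1, 3.8, 5.0, 4.6, 4.3]
modified = [2.2, 2.9, 1.8, 2.5, 2.4]     # e.g. cells with an altered gene
t, df = welch_t(control, modified)
```

The Welch degrees of freedom always lie between min(n_a, n_b) - 1 and n_a + n_b - 2, falling toward the lower bound when the group variances differ strongly.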
It should also be noted that, although the amount of active reporter enzyme will be roughly the same for each tumor cell, the turnover rate of any enzymatic reaction depends not only on the enzyme concentration, but also on the amount of substrate (luciferin) and on the reaction temperature. Both variables may vary between follow-up measurements. The diffusion speed of the substrate through the body also depends on temperature profiles. All measurement techniques in which substrates are involved will suffer from these dependencies when it comes to accurate quantification. FLI, which does not require a substrate, is likely to be less sensitive to changes in the environment.
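The substrate dependence can be illustrated with Michaelis-Menten kinetics, v = k_cat [E] [S] / (K_m + [S]). The sketch uses hypothetical rate constants in arbitrary units purely to show the effect; real firefly luciferase kinetics in vivo also depend on ATP, oxygen and temperature, none of which are modeled here.

```python
def reaction_rate(enzyme, substrate, k_cat=1.0, k_m=1.0):
    """Michaelis-Menten rate: v = k_cat * [E] * [S] / (K_m + [S]).

    All quantities are in arbitrary units; k_cat and k_m are hypothetical.
    """
    return k_cat * enzyme * substrate / (k_m + substrate)

same_tumor = 10.0                                   # identical amount of luciferase
full_dose = reaction_rate(same_tumor, substrate=4.0)
low_dose = reaction_rate(same_tumor, substrate=1.0)
ratio = low_dose / full_dose
```

Here a four-fold drop in available substrate lowers the light output by 37.5% although the number of tumor cells is unchanged, a difference that would be misread as tumor shrinkage.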
Fig. 4.1: Two datasets were drawn from the same Gaussian distribution, one of 100 and one of 100,000 samples. A kernel density estimate (normal kernel, width 0.2) was then computed and plotted for each dataset. The estimate based on 100,000 data points clearly resembles the Gaussian distribution better than the one based on 100 samples.
4.3 General signal detection and limitations<br />
To be able to make any statement about a studied signal, a first step is to determine whether any signal of interest is present at all, or whether the signal consists only of noise. To draw such conclusions, the characteristics of the noise have to be determined, and tests have to be created to check whether any signal is present that is unlikely to be caused by noise alone.
Once such a test exists, a threshold can be set on the p-value, and an image whose p-value falls below that threshold can be labeled as ‘signal found’. The p-value is the probability, under the null hypothesis that no signal is present, of obtaining an observation at least as extreme as the one made. When the p-value is low, the observation is so unlikely to have been generated under the null hypothesis that a signal is probably present, and a significant signal is said to be detected. A common p-value threshold in scientific research is 0.05.
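An empirical version of such a test can be sketched as follows, assuming that a set of background measurements (for example, summed counts from regions or animals known to contain no reporter) is available to represent the noise-only null distribution. The add-one correction keeps the estimated p-value from ever being exactly zero. The function name and the simulated Poisson background are illustrative.

```python
import numpy as np

def empirical_p_value(observed, null_samples):
    """One-sided empirical p-value: P(noise >= observed) under the null.

    Uses the (count + 1) / (n + 1) estimator, so the result is never 0.
    """
    null_samples = np.asarray(null_samples)
    exceed = np.count_nonzero(null_samples >= observed)
    return (exceed + 1) / (null_samples.size + 1)

rng = np.random.default_rng(42)
background = rng.poisson(lam=100, size=9999)   # simulated noise-only counts
alpha = 0.05
p_weak = empirical_p_value(105, background)    # close to the noise level
p_strong = empirical_p_value(160, background)  # far above the noise level
signal_found = p_strong < alpha
```

With 9,999 background samples the smallest attainable p-value is 1/10,000, so the number of null samples directly limits how strong a significance statement can be.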
To determine whether a signal is significant, or whether it is significantly located in<br />
space somewhere, a null hypothesis has to be constructed and rejected. A dataset of n<br />
elements can be seen as n random samples from a probability density function.<br />
Model estimation<br />
Thus, to be able to say something about significance, an observation has to be tested<br />
against some null distribution, but before that is possible, that null distribution has to<br />
be estimated.<br />
To do this, the null distribution has to be estimated from data. The more data points are available from the distribution being tested against (the null hypothesis), the more accurate the estimate of this null distribution will be (see Fig. 4.1) [74].
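The experiment of Fig. 4.1 can be approximated with the sketch below: samples are drawn from a standard normal distribution, and a kernel density estimate with a normal kernel of width 0.2 is compared with the true density. Using the maximum absolute error as the accuracy measure is an assumption of this sketch; the chunking only limits memory use for the large sample.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Kernel density estimate on `grid` using a normal kernel."""
    density = np.zeros_like(grid)
    norm = 1.0 / (samples.size * bandwidth * np.sqrt(2.0 * np.pi))
    # Process samples in chunks to avoid one huge (grid x samples) matrix
    for chunk in np.array_split(samples, max(1, samples.size // 10_000)):
        z = (grid[:, None] - chunk[None, :]) / bandwidth
        density += np.exp(-0.5 * z**2).sum(axis=1)
    return density * norm

rng = np.random.default_rng(1)
grid = np.linspace(-4.0, 4.0, 81)
true_pdf = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)

errors = {}
for n in (100, 100_000):
    estimate = gaussian_kde_1d(rng.standard_normal(n), grid, bandwidth=0.2)
    errors[n] = np.max(np.abs(estimate - true_pdf))
```

The error for 100,000 samples does not drop to zero: with a fixed width of 0.2 the kernel still smooths the peak slightly, so the bandwidth, not only the sample size, limits accuracy.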
A distinction can be made between empirical, parametric and semi-parametric estimation. The first does not assume any knowledge about the model; non-parametric estimators such as kernel smoothing or k-nearest-neighbor algorithms can be used to ‘reconstruct’ the distribution from which the samples were drawn.
Fig. 4.2: (1) Only noise; (2) only expression in tissue; (3) only expression in/on bone; (4) overall expression or more noise.

The second assumes that the form of the model is fully known, for example a normal or a Poisson distribution; only the parameters of the distribution then have to be estimated. If the correct distributional form is chosen, this method gives smooth and well-fitting distributions.
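For the parametric case, the sketch below assumes that the noise is normally distributed, estimates its two parameters by maximum likelihood, and uses the fitted distribution as the null for a one-sided test. The normal form is an assumption made for illustration; photon-counting noise is often closer to Poisson, in which case the maximum-likelihood estimate of the rate is simply the sample mean.

```python
import math
import numpy as np

def fit_normal(samples):
    """Maximum-likelihood estimates of a normal distribution's parameters."""
    x = np.asarray(samples, dtype=float)
    return x.mean(), x.std(ddof=0)       # the MLE of sigma uses ddof=0

def normal_upper_tail(value, mu, sigma):
    """P(X >= value) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((value - mu) / (sigma * math.sqrt(2.0)))

rng = np.random.default_rng(7)
noise = rng.normal(loc=50.0, scale=5.0, size=5000)   # simulated noise pixels
mu_hat, sigma_hat = fit_normal(noise)
p = normal_upper_tail(70.0, mu_hat, sigma_hat)       # about 4 sigma above the mean
```

Only two numbers have to be estimated, so far fewer samples are needed than for the non-parametric estimate, but the resulting p-value is only as trustworthy as the assumed distributional form.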
The last is a mixture of parametric and non-parametric estimation. A mixture of Gaussians is a good example.
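The mixture-of-Gaussians example can be sketched with the expectation-maximization (EM) algorithm: each component is parametric, while combining several components gives some of the flexibility of a non-parametric estimate. This is a minimal sketch, assuming one-dimensional data and a fixed number of components; the two simulated populations (background noise and expressing tissue) are hypothetical.

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=200):
    """Fit a 1D mixture of k Gaussians with the EM algorithm."""
    x = np.asarray(x, dtype=float)
    # Initialise means at spread-out quantiles, equal weights, pooled variance
    means = np.quantile(x, np.linspace(0.1, 0.9, k))
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = (weights / np.sqrt(2.0 * np.pi * variances)
                * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)   # guard against collapse
    return weights, means, variances

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(0.0, 1.0, 600),    # e.g. background noise
                       rng.normal(5.0, 1.0, 400)])   # e.g. expressing tissue
w, m, v = fit_gmm_1d(data)
```

The fitted mixture recovers both populations and their proportions; in a detection setting, the component describing the noise could then serve as the null distribution.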
Model testing<br />
The important question for each significance test is against which null distribution it is applied. In other words, which distribution has to be rejected in order to accept the alternative hypothesis, which states that the dataset was not generated by the probability distribution of the null hypothesis?
If the significance of a signal can be determined and a significant signal of p
null hypothesis would hold, and thus the observation could be generated by noise and<br />
thus no significant signal would be found.<br />
Expression may also occur only in A while the test is designed for B (2). If the measurements contain inaccuracies, the noise at the borders of B will be higher than normal, and the test could falsely suggest that the signal measured in B is not caused by noise, and thus that expression is occurring in B. The conclusion that this observation is not caused by noise is indeed correct, but the alternative hypothesis that the expression is therefore located in B is easily falsified visually. A different, more robust hypothesis is thus needed. This shows the complexity of statistical testing: not only must the null hypothesis be selected carefully, the alternative hypothesis must be correct as well.
The last possibility is that expression is seen in both A and B (4). Here a new difficulty appears: the sample could somehow be very noisy, but it could equally well be that overall expression is indeed present.
4.4 Discussion<br />
In the literature, regions of interest in acquired gene expression images are often selected by qualitative, subjective visual inspection. Quantification is then done by counting the illuminated pixels with a value above a certain threshold and converting this to the number of measured photons, or photons per second [75, 76]. For automatic processing and high-throughput analysis, these regions of interest need to be found automatically, if present.
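The threshold-based quantification described above can be written down in a few lines, which also makes explicit where the subjective choices enter: the threshold and the calibration factor. The sketch assumes a background-corrected image and a known camera calibration (photons per count); both the function and the numbers are hypothetical, and real acquisition software may apply further corrections.

```python
import numpy as np

def quantify_roi(image, threshold, exposure_s, photons_per_count=1.0):
    """Threshold-based quantification of a planar luminescence image.

    image:             2D array of background-corrected camera counts
    threshold:         pixels above this value are treated as signal
    exposure_s:        acquisition time in seconds
    photons_per_count: camera calibration factor (hypothetical here)
    Returns (number of signal pixels, photons per second in those pixels).
    """
    signal = image > threshold
    photons = image[signal].sum() * photons_per_count
    return int(signal.sum()), photons / exposure_s

img = np.array([[0.0, 2.0, 0.0],
                [3.0, 50.0, 40.0],
                [1.0, 30.0, 0.0]])
n_pixels, flux = quantify_roi(img, threshold=10.0, exposure_s=60.0)
```

Replacing the hand-picked threshold with one derived from an estimated noise distribution is the step that would make this procedure automatic.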
It is also important to calculate probabilities for the different qualitative location tags, meaning: given a segmentation and expression at location (x, y, z), the probability that the expression is located in a given organ is p%. Manual analysis cannot provide such objective probability estimates.
It is important to keep in mind that much data is needed to estimate probability density functions. When studying gene expression in 2D, many samples are needed for reliable density estimation. For estimating the noise distribution this is probably still feasible, but estimating a reliable model for gene expression is more complicated, and a single mouse as data source simply does not suffice. In [77] it is stated that a two-dimensional non-parametric density estimate of a normal distribution with a relative MSE below 0.1, using normal kernels, requires at least 19 samples; for three dimensions, 67 samples are already needed.
It is also important to note that registering segmented data (in the form of an atlas) to measured BLI, FMI, PET or SPECT data will not always be a trivial task. Commonly, these modalities provide only two-dimensional planar images onto which the 3D BLI, FMI, PET or SPECT acquisition is calibrated. In these cases, the only information available for registering the 3D atlas is the set of two-dimensional surface pictures of the organism. This sparse 2D/3D registration problem needs to be solved before the BLI, FMI, etc. data can be segmented, let alone before statistical tests can be designed and applied.
CHAPTER 5<br />
Discussion<br />
In this chapter a global discussion is presented on the topics covered in this literature<br />
study. New aspects that are introduced by MI and that are unique in bioinformatics<br />
will be highlighted and global issues that are limiting the feasibility of application in<br />
bioinformatics will be summarized, including challenges that must be solved and the<br />
expertise that is needed to do so.<br />
First, it should be noted that visualization of gene expression itself can already be seen as bioinformatics. The definition of bioinformatics in this paper is therefore restricted to the field of computational biology.
5.1 Advantages of MI for the field of bioinformatics<br />
As stated several times in this paper, the most important advantage of MI over existing data sources in bioinformatics is the possibility of follow-up studies in the same animal, owing to the non-invasive nature of MI. With all other known techniques, animals have to be sacrificed to obtain spatial and/or temporal gene expression profiles, using sectioning and extraction techniques respectively. Another advantage is that spatial and temporal information are obtained simultaneously.
A further advantage, shared with cryosectioning and in situ hybridization, is the high sensitivity to local gene expression, compared to microarrays, in which RNA concentrations are averaged over the extracted sample.
5.2 Current Issues and Challenges<br />
Image Processing<br />
In molecular imaging, digital image processing is a very important aspect. Thresholding, backprojection, registration of multiple modalities onto each other and registration of modalities onto an atlas are all examples of image processing techniques. Although spatial registration of different modalities is in theory possible by optimizing some objective function, it will not always be straightforward to formulate these objective functions.
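The idea of registration as optimization can be illustrated with the simplest possible case: a rigid 2D translation found by exhaustively minimizing the sum of squared differences (SSD). This is a toy sketch under strong assumptions (same modality, integer shifts, no rotation or deformation). Registering, for example, optical surface images to a 3D MRI atlas needs a multimodal similarity measure such as mutual information and a more capable optimizer, which is precisely the formulation problem described above.

```python
import numpy as np

def register_translation(fixed, moving, max_shift=5):
    """Find the integer 2D shift of `moving` that best matches `fixed`.

    Exhaustively minimizes the sum of squared differences over all shifts
    up to max_shift pixels. np.roll wraps around at the borders, which is
    acceptable only for this toy example.
    """
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, (dy, dx), axis=(0, 1))
            cost = np.sum((fixed - candidate) ** 2)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift

rng = np.random.default_rng(5)
atlas_slice = rng.random((32, 32))
measurement = np.roll(atlas_slice, (-3, 2), axis=(0, 1))   # shifted copy
shift = register_translation(atlas_slice, measurement)
```

The recovered shift (3, -2) undoes the simulated displacement. Real registration replaces the exhaustive search with an iterative optimizer and the SSD with a cost that stays meaningful across modalities.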
New gene expression measurements, two or three dimensional, need to be aligned to<br />
MRI or CT data, which are also in two or three dimensional format. Also 2D optical<br />
surface images that are directly related (in space) to BLI, FMI, PET or SPECT, in the<br />
form of for example structured light, need to be registered to a 3D atlas.<br />
Registration is needed, to be able to relate spatial expression to segmented models, and<br />
thus to obtain qualitative knowledge on spatial expression. The segmentation information<br />
will be available in an atlas, and once registration of gene expression to an atlas<br />
is successful, the corresponding segmentation information can be related to the spatial<br />
gene expression information.<br />
All these problems lie in the field of image processing and new modality specific optimization<br />
algorithms need to be constructed. In principle the data to do that is available,<br />
so in time these problems will be solved.<br />
Undefined sources<br />
When interpreting gene expression data from small-animal whole-body optical imaging, the major challenge is that registration onto some sort of atlas is needed before the qualitative spatial expression levels of the measured genes can be estimated. Moreover, RNA expression levels are measured indirectly through reporter genes, rather than directly as with microarray probes, and post-translational effects are not detectable with MI. Together, this means that many assumptions about gene expression are needed when using molecular imaging as a data source. These assumptions may or may not influence the data analysis, and this uncertainty makes optical imaging difficult to use as a data source.
Radionuclide imaging poses similar problems. Here the main problems are that reporter genes need to be expressed at the cell surface to detect radioactive compounds, or that reporter enzymes are needed to ‘trap’ radioactive compounds inside the cells, possibly resulting in toxic effects and disturbed biological processes.
The problems of MI concerning gene expression are thus not the technical challenges of reconstructing the source of the emitted photons, which can be calculated for every modality to some resolution, but the biological meaning of what is actually measured (see Fig. 5.1).
What MI needs to overcome this problem is the development of contrast agents that are directly correlated with the expression level of the gene of interest. Most likely this must be some sort of fusion protein, because the only way to be certain that a protein is expressed is to detect it directly. This is also the only accurate way to determine protein concentrations in vivo, because otherwise differences in diffusion will prevent accurate concentration measurements. Solutions to this problem will probably come from the field of pharmaceutical development, in the form of newly developed probes, and from the field of life sciences [16].

Fig. 5.1: When reporter genes are used as the source of gene expression data, many parameters remain unknown, resulting in unreliable expression estimates.
Statistical Approaches<br />
In cases where the meaning of reporter gene expression levels is known, such as in cell tracking or in fusion protein detection (for example in gene therapy [16]), higher-throughput measurements are needed to construct reliable density models from which reliable prior probability distributions can be obtained. Without enough data samples, only statistical statements on differences in expression can be made in follow-up studies in the same animal model, using t-tests; even then, multiple measurements are needed to obtain at least an indication of the means and variances at the different time points.
To generate more data, an efficient and reliable way of introducing reporter gene expression is needed. The paper of Dupuy et al. on high-throughput analysis in C. elegans shows that gene transfer efficiency was responsible for a too-low signal in 36% of all samples obtained. The Flp-In technique will enable efficient gene transfer. New developments will probably come from high-throughput screening of cell lines. Obtaining similar results for more complex organisms will be more difficult, because high-throughput screening is less feasible for those organisms, and long-term effects are harder to spot because the organisms need to develop fully before side effects can be seen.
Not only is an efficient gene transfer system necessary; fully automatic registration is also needed for high-throughput segmentation. To solve these problems, work has to be done in the fields of genetics and image processing, for data generation and data processing respectively.
If and when enough data is available, statistical tests will have to be designed, to obtain<br />
new (statistical) information on developments in studied processes. For different<br />
studies, different tests will have to be developed.<br />
5.3 Conclusion<br />
The field of molecular imaging comprises some very powerful techniques to visualize the expression of certain genes. Unfortunately, some criteria needed for use in bioinformatics are not met. The most important unmet criterion is that high-throughput measurements are not yet feasible for whole-body imaging. The main reason is that, unlike with microarrays, only a few genes (up to 5 with FMI) can be measured per animal at a time with MI, because each promoter needs a unique reporter gene to specifically visualize the corresponding gene of interest. Generating enough mice to obtain as many expression levels as a microarray provides would be very time consuming.
Some registration problems also need to be solved before data from molecular imaging can be used for bioinformatics. Once registration, segmentation and high-throughput measurements are technically feasible, molecular imaging could prove to be a valuable addition to the existing data modalities in bioinformatics.
The fact that only indirect measurements of protein expression are obtained does not necessarily mean that the data cannot be used. Regulatory networks can still be inferred from the 3D+t gene expression data, but it should not be forgotten that the measurements are indirect and that the expression data could therefore be incorrect.
Molecular imaging does provide a new way to observe biological processes in vivo that could not be studied before. For instance, so-called ‘biomarkers’, which are used and searched for in bioinformatics, can be (indirectly) visualized with MI by using reporter genes or specific antibody-contrast agent fusions, so that it can be determined not only whether a disease is present, but also where it is located. In other words, microarrays can be used to search for genes of interest, and once these are found, the ‘behavior’ of those genes can be studied with MI techniques.
The behavior of, for example, cancer cells after genetic alteration can also be studied, which opens new possibilities for research on gene therapy in cancer treatment.
To put it boldly and briefly: the field of bioinformatics in the form of computational biology and the field of molecular imaging in the form of whole-body imaging are not yet ready for each other, but if the technical challenges discussed here are solved, their combination holds great potential.
Bibliography<br />
[1] Michael Huerta, Michael Huerta, Yuan Liu, Gregory Downing, and Belinda Seto. Nih working definition<br />
of bioinformatics and nih working definition of bioinformatics and computational biology, july<br />
2000.<br />
[2] R. Weissleder and U. Mahmood. Molecular imaging. Radiology, 219(2):316–333, 2001.<br />
[3] H. Peng, F. Long, J. Zhou, G. Leung, M.B. Eisen, and E.W. Myers. Automatic image analysis for gene<br />
expression paterns of fly embryos. BMC Cell Biology, 8, July 2007.<br />
[4] D.K. Welsh and S.A. Kay. Bioluminescence imaging in living organisms. Current Opinion in Biotechnology,<br />
16:73–78, 2005.<br />
[5] H. Alfke, H. Stöppler, F. Nocken, J.T. Heverhagen, B. Kleb, F. Czubayko, and K.J. Klose. In vitro mr<br />
imaging of regulated gene expression. Radiology, 228:448–492, 2003.<br />
[6] T. Mistelli and D.L Spector. Applications of the green fluorescent protein in cell biology and biotechnology.<br />
Nature Biotechnology, 15:961–964, 1997.<br />
[7] S.B. Primrose, R.M. Twyman, and R.W. Old. Principles of Gene Manipulation. Blackwell Sciences, 6<br />
edition, 2001.<br />
[8] A. Schedl, Z. Larin, L. Montoliu, E. Thies, G. Kelsey, H. Lehrach, and G. Schütz. A method for the<br />
generation of YAC transgenic mice by pronuclear microinjection. Nucleic Acids Research, 21(20):4783–4787, 1993.<br />
[9] The BSE Inquiry. BSE Inquiry Report, Volume 2: Science.<br />
[10] P.J. Mogayzel and M.A. Ashlock. CFTR intron 1 increases luciferase expression driven by CFTR 5′-flanking<br />
DNA in a yeast artificial chromosome. Genomics, 64(2):211–215, March 2000.<br />
[11] S.A. Shabalina and A. Spiridonov. The mammalian transcriptome and the function of non-coding DNA<br />
sequences. Genome Biology, 5, 2004.<br />
[12] N.V. Henriquez, P.G.M. van Overveld, I. Que, J.T. Buijs, R. Bachelier, E.L. Kaijzel, C.W.G.M. Löwik,<br />
P. Clezardin, and G. van der Pluijm. Advances in optical imaging and novel model systems for cancer<br />
metastasis research. Clinical and Experimental Metastasis, 2007.<br />
[13] Irene C Notting, Jeroen T Buijs, Ivo Que, Ratna E Mintardjo, Geertje van der Horst, Marcel Karperien,<br />
Guy S O A Missotten, Martine J Jager, Nicoline E Schalij-Delfos, Jan E E Keunen, and Gabri van der<br />
Pluijm. Whole-body bioluminescent imaging of human uveal melanoma in a new mouse model of<br />
local tumor growth and metastasis. Invest Ophthalmol Vis Sci, 46(5):1581–1587, 2005.<br />
[14] Barmak Modrek and Christopher Lee. A genomic view of alternative splicing. Nat Genet, 30(1):13–19,<br />
2002.<br />
[15] Agenor Limon, Jorge Mauricio Reyes-Ruiz, Fabrizio Eusebi, and Ricardo Miledi. Properties of GluR3<br />
receptors tagged with GFP at the amino or carboxyl terminus. Proc Natl Acad Sci U S A, 104(39):15526–15530, 2007.<br />
[16] Tarik F. Massoud and Sanjiv S. Gambhir. Molecular imaging in living subjects: seeing fundamental<br />
biological processes in a new light. Genes Dev, 17(5):545–580, 2003.<br />
[17] Y Yu, A J Annala, J R Barrio, T Toyokuni, N Satyamurthy, M Namavari, S R Cherry, M E Phelps,<br />
H R Herschman, and S S Gambhir. Quantification of target gene expression by imaging reporter gene<br />
expression in living animals. Nat Med, 6(8):933–937, 2000.<br />
[18] Centre for Positron Emission Tomography website. http://www.petnm.unimelb.edu.au/pet/detail/nucphysics.html.<br />
[19] N.I.L.J. Bohnen. Toepassingen van PET en SPECT in de neurologische praktijk. Neurologie, 104(6):339–346, 2003.<br />
[20] R. Ray, A.M. Wu, and S.S. Gambhir. Optical bioluminescence and positron emission tomography<br />
imaging of a novel fusion reporter gene in tumor xenografts of living mice. Cancer Research, 63:1160–<br />
1165, March 2003.<br />
[21] Vijay Sharma, Gary D Luker, and David Piwnica-Worms. Molecular imaging of gene expression and<br />
protein function in vivo with PET and SPECT. J Magn Reson Imaging, 16(4):336–351, 2002.<br />
[22] J.P. Hornak. The Basics of MRI. HTML, 1996–2007.<br />
[23] A Y Louie, M M Huber, E T Ahrens, U Rothbacher, R Moats, R E Jacobs, S E Fraser, and T J<br />
Meade. In vivo visualization of gene expression using magnetic resonance imaging. Nat Biotechnol,<br />
18(3):321–325, 2000.<br />
[24] V. Ntziachristos, C.H. Tung, C. Bremer, and R. Weissleder. Fluorescence molecular tomography resolves<br />
protease activity in vivo. Nature Medicine, 8(7):757–760, July 2002.<br />
[25] Vasilis Ntziachristos, Jorge Ripoll, Lihong V Wang, and Ralph Weissleder. Looking and listening to<br />
light: the evolution of whole-body photonic imaging. Nat Biotechnol, 23(3):313–320, 2005.<br />
[26] Timothy C Doyle, Stacy M Burns, and Christopher H Contag. In vivo bioluminescence imaging for<br />
integrated studies of infection. Cell Microbiol, 6(4):303–317, 2004.<br />
[27] D. Germain-Desprez, M. Bazinet, M. Bouvier, and M. Aubry. Oligomerization of transcriptional intermediary<br />
factor 1 regulators and interaction with ZNF74 nuclear matrix protein revealed by bioluminescence<br />
resonance energy transfer in living cells. The Journal of Biological Chemistry, 278(25):22367–<br />
22373, June 2003.<br />
[28] K.A. Eidne, K.M. Kroeger, and A.C. Hanyaloglu. Applications of novel resonance energy transfer<br />
techniques to study dynamic hormone receptor interactions in living cells. Trends in Endocrinology<br />
& Metabolism, 13(10):415–421, December 2002.<br />
[29] P. van Roessel and A.H. Brand. Imaging into the future: visualizing gene expression and protein<br />
interactions with fluorescent proteins. Nature Cell Biology, 4:E15–E20, 2002.<br />
[30] R. Ray, H. Pimenta, R. Paulmurugan, F. Berger, M.E. Phelps, and S.S. Gambhir. Noninvasive quantitative<br />
imaging of protein-protein interactions in living subjects. PNAS, 99(5):3105–3110, March 2002.<br />
[31] C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Field, and P. Bork. Comparative assessment<br />
of large-scale data sets of protein-protein interactions. Nature, 417:399–403, May 2002.<br />
[32] H. D. Liang and M. J. K. Blomley. The role of ultrasound in molecular imaging. British Journal<br />
of Radiology, 2003.<br />
[33] M. Guven, B. Yazici, X. Intes, and B. Chance. Diffuse optical tomography with a priori anatomical<br />
information. Physics in Medicine and Biology, 50:2837–2858, June 2005.<br />
[34] Belma Dogdas, David Stout, Arion F Chatziioannou, and Richard M Leahy. Digimouse: a 3D whole<br />
body mouse atlas from CT and cryosection data. Phys Med Biol, 52(3):577–587, 2007.<br />
[35] W. Cong, G. Wang, D. Kumar, Y. Liu, M. Jiang, L.V. Wang, E.A. Hoffman, G. McLennan, P.B. McCray,<br />
J. Zabner, and A. Cong. Practical reconstruction for bioluminescence tomography. Optics Express,<br />
13(18):6756–6771, September 2005.<br />
[36] G. Wang, Y. Li, and M. Jiang. Uniqueness theorems in bioluminescence tomography. Medical Physics,<br />
31(8):2289–2299, July 2004.<br />
[37] G. Wang, H. Shen, W. Cong, S. Zhao, and G.W. Wei. Temperature-modulated bioluminescence tomography.<br />
Optics Express, 14(17), August 2006.<br />
[38] A.J. Chaudhari, F. Darvas, J.R. Bading, R.A. Moats, P.S. Conti, D.J. Smith, S.R. Cherry, and R.M.<br />
Leahy. Hyperspectral and multispectral bioluminescence optical tomography for small animal imaging.<br />
Physics in Medicine and Biology, 50:5421–5441, 2005.<br />
[39] P. Kok, J. Dijkstra, C.P. Botha, F.H. Post, E. Kaijzel, I. Que, C.W.G.M. Löwik, J.H.C. Reiber, and B.P.F.<br />
Lelieveldt. Integrated visualization of multi-angle bioluminescence imaging and micro-CT. Proceedings<br />
of SPIE, 6509, 2007.<br />
[40] J. Reinitz and D.H. Sharp. Mechanism of eve stripe formation. Mechanisms of Development, 49:133–<br />
158, 1995.<br />
[41] M. Baiker, J. Milles, A.M. Vossepoel, I. Que, E.L. Kaijzel, C.W.G.M. Löwik, J.H.C. Reiber, J. Dijkstra,<br />
and B.P.F. Lelieveldt. Fully automated whole-body registration in mice, using an articulated skeleton<br />
atlas. ISBI, 2007.<br />
[42] Albert Burger, Richard A. Baldock, Yiya Yang, Andrew Waterhouse, Derek Houghton, Nick Burton,<br />
and Duncan Davidson. The Edinburgh Mouse Atlas and gene-expression database: a spatio-temporal<br />
database for biological research. In SSDBM ’02: Proceedings of the 14th International Conference<br />
on Scientific and Statistical Database Management, page 239, Washington, DC, USA, 2002. IEEE<br />
Computer Society.<br />
[43] D. Davidson, J. Bard, R. Brune, A. Burger, C. Dubreuil, W. Hill, M. Kaufman, J. Quinn, M. Stark, and<br />
R. Baldock. The mouse atlas and graphical gene-expression database. Seminars in Cell & Developmental Biology,<br />
8:509–517, 1997.<br />
[44] D.W. Townsend and T. Beyer. A combined PET/CT scanner: the path to true image fusion. The British<br />
Journal of Radiology, 2002.<br />
[45] I.I. Moraru and L. M. Loew. Intracellular signaling: Spatial and temporal control. Physiology, 20:169–<br />
179, 2005.<br />
[46] L. Seroude, T. Brummel, P. Kapahi, and S. Benzer. Spatio-temporal analysis of gene expression during<br />
aging in Drosophila melanogaster. Aging Cell, 1:47–56, 2002.<br />
[47] Flytrap website. http://www.fly-trap.org/flytrap/html/docs/egal4.html, October 2007.<br />
[48] Axel Visel, James Carson, Judit Oldekamp, Marei Warnecke, Vladimira Jakubcakova, Xunlei Zhou,<br />
Chad A Shaw, Gonzalo Alvarez-Bolado, and Gregor Eichele. Regulatory pathway analysis by high-throughput<br />
in situ hybridization. PLoS Genet, 3(10):1867–1883, 2007.<br />
[49] D. Dupuy, N. Bertin, C.A. Hidalgo, K. Venkatesan, D. Tu, D. Lee, J. Rosenberg, N. Svrzikapa,<br />
A. Blanc, A. Carnac, A. Carvunis, R. Pulak, J. Shingles, J. Reece-Hoyes, R. Hunt-Newbury,<br />
R. Viveiros, W.A. Mohler, M. Tasa, F.P. Roth, C. Le Peuch, I.A. Hope, R. Johnsen, D.G. Moerman,<br />
A.-L. Barabási, D. Baillie, and M. Vidal. Genome-scale analysis of in vivo spatiotemporal promoter<br />
activity in Caenorhabditis elegans. Nature Biotechnology, 25(6):663–668, June 2007.<br />
[50] T. Krul, J.A. Kaandorp, and J.G. Blom. Modelling developmental regulatory networks. In ICCS 2003,<br />
pages 688–697, 2003.<br />
[51] Kalyanmoy Deb. An introduction to genetic algorithms.<br />
[52] Z. Yang, W. Zhu, and L. Ji. SLIT: Designing complexity penalty for classification and regression trees<br />
using the SRM principle. ISNN, 2006.<br />
[53] C.P. Fall, E.S. Marland, J.M. Wagner, and J.J. Tyson. Computational Cell Biology. Springer, 2002.<br />
[54] H. Janssens, S. Hou, J. Jaeger, A. Kim, E. Myasnikova, D. Sharp, and J. Reinitz. Quantitative and<br />
predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nature<br />
Genetics, 38(10):1159–1165, 2006.<br />
[55] Yves Fomekong-Nanfack, Jaap A Kaandorp, and Joke Blom. Efficient parameter estimation for<br />
spatio-temporal models of pattern formation: case study of Drosophila melanogaster. Bioinformatics,<br />
23(24):3356–3363, 2007.<br />
[56] Hidde de Jong, Johannes Geiselmann, Celine Hernandez, and Michel Page. Genetic network analyzer:<br />
qualitative simulation of genetic regulatory networks. Bioinformatics, 19(3):336–344, 2003.<br />
[57] Hidde de Jong, Jean-Luc Gouze, Celine Hernandez, Michel Page, Tewfik Sari, and Johannes Geiselmann.<br />
Qualitative simulation of genetic regulatory networks using piecewise-linear models. Bull Math<br />
Biol, 66(2):301–340, 2004.<br />
[58] I.M. Ong, J.D. Glasner, and D. Page. Modelling regulatory pathways in E. coli from time series expression<br />
profiles. Bioinformatics, 18(Suppl 1):S241–S248, 2002.<br />
[59] Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M.<br />
Jordan, November 2002.<br />
[60] Z. Bar-Joseph. Analyzing time series expression data. Bioinformatics, 20(16):2493–2503, 2004.<br />
[61] Affymetrix price sheet, September 2007.<br />
[62] R.L. Somorjai, B. Dolenko, and R. Baumgartner. Class prediction and discovery using gene microarray<br />
and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics, 19(12):1484–1491,<br />
2003.<br />
[63] C.H. Yeang, H.C. Mak, S. McCuine, C. Workman, T. Jaakkola, and T. Ideker. Validation and refinement<br />
of gene-regulatory pathways on a network of physical interactions. Genome Biology, 2005.<br />
[64] E.L. Kaijzel, G. van der Pluijm, and C.W.G.M. Löwik. Whole-body optical imaging in animal models<br />
to assess cancer development and progression. Clinical Cancer Research, 13(12):3490–3497, June<br />
2007.<br />
[65] Gregory Batt, Delphine Ropers, Hidde de Jong, Johannes Geiselmann, Radu Mateescu, Michel Page,<br />
and Dominique Schneider. Validation of qualitative models of genetic regulatory networks by model<br />
checking: analysis of the nutritional stress response in Escherichia coli. Bioinformatics, 21 Suppl<br />
1:i19–28, 2005.<br />
[66] Edinburgh mouse atlas project. http://genex.hgu.mrc.ac.uk/About/intro.html.<br />
[67] BD Biosciences Clontech. BD Living Colors™ Fluorescent Proteins.<br />
[68] R.M. Mansfield and J.R. Levenson. Distinguished photons: The Maestro™ in-vivo fluorescence imaging<br />
system. Technical report, CRi, 2006.<br />
[69] Haiyan Wan, Jiangyan He, Bensheng Ju, Tie Yan, Toong Jin Lam, and Zhiyuan Gong. Generation of<br />
two-color transgenic zebrafish using the green and red fluorescent protein reporter genes GFP and RFP.<br />
Mar Biotechnol (NY), 4(2):146–154, 2002.<br />
[70] Simon R Cherry. In vivo molecular and genomic imaging: new challenges for imaging physics. Phys<br />
Med Biol, 49(3):R13–48, 2004.<br />
[71] Guo-Jun Zhang, Michal Safran, Wenyi Wei, Erik Sorensen, Peter Lassota, Nikolai Zhelev, Donna S<br />
Neuberg, Geoffrey Shapiro, and William G. Kaelin Jr. Bioluminescent imaging of CDK2 inhibition in<br />
vivo. Nat Med, 10(6):643–648, 2004.<br />
[72] Weisheng Zhang, Anthony F Purchio, Kevin Chen, Jianming Wu, Li Lu, Richard Coffee, Pamela R<br />
Contag, and David B West. A transgenic mouse model with a luciferase reporter for studying in vivo<br />
transcriptional regulation of the human CYP3A4 gene. Drug Metab Dispos, 31(8):1054–1064, 2003.<br />
[73] Paolo Ciana, Michele Raviscioni, Paola Mussi, Elisabetta Vegeto, Ivo Que, Malcolm G Parker,<br />
Clemens Lowik, and Adriana Maggi. In vivo imaging of transcriptionally active estrogen receptors.<br />
Nat Med, 9(1):82–86, 2003.<br />
[74] F.M. Dekking, C. Kraaikamp, P. Lopuhaä, and L.E. Meester. Kanstat: Probability and statistics for<br />
the 21st century. Delft University of Technology, 2002.<br />
[75] Antoinette Wetterwald, Gabri van der Pluijm, Ivo Que, Bianca Sijmons, Jeroen Buijs, Marcel Karperien,<br />
Clemens W G M Lowik, Elsbeth Gautschi, George N Thalmann, and Marco G Cecchini. Optical<br />
imaging of cancer metastasis to bone marrow: a mouse model of minimal residual disease. Am J<br />
Pathol, 160(3):1143–1153, 2002.<br />
[76] Darlene E Jenkins, Yoko Oei, Yvette S Hornig, Shang-Fan Yu, Joan Dusich, Tony Purchio, and<br />
Pamela R Contag. Bioluminescent imaging (BLI) to improve and refine traditional murine models of<br />
tumor growth and metastasis. Clin Exp Metastasis, 20(8):733–744, 2003.<br />
[77] Andrew Webb. Statistical Pattern Recognition. Wiley, 2nd edition, 2002.<br />