hardware implementation of data compression ... - INFN Bologna
UNIVERSITÀ DEGLI STUDI DI BOLOGNA
FACOLTÀ DI SCIENZE MATEMATICHE FISICHE E NATURALI
DOTTORATO DI RICERCA IN FISICA, XIV ciclo

HARDWARE IMPLEMENTATION OF
DATA COMPRESSION ALGORITHMS
IN THE ALICE EXPERIMENT

Doctoral thesis by:
Dott. Davide Falchieri

Academic Year 2000/2001

Advisors:
Prof. Maurizio Basile
Prof. Enzo Gandolfi

Coordinator:
Prof. Giovanni Venturi
Keywords: ALICE, data compression, CARLOS, wavelets, VHDL
Contents

Introduction

1 The ALICE experiment
  1.1 The Inner Tracking System
    1.1.1 Tracking in ALICE
    1.1.2 Physics of the ITS
    1.1.3 Layout of the ITS
  1.2 Design of the drift layers
  1.3 The SDDs (Silicon Drift Detectors)
  1.4 SDD readout system
    1.4.1 Front-end module
    1.4.2 Event-buffer strategy
    1.4.3 End-ladder module
    1.4.4 Choice of the technology

2 Data compression techniques
  2.1 Applications of data compression
  2.2 Remarks on information theory
  2.3 Compression techniques
    2.3.1 Lossless compression
    2.3.2 Lossy compression
    2.3.3 Measures of performance
    2.3.4 Modelling and coding
  2.4 Lossless compression techniques
    2.4.1 Huffman coding
    2.4.2 Run-length encoding
    2.4.3 Differential encoding
    2.4.4 Dictionary techniques
    2.4.5 Selective readout
  2.5 Lossy compression techniques
    2.5.1 Zero suppression
    2.5.2 Transform coding
    2.5.3 Subband coding
    2.5.4 Wavelets
  2.6 Implementation of compression algorithms

3 1D compression algorithm and implementations
  3.1 Compression algorithms for SDD
  3.2 1D compression algorithm
  3.3 1D algorithm performance
    3.3.1 Compression coefficient
    3.3.2 Reconstruction error
  3.4 CARLOS v1
    3.4.1 Board description
    3.4.2 CARLOS v1 design flow
    3.4.3 Functions performed by CARLOS v1
    3.4.4 Tests performed on CARLOS v1
  3.5 CARLOS v2
    3.5.1 The firstcheck block
    3.5.2 The barrel shifter block
    3.5.3 The fifo block
    3.5.4 The event-counter block
    3.5.5 The outmux block
    3.5.6 The feesiu (toplevel) block
    3.5.7 CARLOS-SIU interface
  3.6 CARLOS v2 design flow
  3.7 Tests performed on CARLOS v2

4 2D compression algorithm and implementation
  4.1 2D compression algorithm
    4.1.1 Introduction
    4.1.2 How the 2D algorithm works
    4.1.3 Compression coefficient
    4.1.4 Reconstruction error
  4.2 CARLOS v3 vs. the previous prototypes
  4.3 The final readout architecture
  4.4 CARLOS v3
  4.5 CARLOS v3 building blocks
    4.5.1 The channel block
    4.5.2 The encoder block
    4.5.3 The barrel15 block
    4.5.4 The fifonew32x15 block
    4.5.5 The channel-trigger block
    4.5.6 The ttc-rx-interface block
    4.5.7 The fifo-trigger block
    4.5.8 The event-counter block
    4.5.9 The outmux block
    4.5.10 The trigger-interface block
    4.5.11 The cmcu block
    4.5.12 The pattern-generator block
    4.5.13 The signature-maker block
  4.6 Digital design flow for CARLOS v3
  4.7 CARLOS layout features

5 Wavelet-based compression algorithm
  5.1 Wavelet-based compression algorithm
    5.1.1 Configuration parameters of the multiresolution algorithm
  5.2 Multiresolution algorithm optimization
    5.2.1 The Wavelet Toolbox from Matlab
    5.2.2 Choice of the filters
    5.2.3 Choice of the dimensionality, number of levels and threshold value
  5.3 Choice of the architecture
    5.3.1 Simulink and the Fixed-Point Blockset
    5.3.2 Choice of the architecture
  5.4 Multiresolution algorithm performance
  5.5 Hardware implementation

Conclusions

Bibliography
Introduction

This thesis work has been aimed at the hardware implementation of data compression algorithms to be applied to High Energy Physics experiments. The amount of data that will be produced by the LHC experiments at CERN is of the order of 1 GByte/s. Cost constraints on magnetic tapes and on the data acquisition systems (optical fibres, readout boards) require on-line data compression to be applied in the front-end electronics of the different detectors. This calls for compression algorithms that achieve a high compression ratio while keeping the reconstruction error low: in fact, a high compression coefficient can only be achieved at the expense of some loss on the physics data.
The thesis describes the hardware implementation of compression algorithms applied to the ALICE experiment, in particular to the SDD (Silicon Drift Detector) readout chain. The total amount of data produced by the SDDs is 32.5 MBytes per event, while the space reserved on magnetic tapes for permanent storage is 1.5 MBytes: the compression coefficient therefore has to be at least 22. Besides, since the p-p interaction rate is 1000 Hz, the data compression hardware has to complete its job within 1 ms. This leads to the search for compression algorithms with high performance in terms of both compression ratio and execution speed.
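These two requirements amount to simple arithmetic; the sketch below merely restates the numbers quoted above, without introducing any new detector parameters:

```python
# Back-of-the-envelope check of the SDD compression requirements.
raw_mbytes = 32.5      # data produced by the SDDs per event
stored_mbytes = 1.5    # space reserved on tape per event

ratio = raw_mbytes / stored_mbytes
print(f"required compression coefficient: {ratio:.1f}")   # 21.7 -> at least 22

pp_rate_hz = 1000      # p-p interaction rate
budget_ms = 1000 / pp_rate_hz
print(f"time budget per event: {budget_ms:.0f} ms")       # 1 ms
```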
The thesis describes the design and implementation of three prototypes of the ASIC CARLOS (Compression And Run Length encOding Subsystem), which deals with the on-line data compression, packing and transmission to the standard ALICE data acquisition system. CARLOS v1 and v2 contain a one-dimensional compression algorithm based on thresholding, run-length encoding, differential encoding and Huffman coding techniques. CARLOS v3 was meant to contain a two-dimensional compression algorithm that obtains a better compression ratio than the 1D one with a lower physics data loss. Nevertheless, for time reasons, the CARLOS v3 design sent to the foundry contains a simple 1D look-up table based compression algorithm. The 2D algorithm will be implemented in the next prototype, which should be the final version of CARLOS. The first two prototypes have been tested with good results; the third one is currently being fabricated and its tests will begin in February 2002.
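As a toy illustration of two of the techniques just named (thresholding with run-length encoding of the empty stretches, plus differential encoding of the surviving samples), and emphatically not the actual CARLOS algorithm, a single compression pass might be sketched as:

```python
# Toy sketch: suppress samples below threshold, run-length encode the
# suppressed stretches, differentially encode each surviving burst.
# Illustration of the named techniques only, not the CARLOS algorithm.
def compress(samples, threshold):
    out = []          # list of (zero_run_length, first_value, diffs)
    i = 0
    while i < len(samples):
        run = 0
        while i < len(samples) and samples[i] < threshold:
            run += 1                      # count suppressed samples
            i += 1
        burst = []
        while i < len(samples) and samples[i] >= threshold:
            burst.append(samples[i])      # collect an above-threshold burst
            i += 1
        if burst:
            diffs = [b - a for a, b in zip(burst, burst[1:])]
            out.append((run, burst[0], diffs))
        elif run:
            out.append((run, None, []))   # trailing suppressed stretch
    return out

print(compress([0, 1, 0, 40, 42, 41, 0, 0, 35, 0], 10))
# -> [(3, 40, [2, -1]), (2, 35, []), (1, None, [])]
```

In a real implementation the run lengths, first values and differences would then be entropy coded, e.g. with the Huffman tables mentioned above.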
Besides, the thesis contains a detailed study of a wavelet-based compression algorithm, which obtains encouraging results in terms of both compression ratio and reconstruction error. The algorithm may find a suitable application as a second-level compressor on SDD data, should it become necessary to switch off the compression algorithm implemented on CARLOS.
The thesis is structured in the following way:

• Chapter 1 contains a description of the ALICE experiment, focusing on the SDD readout architecture.

• Chapter 2 contains an introduction to standard compression algorithms.

• Chapter 3 contains a description of the 1D algorithm developed at the INFN Section of Torino and of the two prototypes CARLOS v1 and v2.

• Chapter 4 focuses on the 2D compression algorithm and on the design and implementation of the prototype CARLOS v3.

• Chapter 5 contains a description of a wavelet-based compression algorithm, especially tuned to reach high performance on SDD data, and of its possible application as a second-level compressor in the counting room.
Chapter 1

The ALICE experiment

ALICE (A Large Ion Collider Experiment) [1] is an experiment at the Large Hadron Collider (LHC) [2] optimized for the study of heavy-ion collisions at a centre-of-mass energy of 5.5 TeV per nucleon pair. The main aim of the experiment is to study in detail the behaviour of nuclear matter at high densities and temperatures, in view of probing deconfinement and chiral symmetry restoration.
The detector [1, 3] consists essentially of two main components: the central part, composed of detectors mainly devoted to the study of hadronic signals and dielectrons, and the forward muon spectrometer, devoted to the study of quarkonia behaviour in dense matter. The layout of the ALICE set-up is shown in Fig. 1.1.
A major technical challenge is imposed by the large number of particles created in the collisions of lead ions. There is a considerable spread in the currently available predictions for the multiplicity of charged particles produced in a central Pb-Pb collision. The design of the experiment has been based on the highest value, 8000 charged particles per unit of rapidity at midrapidity. This multiplicity dictates the granularity of the detectors and their optimal distance from the colliding beams. The central part, which covers ±45° (|η| ≤ 0.9) over the full azimuth, is embedded in a large magnet with a weak solenoidal field. Outside of the Inner Tracking System (ITS) there are a cylindrical TPC (Time Projection Chamber) and a large-area PID array of time-of-flight (TOF) counters. In addition, there are two small-area
single-arm detectors: an electromagnetic calorimeter (Photon Spectrometer, PHOS) and an array of RICH counters optimized for high-momentum inclusive particle identification (HMPID).

Figure 1.1: Longitudinal section of the ALICE detector
My thesis work has been focused on data coming from one of the three detectors forming the ITS, the Silicon Drift Detector (SDD).

1.1 The Inner Tracking System

The basic functions of the ITS [4] are:

• determination of the primary vertex and of the secondary vertices necessary for the reconstruction of charm and hyperon decays;

• particle identification and tracking of low-momentum particles;

• improvement of the momentum and angle measurements of the TPC.
1.1.1 Tracking in ALICE

Track finding in heavy-ion collisions at the LHC presents a big challenge because of the extremely high track density. In order to achieve a high granularity and a good two-track separation, ALICE uses three-dimensional hit information wherever feasible, with many points on each track and a weak magnetic field. The ionization density of each track is measured for particle identification. The need for a large number of points on each track has led to the choice of a TPC as the main tracking system. In spite of its drawbacks concerning speed and data volume, only this device can provide reliable performance for a large volume at up to 8000 charged particles per unit of rapidity. The minimum possible inner radius of the TPC (r_in = 90 cm) is given by the maximum acceptable hit density. The outer radius (r_out = 250 cm) is determined by the minimum length required for a dE/dx resolution better than 10%. At smaller radii, and hence larger track densities, tracking is taken over by the ITS.
The ITS consists of six cylindrical layers of silicon detectors. The number and position of the layers are optimized for efficient track finding and impact parameter resolution. In particular, the outer radius is determined by the track matching with the TPC, and the inner one is the minimum compatible with the radius of the beam pipe (3 cm). The silicon detectors feature the high granularity and excellent spatial precision required.
Because of the high particle density, up to 90 cm⁻², the four innermost layers (r ≤ 24 cm) must be truly two-dimensional devices. For this task, silicon pixel and silicon drift detectors were chosen. The outer two layers at r = 45 cm, where the track densities are below 1 cm⁻², are equipped with double-sided silicon micro-strip detectors. With the exception of the two innermost pixel planes, all layers have analog readout for particle identification via a dE/dx measurement in the non-relativistic region. This gives the Inner Tracking System a stand-alone capability as a low-p_t particle spectrometer.
1.1.2 Physics of the ITS

The ITS will contribute to the track reconstruction by improving the momentum resolution obtained by the TPC. This will be beneficial for practically all the physics topics addressed by the ALICE experiment. The global event features will be studied by measuring the multiplicity distributions and the inclusive particle spectra. For the study of resonance production (ρ, ω and φ) and, more importantly, of the behaviour of the mass and width of these mesons in the dense medium, the momentum resolution is even more important: we have to achieve a mass precision comparable to, or better than, the natural width of the resonances in order to observe changes of their parameters caused by chiral symmetry restoration. The mass resolution for heavy states, like D mesons, J/ψ and Υ, will also be better, thus improving the signal-to-background ratio in the measurement of open charm production and in the study of heavy-quarkonia suppression. Improved momentum resolution will also enhance the performance in the observation of another hard phenomenon: jet production and the predicted jet quenching, i.e. the energy loss of partons in strongly interacting dense matter.
The low-momentum particles (below 100 MeV/c) will be detectable only by the ITS. This is of interest in itself, because it widens the momentum range for the measurement of particle spectra, which allows collective effects associated with large length scales to be studied. In addition, a low-p_t cut-off is essential to suppress the soft gamma conversions and the background in the electron-pair spectrum due to Dalitz pairs. The PID capabilities of the ITS in the non-relativistic (1/β²) region will therefore also be of great help.
In addition to the improved momentum resolution, which is necessary for identical-particle interferometry, especially at low momenta, the ITS will contribute to this study through an excellent double-hit resolution, enabling the separation of tracks with close momenta. In order to be able to study particle correlations in the three components of their relative momenta, and hence to get information about the space-time evolution of the system produced in heavy-ion collisions at the LHC, we need sufficient angular resolution in the measurement of the particle's direction. Two of the three components of the relative momentum (the side and longitudinal ones) depend crucially on the precision with which the particle direction is known. The angular resolution is determined by the precise ITS measurements of the primary vertex position and of the first points on the tracks. The particle identification at low momenta will enhance the physics capability by allowing the interferometry of individual particle species as well as the study of non-identical particle correlations, the latter giving access to the emission times of different particles.
The study of strangeness production is an essential part of the ALICE physics program. It will allow the level of chemical equilibration and the density of strange quarks in the system to be established. The measurement will be performed by charged-kaon identification and hyperon detection, based on the ITS capability to recognize secondary vertices. The observation of multi-strange hyperons (Ξ⁻ and Ω⁻) is of particular interest, because they are unlikely to be produced during the hadronic rescattering, due to the high energy threshold for their production. In this way we can obtain information about the strangeness density of the early stage of the collision.
Open charm production in heavy-ion collisions is of great physics interest. Charm quarks can be produced in the initial hard parton scattering and, after that, only in the very early stages of the collision, while the energy in parton rescattering is still above the charm production threshold. The charm yield is not altered later. The excellent performance of the ITS in finding secondary vertices close to the interaction point gives us the possibility to detect D mesons by reconstructing the full decay topology.
Figure 1.2: ITS layers

1.1.3 Layout of the ITS

A general view of the ITS is shown in Fig. 1.2. The system consists of six cylindrical layers of coordinate-sensitive detectors, covering the central rapidity region (|η| ≤ 0.9) for vertices located within the length of the interaction diamond (2σ), i.e. 10.6 cm along the beam direction (z). The detectors and front-end electronics are held by lightweight carbon-fibre structures. The geometrical dimensions and the main features of the various layers of the ITS are summarized in Table 1.1.
The granularity required for the innermost planes is achieved with silicon micro-pattern detectors with true two-dimensional readout: Silicon Pixel Detectors (SPD) and Silicon Drift Detectors (SDD). At larger radii the requirements in terms of granularity are less stringent, therefore double-sided Silicon Strip Detectors (SSD) with a small stereo angle are used. Double-sided microstrips have been selected rather than single-sided ones because they introduce less material in the active volume. In addition, they offer the possibility to correlate the pulse heights read out from the two sides, thus helping to resolve ambiguities inherent in the use of detectors with projective readout. The main parameters of the three detector types (spatial precision, two-track resolution, cell size, number of channels per module, total number of electronic channels) are shown in Table 1.1.
Parameter                              Pixel       Drift        Strip
Spatial precision rφ (µm)              12          38           20
Spatial precision z (µm)               70          28           830
Two-track resolution rφ (µm)           100         200          300
Two-track resolution z (µm)            600         600          2400
Cell size (µm²)                        50 × 300    150 × 300    95 × 40000
Active area (mm²)                      13.8 × 82   72.5 × 75.3  73 × 40
Readout channels per module            65536       2 × 256      2 × 768
Total number of modules                240         260          1770
Total readout channels (k)             15729       133          2719
Total number of cells (M)              15.7        34           2.7
Average occupancy, inner layer (%)     1.5         2.5          4
Average occupancy, outer layer (%)     0.4         1.0          3.3

Table 1.1: Main features of the ITS detectors
The large number of channels in the layers of the ITS requires a large number of connections from the front-end electronics to the detector and to the data acquisition system. The requirement for a minimum of material within the acceptance does not allow the use of conventional copper cables near the active surfaces of the detection system. Therefore Tape Automated Bonding (TAB) aluminium multilayer microcables are used.
The detectors and their front-end electronics produce a large amount of heat, which has to be removed while keeping a very high degree of temperature stability. In particular, the SDDs are sensitive to temperature variations in the 0.1 °C range. For these reasons, particular care was taken in the design of the cooling system and of the temperature monitoring. A water cooling system at room temperature is the chosen solution for all ITS layers, but the use of other liquid coolants is still being considered. For the temperature monitoring, dedicated integrated circuits are mounted on the readout boards and specific calibration devices are integrated in the SDDs.
Figure 1.3: SDD prototype: 1) active area, 2) guard area.

The outer four layers of the ITS detectors are assembled onto a mechanical structure made of two end-cap cones connected by a cylinder placed between the SSD and the SDD layers. Both the cones and the cylinder are made of lightweight sandwiches of carbon-fibre plies and Rohacell™. The carbon-fibre structure also includes the appropriate mechanical links to the TPC and to the SPD layers. The latter are assembled in two half-cylinder structures, specifically designed for safe installation around the beam pipe. The end-cap cones provide the cabling and cooling connections of the six ITS layers with the outside services.
1.2 Design of the drift layers

SDDs (a picture is shown in Fig. 1.3) have been selected to equip the two intermediate layers of the ITS, since they couple a very good multi-track capability with dE/dx information. At least three measured samples per track, and therefore at least four layers carrying dE/dx information, are needed. The SDDs, with a 7.25 × 7.53 cm² active area each, will be mounted on linear structures called ladders, each holding six detectors for layer 3 and eight detectors for layer 4 (see Fig. 1.4).

Figure 1.4: Longitudinal section of ITS layers 3 and 4
The layers will sit at average radii of 14.9 and 23.8 cm from the beam pipe and will be composed of 14 and 22 ladders, respectively. The front-end electronics will be mounted on rigid heat-exchanging hybrids, which in turn will be connected to cooling pipes running along the ladder structure. The connections between the detectors and the front-end electronics, and between both and the ends of the ladders, will be made with TAB-bonded flexible Al microcables, which will carry both data and power supply lines. Each detector will be first assembled together with its front-end electronics and high-voltage connections as
a unit, hereafter called a module, which will be fully tested before it is mounted on the ladder.

Figure 1.5: Working mode of an SDD detector
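As a consistency check, the ladder geometry just described reproduces the SDD module count of Table 1.1 and, under assumed readout parameters (256 time samples per anode and one byte per sample, neither of which is taken from the text), the 32.5 MBytes/event figure quoted in the Introduction:

```python
# Modules per layer from the ladder geometry described above.
ladders = {3: 14, 4: 22}              # ladders on layers 3 and 4
dets_per_ladder = {3: 6, 4: 8}        # detectors (modules) per ladder
modules = sum(ladders[l] * dets_per_ladder[l] for l in (3, 4))
print(modules)                        # 260, as in Table 1.1

# Raw event size, assuming 256 time samples per anode and one byte
# per sample (illustrative assumptions, not detector parameters
# stated in this chapter).
anodes_per_module = 2 * 256           # readout channels per module (Table 1.1)
raw_bytes = modules * anodes_per_module * 256 * 1
print(raw_bytes / 2**20)              # 32.5 (MBytes per event)
```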
1.3 The SDDs (Silicon Drift Detectors)

SDDs, like gaseous drift detectors, exploit the measurement of the transport time of the charge deposited by a traversing particle to localize the impact point in two dimensions, thus enhancing resolution and multi-track capability at the expense of speed. They are therefore well suited to this experiment, in which very high particle multiplicities are coupled with relatively low event rates (up to some kHz). A linear SDD, shown schematically in Fig. 1.5, has a series of parallel implanted p⁺ field strips, connected to a voltage divider, on both surfaces of the high-resistivity n-type silicon wafer. The voltage divider is integrated on the detector substrate itself. The field strips provide the bias voltage to fully deplete the volume of the detector and generate an electrostatic field parallel to the wafer surface, thus creating a drift region (see Fig. 1.6). Electron-hole pairs are created by the charged particles crossing the detector. The holes are collected by the nearest p⁺ electrode, while the electrons are focused into the middle plane of the detector and driven by the drift field towards the edge of the detector,
Figure 1.6: Potential energy of electrons (negative electric potential) on the y-z plane of the device
where they are collected by an array of anodes composed of n+ pads. The electron charge cloud thus drifts from the impact point to the anode region: the cloud has a bell-shaped Gaussian distribution that, owing to diffusion and mutual repulsion, becomes lower and broader during the drift [5] (see Fig. 1.7). In this way a charge cloud can be collected by one or more anodes, depending on the charge released by the ionizing particle and on the impact position with respect to the anode region. The small size of the anodes, and hence their small capacitance (50 fF), implies low noise and good energy resolution. The coordinate perpendicular to the drift direction is given by the centroid of the collected charge. The coordinate along the drift direction is measured by the centroid of the signal in the time domain, taking into account the amplifier response. A spatial precision, averaged over the full detector surface, better than 40 µm in both coordinates has been obtained during beam tests of full-size prototype detectors. Each SDD module is divided into two half-detectors: each half-detector contains on its external side 256 anodes with a pitch of 300 µm. Thus each SDD detector contains 2 × 256 readout channels; taking into account that layers 3 and 4 contain 260 SDD modules, the total number of SDD readout channels is around 133k.
Figure 1.7: Charge distribution evolution scheme (anode axis vs. time axis)
1.4 SDD readout system

The requirements for the SDD readout system derive both from the features of the detector and from the ALICE experiment in general. The following points are crucial in the definition of the final readout system:

– The signal generated by the SDD is a Gaussian-shaped current signal, with variable sigma and charge (5-30 ns and 4 to 32 fC), which can be collected by one or more anodes. The front-end electronics should therefore be able to handle analog signals over a wide dynamic range; at the same time, the system noise should be very low while large signals can still be handled.

– The amount of data generated by the SDD is very large: each half-detector has 256 anodes, and for each anode 256 time samples have to be taken in order to cover the full drift length.

– The small space available on the ladder and the constraints on material impose an architecture which minimizes cabling.

– The radiation environment in which the front-end electronics has to work imposes the choice of a radiation-tolerant technological
Figure 1.8: SDD ladder electronics (front-end modules with PASCAL and AMBRA on the ladder; end-ladder module with CARLOS and the SIU, plus test and slow control)
library for the implementation of the electronics.

The chosen SDD readout electronics, shown in Fig. 1.8, consists of front-end modules and end-ladder modules. The front-end module performs analog data acquisition, A/D conversion and buffering, while the end-ladder module contains high-voltage and low-voltage regulators and a chip for data compression and for interfacing the ALICE DAQ system.
Figure 1.9: The front-end readout unit
1.4.1 Front-end module

The front-end modules, one per half-detector, are distributed along the ladders together with the SDD modules. Each front-end module contains four chip pairs, PASCAL (Preamplifier, Analog Storage and Conversion from Analog to digitaL) plus AMBRA (A Multievent Buffer Readout Architecture), as shown in Fig. 1.9. The PASCAL chips are TAB-bonded directly to the SDD output anodes, while the AMBRA chips are connected to CARLOS (Compression And Run Length encOding Subsystem) via an 8-bit bus.

Each PASCAL chip contains three functional blocks (see Fig. 1.10):

– 64 low-noise preamplifiers, one for each anode;

– an analog memory working at a 40 MHz clock frequency (64 × 256 cells);

– 64 10-bit analog-to-digital converters (ADCs), one for each channel.

During the write phase, i.e. when no trigger signal has been received, the preamplifiers continuously write the samples into the analog memory
Figure 1.10: PASCAL chip architecture (preamplifiers, analog memory with its control unit, ADCs and the interface control unit)
cells at 40 MHz, while the ADCs are in stand-by mode. When PASCAL receives a trigger signal from CARLOS (which receives it from the Central Trigger Processor, CTP), a control logic module on the PASCAL chip stops the analog memory write phase, freezes its contents and starts the read phase, performed in two steps: in the first step the ADCs are set to sample mode and the analog memory reads out the first sample of each anode row; after the memory settling time, the ADCs switch to conversion mode and the analog data are converted to digital through a successive approximation technique. When the conversion is finished, the control logic module on PASCAL starts the
Input range   Output codes      Code mapping   Bits lost
0-127         from 128 to 128   0xxxxxxx       0
128-255       from 128 to 32    100xxxxx       2
256-511       from 256 to 32    101xxxxx       3
512-1023      from 512 to 64    11xxxxxx       3

Table 1.2: Digital compression from 10 to 8 bits
readout of the next sample from the analog memory and, at the same time, sends the 64 digital words to the AMBRA chip over a 40-bit wide bus. The read phase goes on until all the analog memory content has been converted to digital values or an abort signal comes from CARLOS (again receiving it from the CTP), meaning that the event has to be discarded.

The AMBRA chip has two main functions: first, AMBRA has to compress data from 10 to 8 bits per sample; then it has to store the input data stream into a digital buffer. The principle used for compression is to decrease the resolution for larger signals with a logarithmic or square-root law, using the mapping shown in Table 1.2. Since the larger signals have a better signal-to-noise ratio than the smaller ones, the accuracy of the measurement is not affected.
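As an illustration, the mapping of Table 1.2 can be sketched in software. The range prefixes and the number of bits lost are those of the table; the exact packing of the retained bits and the mid-point reconstruction are assumptions made for this sketch, not necessarily the AMBRA implementation:

```python
def compress_10to8(x: int) -> int:
    """Compress a 10-bit sample (0-1023) to 8 bits, following Table 1.2."""
    if x < 128:                                 # 0xxxxxxx: exact, 0 bits lost
        return x
    if x < 256:                                 # 100xxxxx: 2 bits lost
        return 0b10000000 | ((x - 128) >> 2)
    if x < 512:                                 # 101xxxxx: 3 bits lost
        return 0b10100000 | ((x - 256) >> 3)
    return 0b11000000 | ((x - 512) >> 3)        # 11xxxxxx: 3 bits lost

def expand_8to10(c: int) -> int:
    """Approximate inverse: reconstruct the mid-point of each code bin."""
    if c < 0b10000000:
        return c
    if c < 0b10100000:
        return 128 + ((c & 0b11111) << 2) + 2
    if c < 0b11000000:
        return 256 + ((c & 0b11111) << 3) + 4
    return 512 + ((c & 0b111111) << 3) + 4
```

With this packing the reconstruction error is at most half a bin width: 0, 2 or 4 ADC counts depending on the input range, which is why the accuracy of large, high signal-to-noise samples is essentially preserved.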
The four AMBRA chips contain static RAMs able to hold 256 KBytes, and can thus temporarily store four complete half-SDD events (one event corresponds to 256 × 256 bytes = 64 KBytes). Read and write stages are allowed at the same time: while the PASCAL chips are transferring data to the AMBRA chips, the AMBRA chips can send data belonging to another event to the CARLOS chip. Since the four AMBRA chips have to transmit data over a single 8-bit bus, an arbitration mechanism has been implemented.
1.4.2 Event-buffer strategy
The dead time due to the SDD readout system is around 358.4 µs: this is the time needed to read one cell of the analog memory and convert it into a digital word, 1.4 µs, multiplied by the number of cells, 256. This means that a new trigger signal will not be accepted until 358.4 µs have passed after the previous event. Every 1.4 µs each detector produces 512 bytes of data, so at least ten 8-bit buses per detector working at 40 MHz would be required for data transfer. Unfortunately the space on the ladder is very limited, and managing 80 data lines per detector (for a total of 320 for the half-ladder) is a very serious problem, especially for the input connections to the end-ladder readout units.
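The figures above follow from simple arithmetic; a minimal check, with the numbers taken from the text:

```python
# Dead time: 256 analog-memory cells, 1.4 us to read and convert each one
cells, t_cell_us = 256, 1.4
dead_time_us = cells * t_cell_us            # 358.4 us

# Raw output rate: 512 bytes per detector every 1.4 us
bytes_per_step = 512
rate_MB_s = bytes_per_step / t_cell_us      # ~366 MB/s per detector

# One 8-bit bus at 40 MHz moves 40 MB/s, hence the need for at least 10 buses
bus_MB_s = 40
buses_needed = -(-rate_MB_s // bus_MB_s)    # ceiling division
```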
The adopted solution, the insertion of a digital multi-event buffer on the front-end readout unit between PASCAL and CARLOS, allows data to be sent towards the end-ladder unit at a lower speed: if another event arrives while data are being transmitted from AMBRA to CARLOS, another digital buffer on AMBRA is ready to accept the data coming from PASCAL. Data are transferred from AMBRA to CARLOS over an 8-bit bus in 1.65 ms (25 ns × 64 Kwords) while other events are processed by PASCAL and sent to AMBRA. For an average Pb-Pb event rate of 40 Hz and using a double-event digital buffer, our simulations indicate that the dead time due to buffer overrun is only 0.1% of the total time. This is the fraction of time during which AMBRA is transferring data to CARLOS while the other buffer in AMBRA is full: in this situation a BUSY signal is asserted towards the CTP, meaning that no further trigger can be accepted. In order to reach a much smaller dead time even at higher event rates, a decision was taken to make the AMBRA device four buffers deep.

In order to allow full testability of the readout electronics at the board and system levels, the ASICs embody a standard JTAG interface. In this way it is possible to test each chip after the various assembly stages and during the run phase, in order to check correct functionality.
Layer   Ladders   Detectors/ladder   Data/ladder   Total data
3       14        6                  768 KBytes    10.5 MBytes
4       22        8                  1 MByte       22 MBytes
Both                                               32.5 MBytes

Table 1.3: Total amount of data produced by SDDs
The same interface is used to download control information into the chips.

A radiation-tolerant deep-submicron process (0.25 µm) has been used for the final versions of the ASICs. These technologies are now available and allow us to reduce size and power consumption with no degradation of the signal processing speed. Moreover, it has been shown that, when specific layout techniques are used, they have a better resistance to radiation than commercially available technologies.
1.4.3 End-ladder module

The end-ladder modules are located at both ends of each ladder (two per ladder); they receive data from the front-end modules, perform data compression with the CARLOS chip and send the data to the DAQ through an optical fibre link.

Besides that, the end-ladder board will host the TTCrx device, a chip receiving the global clock and trigger signals from the CTP and distributing them to PASCAL, AMBRA and CARLOS, as well as the power regulators for the complete ladder system.
CARLOS receives 8 data streams coming from 8 half-detectors, i.e. from one half-ladder, for a total data volume of 64 KBytes × 8 = 512 KBytes, at an input rate of 320 MByte/s. Taking into account the number of ladders and detectors per ladder (see Table 1.3), the total volume of data produced by all the SDD modules amounts to around 32.5 MBytes per event, while the space reserved on disk for permanent storage is 1.5 MBytes. This calls for a compression algorithm with a compression coefficient of at least 22 and a reconstruction error as low as possible, in order to minimize the loss of physical information. Moreover, since the trigger rate in proton-proton interactions amounts to 1 kHz, each event should be compressed and sent to the DAQ system within 1 ms. Actually, thanks to the buffering provided by the AMBRA chips, this processing time doubles to 2 ms, thus relaxing the timing constraint on the CARLOS chip.

These constraints led us to the design and implementation of a first prototype of CARLOS. The desire for better compression performance, together with changes in the readout architecture due to the presence of radiation, then led us to the design and implementation of two further CARLOS prototypes. We are now going to design CARLOS v4, which is intended to be the final version of the compression ASIC. The first three prototypes of CARLOS are explained in detail in chapters 3 and 4, while chapter 2 contains a review of existing compression techniques.
1.4.4 Choice of the technology

The effects of radiation on electronic circuits can be divided into total dose effects and single event effects [6]. Total dose modifies the thresholds of MOS transistors and increases leakage currents. This is of particular concern in leakage-sensitive analog circuits, such as analog memories. For instance, assuming a value of 1 pF for the storage capacitors in the memory, a leakage current as small as 1 nA would change the value of the stored information by 0.2 V in 200 µs. This is of course unacceptable.

Radiation-tolerant layout practices prevent this risk and their use in analog circuits is therefore recommended. These design techniques become extremely effective in deep-submicron CMOS technologies. Single event effects can trigger latch-up phenomena or can change the value of digital bits (Single Event Upset, SEU). Latch-up can be prevented with the systematic use of guard rings in the layout. Single event upset can
be a problem especially when it occurs in the digital control logic, and can be prevented by layout techniques or by redundancy in the system.

Radiation-tolerant layouts of course carry area penalties. It can be estimated that, in a given technology, a minimum-size inverter with radiation-tolerant layout is 70% bigger than the corresponding inverter with standard layout. Nevertheless, a radiation-tolerant inverter in a quarter-micron technology is about eight times smaller than a standard inverter in a 0.8 µm technology. The radiation dose which will be received by the readout electronics will be quite low, below 100 krad in 10 years. This value is probably below the limit of what a standard technology can withstand; however, conservative considerations suggested the use of radiation-tolerant techniques for the critical parts of the circuit. These techniques have been proven to work up to 30 Mrad and entail a lower area penalty and lower cost compared with radiation-hard processes. Thus the library chosen for the implementation of the PASCAL, AMBRA and CARLOS chips is the 0.25 µm IBM technology with standard cells designed at CERN to be radiation tolerant.
Chapter 2

Data compression techniques

Data compression [7] is the art or science of representing information in a compact form. These compact representations are created by identifying and using structures that exist in the data. Data can be characters in a text file, numbers that are samples of speech or image waveforms, or sequences of numbers generated by physical processes. Data compression plays an important role in many fields, for example in the transmission of digital television signals. If we wanted to transmit an HDTV (High Definition TeleVision) signal without any compression, we would need to transmit about 884 Mbits/s. Using data compression, we need to transmit less than 20 Mbits/s along with audio information. Compression is now very much a part of everyday life. If you use computers, you are probably using a variety of products that make use of compression. Most modems now have compression capabilities that allow data to be transmitted many times faster than otherwise possible. File compression utilities, which let us store more on our disks, are now commonplace.

This chapter contains an introduction to data compression with a description of the most commonly used compression algorithms, with the aim of finding the compression technique best suited to the physical data coming out of the SDD.
2.1 Applications of data compression

An early example of data compression is the Morse code, developed by Samuel Morse in the mid-19th century. Letters sent by telegraph are encoded with dots and dashes. Morse noticed that certain letters occurred more often than others. In order to reduce the average time required to send a message, he assigned shorter sequences to letters that occur more frequently, such as a (· −) and e (·), and longer sequences to letters that occur less frequently, such as q (− − · −) or j (· − − −). What is being used to provide compression in the Morse code is the statistical structure of the message to be compressed, i.e. the fact that some letters occur in the message with a higher probability than others. Indeed, most compression techniques exploit the statistical structure of the input to provide compression, but this is not the only kind of structure that exists in the data.

There are many other kinds of structure, in data of different types, that can be exploited for compression. Let us take speech as an example. When we speak, the physical construction of our voice box dictates the kinds of sounds that we can produce; that is, the mechanics of speech production impose a structure on speech. Therefore, instead of transmitting the sampled speech itself, we could send information about the conformation of the voice box, which could be used by the receiver to synthesize the speech. An adequate amount of information about the conformation of the voice box can be represented much more compactly than the sampled values of the speech. This compression approach is currently being used in a number of applications, including the transmission of speech over mobile radios and the synthetic voice in toys that speak.
Data compression can also take advantage of redundant structure in the input signal, that is, structure carrying more information than needed. For example, if a sound has to be transmitted in order to be heard by a human being, all frequencies below 20 Hz and above 20 kHz can be eliminated (thus providing compression), since these frequencies cannot be perceived by humans.
2.2 Remarks on information theory

Without going into details, we just want to recall Shannon's theorem [8]. He defines the information content of a message in the following way: given a message made up of N characters in total and containing n different symbols, the information content of the message, measured in bits, is:

I = N \sum_{i=1}^{n} (-p_i \log_2 p_i)    (2.1)

where p_i is the occurrence probability of symbol i. What is regarded as a symbol depends on the application: it might be an ASCII code, 16- or 32-bit words, words in a text and so on.
A practical illustration of Shannon's theorem is the following: let us assume we measure a charge or any other physical quantity using an 8-bit digitizer. Very often the measured quantities will be distributed approximately exponentially. Let us assume that the mean value of the statistical distribution is one tenth of the dynamic range, i.e. 25.6. Each value between 0 and 255 is regarded as a symbol. Applying Shannon's formula with n = 256 and p_i = e^{-(i+0.5)/25.6}/25.6, we obtain a mean information content I/N of 6.11 bits per measured value, which is almost 25% less than the 8 bits we need when saving the data as a sequence of bytes. Even if we had increased the dynamic range by a factor of 4 using a 10-bit ADC, it turns out that the mean information content expressed as the number of bits per measurement would have been virtually the same, and hence the possible compression gain even higher (39%). This might be surprising, but considering that an exponential distribution delivers a value beyond ten times the mean only once every e^{10} ≈ 22026 samples, it is clear that even using a quite long code for such measurements cannot have an appreciable influence on the compression rates. Considering that in all likelihood, in a realistic architecture, we would have had to expand the 10 bits to 16, the gain is an impressive 62% in the latter case.
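The worked example above can be reproduced numerically from equation (2.1); a sketch in which I/N is computed directly from the assumed exponential distribution:

```python
import math

mean = 25.6   # mean of the exponential distribution, 1/10 of the 8-bit range

# Occurrence probability of each of the 256 symbols, as in the text
p = [math.exp(-(i + 0.5) / mean) / mean for i in range(256)]

# Mean information content per measured value, in bits (equation 2.1 with N = 1)
bits = sum(-pi * math.log2(pi) for pi in p)   # ~6.1 bits per value

saving_8bit = 1 - bits / 8     # ~25% saved with respect to 8-bit storage
saving_16bit = 1 - bits / 16   # ~62% saved with respect to 16-bit storage
```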
The exponential distribution is a good approximation of the raw data in many cases, and in particular of the data coming out of the SDD. Comparing various probability distributions with the same RMS, it appears that the exponential distribution is particularly hard to compress. For instance, a discrete spectrum distributed according to a Gaussian with the same RMS as the above exponential only has an information content of 4.75 bits.
2.3 Compression techniques

When we speak of a compression technique or a compression algorithm, we actually refer to two algorithms: the first one takes an input X and generates a representation X_C that requires fewer bits; the second one is a reconstruction algorithm that operates on the compressed representation X_C to generate the reconstruction Y. Based upon the requirements of reconstruction, data compression schemes can be divided into two broad classes:

– lossless compression schemes, in which Y is identical to X;

– lossy compression schemes, which generally provide much higher compression than lossless ones, but force Y to be different from X.

In fact, Shannon showed that the best performance achievable by a lossless compression algorithm is to encode a stream with an average number of bits equal to the I/N value. Lossy algorithms, on the contrary, have no upper bound on the compression ratio.
2.3.1 Lossless compression
Lossless compression techniques involve no loss of information. If data have been losslessly compressed, the original data can be recovered exactly from the compressed data. Lossless compression is generally used for discrete data, such as text, computer-generated data and some kinds of image and video information. There are many situations requiring compression where we want the reconstruction to be identical to the original. There are also a number of situations in which it is possible to relax this requirement in order to get more compression: in these cases lossy compression techniques have to be used.
2.3.2 Lossy compression

Lossy compression techniques involve some loss of information, and data that have been compressed using lossy techniques generally cannot be recovered or reconstructed exactly. In return for accepting distortion in the reconstruction, we can generally obtain much higher compression ratios than is possible with lossless compression. Whether the distortion introduced is acceptable or not depends on the specific application: for instance, if the input source X contains physical information plus noise, while the output Y contains only the physical signal, the distortion introduced is completely acceptable.
2.3.3 Measures of performance

A compression algorithm can be evaluated in a number of different ways. We could measure the relative complexity of the algorithm, the memory required to implement it, how fast it performs on a given machine or on dedicated hardware, the amount of compression and how closely the reconstruction resembles the original. The last two features are the most important ones for our application to SDD data.
A very logical way of measuring how well a compression algorithm compresses a given set of data is to look at the ratio of the number of bits required to represent the data before compression to the number of bits required to represent the data after compression. This ratio is called the compression ratio. Suppose we store an image made up of a square array of 256 × 256 8-bit pixels (exactly like a half-SDD): it requires 64 KBytes. If the compressed image requires only 16 KBytes, we would say that the compression ratio is 4.

Another way of reporting compression performance is to provide the average number of bits required to represent a single sample. This is generally referred to as the rate. For instance, for the same image described above, if the average number of bits per pixel in the compressed representation is 2, the rate is 2 bits/pixel.

In lossy compression the reconstruction differs from the original data. Therefore, in order to determine the efficiency of a compression algorithm, we have to find some way to quantify the difference. The difference between the original data and the reconstructed data is often called the distortion, and is usually calculated as a mathematical or percentage difference between the data before and after compression.
2.3.4 Modelling and coding

The development of data compression algorithms for a variety of data types can be divided into two phases. The first phase is usually referred to as modelling: in this phase we try to extract information about any redundancy that exists in the data and describe the redundancy in the form of a model. The second phase is called coding: the description of the model, and a description of how the data differ from the model, are encoded, generally using a binary alphabet.
2.4 Lossless compression techniques

This section contains an explanation of the most widely used lossless compression techniques. In particular, the following items are covered:

– Huffman coding;

– run length encoding;

– differential encoding;

– dictionary techniques;

– selective readout.

Some of these algorithms have been chosen for direct application in the 1D compression algorithm implemented in the prototypes CARLOS v1 and v2.
2.4.1 Huffman coding
The Huffman compression algorithm [7] encodes data samples as follows: symbols that occur more frequently (i.e. symbols having a higher probability of occurrence) are given shorter codewords than symbols that occur less frequently. This leads to a variable-length coding scheme, in which each symbol can be encoded with a different number of bits. The choice of the code assigned to each symbol, in other words the design of the Huffman look-up table, follows a standard procedure, which is best explained with an example.
Suppose we have 5 symbols a1, a2, a3, a4 and a5, each with a probability of occurrence: P(a1) = 0.2, P(a2) = 0.4, P(a3) = 0.2, P(a4) = 0.1, P(a5) = 0.1. First, in order to derive the encoding c(ai) of each symbol ai, the symbols must be ordered from the most probable to the least probable, as shown in Tab. 2.1.
Data Probability Code
a2 0.4 c(a2)
a1 0.2 c(a1)
a3 0.2 c(a3)
a4 0.1 c(a4)
a5 0.1 c(a5)
Table 2.1: Sample data and probability of occurrence
The least probable symbols are a4 and a5; they are assigned the following codes:

c(a4) = α1 ∗ 0    (2.2)
c(a5) = α1 ∗ 1    (2.3)

where α1 is a generic binary string and ∗ represents the concatenation of two strings.
If a′4 is a symbol for which P(a′4) = P(a4) + P(a5) = 0.2 holds, then the data in Tab. 2.1 can be reordered from the most probable to the least probable, as shown in Tab. 2.2.
Data Probability Code
a2 0.4 c(a2)
a1 0.2 c(a1)
a3 0.2 c(a3)
a′4 0.2 α1
Table 2.2: Introduction of data a′4
In this table the least probable symbols are a3 and a′4; they can be encoded in the following way:

c(a3) = α2 ∗ 0    (2.4)
c(a′4) = α2 ∗ 1    (2.5)

Nevertheless, being c(a′4) = α1 from Tab. 2.2, from (2.5) it follows
that α1 = α2 ∗ 1, and then (2.2) and (2.3) become:

c(a4) = α2 ∗ 10    (2.6)
c(a5) = α2 ∗ 11    (2.7)

Defining a′3 as the symbol for which P(a′3) = P(a3) + P(a′4) = 0.4, the data of Tab. 2.2 can be reordered from the most probable to the least probable as shown in Tab. 2.3.
Data Probability Code
a2 0.4 c(a2)
a′3 0.4 α2
a1 0.2 c(a1)
Table 2.3: Introduction of data a′3
In Tab. 2.3 the least probable symbols are a′3 and a1; they can be encoded in the following way:

c(a′3) = α3 ∗ 0    (2.8)
c(a1) = α3 ∗ 1    (2.9)

Being c(a′3) = α2 from Tab. 2.3, from (2.8) it follows that α2 = α3 ∗ 0, so (2.4), (2.6) and (2.7) become:
c(a3) = α3 ∗ 00    (2.10)
c(a4) = α3 ∗ 010    (2.11)
c(a5) = α3 ∗ 011    (2.12)

Finally, by defining a′′3 as the symbol for which P(a′′3) = P(a′3) + P(a1) = 0.6 holds, the data of Tab. 2.3 can be reordered from the most probable to the least probable as shown in Tab. 2.4.
Data Probability Code
a′′3 0.6 α3
a2 0.4 c(a2)
Table 2.4: Introduction of data a′′3
With only two symbols left, the encoding is immediate:

c(a′′3) = 0    (2.13)
c(a2) = 1    (2.14)

Moreover, being c(a′′3) = α3, as shown in Tab. 2.4, from (2.13) it follows that α3 = 0, i.e. (2.9), (2.10), (2.11) and (2.12) can be written as:

c(a1) = 01    (2.15)
c(a3) = 000    (2.16)
c(a4) = 0010    (2.17)
c(a5) = 0011    (2.18)
Tab. 2.5 contains a complete view of the Huffman table generated so far.
The method used for building the Huffman table in this example can be applied as it is to any data stream, whatever its statistical structure. The Huffman codes c(ai) generated in this way can be uniquely decoded: from a sequence of variable-length codes c(ai) created with Huffman coding, only one data sequence ai can be reconstructed.
Moreover, as shown in the example of Tab. 2.5, none of the codes c(ai) is contained as a prefix in the other codes; codes with this property are called prefix codes. Prefix codes are always uniquely decodable, while the converse does not always hold true.
Finally, a Huffman code is an optimal code: among all prefix codes, it is the one that minimizes the average code length.
Data Probability Code
a2 0.4 1
a1 0.2 01
a3 0.2 000
a4 0.1 0010
a5 0.1 0011
Table 2.5: Huffman table
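The construction just described can be sketched in a few lines of Python (an illustrative sketch, not the CARLOS implementation). Several equivalent optimal codes exist, differing only in how ties between equal probabilities are broken, so the checks below target the invariant quantities: the minimum average code length (2.2 bits/symbol for the data of Tab. 2.5) and the prefix property.

```python
import heapq

def huffman_code(probs):
    # Repeatedly merge the two least probable nodes, prefixing their
    # codewords with 0 and 1, as in the step-by-step example above.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)  # tie-breaker so code dicts are never compared
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
codes = huffman_code(probs)
# Invariant of every optimal code for these probabilities: 2.2 bits/symbol.
avg = sum(probs[s] * len(codes[s]) for s in probs)
assert abs(avg - 2.2) < 1e-9
# Prefix property: no codeword is a prefix of another.
words = list(codes.values())
assert not any(u != v and v.startswith(u) for u in words for v in words)
```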
2.4.2 Run Length encoding
A data stream very often contains long sequences of the same value: this may happen when a physical quantity holds the same value for several sampling periods, in text files where a character is repeated several times, in digital images where areas of uniform color are encoded as pixels with the same value, and so on. The compression algorithm based on Run Length encoding [9] is well suited to such repetitive data.
As shown in the example of Fig. 2.1, where the zero symbol has been chosen as the repetitive value, each zero run in the original sequence is encoded as a pair of words: the first contains the code for the zero symbol, the second contains the number of further zeros following the first one, i.e. the run length minus one (so a run of three zeros becomes the pair 0 2).
The performance of the algorithm, in terms of compression ratio, improves when the input data stream contains long runs of the repeated symbol and few isolated occurrences of it: an isolated zero, such as the one encoded as the pair 0 0 in Fig. 2.1, actually doubles in size. Finally, this compression algorithm can be implemented in different ways: it can be applied to only one value of the original data alphabet or to several different values.
One of the most important applications of Run Length encoding is facsimile (fax) compression. In facsimile transmission a page is scanned and converted into a sequence of white and black pixels: since very long runs of white or black pixels are highly probable, coding the lengths of the runs instead of the individual pixels
Original sequence: 17 8 54 0 0 0 97 5 16 0 45 23 0 0 0 0 43
Run Length encoded sequence: 17 8 54 0 2 97 5 16 0 0 45 23 0 3 43
Figure 2.1: Run length encoding
leads to high compression ratios. Moreover, Run Length encoding is often used in conjunction with other compression algorithms, after the input data stream has been transformed into a more compressible form.
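The encoding of Fig. 2.1 can be reproduced with a short illustrative sketch, adopting the convention, taken from the figure, that the stored count is the run length minus one:

```python
def rle_encode(seq, symbol=0):
    # Encode runs of `symbol` as the pair (symbol, run_length - 1),
    # as in Fig. 2.1; all other values pass through unchanged.
    out, i = [], 0
    while i < len(seq):
        if seq[i] == symbol:
            run = 1
            while i + run < len(seq) and seq[i + run] == symbol:
                run += 1
            out += [symbol, run - 1]
            i += run
        else:
            out.append(seq[i])
            i += 1
    return out

original = [17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 43]
assert rle_encode(original) == [17, 8, 54, 0, 2, 97, 5, 16, 0, 0, 45, 23, 0, 3, 43]
```

Note how the isolated zero becomes the two-word pair 0 0, which is why isolated occurrences of the repeated symbol hurt the compression ratio.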
2.4.3 Differential encoding
Differential encoding [7] replaces each sample with the difference between it and the previous one; the first sample is left unchanged, as shown in Fig. 2.2.
Notice that each value of the original sequence can be reconstructed by summing the corresponding value in the coded sequence with all the previous ones: for instance, 89 = 79 + 17 + 2 + 5 + 0 + 0 + (−3) + (−6) + (−5). It is therefore essential to leave the first value of the coded sequence unchanged, otherwise the reconstruction cannot be carried out correctly. The differential algorithm is well suited to all data sequences with very small changes in value between consecutive samples: for such data streams the differential encoding produces an encoded stream with a smaller dynamic range, i.e. the difference between the maximum and minimum values of the encoded stream is smaller than the same quantity calculated on the original sequence. The encoded sequence can therefore be represented with a smaller number of bits than the original one.
Original sequence: 17 19 24 24 24 21 15 10 89 95 96 96 96 95 94 94 95 ...
Sequence after differential encoding: 17 2 5 0 0 −3 −6 −5 79 6 1 0 0 −1 −1 0 1
Figure 2.2: Differential encoding
Moreover, differential encoding can be used in conjunction with Run Length encoding: a sequence containing long runs of equal values is converted into a sequence of zeros by the differential encoder, and can then be further compressed by the Run Length encoder.
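A minimal sketch of the encoder and of the reconstruction by cumulative sums, checked against the sequences of Fig. 2.2:

```python
def diff_encode(seq):
    # First sample unchanged, then successive differences (Fig. 2.2).
    return [seq[0]] + [seq[i] - seq[i - 1] for i in range(1, len(seq))]

def diff_decode(enc):
    # Reconstruct each sample as the running sum of the differences.
    out = [enc[0]]
    for d in enc[1:]:
        out.append(out[-1] + d)
    return out

original = [17, 19, 24, 24, 24, 21, 15, 10, 89, 95, 96, 96, 96, 95, 94, 94, 95]
encoded = diff_encode(original)
assert encoded == [17, 2, 5, 0, 0, -3, -6, -5, 79, 6, 1, 0, 0, -1, -1, 0, 1]
assert diff_decode(encoded) == original
```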
2.4.4 Dictionary techniques
In many applications, the output of a source consists of recurring patterns. A classical example is a text source in which certain patterns or words recur frequently, while other patterns simply do not occur or occur only rarely. A very reasonable approach to encoding such sources is to keep a list, or dictionary, of frequently occurring patterns. When these patterns appear in the source, they are encoded with a reference to the dictionary, i.e. the address of the corresponding table location. If a pattern does not appear in the dictionary, it can be encoded using some other, less efficient, method. In effect the input domain is split into two classes: frequently occurring patterns and infrequently occurring patterns. For this technique to be effective, the class of frequently occurring patterns, and hence the size of the dictionary, must be much smaller than the number of all possible patterns. Depending upon how much information is available to build the dictionary, either a static or a dynamic approach to its creation can be used. Choosing a static dictionary technique is most appropriate when considerable prior knowledge about the source is available.
When no a priori information on the structure of the input source is available, an adaptive technique is adopted: the UNIX compress command, for example, makes use of this technique. It starts with a dictionary of size 512, thus transmitting 9-bit codewords. Once the dictionary has filled up, its size is doubled to 1024 entries, and 10-bit codewords are transmitted. The dictionary size is progressively doubled as it fills up until it contains 2^16 entries, at which point compress becomes a static coding technique. From then on the algorithm monitors the compression ratio: if it falls below a threshold, the dictionary is flushed and the dictionary building process is restarted.
Dictionary techniques are also used in the image compression field by the GIF (Graphics Interchange Format) standard, which works in a very similar way to the compress command.
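The core of the adaptive dictionary scheme (LZW, the algorithm underlying compress and GIF) can be sketched as follows; this illustrative version keeps only the essential idea of growing the dictionary as patterns are seen, omitting the variable codeword width and the dictionary flushing described above:

```python
def lzw_encode(data: bytes):
    # Dictionary initialized with all single-byte patterns.
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)   # adaptive step: grow the dictionary
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes):
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        # Special case: the code may refer to the entry being built.
        entry = table[code] if code in table else w + w[:1]
        out += entry
        table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)

data = b"ababababab"
codes = lzw_encode(data)
assert lzw_decode(codes) == data
assert len(codes) < len(data)   # recurring patterns are compressed
```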
2.4.5 Selective readout
The selective readout technique [10] is a lossless data compression technique usually applied in High Energy Physics experiments. Since the really interesting data are a small fraction of the total amount of data actually produced, it proves useful to transmit and store only those data. Selective readout can reduce the data size by identifying regions in space containing a significant amount of energy. For example, in the SDD case the Central Trigger Processor (CTP) unit defines a Region Of Interest (ROI) which, event by event, contains the information about which ladders are to be read out and which ones can be discarded. Using the ROI feature a very high compression ratio can be achieved.
2.5 Lossy compression techniques
This section describes the most widely used lossy compression techniques. In particular the following items are covered:
– zero suppression;
– transform coding;
– sub-band coding, with some remarks on wavelets.
The first of these algorithms has been chosen for direct application in the 1D compression algorithm implemented in the prototypes CARLOS v1 and v2.
2.5.1 Zero suppression
Zero suppression is the very simple technique of eliminating the data samples below a certain threshold, by setting them to 0. It proves very useful for data containing large quantities of zeros with the interesting values concentrated in small clusters: for instance, since the mean occupancy of an SDD in the inner layer is 2.5%, a compression ratio of 40 can be obtained with the zero suppression technique alone.
A complication arises because the SDD data and, in general, data collections contain the sum of two different distributions: the real signal corresponding to the interesting physical event, and white noise with a Gaussian distribution around a mean value. Hence, if a lossy compression algorithm obtains a good compression ratio just by eliminating the noise, the distortion introduced is perfectly acceptable. The key task for a sound implementation of the zero suppression technique is the choice of the right value of the threshold parameter, in order to eliminate the noise while preserving the physical signal.
In the case of data coming from the SDD detector and the related front-end electronics, the data values are shifted from the 0 level to a baseline level greater than 0. This baseline level corresponds to the mean value of the noise introduced by the preamplification electronics; around this value there is a spread given by the RMS of the Gaussian distribution of the noise.
The noise level introduced by the electronics may vary with time and with the amount of radiation absorbed: a compression algorithm making use of the zero suppression technique therefore has to allow a tunable threshold level, in order to accommodate fluctuations or drifts in the baseline values. Following this indication, the threshold level used in CARLOS v1 and v2 is completely presettable via software using the JTAG port.
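A minimal sketch of the operation (the baseline and threshold values below are purely illustrative, not the actual CARLOS settings):

```python
def zero_suppress(samples, baseline, threshold):
    # Keep a sample only if it exceeds the (presettable) threshold
    # above the baseline; everything else is set to 0.
    return [s if s - baseline > threshold else 0 for s in samples]

# baseline ~ mean noise level; threshold tuned above the noise RMS
noisy = [21, 20, 23, 58, 60, 22, 19]
assert zero_suppress(noisy, baseline=20, threshold=5) == [0, 0, 0, 58, 60, 0, 0]
```

Only the cluster of samples well above the baseline survives; the noise fluctuations around it are suppressed.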
2.5.2 Transform coding
Transform coding [7] takes a data sequence as input and transforms it into a sequence in which most of the information is concentrated in a few samples: the new sequence can then be further compressed using the compression algorithms described so far. The key point of transform coding is the choice of the transform, which depends on the features and redundancies of the input data stream to compress. The algorithm, working on N elements at a time, consists of three steps:
– transform: the input sequence {sn} is split into blocks of N samples; each block is then mapped, using a reversible transformation, into the sequence {cn};
– quantization: the transformed sequence {cn} is quantized, i.e. a number of bits is assigned to each sample depending on the dynamic range of the sequence, the desired compression ratio and the acceptable distortion;
– coding: the quantized sequence {cn} is encoded using a binary encoding technique such as Run Length encoding or Huffman coding.
These concepts can be expressed in a mathematical way: the input sequence {sn} is divided into blocks of N samples and each block is mapped
using the reversible transform A into the sequence {cn}:

c = A s    (2.19)

or, in other terms:

c_n = Σ_{i=0}^{N−1} s_i a_{n,i},  with [A]_{i,j} = a_{i,j}    (2.20)

The quantization and encoding steps are performed on the sequence {cn}, so as to optimize compression.
The decompression algorithm, by means of the inverse transform B = A^{−1}, reconstructs the original sequence {sn} from the encoded sequence {cn} in the following way:

s = B c    (2.21)

or:

s_n = Σ_{i=0}^{N−1} c_i b_{n,i},  with [B]_{i,j} = b_{i,j}    (2.22)
These concepts can be easily extended to two-dimensional data, such as images or 2-D charge distributions, as in the case of the SDDs.
Let us take an N × N portion of a digital image S, with S_{i,j} its (i, j)-th pixel; performing a reversible two-dimensional transform working on N × N pixels at a time, with kernel elements a_{i,j,k,l} and with C_{k,l} the (k, l)-th coefficient of the N × N block of the transformed image C, the following holds true:

C_{k,l} = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} S_{i,j} a_{i,j,k,l}    (2.23)
A transform is defined separable if the 2D transform of an N × N block can be applied by first performing a 1D transform on the N rows of the block and then a 1D transform on the N columns of the result; by choosing a separable transform, (2.23) becomes:

C_{k,l} = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} S_{i,j} a_{k,i} a_{l,j}    (2.24)
or, expressed in matrix form:

C = A S A^T    (2.25)

The inverse transform is the following one:

S = B C B^T    (2.26)

Frequently orthonormal transforms are used, so that B = A^{−1} = A^T; in this way calculating the inverse transform reduces to:

S = A^T C A    (2.27)
In the two-dimensional case as well, a good transform has to be chosen in order to reach a high compression ratio. For instance, until the year 2000 the JPEG standard adopted the Discrete Cosine Transform, known as DCT.
If A is the matrix representing the DCT, its elements are given by:

[A]_{i,j} = w(i) cos( (2j + 1) i π / (2N) ),  i, j = 0, 1, …, N − 1    (2.28)

where:

w(i) = √(1/N)  for i = 0
w(i) = √(2/N)  for i = 1, …, N − 1

Fig. 2.3 gives a graphical interpretation of (2.28).
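Definition (2.28) can be verified numerically: the sketch below builds the DCT matrix and checks the orthonormality A A^T = I that justifies the simple inverse transform of (2.27).

```python
import math

def dct_matrix(N):
    # Build the N x N DCT matrix of Eq. (2.28):
    # [A]_{i,j} = w(i) cos((2j + 1) i pi / (2N))
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        w = math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N)
        for j in range(N):
            A[i][j] = w * math.cos((2 * j + 1) * i * math.pi / (2 * N))
    return A

A = dct_matrix(8)
# Orthonormality check: A * A^T must be the identity matrix, so the
# inverse transform is simply the transpose, as used in (2.27).
for i in range(8):
    for k in range(8):
        dot = sum(A[i][j] * A[k][j] for j in range(8))
        assert abs(dot - (1.0 if i == k else 0.0)) < 1e-9
```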
After choosing the transform, the next step consists in the quantization of the transformed image.
Several approaches are possible: for example, zonal mapping foresees a preliminary analysis of the statistics of the transformed coefficients and a later assignment of a fixed number of bits.
The name zonal mapping comes from the assignment of a fixed number of bits depending on the zone in which each coefficient is placed within the square N × N block under study; Tab. 2.6 reports a bit allocation
Figure 2.3: Base coefficients for the bi-dimensional DCT in the case N = 8
8 7 5 3 1 1 0 0
7 5 3 2 1 0 0 0
4 3 2 1 1 0 0 0
3 3 2 1 1 0 0 0
2 1 1 1 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Table 2.6: Bit allocation table for an 8 × 8 block
table for an 8 × 8 block.
It is interesting to note that the quantization of Tab. 2.6 assigns zero bits to the coefficients in the lower-right part of the table: this is actually equivalent to ignoring those coefficients. This kind of quantization makes sense because the lower-right coefficients come from the transformation of the original image with high-frequency cosines, i.e. they contain the information corresponding to the high frequencies of the original signal (see Fig. 2.3).
Since the response of the human eye strongly depends on frequency and, in particular, is sensitive to variations at low frequencies and far less sensitive at higher frequencies, the quantization of Tab. 2.6 tends to ignore information that the human eye would not appreciate at all.
After quantization, only the non-null coefficients are transmitted. In particular, for every non-null coefficient two words have to be transmitted: the first with the quantized value of the coefficient itself, the second with the number of null samples occurring since the last non-null coefficient. This allows the decompression algorithm to exactly reconstruct the sequence as it was quantized and, from that, the original image.
As an example, let us consider the 8 × 8 block of 8-bit pixels reported in Tab. 2.7.
124 125 122 120 122 119 117 118
121 121 120 119 119 120 120 118
126 124 123 122 121 121 120 120
124 124 125 125 126 125 124 124
127 127 128 129 130 128 127 125
143 142 143 142 140 139 139 139
150 148 152 152 152 152 150 151
156 159 158 155 158 158 157 156
Table 2.7: 8 × 8 block of a digital image
Each value of the block is shifted by 2^(p−1), where p is the number of bits per pixel (in this case p = 8); then the DCT is applied to the block, obtaining the coefficients c_{i,j} reported in Tab. 2.8.
39.88 6.56 -2.24 1.22 -0.37 -1.08 0.79 1.13
-102.43 4.56 2.26 1.12 0.35 -0.63 -1.05 -0.48
37.77 1.31 1.77 0.25 -1.50 -2.21 -0.10 0.23
-5.67 2.24 -1.32 -0.81 1.41 0.22 -0.13 0.17
-3.37 -0.74 -1.75 0.77 -0.62 -2.65 -1.30 0.76
5.98 -0.13 -0.45 -0.77 1.99 -0.26 1.46 0.00
3.97 5.52 2.39 -0.55 -0.051 -0.84 -0.52 -0.13
-3.43 0.51 -1.07 0.87 0.96 0.09 0.33 0.01
Table 2.8: DCT coefficients related to the block in Tab. 2.7
As already stated, the high-frequency coefficients in the lower-right corner tend to be quite close to 0, while most of the information is concentrated in the upper-left corner.
The quantization of the coefficients is performed using a reference table such as Tab. 2.9; in particular the quantized values l_{i,j} are obtained with the following formula:

l_{i,j} = ⌊ c_{i,j} / Q_{i,j} + 0.5 ⌋    (2.29)

where Q_{i,j} is the (i, j)-th element of the quantization table and ⌊x⌋ denotes the greatest integer not exceeding x.
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
Table 2.9: Quantization table
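Formula (2.29) can be checked on a few entries of Tab. 2.8 (coefficients) and Tab. 2.9 (quantizers); the quantized values obtained below match the corresponding nonzero entries of Tab. 2.10.

```python
import math

def quantize(c, q):
    # Eq. (2.29): l_ij = floor(c_ij / Q_ij + 0.5), i.e. rounding to the
    # nearest integer (ties rounded up).
    return math.floor(c / q + 0.5)

# First column of Tab. 2.8 against the first column of Tab. 2.9:
assert quantize(39.88, 16) == 2
assert quantize(-102.43, 12) == -9
assert quantize(37.77, 14) == 3
assert quantize(6.56, 11) == 1
```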
Tab. 2.10 contains the quantized coefficients obtained using the values of the quantization table Tab. 2.9.
After studying the structure of matrices like Tab. 2.10, the order chosen for sending the coefficients is the one shown in Fig. 2.4. This choice makes it highly probable that the final part of the sequence contains a long run of zero coefficients, so this part of the sequence can be encoded using the Run Length technique.
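The zig-zag order of Fig. 2.4 can be generated by walking the anti-diagonals of the block in alternating directions (an illustrative sketch):

```python
def zigzag_order(N=8):
    # Enumerate (row, col) indices diagonal by diagonal, alternating
    # direction, as in the JPEG zig-zag scan of an N x N block.
    order = []
    for d in range(2 * N - 1):
        cells = [(i, d - i) for i in range(N) if 0 <= d - i < N]
        # odd diagonals run top-right -> bottom-left, even ones the reverse
        order.extend(cells if d % 2 else reversed(cells))
    return order

order = zigzag_order(8)
assert order[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
assert len(order) == 64 and order[-1] == (7, 7)
```

Reading Tab. 2.10 in this order gives 2, 1, −9, 3 followed by 60 zeros, which is exactly the long zero run that Run Length encoding exploits.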
2 1 0 0 0 0 0 0
-9 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Table 2.10: Quantized coefficients
Figure 2.4: Zig-zag scanning pattern for an 8 × 8 transform
2.5.3 Subband coding
A signal can be decomposed into different frequency components (see Fig. 2.5) using analog or digital filters; each resulting signal can then
be encoded and compressed using a specific algorithm. Digital filtering [9] involves taking a weighted sum of the current and past inputs to the filter and, in some cases, of the past outputs of the filter. The general form of the input-output relationship of the filter is given by:

y_n = Σ_{i=0}^{N} a_i x_{n−i} + Σ_{i=1}^{M} b_i y_{n−i}    (2.30)

where the sequence x_n is the input to the filter, the sequence y_n is the output from the filter, and the values a_i and b_i are called the filter coefficients. If the input sequence is a single 1 followed by all 0s, the output sequence is called the impulse response of the filter. The impulse response completely specifies the filter: once we know the impulse response, we know the relationship between the input and the output of the filter. Notice that if the b_i are all zero, the impulse response dies out after N samples. These filters are called finite impulse response or FIR filters; for them Eq. 2.30 reduces to a convolution between the input signal and the filter coefficients. Filters with nonzero values for some of the b_i are called infinite impulse response or IIR filters.
Figure 2.5: Decomposition of a signal into frequency components
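Eq. 2.30 translates directly into code; in this illustrative sketch an empty b gives a FIR filter, whose impulse response indeed reproduces its coefficients and then dies out:

```python
def filter_signal(x, a, b=()):
    # Direct implementation of Eq. (2.30):
    # y_n = sum_i a_i x_{n-i} + sum_i b_i y_{n-i}.
    # An empty b gives a FIR filter; nonzero b coefficients give an IIR one.
    y = []
    for n in range(len(x)):
        acc = sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        acc += sum(b[i - 1] * y[n - i] for i in range(1, len(b) + 1) if n - i >= 0)
        y.append(acc)
    return y

# FIR: the impulse response is the coefficient sequence itself.
assert filter_signal([1, 0, 0, 0, 0], [1, 2, 3]) == [1, 2, 3, 0, 0]
# IIR: a single feedback coefficient gives an infinitely decaying response.
assert filter_signal([1, 0, 0, 0], [1], [0.5]) == [1, 0.5, 0.25, 0.125]
```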
The basic subband coding scheme works as follows: the source is passed through a bank of filters (a 3-level filter bank is shown in Fig. 2.6), called the analysis filter bank, which covers the range of frequencies that make up the source; the outputs of the filters are then subsampled as in Fig. 2.7. The justification for subsampling is the Nyquist rule and its generalization, which tells us that for perfect reconstruction we only need twice as many samples per second as the range of frequencies. This means that it is possible to reduce the number of samples at the output of a filter when its output frequency range is smaller than its input frequency range. The process of reducing the number of samples is called decimation or downsampling. The amount of decimation depends on the ratio of the bandwidth of the filter output
Figure 2.6: An 8-band 3-level filter bank
to that of the filter input. If the bandwidth at the output of the filter is 1/M of the bandwidth at its input, the output is decimated by a factor M by keeping every M-th sample. Once the output of the filters has been decimated, it is encoded using one of the encoding schemes described so far.
Along with the selection of the compression scheme, the allocation of bits among the subbands is an important design parameter, since different subbands contain different amounts of information. The bit allocation procedure can have a significant impact on the quality of the final reconstruction, especially when the information content of the different bands differs widely.
The decompression phase, in subband coding also called synthesis, works as follows: first the encoded samples for each subband are decoded at the receiver; then the decoded values are upsampled by inserting an appropriate number of zeros between the samples; finally, the upsampled signals are passed through a bank of reconstruction filters and added together.
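The synthesis chain just described (decode, zero-stuff, filter, add) can be sketched as follows; the two-tap filter pair used in the usage example is the Haar one, chosen only for illustration — the actual reconstruction filters depend on the analysis bank:

```python
def upsample(samples, m):
    """Insert m - 1 zeros after each sample (zero stuffing)."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (m - 1))
    return out

def fir(signal, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    padded = [0] * (len(taps) - 1) + signal
    return [sum(t * padded[n + len(taps) - 1 - k]
                for k, t in enumerate(taps))
            for n in range(len(signal))]

def synthesis(low, high, g_low, g_high):
    """Upsample each decoded subband, filter it and add the results."""
    lo = fir(upsample(low, 2), g_low)
    hi = fir(upsample(high, 2), g_high)
    return [a + b for a, b in zip(lo, hi)]

# With the Haar analysis averages low = [3, 7] and differences
# high = [1, -1] of the input [4, 2, 6, 8], the Haar synthesis
# pair recovers the original samples:
print(synthesis([3, 7], [1, -1], [1, 1], [1, -1]))  # -> [4, 2, 6, 8]
```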
Figure 2.7: Subband coding technique: analysis filter bank, downsampling and encoding
Subband coding has applications in speech and audio coding (e.g. MPEG audio), but it can also be applied to image compression.
2.5.4 Wavelets

Another method of decomposing signals that has gained a great deal of popularity in recent years is the use of wavelets [11, 12, 13, 14]. Decomposing a signal in terms of its frequency content using sinusoids results in a very fine resolution in the frequency domain. However, sinusoids are defined on the time domain from −∞ to ∞, therefore individual frequency components give no temporal resolution [15]. In a wavelet representation, a signal is represented in terms of functions that are localized both in time and in frequency. For instance, the following is known as the Haar wavelet:
\[
\psi_{0,0}(x) = \begin{cases} 1 & 0 \le x < \tfrac{1}{2} \\ -1 & \tfrac{1}{2} \le x < 1 \\ 0 & \text{otherwise} \end{cases}
\]
Figure 2.8: The Haar wavelet functions ψ0,0, ψ1,0, ψ1,1, ψ2,0, ψ2,1 and ψ2,2
From this “mother” function the following set of functions can be obtained:
\[
\psi_{j,k}(x) = \psi_{0,0}(2^j x - k) = \begin{cases} 1 & k2^{-j} \le x < (k+\tfrac{1}{2})2^{-j} \\ -1 & (k+\tfrac{1}{2})2^{-j} \le x < (k+1)2^{-j} \\ 0 & \text{otherwise} \end{cases}
\]
Figure 2.9: Example of multiresolution analysis (panels a–d)
In 1989, Stephane Mallat ([16]) developed the multiresolution approach, which moved the wavelet representation into the domain of subband coding. These concepts can be better understood with the help of an example. Let us suppose we have to approximate the function f(t) drawn in Fig. 2.9a using translated versions of some time-limited function φ(t). A simple approximating function is the indicator function:
\[
\phi_{0,0}(t) = \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.36)
\]
The function f(t) can then be approximated as \(\phi^0_f(t) = \sum_k c_{0,k}\,\phi_{0,k}(t)\), where the coefficients c_{0,k} are the average values of the function in the interval [k, k+1). In other words:
\[
c_{0,k} = \int_k^{k+1} f(t)\,\phi_{0,k}(t)\,dt \qquad (2.37)
\]
It is possible to scale φ(t) to obtain:
\[
\phi_{1,0}(t) = \phi_{0,0}(2t) = \begin{cases} 1 & 0 \le t < \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (2.38)
\]
Its translates would be given by:
\[
\phi_{1,k}(t) = \phi_{1,0}(t - k) \qquad (2.39)
\]
\[
\phantom{\phi_{1,k}(t)} = \phi_{0,0}(2t - k) = \begin{cases} 1 & 0 \le 2t - k < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.40)
\]
is accurately represented by \(\phi^1_f(t)\). \(\phi^1_f(t)\) can be decomposed into a lower resolution version of itself, namely \(\phi^0_f(t)\), and the difference \(\phi^1_f(t) - \phi^0_f(t)\). Let us examine this difference over an arbitrary interval [k, k+1):
\[
\phi^1_f(t) - \phi^0_f(t) = \begin{cases} c_{1,2k} - c_{0,k} & k \le t < k+\tfrac{1}{2} \\ c_{1,2k+1} - c_{0,k} & k+\tfrac{1}{2} \le t < k+1 \end{cases} \qquad (2.41)
\]
2. If a function can be expressed exactly by a linear combination of the set {φj,k(t)}, then it can also be expressed exactly as a function of the set {φl,k(t)} for all l ≥ j.
3. The complete set \(\{\phi_{j,k}(t)\}_{j,k=-\infty}^{\infty}\) can exactly represent all functions with the property that:
\[
\int_{-\infty}^{\infty} |f(t)|^2\,dt < \infty \qquad (2.52)
\]
4. If a function f(t) can be exactly represented by the set {φ0,k(t)}, then any integer translate of the function, f(t − k), can also be represented exactly by the same set.
5.
\[
\int \phi_{0,l}(t)\,\phi_{0,k}(t)\,dt = \begin{cases} 0 & l \ne k \\ 1 & l = k \end{cases} \qquad (2.53)
\]
A set with these properties forms a multiresolution analysis [16]. Thus, at any resolution \(2^{-j}\), every function f(t) can be decomposed into two components: one that can be expressed as a linear combination of the set {φj,k(t)} and one that can be expressed as a linear combination of the wavelets {ψj,k(t)}.
The mother wavelet ψ0,0(t) and the scaling function φ0,0(t) are related in the following manner: from Property 2, φ0,0 can be written in terms of the φ1,k. If the relationship is given by:
\[
\phi_{0,0}(t) = \sum_n h_n\,\phi_{1,n}(t) \qquad (2.54)
\]
then the wavelet ψ0,0(t) is given by:
\[
\psi_{0,0}(t) = \sum_n (-1)^n h_n\,\phi_{1,n}(t) \qquad (2.55)
\]
From this relationship it follows that the wavelet decomposition can be implemented in terms of filters with impulse responses given by (2.54) and (2.55), and that these filters are quadrature mirror filters.
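The alternating-sign rule relating the two impulse responses is simple enough to sketch directly (function name ours; the Haar coefficients in the example are given up to normalization):

```python
def highpass_from_lowpass(h):
    """Build the wavelet (high-pass) coefficients from the scaling
    (low-pass) coefficients via the alternating-sign rule
    g[n] = (-1)**n * h[n] of equations (2.54)-(2.55)."""
    return [((-1) ** n) * c for n, c in enumerate(h)]

# For the Haar scaling coefficients this yields the Haar wavelet
# coefficients:
print(highpass_from_lowpass([0.5, 0.5]))  # -> [0.5, -0.5]
```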
Most orthonormal wavelets are nonzero over an infinite interval, therefore the corresponding filters are IIR filters. Well-known exceptions are the Daubechies wavelets, which correspond to FIR filters. Once the coefficients of the FIR filters have been obtained, the procedure for compression using wavelets is identical to the one described for subband coding. From now on the terms multiresolution analysis and wavelet-based analysis will be regarded as synonymous. Some of the most widely used wavelet families are shown in Fig. 2.10, Fig. 2.11 and Fig. 2.12.
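One level of this filter-and-decimate analysis can be sketched as below; the Haar filter pair used in the example (given up to normalization) is only an illustrative choice among the families of Fig. 2.10–2.12:

```python
def fir(signal, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    padded = [0] * (len(taps) - 1) + signal
    return [sum(t * padded[n + len(taps) - 1 - k]
                for k, t in enumerate(taps))
            for n in range(len(signal))]

def analyze(signal, h, g):
    """One level of wavelet analysis done as subband coding:
    low-pass with h, high-pass with g, then decimate each
    filter output by 2."""
    low = fir(signal, h)[1::2]
    high = fir(signal, g)[1::2]
    return low, high

# Haar pair: pairwise averages (low band) and half-differences (high band).
print(analyze([4, 2, 6, 8], [0.5, 0.5], [0.5, -0.5]))
# -> ([3.0, 7.0], [-1.0, 1.0])
```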
2.6 Implementation of compression algorithms
Compression algorithms can be implemented in hardware or in software, depending on the required speed. When speed is the most important constraint, a hardware implementation becomes necessary. Commercial devices implementing data compression in hardware exist: for example, the ALDC1-40S-M from IBM, featuring adaptive lossless data compression, works at a rate of 40 MBytes/s, while the AHA32321 chip from AHA can compress and decompress data at 10 MBytes/s with a clock frequency of 40 MHz. These rates are far smaller than the one required by the SDD readout: the compression chip we need has to face an input data rate of 320 MBytes/s. No commercial chip exists with such features, so we had to design an Application Specific Integrated Circuit (ASIC) targeted to our requirements.
Figure 2.10: Some functions belonging to different wavelet families (Haar; Daubechies db1, db2, db3, db10), with their wavelet function psi, scaling function phi and decomposition/reconstruction filters: note that db1 is equivalent to the Haar
Figure 2.11: Some functions belonging to different wavelet families (Symlets sym2, sym3, sym4, sym8; Coiflets coif1, coif2, coif3, coif5), with their wavelet function psi, scaling function phi and decomposition/reconstruction filters
Figure 2.12: Some functions belonging to different wavelet families (biorthogonal bior1.1, bior1.3, bior1.5, bior6.8; reverse biorthogonal rbio1.1, rbio1.3, rbio1.5, rbio6.8): note that bior1.1 and rbio1.1 are equivalent to the Haar
Chapter 3

1D compression algorithm and implementations
3.1 Compression algorithms for SDD

The choice of the algorithm for SDD data compression is strictly related to the features of the input data stream:

– low detector occupancy (max 3%)

– small samples are much more probable than high samples
The first feature suggests the use of a zero suppression algorithm: all samples below a certain value (depending on the noise distribution) are discarded. The second feature suggests adopting an entropy coder, such as the Huffman one. Besides that, it is important for the algorithm to contain software-tunable parameters in order to re-optimize its performance in case of changes in the statistics of the input distribution. For instance, the threshold level has to be changeable via software in order to take into account changes in the signal-to-noise ratio over the years, so the Huffman tables have to be reconfigurable too. The other important features for the compression algorithms are:

– they have to be fast
– they have to be simple to implement in hardware

– they have to allow lossless data transmission

For the development of the compression algorithms, studies have been performed on the statistical distribution of the sample data coming from single-particle events of three beam tests, so that noise could be properly taken into account. The compression results have been evaluated in order to verify the algorithm's efficiency and to find the best parameter values.
3.2 1D <strong>compression</strong> algorithm<br />
Following these requirements the <strong>INFN</strong> Section <strong>of</strong> Torino has chosen<br />
a sequential <strong>compression</strong> algorithm [17] which scans <strong>data</strong> coming from<br />
each anode row as uni-dimensional <strong>data</strong> streams. As shown in Fig. 3.1<br />
as an example, <strong>data</strong> samples coming from anode 76 are processed, then<br />
from anode 77 and so on. The ultimate goal <strong>of</strong> the algorithm is to<br />
save <strong>data</strong> belonging to a cluster, while rejecting all the other samples<br />
regarded as noise. To have a <strong>data</strong> reduction system that is applicable<br />
to all the situations, the algorithm is provided with different tuning<br />
parameters (Fig. 3.2 provides a graphical explanation <strong>of</strong> them):<br />
– threshold: the threshold parameter is applied to the incoming samples, forcing them to zero if they are smaller than this value. This parameter has the goal of eliminating noise and pedestals affecting the data.
– tolerance: the tolerance parameter is applied to the differences calculated between consecutive samples, forcing them to zero if they are less than this value (with this mechanism, samples that differ only slightly are considered equal). In this way, non-significant fluctuations of the input values are eliminated.
– disable: the disable parameter is applied to the input data, removing all previous mechanisms for samples greater than its value, in order to have full information on the clusters and to maintain good double-peak resolution. This means that the important information is not affected by the lossy compression algorithm.

Figure 3.1: Cluster in two dimensions and its slices along the anode direction
The 1D algorithm actually consists of 5 processing steps sequentially applied (see Fig. 3.3):

– first, the input data values below the threshold parameter value are put to 0;

– then, the difference between a sample and the previous one (along the time direction) is calculated;

– if the difference value is smaller than the tolerance parameter and the input sample is smaller than the disable parameter, the difference value is put to 0, otherwise its value is left unchanged;

– these values are then encoded using the Huffman table;

– the obtained values are then encoded using the Run Length encoding method.
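The first three steps and a simplified run-length stage can be sketched as follows. This is only an illustrative model, not the CARLOS implementation: the real Huffman stage is omitted, the handling of the first sample and the use of the absolute difference in the tolerance test are our assumptions.

```python
def compress_1d(samples, threshold, tolerance, disable):
    """Sketch of the first three steps of the 1D algorithm."""
    # Step 1: zero suppression - samples below threshold are put to 0.
    zs = [s if s >= threshold else 0 for s in samples]
    # Step 2: differences between consecutive samples (time direction);
    # the first sample is taken as-is (an assumption of this sketch).
    diffs = [zs[0]] + [zs[i] - zs[i - 1] for i in range(1, len(zs))]
    # Step 3: small differences are zeroed, unless the sample exceeds
    # the disable level (cluster cores are kept untouched).
    return [0 if abs(d) < tolerance and s < disable else d
            for s, d in zip(zs, diffs)]

def run_length_zeros(values):
    """Simplified step 5: encode each run of zeros as ('Z', length)."""
    encoded, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            if zeros:
                encoded.append(('Z', zeros))
                zeros = 0
            encoded.append(v)
    if zeros:
        encoded.append(('Z', zeros))
    return encoded

stream = compress_1d([5, 30, 32, 31, 6], threshold=20, tolerance=3,
                     disable=100)
print(stream)                    # -> [0, 30, 0, 0, -31]
print(run_length_zeros(stream))  # -> [('Z', 1), 30, ('Z', 2), -31]
```

The long zero runs produced by the threshold and tolerance steps are exactly what makes the final run-length stage effective.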
Figure 3.2: Threshold, tolerance and disable parameters
The high probability of finding long zero sequences in the SDD charge distribution makes Run Length encoding very effective, especially when combined with the threshold, tolerance and disable mechanisms.
3.3 1D algorithm performances

As explained in Chapter 1, in order to comply with the target figures of DAQ speed and magnetic tape usage, the size of the SDD event has to be reduced from 32.5 MBytes to about 1.5 MBytes, which corresponds to a target compression coefficient of 22. Several standard compression algorithms have been evaluated on SDD test beam data in order to estimate the achievable compression performance: the best compression coefficient was obtained with the gzip utility of the Unix operating system, therefore it was chosen for comparison with our 1D algorithm. The data was submitted to the gzip program in binary format for a fair comparison.
Figure 3.3: 1D compression algorithm: software-tunable parameters (threshold, tolerance, Huffman tables) and processing chain (simple threshold zero suppression, differential encoding with tolerance, Huffman encoding, run length encoding)
3.3.1 Compression coefficient

For this comparison, data coming from the August 1998 test beam was chosen. The gzip compression algorithm achieves a compression ratio around 2: this value is too far from our target of 22. The 1D compression algorithm has been applied using a threshold value of 20 (corresponding to noise mean + 1.35 × noise RMS) and tolerance = 0: the compression coefficient obtained is around 12.5. This is still an unacceptable value for our purposes. The target compression value of 22 can only be reached by increasing the threshold parameter, which implies a larger information loss. For instance, by applying the algorithm to the same test beam data it is possible to obtain a compression coefficient of about 33, with threshold = 40 (noise mean + 2.68 × noise RMS) and tolerance = 0. Fig. 3.4 shows the variation of the compression coefficient of the 1D algorithm as a function of the threshold level between 20 and 40, for two values of tolerance.
59
60<br />
1D <strong>compression</strong> algorithm and <strong>implementation</strong>s<br />
Figure 3.4: 1D <strong>compression</strong> ratio as a function <strong>of</strong> threshold and tolerance<br />
An important feature of this compression algorithm is that it can be reverted to a lossless algorithm simply by setting the values of threshold and tolerance to 0. Sending data without losing any information will be very useful for the first event acquisitions, since the raw data will be analyzed to determine statistics, noise and so on. These raw data will also be used for determining the best Huffman tables, i.e. the ones allowing the best compression coefficient to be obtained. When used in lossless mode, meaning that only differential encoding, Huffman and run length encoding are applied, the compression coefficient obtained is 2.3, which is even better than what we obtain with the gzip algorithm.
3.3.2 Reconstruction error<br />
So far it was to be checked if the information loss introduced with a<br />
threshold level <strong>of</strong> 40 is acceptable or not. In particular it was decided to<br />
study how much <strong>data</strong> <strong>compression</strong> and de<strong>compression</strong> affected clusters<br />
geometry for what concerns centroid position and charge.<br />
A cluster finding routine was developed with the following two step<br />
procedure:
Figure 3.5: Spreads introduced by data compression on the measurement of the coordinates of the SDD clusters and of the cluster charge (bottom right)
– data streams are analyzed one anode row after the other: when a sample value is higher than a certain threshold level for two consecutive time bins, it is considered to be a hit until it goes below the same threshold for two consecutive time bins;

– then, if any two 1-D hits from adjacent anodes overlap in time, they are considered as parts of a two-dimensional cluster.
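The first step of the procedure above can be sketched as follows. The boundary conventions (where exactly a hit starts and ends, strict vs. non-strict comparisons) are our assumptions, not a reproduction of the original routine:

```python
def find_hits(anode_samples, threshold):
    """1-D hit finder sketch: a hit starts when the signal is above
    threshold for two consecutive time bins and ends when it is below
    threshold for two consecutive time bins.  Returns a list of
    (first_bin, last_bin) intervals of above-threshold samples."""
    hits, start, below = [], None, 0
    for t in range(1, len(anode_samples)):
        if start is None:
            # Hit opens on two consecutive above-threshold bins.
            if (anode_samples[t] > threshold and
                    anode_samples[t - 1] > threshold):
                start = t - 1
                below = 0
        else:
            if anode_samples[t] <= threshold:
                below += 1
                if below == 2:
                    # Hit closes; last above-threshold bin was t - 2.
                    hits.append((start, t - 2))
                    start, below = None, 0
            else:
                below = 0
    if start is not None:  # hit still open at the end of the row
        hits.append((start, len(anode_samples) - 1))
    return hits

print(find_hits([0, 5, 5, 5, 0, 0, 0], threshold=3))  # -> [(1, 3)]
```

Overlapping (start, end) intervals from adjacent anodes would then be merged into two-dimensional clusters, as described in the second step.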
Once the samples belonging to a cluster have been found, they are fitted with a two-dimensional Gaussian function with the following features:
– the mean value corresponds to the cluster centroid;
– the sigma value corresponds to the centroid resolution;
– the volume under the Gaussian function corresponds to the charge released on the detector by the ionizing particle.
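For reference, the fit model just described can be written down directly; the function names below are ours, and the volume expression is the standard integral of a two-dimensional Gaussian:

```python
import math

def gaussian2d(a, x, y, x0, y0, sx, sy):
    """2-D Gaussian used to fit a cluster: amplitude a, centroid (x0, y0),
    widths (sx, sy). Sketch of the fit model, not the fitting code."""
    return a * math.exp(-0.5 * (((x - x0) / sx) ** 2 + ((y - y0) / sy) ** 2))

def gaussian2d_volume(a, sx, sy):
    """Volume under the 2-D Gaussian, i.e. the cluster charge estimate:
    the integral of a*exp(...) over the plane is 2*pi*a*sx*sy."""
    return 2.0 * math.pi * a * sx * sy
```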
1D compression algorithm and implementations
The 1D compression and decompression algorithms were then applied to test beam data, and cluster finding and analysis were performed on both the original and the reconstructed data: the results are shown in Fig. 3.5. The picture on the upper left shows the distribution of the differences in the centroid coordinates before and after compression along the anode and drift time directions. The picture on the upper right shows the same distribution along the drift time direction, while the picture on the bottom left shows the distribution along the anode direction. These plots show that the compression algorithm with a threshold of 40 does not introduce biases in the centroid coordinate measurements, but that it worsens their accuracy by about 9 µm (+4%) along the anode direction and by about 16 µm (+8%) along the drift time axis. The bottom right picture shows the percentage difference of the charge before and after compression: the 1D algorithm also introduces an underestimation of the cluster charge of about 4%.
3.4 CARLOS v1
During 1999 I collaborated with the INFN group in Torino on the design and test of a first hardware implementation of the 1D algorithm: CARLOS v1. This device is physically implemented as a PCB (Printed Circuit Board) containing two FPGA (Field Programmable Gate Array) circuits and some connectors for use in a test beam data acquisition system, as shown in Fig. 3.6. The device processes data coming from one macrochannel only, that is data coming from one half-detector, and interfaces directly to the SIU board, the first stage of the DAQ system.
3.4.1 Board description
The two main processing blocks mounted on the board are the two Xilinx FPGA devices. An FPGA is a fully programmable device widely used for fast prototyping before the final implementation of a design on an ASIC, which requires far more resources in terms of time, money and design effort.
Figure 3.6: CARLOS prototype v1 picture
An FPGA contains a matrix of CLBs (Configurable Logic Blocks) that can be individually programmed and connected together in order to implement the desired input/output logic function. Each CLB contains an SRAM (Static RAM) used as a look-up table: a logic function is implemented by presenting the input values on its address bus.
Another area of the FPGA die contains the configuration RAM: depending on the contents of this block, the device performs different logic functions. The configuration RAM is written at power-on from an external EPROM: CARLOS v1 hosts two EPROM devices, one for each FPGA. The configuration process takes around 20 ms, after which the devices are fully operational. A 10 MHz clock generator is hosted between the EPROM chips: we could not achieve a higher working frequency with our choice of FPGA device. In fact the final operating frequency
Features Values
Logic cells 2432
Max logic gates (no RAM) 25k
Max RAM bits (no logic) 32768
Typical gate range (logic and RAM) 15k - 45k
CLB matrix 32x32
Total CLBs 1024
Number of flip-flops 2560
Number of user I/O 256
Table 3.1: XC4025 Xilinx FPGA main features
is a function of how many internal resources are used: the more resources are used, the lower the final working frequency becomes. With the final 10 MHz frequency we reached a good trade-off between logic complexity and speed; furthermore, this frequency was sufficient for operation in a test-beam environment. Tab. 3.1 reports the main features of the chosen FPGA device, the XC4025E-4 HQ240C.
The board also contains three connectors, from left to right:
– the first is used for data injection into the first FPGA device, using a Hewlett Packard (HP) pattern generator;
– the second one is used to analyze the data coming out of the first device by means of a logic analyzer probe;
– the third connector is used for the communication between CARLOS v1 and the SIU board. Fig. 3.7 shows a picture of the final SIU board. We used a simplified SIU version called SIMU (SIU simulator), distributed by CERN to help front-end designers build DAQ-compatible devices. The SIMU board can be plugged directly onto this connector.
Figure 3.7: Picture of the SIU board
3.4.2 CARLOS v1 design flow
I carried out the design of the second FPGA device following the digital design flow shown in Fig. 3.8. In particular, the design flow is composed of the following steps:
– block specifications were coded in the VHDL language using a hierarchical structure, starting from the bottom layer up to the top level;
– each VHDL model was simulated with the Synopsys simulator software in order to debug the code;
– each VHDL model was synthesized, that is, translated into a netlist, using the Synopsys synthesis tool; the netlist contains the usual standard cells such as AND, OR or flip-flops, but the FPGA device does not contain these elements, only RAM blocks: the netlist is merely a logic representation of the circuit, with no physical meaning;
– the netlist was simulated using the Synopsys simulator, taking cell timing delays and constraints into account;
– the netlist was automatically converted into a physical layout using
the place and route software Alliance from Xilinx;
– the layout information was written into a binary file, ready to be downloaded into the EPROM chip using the Alliance software together with an EPROM programmer.
Figure 3.8: Digital design flow for CARLOS v1
This is a very straightforward and automated process; moreover, the time between a slight modification of the VHDL code and its actual implementation in the FPGA device is very short. This is the main reason why FPGAs are so widely used for prototyping. Another important reason is the following: running millions of test vectors as a software simulation of a VHDL model is a very long process even on fast machines, while the same set of test vectors can be run in a few seconds on the hardware prototype. An FPGA implementation thus easily allows algorithm verification on a huge amount of data.
3.4.3 Functions performed by CARLOS v1
The FPGA on the left in Fig. 3.6 contains the 1D compression algorithm explained in the previous sections, composed of 5 processing blocks applied sequentially to the input data. The blocks form a 5-level pipeline chain, each level requiring one clock cycle. The variable-length compressed codes are produced as 32-bit words.
The FPGA on the right contains the following blocks:
– firstcheck: this block processes the 32-bit input words coming from the compressor FPGA: if the MSB is high the incoming data is rejected, otherwise it is accepted and split into two different data words, a 26-bit one containing the variable-length code and a 5-bit one containing the number of bits to be stored;
– barrel: this block packs variable-length codes of 2 to 26 bits into fixed-size 32-bit words. The number of bits to store, from 2 to 26, is given by the 5-bit length bus coming from the firstcheck block. Variable-length Huffman codes packed into 32-bit words can be uniquely unpacked using the Huffman table, proceeding from MSB to LSB. When a word is complete, an output-push signal is asserted;
– fifo: it contains a 64x32-bit RAM for storing the data coming out of the barrel shifter. When the FIFO contains at least 16 data words, it asserts a query signal asking the feesiu block to begin popping data;
– feesiu: this is the most complex block of the prototype, containing the interface between CARLOS and the SIU board. The main behavior is quite simple: CARLOS waits for a “Ready to Receive” (RDYRX) command from the SIU on a bidirectional data bus; after receiving it, CARLOS takes possession of the bidirectional bus and begins sending data towards the SIU as packets of 17 32-bit words. Each packet consists of a header word, containing externally hardwired information, and 16 data words coming out of the FIFO. When the FIFO is empty or does not contain 16 data words, no valid data is sent to the SIU. Conversely, if a FIFO begins to accumulate large quantities of data while the connection to the SIU is not yet open (a RDYRX command has not been received yet), a data-stop signal is asserted to stop the data stream coming into CARLOS from AMBRA.
3.4.4 Tests performed on CARLOS v1
The CARLOS prototype was tested using the HP16700A pattern generator and logic analyzer at the INFN Section in Torino. Data were injected on the first connector and analyzed on the second, while the third one was connected to a SIU extender board, which connects directly to the SIMU board. The SIU extender is very useful for debugging purposes, since it provides 5 logic-analyzer-compatible connectors for probing the signals exchanged on the CARLOS-SIU interface. Here follows a list of the tests performed on CARLOS:
1. functional test and compression algorithm verification;
2. opening of a transaction by manually pushing buttons on the SIMU board;
3. event data transmission from CARLOS to the SIMU. The SIMU does not store data, so the only way to check whether the data are correct or not is by using the logic analyzer.
The prototype test was especially useful for designing a fully compatible interface towards the SIU. The main difficulty in testing the interface towards the SIU without a SIU board is due to the presence of bidirectional pads: working with such pads using a pattern generator is quite difficult.
Many corrections had to be applied to the original version in order to have a 100% compatible interface. The final VHDL version was then frozen and used for the ASIC implementation of CARLOS v2. The VHDL model, in fact, does not depend on the technology chosen for the implementation and is completely reusable.
3.5 CARLOS v2
The first CARLOS prototype was very useful for testing the compression algorithm on a huge amount of data and for correctly designing complex blocks such as the interface towards the SIU, but it has many limitations compared to the final version we need to design. We therefore decided to move to a second prototype of CARLOS with the following features:
– 40 MHz clock frequency;
– parallel processing of 8 macro-channels;
– small size, for easier use in a test-beam environment;
– a JTAG port for downloading the Huffman look-up tables, the threshold and the tolerance values.
The CARLOS chip design has been logically divided into two main parts, the first designed in Torino and the second in Bologna:
– a data compressor working on the 8 incoming streams, using the 1D compression algorithm. The compressor accepts 8-bit input data and outputs 32-bit words containing the variable-length codes;
– a data packing and formatting block, a multiplexer selecting which of the 8 incoming streams has to be sent in output, and an interface block towards the SIU.
As can be seen in Fig. 3.9, the main sub-blocks are six: firstcheck, barrel, fifo, event-counter, outmux and feesiu.
Figure 3.9: CARLOS v2 schematic blocks
3.5.1 The firstcheck block
The I/O signals are:
– inputdata: input 32-bit bus;
– ck: input signal;
– reset: input signal;
– load: output signal;
– addressvalid: output 5-bit bus;
– datavalid: output 26-bit bus.
The firstcheck block takes as input the compressed codes coming from the compression block and selects the useful bits while rejecting the dummy ones. In fact the 32-bit input word has the following structure:
– bit 31: under-run bit: when set to 1 it means that the incoming data are dummy and have to be discarded; this may happen, for example, when the run length encoder is packing long sequences of zeros, thus temporarily interrupting the data flow towards the SIU;
– bits 30 to 26: this 5-bit field contains the actual number of bits that have to be selected by the following logic block, the barrel shifter;
– bits 25 to 0: this 26-bit field contains the compressed code.
The bits actually carrying information are usually far fewer than 26, which yields a reduction in the data stream volume.
The firstcheck behavior is quite simple: when the reset signal is active (active high), all outputs are set to 0; when reset is inactive, the firstcheck block samples the under-run bit: when it is 1, all outputs are set to 0; when it is 0, load is set to 1, addressvalid is assigned inputdata(30 downto 26) and datavalid is assigned inputdata(25 downto 0).
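The behaviour just described can be condensed into a short software model (a sketch of the block's function, not the VHDL itself):

```python
def firstcheck(inputdata):
    """Split a 32-bit compressor word into (load, addressvalid, datavalid).
    Bit 31 is the under-run flag, bits 30..26 the code length,
    bits 25..0 the variable-length code."""
    underrun = (inputdata >> 31) & 0x1
    if underrun:                              # dummy word: all outputs forced to 0
        return 0, 0, 0
    load = 1
    addressvalid = (inputdata >> 26) & 0x1F   # number of valid bits (2..26)
    datavalid = inputdata & 0x3FFFFFF         # 26-bit compressed code
    return load, addressvalid, datavalid
```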
3.5.2 The barrel shifter block
The I/O signals are:
– input: input 26-bit bus;
– sel: input 5-bit bus;
– load: input signal;
– ck: input signal;
– reset: input signal;
– end-trace: input signal;
– output-push: output signal;
– output: output 32-bit bus.
The barrel shifter has to pack all the valid data coming out of the firstcheck block into fixed-length 32-bit output words: in this way all dummy data are rejected and there is no longer any distinction between data length and the data itself. All codes are packed into the same word stream and can easily be reconstructed using the Huffman tree decoding scheme. If an input code cannot be completely stored in a 32-bit word, it is broken into two pieces: the first completes the current output word, the second is carried over into the following valid output word.
When the reset is active, all internal registers and outputs are set to 0; when the reset is inactive, the barrel shifter waits for valid data coming from the firstcheck block, that is data with the load signal set to 1. When this happens, the barrel shifter selects the valid bits from input and packs them together in a 64-bit circular register. When 32 bits have been written to the register, the block asserts the output-push signal high to tell the following block (the FIFO) that the output is valid and has to be stored.
Two conditions are important for the barrel shifter to work properly: when the load signal changes from 1 to 0 the barrel stops packing data, and when load returns to 1 the barrel resumes packing data as if no pause had occurred.
The end-trace signal is asserted for one clock period in coincidence with the last valid data word: this word has to be packed together with the others, and the resulting 32-bit word has to be pushed in output (by setting output-push to 1) even if it is not complete. After the end-trace, once the last valid word has been sent in output, the barrel shifter emits n zero words as valid outputs: n depends on how many words have been sent in output since the beginning of the current event, because the total number of valid words per event has to be an integer multiple of 16. Thus, if (16k + 7) words have been sent in output when end-trace becomes active, n = 9 zero words follow with output-push set to 1. This condition is strictly related to the data transmission policy and to the multiplexing of the 8 incoming data streams onto a single 32-bit output, as will be explained in the next paragraph.
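The packing and padding policy described above can be modelled as follows (a software sketch: codes are assumed to arrive as (value, nbits) pairs, packed MSB-first as implied by the Huffman decoding description):

```python
def barrel_pack(codes, pad_to=16):
    """Pack (value, nbits) variable-length codes MSB-first into 32-bit
    words; after the last code (end-trace), flush the partial word and
    append zero words so the total count is a multiple of pad_to."""
    words, acc, filled = [], 0, 0
    for value, nbits in codes:
        acc = (acc << nbits) | (value & ((1 << nbits) - 1))
        filled += nbits
        while filled >= 32:              # a 32-bit word is complete: push it
            filled -= 32
            words.append((acc >> filled) & 0xFFFFFFFF)
            acc &= (1 << filled) - 1     # keep only the leftover bits
    if filled:                           # end-trace: flush the incomplete word
        words.append((acc << (32 - filled)) & 0xFFFFFFFF)
    while len(words) % pad_to:           # e.g. 16k + 7 words -> 9 zero words
        words.append(0)
    return words
```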
3.5.3 The fifo block
The I/O signals are:
– datain: input 32-bit bus;
– ck: input signal;
– push: input signal;
– pop: input signal;
– reset: input signal;
– empty: output signal;
– full: output signal;
– query: output signal;
– dataout: output 32-bit bus.
The fifo block contains a dual-port RAM of 64 32-bit words plus some control logic. Its purpose is to buffer the input data stream and derandomize the queues waiting to be served by the outmux block. The buffer memory has to be large enough to allow data storage while the other queues are being served, since blocking conditions must be avoided. On the other hand, it cannot be too large, since CARLOS hosts 8 fifo blocks and chip area is a strong design constraint.
The fifo allows 3 main storage operations:
– write only;
– read only;
– read/write at the same time, but at different cell locations.
The FIFO allows writing the data coming from the barrel shifter and reading them when the queue has to be served by the outmux block. The most important feature is that read and write operations can be executed in parallel. To achieve this, the control logic provides two pointers named address-write and address-read. They run from 0 to 63 and then back to 0 in a circular way: obviously address-read always has to follow address-write, otherwise invalid data would be extracted from the memory. Data is written into the fifo and the address-write pointer is incremented by one when the push input is set to 1: the push input of the fifo is the same signal as the output-push one from the barrel. In this way, when the barrel shifter has a valid output, it is written into a free location of the fifo at the next clock cycle.
The RAM read phase is activated by the pop input signal: for every clock cycle in which pop is 1, the data value pointed to by address-read is presented on the dataout output and the address-read pointer is incremented by 1. When both push and pop are set to 1, the fifo is read and written at the same time and the distance between the two pointers remains constant. Three important signals are:
– query signal: the query signal is set to 1 when the memory contains at least 16 valid data words, that is when the distance between the two pointers is greater than or equal to 16. The query signal is used by the outmux block, where a priority-encoding-based arbiter decides which of the 8 queues has to be served in output. When a fifo block is served by the outmux, the number of total valid words decreases and the query signal comes back to 0. The query signal can remain at 1 if more than 32 valid words were stored in the fifo; in this case the fifo may be read again. Everything depends on how many queues are sending requests to the scheduler to be emptied.
– empty signal: the empty signal is set to 1 when the fifo does not contain any valid data, that is when address-write and address-read have the same value and point to the same memory location. This signal is used by the feesiu block to decide when all 8 queues have been completely emptied and a new data set can enter CARLOS.
– full signal: the full signal is very important since it is back-propagated to the compressor block to signal that the FIFO is getting full and the input stream has to be stopped. The compressor block back-propagates this full signal to the AMBRA chip, which stops sending data to CARLOS. Obviously the full signal has to be asserted before the FIFO is really full, otherwise some input data would be lost. For this reason the fifo full signal works between two thresholds, 32 and 48: the full signal goes high when the fifo contains more than 48 valid words, then it comes back to 0 only when the fifo has been served by the outmux block, that is when the fifo contains fewer than 32 valid words. With this trick the risk of the fifo getting completely full is reduced, at least if the queue arbiter is fair enough to every input stream.
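The pointer arithmetic and the three flags can be summarised in a small software model (a sketch using the thresholds quoted in the text; the real block is a dual-port RAM with control logic, not Python):

```python
class FifoModel:
    """Software model of the 64-word fifo block: circular read/write
    pointers plus the query, empty and hysteresis-based full flags."""
    DEPTH = 64

    def __init__(self):
        self.mem = [0] * self.DEPTH
        self.wr = self.rd = 0      # address-write / address-read pointers
        self.count = 0             # distance between the two pointers
        self.full = False

    def clock(self, push=False, datain=0, pop=False):
        dataout = None
        if push and self.count < self.DEPTH:
            self.mem[self.wr] = datain
            self.wr = (self.wr + 1) % self.DEPTH
            self.count += 1
        if pop and self.count > 0:
            dataout = self.mem[self.rd]
            self.rd = (self.rd + 1) % self.DEPTH
            self.count -= 1
        # full goes high above 48 stored words and back to 0 only below 32
        if self.count > 48:
            self.full = True
        elif self.count < 32:
            self.full = False
        return dataout

    @property
    def query(self):
        return self.count >= 16    # at least 16 words: a packet is ready

    @property
    def empty(self):
        return self.count == 0
```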
3.5.4 The event-counter block
The I/O signals are:
– end-trace: input signal;
– ck: input signal;
– reset: input signal;
– event-id: output 3-bit bus.
The event-counter block is a very simple 3-bit binary counter used to assign a number to every physical event, so that consecutive events can easily be discriminated. When the reset is active, internal registers and outputs are set to 0; when the reset is inactive, the event-counter block increments its output signal event-id by one every time it samples the end-trace signal at logic level 1. The end-trace feeding the event-counter block is a signal coming from the feesiu block called all-fifos-empty. This signal is asserted for two clock periods when all 8 end-trace signals have been set to 1 and all 8 queues have been completely emptied. For this purpose CARLOS contains a global end-trace signal, which is activated when all 8 local end-traces have been high for at least one clock period; a temporal overlap between the 8 signals is not strictly necessary. Note, however, that the global end-trace will never go to 1 if some of the local end-traces are not used and remain stuck at 0. After the global end-trace is activated, the feesiu block waits for the 8 FIFOs to be emptied: as soon as this happens, the all-fifos-empty signal is activated and the event-id signal is incremented by one. The all-fifos-empty signal stays at logic level 1 for two consecutive clock periods; nevertheless, the event-id counter is incremented only by 1. The value of event-id is used in the outmux block and is sent to the SIU as part of the header word. We judged 3 bits sufficient to discriminate the events and to put them back in the right order during the data decompression and reconstruction stages.
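A minimal software model of this behaviour, in particular the single increment even when all-fifos-empty stays high for two clock periods (an edge-detection sketch, not the RTL):

```python
class EventCounter:
    """3-bit cyclic event counter: event-id increments on each rising
    edge of the all-fifos-empty input."""
    def __init__(self):
        self.event_id = 0
        self.prev = 0

    def clock(self, all_fifos_empty):
        # increment once per assertion, even if the input stays high
        # for two consecutive clock periods as described in the text
        if all_fifos_empty and not self.prev:
            self.event_id = (self.event_id + 1) % 8   # 3-bit wrap-around
        self.prev = all_fifos_empty
        return self.event_id
```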
3.5.5 The outmux block
The I/O signals are:
– indat7: input 32-bit bus;
– indat6: input 32-bit bus;
– indat5: input 32-bit bus;
– indat4: input 32-bit bus;
– indat3: input 32-bit bus;
– indat2: input 32-bit bus;
– indat1: input 32-bit bus;
– indat0: input 32-bit bus;
– reset: input signal;
– ck: input signal;
– query: input 8-bit bus;
– event-id: input 3-bit bus;
– enable-read: input signal;
– half-ladder-id: input 7-bit bus;
– good-data: output signal;
– read: output 8-bit bus;
– output: output 32-bit bus.
The outmux block has two distinct functions in the overall logic:
– multiplexing the 8 compressed and packed streams onto a single 32-bit output (femux sub-block);
– deciding which queue has to be served, using a priority-encoding-based arbiter (ppe sub-block).
The femux and ppe blocks implement the following 17-word data packet transmission protocol (see Fig. 3.10):
– a 32-bit header;
– 16 32-bit data words, all coming from one macrochannel and from one event.
Figure 3.10: 17-word data packet transmission protocol
The header contains the following information, from MSB to LSB:
– half ladder id (7 bits): this number is hardwired externally to each CARLOS chip, depending on the ladder it will be connected to;
– packet sequence number (10 bits): a 10-bit counter incremented once a packet is transmitted, i.e. every 17 data words;
– cyclic event number (3 bits): the event number coming from the event-counter block;
– available bits (9 bits): these will be used in a future expansion of CARLOS;
– half detector id (3 bits): every half ladder contains 8 half detectors, numbered from 0 to 7; this number is given by the macro-channel being served.
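As an illustration, the header fields listed above can be packed into a 32-bit word as follows; the field order follows the text, but the exact bit positions are an assumption of this sketch:

```python
def build_header(half_ladder_id, packet_seq, event_id, half_detector_id):
    """Assemble the 32-bit packet header, MSB to LSB: half ladder id (7),
    packet sequence number (10), cyclic event number (3), 9 spare bits,
    half detector id (3). Bit positions are assumed, not from the spec."""
    word = (half_ladder_id & 0x7F) << 25    # bits 31..25
    word |= (packet_seq & 0x3FF) << 15      # bits 24..15
    word |= (event_id & 0x7) << 12          # bits 14..12
    # bits 11..3 are the 9 spare "available" bits, left at 0
    word |= half_detector_id & 0x7          # bits 2..0
    return word
```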
Let’s take a look at the two sub-blocks of the outmux:
3.5 — CARLOS v2<br />
– femux is a multiplexer with nine 32-bit inputs and a 9-bit selection bus. The nine data inputs are the header and the 8 input channels coming from the FIFOs. The selection bus value is provided by the queue scheduler: this bus contains all zeros except a single 1.
– ppe stands for programmable priority encoder. It is a completely combinatorial block with two inputs and one output. The inputs are request (8 bits), which contains the query signals coming from the 8 macro-channels, and priority (8 bits), a bus containing exactly one bit at 1 and all the others at 0. The output, served (8 bits), like priority, contains exactly one bit at logic level 1, indicating which of the 8 macro-channels has to be served by the femux.
The programmable priority encoder works in a very simple way: it scans the request bus, starting from the bit set to 1 in the priority bus, until it finds a 1. That bit position, from 0 to 7, corresponds to the channel chosen by the arbiter. At the arbiter's next decision, the priority bus value is updated as follows: the served bus value is shifted right as if it were a circular register and the result is assigned to the priority bus. In this way we avoid the risk of one queue being served many times consecutively while other queues are making requests. An example easily clarifies this: with request = 10100010 and priority = 00010000, served = 00000010; at the next clock cycle the value 00000001 will be assigned to the priority bus. There are several possible implementations of a scheduling algorithm based on a programmable priority encoder, differing in area and timing requirements. We chose the implementation used in the Stanford University’s Tiny Tera prototype, as described in [18].
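A software model of this arbitration scheme, reproducing the example above (the downward, circular scan direction is inferred from that example, not stated explicitly in the text):

```python
def ppe(request, priority):
    """Programmable priority encoder: scan the 8-bit request bus
    circularly, starting from the single bit set in priority, and
    return the one-hot served value (0 if no request is pending)."""
    start = priority.bit_length() - 1        # index of the 1 in priority
    for i in range(8):
        bit = (start - i) % 8                # scan downwards, circularly
        if request & (1 << bit):
            return 1 << bit                  # one-hot served bus
    return 0

def next_priority(served):
    """After a grant, rotate served right by one (circularly) to form
    the priority bus for the arbiter's next decision."""
    return ((served >> 1) | ((served & 1) << 7)) & 0xFF
```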
The outmux block works as follows: it is stopped and initialized while the reset signal is active. When the reset is released, the outmux block waits for the enable-read signal to become active. This is a signal coming from the feesiu block: when low, it states either that the link between the SIU and CARLOS has not been initialized yet or that the SIU temporarily cannot accept data. When enable-read is high, the SIU is able to receive data from CARLOS, so the outmux block starts evaluating the value of the query bus. When this value is low, no macro-channel has yet requested to be served; otherwise the ppe block decides which queue to send in output. The first word served in output is the header word, containing the identifier of the macro-channel being served and the other information described earlier in this paragraph. In order to obtain the 16 data words to send in output, the outmux block has to provide the proper pop signal to one of the 8 FIFOs. The 8 pop signals to the FIFOs are grouped in the 8-bit read bus; of course only one bit at a time is asserted. Signal read(7) is sent to fifonew7, read(6) to fifonew6 and so on, so as to extract 16 valid data words from the FIFO. Since data have to be sent to the SIU at a 20 MHz clock (half the system clock frequency), the pop signal cannot be stuck at 1 for 16 clock periods: it alternates between 0 and 1, so that a data word is popped from the FIFO one clock period out of every two. While the outmux block is putting in output the 17 words of a packet, the output signal good-data is set to 1 in order to assure the feesiu block that it is receiving significant data. While sending the last data word of a packet, the outmux block updates the priority bus value as stated above, examines the query bus value, and then computes the new served value. If served is not 0, i.e. if any request has occurred, the outmux block starts sending another packet in output without any interruption (no clock periods are wasted); otherwise the block stops, waiting for a new request to be asserted. If enable-read turns from 1 to 0 during a transmission, the outmux block sends only one more valid word in output, then stops and waits for the enable-read signal to be restored to its active value: it then resumes sending data to the feesiu block as if no pause had occurred. The outmux block itself increments the 10-bit packet sequence number after every packet has been completely transmitted.
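As a toy model of the pop timing, the alternating pattern can be sketched as follows (illustrative only; in CARLOS the signal is generated by the VHDL control logic):

```python
def pop_sequence(n_words: int = 16):
    """One entry per 40 MHz system-clock cycle: the pop signal is
    asserted every other cycle, so one word leaves the FIFO every
    two system-clock periods, matching the 20 MHz output rate."""
    seq = []
    for _ in range(n_words):
        seq.extend([1, 0])  # pop asserted, then one idle cycle
    return seq
```

For the 16 data words of a packet this gives 32 system-clock cycles with exactly 16 pops.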
The choice of a 20 MHz clock is related to the total optical fibre bandwidth available to CARLOS: 800 Mbit/s. If CARLOS put out 32-bit data at 40 MHz, the required bandwidth would be 1.28 Gbit/s, while at 20 MHz it is only 640 Mbit/s. For this reason the half-frequency data rate was chosen as the final one.
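The arithmetic behind this choice can be written down explicitly:

```python
WORD_BITS = 32  # width of the fbd output bus

def bandwidth_mbit_s(clock_mhz):
    """Bandwidth in Mbit/s when a 32-bit word is sent every clock cycle."""
    return WORD_BITS * clock_mhz

FIBRE_BUDGET = 800  # Mbit/s available on the optical fibre

# at the full 40 MHz system clock the fibre budget is exceeded,
# at 20 MHz the output stream fits with margin
assert bandwidth_mbit_s(40) > FIBRE_BUDGET
assert bandwidth_mbit_s(20) < FIBRE_BUDGET
```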
3.5.6 The feesiu (toplevel) block
The I/O signals are:
– huffman7: input 32-bit bus;
– huffman6: input 32-bit bus;
– huffman5: input 32-bit bus;
– huffman4: input 32-bit bus;
– huffman3: input 32-bit bus;
– huffman2: input 32-bit bus;
– huffman1: input 32-bit bus;
– huffman0: input 32-bit bus;
– ck: input signal;
– reset: input signal;
– end-trace7: input signal;
– end-trace6: input signal;
– end-trace5: input signal;
– end-trace4: input signal;
– end-trace3: input signal;
– end-trace2: input signal;
– end-trace1: input signal;
– end-trace0: input signal;
– fidir: input signal;
– fiben-n: input signal;
– filf-n: input signal;
– half-ladder-id: input 7-bit bus;
– wait-request7: output signal;
– wait-request6: output signal;
– wait-request5: output signal;
– wait-request4: output signal;
– wait-request3: output signal;
– wait-request2: output signal;
– wait-request1: output signal;
– wait-request0: output signal;
– foclk: output signal;
– fbten-n: bidirectional signal;
– fbctrl-n: bidirectional signal;
– fobsy-n: output signal;
– fbd: bidirectional 32-bit bus.
The VHDL feesiu block contains all the other block instances (see Fig. 3.11) and the logic acting as interface with the SIU board. The feesiu block thus contains 8 instances of firstcheck, 8 instances of barrel, 8 instances of fifo, 1 instance of event-counter and 1 instance of outmux. However, we can think of the feesiu block as the block taking data from the outmux block and directly interfacing the SIU board, as if it were at the same hierarchical level as the other blocks. In Fig. 3.9 the feesiu block is represented exactly in this fashion.
3.5.7 CARLOS-SIU interface
Let us now take a look at the interface signals between CARLOS and the SIU and at how the communication protocol has been implemented:
Figure 3.11: Design hierarchy of CARLOS v1
– fidir: it is an input to CARLOS. It sets the direction of the data flow between CARLOS and the SIU: when low, data flow from the SIU to CARLOS; otherwise data flow from CARLOS to the SIU.
– fiben-n: it is an input to CARLOS, active low. It enables the communication on the bidirectional buses between CARLOS and the SIU: when low, communication is enabled; otherwise it is disabled.
– filf-n: it is an input to CARLOS, active low; "lf" stands for link full. When the SIU is no longer able to accept data coming from CARLOS, it activates this signal. When this happens, CARLOS sends one more valid data word, then stops transmitting and waits for the filf-n signal to be released. This is the signal used by the SIU to implement back-pressure on the data flow running from the front-end to the data acquisition system.
– foclk: it is a free-running clock generated on CARLOS and driving the CARLOS-SIU interface. It is a 20 MHz clock obtained by dividing the system clock frequency by 2. Interface signals coming from the SIU are sampled on the falling edge of foclk.
– fbten-n: it is a bidirectional signal, active low, which can be driven either by CARLOS or by the SIU; "ten" stands for transfer enable. When CARLOS drives the bidirectional buses (fidir is 1 and fiben-n is 0), the fbten-n value is asserted by CARLOS: it is active while CARLOS is transmitting valid data to the SIU and inactive otherwise. When the SIU drives the bidirectional buses (fidir is 0 and fiben-n is 0), the fbten-n value is asserted by the SIU: it is active while the SIU is transmitting valid commands to CARLOS and inactive otherwise.
– fbctrl-n: it is a bidirectional signal, active low, which can be driven either by CARLOS or by the SIU; "ctrl" stands for control. When CARLOS drives the bidirectional buses (fidir is 1 and fiben-n is 0), the fbctrl-n value is asserted by CARLOS: it is active while CARLOS is transmitting a Front End Status Word to the SIU, while in the inactive state CARLOS is sending normal data to the SIU. When the SIU drives the bidirectional buses (fidir is 0 and fiben-n is 0), the fbctrl-n value is asserted by the SIU: it is active when sending command words to CARLOS and inactive when sending data words. This second option has not been implemented on CARLOS, since we decided that CARLOS needs only commands, and not data, from the SIU. Other detectors use this option in order to download data to the detector itself: this is the case, for example, of the Silicon Pixel Detector.
– fobsy-n: it is an output of CARLOS and an input to the SIU, active low; "bsy" stands for busy. CARLOS should activate this signal when it is not able to accept data coming from the SIU. Since CARLOS does not have to receive data from the SIU, this signal has been stuck at 1, meaning that CARLOS is never in a busy state: in fact it always has to accept command words coming from the SIU.
– fbd: it is a bidirectional 32-bit bus on which data or command words are exchanged between CARLOS and the SIU.
The communication protocol works as follows: the SIU acts as the master and CARLOS as the slave, i.e. the SIU sends commands to CARLOS and CARLOS sends data and Front End Status Words to the SIU. At first the CARLOS-SIU link has to be initialized, and the SIU acts as the master of the bidirectional buses. CARLOS therefore waits for the bidirectional buses to be driven by the SIU (fidir is 0 and fiben-n is 0) and waits for a valid (fbten-n = 0) command (fbctrl-n = 0) named Ready to Receive (RDYRX). This command is always used to begin a new event transaction. The RDYRX command contains a transaction identifier (bits 11 to 8) and the string "00010100" as the least significant bits.
Once the command is accepted and recognized, CARLOS waits for the fidir signal to change value in order to take possession of the bidirectional buses; then, if filf-n is not active, it can send valid data on the fbd bus whenever the good-data signal is active. In this state CARLOS sends valid data of an event to the SIU only while some queues are requesting to be served in output; otherwise the feesiu block stops sending data by putting the fbten-n signal to 1. When an end-trace signal has arrived on each macro-channel and every queue has been completely emptied (no more data of that event are stored in CARLOS), CARLOS puts in output the Front End Status Word (FESTW), a word confirming that no errors occurred and that the whole event has been successfully transferred to the SIU. The FESTW contains the transaction identifier received upon the opening of the transaction (bits 11 to 8) and the 8-bit FESTW code "01100100". After this, CARLOS waits for some action to be taken by the SIU: the SIU can either take back control of the bidirectional buses and close the data link towards the data acquisition system, or leave control of the bidirectional buses to CARLOS so that another event can be sent. CARLOS therefore waits 16 foclk periods: if nothing happens, CARLOS can begin sending data again without receiving any other command from the SIU; if the SIU takes back possession of the bidirectional buses, CARLOS closes the link towards the SIU and waits for another RDYRX command raised by the SIU itself.
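The two protocol words described above can be modelled with a pair of helper functions. Only the transaction identifier (bits 11 to 8) and the low byte are specified in the text; leaving the higher bits of the 32-bit word at 0 is an assumption of this sketch:

```python
RDYRX_CODE = 0b00010100  # "Ready to Receive" command, 8 least significant bits
FESTW_CODE = 0b01100100  # Front End Status Word, 8 least significant bits

def make_word(code: int, transaction_id: int) -> int:
    """Build a protocol word: 4-bit transaction id in bits 11..8,
    8-bit code in bits 7..0, remaining high bits left at 0 (assumed)."""
    assert 0 <= transaction_id <= 0xF
    return (transaction_id << 8) | code

def parse_word(word: int):
    """Return (transaction_id, code) from an fbd protocol word."""
    return (word >> 8) & 0xF, word & 0xFF
```

The FESTW returned by CARLOS must carry the same transaction identifier received in the opening RDYRX, which is easily checked with `parse_word`.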
The feesiu block implements this communication protocol with the SIU using a simple state machine: state 0 is the state in which CARLOS waits for an initialization command from the SIU; in state 1 CARLOS sends data to the SIU; in state 2 CARLOS sends the Front End Status Word to the SIU; in state 3 CARLOS waits 16 foclk periods for some action from the SIU.
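The four states can be sketched as a transition function. The trigger conditions (flag names included) are inferred from the protocol description above, so this is an indicative model rather than the VHDL state machine itself:

```python
# States of the feesiu protocol state machine, numbered as in the text.
WAIT_RDYRX, SEND_DATA, SEND_FESTW, WAIT_16 = range(4)

def next_state(state, rdyrx_received=False, event_done=False,
               siu_took_buses=False, timeout_expired=False):
    """Hypothetical transition function for the feesiu protocol FSM."""
    if state == WAIT_RDYRX:
        # state 0: wait for the initialization command from the SIU
        return SEND_DATA if rdyrx_received else WAIT_RDYRX
    if state == SEND_DATA:
        # state 1: send event data until the whole event has gone out
        return SEND_FESTW if event_done else SEND_DATA
    if state == SEND_FESTW:
        # state 2: emit the Front End Status Word, then wait
        return WAIT_16
    if state == WAIT_16:
        # state 3: wait 16 foclk periods for an action from the SIU
        if siu_took_buses:
            return WAIT_RDYRX   # link closed: wait for a new RDYRX
        if timeout_expired:
            return SEND_DATA    # resume sending without new commands
        return WAIT_16
    raise ValueError(state)
```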
Figure 3.12: Digital design flow for CARLOS v2

An important feature of CARLOS, realized in the feesiu block, is the following: CARLOS cannot accept a new event before the previous one has been completely sent in output, otherwise we run the risk of mixing data belonging to different events. The only way CARLOS can apply back-pressure on the AMBRA chips is through the wait-request signals. The wait-request signal therefore has to prevent CARLOS from fetching new input data values while the FIFOs are being emptied. For this reason a new signal, dont-send-data, has been introduced for every macro-channel: it turns to 1 when end-trace is activated and back to 0 when all the FIFOs are completely empty. The wait-request of every macro-channel is thus obtained by ORing the full and dont-send-data signals. The feesiu block acknowledges that all the FIFOs have been emptied using the empty signal of every FIFO block: when all 8 signals turn to 1, the feesiu block raises the all-fifos-empty signal, which stays at logical level 1 for at least two clock periods so that it can be sensed by the foclk clock. The all-fifos-empty signal is also used to trigger the event-counter block: in fact the total number of events is exactly the total number of occurrences of the all-fifos-empty signal. Another signal, end-trace-global, is set to 1 only if all the local end-trace signals have been put to 1 for at least one clock period in the current event. Between the moment in which end-trace-global is asserted and the moment in which all-fifos-empty is activated, no new input data set can enter CARLOS.
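In boolean terms, the back-pressure logic just described reduces to two small expressions, sketched here in Python:

```python
def wait_request(fifo_full: bool, dont_send_data: bool) -> bool:
    """wait-request of a macro-channel: the OR of the FIFO full flag
    and the dont-send-data flag of that macro-channel."""
    return fifo_full or dont_send_data

def all_fifos_empty(empty_flags) -> bool:
    """all-fifos-empty is raised only when the empty signals
    of all 8 FIFO blocks are at 1."""
    return all(empty_flags)
```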
3.6 CARLOS v2 design flow
Fig. 3.12 illustrates the digital design flow for CARLOS v2. The front-end steps are exactly the same as those followed in the design of CARLOS v1. The only difference is the library used: in this case, the Alcatel Mietec 0.35 µm digital library provided via Europractice. This is a very rich library, since it contains more than 200 different standard cells and RAM blocks of several sizes. A RAM generator software tool allows the designer to obtain a macrocell with exactly the requested number of words and bits per word: in our case a 64-word, 32-bit macrocell, instantiated 8 times, one per macro-channel.

Figure 3.13: Layout of the ASIC CARLOS v2
The back-end steps were carried out at IMEC using the Avant! software Acquarius. We could not obtain a license for this software due to its high cost (more than 100 k$ per license), while no other available software, such as Cadence, was able to work with the design kit provided. The final physical layout is depicted in Fig. 3.13. The chip has a total area of 30 mm² and contains 300 k standard cells, 180 I/O pads and 24 RAM blocks.
After the design <strong>of</strong> the layout, IMEC sent us the post-layout netlist<br />
and a SDF file (Standard Delay Format) containing the information<br />
on each net and cell delay for post-layout simulation with the same<br />
test-benches already used for pre-layout simulation. This is usually an<br />
iterative process since, if some simulation problems arise, the layout<br />
has to be re-designed. Luckily due to the relatively small working frequency<br />
(40 MHz) (the technology adopted can easily work up to 200<br />
MHz) the post-layout simulation gave no problems and the design was<br />
then sent to the foundry.<br />
3.7 Tests performed on CARLOS v2
After receiving 20 samples of naked chips (without any package) from the Alcatel Mietec foundry, they were directly bonded onto the test PCB at the INFN Section of Torino, one sample per PCB. The test PCB, shown in Fig. 3.14, especially designed for testing CARLOS v2 and for its use in test-beam data taking, contains the following:
– 5 2×10-pin DIL connectors, pin-compatible with the pods of the HP16600/16700A pattern generator and logic analyzer;
– 2 Mictor 38 connectors;
– a DIP switch providing a facility to set up the hardwired parameters, such as the half-ladder ID;
– filter capacitors for a total capacitance greater than 100 nF;
– buffers for preserving the integrity of the CARLOS input pads.
After testing the JTAG control unit on CARLOS, the connection towards the SIMU was successfully tested: after the SIMU opens a transaction, CARLOS takes possession of the bidirectional buses and starts sending data. After these tests the SIMU was replaced by the SIU board and the complete data acquisition chain, i.e. the DIU (Destination Interface Unit) and the PCI RORC (Read Out Receiver Card) directly connected to a PC. In this way testing the behavior of CARLOS with large amounts of data becomes easier than with the Logic State Analyzer alone, and the complete data acquisition system can be used to acquire data in test beams.

Figure 3.14: CARLOS v2 test board
Chapter 4
2D compression algorithm and implementation
This chapter contains a brief description of the 2D algorithm [19], conceived at the INFN Section of Torino, and a first ASIC implementation attempt with the third prototype of CARLOS.
4.1 2D compression algorithm
The 2D algorithm performs a data reduction based on a two-threshold discrimination and a two-dimensional analysis along both the drift-time axis and the SDD anode axis. The proposed scheme allows for a better understanding of the neighbourhoods of the SDD signal clusters, thus improving their reconstructability, and also provides a statistical monitoring of the background features for each SDD anode.
4.1.1 Introduction
As shown in Chapter 3, due to the presence of noise a simple single-threshold one-dimensional zero suppression does not allow a good cluster reconstruction in all circumstances. Indeed, in order to obtain a
good compression factor using the 1D algorithm, a threshold of about three times the RMS of the noise has to be used. Such a threshold often causes a rather sharp cut of the tails of the anode signals containing high samples and, more importantly, it can completely suppress the small-amplitude anodic signals on the sides of the cluster. Both these sharp cuts, particularly the latter, can significantly affect the spatial resolution. Though samples below a 3 RMS threshold have a small information content, it is conceivable that, in the more accurate off-line analysis, they can help to improve the pattern recognition and the fitting of the cluster features. In order to read out small-amplitude samples without increasing too much the collected noise, a two-threshold algorithm can be used, so that small samples satisfying a low threshold are collected only when, along the drift direction, they are near samples satisfying a high threshold. Since the charge cloud diffuses in two orthogonal directions for symmetry reasons, and due to the previous considerations, the two-threshold method should be applied along the anode axis too. We want such a two-threshold, two-dimensional data compression and zero suppression algorithm to satisfy the following criteria:
– the values of the samples in the neighbourhood of a cluster be available, both for an accurate measurement of the characteristics of the clusters and for a good monitoring and understanding of the characteristics of the background;
– the statistical nature of the suppressed samples be available, in order to monitor the noise level of the anodes and to obtain their baseline values, which have to be subtracted from the cluster samples in order to obtain a correct measurement of the related charge.
Here follows a description of the studied algorithm. The data reduction algorithm is applied to a matrix of 256 rows by 256 columns like the one shown in the upper part of Fig. 4.1. Each matrix element is an 8-bit quantized amplitude.

Figure 4.1: Example of the digitized data produced by a half SDD

A row represents a time sequence of the samples from a single SDD anode and a column represents a spatial snapshot of the simultaneous anode outputs at a given instant of time. For each charge cloud we expect several high values in
one or more columns and rows. This extension in both time and space thus requires that correlations in both dimensions be preserved for future analysis. We refer to correlations within a column as space-like and to correlations within a row as time-like. Therefore, in the proposed two-threshold two-dimensional algorithm, the high threshold TH must be satisfied by a pixel value in order for it to be part of a cluster, while the low threshold TL leads to the registering of a pixel whose value satisfies it, if it is adjacent to another pixel satisfying TH. In this way the lower-value pixels on the border of a cluster are encoded, thus ensuring that the tails of the charge distribution are retrieved.

Figure 4.2: Neighbourhood of the pixel C
Within this framework, a cluster is operationally redefined as a set of adjacent pixels whose values tend to stand out above the background. In the described algorithm there is a trade-off in the definition of such a cluster, which lies in the definition of adjacency. We have considered as adjacent (or neighbouring) to the (i, j) element the pixels for which only one of the two indexes changes by 1: the neighbour pixels are thus (i − 1, j), (i + 1, j), (i, j − 1) and (i, j + 1). A correlation therefore involves a quintuple composed of a central (C) pixel and its north (N), south (S), east (E) and west (W) neighbours only (see Fig. 4.2). In order to monitor the statistical nature of the suppressed samples, the number of zero quantized values (due either to negative analog values of the noise or to baseline equalization) and the numbers of samples satisfying TH and TL are recorded. The background average and standard deviation are obtained by applying a minimization procedure to these three counts. An aspect of this reduction algorithm allows the conservation of information about the background both near and far from the clusters. When the thresholds are properly chosen, statistically, pairs and a few triplets of background pixels not associated with a particle-produced cluster will satisfy the described discrimination criteria and provide consistency information on the background statistics, assumed to be Gaussian white noise. At the same time, isolated high background peaks are suppressed (if they do not have at least one neighbour satisfying at least the low threshold), so as not to overload the data acquisition and to allow an efficient zero suppression. The only parameters needed as input to the 2D compression algorithm are the two thresholds TH and TL and the baseline equalization values.

Figure 4.3: Cluster in two dimensions and its slices along the anode direction
4.1.2 How the 2D algorithm works
The 2D algorithm makes use of two threshold values:
– a high threshold TH for cluster selection;
– a low threshold TL, so as to collect information around the selected cluster.
The algorithm retains data belonging to a cluster and around a cluster in the following way (shown graphically as an example in Fig. 4.3):
– the pixel matrix is scanned searching for values higher than the TH value (70 in Fig. 4.3);
– the pixels positioned around the previously selected ones are accepted if they are higher than the low threshold value TL (40 in Fig. 4.3), otherwise they are rejected;
– thus a cluster is defined and the cluster values are saved exactly as they are: other pixels, not belonging to clusters, are discarded;
– if a pixel value higher than TH is found but has no pixel values higher than TL around it, it is rejected. This is the case of the value 78 in the bottom-left corner of Fig. 4.3, which is discarded even though it is greater than the high threshold value;
– pixel values belonging to a cluster are encoded using a simple look-up table method, assigning long codes to infrequent values and short codes to frequent values.
Thus in Fig. 4.3, after applying the 2D compression algorithm, only the shadowed values are stored, while the other values are erased. The 2D algorithm is conceptually very simple to understand, but its hardware implementation is considerably more complex than that of the 1D algorithm. In fact, having to perform a two-dimensional analysis of the pixel array implies the need to store all the information in a digital buffer on CARLOS, thus requiring a larger silicon area and a higher cost.
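Under the adjacency definition of Section 4.1.1 (4-neighbourhood), the selection step of the 2D algorithm can be sketched as follows. The look-up-table encoding of the retained values is left out, and the samples are assumed to be already baseline-equalized:

```python
def compress_2d(matrix, th, tl):
    """Two-threshold 2D zero suppression (selection step only):
    a pixel above TH seeds a cluster, but is kept only if at least
    one of its 4-neighbours passes TL (isolated peaks are rejected);
    neighbours passing TL are registered too. All other pixels are
    zeroed out."""
    rows, cols = len(matrix), len(matrix[0])
    keep = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j] >= th:
                neigh = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                mates = [(a, b) for a, b in neigh
                         if 0 <= a < rows and 0 <= b < cols
                         and matrix[a][b] >= tl]
                if mates:               # at least one neighbour above TL
                    keep[i][j] = True   # seed pixel retained
                    for a, b in mates:
                        keep[a][b] = True  # border pixels retained too
    return [[matrix[i][j] if keep[i][j] else 0 for j in range(cols)]
            for i in range(rows)]
```

With TH = 70 and TL = 40 as in Fig. 4.3, a pixel of value 78 with no neighbour above 40 is zeroed, while an 80 with a 50 next to it keeps both values, reproducing the behaviour described in the list above.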
4.1.3 Compression coefficient
Fig. 4.4 shows the 2D compression coefficient as a function of the high threshold value, calculated using data coming from the test beam of September 1998. The 2D compression algorithm reaches a compression ratio of 22 choosing a TH value of 1.5 noise RMS and a TL of 1.2 noise RMS. It should be remembered that the 1D compression algorithm had to use a threshold level of 3 noise RMS in order to reach the target compression ratio. The 2D algorithm thus performs better than the 1D, since it reaches the target compression ratio while losing a smaller amount of physical information. This is the main reason why the 2D algorithm has been chosen as the one to be implemented in the final version of CARLOS.
Figure 4.4: 2D compression coefficient as a function of the high threshold
4.1.4 Reconstruction error
Also as regards the reconstruction error, the 2D algorithm performs better than the 1D. In fact the differences between the cluster centroid positions before and after compression are fitted by a Gaussian distribution centered around 0, with a σ of 10 µm along the drift-time direction and 10 µm along the anode direction, choosing 1.5 noise RMS for TH and 1.2 noise RMS for TL. The 2D algorithm thus achieves a better cluster-center resolution than the 1D by keeping track of more pixel values around the cluster center. Moreover, the 2D algorithm introduces a smaller bias on the reconstructed charge than the 1D, around 3 %, meaning that the reconstructed cluster charge is 3 % lower than before the compression-decompression steps.
Besides that, the 2D algorithm is very useful for studying the noise distribution: monitoring the pairs of noise samples passing the double-threshold filter makes it possible to recover information on the average and on the standard deviation of the Gaussian noise distribution. This is quite important for checking how the signal-to-background ratio changes in time.
If used in lossless mode, the 2D algorithm yields a compression ratio of 1.3, versus the value of 2.3 obtained with the lossless version of the 1D algorithm: this requires a more complex second-level compressor in the counting room in order to reach the target compression ratio of 22, in case the 2D compression algorithm cannot be applied to the data. In fact there are some cases in which the use of the 2D compression algorithm might no longer be desirable: for example, when the baseline value is not constant over the 256 samples of an anode row. This is the case of the present version of the PASCAL chip, which introduces a slope in each anode-row baseline and, what is worse, the slope value varies from row to row. It is obvious that a fixed double-threshold compressor, such as the one described in this chapter, cannot deal with this problem. The foreseen solution is therefore to eliminate the baseline slope in the final version of PASCAL. If this proves not to be possible, or if a sloped baseline behavior emerges after some working time, the use of the 2D algorithm can no longer be accepted. In this case data compression on CARLOS has to be switched off, and a second-level compression algorithm implemented directly in the counting room will do the job.
4.2 CARLOS v3 vs. the previous prototypes

There are several differences between CARLOS v3 and the previous versions. Here is a brief list of the most important ones:
1. CARLOS v1 and v2 were meant to work in a radiation-free environment, since, when they were designed, the problem of radiation had not yet been faced. Therefore commercial technologies, such as Xilinx FPGAs and the Alcatel Mietec design kit, were chosen for prototype implementation. The necessity for CARLOS to work in a radiation environment emerged some time after CARLOS v2 was sent to the foundry. The radiation level CARLOS has to withstand is in the range from 5 to 15 krad. This led us to search for a radiation-safe technology.
One possible solution is SOI (Silicon On Insulator) technology, which provides complete radiation resistance. This is the case, for instance, of the 0.8 µm DMILL technology, which is widely used even in satellite applications at ESA (the European Space Agency). The problem with this technology is mainly one: its cost is too high for our budget. We therefore decided to choose a commercial technology, IBM 0.25 µm, with a library of standard cells designed to be radiation tolerant up to a few Mrad. The library has been designed by the EP-MIC group at CERN.
2. Mechanical constraints emerged that do not allow the use of the SIU in the end-ladder zone, since it is far too big for the available space. Another problem concerning the SIU is that this device cannot safely work in a radiation environment, since it contains commercial devices such as ALTERA PLDs. Finally, the laser driver hosted on the SIU board has a mean life of a few years, while we are looking for something lasting until the end of the experiment's data taking.
These considerations led us to change the whole readout architecture from CARLOS to the DAQ. Instead of interfacing directly with the SIU, CARLOS v3 interfaces with the radiation-tolerant serializer GOL chip (Gigabit Optical Link) [20]. Serial data are then sent to the counting room over a 200 m long optical fibre, deserialized by a commercial deserializer device and then sent to the SIU board through an FPGA device named CARLOS-rx that has still to be designed. This final readout architecture is shown in detail in Fig. 4.5.
3. CARLOS v3 contains only 2 data processing channels, versus the 8 hosted in the two previous prototypes. This choice was due to the need to reduce the ASIC complexity and to greatly reduce the possibility of losing data in case of chip failure. In fact, if a CARLOS v2 chip breaks down for some reason, the data coming from a half-ladder, i.e. from 4 detectors, are completely lost until the chip is replaced with a working one. If, instead, a CARLOS v3 chip breaks down, only the data coming from one SDD detector are lost. Hence a 2-channel version of CARLOS provides greater resistance to failures and is far less complex.
4. CARLOS v3 contains a preliminary interface with the TTCrx chip, which distributes the trigger signals and the clock to the end-ladder board.
5. CARLOS v3 also contains a BIST (Built In Self Test) structure for a quick test of the chip itself, issued via the JTAG port.
Figure 4.5: The final readout chain
4.3 The final readout architecture

The architecture chosen for the final readout system introduces new tasks to carry out and new problems to solve.
For instance, splitting CARLOS into 4 chips makes every chip much simpler to design, test and control (CARLOS v2 is a very complex and difficult-to-debug chip), but moving the SIU board to the counting room implies designing the CARLOS-rx device, which takes data from 4 deserializer chips and feeds them to the SIU.
Besides that, putting a 200 m distance between CARLOS and the SIU implies that no back-pressure can be used: in fact, if the SIU asserts the filf-n signal, meaning that it cannot accept further data starting from the following foclk cycle, CARLOS receives this information after 2 µs, i.e. after 40 foclk cycles. Hence the CARLOS-rx chip has to contain a well-sized FIFO buffer to store data when the SIU is not able to accept them.
The role of the JTAG link is shown in Fig. 4.6. In the new architecture a transaction can be opened and closed via the JTAG link, instead of using the 32-bit bus fbd. The JTAG link is obtained by serializing the 5-bit JTAG port coming from the SIU for transmission to the front-end zone through an optical fibre; the HAL (Hardware Abstraction Layer) chip then performs the serial-to-parallel conversion, distributing the JTAG signals to the PASCAL, AMBRA and CARLOS chips. A rad-hard version of the HAL chip has yet to be implemented.
Currently we plan to use a commercial pair of serializer-deserializer chips from Agilent Technologies; in the final architecture the serializer chip will be replaced with the rad-hard Gigabit Optical Link (GOL) chip designed by the Marchioro group at CERN. This chip is a multi-protocol high-speed transmitter ASIC which is able to withstand high doses of radiation. The IC supports two standard protocols, G-Link and GBit-Ethernet, and sustains data transmission at both 800 Mbit/s and 1.6 Gbit/s. The ASIC was implemented in the CERN 0.25 µm CMOS library, employing radiation-tolerant layout techniques.

Figure 4.6: Final readout chain zoom
One problem concerning the use of the GOL chip has yet to be solved: the TTCrx chip distributes to all front-end chips a clock with a maximum jitter of around 300 ps. This is not a problem for the AMBRA and CARLOS ICs working at 40 MHz, but it proves to be a big problem for the GOL chip, since it contains an internal PLL that multiplies the incoming 40 MHz clock by 20 or 40, so as to get an internal 800 MHz or 1.6 GHz frequency. The PLL shows synchronization problems with the incoming clock if the input jitter is greater than 100 ps. This problem has still to be faced and solved.
4.4 CARLOS v3

CARLOS v3 is our first prototype tailored to fit the new readout architecture. The main new features of this chip are:
– two processing channels;
– the radiation-tolerant technology chosen.
Nevertheless, CARLOS v3 does not contain the complete 2D compression algorithm, as might be expected. We made this choice in order to gain experience with a small chip in the new technology and with the new layout techniques, since we had to carry out the layout design ourselves. Taking into account that the CERN 0.25 µm library contains a small number of standard cells, and that they are not as well characterized as commercial ones, we decided to try the new design flow and new technology on a simple chip: the result is CARLOS v3, which was sent to the foundry in November 2001 and will be tested starting from February 2002.
As a compression block, CARLOS v3 hosts only the simple encoding scheme conceived as the final part of the 2D algorithm. Nevertheless, if CARLOS v3 proves to work perfectly, it will be used to acquire data in the test beams and will allow us to build and test the foreseen readout architecture.
4.5 CARLOS v3 building blocks

Fig. 4.7 shows the main building blocks of CARLOS v3. The complete design of CARLOS v3 has been carried out in Bologna: I have worked on the VHDL models, while other people worked on the C++ models of the same blocks. Each block has been designed both in VHDL and in C++, so as to allow an easy verification and debugging process.
The two main processing channels are the ones containing the encoderbo, barrel15, fifonew32x15 and outmux blocks: these blocks take data coming from the AMBRA chips, encode them using a lossless compression algorithm, pack them into 15-bit words and store them in a FIFO memory before sending them in output to the GOL chip, one channel after the other.
Figure 4.7: CARLOS v3 building blocks
The channel containing the ttc-rx-interface and fifo-trigger15x12 blocks receives the trigger numbers (bunch counter and event counter) from the TTCrx chip and sends them in output at the beginning of each data packet. The event-counter block is a local event-number generator providing further information to be added to the event number coming from the TTCrx chip: this gives us greater confidence in being able to reconstruct the data and to find errors if present. A trigger-interface block then handles the trigger signals L0, L1 and L2 coming from the Central Trigger Processor (CTP) through the TTCrx chip. A Command Mode Control Unit (CMCU) receives commands issued through the JTAG port and puts CARLOS into one of several logic states: running, idle, bist and so on. Finally, the BIST blocks on chip are based on a pseudo-random pattern generator and a signature-maker circuit. The next paragraphs contain a detailed description of these blocks.
4.5.1 The channel block

The channel block is the main processing unit contained in CARLOS for data encoding, packing and storing. It is composed of three blocks: encoderbo, barrel15 and fifonew32x15. Two identical channel blocks are hosted on CARLOS v3.
4.5.2 The encoder block

The I/O signals are:

– value: input 8-bit bus;
– value-strobe: input signal;
– ck: input signal;
– reset: input signal;
– data: output 10-bit bus;
– field: output 4-bit bus;
– valid: output signal.

Input range   Output code        Total
0-1           1 bit + 000        4 bits
2-3           1 LSB bit + 001    4 bits
4-7           2 LSB bits + 010   5 bits
8-15          3 LSB bits + 011   6 bits
16-31         4 LSB bits + 100   7 bits
32-63         5 LSB bits + 101   8 bits
64-127        6 LSB bits + 110   9 bits
128-255       7 LSB bits + 111   10 bits

Table 4.1: Lossless compression algorithm encoding scheme
The encoderbo block encodes 8-bit input data into variable-length codes, from 4 to 10 bits long, in a completely lossless way. Table 4.1 contains a detailed description of the encoding mechanism. This encoding scheme compresses the input data based on knowledge of the statistics of the stream: in fact, small values are much more probable than high ones. Hence most input data will be reduced from 8 to 4 or 5 bits, providing some degree of compression. It is nevertheless possible that locally, in time, this compressor produces an expansion of the data: if a long sequence of values greater than 127 occurs, the encoderbo block outputs a stream of 10-bit codes, which have to be temporarily stored in a FIFO buffer. Here is a description of how the block actually works: when the input signal value-strobe is high, the 8-bit input value is encoded into the 10-bit output data and the valid output signal is asserted. The field output signal is assigned the number of bits actually containing information in the 10-bit data register. The block is synchronous with the rising edge of the clock, while the reset signal is active high and asynchronous.
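The encoding rule of Table 4.1 can be sketched in a few lines of C++, in the spirit of the C++ reference models used to verify the VHDL blocks (the function names here are ours, not those of the actual models):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// One 8-bit sample -> (code, length in bits) following Table 4.1.
// The 3-bit range field sits in the code LSBs and the kept sample
// LSBs sit above it, matching the reconstruction rule in the text.
std::pair<uint16_t, int> encode_sample(uint8_t v) {
    int idx = 0;                             // range index = 3-bit field value
    for (int t = v; t > 1; t >>= 1) ++idx;   // bit_length(v) - 1, 0 for v < 2
    int kept = (idx == 0) ? 1 : idx;         // LSBs kept (MSB of v is implied)
    uint16_t lsbs = v & ((1u << kept) - 1);
    return { uint16_t((lsbs << 3) | idx), kept + 3 };  // 4..10 bits total
}

// Inverse mapping, used here only to check losslessness.
uint8_t decode_sample(uint16_t code, int len) {
    int idx  = code & 7;
    int kept = len - 3;
    uint16_t lsbs = (code >> 3) & ((1u << kept) - 1);
    return (idx == 0) ? uint8_t(lsbs) : uint8_t((1u << idx) | lsbs);
}
```

Note how values up to 3 cost only 4 bits: with mostly-small samples this is where the compression comes from, while a run of values above 127 expands 8-bit samples into 10-bit codes, as discussed above.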
Figure 4.8: Graphical description of how the barrel shifter works
4.5.3 The barrel15 block

The I/O signals are:

– input: input 8-bit bus;
– sel: input 4-bit bus;
– load: input signal;
– ck: input signal;
– reset: input signal;
– end-trace: input signal;
– output-push: output signal;
– output: output 15-bit bus.
The barrel15 is the block that packs the 4- to 10-bit variable-length codes coming from the encoderbo block into fixed-length 15-bit words. Data are packed as shown in Figure 4.8. The barrel block makes use of two internal 15-bit registers, so as to be able to break an input code in two pieces without losing any information: while the first word is put in output by driving the output signal output-push low, the second word is used to store the input data. The latency of the barrel block is 2 clock periods: it takes 2 clock periods before a word is packed by the barrel15 block. When the input signal end-trace is asserted, meaning that this is the last data word belonging to the current event, the current value in the internal register is put in output even if it is not completely full: undefined bits are set to 0.
Data coming from the barrel can easily be reconstructed by starting from the 3 LSBs of the first barrel word, which contain the information on how many bits have to be selected on the left side of the code. Going on in this way, from the LSB to the MSB of every valid word, it is possible to retrieve all the encoded information.
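The packing rule can be captured in a small C++ model (our own sketch, not the production VHDL; LSB-first packing, as implied by the reconstruction rule above):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Packs variable-length codes (4..10 bits) LSB-first into 15-bit words,
// mimicking barrel15: a code may straddle a word boundary, and end-trace
// flushes the last, partially filled word with zero padding.
class Barrel15 {
    uint32_t acc_ = 0;      // plays the role of the two internal 15-bit registers
    int nbits_ = 0;         // number of pending bits in acc_
    std::vector<uint16_t> words_;
public:
    void push(uint16_t code, int len) {
        acc_ |= uint32_t(code) << nbits_;   // append the new code above pending bits
        nbits_ += len;
        if (nbits_ >= 15) {                 // a full 15-bit word is ready
            words_.push_back(acc_ & 0x7FFF);
            acc_ >>= 15;
            nbits_ -= 15;
        }
    }
    void end_trace() {                      // last data of the event: flush
        if (nbits_ > 0) words_.push_back(acc_ & 0x7FFF);
        acc_ = 0;
        nbits_ = 0;
    }
    const std::vector<uint16_t>& words() const { return words_; }
};
```

Three 10-bit codes, for instance, fill exactly two 15-bit words, with the second code split across the word boundary.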
4.5.4 The fifonew32x15 block

The I/O signals are:

– push-req-n: input signal;
– pop-req-n: input signal;
– diag-n: input signal;
– data-in: input 15-bit bus;
– ck: input signal;
– reset: input signal;
– empty: output signal;
– almost-empty: output signal;
– half-full: output signal;
– almost-full: output signal;
– full: output signal;
– error: output signal;
– dataout: output 15-bit bus.
The fifonew32x15 block has the purpose of storing the information coming out of the barrel shifter. The multiplexing scheme that has been chosen cannot avoid the use of buffers before the multiplexer: since the output bandwidth is shared fairly, 50% of the time, between the two channels (one clock period for channel 0, the next for channel 1, and so on), and since the encoding algorithm can locally, in time, behave as an expander, data have to be stored locally before multiplexing.
The only decision to be taken concerns the FIFO dimensions: we have chosen a FIFO containing 32 words coming from the barrel shifter (32x15 bits), in order to accommodate the worst possible input data stream. The problem we faced when designing the FIFO block is the following: a FIFO is usually composed of a dual-port RAM block plus some logic implementing the First In First Out policy. This is, for example, what was done in CARLOS v2. However, the CERN 0.25 µm library only provides one size of RAM memory, namely 64x32 bits. This block is at least 4 times bigger than what we need (2048 bits versus 480). Besides that, it is quite difficult, if not impossible, to share the same RAM block between two different FIFO designs: the idea of sharing the FIFOs of the two channels is quite difficult to implement, since the number of read/write ports would have to be doubled. Therefore we decided to design a flip-flop based RAM for the FIFO, taken from the "DesignWare Foundation" library provided together with our design software, Synopsys. This is a library of IP (Intellectual Property) blocks ready to be inserted into a design, such as logic and arithmetic blocks, RAMs and application-specific blocks, for instance for error checking and correction or for a JTAG controller. The idea is that it is completely useless for every ASIC designer to lose time designing a block that hundreds of other designers all over the world also need. With this idea in mind, many IP libraries have been collected, such as the one provided by Synopsys that we have been making use of.
This is the behavior of the fifonew32x15 block: a push is executed when the push-req-n input is asserted (low) and either the full flag is inactive (low), or the full flag is active and the pop-req-n input is asserted (low). Hence a push can occur even if the FIFO is full, as long as a pop is executed in the same clock cycle. Asserting push-req-n in either of the above cases causes the data at the data-in port to be written to the next available location in the FIFO. A pop operation occurs when pop-req-n is asserted (low), as long as the FIFO is not empty. Asserting pop-req-n causes the internal read pointer to be incremented on the next rising edge of ck. Thus the RAM read data must be captured on the ck edge following the assertion of pop-req-n. Push and pop can occur at the same time if there are data in the FIFO, even when the FIFO is full. In this case, first the pop data are captured by the next stage of logic after the FIFO, and then the new data are pushed into the same location from which the data were popped. Hence there is no conflict in a simultaneous push and pop when the FIFO is full. A simultaneous push and pop cannot occur when the FIFO is empty, since there are no pop data to prefetch.
The FIFO block provides some important flags, such as empty, almost-full and full. The empty flag indicates that there are no words in the FIFO available to be popped. The almost-full flag is asserted when there are no more than 8 empty locations left in the FIFO. This number is used as a threshold and is very useful for preventing the FIFO from overflowing. When this flag is asserted, the data-stop signal, output from CARLOS, is sent to the AMBRA chip, asking it to stop the data stream transmission. AMBRA requires 3 clock cycles before it actually stops sending data to CARLOS. Hence the threshold level of 8 chosen for the FIFO design has to account for this 3-clock-period delay due to AMBRA and for the latency due to the encoder and barrel blocks. This flag is thus very useful for managing the data transmission between AMBRA and CARLOS without losing any data. The last flag, full, indicates that the FIFO is full and there is no space available for pushing data. If the AMBRA - CARLOS communication works well, this flag should never be asserted. Fig. 4.9 shows the FIFO timing waveforms during the push phase, while Fig. 4.10 shows the FIFO timing waveforms during the pop phase.
Figure 4.9: FIFO timing waveforms during the push phase
Figure 4.10: FIFO timing waveforms during the pop phase
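The push/pop and flag rules described above can be summarized in a short behavioral C++ model (a sketch with active-high controls for readability; the real ports are the active-low push-req-n and pop-req-n signals):

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of the 32x15 FIFO rules in the text: the pop is
// served first, so a push into a full FIFO is legal only if a pop
// happens in the same cycle; almost-full trips with 8 locations free.
class Fifo32x15 {
    uint16_t mem_[32] = {};
    unsigned rd_ = 0, wr_ = 0, count_ = 0;
public:
    bool empty() const       { return count_ == 0; }
    bool full()  const       { return count_ == 32; }
    bool almost_full() const { return count_ >= 24; }  // <= 8 free locations
    // One clock edge; returns the word read when a pop is accepted.
    uint16_t cycle(bool push, bool pop, uint16_t data_in) {
        uint16_t out = mem_[rd_ % 32];
        bool do_pop  = pop && !empty();
        bool do_push = push && (!full() || do_pop);    // full: needs same-cycle pop
        if (do_pop)  { ++rd_; --count_; }
        if (do_push) { mem_[wr_ % 32] = data_in; ++wr_; ++count_; }
        return out;
    }
};
```

The almost_full margin of 8 mirrors the data-stop handshake with AMBRA: the 3-cycle stop delay plus the encoder and barrel latency must fit inside the remaining free locations.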
4.5.5 The channel-trigger block

The channel-trigger block has the purpose of getting the trigger numbers from the TTCrx chip and storing them before they are multiplexed and sent to the GOL chip. It is composed of two different blocks: the ttc-rx-interface and the fifo-trigger block.
4.5.6 The ttc-rx-interface block

The I/O signals are:

– TTCready: input signal;
– BCnt: input 12-bit bus;
– BCntLStr: input signal;
– EvCntLStr: input signal;
– EvCntHStr: input signal;
– ck: input signal;
– reset: input signal;
– BCnt-reg: output 12-bit bus;
– EvCntL-reg: output 12-bit bus;
– EvCntH-reg: output 12-bit bus.
The ttc-rx-interface block receives trigger information from the TTCrx chip when the input signal TTCready coming from the TTCrx chip is high, meaning that the TTCrx is ready. When BCntStr is high, the 12-bit input word is fetched into the register BCnt-reg; the same happens with EvCntLStr and EvCntHStr for the LSBs and MSBs of the 24-bit event counter word. Following an active L2accept signal, the values of these three registers are written into 3 memory locations of the fifo-trigger block. Since the event can be discarded until the final confirmation arrives through the L2accept signal, it is necessary to wait for this signal before storing them in the FIFO.
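The latch-on-strobe, commit-on-L2accept rule can be sketched in C++ (a behavioral summary of the description above; the struct and a plain vector standing in for the fifo-trigger block are our own simplifications):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Strobes capture the counters into the *-reg registers; only L2accept
// commits the triplet to the trigger FIFO, since an event may still be
// rejected before L2accept arrives.
struct TtcRxInterface {
    uint16_t bcnt_reg = 0, evl_reg = 0, evh_reg = 0;
    std::vector<uint16_t> fifo_trigger;   // stand-in for the fifo-trigger block
    void clock(bool ttc_ready, bool bcnt_str, bool evl_str, bool evh_str,
               bool l2accept, uint16_t bus) {
        if (!ttc_ready) return;           // TTCrx not ready: ignore inputs
        if (bcnt_str) bcnt_reg = bus & 0x0FFF;
        if (evl_str)  evl_reg  = bus & 0x0FFF;
        if (evh_str)  evh_reg  = bus & 0x0FFF;
        if (l2accept) {                   // commit the triplet on L2accept
            fifo_trigger.push_back(bcnt_reg);
            fifo_trigger.push_back(evh_reg);
            fifo_trigger.push_back(evl_reg);
        }
    }
};
```

The commit order (bunch counter, then event counter MSBs, then LSBs) matches the packet layout produced by the outmux block described later.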
4.5.7 The fifo-trigger block

This block is logically equivalent to the FIFO block except for its dimensions: its size is 15x12 bits. During the transmission of a complete event from AMBRA to CARLOS, which lasts 1.6 ms, up to four events can be stored in the AMBRA chip, so CARLOS has to process 4 triplets of incoming signals L0, L1accept and L2accept. Thus a FIFO 15 words deep is necessary for storing the bunch counter and event counter information of 5 consecutive accepted events. When CARLOS is ready to send a data packet in output, the first 3 trigger words are read and taken to the outmux block. In this way a correct synchronization between the data being sent and the trigger information is preserved. The output flags of the fifo-trigger block (empty, almost-full and full) are not used by other blocks as a control, since we do not expect a buffer overflow, thanks to the structure of the AMBRA chip.
4.5.8 The event-counter block

The I/O signals are:

– end-trace: input signal;
– ck: input signal;
– reset: input signal;
– event-id: output 3-bit bus.

A local event count is maintained on CARLOS by the event-counter block. It is a very simple 3-bit counter triggered by the event-ident signal coming from the outmux block: this signal asserts that an event has been completely transmitted and a new one can be accepted. This number is used both in the header and in the footer words for a safer transmission protocol.
4.5.9 The outmux block

The I/O signals are:

– indat1: input 15-bit bus;
– indat0: input 15-bit bus;
– trigger-data: input 12-bit bus;
– reset: input signal;
– ck: input signal;
– gol-ready: input signal;
– fifo-empty: input 2-bit bus;
– half-ladder-id: input 7-bit bus;
– all-fifos-empty: input signal;
– event-id: input 3-bit bus;
– no-input-data: input signal;
– event-identifier: output signal;
– read-data: output 2-bit bus;
– read-trigger: output signal;
– output-strobe: output signal;
– output: output 16-bit bus.
The outmux block is a multiplexing unit that sends in output the data coming from the two main processing channels in an interlaced way: during even clock periods data coming from channel 1 are put in output, while during odd clock periods data coming from channel 0 are served.
This is how the outmux block behaves: as soon as data begin to fill the two FIFO blocks, the outmux block starts putting in output a packet like the one shown in Fig. 4.11. The first three 16-bit words contain the trigger information coming from the trigger channel: the first word contains the bunch counter, while the second and third words contain the event counter MSBs and LSBs respectively. Since the trigger information is 12 bits long, the bits 1011 are added as MSBs, in order to be able to recognize these words easily in a later phase of data reconstruction.
Then follow two header words containing the local event-id number and the externally hardwired half-ladder-id information. The MSBs of the header words are 110.
The headers are followed by an even number of data words containing data from the two main channels: if a channel has no valid data to send, the MSB is set to 1 and all the other bits are set to 0, meaning that a dummy word is sent in output; otherwise the MSB is set to 0, meaning that the data word is valid.
The data packet is then concluded with the transmission of two footer words containing the same information as the header regarding the event-id number, plus the number of words sent in output. The MSBs are all set to 1, so as to uniquely identify the footer word type.

Figure 4.11: CARLOS v3 data transmission protocol
The outmux block puts in output the 16-bit data words and the signal output-strobe. When this signal is high, CARLOS is transmitting data belonging to a packet; when it is low, CARLOS is not sending useful information to the GOL chip. When the gol-ready signal coming from the GOL chip goes low, meaning that it has lost synchronization with the input clock, CARLOS stops sending data and resumes transmission only when gol-ready goes high again. The outmux block also puts in output the 2-bit signal read-data, which is sent to the 2 main FIFOs as a pop signal, and the signal read-trigger, sent to the fifo-trigger block. The outmux block also asserts the signal event-ident, which is used as a trigger for the event-counter block. The input signal all-fifos-empty puts an end to the data packet transmission once the end of an event has been reached: after the occurrence of high values on the input signals data-end1 and data-end0, CARLOS waits until both FIFOs are empty in order to assert the all-fifos-empty signal. This triggers the end of the event transmission.
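The word-type prefixes of the packet format can be summarized in a small C++ decoder, of the kind the receiving side would need for data reconstruction (a sketch of the protocol as described in the text, not code from the actual CARLOS-rx; the prefix values for footer and dummy words are inferred from the description above):

```cpp
#include <cassert>
#include <cstdint>

// Classify a 16-bit CARLOS v3 output word by its MSB prefix:
// 1011 -> trigger word, 110 -> header, 111 -> footer (all MSBs at 1),
// leading 0 -> valid data, 1 followed by zeros -> dummy data word.
enum class WordType { Trigger, Header, Footer, Data, Dummy, Unknown };

WordType classify(uint16_t w) {
    if ((w >> 12) == 0xB)  return WordType::Trigger;  // 1011 ....
    if ((w >> 13) == 0x6)  return WordType::Header;   // 110. ....
    if ((w >> 13) == 0x7)  return WordType::Footer;   // 111. ....
    if ((w & 0x8000) == 0) return WordType::Data;     // MSB 0: valid data
    if (w == 0x8000)       return WordType::Dummy;    // MSB 1, all other bits 0
    return WordType::Unknown;
}
```

The prefixes are chosen so that no valid data word (MSB 0) can be confused with trigger, header or footer words, which all start with 1.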
4.5.10 The trigger-interface block

The I/O signals are:

– reference-count-trigger: input 8-bit bus;
– L0: input signal;
– L1accept: input signal;
– L2accept: input signal;
– L2reject: input signal;
– dis-trigger: input signal;
– ck: input signal;
– reset: input signal;
– busy: output signal;
– trigger: output signal;
– abort: output signal.
This block accepts as inputs the trigger signals L0, L1accept, L2accept and L2reject. Here is a brief description of how these signals are used to accept or reject an event for storage: the L0 signal is asserted 1.2 µs after the interaction; the L1accept signal is asserted 5.5 µs after the interaction, and if it is not asserted in time the event is rejected; L2accept is asserted 100 µs after the interaction if the event is accepted, otherwise an L2reject signal is asserted before 100 µs. This means that either an L2accept or an L2reject signal is always asserted.
The trigger-interface block receives these inputs and processes them to build 3 other signals: trigger, busy and abort. The trigger signal is L0 delayed by a number of clock cycles programmable via JTAG, and is distributed to the PASCAL and AMBRA chips. This is the signal triggering an event data acquisition on the PASCAL chip.
The busy signal is asserted just after L0 and then stays active until 5.5 µs after the interaction. If the signal L1accept is not asserted, busy goes low again; otherwise it stays active until the signal dis-trigger coming from AMBRA is activated. The meaning is the following: while PASCAL is transferring data to AMBRA, the readout system is not ready to accept any other trigger signal, that is, to acquire any other data. The time necessary for the transmission of an event from PASCAL to AMBRA is about 360 µs. Finally, the abort signal that CARLOS sends to AMBRA is asserted when the L1accept signal is not asserted at the prefixed time or when the L2reject signal is asserted. The abort signal causes the data transmission from PASCAL to AMBRA to end, and the data already stored are discarded.
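The accept/reject sequence above amounts to a simple decision rule, sketched here in C++ (a behavioral summary with names of our own choosing, not the synthesized trigger logic; times are measured from the interaction):

```cpp
#include <cassert>

// Outcome of one trigger sequence, following the timing in the text:
// L0 at 1.2 us, L1accept expected at 5.5 us, L2accept/L2reject by 100 us.
enum class Outcome { Accepted, RejectedNoL1, Aborted };

Outcome trigger_sequence(bool l1accept_in_time, bool l2accept) {
    if (!l1accept_in_time) return Outcome::RejectedNoL1;  // busy drops again
    if (!l2accept)         return Outcome::Aborted;       // L2reject: abort to AMBRA
    return Outcome::Accepted;                             // event kept
}
```

In the first two cases the abort signal discards the data already transferred from PASCAL to AMBRA; only in the third case is the event committed for readout.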
4.5.11 The cmcu block

The I/O signals are:

– tdi: input signal;
– tms: input signal;
– trst: input signal;
– tck: input signal;
Figure 4.12: CMCU logic state diagram
– bist-ok-tcked: input signal;
– bist-failure-tcked: input signal;
– ck: input signal;
– reset: input signal;
– reference-count-trigger: output 8-bit bus;
– tdo: output signal;
– state-tcked: output signal;
– reset-pipe: output signal.
The Command Mode Control Unit (cmcu) is the CARLOS internal control unit, remotely controlled via the JTAG port. Serial data coming from the JTAG pin tdi are packed into 8-bit words and interpreted as a very simple program containing commands and operands. Fig. 4.12 shows the CARLOS working states reachable using the JTAG port.
At power-on CARLOS is put in an IDLE state in which no calculation is performed. Then it can be put in a RESET-PIPELINE state in which an internal reset signal is asserted and all registers are initialized. The
following state is the BIST (Built-In Self Test) state, in which CARLOS runs an internal test at working speed to check whether everything is working correctly; depending on the test results, CARLOS then enters the BIST-FAILURE state or the BIST-SUCCESS state. In case of success the 8-bit word sent serially as output on tdo is A0, otherwise the word is 55. In the WRITE-REG state CARLOS prepares to write an internal register with the value read via JTAG in the next state, WRITE-REG-FETCH: this register contains the number of clock cycles of delay to be applied to the incoming L0 signal before passing it to the AMBRA chip. If needed, during the READ-REG stage the CARLOS user can read this value back through the tdo output JTAG pin, to check that no errors occurred during the writing phase. Then CARLOS can finally enter the RUNNING stage, in which it is able to accept and process input data streams and to manage the interfaces towards the GOL and TTCrx chips. When CARLOS is not in RUNNING mode the busy signal is set high, meaning that no L0 trigger signal is accepted from the CTP and no data is transmitted to the GOL chip.
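The working states of Fig. 4.12 can be modelled as a small state machine. The state names come from the text, but the transition table below is an assumption made for illustration; the real cmcu transitions are defined by the JTAG command program.

```python
# Sketch of the CARLOS working states reachable via JTAG (cf. Fig. 4.12).
# State names are taken from the text; the transition table is an
# illustrative assumption, not the actual cmcu implementation.

TRANSITIONS = {
    "IDLE":            ["RESET-PIPELINE"],
    "RESET-PIPELINE":  ["BIST", "WRITE-REG", "RUNNING"],
    "BIST":            ["BIST-SUCCESS", "BIST-FAILURE"],
    "BIST-SUCCESS":    ["IDLE"],   # tdo serially outputs A0
    "BIST-FAILURE":    ["IDLE"],   # tdo serially outputs 55
    "WRITE-REG":       ["WRITE-REG-FETCH"],
    "WRITE-REG-FETCH": ["READ-REG", "RUNNING"],
    "READ-REG":        ["RUNNING"],
    "RUNNING":         ["IDLE"],
}

def is_legal(path):
    """Check that a sequence of states follows the assumed transition table."""
    return all(b in TRANSITIONS.get(a, []) for a, b in zip(path, path[1:]))

def busy(state):
    """When CARLOS is not RUNNING the busy signal is high: no L0 is accepted."""
    return state != "RUNNING"
```

For example, IDLE, RESET-PIPELINE, BIST, BIST-SUCCESS is a legal path in this model, and busy stays high along the whole of it.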
4.5.12 The pattern-generator block

The I/O signals are:

– bist-start: input signal;
– ck: input signal;
– reset: input signal;
– data: output 8-bit bus;
– data-valid: output signal;
– data-end: output signal.
The pattern generator block is part of the BIST utility implemented on CARLOS v3. The BIST [21, 22] is an in-circuit testing scheme for digital circuits in which both test generation and test verification are done by circuitry built into the chip itself. BIST schemes offer three attractive advantages:

1. they offer a solution to the problem of testing large integrated circuits with a limited number of I/O pins;
2. they are useful for high-speed testing since they can run at design speed;
3. they do not require expensive external automatic test equipment (ATE).
BIST schemes, in the most general sense, can have any of the following characteristics:

– concurrent or non-concurrent operation: concurrent testing is designed to detect faults during normal circuit operation, while non-concurrent testing requires that normal operation be suspended during testing. In CARLOS v3 non-concurrent operation has been chosen, since we decided to use BIST only to check the correct behavior of the chip off-line.
– exhaustive or non-exhaustive test design: an exhaustive test of a circuit requires that every intended state of the circuit be shown to exist and that all transitions be demonstrated. For large sequential circuits such as CARLOS this is not practical, so we decided to implement a non-exhaustive testing design.
– deterministic or pseudo-random generation of test vectors: deterministic testing occurs when specific precomputed vectors have to be applied, while pseudo-random testing occurs when random-like test vectors are produced. We chose pseudo-random generation since its implementation requires much less area than deterministic generation. Pseudo-random generation on CARLOS v3 is performed by the pattern generator block.
The pattern generator block provides a set of 200 pseudo-random test vectors for BIST. These vectors are provided at the same time to both processing channels. The pseudo-random sequence is obtained using a linear feedback shift register, which is a very simple structure and requires a very small on-chip area.
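The LFSR mechanism can be sketched in a few lines. The text only states that 200 pseudo-random vectors are produced; the register width, feedback polynomial and seed below are illustrative assumptions (the polynomial x^8 + x^4 + x^3 + x^2 + 1 is a standard primitive choice for 8 bits, giving a maximal period of 255).

```python
# Minimal sketch of a pseudo-random pattern generator based on a Galois-style
# linear feedback shift register (LFSR). Width, polynomial and seed are
# illustrative assumptions, not the CARLOS pattern-generator parameters.

def lfsr_vectors(n_vectors=200, seed=0x01, poly=0x1D, width=8):
    """Generate n_vectors pseudo-random words from a Galois LFSR.

    `poly` holds the feedback polynomial x^8 + x^4 + x^3 + x^2 + 1 with the
    x^8 term implicit; the state must never be all-zero, so seed != 0.
    """
    assert seed != 0
    mask = (1 << width) - 1
    state = seed
    out = []
    for _ in range(n_vectors):
        out.append(state)
        msb = (state >> (width - 1)) & 1
        state = (state << 1) & mask
        if msb:
            state ^= poly   # apply the feedback taps
    return out
```

Since the assumed polynomial is primitive, the 200 generated vectors are all distinct (the full period would be 255 before the sequence repeats).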
4.5.13 The signature-maker block

The I/O signals are:

– bist-vector: input 16-bit bus;
– ck: input signal;
– reset: input signal;
– bist-strobe: output signal;
– signature: output 16-bit bus.
The signature maker block performs the signature analysis. In signature analysis, the test responses of a system are compacted into a signature using a linear feedback shift register (LFSR). Then the signature of the device under test is compared with the expected (reference) signature. If they match, the device is declared fault-free, otherwise it is declared faulty. Since several thousands of test responses are compacted into a few bits of signature by an LFSR, there is an information loss. As a result some faulty devices may have the same correct signature. The probability of a faulty device having the same signature as a working device is called the probability of aliasing, and is shown to be approximately 2^-m, where m denotes the number of bits in the signature.
The signature register implemented on CARLOS is 16 bits wide, so the probability of aliasing is 2^-16. The signature maker block takes the 16-bit bist-vector word coming from the outmux block and performs the signature analysis; then, when the FIFOs have been completely emptied and the signature value is ready, it asserts the bist-strobe signal.
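The compaction step can be sketched as follows: each 16-bit response is XOR-injected into the register, which is then stepped through its feedback polynomial. The polynomial below (the CRC-16-CCITT one) is an illustrative choice, not necessarily the one used on CARLOS.

```python
# Sketch of signature analysis with a 16-bit LFSR. The feedback polynomial
# is an illustrative assumption (x^16 + x^12 + x^5 + 1, i.e. CRC-16-CCITT),
# not the CARLOS signature-maker polynomial.

POLY16 = 0x1021

def signature(responses, seed=0x0000, width=16, poly=POLY16):
    """Compact a stream of 16-bit test responses into a 16-bit signature."""
    mask = (1 << width) - 1
    sig = seed
    for word in responses:
        sig ^= word & mask              # inject the response into the register
        msb = (sig >> (width - 1)) & 1
        sig = (sig << 1) & mask         # shift the register
        if msb:
            sig ^= poly                 # linear feedback
    return sig
```

The device is declared fault-free when the computed signature equals the reference one; since thousands of responses are folded into 16 bits, two different streams can collide with probability about 2^-16.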
Figure 4.13: Digital design flow for CARLOS v3
4.6 Digital design flow for CARLOS v3

Fig. 4.13 shows in some detail the digital design flow we used for the design of CARLOS v3 with the CERN 0.25 µm library. Since it is quite a recent library, we had to face some problems: for instance the small number of standard cells, the lack of 3-state buffers, the lack of worst-case cell models, the fact that only Verilog models for the cells (and not VHDL models) were provided, and so on. The reason for these shortcomings is that up to now very few chips have been realized and tested using this library, so not much characterization work could be done.
Moreover, we had to learn how to use the Cadence Verilog-XL software for post-synthesis simulations, since Synopsys allows simulating VHDL models only. Our main difficulty was due to the necessity of using VHDL-written testbenches for logic simulation and Verilog-written ones for netlist simulation: this can be very error-prone, since it is quite difficult to exactly match the two models together.
Besides that, we had to learn how to use Cadence Silicon Ensemble for the place and route job. This is a very difficult task when the standard cells are not completely characterized. We received great help from Marchioro's group, especially concerning the back-end design flow. They suggested that we follow a completely flat approach to the problem, since the chip is very small: the hierarchical approach, i.e. designing the layout of each block and then routing the blocks together, is only worthwhile when dealing with chip complexities one order of magnitude greater than ours.
4.7 CARLOS layout features

Fig. 4.14 shows a picture of the final layout of CARLOS v3, as it has been sent to the foundry. As one can easily observe it is pad-limited, i.e. the total silicon surface is determined by the number of I/O pads (100) and not by the number of standard cells it contains. Adding some extra logic would not imply any additional cost if contained in the area that is now empty. We therefore hope that adding the 2D compression logic will not substantially increase the chip area and, consequently, the production cost. The total area is 16 mm², corresponding to the minimal size the silicon wafer was divided into.
CARLOS v3 is a fairly simple chip compared to CARLOS v2 with its 300 kgates of logical complexity: in fact it contains only 10 kgates. Nevertheless it has been designed in order to test our approach to the new library and to verify that we were able to run through all the design flow steps. Our final check will be the test of the chip itself, in order to verify that everything was correctly designed, so as to have very clear ideas for the design of the final version of CARLOS.

Figure 4.14: CARLOS v3 layout picture
A specific PCB is in the design phase right now: it will contain only the chip itself and the connectors for probing with the Tektronix pattern generator and logic analyzer pods. Differently from CARLOS v2, the chip will be bonded into a PGA package and inserted on the PCB using a ZIF socket. This will allow us to test the 100 samples of the chip using only a few PCB samples.
Chapter 5

Wavelet based compression algorithm

As an alternative to the 1D and 2D compression algorithms conceived at the INFN Section of Torino, our group in Bologna decided to study other compression algorithms that may be used as a second-level compressor on SDD data. After studying the main standard compression algorithms, we decided to focus on a wavelet-based compression algorithm and its performance when used to compress SDD data.
The wavelet based compression algorithm design can be divided into four steps, requiring the use of different software tools:

1. choice of the algorithm main features;
2. optimization of the algorithm with respect to SDD data using the Matlab Wavelet Toolbox [23];
3. choice of the architecture for the implementation of the algorithm using Simulink [24];
4. comparison between the performance of the wavelet algorithm and that of the algorithms implemented on the CARLOS prototypes, in terms of compression ratio and reconstruction error.
5.1 Wavelet based compression algorithm

The idea of compressing SDD data using a multiresolution based compression algorithm comes from the growing success of this technique, both for uni-dimensional and bi-dimensional signal compression. Multiresolution analysis gives an equivalent representation of an input signal in terms of approximation and detail coefficients; these coefficients can then be encoded using standard techniques, such as run length encoding.
An SDD event, i.e. data coming from a half-SDD, can be analyzed as a uni-dimensional data stream of 64k samples or as a bi-dimensional structure of 256 by 256 elements. Thus the first choice to be taken is whether to implement a 1D or a 2D multiresolution analysis.
In 1D analysis the signal can be written as:

S = \big(\underbrace{s_1,\dots,s_{256}}_{\text{1st anode}},\ \underbrace{s_{257},\dots,s_{512}}_{\text{2nd anode}},\ \dots,\ \underbrace{s_{65281},\dots,s_{65536}}_{\text{256th anode}}\big)   (5.1)

In 2D analysis the signal can be written as:

S = \begin{pmatrix} s_{1,1} & s_{1,2} & \dots & s_{1,256} \\ s_{2,1} & s_{2,2} & \dots & s_{2,256} \\ \vdots & \vdots & \ddots & \vdots \\ s_{256,1} & s_{256,2} & \dots & s_{256,256} \end{pmatrix} \begin{matrix} \text{1st anode} \\ \text{2nd anode} \\ \vdots \\ \text{256th anode} \end{matrix}   (5.2)
In the case of 1D analysis, once the two decomposition filters H and G have been chosen, the multiresolution analysis can be applied with a number of levels, i.e. the number of cascaded filters, between 1 and 16. An orthogonal wavelet decomposition C with 64k coefficients is thus produced: the ratio of the number of approximation coefficients a_i to the number of detail coefficients d_i depends on the number of decomposition levels used:
C = \big(s_1,\dots,s_{65536}\big)                                          0 decomposition levels
C = \big(\underbrace{a_1,\dots,a_{32768}}_{\text{approx. coeffs.}},\underbrace{d_{32769},\dots,d_{65536}}_{\text{detail coeffs.}}\big)   1 decomposition level
C = \big(a_1,\dots,a_{16384},d_{16385},\dots,d_{65536}\big)                2 decomposition levels
C = \big(a_1,\dots,a_{8192},d_{8193},\dots,d_{65536}\big)                  3 decomposition levels
...
C = \big(a_1,a_2,a_3,a_4,d_5,\dots,d_{65536}\big)                          14 decomposition levels
C = \big(a_1,a_2,d_3,\dots,d_{65536}\big)                                  15 decomposition levels
C = \big(a_1,d_2,\dots,d_{65536}\big)                                      16 decomposition levels
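The halving of the approximation at every level can be made concrete with a small sketch of a Haar analysis filter bank (pairwise averages and differences, i.e. convolution with the Haar filters followed by decimation). This is illustrative code, not the thesis implementation.

```python
# Illustrative sketch of 1D multilevel Haar decomposition: each level halves
# the approximation, so L levels on a 65536-sample stream leave 65536/2**L
# approximation coefficients. Not the thesis code.
import math

def haar_step(signal):
    """One Haar level: pairwise sums (low-pass H) and differences (high-pass G),
    scaled by 1/sqrt(2), i.e. filtering followed by decimation by 2."""
    half = len(signal) // 2
    a = [(signal[2*i] + signal[2*i + 1]) / math.sqrt(2) for i in range(half)]
    d = [(signal[2*i] - signal[2*i + 1]) / math.sqrt(2) for i in range(half)]
    return a, d

def haar_decompose(signal, levels):
    """Return (approximation, [detail lists, coarsest last]) after `levels` levels."""
    a, details = list(signal), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details
```

Since the step is orthonormal, the total energy of the coefficients equals that of the input; and decomposing a 64k-sample stream on 3 levels leaves 65536/2^3 = 8192 approximation coefficients, matching the listing above.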
In the case of 2D analysis, once the two decomposition filters H and G have been chosen, the bi-dimensional decomposition scheme is applied with a number of levels between 1 and 8. First, multiresolution analysis is applied to each row of the 2D signal, then each column resulting from the previous analysis is decomposed using the same number of levels.
The 2D signal (5.2) is thus transformed into the 2D orthogonal wavelet decomposition, containing 64k coefficients; in this case too, the ratio of the number of approximation coefficients to the number of detail coefficients depends on the number of decomposition levels applied:
S = \begin{pmatrix} s_{1,1} & \dots & s_{1,256} \\ \vdots & \ddots & \vdots \\ s_{256,1} & \dots & s_{256,256} \end{pmatrix}   0 decomposition levels

C = \begin{pmatrix} a_{1,1} & \dots & a_{1,128} & d_{1,129} & \dots & d_{1,256} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{128,1} & \dots & a_{128,128} & d_{128,129} & \dots & d_{128,256} \\ d_{129,1} & \dots & d_{129,128} & d_{129,129} & \dots & d_{129,256} \\ \vdots & & \vdots & \vdots & & \vdots \\ d_{256,1} & \dots & d_{256,128} & d_{256,129} & \dots & d_{256,256} \end{pmatrix}   1 decomposition level

...

C = \begin{pmatrix} a_{1,1} & a_{1,2} & d_{1,3} & \dots & d_{1,256} \\ a_{2,1} & a_{2,2} & d_{2,3} & \dots & d_{2,256} \\ d_{3,1} & d_{3,2} & d_{3,3} & \dots & d_{3,256} \\ \vdots & \vdots & \vdots & & \vdots \\ d_{256,1} & d_{256,2} & d_{256,3} & \dots & d_{256,256} \end{pmatrix}   7 decomposition levels

C = \begin{pmatrix} a_{1,1} & d_{1,2} & \dots & d_{1,256} \\ d_{2,1} & d_{2,2} & \dots & d_{2,256} \\ \vdots & \vdots & \ddots & \vdots \\ d_{256,1} & d_{256,2} & \dots & d_{256,256} \end{pmatrix}   8 decomposition levels
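One level of the rows-then-columns scheme described above can be sketched as follows (illustrative code with a self-contained Haar step; a 2N by 2N frame yields an N by N approximation block in the top-left corner plus three N by N detail blocks).

```python
# Sketch of one 2D decomposition level: 1D Haar analysis applied to every
# row, then to every column of the result. Illustrative only.
import math

def haar_step(v):
    """One 1D Haar level: scaled pairwise sums (approx.) and differences (detail)."""
    s2 = math.sqrt(2)
    half = len(v) // 2
    return ([(v[2*i] + v[2*i + 1]) / s2 for i in range(half)],
            [(v[2*i] - v[2*i + 1]) / s2 for i in range(half)])

def haar2d_level(image):
    """One 2D level; the approximation block ends up in the top-left corner,
    detail blocks elsewhere, as in the 1-level matrix shown above."""
    # rows first: each row becomes [approximation | detail]
    rows = [a + d for a, d in (haar_step(r) for r in image)]
    # then columns of the intermediate result
    cols = list(zip(*rows))
    out_cols = [a + d for a, d in (haar_step(list(c)) for c in cols)]
    return [list(r) for r in zip(*out_cols)]
```

For a constant 4 by 4 frame of ones, the output is a 2 by 2 approximation block of value 2 in the top-left corner with all detail coefficients equal to 0.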
Applying multiresolution analysis to SDD data proves to be useful, since the approximation coefficients feature high values, as they represent the signal approximation, while the detail coefficients feature values near 0. Thus, in order to get compression, detail coefficients can be eliminated without losing significant information on the input signal.
An easy and effective technique for compressing data after multiresolution analysis is to apply a threshold to every coefficient a_i and d_i. What we expect is that the approximation coefficients a_i remain unchanged, while the detail coefficients d_i are all set to 0. This is useful since the long zero sequences coming from the detail coefficients can be further compressed using the run length encoding technique.
The multiresolution based compression algorithm described so far is a lossy technique, but it can be used in a lossless way by not applying the threshold to the wavelet coefficients.
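The thresholding and run length encoding steps can be sketched as follows; the threshold value and the encoding format are illustrative assumptions, not the scheme implemented on CARLOS.

```python
# Sketch of thresholding followed by run length encoding of zero runs.
# Threshold value and ("Z", run_length) token format are assumptions.

def apply_threshold(coeffs, th):
    """Set to 0 every coefficient whose absolute value is below th."""
    return [c if abs(c) >= th else 0 for c in coeffs]

def run_length_encode(coeffs):
    """Encode runs of zeros as ('Z', run_length); keep other values as-is."""
    out, zeros = [], 0
    for c in coeffs:
        if c == 0:
            zeros += 1
        else:
            if zeros:
                out.append(("Z", zeros))
                zeros = 0
            out.append(c)
    if zeros:
        out.append(("Z", zeros))
    return out

# Two large approximation-like coefficients followed by small details:
coeffs = [120, 95, 2, -1, 0, 3, -2, 0, 0, 1]
packed = run_length_encode(apply_threshold(coeffs, 5))
# packed == [120, 95, ("Z", 8)]
```

The long zero runs produced by the thresholded detail coefficients collapse into single tokens, which is where the compression gain comes from.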
5.1.1 Configuration parameters of the multiresolution algorithm

Some algorithm parameters can be tuned in order to get the best performance in terms of compression ratio and reconstruction error. These parameters are:

– the pair of decomposition filters H and G, used to implement the multiresolution analysis;
– the number of dimensions used for the analysis: 1D or 2D;
– the number of decomposition levels;
– the threshold value applied to the a_i and d_i coefficients.
5.2 Multiresolution algorithm optimization

The multiresolution algorithm optimization has been carried out using the Wavelet Toolbox from Matlab. First, the pair of decomposition filters that, for a fixed value of the threshold, gives the highest number of null coefficients a_i and d_i and the lowest reconstruction error has been chosen; then the other three parameters have been evaluated one after the other for optimization.
5.2.1 The Wavelet Toolbox from Matlab

The Wavelet Toolbox is a collection of Matlab functions that, through Matlab line commands and a user-friendly graphical interface, allows wavelet techniques to be developed and applied to real problems. In particular the Wavelet Toolbox allowed us to:

– perform the multiresolution analysis of a signal and the corresponding synthesis, using a wide variety of decomposition and reconstruction filters;
– treat signals as uni-dimensional or bi-dimensional;
– analyze signals on a variable number of levels;
– apply different threshold levels to the obtained coefficients a_i and d_i.

The wide choice of filters corresponds to the number of wavelet families implemented by the Wavelet Toolbox, shown in Tab. 5.1 and in Fig. 2.10, Fig. 2.11 and Fig. 2.12.
Family                          Name identifier
Haar wavelet                    'haar'
Daubechies wavelets             'db'
Symlets                         'sym'
Coiflets                        'coif'
Biorthogonal wavelets           'bior'
Reverse Biorthogonal wavelets   'rbio'

Table 5.1: Wavelet families used for multiresolution analysis

In particular the Haar family is composed of the wavelet function ψ(x) and its corresponding scale function φ(x), already discussed in Chapter 2. On the other hand, each of the Daubechies, Symlets and Coiflets families is composed of more than one pair of functions ψ(x) and φ(x): Daubechies family pairs are named db1, ..., db10, Symlets family pairs are named sym2, ..., sym8, while Coiflets family pairs are named coif1, ..., coif5.
The Biorthogonal (bior1.1, ..., bior6.8) and Reverse Biorthogonal (rbio1.1, ..., rbio6.8) families are composed of quartets of functions ψ1(x), φ1(x), ψ2(x) and φ2(x), where the first pair is used for decomposition and the second for reconstruction. Using a particular function of the Wavelet Toolbox, which requires the name of the chosen pair of functions ψ(x) and φ(x), or the name of the quartet ψ1(x), φ1(x), ψ2(x) and φ2(x) when using Biorthogonal and Reverse Biorthogonal wavelets, it is possible to determine the impulse responses representing, respectively, the low-pass filter H and the high-pass filter G used for decomposition, and the corresponding low-pass and high-pass filters used in the reconstruction stage.
Multiresolution analysis and synthesis are computed as described in Chapter 3: in particular the analysis step is performed with a convolution operation between the input signal and the filters H and G, followed by decimation, while synthesis is performed with up-sampling, followed by a convolution operation between the signal and the reconstruction filters.
5.2.2 Choice of the filters

In order to choose the best decomposition and reconstruction filters for SDD data compression, 10 64-kbyte SDD events have been analyzed with the Wavelet Toolbox, using the wavelet families shown in Tab. 5.1. Each signal S, interpreted both as uni-dimensional as in Fig. 5.1 and as bi-dimensional as in Fig. 5.2, has been processed in the following way:

– after choosing a pair of functions ψ(x) and φ(x), or the quartet ψ1(x), φ1(x), ψ2(x), φ2(x), the corresponding filter coefficients have been determined;
– the signal S has been analyzed using the filters H and G, obtaining the decomposition coefficients C;
– a threshold th has been applied to the coefficients C, obtaining the modified coefficients Cth;
Figure 5.1: Uni-dimensional analysis on 5 levels of the signal S (decomposition at level 5: s = a5 + d5 + d4 + d3 + d2 + d1)
– the coefficients Cth have been synthesized into the signal R, using the reconstruction filters.

Both in the uni-dimensional and in the bi-dimensional case, the compression performance has been quantified using the percentage P of null coefficients in Cth, while the reconstruction error has been quantified using the root mean square error E between the original signal S and the signal R obtained after the analysis and synthesis of Cth.
In particular, since the total number of elements in Cth is 65536, in the uni-dimensional case the parameter P can be expressed in the following way:

P = \frac{100 \cdot (\text{number of null coefficients in } C_{th})}{65536}   (5.3)
The total number of elements in S and in R is also 65536, so, if s_i is the i-th element of the uni-dimensional signal S and r_i is the i-th element of R, the parameter E can be expressed in the following way:

E = \sqrt{\frac{1}{65536} \sum_{i=1}^{65536} (s_i - r_i)^2}   (5.4)

Figure 5.2: Bi-dimensional analysis on 5 levels of the signal S

In the bi-dimensional case P is calculated in the same way while, naming s_{i,j} the (i,j)-th element of S and r_{i,j} the (i,j)-th element of R, the parameter E can be expressed in the following way:
E = \sqrt{\frac{1}{65536} \sum_{i=1}^{256} \sum_{j=1}^{256} (s_{i,j} - r_{i,j})^2}   (5.5)

Even if the parameters P and E are not directly comparable to the results obtained with the compression algorithms implemented on the CARLOS prototypes, they give an important indication of the performance of each filter set used during the analysis.
In particular, P gives a rough estimate of how much the coefficients Cth can be compressed using run length encoding, while E can
be interpreted as the error introduced in the value associated to each sample coming from the SDD. The analysis results related to 10 SDD events are shown in Tab. 5.2 to Tab. 5.7. In particular, Tab. 5.2 shows the values of the parameters P and E for a 5-level analysis using the Haar filter, both in 1D and 2D, with a threshold value th varying in the range 0-25. The other tables show the P and E values obtained with a 5-level analysis and a threshold th of 25, using filters belonging to the Daubechies (Tab. 5.3), Symlets (Tab. 5.4), Coiflets (Tab. 5.5), Biorthogonal (Tab. 5.6) and Reverse Biorthogonal (Tab. 5.7) families, in the 1D and 2D cases. The uncertainties ∆P and ∆E are reported in terms of the respective orders of magnitude only, since we are only looking for an estimate of these values.
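The two figures of merit defined in (5.3) to (5.5) can be computed directly; the following is a minimal sketch (for 2D data, the samples are simply flattened into one list before computing E).

```python
# Sketch of the figures of merit of Eqs. (5.3)-(5.5): P is the percentage of
# null coefficients after thresholding, E the RMS reconstruction error.
import math

def percentage_null(cth):
    """Eq. (5.3): 100 * (number of null coefficients in Cth) / total."""
    return 100.0 * sum(1 for c in cth if c == 0) / len(cth)

def rms_error(s, r):
    """Eqs. (5.4)/(5.5): RMS difference between the original signal s and the
    reconstruction r (for 2D data, pass the 65536 samples flattened)."""
    assert len(s) == len(r)
    return math.sqrt(sum((si - ri) ** 2 for si, ri in zip(s, r)) / len(s))
```

A perfect reconstruction gives E = 0, and a coefficient vector with half of its entries zeroed gives P = 50; on the real SDD events, the residual E of about 1e-14 at th = 0 reflects only the finite machine precision.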
An interesting feature emerging from Tab. 5.2 is the progressive increase of the values P and E as the threshold th applied to the coefficients C increases.
The trend of P is easy to understand considering that applying the threshold th to the decomposition coefficients C means setting to 0 all coefficients smaller than th in absolute value: thus the greater the th value, the greater the parameter P.
As for E, the greater the th value, the greater the differences between Cth and the original C, and hence the distortion introduced.
It is to be noticed that for a value of th equal to 0, the parameter P is 9.12 while the parameter E is 1.26e-14, i.e. the percentage of null coefficients in Cth and the reconstruction error are both very small. This is quite easy to understand for P since, without a threshold, the only null coefficients are a very small fraction of the total number. As for E, leaving the coefficients C unmodified ensures a nearly perfect reconstruction of the signal; the value 1.26e-14 comes from the finite precision of the machine performing the analysis and synthesis processes.
Haar

Threshold th     P (1D)    E (1D)     P (2D)    E (2D)
0                 9.12     1.26e-14    3.68     2.50e-14
1                24.68     0.27       22.21     0.28
2                40.01     0.63       42.63     0.75
3                58.60     1.64       56.34     1.19
4                67.08     1.71       67.76     1.67
5                75.56     2.09       75.50     2.09
6                79.87     2.38       80.77     2.44
7                83.56     2.68       84.96     2.77
8                86.71     2.99       88.21     3.08
9                88.82     3.23       90.75     3.36
10               90.70     3.48       92.88     3.63
11               92.21     3.72       94.49     3.87
12               93.20     3.89       95.80     4.08
13               94.16     4.07       96.78     4.26
14               94.81     4.21       97.56     4.42
15               95.33     4.34       98.20     4.57
16               95.72     4.44       98.73     4.71
17               96.03     4.54       99.05     4.80
18               96.20     4.60       99.25     4.86
19               96.41     4.67       99.44     4.93
20               96.54     4.72       99.55     4.97
21               96.62     4.76       99.64     5.01
22               96.69     4.79       99.69     5.03
23               96.73     4.81       99.74     5.05
24               96.76     4.83       99.77     5.07
25               96.79     4.85       99.80     5.09

Table 5.2: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on 5 levels, using the set of filters derived from the Haar wavelet.
Daubechies

Filters     P (1D)    E (1D)    P (2D)    E (2D)
db1         96.79     4.85      99.80     5.09
db2         96.75     4.82      99.63     5.08
db3         96.73     4.81      99.54     5.07
db4         96.73     4.81      99.48     5.07
db5         96.72     4.81      99.33     5.07
db6         96.71     4.81      99.27     5.07
db7         96.72     4.82      99.20     5.07
db8         96.70     4.81      99.08     5.08
db9         96.69     4.81      98.98     5.09
db10        96.68     4.80      98.98     5.09

Table 5.3: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on 5 levels, using the set of Daubechies filters and a threshold level th equal to 25; the values obtained with db1 are equivalent to the ones obtained with Haar, since the corresponding filters are equivalent.
Symlets

Filters     P (1D)    E (1D)    P (2D)    E (2D)
sym2        96.75     4.82      99.63     5.08
sym3        96.73     4.81      99.54     5.07
sym4        96.74     4.82      99.43     5.07
sym5        96.72     4.81      99.38     5.06
sym6        96.73     4.81      99.33     5.07
sym7        96.70     4.80      99.17     5.06
sym8        96.71     4.80      99.11     5.08

Table 5.4: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on 5 levels, using the set of Symlets filters and a threshold value th equal to 25.
5.2 — Multiresolution algorithm optimization<br />
Coiflets<br />
1D 2D<br />
Filters P E P E<br />
coif1 96.74 4.82 99.51 5.07<br />
coif2 96.72 4.80 98.32 4.75<br />
coif3 96.72 4.81 99.60 5.06<br />
coif4 96.69 4.80 98.62 5.06<br />
coif5 96.68 4.80 98.29 5.05<br />
Table 5.5: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):<br />
the analysis has been performed on a 5-level base, using the Coiflets set of<br />
filters and a threshold value th equal to 25.<br />
Biorthogonal<br />
1D 2D<br />
Filters P E P E<br />
bior1.1 96.79 4.85 99.80 5.09<br />
bior1.3 96.68 4.81 99.48 5.07<br />
bior1.5 96.64 4.82 99.25 5.05<br />
bior2.2 96.28 4.71 98.70 4.94<br />
bior2.4 96.28 4.65 98.56 4.92<br />
bior2.6 96.23 4.62 98.27 4.91<br />
bior2.8 96.21 4.63 97.81 4.91<br />
bior3.1 93.41 5.68 94.15 5.58<br />
bior3.3 94.37 4.84 95.43 5.01<br />
bior3.5 94.70 4.65 96.60 5.10<br />
bior3.7 94.81 4.59 95.13 4.85<br />
bior3.9 94.88 4.56 94.13 4.85<br />
bior4.4 96.75 4.82 99.39 5.07<br />
bior5.5 96.78 4.88 99.46 5.10<br />
bior6.8 96.68 4.79 98.95 5.04<br />
Table 5.6: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01):<br />
the analysis has been performed on a 5-level base, using the Biorthogonal<br />
set of filters and a threshold value th equal to 25.<br />
Reverse Biorthogonal<br />
1D 2D<br />
Filters P E P E<br />
rbio1.1 96.79 4.85 99.80 5.09<br />
rbio1.3 96.77 4.85 99.57 5.08<br />
rbio1.5 96.75 4.86 99.39 5.06<br />
rbio2.2 96.78 4.92 96.89 4.58<br />
rbio2.4 96.79 4.88 99.47 5.12<br />
rbio2.6 96.77 4.87 99.32 5.11<br />
rbio2.8 96.78 4.88 99.18 5.12<br />
rbio3.1 96.38 8.67 98.76 11.29<br />
rbio3.3 96.72 5.14 99.29 5.39<br />
rbio3.5 96.76 4.95 99.28 5.18<br />
rbio3.7 96.76 4.92 99.09 5.18<br />
rbio3.9 96.74 4.91 98.97 5.20<br />
rbio4.4 96.68 4.80 99.29 5.06<br />
rbio5.5 93.32 4.63 98.56 4.92<br />
rbio6.8 96.71 4.81 99.10 5.08<br />
Table 5.7: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈<br />
0.01): the analysis has been performed on a 5-level base, using the Reverse<br />
Biorthogonal set of filters and a threshold value th equal to 25; the values<br />
obtained with rbio1.1 are equivalent to the ones obtained with Haar, since<br />
the corresponding filters are equivalent.<br />
The common feature emerging from Tab. 5.3, Tab. 5.4, Tab. 5.5, Tab. 5.6 and<br />
Tab. 5.7 is that the values of P and E increase as the threshold value th increases.<br />
Nevertheless some wavelet families are better suited than others to the<br />
compression task: by comparing the values obtained for th = 25, it is<br />
evident that the Haar set of filters shows the best performance.<br />
In particular, with P = 96.79 and E = 4.85 in the one-dimensional case<br />
and P = 99.80 and E = 5.09 in the two-dimensional case, the Haar set<br />
of filters achieves the highest percentage of null coefficients with an<br />
acceptable error.<br />
Family Set of filters name Filter length<br />
Haar haar 2<br />
Daubechies dbN 2N<br />
Symlets symN 2N<br />
Coiflets coifN 6N<br />
Biorthogonal bior1.1 2<br />
biorN1.N2 (N1.N2 ≠ 1.1) max(2N1,2N2)+2<br />
Reverse Biorthogonal rbio1.1 2<br />
rbioN1.N2 (N1.N2 ≠ 1.1) max(2N1,2N2)+2<br />
Table 5.8: Length of the filters belonging to the different families<br />
The choice of the Haar filters is supported by a further argument<br />
concerning the length of the filters H, G, H̃ and G̃, i.e. the number of<br />
coefficients which characterize their impulse responses.<br />
As shown in Tab. 5.8, the filters belonging to the Haar family have the<br />
smallest number of coefficients, obviously together with the equivalent<br />
sets of filters db1, bior1.1 and rbio1.1. Since the analysis and synthesis<br />
processes consist of successive convolutions between the signal<br />
to be analyzed or synthesized and the respective filters, this small number<br />
of coefficients allows for a higher execution speed of the analysis and<br />
synthesis processes.<br />
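The two-tap structure of the Haar filters can be made concrete with a short C sketch (our own illustration, not the thesis code): one level of analysis and synthesis, showing both the small per-coefficient cost and the perfect reconstruction property.

```c
#include <assert.h>
#include <math.h>

/* One level of Haar analysis and synthesis on a signal of even length n.
   H = {1/sqrt(2), 1/sqrt(2)} (low pass) and G = {1/sqrt(2), -1/sqrt(2)}
   (high pass) have only 2 taps each, so a whole level costs one add, one
   subtract and one scaling per pair of samples. */
static const double S2 = 0.70710678118654752440; /* 1/sqrt(2) */

void haar_analysis(const double *x, int n, double *approx, double *detail)
{
    for (int i = 0; i < n / 2; i++) {
        approx[i] = S2 * (x[2 * i] + x[2 * i + 1]); /* low pass + decimate */
        detail[i] = S2 * (x[2 * i] - x[2 * i + 1]); /* high pass + decimate */
    }
}

void haar_synthesis(const double *approx, const double *detail, int n, double *x)
{
    for (int i = 0; i < n / 2; i++) {
        x[2 * i]     = S2 * (approx[i] + detail[i]);
        x[2 * i + 1] = S2 * (approx[i] - detail[i]);
    }
}
```

With no threshold applied, synthesis reproduces the input up to machine precision, which is exactly the behaviour discussed in Par. 5.3.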
5.2.3 Choice of the dimensionality, number of levels<br />
and threshold value<br />
Having chosen the Haar set of filters, we studied the effect on the parameters<br />
P and E of the dimensionality (1D or 2D), of the number of levels used for<br />
the decomposition (1, 2, ..., 16 in 1D and 1, 2, ..., 8 in 2D) and of the value<br />
of the threshold th.<br />
Tab. 5.9 and Tab. 5.10 show the analysis <strong>of</strong> the usual 10 SDD events in<br />
1D and 2D; each table contains the values of P and E for 1, 3 and<br />
5 levels of decomposition and, for each number of levels, for threshold<br />
values between 0 and 25.<br />
The first result is that the two-dimensional analysis produces a higher<br />
percentage P of null coefficients than the one-dimensional one; nevertheless<br />
its E values are also higher.<br />
For instance, comparing the P and E values for a threshold value th<br />
of 25, the 1D analysis on 1 level determines P = 50.01 and E = 1.85,<br />
while the 2D analysis determines P = 74.96 and E = 3.96; the same 1D<br />
analysis on 3 levels determines P = 87.45 and E = 4.18, versus the<br />
values P = 98.35 and E = 5.00 in the 2D case.<br />
Another result obtained from the tables is that, once it has been decided<br />
whether to use the 1D or the 2D analysis, an increase in the number of<br />
decomposition levels determines an increase in the values of the parameters<br />
P and E.<br />
For instance, by comparing the values in Tab. 5.9 obtained with th equal to<br />
25, it can be noticed that the 1D analysis on 1 level determines P = 50.01<br />
and E = 1.85, on 3 levels P = 87.45 and E = 4.18, while on 5 levels<br />
P = 96.79 and E = 4.85. The same holds true for the 2D analysis<br />
and synthesis. Therefore the optimized version of a multiresolution<br />
analysis based algorithm for SDD data is a 2D analysis on the maximum<br />
number of decomposition levels using the Haar set of filters.<br />
As far as the threshold th is concerned, the parameters P and E increase<br />
when th is increased. In order to decide on the th value we have to be able<br />
to quantify the reconstruction error introduced by the wavelet analysis<br />
and to compare it with that of the compression algorithms implemented on<br />
CARLOS.<br />
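The thresholding step from which P is computed can be sketched in a few lines of C (our own illustration, not the thesis code; whether the comparison is |c| ≤ th or |c| < th is an assumption of ours):

```c
#include <assert.h>
#include <math.h>

/* Apply a hard threshold th to a vector of decomposition coefficients and
   return the percentage P of null coefficients left afterwards. */
double threshold_and_count(double *c, int n, double th)
{
    int nulls = 0;
    for (int i = 0; i < n; i++) {
        if (fabs(c[i]) <= th) /* assumption: hard threshold on |c[i]| */
            c[i] = 0.0;
        if (c[i] == 0.0)
            nulls++;
    }
    return 100.0 * nulls / n;
}
```

The reconstruction error E then follows from running the synthesis on the thresholded coefficients and comparing with the original signal.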
5.3 Choice <strong>of</strong> the architecture<br />
The precision of the architecture chosen for the implementation<br />
of the multiresolution analysis can strongly affect the percentage P of<br />
null coefficients and the reconstruction error E. As an example, it is<br />
sufficient to apply both the analysis and the synthesis processes to an input<br />
signal without any threshold: the reconstruction error E, though very<br />
small, differs from 0, because of the finite precision with which our Pentium<br />
II processor performs the calculations.<br />
In order to quantify the influence of the architecture on the algorithm<br />
performance we used Simulink, a software tool from Matlab for the<br />
design and simulation of complex systems, together with the Fixed-Point<br />
Blockset [25], which allows one to simulate the performance of a given<br />
algorithm when implemented on different architectures, both in fixed and<br />
in floating point.<br />
5.3.1 Simulink and the Fixed-Point Blockset<br />
The Fixed-Point Blockset tool [25] is one of the Simulink libraries; it<br />
contains blocks performing operations between signals, such as sum,<br />
multiplication and convolution, while simulating various types of<br />
architectures, both fixed and floating point. This tool is very useful since<br />
it allows the designer to study the performance of a given algorithm on<br />
different architectures before the actual implementation takes place.<br />
For instance, it can be used to decide whether a Fourier transform can be<br />
implemented with acceptable performance on a fixed-point DSP (Digital<br />
Signal Processor) or whether it has to be implemented on a floating-point<br />
DSP. The difference is relevant especially for cost reasons, since a<br />
floating-point DSP is much more expensive than a fixed-point one. We<br />
used the Fixed-Point Blockset with the same purpose of finding the most<br />
suitable architecture before the actual implementation.<br />
Among the various floating and fixed-point architectures handled by<br />
the Fixed-Point Blockset, we studied the following ones:<br />
– double precision floating point IEEE 754 standard architecture;<br />
– single precision floating point IEEE 754 standard architecture;<br />
– fractional fixed point.<br />
The IEEE 754 standard architecture is one of the most widespread<br />
and is used in most floating-point processors.<br />
When double precision is used, the standard requires a 64-bit word in<br />
which 1 bit (b63) holds the sign s, 11 bits (b62 − b52) the exponent e<br />
and the remaining 52 bits (b51 − b0) the mantissa m. The relationship<br />
between the binary and the decimal representation is the following one:<br />
decimal value = (−1)^s · 2^(e−1023) · (1.m), 0 &lt; e &lt; 2047<br />
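The three fields s, e and m can be unpacked from the 64-bit word directly; the following C sketch (our own, assuming the platform allows reinterpreting a double as a 64-bit integer, and restricted to normalized numbers) rebuilds the decimal value from them:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Unpack sign s (bit 63), exponent e (bits 62-52) and mantissa m (bits 51-0)
   of an IEEE 754 double and rebuild the value as (-1)^s * 2^(e-1023) * (1.m).
   Valid for normalized numbers only (0 < e < 2047). */
double ieee754_rebuild(double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);          /* reinterpret the 64-bit word */
    uint64_t s = bits >> 63;
    uint64_t e = (bits >> 52) & 0x7FF;       /* 11 exponent bits */
    uint64_t m = bits & 0xFFFFFFFFFFFFFULL;  /* 52 mantissa bits */
    double frac = 1.0 + (double)m / 4503599627370496.0; /* 1.m, 2^52 */
    return (s ? -1.0 : 1.0) * ldexp(frac, (int)e - 1023);
}
```

For example, −6.25 is stored as s = 1, e = 1025, 1.m = 1.5625, and the function reconstructs it exactly.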
In the fractional fixed point architecture the s bits on the right of the<br />
radix point (b0 − bs−1) contain the fractional part of the number, one<br />
bit (bs) contains the sign of the number and the remaining guard<br />
bits (bs+1 − b31) on the left of the radix point contain the integer part<br />
of the number.<br />
It is to be noticed that the double precision floating point IEEE 754 standard<br />
architecture features a precision of 2^−52 ≈ 10^−16, the single precision<br />
IEEE 754 one a precision of 2^−23 ≈ 10^−7, while the fractional fixed point<br />
architecture has a precision of 2^−s, i.e. its precision depends on the<br />
number of bits used for the fractional part of the number. Therefore<br />
the study of the influence of the fractional fixed point architecture on the<br />
multiresolution analysis has been carried out by varying the position of<br />
the radix point within the 32-bit word.<br />
5.3.2 Choice of the architecture<br />
Implementing the two-dimensional multiresolution analysis and synthesis<br />
in Simulink is quite a long job, both in terms of design and of simulation<br />
time. Therefore we decided to implement the one-dimensional algorithm on<br />
16 decomposition levels, since it is a much quicker and simpler job.<br />
Besides, it gives a rather good estimate of the performance of the 3<br />
architectures on an algorithm very similar to the one we have chosen.<br />
The implementation in Simulink of the multiresolution analysis<br />
and synthesis processes is shown in the external blocks of Fig. 5.3: the<br />
block on the left performs the 1D analysis of the signal S using the<br />
Haar set of filters, while the block on the right applies a threshold to<br />
the decomposition coefficients and performs the synthesis of the signal R.<br />
Figure 5.3: Developed Simulink blocks: from left to right the analysis block<br />
(16 levels of analysis), the delay block and the threshold and synthesis block<br />
(threshold application and 16 levels of synthesis)<br />
Figure 5.4: Zoom on the developed analysis block<br />
Figure 5.5: Zoom on the developed threshold and synthesis block<br />
Figure 5.6: Zoom on the developed synthesis block<br />
The analysis block has been implemented as a 16-level cascade, see<br />
Fig. 5.4, containing high-pass filter operators (Hi Dec Filter), low pass<br />
filter operators (Low Dec Filter) and Downsample operators. Hi Dec<br />
Filter operators perform convolution between the incoming signal and<br />
the Haar high pass decomposition filter, Low Dec Filter operators perform<br />
convolution between the incoming signal and the Haar low pass<br />
decomposition filter, while the Downsample operators perform the decimation<br />
<strong>of</strong> the incoming signal.<br />
Fig. 5.5 shows the threshold and synthesis block, which is subdivided<br />
into 3 major sub-blocks: the sub-block on the left applies a threshold<br />
to the input stream, the sub-block on the right performs the synthesis<br />
of the signal, while the central block, called To Workspace, stores the<br />
decomposition coefficients after the application of the threshold, so that<br />
they can be used for calculating the percentage P of null coefficients.<br />
The synthesis block has been implemented, in analogy to the analysis<br />
block, as a 16-level cascade, see Fig. 5.6, containing Hi Rec Filter operators<br />
performing the convolution between the incoming signal and the<br />
Haar high-pass reconstruction filter, Low Rec Filter operators performing<br />
the convolution between the incoming signal and the Haar low-pass<br />
reconstruction filter, FixPt Sum operators performing the sum between<br />
filtered signals and Upsample operators performing the upsampling on<br />
the incoming signals.<br />
Finally, the Delay block shown in Fig. 5.3 has the task of starting the<br />
synthesis process only when the analysis job has been completed. It is<br />
to be noticed that the analysis, delay and synthesis blocks have been<br />
developed starting from simple blocks belonging to the Fixed-Point<br />
Blockset, such as filtering, downsampling and upsampling blocks.<br />
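The chained Hi_Dec/Low_Dec + Downsample structure of Fig. 5.4 can also be expressed in a few lines of C (our own illustration, not the Simulink model; the buffer length and the number of levels are arbitrary here):

```c
#include <assert.h>
#include <math.h>

#define NSAMP 64 /* power of two; a real SDD stream is much longer */
static const double S2 = 0.70710678118654752440; /* 1/sqrt(2) */

/* Multi-level Haar cascade: at every level the approximation is filtered
   and downsampled again, like the chained blocks of Fig. 5.4. The details
   of level l end up in d[l][...]; the final approximation stays in a[0..].
   The update is in place: position i is only written after positions 2i
   and 2i+1 have been read. */
int haar_cascade(double *a, int n, int levels, double d[][NSAMP / 2])
{
    int done = 0;
    for (int l = 0; l < levels && n >= 2; l++, done++) {
        for (int i = 0; i < n / 2; i++) {
            d[l][i] = S2 * (a[2 * i] - a[2 * i + 1]); /* high pass + decimate */
            a[i]    = S2 * (a[2 * i] + a[2 * i + 1]); /* low pass + decimate */
        }
        n /= 2;
    }
    return done; /* number of levels actually performed */
}
```

On a constant signal all detail coefficients vanish, which is the limiting case of the sparsity that the thresholding step exploits.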
After performing the analysis and the synthesis of the 10 SDD events with<br />
a threshold value equal to 25 for the 3 architectures described above,<br />
we obtained the values shown in Tab. 5.11; as a notation, the double<br />
precision floating point IEEE 754 standard architecture is indicated as<br />
ieee754doub, the single precision one as ieee754sing and the fractional<br />
fixed point architecture as fixed(s), where s is the number of bits<br />
representing the fractional part of the number.<br />
The Simulink simulations show how the values of P and E depend on the<br />
precision of the selected architecture. Taking as a reference the values<br />
of P and E least influenced by the finite precision of the calculations,<br />
i.e. those related to the ieee754doub architecture, a slight increase<br />
in the error E can be noticed in the cases ieee754sing, fixed(18),<br />
fixed(15), fixed(12) and fixed(9), while P remains constant; in the cases<br />
fixed(7), fixed(5) and fixed(3) the discrepancy with the ieee754doub<br />
values increases strongly.<br />
The results we obtained therefore pointed us towards the choice of<br />
one of the following architectures: ieee754doub, ieee754sing, fixed(18),<br />
fixed(15), fixed(12) and fixed(9). Our choice fell on ieee754sing, as<br />
explained in Par. 5.5.<br />
5.4 Multiresolution algorithm performances<br />
For a direct comparison between the performances obtained by the<br />
compression algorithms implemented on the CARLOS prototypes and by the<br />
multiresolution based algorithm, we developed a FORTRAN subroutine<br />
running the analysis and the synthesis on a floating-point single precision<br />
SPARC5 processor. The FORTRAN subroutine can be logically divided<br />
into two parts: the first aims at estimating the compression performance<br />
of the algorithm, the second at estimating the reconstruction error on<br />
the cluster charge.<br />
The first part of the subroutine performs the analysis, the application<br />
of the threshold th and the synthesis on SDD events containing several<br />
charge clusters. After applying the analysis and the threshold, for each<br />
SDD event the reciprocal of the compression ratio is calculated as<br />
c−1 = (number of output bits) / (number of input bits), with<br />
the assumption that each non-null decomposition coefficient is encoded<br />
using two 32-bit words, one representing the value <strong>of</strong> the coefficient<br />
itself, the other representing the number <strong>of</strong> null coefficients between<br />
the current and the previous non-null coefficient. Therefore the number of<br />
bits entering the algorithm is the number of samples multiplied by 8<br />
bits (64k × 8 = 512k), while the number of bits exiting the algorithm<br />
is the number of non-null decomposition coefficients multiplied by the<br />
32 + 32 = 64 bits used to encode each coefficient.<br />
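The output-bit count assumed above can be sketched as follows (our own illustration; coefficients are taken as 32-bit integers):

```c
#include <assert.h>
#include <stdint.h>

/* Count the output bits of the (value, zero-run) encoding assumed in the
   text: each non-null decomposition coefficient costs two 32-bit words,
   one for its value and one for the run of nulls preceding it. */
long encode_bits(const int32_t *coef, long n)
{
    long nonnull = 0;
    for (long i = 0; i < n; i++)
        if (coef[i] != 0)
            nonnull++;
    return nonnull * 64; /* 32 + 32 bits per non-null coefficient */
}
```

With 64k input samples of 8 bits each, c−1 is then encode_bits(...) divided by 524288.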
The second part of the FORTRAN subroutine applies the analysis, the<br />
threshold and the synthesis to single-cluster SDD events.<br />
After the analysis, the application of the threshold th and the synthesis,<br />
the difference between the coordinates of the cluster charge before<br />
compression and after synthesis is computed for each SDD event, as well<br />
as the percentage difference between the charge of the cluster before<br />
compression and after reconstruction.<br />
Fig. 5.7, Fig. 5.8, Fig. 5.9, Fig. 5.10, Fig. 5.11 and Fig. 5.12 show the<br />
values of the compression parameter c−1 for different threshold values th;<br />
in each figure the upper histogram represents the c−1 values for the 500<br />
SDD events analyzed, while the lower histogram represents the c−1 values<br />
related to the SDD events whose c−1 value is less than 46 × 10−3 (c = 22).<br />
As shown in the histograms, the mean c−1 values are lower than our target<br />
value c−1 = 46 × 10−3 for each threshold value selected. Therefore the<br />
multiresolution algorithm can reach an acceptable compression ratio<br />
already with a threshold of 20 on the analysis coefficients.<br />
As far as the reconstruction error calculation is concerned, up to now we<br />
could use only 20 single-cluster events, so the histograms reporting the<br />
coordinate and charge differences before and after compression suffer<br />
from very poor statistics.<br />
For this reason the results we obtained on the reconstruction error are<br />
rather qualitative up to now: in particular, performing the analysis on<br />
20 SDD events with a threshold level th equal to 21, the differences<br />
in the centroid coordinates before and after compression are of<br />
the order of magnitude of the µm, whereas the cluster charge shows an<br />
underestimation of a few percentage points.<br />
These qualitative results are of the same order of magnitude as those of<br />
the compression algorithms implemented in the CARLOS prototypes.<br />
Figure 5.7: c−1 values for th = 20<br />
5.5 Hardware <strong>implementation</strong><br />
The <strong>hardware</strong> we have chosen for the <strong>implementation</strong> <strong>of</strong> the wavelet<br />
based <strong>compression</strong> algorithm is a DSP chip from Analog Devices (AD):<br />
the ADSP-21160. The DSP belongs to the Single Instruction Multiple<br />
Data SHARC family produced by AD. It performs calculations both<br />
in fixed-point and in single precision floating point at the same speed.<br />
Our choice fell on this DSP also for this interesting feature, since it<br />
allows us to try two different architectures with a single chip. The chip<br />
has the following features:<br />
– 600 MFLOPS (32-bit floating point) peak operation;<br />
Figure 5.8: c−1 values for th = 21<br />
Figure 5.9: c−1 values for th = 22<br />
Figure 5.10: c−1 values for th = 23<br />
Figure 5.11: c−1 values for th = 24<br />
Figure 5.12: c−1 values for th = 25<br />
– 600 MOPS (32-bit fixed point) peak operation;<br />
– 100 MHz core operation;<br />
– 4 Mbits on-chip dual-ported SRAM;<br />
– division <strong>of</strong> SRAM between program and <strong>data</strong> memory is user selectable;<br />
– 14 channels <strong>of</strong> zero overhead DMA;<br />
– JTAG standard test access port.<br />
A particularly interesting feature of this chip is the amount of on-chip<br />
memory: 4 Mbits are sufficient to store the algorithm program and at least<br />
2 SDD events (each one requires 512 Kbits). Therefore, while one SDD<br />
event is being processed, another one can be fetched into the internal SRAM<br />
using the DMA channels, thus increasing the total throughput.<br />
The DSP has been bought together with an evaluation board and the<br />
VisualDSP integrated development environment, which allows one to write<br />
C code and download it to the DSP chip. The implementation of the wavelet<br />
based compression algorithm on the DSP is still in the design phase, so<br />
no data concerning the algorithm speed are available yet for a quantitative<br />
comparison with the CARLOS chip prototypes.<br />
Haar<br />
1D<br />
1 level 3 levels 5 levels<br />
Threshold value th P E P E P E<br />
0 7.78 3.02 e-15 9.05 7.11 e-15 9.12 1.26 e-14<br />
1 17.51 0.22 23.67 0.26 24.68 0.27<br />
2 31.23 0.65 38.11 0.62 40.01 0.63<br />
3 40.09 1.01 55.81 1.21 58.60 1.64<br />
4 44.28 1.25 63.48 1.56 67.08 1.71<br />
5 47.84 1.52 71.20 2.00 75.56 2.09<br />
6 48.78 1.61 74.80 2.26 79.87 2.38<br />
7 49.31 1.68 77.81 2.52 83.56 2.68<br />
8 49.71 1.74 80.38 2.79 86.71 2.99<br />
9 49.78 1.76 82.02 2.99 88.82 3.23<br />
10 49.87 1.78 83.41 3.19 90.70 3.48<br />
11 49.91 1.79 84.50 3.38 92.21 3.72<br />
12 49.94 1.80 85.17 3.50 93.20 3.89<br />
13 49.97 1.81 85.81 3.64 94.16 4.07<br />
14 49.98 1.82 86.25 3.75 94.81 4.21<br />
15 49.98 1.83 86.60 3.84 95.33 4.34<br />
16 49.99 1.83 86.85 3.92 95.72 4.44<br />
17 50.00 1.84 87.02 3.98 96.03 4.54<br />
18 50.00 1.84 87.12 4.02 96.20 4.60<br />
19 50.00 1.84 87.24 4.07 96.41 4.67<br />
20 50.00 1.84 87.32 4.10 96.54 4.72<br />
21 50.01 1.84 87.36 4.12 96.62 4.76<br />
22 50.01 1.84 87.40 4.14 96.69 4.79<br />
23 50.01 1.84 87.42 4.16 96.73 4.81<br />
24 50.01 1.85 87.43 4.17 96.76 4.83<br />
25 50.01 1.85 87.45 4.18 96.79 4.85<br />
Table 5.9: Mean values of P and E over 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed with 1, 3 and 5 decomposition levels, using the 1D Haar filter set.
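The 1, 3 and 5 level decompositions of Table 5.9 are obtained by re-applying the single-level transform to the running average part of the previous pass. A minimal C sketch of such a multi-level 1D Haar decomposition follows; the divide-by-two scaling and the buffer size are illustrative assumptions, not the exact setup of the simulations.

```c
#include <stddef.h>

#define MAX_N 256 /* scratch size; input length must not exceed this */

/* Multi-level 1D Haar decomposition: each pass transforms the running
 * average part in place, so levels = 1, 3 or 5 reproduces the setups
 * of Table 5.9. Averages stay in data[0..len/2), details follow. */
void haar1d_multi(float *data, size_t n, int levels)
{
    float tmp[MAX_N];
    size_t len = n;
    for (int l = 0; l < levels && len >= 2; l++) {
        size_t half = len / 2;
        for (size_t i = 0; i < half; i++) {
            tmp[i]        = (data[2 * i] + data[2 * i + 1]) / 2.0f;
            tmp[half + i] = (data[2 * i] - data[2 * i + 1]) / 2.0f;
        }
        for (size_t i = 0; i < len; i++)
            data[i] = tmp[i];
        len = half;
    }
}
```

Thresholding the resulting coefficient array with the values th = 0 … 25 then yields figures of merit analogous to the P and E columns above.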
Haar 2D
                  1 level           3 levels          5 levels
Threshold th      P      E          P      E          P      E
 0                3.54   5.32e-15   3.67   1.5e-14    3.68   2.50e-14
 1               18.90   0.26      22.06   0.28      22.21   0.28
 2               36.05   0.69      42.33   0.74      42.63   0.75
 3               46.42   1.07      55.90   1.19      56.34   1.19
 4               55.25   1.47      67.15   1.66      67.76   1.67
 5               60.69   1.80      74.78   2.07      75.50   2.09
 6               64.01   2.06      79.95   2.42      80.77   2.44
 7               66.46   2.30      84.03   2.75      84.96   2.77
 8               68.30   2.51      87.18   3.05      88.21   3.08
 9               69.73   2.70      89.64   3.33      90.75   3.36
10               70.95   2.90      91.72   3.59      92.88   3.63
11               71.87   3.06      93.25   3.82      94.49   3.87
12               72.63   3.22      94.51   4.03      95.80   4.08
13               73.20   3.35      95.46   4.21      96.78   4.26
14               73.65   3.47      96.21   4.36      97.56   4.42
15               74.06   3.59      96.84   4.51      98.20   4.57
16               74.38   3.69      97.34   4.64      98.73   4.71
17               74.53   3.75      97.63   4.72      99.05   4.80
18               74.65   3.80      97.82   4.79      99.25   4.86
19               74.76   3.85      98.01   4.85      99.44   4.93
20               74.82   3.87      98.11   4.89      99.55   4.97
21               74.87   3.90      98.20   4.93      99.64   5.01
22               74.91   3.92      98.25   4.95      99.69   5.03
23               74.93   3.94      98.29   4.97      99.74   5.05
24               74.94   3.95      98.32   4.99      99.77   5.07
25               74.96   3.96      98.35   5.00      99.80   5.09
Table 5.10: Mean values of P and E over 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed with 1, 3 and 5 decomposition levels, using the 2D Haar filter set.
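The 2D transform behind Table 5.10 is separable: one level consists of the 1D Haar transform applied first to every row and then to every column of the anode-time map. A C sketch of one such level is shown below; the 4×4 block size, the function name and the 1/2 scaling are illustrative assumptions (the real SDD event maps are much larger).

```c
/* One level of the separable 2D Haar transform on a small W x H block:
 * rows first, then columns. After the call, the running average sits in
 * the top-left W/2 x H/2 quadrant and the details in the other three. */
enum { W = 4, H = 4 };

void haar2d_level(float img[H][W])
{
    float tmp[(W > H) ? W : H];
    for (int r = 0; r < H; r++) {               /* transform each row */
        for (int c = 0; c < W / 2; c++) {
            tmp[c]         = (img[r][2 * c] + img[r][2 * c + 1]) / 2.0f;
            tmp[W / 2 + c] = (img[r][2 * c] - img[r][2 * c + 1]) / 2.0f;
        }
        for (int c = 0; c < W; c++)
            img[r][c] = tmp[c];
    }
    for (int c = 0; c < W; c++) {               /* then each column */
        for (int r = 0; r < H / 2; r++) {
            tmp[r]         = (img[2 * r][c] + img[2 * r + 1][c]) / 2.0f;
            tmp[H / 2 + r] = (img[2 * r][c] - img[2 * r + 1][c]) / 2.0f;
        }
        for (int r = 0; r < H; r++)
            img[r][c] = tmp[r];
    }
}
```

Because the baseline of each anode is nearly flat, most detail coefficients of a real event are small, which is why P grows so quickly with the threshold in the table.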
Architecture    Precision   P       E
ieee754doub     2^-52       99.88   5.07
ieee754sing     2^-23       99.88   5.11
fixed(18)       2^-18       99.88   5.11
fixed(15)       2^-15       99.88   5.11
fixed(12)       2^-12       99.88   5.11
fixed(9)        2^-9        99.88   5.11
fixed(7)        2^-7        99.87   6.04
fixed(5)        2^-5        99.81   12.75
fixed(3)        2^-3        99.52   89.09
Table 5.11: Mean values of P and E over 10 SDD events (∆P ≈ ∆E ≈ 0.01), obtained with Simulink simulations.
Conclusions
The main goal of this thesis work was the search for compression algorithms, and their hardware implementation, to be applied to the data coming from the Silicon Drift Detectors in the ALICE experiment.
ALICE and, in general, the LHC experiments put very stringent constraints on the compression algorithms as regards compression ratio, reconstruction error, speed and flexibility. For example, the data produced by the SDDs have to be reduced by a factor of 22 in order to satisfy the constraints on disk space for permanent storage. Many standard compression algorithms have therefore been studied in order to find which one obtains the best trade-off between compression ratio and reconstruction error, i.e. the distortion introduced. It is rather obvious, in fact, that a compression ratio as high as 22 can only be achieved at the expense of some loss of information on the physical charge distribution over the SDD surface.
Three hardware prototypes implementing data compression are presented in this thesis: the front-end chips CARLOS v1, v2 and v3. Their evolution from version 1 to version 3 reflects the architectural changes in the readout chain that occurred during the three years of this work. Three major reasons justify these changes:
– the necessity to work in a radiation environment, forcing us to choose a radiation-tolerant technology;
– the lack of space for the SIU board, forcing us to change the readout architecture;
– the change from a one-dimensional (1D) compression algorithm to a two-dimensional (2D) one, in order to obtain the same compression ratio as in 1D while using lower thresholds, thus losing a smaller amount of physical data.
CARLOS v4 is planned to be the final version of the chip: it will contain the 2D algorithm and will be designed to be compliant with the new readout architecture. It should be sent to the foundry before the end of 2002.
One of the main features of these chips is that lossy compression can be switched off when needed in favour of lossless compression. Lossless data compression becomes necessary when the compression algorithms implemented on the CARLOS chips are no longer applicable: for example, the 2D compression algorithm does not perform well in the presence of a slope in the baseline of the anode signals. In this case on-line compression on the front-end has to be switched off and a second-level compressor in the counting room has to do the job. For this kind of application, different compression algorithms have to be studied.
As an alternative to the 1D and 2D algorithms, our group in Bologna decided to study a wavelet based compression algorithm, in order to assess whether it could be useful for a possible second-level data compression. Our simulations showed that the algorithm performs well in terms of both compression ratio and reconstruction error. We are still working to obtain more quantitative results and, at the same time, a DSP implementation is planned for the near future in order to evaluate the compression speed and how many DSPs would be necessary for the task. The use of DSPs in the counting room may be very convenient since, unlike ASICs, they are completely reprogrammable in software: as many different compression algorithms as desired can thus be tried on the input data in order to find the best one.
Bibliography
[1] ALICE Collaboration, “Technical Proposal for A Large Ion Collider Experiment at the CERN LHC”, December 1995, CERN/LHCC/95-71.
[2] The LHC Study Group, “The Large Hadron Collider Conceptual Design”, October 1995, CERN/AC/95-05(LHC).
[3] P. Giubellino, E. Crescio, “The ALICE experiment at LHC: physics prospects and detector design”, January 2001, ALICE-PUB-2000-35.
[4] CERN/LHCC 99-12, ALICE TDR 4, 18 June 1999.
[5] E. Crescio, D. Nouais, P. Cerello, “A detailed study of charge diffusion and its effect on spatial resolution in Silicon Drift Detectors”, September 2001, ALICE-INT-2001-09.
[6] F. Faccio, K. Kloukinas, G. Magazzu, A. Marchioro, “SEU effects in registers and in a Dual-Ported Static RAM designed in a 0.25 µm CMOS technology for applications in the LHC”, Fifth Workshop on Electronics for LHC Experiments, September 20-24, 1999, pages 571-575.
[7] K. Sayood, “Introduction to Data Compression”, Morgan Kaufmann, San Francisco, 1996.
[8] E. S. Ventsel, “Teoria delle probabilità”, Mir edition.
[9] S. W. Smith, “The Scientist and Engineer’s Guide to Digital Signal Processing”, California Technical Publishing, San Diego, 1999.
[10] J. Badier, Ph. Busson, A. Karar, D. W. Kim, G. B. Kim, S. C. Lee, “Reduction of ECAL data volume using lossless data compression techniques”, Nuclear Instruments and Methods in Physics Research A 463 (2001), pages 361-374.
[11] R. Polikar, “The Engineer’s Ultimate Guide to Wavelet Analysis”, http://engineering.rowan.edu/~polikar/WAVELETS/WTtutorial.html, 2001.
[12] P. G. Lemarié, Y. Meyer, “Ondelettes et bases hilbertiennes”, Revista Matemática Iberoamericana, Vol. 2, pages 1-18, 1986.
[13] E. J. Stollnitz, T. D. DeRose and D. H. Salesin, “Wavelets for computer graphics: a primer”, IEEE Computer Graphics and Applications, Vol. 15, No. 3, pages 76-84, May 1995 (part 1) and Vol. 15, No. 4, pages 75-85, July 1995 (part 2).
[14] P. Morton, “Image Compression Using the Haar Wavelet Transform”, http://online.redwoods.cc.ca.us/instruct/darnold/maw/haar.htm, 1998.
[15] B. Burke Hubbard, “The World According to Wavelets: the story of a mathematical technique in the making”, A K Peters, Ltd., Wellesley, 1998.
[16] S. G. Mallat, “A Theory for Multiresolution Signal Decomposition: The Wavelet Representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, pages 674-693, July 1989.
[17] D. Cavagnino, P. De Remigis, P. Giubellino, G. Mazza and A. E. Werbrouck, “Data Compression for the ALICE Silicon Drift Detector”, 1998, ALICE-INT-1998-41.
[18] P. Gupta and N. McKeown, “Designing and Implementing a Fast Crossbar Scheduler”, IEEE Micro, Jan/Feb 1999.
[19] D. Cavagnino, P. Giubellino, P. De Remigis, A. Werbrouck, G. Alberici, G. Mazza, A. Rivetti, F. Tosello, “Zero suppression and Data Compression for SDD Output in the ALICE Experiment”, Internal Note/SDD, ALICE-INT-1999-28 v1.0.
[20] P. Moreira, J. Christiansen, A. Marchioro, E. van der Bij, K. Kloukinas, M. Campbell, G. Cervelli, “A 1.25 Gbit/s Serializer for LHC Data and Trigger Optical Links”, Fifth Workshop on Electronics for LHC Experiments, September 20-24, 1999, pages 194-198.
[21] F. Wang, “BIST using pseudorandom test vectors and signature analysis”, IEEE 1988 Custom Integrated Circuits Conference, CH2584-1/88/0000-0095.
[22] T. W. Williams, W. Daehn, “Aliasing errors in multiple input signature analysis registers”, 1989 IEEE, CH2696-3/89/0000/0338.
[23] M. Misiti, Y. Misiti, G. Oppenheim and J. M. Poggi, “Wavelet Toolbox User’s Guide”, The MathWorks, Inc., Natick, 2000.
[24] “Simulink User’s Guide: Dynamic System Simulation for Matlab”, The MathWorks, Inc., Natick, 2000.
[25] “Fixed-Point Blockset User’s Guide: for Use with Simulink”, The MathWorks, Inc., Natick, 2000.