


UNIVERSITÀ DEGLI STUDI DI BOLOGNA
FACOLTÀ DI SCIENZE MATEMATICHE FISICHE E NATURALI
DOTTORATO DI RICERCA IN FISICA, XIV ciclo

HARDWARE IMPLEMENTATION OF
DATA COMPRESSION ALGORITHMS
IN THE ALICE EXPERIMENT

PhD thesis by:
Dott. Davide Falchieri

Supervisors:
Prof. Maurizio Basile
Prof. Enzo Gandolfi

Coordinator:
Prof. Giovanni Venturi

Keywords: ALICE, data compression, CARLOS, wavelets, VHDL

Academic Year 2000/2001


Contents

Introduction

1 The ALICE experiment
  1.1 The Inner Tracking System
    1.1.1 Tracking in ALICE
    1.1.2 Physics of the ITS
    1.1.3 Layout of the ITS
  1.2 Design of the drift layers
  1.3 The SDDs (Silicon Drift Detectors)
  1.4 SDD readout system
    1.4.1 Front-end module
    1.4.2 Event-buffer strategy
    1.4.3 End-ladder module
    1.4.4 Choice of the technology

2 Data compression techniques
  2.1 Applications of data compression
  2.2 Remarks on information theory
  2.3 Compression techniques
    2.3.1 Lossless compression
    2.3.2 Lossy compression
    2.3.3 Measures of performance
    2.3.4 Modelling and coding
  2.4 Lossless compression techniques
    2.4.1 Huffman coding
    2.4.2 Run Length encoding
    2.4.3 Differential encoding
    2.4.4 Dictionary techniques
    2.4.5 Selective readout
  2.5 Lossy compression techniques
    2.5.1 Zero suppression
    2.5.2 Transform coding
    2.5.3 Subband coding
    2.5.4 Wavelets
  2.6 Implementation of compression algorithms

3 1D compression algorithm and implementations
  3.1 Compression algorithms for SDD
  3.2 1D compression algorithm
  3.3 1D algorithm performance
    3.3.1 Compression coefficient
    3.3.2 Reconstruction error
  3.4 CARLOS v1
    3.4.1 Board description
    3.4.2 CARLOS v1 design flow
    3.4.3 Functions performed by CARLOS v1
    3.4.4 Tests performed on CARLOS v1
  3.5 CARLOS v2
    3.5.1 The firstcheck block
    3.5.2 The barrel shifter block
    3.5.3 The fifo block
    3.5.4 The event-counter block
    3.5.5 The outmux block
    3.5.6 The feesiu (toplevel) block
    3.5.7 CARLOS-SIU interface
  3.6 CARLOS v2 design flow
  3.7 Tests performed on CARLOS v2

4 2D compression algorithm and implementation
  4.1 2D compression algorithm
    4.1.1 Introduction
    4.1.2 How the 2D algorithm works
    4.1.3 Compression coefficient
    4.1.4 Reconstruction error
  4.2 CARLOS v3 vs. the previous prototypes
  4.3 The final readout architecture
  4.4 CARLOS v3
  4.5 CARLOS v3 building blocks
    4.5.1 The channel block
    4.5.2 The encoder block
    4.5.3 The barrel15 block
    4.5.4 The fifonew32x15 block
    4.5.5 The channel-trigger block
    4.5.6 The ttc-rx-interface block
    4.5.7 The fifo-trigger block
    4.5.8 The event-counter block
    4.5.9 The outmux block
    4.5.10 The trigger-interface block
    4.5.11 The cmcu block
    4.5.12 The pattern-generator block
    4.5.13 The signature-maker block
  4.6 Digital design flow for CARLOS v3
  4.7 CARLOS layout features

5 Wavelet based compression algorithm
  5.1 Wavelet based compression algorithm
    5.1.1 Configuration parameters of the multiresolution algorithm
  5.2 Multiresolution algorithm optimization
    5.2.1 The Wavelet Toolbox from Matlab
    5.2.2 Choice of the filters
    5.2.3 Choice of the dimensionality, number of levels and threshold value
  5.3 Choice of the architecture
    5.3.1 Simulink and the Fixed-Point Blockset
    5.3.2 Choice of the architecture
  5.4 Multiresolution algorithm performance
  5.5 Hardware implementation

Conclusions

Bibliography


Introduction

This thesis work has been aimed at the hardware implementation of data
compression algorithms to be applied to High Energy Physics experiments.
The amount of data that will be produced by the LHC experiments at CERN
is of the order of 1 GByte/s. Cost constraints on magnetic tapes and data
acquisition systems (optical fibres, readout boards) require on-line data
compression to be applied on the front-end electronics of the different
detectors. This leads to the search for compression algorithms that achieve
a high compression ratio while keeping the reconstruction error low. In
fact a high compression coefficient can only be achieved at the expense of
some loss on the physical data.

The thesis describes the hardware implementation of compression algorithms
applied to the ALICE experiment, specifically to the SDD (Silicon Drift
Detector) readout chain. The total amount of data produced by the SDDs is
32.5 MBytes per event, while the space reserved on magnetic tapes for
permanent storage is 1.5 MBytes. This means that the compression
coefficient has to be at least 22. Besides that, since the p-p interaction
rate is 1000 Hz, the data compression hardware has to complete its job
within 1 ms. This leads to the search for high-performance compression
algorithms in terms of both compression ratio and execution speed.

The thesis contains a description of the design and implementation of 3
prototypes of the ASIC CARLOS (Compression And Run Length encOding
Subsystem), which deals with the on-line data compression, packing and
transmission to the standard ALICE data acquisition system. CARLOS v1
and v2 contain a uni-dimensional compression algorithm based on threshold,
run length encoding, differential encoding and Huffman coding techniques.
CARLOS v3 was meant to contain a bi-dimensional compression algorithm
that obtains a better compression ratio than the 1D one, with a lower loss
of physical data. Nevertheless, for time reasons, the design of CARLOS v3
sent to the foundry contains a simple 1D look-up table based compression
algorithm. The 2D algorithm is about to be implemented in the next
prototype, which should be the final version of CARLOS. The first two
prototypes have been tested with good results; the third one is currently
being fabricated and its tests will begin in February 2002.

Besides that, the thesis contains a detailed study of a wavelet-based
compression algorithm, which obtains encouraging results in terms of both
compression ratio and reconstruction error. The algorithm may find a
suitable application as a second-level compressor on SDD data, should it
become necessary to switch off the compression algorithm implemented on
CARLOS.

The thesis is structured in the following way:

• Chapter 1 contains a description of the ALICE experiment, focusing on
  the SDD readout architecture.

• Chapter 2 contains an introduction to standard compression algorithms.

• Chapter 3 contains a description of the 1D algorithm developed at the
  INFN Section of Torino and of the two prototypes CARLOS v1 and v2.

• Chapter 4 focuses on the 2D compression algorithm and on the design and
  implementation of the prototype CARLOS v3.

• Chapter 5 contains a description of a wavelet-based compression
  algorithm, especially tuned to reach high performance on SDD data, and
  its possible application as a second-level compressor in the counting
  room.


Chapter 1

The ALICE experiment

ALICE (A Large Ion Collider Experiment) [1] is an experiment at the Large
Hadron Collider (LHC) [2] optimized for the study of heavy-ion collisions
at a centre-of-mass energy of 5.5 TeV per nucleon. The main aim of the
experiment is to study in detail the behaviour of nuclear matter at high
densities and temperatures, in view of probing deconfinement and chiral
symmetry restoration.

The detector [1, 3] consists essentially of two main components: the
central part, composed of detectors mainly devoted to the study of hadronic
signals and dielectrons, and the forward muon spectrometer, devoted to the
study of quarkonia behaviour in dense matter. The layout of the ALICE
set-up is shown in Fig. 1.1.

A major technical challenge is imposed by the large number of particles
created in the collisions of lead ions. There is a considerable spread in
the currently available predictions for the multiplicity of charged
particles produced in a central Pb-Pb collision. The design of the
experiment has been based on the highest value, 8000 charged particles per
unit of rapidity at midrapidity. This multiplicity dictates the granularity
of the detectors and their optimal distance from the colliding beams. The
central part, which covers ±45° (|η| ≤ 0.9) over the full azimuth, is
embedded in a large magnet with a weak solenoidal field. Outside the Inner
Tracking System (ITS) there are a cylindrical TPC (Time Projection Chamber)
and a large-area PID array of time-of-flight (TOF) counters. In addition,
there are two small-area single-arm detectors: an electromagnetic
calorimeter (Photon Spectrometer, PHOS) and an array of RICH counters
optimized for high-momentum inclusive particle identification (HMPID).

Figure 1.1: Longitudinal section of the ALICE detector

My thesis work has been focused on data coming from one of the three
detectors forming the ITS, the Silicon Drift Detector (SDD).

1.1 The Inner Tracking System

The basic functions of the ITS [4] are:

• determination of the primary vertex and of the secondary vertices
  necessary for the reconstruction of charm and hyperon decays;

• particle identification and tracking of low-momentum particles;

• improvement of the momentum and angle measurements of the TPC.



1.1.1 Tracking in ALICE

Track finding in heavy-ion collisions at the LHC presents a big challenge
because of the extremely high track density. In order to achieve a high
granularity and a good two-track separation, ALICE uses three-dimensional
hit information, wherever feasible, with many points on each track and a
weak magnetic field. The ionization density of each track is measured for
particle identification. The need for a large number of points on each
track has led to the choice of a TPC as the main tracking system. In spite
of its drawbacks concerning speed and data volume, only this device can
provide reliable performance for a large volume at up to 8000 charged
particles per unit of rapidity. The minimum possible inner radius of the
TPC (r_in = 90 cm) is given by the maximum acceptable hit density. The
outer radius (r_out = 250 cm) is determined by the minimum length required
for a dE/dx resolution better than 10%. At smaller radii, and hence larger
track densities, tracking is taken over by the ITS.

The ITS consists of six cylindrical layers of silicon detectors. The number
and position of the layers are optimized for efficient track finding and
impact parameter resolution. In particular, the outer radius is determined
by the track matching with the TPC, and the inner one is the minimum
compatible with the radius of the beam pipe (3 cm). The silicon detectors
feature the high granularity and excellent spatial precision required.

Because of the high particle density, up to 90 cm⁻², the four innermost
layers (r ≤ 24 cm) must be truly two-dimensional devices. For this task,
silicon pixel and silicon drift detectors were chosen. The outer two layers
at r = 45 cm, where the track densities are below 1 cm⁻², are equipped
with double-sided silicon micro-strip detectors. With the exception of the
two innermost pixel planes, all layers have analog readout for particle
identification via a dE/dx measurement in the non-relativistic region. This
gives the Inner Tracking System a stand-alone capability as a low-pt
particle spectrometer.


1.1.2 Physics of the ITS

The ITS will contribute to the track reconstruction by improving the
momentum resolution obtained by the TPC. This will be beneficial for
practically all physics topics which will be addressed by the ALICE
experiment. The global event features will be studied by measuring the
multiplicity distributions and the inclusive particle spectra. For the
study of resonance production (ρ, ω and φ) and, more importantly, the
behaviour of the mass and width of these mesons in the dense medium, the
momentum resolution is even more important. We have to achieve a mass
precision comparable to, or better than, the natural width of the
resonances in order to observe changes of their parameters caused by chiral
symmetry restoration. Also the mass resolution for heavy states, like D
mesons, J/ψ and Υ, will be better, thus improving the signal-to-background
ratio in the measurement of open charm production and in the study of
heavy-quarkonia suppression. Improved momentum resolution will enhance the
performance in the observation of another hard phenomenon, jet production
and the predicted jet quenching, i.e. the energy loss of partons in
strongly interacting dense matter.

The low-momentum particles (below 100 MeV/c) will be detectable only by
the ITS. This is of interest in itself, because it widens the momentum
range for the measurement of particle spectra, which allows collective
effects associated with the large length scales to be studied. In addition,
a low-pt cut-off is essential to suppress the soft gamma conversions and
the background in the electron-pair spectrum due to Dalitz pairs. Also the
PID capabilities of the ITS in the non-relativistic (1/β²) region will
therefore be of great help.

In addition to the improved momentum resolution, which is necessary for
the identical-particle interferometry, especially at low momenta, the ITS
will contribute to this study through an excellent double-hit resolution
enabling the separation of tracks with close momenta. In order to be able
to study particle correlations in the three components of their relative
momenta, and hence to get information about the space-time evolution of
the system produced in heavy-ion collisions at the LHC, we need sufficient
angular resolution in the measurement of the particle's direction. Two of
the three components of the relative momentum (the side and longitudinal
ones) are crucially dependent on the precision with which the particle
direction is known. The angular resolution is determined by the precise
ITS measurements of the primary vertex position and of the first points on
the tracks. The particle identification at low momenta will enhance the
physics capability by allowing the interferometry of individual particle
species as well as the study of non-identical particle correlations, the
latter giving access to the emission time of different particles.

The study of strangeness production is an essential part of the ALICE
physics program. It will allow the level of chemical equilibration and the
density of strange quarks in the system to be established. The measurement
will be performed by charged kaon identification and hyperon detection,
based on the ITS capability to recognize secondary vertices. The
observation of multi-strange hyperons (Ξ⁻ and Ω⁻) is of particular
interest, because they are unlikely to be produced during the hadronic
rescattering due to the high energy threshold for their production. In
this way we can obtain information about the strangeness density of the
earlier stage of the collision.

Open charm production in heavy-ion collisions is of great physics interest.
Charmed quarks can be produced in the initial hard parton scattering and
then only at the very early stages of the collision, while the energy in
parton rescattering is above the charm production threshold. The charm
yield is not altered later. The excellent performance of the ITS in finding
the secondary vertices close to the interaction point gives us the
possibility to detect D mesons by reconstructing the full decay topology.


Figure 1.2: ITS layers

1.1.3 Layout of the ITS

A general view of the ITS is shown in Fig. 1.2. The system consists of six
cylindrical layers of coordinate-sensitive detectors, covering the central
rapidity region (|η| ≤ 0.9) for vertices located within the length of the
interaction diamond (2σ), i.e. 10.6 cm along the beam direction (z). The
detectors and front-end electronics are held by lightweight carbon-fibre
structures. The geometrical dimensions and the main features of the various
layers of the ITS are summarized in Table 1.1.

The granularity required for the innermost planes is achieved with silicon
micro-pattern detectors with true two-dimensional readout: Silicon Pixel
Detectors (SPD) and Silicon Drift Detectors (SDD). At larger radii, the
requirements in terms of granularity are less stringent, therefore
double-sided Silicon Strip Detectors (SSD) with a small stereo angle are
used. Double-sided microstrips have been selected rather than single-sided
ones because they introduce less material in the active volume. In addition
they offer the possibility to correlate the pulse height read out from the
two sides, thus helping to resolve ambiguities inherent in the use of
detectors with projective readout. The main parameters of each of the three
detector types (spatial precision, two-track resolution, cell size, number
of channels of an individual detector, total number of electronic channels)
are shown in Table 1.1.



Parameter                               Pixel       Drift        Strip
Spatial precision rφ (µm)               12          38           20
Spatial precision z (µm)                70          28           830
Two-track resolution rφ (µm)            100         200          300
Two-track resolution z (µm)             600         600          2400
Cell size (µm²)                         50 x 300    150 x 300    95 x 40000
Active area (mm²)                       13.8 x 82   72.5 x 75.3  73 x 40
Readout channels per module             65536       2 x 256      2 x 768
Total number of modules                 240         260          1770
Total number of readout channels (k)    15729       133          2719
Total number of cells (M)               15.7        34           2.7
Average occupancy, inner layer (%)      1.5         2.5          4
Average occupancy, outer layer (%)      0.4         1.0          3.3

Table 1.1: Main features of ITS detectors

The large number of channels in the layers of the ITS requires a large
number of connections from the front-end electronics to the detector and
to the data acquisition system. The requirement for a minimum of material
within the acceptance does not allow the use of conventional copper cables
near the active surfaces of the detection system. Therefore Tape Automated
Bonding (TAB) aluminium multilayer microcables are used.

The detectors and their front-end electronics produce a large amount of
heat which has to be removed while keeping a very high degree of
temperature stability. In particular, the SDDs are sensitive to temperature
variations in the 0.1 °C range. For these reasons, particular care was
taken in the design of the cooling system and of the temperature
monitoring. A water cooling system at room temperature is the chosen
solution for all ITS layers, but the use of other liquid coolants is still
being considered. For the temperature monitoring, dedicated integrated
circuits are mounted on the readout boards and specific calibration devices
are integrated in the SDDs.

Figure 1.3: SDD prototype: 1) active area, 2) guard area.

The outer four layers of the ITS detectors are assembled onto a mechanical
structure made of two end-cap cones connected by a cylinder

placed between the SSD and the SDD layers. Both the cones and the cylinder
are made of lightweight sandwiches of carbon-fibre plies and Rohacell™. The
carbon-fibre structure also includes the appropriate mechanical links to
the TPC and to the SPD layers. The latter are assembled in two
half-cylinder structures, specifically designed for safe installation
around the beam pipe. The end-cap cones provide the cabling and cooling
connections of the six ITS layers to the outside services.

1.2 Design of the drift layers

SDDs (a picture is shown in Fig. 1.3) have been selected to equip the two
intermediate layers of the ITS, since they couple a very good multi-track
capability with dE/dx information. At least three measured samples per
track, and therefore at least four layers carrying dE/dx information, are
needed. The SDDs, with an active area of 7.25 × 7.53 cm² each, will be
mounted on linear structures called ladders, each holding six detectors
for layer 3 and eight detectors for layer 4 (see Fig. 1.4).

Figure 1.4: Longitudinal section of ITS layers 3 and 4

The layers will sit at average radii of 14.9 and 23.8 cm from the beam
pipe and will be composed of 14 and 22 ladders respectively. The front-end
electronics will be mounted on rigid heat-exchanging hybrids, which in
turn will be connected to cooling pipes running along the ladder structure.
The connections between the detectors and the front-end electronics, and
between both and the ends of the ladders, will be made with flexible
TAB-bonded Al microcables, which will carry both data and power supply
lines. Each detector will first be assembled together with its front-end
electronics and high-voltage connections as

a unit, hereafter called a module, which will be fully tested before it is
mounted on the ladder.

Figure 1.5: Working mode of a SDD detector

1.3 The SDDs (Silicon Drift Detectors)

SDDs, like gaseous drift detectors, exploit the measurement of the
transport time of the charge deposited by a traversing particle to localize
the impact point in two dimensions, thus enhancing resolution and
multi-track capability at the expense of speed. They are therefore well
suited to this experiment, in which very high particle multiplicities are
coupled with relatively low event rates (up to some kHz). A linear SDD,
shown schematically in Fig. 1.5, has a series of parallel implanted p⁺
field strips, connected to a voltage divider, on both surfaces of the
high-resistivity n-type silicon wafer. The voltage divider is integrated
on the detector substrate itself. The field strips provide the bias voltage
to fully deplete the volume of the detector and generate an electrostatic
field parallel to the wafer surface, thus creating a drift region (see
Fig. 1.6). Electron-hole pairs are created by the charged particles
crossing the detector. The holes are collected by the nearest p⁺ electrode,
while the electrons are focused into the middle plane of the detector and
driven by the drift field towards the edge of the detector

where they are collected by an array of anodes composed of n⁺ pads.

Figure 1.6: Potential energy of electrons (negative electric potential) on
the y-z plane of the device

Thus an electron charge cloud drifts from the impact point to the anode
region: the cloud has a bell-shaped Gaussian distribution that, owing to
diffusion and mutual repulsion, becomes lower and broader during the drift
[5] (see Fig. 1.7). In this way a charge cloud can be collected by one or
more anodes, depending on the charge released by the ionizing particle and
on the impact position with respect to the anode region. The small size of
the anodes, and hence their small capacitance (50 fF), implies low noise
and good energy resolution.

The coordinate perpendicular to the drift direction is given by the
centroid of the collected charge. The coordinate along the drift direction
is measured by the centroid of the signal in the time domain, taking into
account the amplifier response. A spatial precision, averaged over the
full detector surface, better than 40 µm in both coordinates has been
obtained during beam tests of full-size prototype detectors. Each SDD
module is divided into two half-detectors: each half-detector contains on
the external side 256 anodes at a distance of 300 µm from each other. Thus
each SDD detector contains 2 × 256 readout channels; taking into account
that layers 3 and 4 contain 260 SDD modules, the total number of SDD
readout channels is around 133k.

Figure 1.7: Charge distribution evolution scheme
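The detector and readout figures quoted in this chapter are easy to
cross-check. The short Python sketch below is mine and purely illustrative
(it is not part of any readout software); it reproduces the module count,
the channel count and, anticipating Table 1.3, the per-event data volume
and the resulting compression requirement.

    # Cross-check of the SDD numbers quoted in the text (layers 3 and 4).
    modules = 14 * 6 + 22 * 8              # ladders x detectors per ladder
    channels = modules * 2 * 256           # 2 half-detectors, 256 anodes each
    samples_per_half = 256 * 256           # 256 anodes x 256 time samples
    event_bytes = modules * 2 * samples_per_half   # one byte per sample

    print(modules)                     # 260 SDD modules
    print(channels)                    # 133120, i.e. around 133k channels
    print(event_bytes / 2**20)         # 32.5 MBytes per event (cf. Table 1.3)
    print(event_bytes / 2**20 / 1.5)   # ~21.7: compression coefficient of 22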

1.4 SDD readout system

The system requirements for the SDD readout derive from both the features
of the detector and the ALICE experiment in general. The following points
are crucial in the definition of the final readout system:

– The signal generated by the SDD is a Gaussian-shaped current signal with
  variable sigma and charge (5-30 ns and 4 to 32 fC) which can be collected
  by one or more anodes. Therefore the front-end electronics should be
  able to handle analog signals in a wide dynamic range. At the same time,
  the system noise should be very low while being able to handle large
  signals.

– The amount of data generated by the SDD is very large: each
  half-detector has 256 anodes, and for each anode 256 time samples have
  to be taken in order to cover the full drift length.

– The small space available on the ladder and the constraints on material
  impose an architecture which minimizes cabling.

– The radiation environment in which the front-end electronics has to
  work imposes the choice of a radiation-tolerant technology


library for the implementation of the electronics.

Figure 1.8: SDD ladder electronics

The chosen SDD readout electronics, shown in Fig. 1.8, consists of
front-end modules and end-ladder modules. The front-end module performs
analog data acquisition, A/D conversion and buffering, while the end-ladder
module contains high-voltage and low-voltage regulators and a chip for
data compression and for interfacing the ALICE DAQ system.


Figure 1.9: The front-end readout unit

1.4.1 Front-end module

The front-end modules, one per half-detector, are distributed along the
ladders together with the SDD modules. Each front-end module contains 4
PASCAL (Preamplifier, Analog Storage and Conversion from Analog to
digitaL) - AMBRA (A Multievent Buffer Readout Architecture) chip pairs, as
shown in Fig. 1.9. The PASCAL chips are TAB-bonded directly to the SDD
output anodes, while the AMBRA chips are connected to CARLOS (Compression
And Run Length encOding Subsystem) via an 8-bit bus.

Each PASCAL chip contains three functional blocks (see Fig. 1.10):

– low-noise preamplifiers (64 of them, one per anode);

– an analog memory working at a 40 MHz clock frequency (64 × 256 cells);

– 10-bit analog-to-digital converters (ADCs; 64 of them, one per channel).

Figure 1.10: PASCAL chip architecture

During the write phase, i.e. when no trigger signal has been received,
the preamplifiers continuously write the samples into the analog memory

cells at 40 MHz, while the ADCs are in stand-by mode. When PASCAL receives
a trigger signal from CARLOS (which receives it from the Central Trigger
Processor, CTP), a control logic module on the PASCAL chip stops the
analog memory write phase, freezes its contents and starts the read phase,
performed in two steps: in the first step the ADCs are set to sample mode
and the analog memory reads out the first sample for each anode row; after
the memory settling time, the ADCs switch to conversion mode and the
analog data are converted to digital through a successive-approximation
technique. When the conversion is finished, the control logic module on
PASCAL starts the

readout of the next sample from the analog memory and, at the same time,
sends the 64 digital words to the AMBRA chip using a 40-bit wide bus. The
read phase goes on until all the analog memory content has been converted
to digital values or an abort signal comes from CARLOS (again receiving it
from the CTP), meaning that the event has to be discarded.

Input range   Output codes      Code mapping   Bits lost
0-127         from 128 to 128   0xxxxxxx       0
128-255       from 128 to 32    100xxxxx       2
256-511       from 256 to 32    101xxxxx       3
512-1023      from 512 to 64    11xxxxxx       3

Table 1.2: Digital compression from 10 to 8 bits

The AMBRA chip has mainly two functions: first, AMBRA has to compress the
data from 10 to 8 bits per sample; then it has to store the input data
stream into a digital buffer. The principle used for compression is to
decrease the resolution for larger signals with a logarithmic or
square-root law, using the mapping shown in Table 1.2. Since the larger
signals have a better signal-to-noise ratio than the smaller ones, the
accuracy of the measurement is not affected.
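One plausible software model of this mapping, following my reading of the
code-mapping column of Table 1.2, is sketched below in Python; the real
AMBRA logic is of course implemented in silicon, and the decoder's choice
of the bin midpoint is an assumption of this sketch, not something stated
in the text.

    def compress_10_to_8(sample):
        """Map a 10-bit ADC value (0..1023) onto an 8-bit code (Table 1.2)."""
        if sample < 128:                       # 0xxxxxxx: exact, 0 bits lost
            return sample
        if sample < 256:                       # 100xxxxx: 2 bits lost
            return 0b10000000 | ((sample - 128) >> 2)
        if sample < 512:                       # 101xxxxx: 3 bits lost
            return 0b10100000 | ((sample - 256) >> 3)
        return 0b11000000 | ((sample - 512) >> 3)   # 11xxxxxx: 3 bits lost

    def expand_8_to_10(code):
        """Inverse mapping; returns the midpoint of the quantization bin."""
        if code < 0b10000000:
            return code
        if code < 0b10100000:
            return 128 + ((code & 0x1F) << 2) + 2
        if code < 0b11000000:
            return 256 + ((code & 0x1F) << 3) + 4
        return 512 + ((code & 0x3F) << 3) + 4

With this reading, the relative quantization error stays below about 2%
over the whole range, consistent with the remark that larger signals
tolerate a coarser resolution.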

The 4 AMBRA chips are static RAMs able to contain 256 KBytes, thus being
able to temporarily store 4 complete half-SDD events (one event corresponds
to 256 × 256 bytes = 64 KBytes). Simultaneous data read and write are
allowed: while the PASCAL chips are transferring data to the AMBRA chips,
the AMBRA chips can send data belonging to another event to the CARLOS
chip. Since the four AMBRA chips have to transmit data over a single 8-bit
bus, an arbitration mechanism has been implemented.



1.4.2 Event-buffer strategy

The dead time due to the SDD readout system is around 358.4 µs: this is,
in fact, the time needed for reading a cell of the analog memory and for
converting it into a digital word, 1.4 µs, multiplied by the number of
cells, 256. This means that a new trigger signal will not be accepted
before 358.4 µs have passed after the previous event. Every 1.4 µs each
detector produces 512 bytes of data, so at least 10 8-bit buses per
detector working at 40 MHz would be required for data transfer.
Unfortunately the space on the ladder is very limited, and managing 80
data lines for each detector (for a total of 320 for the half-ladder) is a
very serious problem, especially for the input connections to the
end-ladder readout units.

The adopted solution, inserting a digital multi-event buffer on the
front-end readout unit between PASCAL and CARLOS, allows data to be sent
towards the end-ladder unit at a lower speed: if another event arrives
while data are being transmitted from AMBRA to CARLOS, another digital
buffer on AMBRA is ready to accept the data coming from PASCAL. Data are
transferred from AMBRA to CARLOS over an 8-bit bus in 1.65 ms (25 ns × 64
Kwords) while other events are processed by PASCAL and sent to AMBRA. For
an average Pb-Pb event rate of 40 Hz and using a double-event digital
buffer, our simulations indicate that the dead time due to buffer overrun
is only 0.1% of the total time. This is the fraction of time during which
AMBRA is transferring data to CARLOS and the other buffer in AMBRA is
full: in this situation a BUSY signal is asserted towards the CTP, meaning
that no further trigger can be accepted. In order to reach a much smaller
dead time even with higher event rates, a decision was taken to have a
4-buffer-deep AMBRA device.
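The quoted dead-time figure can be reproduced at the order-of-magnitude
level with a toy model. The Python sketch below is my own simplified
reconstruction, not the collaboration's simulation: it draws Poisson
triggers, lets each accepted event hold a buffer until its 1.65 ms transfer
over the shared 8-bit bus completes, and accumulates the time during which
all buffers are full.

    import random

    def busy_fraction(rate_hz=40.0, xfer_s=1.65e-3, n_buffers=2,
                      n_triggers=1_000_000, seed=1):
        """Fraction of time BUSY is asserted in a toy multi-buffer model."""
        random.seed(seed)
        t, busy, bus_free, pending = 0.0, 0.0, 0.0, []
        for _ in range(n_triggers):
            t += random.expovariate(rate_hz)         # Poisson trigger arrivals
            pending = [d for d in pending if d > t]  # buffers drained by now
            if len(pending) >= n_buffers:            # all buffers full: reject
                continue
            start = max(t, bus_free)                 # one shared bus to CARLOS
            bus_free = start + xfer_s
            pending.append(bus_free)
            if len(pending) == n_buffers:            # last free buffer filled
                busy += min(pending) - t             # BUSY until oldest drains
        return busy / t

    # busy_fraction(n_buffers=2) comes out at the per-mille level, the order
    # of the 0.1% quoted above; n_buffers=4 drives it far lower still.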

In order to allow full testability of the readout electronics at the board
and system levels, the ASICs embody a standard JTAG interface. In this way
it is possible to test each chip after the various assembly stages and
during the run phase in order to check correct functionality.

The same interface is used to download control information into the chips.

Layer   Ladders   Detectors/ladder   Data/ladder   Total data
3       14        6                  768 KBytes    10.5 MBytes
4       22        8                  1 MByte       22 MBytes
Both                                               32.5 MBytes

Table 1.3: Total amount of data produced by SDDs

Radiation-tolerant deep-submicron processes (0.25 µm) have been used for
the final versions of the ASICs. These technologies are now available and
allow us to reduce size and power consumption with no degradation of the
signal processing speed. Moreover, it has been shown that they have a
better resistance to radiation than commercially available technologies
when specific layout techniques are used.

1.4.3 End-ladder module

The end-ladder modules are located at both ends of each ladder (2 per
ladder); they receive data from the front-end modules, perform data
compression with the CARLOS chip and send the data to the DAQ through an
optical fibre link.

Besides that, the end-ladder board will host the TTCrx device, a chip
receiving the global clock and trigger signals from the CTP and
distributing them to PASCAL, AMBRA and CARLOS, as well as the power
regulators for the complete ladder system.

CARLOS receives 8 data streams coming from 8 half-detectors, i.e. from one
half-ladder, for a total input data volume of 64 KBytes × 8 = 512 KBytes,
at an input rate of 320 MByte/s. Taking into account the number of ladders
and detectors per ladder (see Table 1.3), the total volume of data produced
by all the SDD modules amounts to around 32.5 MBytes per event, while the
space reserved on disk for permanent storage is 1.5 MBytes. This implies
the use of a compression algorithm with a compression coefficient of at
least 22 and a reconstruction error as low as possible, in order to
minimize the loss of physical information.

Moreover, since the trigger rate in proton-proton interactions amounts to
1 kHz, each event should be compressed and sent to the DAQ system within
1 ms. Actually, thanks to the buffering provided by the AMBRA chips, this
processing time doubles to 2 ms, thus relaxing the timing constraint on
the CARLOS chip.
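A quick back-of-the-envelope check, using only the numbers quoted in this
section, shows why the relaxed budget matters (the interpretation is mine):

    # Time for CARLOS to receive one half-ladder event at the stated rate.
    event_kbytes = 64 * 8          # eight half-detectors = 512 KBytes
    rate_mbyte_s = 320             # CARLOS input bandwidth
    t_in_ms = event_kbytes / 1024 / rate_mbyte_s * 1e3
    print(t_in_ms)   # ~1.6 ms: beyond the bare 1 ms budget, but comfortably
                     # within the 2 ms made available by the AMBRA buffering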

These constraints led us to the design and implementation of a first
prototype of CARLOS. The desire for better compression performance and
changes in the readout architecture due to the presence of radiation then
led us to the design and implementation of two more CARLOS prototypes. We
are now going to design CARLOS v4, which is intended to be the final
version of the compression ASIC. The first 3 prototypes of the CARLOS
device are explained in detail in chapters 3 and 4, while chapter 2
contains a review of existing compression techniques.

1.4.4 Choice of the technology

The effects of radiation on electronic circuits can be divided into
total-dose effects and single-event effects [6]. The total dose modifies
the thresholds of MOS transistors and increases leakage currents. This is
of particular concern in leakage-sensitive analog circuits, like analog
memories. For instance, assuming a value of 1 pF for the storage capacitors
in the memory, a leakage current as small as 1 nA would change the value
of the stored information by ΔV = I Δt / C = (1 nA × 200 µs) / 1 pF =
0.2 V in 200 µs. This is of course unacceptable.

Radiation-tolerant layout practices prevent this risk and their use in
analog circuits is therefore recommended. These design techniques become
extremely effective in deep-submicron CMOS technologies. Single-event
effects can trigger latch-up phenomena or can change the value of digital
bits (Single Event Upset, SEU). Latch-up can be prevented by the systematic
use of guard rings in the layout. Single event upset can


be a problem especially when occurring in the digital control logic, and
can be prevented by layout techniques or by redundancy in the system.

Radiation-tolerant layouts of course carry area penalties. It can be
estimated that, in a given technology, a minimum-size inverter with a
radiation-tolerant layout is 70% bigger than the corresponding inverter
with a standard layout. Nevertheless, a radiation-tolerant inverter in a
quarter-micron technology is about eight times smaller than a standard
inverter in a 0.8 µm technology. The radiation dose which will be received
by the readout electronics will be quite low, below 100 krad in 10 years.
This value is probably below the limit of what a standard technology can
afford; however, conservative considerations suggested the use of
radiation-tolerant techniques for the critical parts of the circuit. These
techniques have been proven to work up to 30 Mrad and carry a lower area
penalty and lower cost compared with radiation-hard processes. Thus the
library chosen for the implementation of the PASCAL, AMBRA and CARLOS
chips is the 0.25 µm IBM technology with standard cells designed at CERN
to be radiation tolerant.


Chapter 2

Data compression techniques

Data compression [7] is the art or science of representing information in
a compact form. These compact representations are created by identifying
and using structures that exist in the data. Data can be characters in a
text file, numbers that are samples of speech or image waveforms, or
sequences of numbers that are generated by physical processes.

Data compression plays an important role in many fields, for example in
the transmission of digital television signals. If we wanted to transmit
an HDTV (High Definition TeleVision) signal without any compression, we
would need to transmit about 884 Mbits/s. Using data compression, we need
to transmit less than 20 Mbits/s along with audio information. Compression
is now very much a part of everyday life. If you use computers, you are
probably using a variety of products that make use of compression. Most
modems now have compression capabilities that allow data to be transmitted
many times faster than otherwise possible. File compression utilities,
which let us store more on our disks, are now commonplace.

This chapter contains an introduction to data compression with a
description of the most commonly used compression algorithms, with the aim
of finding the most suitable compression technique for the physical data
coming out of the SDD.


2.1 Applications of data compression

An early example of data compression is the Morse code, developed by
Samuel Morse in the mid-19th century. Letters sent by telegraph are encoded
with dots and dashes. Morse noticed that certain letters occurred more
often than others. In order to reduce the average time required to send a
message, he assigned shorter sequences to letters that occur more
frequently, such as a (· −) and e (·), and longer sequences to letters
that occur less frequently, such as q (− − · −) or j (· − − −).

What is being used to provide compression in the Morse code is the
statistical structure of the message to compress, i.e. the message contains
letters with a higher probability of occurring than others. Indeed, most
compression techniques exploit the statistical structure of the input to
provide compression, but this is not the only kind of structure that
exists in the data.

There are many other kinds of structure in data of different types that
can be exploited for compression. Let us take speech as an example. When
we speak, the physical construction of our voice box dictates the kinds of
sounds that we can produce, that is, the mechanics of speech production
impose a structure on speech. Therefore, instead of transmitting the
sampled speech itself, we could send information about the conformation of
the voice box, which could be used by the receiver to synthesize the
speech. An adequate amount of information about the conformation of the
voice box can be represented much more compactly than the sampled values
of the speech. This compression approach is currently being used in a
number of applications, including the transmission of speech over mobile
radios and the synthetic voice in toys that speak.

Data compression can also take advantage of redundant structure in the
input signal, that is, a structure containing more information than
needed. For example, if a sound has to be transmitted for being heard by a
human being, all frequencies below 20 Hz and above 20 kHz can be
eliminated (thus providing compression), since these frequencies cannot be
perceived by humans.

2.2 Remarks on information theory

Without going into details, we just want to recall Shannon's theorem [8].
He defines the information content of a message in the following way:
given a message which is made up of N characters in total and contains n
different symbols, the information content of the message, measured in
bits, is

    I = N \sum_{i=1}^{n} \left( -p_i \log_2 p_i \right)    (2.1)

where p_i is the occurrence probability of symbol i. What is regarded as a
symbol depends on the application: it might be an ASCII code, 16- or
32-bit words, words in a text and so on.

A practical illustration of Shannon's theorem is the following: let us
assume we measure a charge or any other physical quantity using an 8-bit
digitizer. Very often the measured quantities will be distributed
approximately exponentially. Let us assume that the mean value of the
statistical distribution is one tenth of the dynamic range, i.e. 25.6.
Each value between 0 and 255 is regarded as a symbol. Applying Shannon's
formula with n = 256 and

    p_i = \frac{e^{-(i+0.5)/25.6}}{25.6}

we obtain a mean information content I/N of 6.11 bits per measured value,
which is almost 25% less than the 8 bits we need when saving the data as a
sequence of bytes. Even if we had increased the dynamic range by a factor
of 4 using a 10-bit ADC, it turns out that the mean information content,
expressed as the number of bits per measurement, would have been virtually
the same, and hence the possible compression gain even higher (39%). This
might be surprising, but considering that an exponential distribution
delivers a value beyond ten times the mean only every e^10 ≈ 22026
samples, it is clear that even using a quite long code for such
measurements cannot have an appreciable influence on the compression
rates. Considering that in all likelihood, in a realistic architecture, we
would have had to expand the 10 bits to 16, the gain is an impressive 62%
in the latter case.

The exponential distribution is a good approximation <strong>of</strong> the raw <strong>data</strong> in<br />

many cases and in particular for <strong>data</strong> coming out from the SDD. Comparing<br />

various probability distributions with the same RMS it seems<br />

that the exponential distribution is particularly hard to compress. For<br />

instance a discrete spectrum being distributed according to a Gaussian<br />

with the same RMS as the above exponential only has an information<br />

contents <strong>of</strong> 4.75 bits.<br />
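The figures above can be checked numerically. The following is a minimal sketch (ours, not part of the original analysis) that evaluates Shannon's formula (2.1) for the discretized exponential distribution; the function name and the renormalization of the truncated tail are our own choices.

    import math

    def mean_information_bits(n_levels, mean):
        # p_i of formula (2.1) for an exponentially distributed quantity
        # sampled by a digitizer with n_levels codes
        p = [math.exp(-(i + 0.5) / mean) / mean for i in range(n_levels)]
        total = sum(p)
        p = [pi / total for pi in p]   # renormalize the truncated tail
        return sum(-pi * math.log2(pi) for pi in p)

    print(mean_information_bits(256, 25.6))    # 8-bit ADC: about 6.1 bits
    print(mean_information_bits(1024, 25.6))   # 10-bit ADC: virtually the same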

2.3 Compression techniques

When we speak of a compression technique or a compression algorithm we actually refer to two algorithms: the first one takes an input X and generates a representation XC that requires fewer bits; the second one is a reconstruction algorithm that operates on the compressed representation XC to generate the reconstruction Y. Based upon the requirements of reconstruction, data compression schemes can be divided into two broad classes:

– lossless compression schemes, in which Y is identical to X;

– lossy compression schemes, which generally provide much higher compression than lossless ones, but force Y to be different from X.

In fact, Shannon showed that the best performance achievable by a lossless compression algorithm is to encode a stream with an average number of bits equal to the value I/N. Lossy algorithms, on the contrary, have no upper bound on the compression ratio.


2.3.1 Lossless compression

Lossless compression techniques involve no loss of information. If data have been losslessly compressed, the original data can be recovered exactly from the compressed data. Lossless compression is generally used for discrete data, such as text, computer-generated data and some kinds of image and video information. There are many situations requiring compression where we want the reconstruction to be identical to the original. There are also a number of situations in which it is possible to relax this requirement in order to get more compression: in these cases lossy compression techniques have to be used.

2.3.2 Lossy compression

Lossy compression techniques involve some loss of information, and data that have been compressed using lossy techniques generally cannot be recovered or reconstructed exactly. In return for accepting distortion in the reconstruction, we can generally obtain much higher compression ratios than is possible with lossless compression. Whether the distortion introduced is acceptable or not depends on the specific application: for instance, if the input source X contains physical information plus noise, while the output Y contains only the physical signal, the distortion introduced is completely acceptable.

2.3.3 Measures of performance

A compression algorithm can be evaluated in a number of different ways. We could measure the relative complexity of the algorithm, the memory required to implement it, how fast it performs on a given machine or on dedicated hardware, the amount of compression and how closely the reconstruction resembles the original. The last two features are the most important ones for our application to SDD data.

A very logical way of measuring how well a compression algorithm compresses a given set of data is to look at the ratio of the number of bits required to represent the data before compression to the number of bits required to represent the data after compression. This ratio is called the compression ratio. Suppose we store an image made up of a square array of 256×256 8-bit pixels (exactly as for half an SDD): it requires 64 KBytes. If the compressed image requires only 16 KBytes, we would then say that the compression ratio is 4.

Another way of reporting compression performance is to provide the average number of bits required to represent a single sample. This is generally referred to as the rate. For instance, for the same image described above, the average number of bits per pixel in the compressed representation is 2: thus the rate is 2 bits/pixel.

In lossy compression the reconstruction differs from the original data. Therefore, in order to determine the efficiency of a compression algorithm, we have to find some way to quantify the difference. The difference between the original data and the reconstructed ones is often called distortion. This value is usually calculated as an absolute or percentage difference between the data before and after compression.

2.3.4 Modelling and coding

The development of data compression algorithms for a variety of data can be divided into two steps. The first phase is usually referred to as modelling. In this phase we try to extract information about any redundancy that exists in the data and describe the redundancy in the form of a model. The second phase is called coding. A description of the model and a description of how the data differ from the model are encoded, generally using a binary alphabet.


2.4 Lossless compression techniques

This section contains an explanation of the most widely used lossless compression techniques. In particular the following items are covered:

– Huffman coding;

– run length encoding;

– differential encoding;

– dictionary techniques;

– selective readout.

Some of these algorithms have been chosen for direct application in the 1D compression algorithm implemented in the prototypes CARLOS v1 and v2.

2.4.1 Huffman coding

The Huffman compression algorithm [7] encodes data samples in the following way: symbols that occur more frequently (i.e. symbols having a higher probability of occurrence) get shorter codewords than symbols that occur less frequently. This leads to a variable-length coding scheme, in which each symbol can be encoded with a different number of bits. The choice of the code to assign to each symbol or, in other words, the design of the Huffman look-up table, is carried out with standard criteria.

An example can better explain this. Suppose we have 5 symbols, a1, a2, a3, a4 and a5, each one with a probability of occurrence: P(a1) = 0.2, P(a2) = 0.4, P(a3) = 0.2, P(a4) = 0.1, P(a5) = 0.1. At first, in order to write down the encoding c(ai) of each symbol ai, it is necessary to order the symbols from the most probable to the least probable one, as shown in Tab. 2.1.

    Data   Probability   Code
    a2     0.4           c(a2)
    a1     0.2           c(a1)
    a3     0.2           c(a3)
    a4     0.1           c(a4)
    a5     0.1           c(a5)

Table 2.1: Sample data and probability of occurrence

The least probable symbols are a4 and a5; they are assigned the following codes:

    c(a4) = α1 ∗ 0    (2.2)
    c(a5) = α1 ∗ 1    (2.3)

where α1 is a generic binary string and ∗ represents the concatenation of two strings.

If a′4 is a symbol for which P(a′4) = P(a4) + P(a5) = 0.2 holds, then the data in Tab. 2.1 can be reordered from the most probable to the least probable, as shown in Tab. 2.2.

    Data   Probability   Code
    a2     0.4           c(a2)
    a1     0.2           c(a1)
    a3     0.2           c(a3)
    a′4    0.2           α1

Table 2.2: Introduction of the symbol a′4

In this table the lowest probability symbols are a3 and a′4, so they can be encoded in the following way:

    c(a3)  = α2 ∗ 0    (2.4)
    c(a′4) = α2 ∗ 1    (2.5)

Nevertheless, since c(a′4) = α1 from Tab. 2.2, it follows from (2.5) that α1 = α2 ∗ 1, and then (2.2) and (2.3) become:

    c(a4) = α2 ∗ 10    (2.6)
    c(a5) = α2 ∗ 11    (2.7)

Defining a′3 as the symbol for which P(a′3) = P(a3) + P(a′4) = 0.4, the data from Tab. 2.2 can be reordered from the most probable to the least probable as shown in Tab. 2.3.

    Data   Probability   Code
    a2     0.4           c(a2)
    a′3    0.4           α2
    a1     0.2           c(a1)

Table 2.3: Introduction of the symbol a′3

In Tab. 2.3 the lowest probability symbols are a′3 and a1, so they can be encoded in the following way:

    c(a′3) = α3 ∗ 0    (2.8)
    c(a1)  = α3 ∗ 1    (2.9)

Since c(a′3) = α2 from Tab. 2.3, it follows from (2.8) that α2 = α3 ∗ 0, so (2.4), (2.6) and (2.7) become:

    c(a3) = α3 ∗ 00     (2.10)
    c(a4) = α3 ∗ 010    (2.11)
    c(a5) = α3 ∗ 011    (2.12)

Finally, defining a′′3 as the symbol for which P(a′′3) = P(a′3) + P(a1) = 0.6 holds, the data from Tab. 2.3 can be reordered from the most probable to the least probable as shown in Tab. 2.4.

    Data   Probability   Code
    a′′3   0.6           α3
    a2     0.4           c(a2)

Table 2.4: Introduction of the symbol a′′3

With only two symbols left, the encoding is immediate:

    c(a′′3) = 0    (2.13)
    c(a2)   = 1    (2.14)

Besides, since c(a′′3) = α3, as shown in Tab. 2.4, it follows from (2.13) that α3 = 0, i.e. (2.9), (2.10), (2.11) and (2.12) can be written as:

    c(a1) = 01      (2.15)
    c(a3) = 000     (2.16)
    c(a4) = 0010    (2.17)
    c(a5) = 0011    (2.18)

Tab. 2.5 contains a complete view of the Huffman table generated so far. The method used for building the Huffman table in this example can be applied as it is to every data stream, whatever its statistical structure. The Huffman codes c(ai) generated in this way can be uniquely decoded: this means that from a sequence of variable-length codes c(ai) created using Huffman coding, only one data sequence ai can be reconstructed.

Besides, as shown in the example in Tab. 2.5, none of the codes c(ai) is contained as a prefix in the remaining codes; codes with this property are named prefix codes. In particular, prefix codes are always uniquely decodable, while the contrary does not always hold true.

Finally, a Huffman code is an optimum code since, among all the prefix codes, it is the one that minimizes the average code length.


    Data   Probability   Code
    a2     0.4           1
    a1     0.2           01
    a3     0.2           000
    a4     0.1           0010
    a5     0.1           0011

Table 2.5: Huffman table
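The pairing procedure just illustrated can be expressed compactly in software. The following Python sketch (ours, not the CARLOS hardware implementation) repeatedly merges the two least probable groups, prepending one more code bit at each merge; with tied probabilities the individual codewords may differ from Tab. 2.5, but the average code length is the same optimum.

    import heapq
    from itertools import count

    def huffman_table(probabilities):
        tick = count()   # tie-breaker so heap entries never compare dicts
        heap = [(p, next(tick), {sym: ''}) for sym, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, group0 = heapq.heappop(heap)   # least probable group
            p1, _, group1 = heapq.heappop(heap)   # second least probable
            merged = {s: '0' + c for s, c in group0.items()}
            merged.update({s: '1' + c for s, c in group1.items()})
            heapq.heappush(heap, (p0 + p1, next(tick), merged))
        return heap[0][2]

    probs = {'a1': 0.2, 'a2': 0.4, 'a3': 0.2, 'a4': 0.1, 'a5': 0.1}
    print(huffman_table(probs))   # e.g. {'a2': '1', 'a1': '01', 'a3': '000', ...}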

2.4.2 Run Length encoding

Very often a data stream happens to contain long sequences of the same value: this may happen when a physical quantity holds the same value for several sampling periods, in text files where a character is repeated several times, in digital images where areas of the same colour are encoded as pixels with the same value, and so on. The compression algorithm based on Run Length encoding [9] is well suited for such repetitive data.

As shown in the example in Fig. 2.1, where the zero symbol has been chosen as the repetitive value in the sequence, each run of zeros in the original sequence is encoded as a pair of words: the first contains the code for the zero symbol, the second contains the number of further zero symbols that consecutively follow it in the original sequence.

    Original sequence:            17 8 54 0 0 0 97 5 16 0 45 23 0 0 0 0 43
    Run Length encoded sequence:  17 8 54 0 2 97 5 16 0 0 45 23 0 3 43

Figure 2.1: Run length encoding

The performance of the algorithm improves, in terms of compression ratio, when the input data stream contains long sub-sequences of the same symbol, and worsens when it contains many isolated occurrences, such as the second zero code, 0→0 0, in Fig. 2.1. This compression algorithm can be implemented in different ways: it can be applied to only one value of the original data sequence or to several different elements of the sequence.

One of the most important applications of Run Length encoding is the compression of facsimile, or fax. In facsimile transmission a page is scanned and converted into a sequence of white and black pixels: since very long sequences of white or black pixels are highly probable, coding the lengths of the runs instead of coding the individual pixels leads to high compression ratios. Besides, Run Length encoding is often used in conjunction with other compression algorithms, after the input data stream has been transformed into a more compressible form.
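A minimal sketch (ours) of the zero run-length scheme of Fig. 2.1, where each run of zeros becomes the zero symbol followed by the count of the additional zeros in the run:

    def rle_encode(seq, marker=0):
        out, i = [], 0
        while i < len(seq):
            if seq[i] == marker:
                run = 1
                while i + run < len(seq) and seq[i + run] == marker:
                    run += 1
                out += [marker, run - 1]   # count of zeros after the first
                i += run
            else:
                out.append(seq[i])
                i += 1
        return out

    data = [17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 43]
    print(rle_encode(data))
    # [17, 8, 54, 0, 2, 97, 5, 16, 0, 0, 45, 23, 0, 3, 43]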

2.4.3 Differential encoding

Differential encoding [7] is obtained by taking the difference between each sample and the previous one, except for the first one, whose value is left unchanged, as shown in Fig. 2.2.

    Original sequence:                     17 19 24 24 24 21 15 10 89 95 96 96 96 95 94 94 95
    Sequence after differential encoding:  17  2  5  0  0 −3 −6 −5 79  6  1  0  0 −1 −1  0  1

Figure 2.2: Differential encoding

Note that each value of the original sequence can be reconstructed by summing to the corresponding value in the coded sequence all the previous ones: for instance, 89 = 79 + 17 + 2 + 5 + 0 + 0 + (−3) + (−6) + (−5). It is therefore very important to leave the first value in the coded sequence unchanged, otherwise the reconstruction process cannot be carried out correctly. The differential algorithm is well suited for all data sequences with very small changes in value between consecutive samples: for this kind of data stream the differential encoding produces an encoded stream with a smaller dynamic range, i.e. the difference between the maximum and minimum values in the encoded stream is smaller than the same quantity calculated on the original sequence. The encoded sequence can thus be represented with a smaller number of bits than the original one.

Besides, differential encoding can be used in conjunction with Run Length encoding: if a sequence contains long runs of equal values, it is converted into a sequence of zeros by the differential encoder and then further compressed using the Run Length encoder.
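A minimal sketch (ours) of the differential encoder and its inverse, applied to the sequence of Fig. 2.2:

    def diff_encode(seq):
        # first sample unchanged, then sample-to-sample differences
        return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]

    def diff_decode(enc):
        out = [enc[0]]
        for d in enc[1:]:
            out.append(out[-1] + d)   # running sum reconstructs the original
        return out

    data = [17, 19, 24, 24, 24, 21, 15, 10, 89, 95, 96, 96, 96, 95, 94, 94, 95]
    enc = diff_encode(data)
    print(enc)                        # [17, 2, 5, 0, 0, -3, -6, -5, 79, 6, ...]
    assert diff_decode(enc) == data   # lossless reconstruction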

2.4.4 Dictionary techniques

In many applications the output of a source consists of recurring patterns. A classical example is a text source in which certain patterns or words recur frequently. Also, there are certain patterns that simply do not occur or, if they do, occur with great rarity. A very reasonable approach to encoding such sources is to keep a list, or dictionary, of frequently occurring patterns. When these patterns appear in the source, they are encoded with a reference to the dictionary, i.e. the address of the corresponding table location. If a pattern does not appear in the dictionary, it can be encoded using some other, less efficient, method. In effect we are splitting the input domain into two classes: frequently occurring patterns and infrequently occurring patterns. For this technique to be effective, the class of frequently occurring patterns, and hence the size of the dictionary, must be much smaller than the number of all possible patterns. Depending upon how much information is available to build the dictionary, a static or a dynamic approach to its creation can be used. Choosing a static dictionary technique is most appropriate when considerable prior knowledge about the source is available.

When no a priori information is available on the structure of the input source, an adaptive technique is adopted: for example, the UNIX compress command makes use of this technique, as sketched below. It starts with a dictionary of size 512, thus transmitting 9-bit-long codewords. Once the dictionary has filled up, its size is doubled to 1024 entries, and 10-bit-long codewords are transmitted. The size of the dictionary is progressively increased in this way until it contains 2^16 entries, at which point compress becomes a static coding technique. From then on the algorithm monitors the compression ratio: if it falls below a threshold, the dictionary is flushed and the dictionary building process is restarted.

Dictionary techniques are also used in the image compression field in the GIF (Graphics Interchange Format) standard, which works in a very similar way to the compress command.
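As an illustration of the adaptive dictionary idea, the following sketch (ours) follows the LZW style on which compress is based; real implementations add the variable code width and the dictionary flushing described above, which are omitted here.

    def lzw_encode(text):
        dictionary = {chr(i): i for i in range(256)}   # start from single bytes
        buffer, out = '', []
        for ch in text:
            if buffer + ch in dictionary:
                buffer += ch                           # extend the known pattern
            else:
                out.append(dictionary[buffer])         # emit code for the pattern
                dictionary[buffer + ch] = len(dictionary)  # learn a new pattern
                buffer = ch
        if buffer:
            out.append(dictionary[buffer])
        return out

    print(lzw_encode('abababab'))   # [97, 98, 256, 258, 98]: repeats get single codes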

2.4.5 Selective readout

The selective readout technique [10] is a lossless data compression technique usually applied in High Energy Physics experiments. Since the really interesting data are a small fraction of the total amount of data actually produced, it proves useful to transmit and store only those data. Selective readout may reduce the data size by identifying regions in space containing a significant amount of energy. For example, in the SDD case, the Central Trigger Processor (CTP) unit defines a Region Of Interest (ROI) that, event by event, contains the information on which ladders are to be read out and which ones can be discarded. Using the ROI feature a very high compression ratio can be achieved.


2.5 Lossy compression techniques

This section contains an explanation of the most widely used lossy compression techniques. In particular the following items will be covered:

– zero suppression;

– transform coding;

– sub-band coding, with some remarks on wavelets.

The first of these algorithms has been chosen for direct application in the 1D compression algorithm implemented in the prototypes CARLOS v1 and v2.

2.5.1 Zero suppression

Zero suppression is the very simple technique of eliminating data samples below a certain threshold by setting them to 0. Zero suppression proves to be very useful for data containing large quantities of zeros with the interesting data concentrated in small clusters: for instance, since the mean occupancy of an SDD in the inner layer is 2.5%, a compression ratio of 40 can be obtained using the zero suppression technique alone.

A problem arises since the SDD data and, in general, data collections contain the sum of two different distributions: the real signal corresponding to the interesting physical event and a white noise with a Gaussian distribution around a mean value. Thus, if a lossy compression algorithm obtains a good compression ratio just by eliminating the noise, the distortion introduced is absolutely acceptable. The key task for a fair implementation of the zero suppression technique is the choice of the right value of the threshold parameter, in order to eliminate the noise while preserving the physical signal.

In the case of data coming out of the SDD detector and the related front-end electronics, data values are shifted from the 0 level to a baseline level greater than 0. This baseline level corresponds to the mean value of the noise introduced by the preamplification electronics; there is then a spread around this value due to the RMS of the Gaussian distribution of the noise.

The noise level introduced by the electronics may vary with time and with the amount of radiation absorbed: therefore a compression algorithm making use of the zero suppression technique has to allow a tunable value of the threshold level, in order to accommodate fluctuations or drifts in the baseline values. Following this indication, the threshold level used in CARLOS v1 and v2 is completely presettable via software using the JTAG port.
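As a minimal illustration (ours; the sample values and threshold are invented), zero suppression amounts to a single comparison per sample:

    def zero_suppress(samples, threshold):
        # samples at or below the threshold are treated as noise and set to 0
        return [s if s > threshold else 0 for s in samples]

    noisy = [21, 20, 22, 19, 58, 131, 202, 96, 24, 20, 21, 19]  # baseline ~20
    print(zero_suppress(noisy, threshold=25))
    # [0, 0, 0, 0, 58, 131, 202, 96, 0, 0, 0, 0] -> long zero runs,
    # easy to compress further with run-length encoding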

2.5.2 Transform coding

Transform coding [7] takes as input a data sequence and transforms it into a sequence in which most of the information is contained in a few samples: the new sequence can then be further compressed using the compression algorithms described up to now. The key point of transform coding is the choice of the transform: this depends on the features and redundancies of the input data stream to compress. The algorithm, working on N elements at a time, consists of three steps:

– transform: the input sequence {sn} is split into blocks of length N; each block is then mapped, using a reversible transformation, into the sequence {cn};

– quantization: the transformed sequence {cn} is quantized, i.e. a number of bits is assigned to each sample depending on the dynamic range of the sequence, the desired compression ratio and the acceptable distortion;

– coding: the quantized sequence {cn} is encoded using a binary encoding technique such as Run Length encoding or Huffman coding.

These concepts can be expressed mathematically: the input sequence {sn} is divided into blocks of length N and each block is mapped, using the reversible transform A, into the sequence {cn}:

    c = As    (2.19)

or, in other terms:

    c_n = \sum_{i=0}^{N-1} s_i a_{n,i}   with   [A]_{i,j} = a_{i,j}    (2.20)

Quantization and encoding are then performed on the sequence {cn}, so as to optimize compression.

The decompression algorithm, by means of the inverse transform B = A^{-1}, reconstructs the original sequence {sn} from the encoded sequence {cn} in the following way:

    s = Bc    (2.21)

or:

    s_n = \sum_{i=0}^{N-1} c_i b_{n,i}   with   [B]_{i,j} = b_{i,j}    (2.22)

These concepts can easily be extended to bi-dimensional data, such as images or 2-D charge distributions, as in the case of the SDD.

Let us take an N × N portion of a digital image S, with S_{i,j} as its (i,j)-th pixel; performing a reversible bi-dimensional transform A working on N × N pixels at a time, with a_{i,j} the (i,j)-th element of the transform matrix A and C_{i,j} the (i,j)-th pixel of the N × N block of the compressed image C, the following holds true:

    C_{k,l} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} S_{i,j} a_{i,j} a_{k,l}    (2.23)

A transform is defined separable if the 2D transform of an N × N block can be applied by first performing a 1D transform on the N rows of the block and then a transform on the N columns of the block just transformed; by choosing a separable transform, (2.23) becomes:

    C_{k,l} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} S_{i,j} a_{k,i} a_{l,j}    (2.24)

or, expressed in matrix form:

    C = A S A^T    (2.25)

The inverse transform is the following one:

    S = B C B^T    (2.26)

Frequently orthonormal transforms are used, so that B = A^{-1} = A^T, in such a way that calculating the inverse transform reduces to:

    S = A^T C A    (2.27)

Even in the bi-dimensional case, a good transform has to be chosen in order to reach a high compression ratio. For instance, the JPEG standard adopted, until the year 2000, the Discrete Cosine Transform, known as DCT. If A is the matrix representing the DCT, the following relationship holds:

    [A]_{i,j} = w(i) \cos\left(\frac{(2j+1)\, i\, \pi}{2N}\right)    j = 0, 1, ..., N−1    (2.28)

where:

    w(i) = \sqrt{1/N}   for i = 0
    w(i) = \sqrt{2/N}   for i = 1, ..., N−1

Fig. 2.3 gives a graphical interpretation of (2.28).
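As a check of these definitions, the following sketch (ours, assuming the numpy library is available) builds the DCT matrix of (2.28) and verifies the orthonormality property exploited in (2.27):

    import numpy as np

    def dct_matrix(N):
        A = np.zeros((N, N))
        for i in range(N):
            w = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
            for j in range(N):
                A[i, j] = w * np.cos((2 * j + 1) * i * np.pi / (2 * N))
        return A

    A = dct_matrix(8)
    assert np.allclose(A @ A.T, np.eye(8))   # orthonormal: A^-1 = A^T

    S = np.random.randint(0, 256, (8, 8)) - 128   # a block shifted by 2^(p-1)
    C = A @ S @ A.T                                # forward 2D DCT, eq. (2.25)
    assert np.allclose(A.T @ C @ A, S)             # exact reconstruction, eq. (2.27)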

After choosing the transform, the next step consists in the quantization of the transformed image. Several approaches are possible: for example, zonal mapping foresees a preliminary analysis of the statistics of the transformed coefficients and a later assignment of a fixed number of bits. The name zonal mapping comes from the assignment of a fixed number of bits depending on the zone in which each coefficient is placed within the N × N block under study; Tab. 2.6 reports a bit allocation table for an 8 × 8 block.

Figure 2.3: Base coefficients for the bi-dimensional DCT in the case N = 8

    8 7 5 3 1 1 0 0
    7 5 3 2 1 0 0 0
    4 3 2 1 1 0 0 0
    3 3 2 1 1 0 0 0
    2 1 1 1 0 0 0 0
    1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0

Table 2.6: Bit allocation table for an 8 × 8 block

It is interesting to note that the quantization in Tab. 2.6 assigns zero bits to the coefficients in the lower-right side of the table: this is actually equivalent to ignoring these coefficients. This kind of quantization makes sense since the lower-right coefficients come from a transformation of the original image using high-frequency cosines, i.e. these coefficients carry the information corresponding to the high frequencies in the original signal, see Fig. 2.3.

Since the response of the human eye strongly depends on frequency and, in particular, is sensitive to variations at low frequencies and far less sensitive at higher frequencies, the quantization in Tab. 2.6 tends to ignore information that the human eye would not appreciate at all.

After quantization, only non-null coefficients are transmitted. In particular, for every non-null coefficient two words have to be transmitted: the first with the quantized value of the coefficient itself, the second containing the number of null samples occurring after the last non-null coefficient. This allows the decompression algorithm to reconstruct exactly the sequence as it was quantized and, from that, the original image.

As an example, let us suppose we have the 8 × 8 image of 8-bit pixels reported in Tab. 2.7.

    124 125 122 120 122 119 117 118
    121 121 120 119 119 120 120 118
    126 124 123 122 121 121 120 120
    124 124 125 125 126 125 124 124
    127 127 128 129 130 128 127 125
    143 142 143 142 140 139 139 139
    150 148 152 152 152 152 150 151
    156 159 158 155 158 158 157 156

Table 2.7: 8 × 8 block of a digital image

Each value of the block is shifted by 2^{p−1}, where p is the number of bits per pixel (in this case p = 8); then the DCT is applied to the block, obtaining the coefficients c_{i,j} reported in Tab. 2.8.

      39.88    6.56   -2.24    1.22   -0.37   -1.08    0.79    1.13
    -102.43    4.56    2.26    1.12    0.35   -0.63   -1.05   -0.48
      37.77    1.31    1.77    0.25   -1.50   -2.21   -0.10    0.23
      -5.67    2.24   -1.32   -0.81    1.41    0.22   -0.13    0.17
      -3.37   -0.74   -1.75    0.77   -0.62   -2.65   -1.30    0.76
       5.98   -0.13   -0.45   -0.77    1.99   -0.26    1.46    0.00
       3.97    5.52    2.39   -0.55   -0.051  -0.84   -0.52   -0.13
      -3.43    0.51   -1.07    0.87    0.96    0.09    0.33    0.01

Table 2.8: DCT coefficients related to the block in Tab. 2.7

As already stated, the high-frequency coefficients in the lower-right corner tend to be quite close to 0, while most of the information is concentrated in the upper-left corner. The quantization of the coefficients is obtained using a reference table such as Tab. 2.9; in particular the quantized values l_{i,j} are obtained with the following formula:

    l_{i,j} = \left\lfloor \frac{c_{i,j}}{Q^t_{i,j}} + 0.5 \right\rfloor    (2.29)

where Q^t_{i,j} is the (i,j)-th element of the quantization table and ⌊x⌋ denotes the greatest integer not greater than x.

    16  11  10  16   24   40   51   61
    12  12  14  19   26   58   60   55
    14  13  16  24   40   57   69   56
    14  17  22  29   51   87   80   62
    18  22  37  56   68  109  103   77
    24  35  55  64   81  104  113   92
    49  64  78  87  103  121  120  101
    72  92  95  98  112  100  103   99

Table 2.9: Quantization table

Tab. 2.10 contains the resulting quantized coefficients obtained by applying (2.29) with the quantization table Tab. 2.9:

     2  1  0  0  0  0  0  0
    -9  0  0  0  0  0  0  0
     3  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0

Table 2.10: Resulting quantized coefficients

After studying the structure of matrices like Tab. 2.10, the order chosen for sending the coefficients is the zig-zag pattern shown in Fig. 2.4.

Figure 2.4: Zig-zag scanning pattern for an 8×8 transform

This choice makes it highly probable that the final sequence contains long runs of zero coefficients, so this part of the sequence can be encoded using the Run Length technique.
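The quantization step (2.29) is a one-line operation per coefficient. The following sketch (ours) applies it to the upper-left corners of Tab. 2.8 and Tab. 2.9 and reproduces the corresponding entries of Tab. 2.10:

    import math

    def quantize(coeffs, qtable):
        # l_ij = floor(c_ij / Q_ij + 0.5), as in formula (2.29)
        return [[math.floor(c / q + 0.5) for c, q in zip(crow, qrow)]
                for crow, qrow in zip(coeffs, qtable)]

    coeffs = [[39.88, 6.56], [-102.43, 4.56]]   # upper-left corner of Tab. 2.8
    qtable = [[16, 11], [12, 12]]               # upper-left corner of Tab. 2.9
    print(quantize(coeffs, qtable))             # [[2, 1], [-9, 0]]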

2.5.3 Subband coding

A signal can be decomposed into different frequency components (see Fig. 2.5) using analog or digital filters; each resulting signal can then be encoded and compressed using a specific algorithm.

Figure 2.5: Decomposition of a signal in frequency components

Digital filtering [9] involves taking a weighted sum of the current and past inputs to the filter and, in some cases, of the past outputs of the filter. The general form of the input-output relationship of the filter is given by:

    y_n = \sum_{i=0}^{N} a_i x_{n-i} + \sum_{i=1}^{M} b_i y_{n-i}    (2.30)

where the sequence x_n is the input to the filter, the sequence y_n is the output from the filter, and the values a_i and b_i are called the filter coefficients. If the input sequence is a single 1 followed by all 0s, the output sequence is called the impulse response of the filter. The impulse response completely specifies the filter: once we know the impulse response, we know the relationship between the input and the output of the filter. Notice that if the b_i are all zero, the impulse response dies out after N samples. These filters are called finite impulse response, or FIR, filters; for them Eq. 2.30 reduces to a convolution operation between the input signal and the filter coefficients. Filters with nonzero values for some of the b_i are called infinite impulse response, or IIR, filters.

The basic subband coding scheme works as follows: the source is passed through a bank of filters, called the analysis filter bank, which covers the range of frequencies that make up the source (a 3-level filter bank is shown in Fig. 2.6); the outputs of the filters are then subsampled, as in Fig. 2.7. The justification for subsampling is the Nyquist rule and its generalization, which tells us that for perfect reconstruction we only need twice as many samples per second as the range of frequencies. This means that it is possible to reduce the number of samples at the output of a filter, since the range of frequencies there is smaller than the range of frequencies at its input. The process of reducing the number of samples is called decimation, or downsampling. The amount of decimation depends on the ratio of the bandwidth at the filter output to the bandwidth at the filter input. If the bandwidth at the output of the filter is 1/M of the bandwidth at the input, the output is decimated by a factor of M by keeping every M-th sample. Once the output of the filters has been decimated, it is encoded using one of the encoding schemes described above.

Figure 2.6: An 8-band 3-level filter bank

Along with the selection of the compression scheme, the allocation of bits between the subbands is an important design parameter, since different subbands contain differing amounts of information. The bit allocation procedure can have a significant impact on the quality of the final reconstruction, especially when the information content of the different bands is very different.

The decompression phase, in subband coding also named synthesis, works as follows: first the encoded samples of each subband are decoded at the receiver, then the decoded values are upsampled by inserting an appropriate number of zeros between the samples, and finally the upsampled signals are passed through a bank of reconstruction filters and added together.


Figure 2.7: Subband coding technique: analysis filter bank, downsampling and encoding

Subband coding has applications in speech coding and in audio coding with MPEG audio, but it can also be applied to image compression.
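The analysis/synthesis chain can be illustrated with a minimal two-band example. The following sketch (ours, not from the CARLOS design) uses the unnormalized Haar averaging/differencing pair as low-pass and high-pass filters with downsampling by 2; it is only a toy stand-in for the filter banks discussed above.

    def analysis(x):                       # x must have even length
        low  = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]   # averages
        high = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]   # details
        return low, high                   # each band is downsampled by 2

    def synthesis(low, high):
        out = []
        for s, d in zip(low, high):        # upsample and recombine
            out += [s + d, s - d]
        return out

    x = [17, 19, 24, 24, 24, 21, 15, 10]
    low, high = analysis(x)
    print(low, high)                       # most energy ends up in the low band
    assert synthesis(low, high) == x       # perfect reconstruction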

2.5.4 Wavelets

Another method of decomposing signals that has gained a great deal of popularity in recent years is the use of wavelets [11, 12, 13, 14]. Decomposing a signal in terms of its frequency content using sinusoids results in a very fine resolution in the frequency domain. However, sinusoids are defined on the time domain from −∞ to ∞, therefore individual frequency components give no temporal resolution [15].

In a wavelet representation, a signal is represented in terms of functions that are localized both in time and in frequency. For instance, the following is known as the Haar wavelet:

    ψ_{0,0}(x) = 1 for 0 ≤ x < 1/2,   −1 for 1/2 ≤ x < 1,   0 otherwise


Figure 2.8: The Haar wavelet ψ_{0,0} and its scaled translates ψ_{1,0}, ψ_{1,1}, ψ_{2,0}, ψ_{2,1}, ψ_{2,2}

From this "mother" function the following set of functions can be obtained:

    ψ_{j,k}(x) = ψ_{0,0}(2^j x − k) = 1 for k 2^{−j} ≤ x < (k + 1/2) 2^{−j},   −1 for (k + 1/2) 2^{−j} ≤ x < (k + 1) 2^{−j},   0 otherwise


Figure 2.9: Example of multiresolution analysis, panels (a)–(d)

In 1989 Stephane Mallat [16] developed the multiresolution approach, which moved the representation using wavelets into the domain of subband coding. These concepts can be better understood with the help of an example. Let us suppose we have to approximate the function f(t) drawn in Fig. 2.9a using translated versions of some time-limited function φ(t). A simple approximating function is the indicator function:

    φ(t) = 1 for 0 ≤ t < 1,   0 otherwise


The approximation is a linear combination φ^0_f(t) of the integer translates φ_{0,k}(t) = φ(t − k), and the coefficients c_{0,k} are the average values of the function in the interval [k, k+1); in other words:

    c_{0,k} = \int_{k}^{k+1} f(t)\, φ_{0,k}(t)\, dt    (2.37)

It is possible to scale φ(t) to obtain:

    φ_{1,0}(t) = φ_{0,0}(2t) = 1 for 0 ≤ t < 1/2,   0 otherwise    (2.38)

Its translates would be given by:

    φ_{1,k}(t) = φ_{1,0}(t − k)    (2.39)
              = φ_{0,0}(2t − k) = 1 for 0 ≤ 2t − k < 1,   0 otherwise    (2.40)


Let us assume that f(t) is accurately represented by φ^1_f(t), the analogous approximation built from the φ_{1,k}(t) with coefficients c_{1,k}. φ^1_f(t) can be decomposed into a lower resolution version of itself, namely φ^0_f(t), and the difference φ^1_f(t) − φ^0_f(t). Examining this difference over an arbitrary interval [k, k+1):

    φ^1_f(t) − φ^0_f(t) = c_{0,k} − c_{1,2k} for k ≤ t < k + 1/2,   c_{0,k} − c_{1,2k+1} for k + 1/2 ≤ t < k + 1

The scaling functions φ_{j,k}(t) satisfy, among others, the following properties:

2. If a function can be expressed exactly by a linear combination of the set {φ_{j,k}(t)}, then it can also be expressed exactly as a function of the set {φ_{l,k}(t)} for all l ≥ j.

3. The complete set {φ_{j,k}(t)}, with j and k running from −∞ to ∞, can exactly represent all functions with the property that:

    \int_{-\infty}^{\infty} |f(t)|^2\, dt < ∞    (2.52)

4. If a function f(t) can be exactly represented by the set {φ_{0,k}(t)}, then any integer translate f(t − k) of the function can also be represented exactly by the same set.

5.  \int φ_{0,l}(t)\, φ_{0,k}(t)\, dt = 0 for l ≠ k,   1 for l = k    (2.53)

A set with these properties forms a multiresolution analysis [16]. Thus at any resolution 2^{−j} every function f(t) can be decomposed into two components: one that can be expressed as a function of the set {φ_{j,k}(t)} and one that can be expressed as a linear combination of the wavelets {ψ_{j,k}(t)}.

The mother wavelet ψ_{0,0}(t) and the scaling function φ_{0,0}(t) are related in the following manner: from Property 2, φ_{0,0} can be written in terms of the φ_{1,k}. If the relationship is given by:

    φ_{0,0}(t) = \sum_n h_n\, φ_{1,n}(t)    (2.54)

then the wavelet ψ_{0,0}(t) is given by:

    ψ_{0,0}(t) = \sum_n (−1)^n h_n\, φ_{1,n}(t)    (2.55)

From this relationship it follows that the wavelet decomposition can be implemented in terms of filters with impulse responses given by (2.54) and (2.55), and that these filters are quadrature mirror filters. Most of the orthonormal wavelets are nonzero over an infinite interval; therefore the corresponding filters are IIR filters. Well-known exceptions are the Daubechies wavelets, which correspond to FIR filters.


Once the coefficients of the FIR filters have been obtained, the procedure for compression using wavelets is identical to the one described for subband coding. From now on the terms multiresolution analysis and wavelet-based analysis will be regarded as synonymous. Some of the most widely used wavelet families are shown in Fig. 2.10, Fig. 2.11 and Fig. 2.12.
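As a minimal numerical illustration of the multiresolution idea (ours, reusing the unnormalized Haar pair of the subband example above): each level splits the current approximation into a coarser approximation plus wavelet detail coefficients.

    def haar_dwt(x, levels):
        details = []
        approx = list(x)
        for _ in range(levels):
            low  = [(a + b) / 2 for a, b in zip(approx[0::2], approx[1::2])]
            high = [(a - b) / 2 for a, b in zip(approx[0::2], approx[1::2])]
            details.append(high)           # detail coefficients of this level
            approx = low                   # recurse on the coarse approximation
        return approx, details

    x = [17, 19, 24, 24, 24, 21, 15, 10]
    approx, details = haar_dwt(x, levels=3)
    print(approx)    # coarsest approximation (overall average): [19.25]
    print(details)   # small detail values are candidates for coarse quantization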

2.6 Implementation of compression algorithms

Compression algorithms can be implemented in hardware or in software, depending on the required speed. When speed is the most important constraint on the choice of the implementation of the compression algorithm, a hardware implementation becomes necessary.

Commercial devices implementing data compression in hardware do exist: for example, the ALDC1-40S-M from IBM, featuring adaptive lossless data compression, works at a rate of 40 MBytes/s, while the AHA32321 chip from Aha can compress and decompress data at 10 MBytes/s with a clock frequency of 40 MHz. These rates are far smaller than the one required for the SDD readout: in fact the compression chip we need has to face an input data rate of 320 MByte/s. No commercial chip exists with such features, so we had to design an Application Specific Integrated Circuit (ASIC) targeted to our requirements.

Figure 2.10: Some functions belonging to different wavelet families (Haar; Daubechies db1, db2, db3, db10), showing the wavelet function ψ, the scaling function φ and the corresponding decomposition and reconstruction filters: note that db1 is equivalent to the Haar


Figure 2.11: Some functions belonging to different wavelet families (Symlets sym2, sym3, sym4, sym8; Coiflets coif1, coif2, coif3, coif5), showing the wavelet function ψ, the scaling function φ and the corresponding decomposition and reconstruction filters


54<br />

Figure 2.12: Some functions belonging to different wavelet families: the Biorthogonal wavelets bior1.1, bior1.3, bior1.5 and bior6.8 and the Reverse Biorthogonal wavelets rbio1.1, rbio1.3, rbio1.5 and rbio6.8, shown through their decomposition and reconstruction scaling functions phi, wavelet functions psi, and low-pass and high-pass filters; note that bior1.1 and rbio1.1 are equivalent to the Haar wavelet


Chapter 3

1D compression algorithm and implementations

3.1 Compression algorithms for SDD

The choice of the algorithm for SDD data compression is strictly related to the features of the input data stream:

– low detector occupancy (3% at most);

– small samples are much more probable than large ones.

The first feature suggests the use of a zero suppression algorithm: all samples below a certain value (depending on the noise distribution) are discarded. The second feature suggests adopting an entropy coder, such as a Huffman coder. Besides that, it is important for the algorithm to have software-tunable parameters, so that its performance can be re-optimized in case the statistics of the input distribution change. For instance, the threshold level has to be changeable via software in order to take into account changes in the signal-to-noise ratio over the years, so the Huffman tables have to be reconfigurable too. The other important requirements on the compression algorithms are:

– they have to be fast;

– they have to be simple to implement in hardware;

– they have to allow lossless data transmission.

For the development of the compression algorithms, studies have been performed on the statistical distribution of the sample data coming from the single-particle events of three beam tests, so that noise could be properly taken into account. The compression results have been evaluated in order to verify the algorithm efficiency and the best parameter values.

3.2 1D compression algorithm

Following these requirements, the INFN Section of Torino has chosen a sequential compression algorithm [17] which scans the data coming from each anode row as a one-dimensional data stream. As shown in Fig. 3.1 as an example, the data samples coming from anode 76 are processed, then those from anode 77 and so on. The ultimate goal of the algorithm is to save the data belonging to a cluster, while rejecting all the other samples, regarded as noise.

Figure 3.1: Cluster in two dimensions and its slices along the anode direction

To have a data reduction system that is applicable to all situations, the algorithm is provided with different tuning parameters (Fig. 3.2 gives a graphical explanation of them):

– threshold: the threshold parameter is applied to the incoming samples, forcing them to zero if they are smaller than this value. Its goal is to eliminate the noise and pedestals affecting the data.

– tolerance: the tolerance parameter is applied to the differences calculated between consecutive samples, forcing them to zero if they are smaller than this value (through this mechanism, samples which are not very different are considered equal). In this way non-significant fluctuations of the input values are eliminated.

– disable: the disable parameter is applied to the input data, removing all the previous mechanisms for samples greater than disable, in order to keep full information on the clusters and to maintain good double-peak resolution. This means that the important information is not affected by the lossy compression algorithm.

The 1D algorithm actually consists of 5 processing steps applied sequentially (see Fig. 3.3):

– first, the input data values below the threshold parameter value are put to 0;

– then, the difference between each sample and the previous one (along the time direction) is calculated;

– if the difference value is smaller than the tolerance parameter and the input sample is smaller than the disable parameter, the difference value is put to 0, otherwise its value is left unchanged;

– these values are then encoded using the Huffman table;

– the obtained values are then encoded using the Run Length encoding method.

Figure 3.2: Threshold, tolerance and disable parameters applied to the anodic signal as a function of time

The high probability of finding long zero sequences in the SDD charge distribution makes Run Length encoding very effective, especially when combined with the threshold, tolerance and disable mechanisms.
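As an illustration only, the following Python sketch reproduces the first three stages on one anode's data stream; the function name and the parameter values in the example are hypothetical, not taken from the CARLOS implementation.

    def preprocess(samples, threshold, tolerance, disable):
        """First three stages of the 1D algorithm, before Huffman and
        run length encoding: returns the stream of differences.
        Illustrative sketch, not the CARLOS VHDL code."""
        out = []
        prev = 0
        for s in samples:
            # 1. zero suppression: samples below threshold are put to 0
            if s < threshold:
                s = 0
            # 2. differential encoding along the time direction
            diff = s - prev
            # 3. tolerance/disable: small differences are forced to 0,
            #    unless the sample exceeds disable (clusters stay lossless)
            if abs(diff) < tolerance and s < disable:
                diff = 0
            else:
                prev = s  # track the last value actually encoded
            out.append(diff)
        return out

    # toy stream: noise around a small cluster
    print(preprocess([3, 5, 21, 60, 120, 70, 24, 4],
                     threshold=20, tolerance=3, disable=100))

The long runs of zeros produced by the first three stages are exactly what makes the final Run Length encoding step effective.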

3.3 1D algorithm performance

As explained in Chapter 1, in order to comply with the target figures of DAQ speed and magnetic tape usage, the size of the SDD event has to be reduced from 32.5 MBytes to about 1.5 MBytes, which corresponds to a target compression coefficient of 22. Several standard compression algorithms have been evaluated on SDD test beam event data in order to estimate the achievable compression performance: the best compression coefficient was obtained with the gzip utility of the Unix operating system, so gzip was chosen for comparison with our 1D algorithm. The data were submitted to the gzip program in binary format for a fair comparison.


Figure 3.3: The 1D compression algorithm: the input stream goes through simple threshold zero suppression, differential encoding, tolerance, Huffman encoding and run length encoding to produce the compressed data; threshold, tolerance and the Huffman tables are software-tunable parameters

3.3.1 Compression coefficient

For the comparison task, data coming from the August 1998 test beam were chosen. The gzip compression algorithm achieves a compression ratio of around 2: this value is too far from our target value of 22.

The 1D compression algorithm has been applied using a threshold value of 20 (= noise mean + 1.35 × noise RMS) and tolerance = 0: the compression coefficient obtained is around 12.5. This is still an unacceptable value for our purposes. The target compression value of 22 can only be reached by increasing the threshold parameter, which implies a larger information loss. For instance, by applying the algorithm to the same test beam data it is possible to obtain a compression coefficient of about 33, with threshold = 40 (= noise mean + 2.68 × noise RMS) and tolerance = 0. Fig. 3.4 shows the variation of the compression coefficient obtained with the 1D algorithm as a function of the threshold level between 20 and 40, for two values of tolerance.

Figure 3.4: 1D compression ratio as a function of threshold and tolerance

An important feature of this compression algorithm is that it can be turned into a lossless algorithm simply by setting the values of threshold and tolerance to 0. Sending data without losing any information will be very useful for the first event acquisitions, since the raw data will be analyzed to determine statistics, noise and so on. These raw data will also be used to determine the best Huffman tables, i.e. the ones yielding the best compression coefficient. When used in lossless mode, meaning that only differential encoding, Huffman coding and run length encoding are applied, the compression coefficient obtained is 2.3, which is even better than what we obtain with the gzip algorithm.

3.3.2 Reconstruction error

It then had to be checked whether the information loss introduced with a threshold level of 40 is acceptable or not. In particular it was decided to study how much data compression and decompression affect the cluster geometry, as far as centroid position and charge are concerned.

A cluster finding routine was developed, based on the following two-step procedure (a sketch of the first step is given after the list):

– the data streams are analyzed one anode row after the other: when a sample value stays above a certain threshold level for two consecutive time bins, it is considered to be a hit until it goes below the same threshold for two consecutive time bins;

– then, if any two 1-D hits from adjacent anodes overlap in time, they are considered as part of a two-dimensional cluster.
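As an illustration, the first step (the 1-D hit finder) can be coded as follows; the function name and the example values are hypothetical, not taken from the actual analysis routine.

    def find_hits(anode_samples, threshold):
        """Scan one anode row: a hit starts when the signal stays above
        threshold for two consecutive time bins and ends when it stays
        below the threshold for two consecutive time bins."""
        hits, start = [], None
        above = below = 0
        for t, s in enumerate(anode_samples):
            if s > threshold:
                above, below = above + 1, 0
                if start is None and above >= 2:
                    start = t - 1  # the hit began one bin earlier
            else:
                below, above = below + 1, 0
                if start is not None and below >= 2:
                    hits.append((start, t - 2))  # last bin above threshold
                    start = None
        if start is not None:  # hit still open at the end of the row
            hits.append((start, len(anode_samples) - 1))
        return hits

    print(find_hits([1, 2, 9, 12, 15, 11, 8, 2, 1, 0], threshold=7))  # [(2, 6)]

The second step then simply groups the (start, end) intervals of adjacent anodes whenever they overlap in time.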

After finding the samples belonging to clusters, they are fitted with a two-dimensional Gaussian function, with the following features:

– the mean value corresponds to the cluster centroid;

– the sigma value corresponds to the centroid resolution;

– the volume under the Gaussian function corresponds to the charge released on the detector by the ionizing particle.

The 1D compression and decompression algorithms were then applied to the test beam data, and cluster finding and analysis were performed on both the original and the reconstructed data: the results are shown in Fig. 3.5. The picture on the upper left shows the distribution of the differences in the centroid coordinates before and after compression along the anode and drift time directions. The picture on the upper right shows the same distribution along the drift time direction, while the picture on the bottom left shows the distribution along the anode direction. These plots show that the compression algorithm with a threshold of 40 does not introduce biases on the centroid coordinate measurements, but that it worsens their accuracy by about 9 µm (+4%) along the anode direction and by about 16 µm (+8%) along the drift time axis. The bottom right picture shows the percentage difference of the charge before and after compression: the 1D algorithm also introduces an underestimation of the cluster charge of about 4%.

Figure 3.5: Spreads introduced by data compression on the measurement of the coordinates of the SDD clusters and of the cluster charge (bottom right)

3.4 CARLOS v1

During 1999 I collaborated with the INFN group in Torino on the design and test of a first hardware implementation of the 1D algorithm: CARLOS v1. This device is physically implemented as a PCB (Printed Circuit Board) hosting 2 FPGA (Field Programmable Gate Array) circuits and some connectors, for use in a test beam data acquisition system, as shown in Fig. 3.6. The device processes data coming from one macrochannel only, that is data coming from one half-detector, and directly interfaces to the SIU board, the first stage of the DAQ system.

3.4.1 Board description

Figure 3.6: CARLOS prototype v1 picture

The two main processing blocks mounted on the board are the two Xilinx FPGA devices. An FPGA is a completely programmable device widely used for fast prototyping before the final implementation of a design on an ASIC circuit, which requires more resources in terms of time, money and design effort. An FPGA contains a matrix of CLBs (Configurable Logic Blocks) that can be individually programmed and connected together in order to implement the desired input/output logic function. Each CLB contains an SRAM (Static RAM) that is used to implement a logic function by putting the input values on the address bus: the SRAMs are used as look-up tables.
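As a plain illustration of the look-up table idea (not CARLOS-specific), a 4-input logic function can be stored as 16 bits of SRAM and evaluated by using the inputs as the read address:

    # A 4-input AND gate stored as a 16-entry look-up table: the SRAM
    # holds the full truth table, and the inputs form the read address.
    LUT = [0] * 16
    LUT[0b1111] = 1  # only the all-ones address returns 1

    def clb_eval(a, b, c, d):
        address = (a << 3) | (b << 2) | (c << 1) | d
        return LUT[address]

    print(clb_eval(1, 1, 1, 1), clb_eval(1, 0, 1, 1))  # 1 0

Reprogramming the CLB simply means rewriting the truth table stored in its SRAM.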

Another piece of silicon area on the FPGA die contains the configuration RAM: depending on the contents of this block, the device accomplishes different logic functions. The configuration RAM is written at power-on from an external EPROM: CARLOS v1 hosts two EPROM devices for the configuration of the two FPGAs. The configuration process takes around 20 ms, after which the devices are completely operational. A 10 MHz clock generator is hosted between the EPROM chips: we could not achieve a higher working frequency with our choice of FPGA device. In fact, the final operating frequency is a function of how many internal resources are being used: the more resources are used, the slower the final working frequency becomes. With the final 10 MHz frequency we reached a good trade-off between logic complexity and speed; furthermore, this frequency was sufficient for the application in a test beam environment. Tab. 3.1 reports the main features of the chosen FPGA device, the XC4025E-4 HQ240C.

Features                                Values
Logic cells                             2432
Max logic gates (no RAM)                25k
Max RAM bits (no logic)                 32768
Typical gate range (logic and RAM)      15k - 45k
CLB matrix                              32x32
Total CLBs                              1024
Number of flip-flops                    2560
Number of user I/O                      256

Table 3.1: XC4025 Xilinx FPGA main features

The board also hosts 3 connectors, from left to right:

– the first is used for data injection into the first FPGA device by means of a Hewlett Packard (HP) pattern generator;

– the second one is used for analyzing the data coming out of the first device by means of a logic analyzer probe;

– the third connector is used for the communication between CARLOS v1 and the SIU board; Fig. 3.7 shows a picture of the final SIU board. We used a simplified SIU version called SIMU (SIU simulator), distributed at CERN to help front-end designers realize DAQ-compatible devices. The SIMU board can be directly plugged onto this connector.


Figure 3.7: Picture of the SIU board

3.4.2 CARLOS v1 design flow

I carried out the design of the second FPGA device following the digital design flow shown in Fig. 3.8. In particular, the design flow is composed of the following steps:

– the block specifications have been coded in the VHDL language using a hierarchical structure, starting from the bottom layer up to the top level;

– each VHDL model has been simulated in order to debug the code, using the Synopsys simulator software;

– each VHDL model has been synthesized, i.e. translated into a netlist, using the Synopsys synthesis tool; the netlist contains the usual standard cells such as AND, OR or flip-flops, but the FPGA device does not contain these elements, only RAM blocks. The netlist is only a logic representation of the circuit itself; it has no physical meaning;

– the netlist has been simulated using the Synopsys simulator, taking into account cell timing delays and constraints;

– the netlist has been automatically converted into a physical layout using the Xilinx place and route software Alliance;

– the layout information has been put into a binary file, ready to be downloaded to the EPROM chip using the Alliance software together with an EPROM programmer.

Figure 3.8: Digital design flow for CARLOS v1

This is a very straightforward and automated process; besides, the time needed between a slight modification in the VHDL code and its actual implementation in the FPGA device is very short. This is the main reason why FPGAs are so widely used for prototyping. Another very important reason is the following: running millions of test vectors as a software simulation of a VHDL model is a very long process even on fast machines, while the same set of test vectors can be run in a few seconds on the hardware prototype. An FPGA implementation easily allows algorithm verification on a huge amount of data.
allows algorithms verification on a huge amount <strong>of</strong> <strong>data</strong>.


3.4.3 Functions performed by CARLOS v1

The FPGA on the left in Fig. 3.6 contains the 1D compression algorithm explained in the previous sections, composed of 5 processing blocks applied sequentially to the input data. The blocks form a 5-level pipeline chain, each level requiring one clock cycle. The variable-length compressed codes are produced as 32-bit words.

The FPGA on the right contains the following blocks:

– firstcheck: this block processes the 32-bit input words coming from the compressor FPGA: if the MSB is high the incoming data is rejected, otherwise it is accepted and split into two different data words, a 26-bit one containing the variable-length code and a 5-bit one containing the information on how many bits have to be stored.

– barrel: this block packs variable-length codes of 2 to 26 bits into fixed-size 32-bit words. The information on how many bits (from 2 to 26) have to be stored is contained in the 5-bit length bus coming from the firstcheck block. Variable-length Huffman codes packed in 32-bit words can be uniquely unpacked by using the Huffman table and proceeding from MSB to LSB. When a word is complete, an output-push signal is asserted.

– fifo: it contains a 64x32 RAM memory for storing the data coming out of the barrel shifter. When the FIFO contains at least 16 data words it asserts a query signal in order to ask the feesiu block to begin data popping.

– feesiu: this is the most complex block of the prototype, containing the interface between CARLOS and the SIU board. The main behavior is quite simple: CARLOS waits for a "Ready to Receive" (RDYRX) command from the SIU on a bidirectional data bus; after receiving it, CARLOS takes possession of the bidirectional bus and begins sending data towards the SIU as packets of 17 32-bit words. Each packet is built as a header word, containing externally hardwired information, plus 16 data words coming out of the FIFO. When the FIFO is empty or does not contain 16 data words, no valid data is sent to the SIU. If, instead, the FIFO begins to acquire large quantities of data and the connection to the SIU is not yet open (a RDYRX command has not been received yet), a data-stop signal is asserted to stop the data stream coming into CARLOS from AMBRA.

3.4.4 Tests performed on CARLOS v1

The test of the CARLOS prototype was carried out using the HP16700A pattern generator and logic analyzer at the INFN Section in Torino. Data were injected on the first connector and analyzed on the second one, while the third one was connected to a SIU extender board, which directly connects to the SIMU board. The SIU extender is very useful for debugging purposes, since it provides 5 logic analyzer compatible connectors for analyzing the signals exchanged on the CARLOS-SIU interface. Here follows a list of the tests performed on CARLOS:

1. functional test and compression algorithm verification;

2. opening of a transaction by manually pushing buttons on the SIMU board;

3. event data transmission from CARLOS to the SIMU; the SIMU does not store data, so the only way to check whether the data are correct or not is by using the logic analyzer.

The prototype test was especially useful for designing a perfectly compatible interface towards the SIU. The main difficulty in testing the interface towards the SIU without a SIU board is due to the presence of bidirectional pads: it is quite a difficult job to work with such pads using a pattern generator. Many corrections had to be applied to the original version in order to have a 100% compatible interface. The final VHDL version was then frozen and used for the ASIC implementation of CARLOS v2. The VHDL model, in fact, does not depend on the technology chosen for the implementation and is completely re-usable.

3.5 CARLOS v2

The first CARLOS prototype was very useful for testing the compression algorithm on a huge amount of data and for correctly designing complex blocks such as the interface towards the SIU, but it has many limitations compared to the final version we need to design. We therefore decided to move to a second prototype of CARLOS with the following features:

– 40 MHz clock frequency;

– parallel processing of 8 macro-channels;

– small size, for easier use in a test beam environment;

– a JTAG port for downloading the Huffman look-up tables and the threshold and tolerance values.

The CARLOS chip design has been logically divided into two main parts, the first one designed in Torino and the second one in Bologna:

– a data compressor working on the 8 incoming streams, using the 1D compression algorithm. The compressor accepts 8-bit input data and outputs 32-bit words containing the variable-length codes;

– a data packing and formatting block, a multiplexer selecting which one of the 8 incoming streams has to be sent to the output, and an interface block towards the SIU.

As can be seen in Fig. 3.9, the main sub-blocks are 6: firstcheck, barrel, fifo, event-counter, outmux and feesiu.

Figure 3.9: CARLOS v2 schematic blocks


3.5.1 The firstcheck block

The I/O signals are:

– inputdata: input 32-bit bus;

– ck: input signal;

– reset: input signal;

– load: output signal;

– addressvalid: output 5-bit bus;

– datavalid: output 26-bit bus.

The firstcheck block takes as input the compressed codes coming from the compression block and selects the useful bits while rejecting the dummy ones. The 32-bit input word has the following structure:

– bit 31: the under-run bit: when set to 1 it means that the incoming data are dummy and have to be discarded; this may happen, for example, when the run length encoder is packing long zero sequences, thus temporarily interrupting the data flow towards the SIU;

– bits 30 to 26: this 5-bit field contains the actual number of bits that have to be selected by the following logic block, the barrel shifter;

– bits 25 to 0: this 26-bit field contains the compressed code.

The really useful bits are usually much fewer than 26, which yields a reduction in the data stream volume.

The firstcheck behavior is quite simple: when the reset signal is active (active high) all outputs are set to 0; when reset is inactive, the firstcheck block samples the under-run bit value: when it is 1 all outputs are set to 0; when it is 0, load is set to 1, addressvalid is assigned inputdata(30 downto 26) and datavalid is assigned inputdata(25 downto 0).
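The word layout is easy to mirror in software. The following is a hedged sketch of the selection the block performs; the field names follow the text, while the function itself is only illustrative:

    def firstcheck(inputdata):
        """Split a 32-bit compressor word into (load, addressvalid, datavalid).

        Bit 31 is the under-run flag, bits 30-26 the code length,
        bits 25-0 the variable-length compressed code."""
        underrun = (inputdata >> 31) & 0x1
        if underrun:                       # dummy word: reject everything
            return 0, 0, 0
        addressvalid = (inputdata >> 26) & 0x1F   # number of useful bits (2..26)
        datavalid = inputdata & 0x03FFFFFF        # 26-bit compressed code
        return 1, addressvalid, datavalid

    load, n, code = firstcheck(0x0C000005)  # length 3, code 0b101
    print(load, n, bin(code))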


3.5.2 The barrel shifter block

The I/O signals are:

– input: input 26-bit bus;

– sel: input 5-bit bus;

– load: input signal;

– ck: input signal;

– reset: input signal;

– end-trace: input signal;

– output-push: output signal;

– output: output 32-bit bus.

The barrel shifter has to pack all the valid data coming out of the firstcheck block into a fixed-length 32-bit register word to be put in output: in this way all the dummy data are rejected and there is no longer any distinction between data length and data itself. All data are packed in the same word and can easily be reconstructed by using the Huffman tree decoding scheme. If an input data word cannot be completely stored into a 32-bit word, it is broken into 2 pieces: the first becomes the MSBs of the current output word, so as to fill it completely; the second becomes the LSBs of the following valid output word.

When the reset is active, all internal registers and outputs are set to 0; when the reset is inactive, the barrel shifter starts waiting for valid data coming from the firstcheck block, that is data with the load signal set to 1. When this happens, the barrel shifter selects the valid bits from input and packs them together in a 64-bit circular register. When 32 bits have been written into the register, the block asserts the output-push signal high to communicate to the following block (the FIFO) that the output is valid and has to be stored.

Two situations are very important for the barrel shifter to work properly: when the load signal changes from 1 to 0 the barrel stops packing data, and when load turns to 1 again the barrel resumes packing data as if no pause had happened.

The end-trace signal is asserted for one clock period in coincidence with the last valid data word: this word has to be packed together with the others, then the 32-bit word has to be pushed in output (by setting output-push to 1) even if it is not complete. After the end-trace, and after the last valid word has been sent to the output, the barrel shifter emits n zero words as valid outputs: this number depends on how many words have been sent to the output since the beginning of the current event. In fact, the total number of valid words per event has to be an integer multiple of 16. Thus, if (16k + 7) words have been sent to the output when end-trace gets active, n = 9 zero words follow, with output-push set to 1. This condition is strictly related to the data transmission policy and to the multiplexing of the 8 incoming data streams onto a single 32-bit output, as will be explained in the next paragraph.
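A behavioural sketch of the packing, ignoring the reset/load handshake and the end-of-event zero padding, might look as follows; the 64-bit circular register is modelled with a plain integer accumulator, and all names are merely mnemonic:

    class Barrel:
        """Pack variable-length codes (MSB first) into 32-bit output words."""
        def __init__(self):
            self.acc = 0        # bit accumulator (models the circular register)
            self.nbits = 0
            self.words = []     # completed 32-bit words (output-push events)

        def push(self, code, length):
            self.acc = (self.acc << length) | (code & ((1 << length) - 1))
            self.nbits += length
            while self.nbits >= 32:        # a full word is ready: push it out
                self.nbits -= 32
                self.words.append((self.acc >> self.nbits) & 0xFFFFFFFF)
                self.acc &= (1 << self.nbits) - 1

    b = Barrel()
    for code, n in [(0b101, 3), (0x3FFFFFF, 26), (0b01, 2), (0x55, 8)]:
        b.push(code, n)
    print([hex(w) for w in b.words], b.nbits)  # one complete word, 7 bits pending

A code straddling a word boundary is split automatically: its high bits complete the current word and its low bits stay in the accumulator, exactly as described above.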

3.5.3 The fifo block

The I/O signals are:

– datain: input 32-bit bus;

– ck: input signal;

– push: input signal;

– pop: input signal;

– reset: input signal;

– empty: output signal;

– full: output signal;

– query: output signal;

– dataout: output 32-bit bus.

The fifo block contains a double-port RAM block with 64 32-bit words, plus some control logic. Its purpose is to buffer the input data stream and derandomize the queues that are waiting to be served by the outmux block. The buffer memory has to be large enough to allow data storing while the other queues are being served, since block conditions have to be avoided. On the other hand it cannot be too large, since CARLOS hosts 8 fifo blocks and the chip area is a strong design constraint.

The fifo allows 3 main storage operations:

– write only;

– read only;

– read/write at the same time, but at different cell locations.

The FIFO allows writing the data coming from the barrel shifter and reading them back when the queue has to be served by the outmux block. The most important feature is that read and write operations can be executed in parallel. In order to accomplish this, the control logic provides two pointers named address-write and address-read. They run from 0 to 63 and then back to 0 in a circular way: obviously address-read always has to follow address-write, otherwise invalid data would be extracted from the memory. Data is written into the fifo and the address-write pointer is incremented by one when the push input is set to 1: the push input of the fifo is the same signal as the output-push one from the barrel. In this way, when the barrel shifter has a valid output, it is written into a free location of the fifo at the next clock cycle. The RAM read phase is activated by the pop input signal: for every clock cycle in which pop is 1, the data value corresponding to address-read is put on the dataout output and then the address-read pointer is incremented by 1. When both push and pop are set to 1, the fifo is read and written at the same time and the distance between the two pointers remains constant. Three important signals are (a behavioural sketch of the block follows the list):

– query: the query signal is set to 1 when the memory contains at least 16 valid data words, that is when the distance between the two pointers is greater than or equal to 16. The query signal is used by the outmux block, where a priority-encoding-based arbiter decides which of the 8 queues has to be served in output. When a fifo block is served by the outmux, the number of total valid words decreases and the query signal comes back to 0. The query signal may remain at 1 if more than 32 valid words were stored in the fifo; in this case it is possible that the fifo is read again. It all depends on how many queues are sending requests to the scheduler to be emptied.

– empty: the empty signal is set to 1 when the fifo does not contain any valid data, that is when address-write and address-read have the same value and are pointing to the same memory location. This signal is used by the feesiu block in order to decide when all the 8 queues have been completely emptied and a new data set can enter CARLOS.

– full: the full signal is very important, since it is back-propagated to the compressor block in order to signal that the FIFO is getting full and the input stream has to be stopped. The compressor block will back-propagate this full signal to the AMBRA chip, which will stop sending data to CARLOS. Obviously the full signal has to be asserted before the FIFO is really full, otherwise some input data would be lost. For this reason the fifo full signal works between 2 thresholds, 32 and 48: the full signal goes high when the fifo contains more than 48 valid words, then it comes back to 0 only when the fifo has been served by the outmux block, that is when the fifo contains fewer than 32 valid words. With this trick the risk for the fifo to get completely full is reduced, at least if the queue arbiter is fair enough with every input stream.
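The pointer arithmetic and the two-threshold full flag can be sketched as follows; the depth and thresholds are those quoted in the text, while everything else is an illustrative model rather than the actual control logic:

    class Fifo:
        """64-word circular buffer with query/empty and a hysteretic full flag."""
        DEPTH = 64

        def __init__(self):
            self.addr_write = self.addr_read = 0
            self.count = 0                 # distance between the two pointers
            self.mem = [0] * self.DEPTH
            self.full = False

        def clock(self, push=None, pop=False):
            if push is not None and self.count < self.DEPTH:
                self.mem[self.addr_write] = push
                self.addr_write = (self.addr_write + 1) % self.DEPTH
                self.count += 1
            out = None
            if pop and self.count > 0:
                out = self.mem[self.addr_read]
                self.addr_read = (self.addr_read + 1) % self.DEPTH
                self.count -= 1
            # full goes high above 48 words and back low only below 32
            if self.count > 48:
                self.full = True
            elif self.count < 32:
                self.full = False
            return out

        @property
        def query(self):                   # at least 16 words ready to be served
            return self.count >= 16

        @property
        def empty(self):
            return self.count == 0

The gap between the two thresholds gives the arbiter time to serve the queue before the buffer can really overflow.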

3.5.4 The event-counter block

The I/O signals are:

– end-trace: input signal;

– ck: input signal;

– reset: input signal;

– event-id: output 3-bit bus.

The event-counter block is a very simple 3-bit binary counter used to assign a number to every physical event, so that consecutive events can easily be discriminated. When the reset is active, internal registers and outputs are put to 0; when the reset is inactive, the event-counter block increments its output signal event-id by one every time it samples the end-trace signal at logic level 1. The end-trace feeding the event-counter block is a signal coming from the feesiu block, called all-fifos-empty. This signal is asserted for two clock periods when all the 8 end-trace signals have been set to 1 and all the 8 queues have been completely emptied. For this purpose CARLOS contains a global end-trace signal which is activated when all the 8 local end-traces have been high for at least one clock period; it is not strictly necessary that a temporal overlap exists between the 8 signals. Note, however, that this means the global end-trace will never be put to 1 if some of the local end-traces are not used and remain stuck at 0. After the global end-trace is activated, the feesiu block starts waiting for the 8 FIFOs to be emptied: as soon as this happens, the all-fifos-empty signal is activated and the event-id signal is incremented by one. The all-fifos-empty signal stays at logic level 1 for two consecutive clock periods; nevertheless the event-id counter is incremented only by 1. The value of event-id is used in the outmux block and it is sent to the SIU as part of the header word. We considered 3 bits sufficient to discriminate the events and to put them in the right order during the data decompression and reconstruction stages.

3.5.5 The outmux block

The I/O signals are:

– indat7 to indat0: eight input 32-bit buses;

– reset: input signal;

– ck: input signal;

– query: input 8-bit bus;

– event-id: input 3-bit bus;

– enable-read: input signal;

– half-ladder-id: input 7-bit bus;

– good-data: output signal;

– read: output 8-bit bus;

– output: output 32-bit bus.

The outmux block has two distinct functions in the overall logic:

– multiplexing the 8 compressed and packed streams onto a single 32-bit output (femux sub-block);

– deciding which queue has to be served, using a priority-encoding-based arbiter (ppe sub-block).

The femux and ppe blocks implement the following 17-word data packet transmission protocol (see Fig. 3.10):

– a 32-bit header;

– 16 32-bit data words, all coming from one macrochannel and from one event.

Figure 3.10: 17-word data packet transmission protocol

The header contains the following information, from MSB to LSB (a sketch of how such a header can be assembled is given after the list):

– half ladder id (7 bits): this number is hardwired externally to each CARLOS chip, depending on the ladder it will be connected to;

– packet sequence number (10 bits): a 10-bit counter incremented once a packet has been transmitted, i.e. every 17 data words;

– cyclic event number (3 bits): the event number coming from the event-counter block;

– available bits (9 bits): these will be used in a future expansion of CARLOS;

– half detector id (3 bits): every half ladder contains 8 half detectors; they are numbered from 0 to 7 and this number is given by the macro-channel being served.
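Assuming the five fields are packed contiguously in the order listed (which the text implies but does not state explicitly), the header can be assembled as follows; the function name and example values are illustrative:

    def build_header(half_ladder_id, packet_seq, event_id, half_detector_id):
        """Assemble the 32-bit packet header, MSB to LSB:
        7-bit half ladder id | 10-bit packet sequence number |
        3-bit cyclic event number | 9 spare bits | 3-bit half detector id."""
        assert half_ladder_id < 128 and packet_seq < 1024
        assert event_id < 8 and half_detector_id < 8
        return (half_ladder_id << 25) | (packet_seq << 15) | \
               (event_id << 12) | (half_detector_id & 0x7)

    print(hex(build_header(0x12, 5, 3, 7)))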

Let's take a look at the 2 sub-blocks of the outmux:


– femux is a multiplexer with nine 32-bit inputs and a 9-bit selection bus. The 9 data inputs are the header and the 8 input channels coming from the FIFOs. The selection bus value is given by the queue scheduler: this bus contains all zeros except one bit.

– ppe stands for programmable priority encoder. It is a completely combinatorial block with two inputs and one output: request (8 bits) contains the query signals coming from the 8 macro-channels; priority (8 bits) is a bus containing a single 1 with all the other bits at 0; served (8 bits), like priority, contains a single bit at logic level 1, and this bit indicates which of the 8 macro-channels has to be served by the femux.

The programmable priority encoder works in a very simple way: it scans the request bus, starting from the position of the bit set in the priority bus, until it finds a 1. The position of this bit, from 0 to 7, corresponds to the channel chosen by the arbiter. For the next choice that the arbiter has to take, the priority bus value is updated in the following way: the served bus value is shifted to the right, as if it were a circular register, and the result is assigned to the priority bus. In this way the risk of a queue being served many times consecutively, in spite of other queues making requests, is avoided. An example will easily clarify the mechanism: request = 10100010, priority = 00010000, served = 00000010. At the next clock cycle, the value "00000001" will be assigned to the priority bus. There are several possible implementations of a scheduling algorithm based on a programmable priority encoder, differing in area and timing requirements. We chose the implementation used in the Stanford University's Tiny Tera prototype, as described in [18].
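A behavioural sketch of this arbiter, with the scan direction inferred from the worked example above (downwards from the priority bit, with wrap-around), might look like this; it is illustrative only, not the Tiny Tera implementation:

    def ppe(request, priority):
        """Scan `request` starting from the bit set in `priority` and
        return a `served` mask with exactly one bit set (0 if no request)."""
        start = priority.bit_length() - 1  # position of the single 1
        for i in range(8):
            pos = (start - i) % 8          # circular scan
            if (request >> pos) & 1:
                return 1 << pos
        return 0

    def next_priority(served):
        """Rotate the served bit right by one position (circular register)."""
        return ((served >> 1) | (served << 7)) & 0xFF

    request, priority = 0b10100010, 0b00010000
    served = ppe(request, priority)
    print(f"{served:08b} {next_priority(served):08b}")  # 00000010 00000001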

Let me now explain how the outmux block works. The outmux block is stopped and initialized when the reset signal is active. When the reset is inactive, the outmux block starts waiting for the enable-read signal to become active. This is a signal coming from the feesiu block: when low, it states that the link between the SIU and CARLOS has not been initialized yet, or that the SIU temporarily cannot accept data. When enable-read is high, the SIU is able to receive data from CARLOS, so the outmux block begins evaluating the value of the query bus. When this value is low, no macro-channel has yet asked to be served; otherwise the ppe block decides which queue to send in output. The first word served in output is the header word, containing the information on the macro-channel being served and the other information listed above. In order to get the 16 data words to send in output, the outmux block has to provide the right pop signal to one of the 8 FIFOs. The 8 pop signals to the FIFOs are grouped in the 8-bit read bus; of course only one bit at a time is asserted. Signal read(7) is sent to fifonew7, read(6) to fifonew6 and so on, so as to extract 16 valid data words from the FIFO. Since we want to send data to the SIU at a 20 MHz clock (half the system clock frequency), the pop signal cannot be stuck at 1 for 16 clock periods, but alternates between 0 and 1 in order to get a data word out of the FIFO one clock period every two. While the outmux block is putting in output the 17 words of the packet, the output signal good-data is set to 1 in order to inform the feesiu block that it is receiving meaningful data. While sending the last data word of a packet, the outmux block updates the priority bus value as stated above, examines the query bus value and computes the new served value. If served is not 0, that is if a request has occurred, the outmux block begins sending another packet in output, without any interruption (no clock periods are wasted); otherwise the block stops, waiting for a new request to be asserted. If enable-read turns from 1 to 0 during a transmission, the outmux block sends only one more valid word in output, then stops and waits for the enable-read signal to be restored to its active value: it then continues sending data to the feesiu block as if no pause had occurred. The outmux block itself increments the 10-bit packet sequence number after every packet has been completely transmitted.

The reason why a 20 MHz clock has been chosen is related to the total optical fibre bandwidth available to CARLOS: 800 Mbit/s. If CARLOS put out 32-bit data at 40 MHz, the total bandwidth required would be 1.28 Gbit/s, while at 20 MHz it is only 640 Mbit/s. For this reason the half-frequency data rate has been chosen as the final one.

3.5.6 The feesiu (toplevel) block

The I/O signals are:

– huffman7 to huffman0: eight input 32-bit buses;

– ck: input signal;

– reset: input signal;

– end-trace7 to end-trace0: eight input signals;

– fidir: input signal;

– fiben-n: input signal;

– filf-n: input signal;

– half-ladder-id: input 7-bit bus;

– wait-request7 to wait-request0: eight output signals;

– foclk: output signal;

– fbten-n: bidirectional signal;

– fbctrl-n: bidirectional signal;

– fobsy-n: output signal;

– fbd: bidirectional 32-bit bus.

The VHDL feesiu block contains all the other block instances (see Fig. 3.11) plus the logic acting as the interface with the SIU board. Thus the feesiu block contains 8 instances of firstcheck, 8 instances of barrel, 8 instances of fifo, 1 instance of event-counter and 1 instance of outmux. However, we can think of the feesiu block as the block taking data from the outmux block and directly interfacing to the SIU board, as if it were at the same hierarchical level as the other blocks. In Fig. 3.9 the feesiu block is represented exactly in this fashion.

Figure 3.11: Design hierarchy of CARLOS v1

3.5.7 CARLOS-SIU interface

Let's now take a look at the interface signals between CARLOS and the SIU and at how the communication protocol has been implemented:

– fidir: it’s an input to CARLOS. It asserts the direction <strong>of</strong> the<br />

<strong>data</strong> flow between CARLOS and the SIU: when low, <strong>data</strong> flow is<br />

directed from the SIU to CARLOS, otherwise <strong>data</strong> flow is directed<br />

from CARLOS to the SIU.<br />

– fiben-n: it’s an input to CARLOS, active low. It enables the communication<br />

on the bidirectional buses between CARLOS and the<br />

SIU. When low, communication is enabled, otherwise communication<br />

is disabled.<br />

– filf-n: it’s an input to CARLOS, active low, ”lf” stands for link<br />

full. When the SIU is no longer able to accept <strong>data</strong> coming from<br />

CARLOS, it puts this signal active. When this happens CARLOS<br />

sends an other valid <strong>data</strong> word, then stops transmitting waiting<br />

for the filf-n signal to be asserted again. This is the signal used by<br />

the SIU to implement the back-pressure on the <strong>data</strong> flow running<br />

from the front-end to the <strong>data</strong> acquisition system.<br />

– foclk: it is a free running clock generated on CARLOS and driving<br />

83


84<br />

1D <strong>compression</strong> algorithm and <strong>implementation</strong>s<br />

the CARLOS-SIU interface. It is a 20 MHz clock generated by<br />

dividing the system clock frequency by 2. Interface signals coming<br />

from the SIU are triggered on the falling edge <strong>of</strong> foclk.<br />

– fbten-n: it is a bidirectional signal, active low, it can be driven by<br />

CARLOS or by the SIU, ”ten” stands for transfer enable. When<br />

CARLOS is assigned to drive the bidirectional buses (when fidir<br />

is high and fiben-n is 0) fbten-n value is asserted from CARLOS: it<br />

turns to its active state when CARLOS is transmitting valid <strong>data</strong><br />

to the SIU, otherwise it is inactive. When the SIU is assigned<br />

to drive the bidirectional buses (when fidir is 0 and fiben-n is<br />

0) fbten-n value is asserted from the SIU: it turns to its active<br />

state when the SIU is transmitting valid commands to CARLOS,<br />

otherwise it is inactive.<br />

– fbctrl-n: it is a bidirectional signal, active low, which can be driven by CARLOS or by the SIU; "ctrl" stands for control. When CARLOS is assigned to drive the bidirectional buses (fidir is 1 and fiben-n is 0), the fbctrl-n value is asserted by CARLOS: it turns to its active state when CARLOS is transmitting a Front End Status Word to the SIU; otherwise, when in the inactive state, CARLOS is sending normal data to the SIU. When the SIU is assigned to drive the bidirectional buses (fidir is 0 and fiben-n is 0), the fbctrl-n value is asserted by the SIU: it turns to its active state when sending command words to CARLOS, and to its inactive state when sending data words. The second option has not been implemented on CARLOS, since we decided that CARLOS needs only commands and not data from the SIU. Other detectors use this option in order to download data to the detector itself: this is the case, for example, of the Silicon Pixel Detector.

– fobsy-n: it is an input signal to the SIU, active low; "bsy" stands for busy. CARLOS should drive this signal active when it is not able to accept data coming from the SIU. Since CARLOS does not have to receive data from the SIU, this signal has been stuck at 1, meaning that CARLOS will never be in a busy state. In fact, it always has to accept command words coming from the SIU.

– fbd: it is a bidirectional 32-bit bus on which data or command words are exchanged between CARLOS and the SIU.

This is the way the communication protocol works: the SIU acts as the master and CARLOS acts as the slave, i.e. the SIU sends commands to CARLOS and CARLOS sends data and front end status words to the SIU. At first the CARLOS-SIU link has to be initialized, and the SIU acts as the master of the bidirectional buses. So CARLOS waits for the bidirectional buses to be driven by the SIU (fidir is 0 and fiben-n is 0) and waits for a valid (fbten-n = 0) command (fbctrl-n = 0) named Ready to Receive (RDYRX). This command is always used in order for a new event transaction to begin. The RDYRX command contains a transaction identifier (bits 11 to 8) and the string "00010100" as the least significant bits.

Once the command is accepted and recognized, CARLOS waits for the fidir signal to change value in order to take possession of the bidirectional buses; then, if filf-n is not active, it is able to send valid data on the fbd bus if the good-data signal is active. In this state, CARLOS sends valid data of an event to the SIU only when some queues are requesting to be served in output, otherwise the feesiu stops sending data by putting the fbten-n signal to 1. When an end-trace signal has arrived on each macrochannel and every queue has been completely emptied (no data of that event remain stored in CARLOS), CARLOS puts in output the Front End Status Word (FESTW), a word confirming that no errors occurred and that the whole event has been successfully transferred to the SIU. The FESTW contains the Transaction Id code received upon the opening of the transaction (bits 11 to 8) and the 8-bit FESTW code "01100100". After this happens, CARLOS waits for some action to be taken by the SIU: the SIU can decide to take back control of the bidirectional buses and close the data link towards the data acquisition system, or it can leave control of the bidirectional buses to CARLOS for another data event to be sent. CARLOS therefore waits 16 foclk periods: if nothing happens, CARLOS can begin sending data again without the need to receive any other command from the SIU; if the SIU takes back possession of the bidirectional buses, CARLOS closes the link towards the SIU and keeps waiting for another RDYRX command issued by the SIU itself.

The feesiu block implements this communication protocol with the SIU using a simple state machine: state 0 is the state in which CARLOS waits for an initialization command from the SIU, state 1 is the state in which CARLOS sends data to the SIU, state 2 the one in which CARLOS sends the front end status word to the SIU, and state 3 the one in which CARLOS waits 16 foclk periods for some action from the SIU to happen.
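As an illustration, the protocol states and word formats can be captured in a compact C++ model of the kind used to cross-check the VHDL blocks (a sketch: the state names and the makeFestw helper are ours, while the codes and bit positions are the ones quoted above):

    #include <cstdint>

    // Sketch of the feesiu protocol state machine described in the text.
    enum class FeesiuState {
        WaitRdyrx,   // state 0: wait for the RDYRX initialization command
        SendData,    // state 1: send event data words to the SIU
        SendFestw,   // state 2: send the Front End Status Word
        WaitSiu      // state 3: wait 16 foclk periods for some SIU action
    };

    constexpr uint8_t kRdyrxCode = 0x14; // "00010100", RDYRX LSBs (bits 7..0)
    constexpr uint8_t kFestwCode = 0x64; // "01100100", FESTW code (bits 7..0)

    // The FESTW carries the transaction id received with RDYRX in bits 11..8
    // and the FESTW code in bits 7..0.
    inline uint32_t makeFestw(uint8_t transactionId) {
        return (static_cast<uint32_t>(transactionId & 0xF) << 8) | kFestwCode;
    }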

An important feature of CARLOS, realized in the feesiu block, is the following: CARLOS cannot accept a new event before the previous one has been completely sent in output, otherwise we run the risk of mixing data belonging to different events. The only way CARLOS has to implement back-pressure on the AMBRA chips is using the wait-request signals. Thus the wait-request signal has to prevent CARLOS from fetching new input data values while the FIFOs are being emptied. For this reason a new signal, dont-send-data, has been introduced for every macrochannel: it turns to 1 when the end-trace is activated and turns back to 0 when all the FIFOs are completely empty. So the wait-request of every macrochannel is obtained by ORing the full and dont-send-data signals. The feesiu acknowledges that all the FIFOs have been emptied using the empty signal of every FIFO block. When all the 8 signals turn to 1, the feesiu block raises the all-fifos-empty signal, which stays at logical level 1 for at least two clock periods in order to be sensed by the foclk clock. The all-fifos-empty signal is also used to trigger the event-counter block: in fact the total number of events is exactly the same as the total number of occurrences of the all-fifos-empty signal. Another signal, end-trace-global, is set to 1 only if all the local end-trace signals have been put to 1 for at least one clock period in the current event. Between the moment in which end-trace-global is asserted and the moment in which all-fifos-empty is activated, no new input data set can enter CARLOS.

3.6 CARLOS v2 design flow

Figure 3.12: Digital design flow for CARLOS v2

Fig. 3.12 illustrates the digital design flow for CARLOS v2. The front-end steps are exactly the same as those followed in the design of CARLOS v1. The only difference is the library used: in this case, the Alcatel Mietec 0.35 µm digital library provided via Europractice. This is a very rich library, since it contains more than 200 different standard cells and RAM blocks of several sizes. A RAM generator tool allows the designer to get a macrocell with exactly the requested number of words and bits per word: in our case a 64-word by 32-bit macrocell, instantiated 8 times, one per macrochannel.

The back-end steps were carried out at IMEC using the Avant! software Aquarius. We could not get a license for this software ourselves due to its high cost (more than 100 k$ per license), while no other available software, such as Cadence, was able to work with the design kit provided. The final physical layout is depicted in Fig. 3.13. The chip has a total area of 30 mm², containing 300 k standard cells, 180 I/O pads and 24 RAM blocks.

Figure 3.13: Layout of the ASIC CARLOS v2

After the layout design, IMEC sent us the post-layout netlist and an SDF (Standard Delay Format) file containing the information on each net and cell delay, for post-layout simulation with the same test-benches already used for pre-layout simulation. This is usually an iterative process since, if some simulation problems arise, the layout has to be re-designed. Luckily, owing to the relatively low working frequency of 40 MHz (the technology adopted can easily work up to 200 MHz), the post-layout simulation gave no problems and the design was then sent to the foundry.

3.7 Tests performed on CARLOS v2

After receiving 20 samples of naked chips (without any package) from the Alcatel Mietec foundry, we had them bonded directly onto the test PCB at the INFN Section of Torino, one sample per PCB. The test PCB, shown in Fig. 3.14 and especially designed for testing CARLOS v2 and for its use in test beam data taking, contains the following:

– 5 2×10-pin DIL connectors, pin compatible with the HP16600/16700A pattern generator and logic analyzer pods;

– 2 Mictor 38 connectors;

– a DIP switch providing a facility to set up the hardwired parameters, such as the half-ladder ID;

– filter capacitors for a total capacitance greater than 100 nF;

– buffers preserving the integrity of the CARLOS input pads.

After testing the JTAG control unit on CARLOS, the connection towards the SIMU was successfully tested: after the SIMU opens a transaction, CARLOS takes possession of the bidirectional buses and starts sending data. After these tests, the SIMU was replaced by the SIU board and the whole data acquisition system, i.e. the DIU (Destination Interface Unit) and the PCI RORC (Read Out Receiver Card) directly connected to a PC. In this way, testing the CARLOS behavior with huge amounts of data becomes easier than with the Logic State Analyzer alone, and the complete data acquisition system can be used to acquire data in test beams.

Figure 3.14: CARLOS v2 test board


Chapter 4

2D compression algorithm and implementation

This chapter contains a brief description of the 2D algorithm [19], conceived at the INFN Section of Torino, and a first attempt at an ASIC implementation with the third prototype of CARLOS.

4.1 2D compression algorithm

The 2D algorithm performs a data reduction based on a two-threshold discrimination and a two-dimensional analysis along both the drift time axis and the SDD anode axis. The proposed scheme allows for a better understanding of the neighbourhoods of the SDD signal clusters, thus improving their reconstructability, and also provides a statistical monitoring of the background features for each SDD anode.

4.1.1 Introduction

As shown in Chapter 3, due to the presence of noise a simple single-threshold one-dimensional zero suppression does not allow a good cluster reconstruction in all circumstances. Indeed, in order to obtain a good compression factor using the 1D algorithm, a threshold of about three times the RMS of the noise has to be used. Such a threshold often determines a rather sharp cut of the tails of the anode signals containing high samples and, more importantly, it can completely suppress the small-value anodic signals on the sides of the cluster. Both these sharp cuts, particularly the latter, can significantly affect the spatial resolution. Though samples below a 3 RMS threshold have a small information content, it is conceivable that, in the more accurate off-line analysis, they can help to improve the pattern recognition and the fitting of the cluster features. In order to read out small-amplitude samples without collecting too much noise, a two-threshold algorithm can be used, so that small samples satisfying a low threshold are collected only when, along the drift direction, they are near samples satisfying a high threshold. Since the charge cloud diffuses in two orthogonal directions for symmetry reasons, and due to the previous considerations, the two-threshold method should be applied along the anode axis too. We want such a two-threshold two-dimensional data compression and zero suppression algorithm to satisfy the following criteria:

– the values of the samples in the neighbourhood of a cluster be available, both for an accurate measurement of the characteristics of the clusters and for a good monitoring and understanding of the characteristics of the background;

– the statistical nature of the suppressed samples be available, to monitor the noise level of the anodes and to obtain their baseline values, which have to be subtracted from the cluster samples in order to obtain a correct measurement of the related charge.

Here follows a description of the algorithm studied: the data reduction algorithm is applied to the resulting matrix of 256 rows by 256 columns, like the one shown in the upper part of Fig. 4.1. Each matrix element expresses an 8-bit quantized amplitude. A row represents a time sequence of the samples from a single SDD anode, and a column represents a spatial snapshot of the simultaneous anode outputs at an instant of time. For each charge cloud we expect several high values in one or more columns and rows. This extension in both time and space thus requires that correlations in both dimensions be preserved for future analysis. We refer to correlations within a column as space-like and to correlations within a row as time-like. Therefore, in the proposed two-threshold two-dimensional algorithm, the high threshold TH must be satisfied by a pixel value in order for it to be part of a cluster, and the low threshold TL leads to the registering of a pixel whose value satisfies it, if adjacent to another pixel satisfying TH. In this way the lower-value pixels on the border of a cluster are encoded, thus ensuring that the tails of the charge distribution are retrieved.

Figure 4.1: Example of the digitized data produced by a half SDD

Within this framework, a cluster is redefined operationally as a set of adjacent pixels whose values tend to stand out above the background. In the described algorithm there is a trade-off in the definition of such a cluster, which lies in the definition of adjacency. We have considered as adjacent (or neighbouring) to the (i, j) element the pixels for which only one of the two indexes changes by 1: thus the neighbouring pixels are (i − 1, j), (i + 1, j), (i, j − 1) and (i, j + 1). A correlation therefore involves a quintuple composed of a central (C) pixel and its north (N), south (S), east (E) and west (W) neighbours only (see Fig. 4.2).

Figure 4.2: Neighbourhood of the pixel C

In order to monitor the statistical nature of the suppressed samples, the number of zero quantized values (due either to negative analog values of the noise or to baseline equalization) and the numbers of samples satisfying TH and TL are recorded. The background average and standard deviation are obtained by applying a minimization procedure to these three counted data. An aspect of this reduction algorithm allows the conservation of information about the background both near and far from the clusters. When the thresholds are properly chosen, statistically, pairs and a few triplets of background pixels not associated with a particle-produced cluster will satisfy the described discrimination criteria and provide consistency information on the background statistics, assumed to be Gaussian white noise. At the same time, single high background peaks are suppressed as zeros (if they do not have at least one neighbour satisfying at least the low threshold), so as not to overload the data acquisition and to allow an efficient zero suppression. The only parameters needed as input to the 2D compression algorithm are the two thresholds, TH and TL, and the baseline equalization values.

Figure 4.3: Cluster in two dimensions and its slices along the anode direction

4.1.2 How the 2D algorithm works

The 2D algorithm makes use of two threshold values:

– a high threshold TH for cluster selection;

– a low threshold TL to collect information around the selected cluster.

The algorithm retains data belonging to a cluster and around a cluster in the following way (as graphically shown in the example of Fig. 4.3):

– the pixel matrix is scanned searching for values higher than the TH value (70 in Fig. 4.3);

– the pixels positioned around the previously selected ones are accepted if higher than the low threshold value TL (40 in Fig. 4.3), otherwise they are rejected;

– thus a cluster is defined and the cluster values are saved exactly as they are; other pixels, not belonging to clusters, are discarded;

– if a pixel value higher than TH is found but has no pixel values higher than TL around it, it is rejected. This is the case of the value 78 in the bottom-left corner of Fig. 4.3, which is discarded even though its value is greater than the high threshold;

– pixel values belonging to a cluster are encoded using a simple look-up table method, assigning long codes to non-frequent values and short codes to frequent symbols.

Thus in Fig. 4.3, after applying the 2D compression algorithm, only the shaded values are stored, while the other values are erased. The 2D algorithm is conceptually very simple to understand, but its hardware implementation is considerably more complex than that of the 1D one. In fact, having to perform a two-dimensional analysis of the pixel array implies the need to store all the information in a digital buffer on CARLOS, thus requiring a larger silicon surface and a higher cost.
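To make the selection rule concrete, here is a minimal C++ model of the two-threshold, 4-neighbour selection (a sketch only: the real chip also encodes the retained values and counts the suppressed samples for background monitoring, and the function name and interface are ours):

    #include <array>
    #include <cstdint>

    constexpr int N = 256;
    using Matrix = std::array<std::array<uint8_t, N>, N>;

    // Returns a keep-mask: a pixel passing TH is kept only if it has at
    // least one 4-connected neighbour above TL (an isolated peak is
    // suppressed); a pixel passing TL is kept if adjacent to a TH pixel.
    std::array<std::array<bool, N>, N>
    select2D(const Matrix& m, uint8_t th, uint8_t tl) {
        std::array<std::array<bool, N>, N> keep{};
        const int di[4] = {-1, 1, 0, 0}, dj[4] = {0, 0, -1, 1};
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                bool nbTH = false, nbTL = false;
                for (int k = 0; k < 4; ++k) {   // N, S, W, E neighbours
                    int ni = i + di[k], nj = j + dj[k];
                    if (ni < 0 || ni >= N || nj < 0 || nj >= N) continue;
                    nbTH |= m[ni][nj] >= th;
                    nbTL |= m[ni][nj] >= tl;
                }
                keep[i][j] = (m[i][j] >= th && nbTL) ||
                             (m[i][j] >= tl && nbTH);
            }
        return keep;
    }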

4.1.3 Compression coefficient

Fig. 4.4 shows the 2D compression coefficient as a function of the high threshold value, calculated using data coming from the test beam of September 1998. The 2D compression algorithm reaches a compression ratio of 22 choosing a TH value of 1.5 noise RMS and a TL of 1.2 noise RMS. It should be remembered that the 1D compression algorithm had to use a threshold level of 3 noise RMS in order to reach the target compression ratio. Thus the 2D algorithm shows a higher performance than the 1D one, since it reaches the target compression ratio while losing a smaller amount of physical information. This is the main reason why the 2D algorithm has been chosen as the one to be implemented in the final version of CARLOS.

Figure 4.4: 2D compression coefficient as a function of the high threshold

4.1.4 Reconstruction error

Also as far as the reconstruction error is concerned, the 2D algorithm proves to perform better than the 1D one. In fact the differences between the cluster centroid positions before and after compression are fitted by a Gaussian distribution centered around 0 with a σ value of 10 µm along the drift time direction and 10 µm along the anode direction, choosing 1.5 noise RMS for TH and 1.2 noise RMS for TL. Thus the 2D algorithm manages to achieve a better cluster center resolution than the 1D one by keeping track of more pixel values around the cluster center. Moreover, the 2D algorithm introduces a smaller bias on the reconstructed charge than the 1D one, with a value of around 3%, meaning that the reconstructed cluster charge is 3% lower than before the compression-decompression steps.

Besides that, the 2D algorithm is very useful for studying the noise distribution: monitoring the pairs of noise samples passing the double-threshold filter makes it possible to recover information on the average and on the standard deviation of the Gaussian noise distribution. This is quite important for checking how the signal-to-background ratio changes in time.


If used in lossless mode, the 2D algorithm yields a compression ratio of 1.3, versus the value of 2.3 obtained using the lossless version of the 1D algorithm: this requires a more complex second-level compressor in the counting room, in order to reach the target compression ratio of 22 in case the 2D compression algorithm cannot be applied to the data. In fact there are some cases in which the use of the 2D compression algorithm might no longer be desirable: for example, when the baseline value is not constant over the 256 samples of an anode row. This is the case for the present version of the PASCAL chip, which introduces a slope in each anode row baseline and, what is worse, a slope value that varies from row to row. It is obvious that a fixed double-threshold compressor, such as the one explained in this chapter, cannot deal with this problem. Thus the foreseen solution is to eliminate the baseline slope in the final version of PASCAL. If this proves not to be possible, or if a sloping baseline behavior emerges after some working time, the use of the 2D algorithm can no longer be accepted. In this case data compression on CARLOS has to be switched off, and a second-level compression algorithm implemented directly in the counting room will do the job.

4.2 CARLOS v3 vs. the previous prototypes

There are several differences between CARLOS v3 and the previous versions. This is a brief list containing the most important ones:

1. CARLOS v1 and v2 were meant to work in a radiation-free environment, since, when they were designed, the problem of radiation had not yet been faced. Thus commercial technologies such as Xilinx FPGAs or the Alcatel Mietec design kit were chosen for the prototype implementations. The necessity for CARLOS to work in a radiation environment emerged some time after sending CARLOS v2 to the foundry. The radiation level CARLOS has to withstand is in the range from 5 to 15 krad. This led us to search for a radiation-safe technology.

One possible solution is given by SOI (Silicon On Insulator) technologies, which provide a complete radiation resistance. This is the case, for instance, of the 0.8 µm DMILL technology, which is widely used even in satellite applications at ESA (European Space Agency). The problem related to this technology is mainly one: the cost is too high for our budget. Thus we decided to choose a commercial technology, IBM 0.25 µm, with a library of standard cells designed to be radiation tolerant up to some Mrad. The library has been designed by the EP-MIC group at CERN.

2. Mechanical constraints emerged that rule out the use of the SIU in the end-ladder zone, since it is far too big for the space available. Another problem concerning the SIU is that this device cannot safely work in a radiation environment, since it contains commercial devices such as ALTERA PLDs. Finally, the laser driver hosted on the SIU board has a mean life of a few years, while we are looking for something lasting until the end of the experiment's data taking.

These considerations led us to change the whole readout architecture from CARLOS to the DAQ. Instead of directly interfacing the SIU, CARLOS v3 interfaces the radiation-tolerant serializer GOL (Gigabit Optical Link) chip [20]. Serial data is then sent to the counting room over a 200 m long optical fibre, deserialized using a commercial deserializer device and then sent to the SIU board through an FPGA device named CARLOS-rx, which is still to be designed. This final readout architecture is shown in detail in Fig. 4.5.

3. CARLOS v3 contains only 2 data processing channels, versus the 8 hosted in the two previous prototypes. This choice was due to the need to reduce the ASIC complexity and to greatly reduce the possibility of losing data in case of chip failure. In fact, if a CARLOS v2 chip breaks down for some reason, the data coming from a half-ladder, i.e. from 4 detectors, is completely lost until the chip is replaced with a working one. On the other hand, if a CARLOS v3 chip breaks down, only the data coming from one SDD detector are lost. Thus a 2-channel version of CARLOS provides a greater failure resistance and is far less complex.

4. CARLOS v3 contains a preliminary interface with the TTCrx chip, which distributes the trigger signals and the clock to the end-ladder board.

5. CARLOS v3 also contains a BIST (Built In Self Test) structure for a quick test of the chip itself, issued via the JTAG port.

Figure 4.5: The final readout chain


4.3 The final readout architecture

The architecture chosen for the final readout system introduces new tasks to carry out and new problems to solve. For instance, splitting CARLOS into 4 chips makes every chip much simpler to design, test and control (CARLOS v2 is a very complex and difficult-to-debug chip), but moving the SIU board to the counting room implies the design of the CARLOS-rx device, which takes data from 4 deserializer chips and feeds them to the SIU.

Besides that, putting a 200 m distance between CARLOS and the SIU implies that no back-pressure can be used: in fact, if the SIU asserts the filf-n signal, meaning that it cannot accept further data starting from the following foclk cycle, CARLOS receives this information only after 2 µs, i.e. after 40 foclk cycles. Thus the CARLOS-rx chip has to contain a well-sized FIFO buffer to store the data while the SIU is not able to accept them.

The role of the JTAG link is shown in Fig. 4.6. In the new architecture a transaction can be opened and closed via the JTAG link, instead of using the 32-bit bus fbd. The JTAG link is obtained by serializing the 5-bit JTAG port coming from the SIU for transmission to the front-end zone through an optical fibre; then the HAL (Hardware Abstraction Layer) chip performs the serial-to-parallel conversion for distributing the JTAG signals to the PASCAL, AMBRA and CARLOS chips. A rad-hard version of the HAL chip has yet to be implemented.

Currently we plan to use a commercial pair of serializer-deserializer chips from Agilent Technologies: in the final architecture the serializer chip will be replaced with the rad-hard Gigabit Optical Link (GOL) chip designed by the Marchioro group at CERN. This chip is a multi-protocol high-speed transmitter ASIC which is able to withstand high doses of radiation. The IC supports two standard protocols, G-Link and GBit-Ethernet, and sustains data transmission at both 800 Mbit/s and 1.6 Gbit/s. The ASIC was implemented using the CERN 0.25 µm CMOS library, employing radiation-tolerant layout techniques.

Figure 4.6: Final readout chain zoom

A problem concerning the use of the GOL chip has still to be solved: the TTCrx chip distributes to all the front-end chips a clock with a maximum jitter of around 300 ps. This is not a problem for the AMBRA and CARLOS ICs working at 40 MHz, but it proves to be a big problem for the GOL chip, since it contains an internal PLL that multiplies the incoming 40 MHz clock by 20 or 40, so as to get an internal 800 MHz or 1.6 GHz frequency. The PLL shows synchronization problems with the incoming clock if the input jitter is greater than 100 ps. This problem has still to be faced and solved.

4.4 CARLOS v3

CARLOS v3 is our first prototype tailored to fit in the new readout architecture. The main new features of this chip are:

– two processing channels;

– the radiation-tolerant technology chosen.

Nevertheless, CARLOS v3 does not contain the complete 2D compression algorithm, as might be expected. We made this choice in order to acquire experience with a small chip, with the new technology and with the new layout techniques, since we had to carry out the layout design task ourselves. Taking into account that the CERN 0.25 µm library contains a small number of standard cells and that they are not as well characterized as commercial ones, we decided to try the new design flow and the new technology on a simple chip: the result is CARLOS v3, which was sent to the foundry in November 2001 and will be tested starting from February 2002.

As a compression block, CARLOS v3 hosts only the simple encoding scheme conceived as the final part of the 2D algorithm. Nevertheless, if CARLOS v3 proves to be perfectly working, it will be used to acquire data in the test beams and will allow us to build and test the foreseen readout architecture.

4.5 CARLOS v3 building blocks

Fig. 4.7 shows the main building blocks of CARLOS v3. The complete design of CARLOS v3 has been carried out in Bologna: I have worked on the VHDL models, while other people worked on the C++ models of the same blocks. Each block has been designed both in VHDL and in C++, so as to allow an easy verification and debugging process.

The two main processing channels are the ones with the encoderbo, barrel15, fifonew32x15 and outmux blocks: these blocks take the data coming from the AMBRA chips, encode them using a lossless compression algorithm, pack them into 15-bit words and store them in a FIFO memory before sending them in output to the GOL chip, one channel after the other.

Figure 4.7: CARLOS v3 building blocks

The channel containing the ttc-rx-interface and fifo-trigger15x12 blocks receives the trigger numbers (bunch counter and event counter) from the TTCrx chip and sends them in output at the beginning of each data packet. The event-counter block is a local event number generator providing a further piece of information to be added to the event number coming from the TTCrx chip: this gives us greater confidence in being able to reconstruct the data and to find errors if present. Then a trigger-interface block handles the trigger signals L0, L1 and L2 coming from the Central Trigger Processor (CTP) through the TTCrx chip. A Command Mode Control Unit (CMCU) receives commands issued through the JTAG port and puts CARLOS in one of several logic states: running, idle, bist and so on. Finally, the BIST blocks on chip are based on a pseudo-random pattern generator and a signature maker circuit. The next paragraphs contain a detailed description of these blocks.

4.5.1 The channel block

The channel block is the main processing unit contained in CARLOS for data encoding, packing and storing. It is composed of three blocks: encoderbo, barrel15 and fifonew32x15. Two identical channel blocks are hosted on CARLOS v3.

4.5.2 The encoder block

The I/O signals are:

– value: input 8-bit bus;

– value-strobe: input signal;

– ck: input signal;

– reset: input signal;

– data: output 10-bit bus;

– field: output 4-bit bus;

– valid: output signal.

Input range    Output code        Total
0-1            1 bit + 000        4 bits
2-3            1 LSB bit + 001    4 bits
4-7            2 LSB bits + 010   5 bits
8-15           3 LSB bits + 011   6 bits
16-31          4 LSB bits + 100   7 bits
32-63          5 LSB bits + 101   8 bits
64-127         6 LSB bits + 110   9 bits
128-255        7 LSB bits + 111   10 bits

Table 4.1: Lossless compression algorithm encoding scheme


The encoderbo block encodes the 8-bit input data into variable-length codes from 4 to 10 bits long, in a completely lossless way. Table 4.1 contains a detailed description of the encoding mechanism. This encoding scheme provides a compression of the input data based on the knowledge of the statistics of the stream: in fact, small-value data are much more probable than high-value ones. Thus most input data will be reduced from 8 to 4 or 5 bits, providing some degree of compression. Indeed it is possible that locally, in time, this compressor may produce an expansion of the data: in fact, if a long sequence of values greater than 127 occurs, the encoderbo block provides as output a stream of 10-bit data, which have to be temporarily stored in a FIFO buffer. Here is a description of how the block actually works: when the input signal value-strobe is high, the 8-bit input value is encoded into the 10-bit output data and the valid output signal is asserted. The field output signal is assigned the number of bits actually containing information in the 10-bit data register. The block is synchronous with the rising edge of the clock, while the reset signal is active high and asynchronous.
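The encoding of Table 4.1 can be expressed as a small C++ model (a sketch used here only to illustrate the scheme; following the decoding description in the next section, the 3-bit range tag is assumed to occupy the code LSBs, with the value's payload bits above it):

    #include <cstdint>
    #include <cstdio>

    struct Code {
        uint16_t bits;   // right-aligned code word (tag in bits 2..0)
        int      length; // 4 to 10 valid bits
    };

    Code encode(uint8_t value) {
        int k = 1;                       // payload bits: 1 for inputs 0-3
        int tag = (value < 2) ? 0 : 1;   // tag 000 for 0-1, 001 for 2-3
        if (value >= 4) {                // ranges 4-7 up to 128-255
            k = 0;
            for (int v = value; v > 1; v >>= 1) ++k;  // k = floor(log2(value))
            tag = k;                     // tags 010 (k=2) ... 111 (k=7)
        }
        uint16_t payload = value & ((1u << k) - 1);   // k LSBs of the value
        return Code{static_cast<uint16_t>((payload << 3) | tag), k + 3};
    }

    int main() {
        for (int v : {0, 3, 5, 17, 255}) {            // a few sample encodings
            Code c = encode(static_cast<uint8_t>(v));
            std::printf("%3d -> %2d bits, code 0x%03x\n",
                        v, c.length, static_cast<unsigned>(c.bits));
        }
    }

For instance, the value 17 falls in the range 16-31 and is encoded as its 4 LSBs plus the tag 100, for a total of 7 bits, in agreement with Table 4.1.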


Figure 4.8: Graphical description of how the barrel shifter works

4.5.3 The barrel15 block

The I/O signals are:

– input: input 8-bit bus;

– sel: input 4-bit bus;

– load: input signal;

– ck: input signal;

– reset: input signal;

– end-trace: input signal;

– output-push: output signal;

– output: output 15-bit bus.

The barrel15 block packs the 4- to 10-bit variable-length codes coming from the encoderbo block into fixed-length 15-bit words. Data are packed as shown in Figure 4.8. The barrel block makes use of two internal 15-bit registers, so as to be able to break an input datum in two pieces without losing any information: when the first word is put in output by driving the output signal output-push low, the second word is used to store the input data. The latency of the barrel block is 2 clock periods: it takes 2 clock periods before a word is packed by the barrel15 block. When the input signal end-trace is asserted, meaning that this is the last datum belonging to the current event, the current value in the internal register is put in output even if it is not completely full: undefined bits are set to 0.

Data coming from the barrel can easily be reconstructed starting from the 3 LSBs of the first barrel word, which contain the information on how many bits have to be selected on the left side of the code. Going on in this way from the LSB to the MSB of every valid word, it is possible to retrieve all the encoded information.
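A C++ sketch of this packing, under the assumption (consistent with the decoding rule just described) that each code is appended above the bits already accumulated, so that the tag of the first code sits in the LSBs of a 15-bit word:

    #include <cstdint>
    #include <vector>

    // Behavioral model of the barrel15 packing (a sketch, not the VHDL).
    class Barrel15 {
        uint32_t acc = 0;   // bit accumulator (at most 14 + 10 bits used)
        int      nbits = 0; // number of valid bits currently in acc
    public:
        std::vector<uint16_t> words; // packed 15-bit output words

        void push(uint16_t code, int length) {
            acc |= static_cast<uint32_t>(code) << nbits; // append above
            nbits += length;
            while (nbits >= 15) {                        // emit full words
                words.push_back(acc & 0x7FFF);
                acc >>= 15;
                nbits -= 15;
            }
        }
        void endTrace() {                 // flush on end-trace: the partial
            if (nbits > 0)                // word is emitted, undefined bits
                words.push_back(acc & 0x7FFF); // are left at 0
            acc = 0; nbits = 0;
        }
    };

Feeding this model the (code, length) pairs produced by the encoder sketch above, and calling endTrace at the end of an event, reproduces the 15-bit word stream described in the text.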

4.5.4 The fifonew32x15 block

The I/O signals are:

– push-req-n: input signal;

– pop-req-n: input signal;

– diag-n: input signal;

– data-in: input 15-bit bus;

– ck: input signal;

– reset: input signal;

– empty: output signal;

– almost-empty: output signal;

– half-full: output signal;

– almost-full: output signal;

– full: output signal;

– error: output signal;

– dataout: output 15-bit bus.

The fifonew32x15 block has the purpose of storing the information coming out of the barrel shifter. The multiplexing scheme that has been chosen cannot avoid the use of buffers before the multiplexer: in fact, since the output bandwidth is fairly allocated 50% of the time to each channel (one clock period for channel 0, the next clock period for channel 1 and so on), and since the encoding algorithm can locally, in time, behave as an expander, the data have to be stored locally before multiplexing. The only decision to be taken concerns the FIFO dimensions: we have chosen a FIFO containing 32 words coming from the barrel shifter (32×15 bits), in order to take into account the worst possible input data stream. The problem we faced designing the FIFO block is the following: a FIFO is usually composed of a dual-port RAM block plus some logic implementing the First In First Out policy. This is, for example, what was done in CARLOS v2. Nevertheless, the CERN 0.25 µm library provides only one size of RAM memory, namely 64×32 bits. This block is at least 4 times bigger than the block dimensions we need (2048 bits versus 480). Besides that, it is quite difficult, if not impossible, to share the same RAM block between two different FIFO designs: the idea of sharing the FIFOs of the two channels is quite difficult to implement, since the number of read/write ports would have to be doubled. Thus we decided to design a flip-flop based RAM for the FIFO, taken from the "DesignWare Foundation" library provided together with our Synopsys design software. This is a library containing IP (Intellectual Property) blocks ready to be inserted into a design, such as logic and arithmetic blocks, RAMs and application-specific blocks, for instance for error checking and correction or for a JTAG controller. The idea is that it is completely useless for every ASIC designer to lose time designing a block that is needed by hundreds of other designers all over the world. With this idea in mind, many IP libraries have been collected, such as the Synopsys one we have been making use of.

This is the behavior of the fifonew32x15 block: a push is executed when the push-req-n input is asserted (low) and either the full flag is inactive (low), or the full flag is active and the pop-req-n input is asserted (low). Thus a push can occur even if the FIFO is full, as long as a pop is executed in the same cycle. Asserting push-req-n in either of the above cases causes the data at the data-in port to be written to the next available location in the FIFO. A pop operation occurs when pop-req-n is asserted (low), as long as the FIFO is not empty. Asserting pop-req-n causes the internal read pointer to be incremented on the next rising edge of ck. Thus the RAM read data must be captured on the ck following the assertion of pop-req-n. Push and pop can occur at the same time if there is data in the FIFO, even when the FIFO is full. In this case the pop data is first captured by the next stage of logic after the FIFO, and then the new data is pushed into the same location from which the data was popped. Thus there is no conflict in a simultaneous push and pop when the FIFO is full. A simultaneous push and pop cannot occur when the FIFO is empty, since there is no pop data to prefetch.

The FIFO block contains some important flags such as empty, almost-full and full. The empty flag indicates that there are no words in the FIFO available to be popped. The almost-full flag is asserted when there are no more than 8 empty locations left in the FIFO. This number is used as a threshold and is very useful for preventing the FIFO from overflowing: when this flag is asserted, the data-stop signal, output from CARLOS, is sent to the AMBRA chip, asking it to stop the data stream transmission. AMBRA requires 3 clock cycles before it actually stops sending data to CARLOS, so the threshold level of 8 chosen for the FIFO design has to account for these 3 clock periods of delay due to AMBRA and for the latency due to the encoder and barrel blocks. This flag is therefore very useful for managing the data transmission between AMBRA and CARLOS without losing any data. The last flag, full, indicates that the FIFO is full and there is no space available for pushing data. If the AMBRA-CARLOS communication works well, this flag should never be asserted. Fig. 4.9 shows the FIFO timing waveforms during the push phase, while Fig. 4.10 shows the FIFO timing waveforms during the pop phase.

Figure 4.9: FIFO timing waveforms during the push phase

Figure 4.10: FIFO timing waveforms during the pop phase
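The push/pop rules and the almost-full threshold can be summarized in a behavioral C++ model (a sketch, not the Synopsys IP; applying the pop before the push within a cycle reproduces the simultaneous push-and-pop behavior on a full FIFO described above):

    #include <cstddef>
    #include <cstdint>

    class Fifo32x15 {
        uint16_t mem[32] = {};
        std::size_t rd = 0, wr = 0, count = 0;
    public:
        bool empty()      const { return count == 0; }
        bool full()       const { return count == 32; }
        bool almostFull() const { return 32 - count <= 8; } // <= 8 free slots

        // One clock cycle; pushReq/popReq model push-req-n/pop-req-n asserted.
        void cycle(bool pushReq, uint16_t dataIn, bool popReq,
                   uint16_t& dataOut) {
            bool popped = popReq && !empty();
            if (popped) {               // pop first: the data is captured by
                dataOut = mem[rd];      // the next stage, freeing the slot
                rd = (rd + 1) % 32;
                --count;
            }
            if (pushReq && !full()) {   // after a pop, a full FIFO has room
                mem[wr] = dataIn & 0x7FFF;  // 15-bit words
                wr = (wr + 1) % 32;
                ++count;
            }
        }
    };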

4.5.5 The channel-trigger block

The channel-trigger block has the purpose of getting the trigger numbers from the TTCrx chip and storing them before they are multiplexed and sent to the GOL chip. It is composed of two different blocks: the ttc-rx-interface and the fifo-trigger block.

4.5.6 The ttc-rx-interface block

The I/O signals are:

– TTCready: input signal;

– BCnt: 12-bit input bus;

– BCntLStr: input signal;

– EvCntLStr: input signal;

– EvCntHStr: input signal;

– ck: input signal;

– reset: input signal;

– BCnt-reg: output 12-bit bus;

– EvCntL-reg: output 12-bit bus;

– EvCntH-reg: output 12-bit bus.

The ttc-rx-interface block receives trigger information from the TTCrx chip when the input signal TTCready coming from the TTCrx chip is high, meaning that the TTCrx is ready. When BCntLStr is high, the 12-bit input word is latched into the register BCnt-reg; the same happens with EvCntLStr and EvCntHStr for the LSBs and MSBs of the 24-bit event counter word. Following an active L2accept signal, the values of these three registers are written into 3 memory locations of the fifo-trigger block. Since the event can be discarded until the final confirmation arrives through the L2accept signal, it is necessary to wait for this signal before storing the values in the FIFO.

4.5.7 The fifo-trigger block

This block is logically equivalent to the FIFO block except for its dimensions: its size is 15 words of 12 bits. During the transmission of a complete event from AMBRA to CARLOS, which lasts 1.6 ms, up to four events can be stored in the AMBRA chip, so CARLOS has to process 4 triplets of incoming signals L0, L1accept and L2accept. Thus a 15-word deep FIFO is necessary for storing the bunch counter and event counter information concerning 5 consecutive accepted events. When CARLOS is ready to send a data packet in output, the first 3 trigger words are read and taken to the outmux block. In this way a correct synchronization between the data being sent and the trigger information is preserved. The output flags of the fifo-trigger block (empty, almost-full and full) are not used by other blocks as a control, since we do not expect a buffer overflow, due to the structure of the AMBRA chip.

4.5.8 The event-counter block

The I/O signals are:

– end-trace: input signal;

– ck: input signal;

– reset: input signal;

– event-id: output 3-bit bus.

A local event counting is performed on CARLOS thanks to the event-counter block. It is a very simple 3-bit counter triggered by the event-identifier signal coming from the outmux block: this signal indicates that an event has been completely transmitted and a new one can be accepted. This number is used both in the header and in the footer words for a safer transmission protocol.

4.5.9 The outmux block

The I/O signals are:

– indat1: input 15-bit bus;

– indat0: input 15-bit bus;

– trigger-data: input 12-bit bus;

– reset: input signal;

– ck: input signal;

– gol-ready: input signal;

– fifo-empty: input 2-bit bus;

– half-ladder-id: input 7-bit bus;

– all-fifos-empty: input signal;

– event-id: input 3-bit bus;

– no-input-data: input signal;

– event-identifier: output signal;

– read-data: output 2-bit bus;

– read-trigger: output signal;

– output-strobe: output signal;

– output: output 16-bit bus.

The outmux block is a multiplexing unit for sending in output the data coming from the two main processing channels in an interlaced way, meaning that during the even clock periods data coming from channel 1 are put in output, while during the odd clock periods data coming from channel 0 are served.

This is the way the outmux block behaves: as soon as data begin to fill the two FIFO blocks, the outmux block begins to put in output a packet like the one shown in Fig. 4.11. The first 3 16-bit words contain the trigger information coming from the trigger channel: the first word contains the bunch counter, while the second and third words contain the event counter MSBs and LSBs respectively. Since the trigger information is 12 bits long, the bits 1011 are added as MSBs in order to be able to recognize these words easily in a later phase of data reconstruction.

Figure 4.11: CARLOS v3 data transmission protocol

Then follow two header words containing the local event-id number and the externally hardwired half-ladder-id information. The MSBs of the header words are 110. The headers are followed by an even number of data words containing data from the two main channels: if a channel has no valid data to send, the MSB is put to 1 and all the other bits are set to 0, meaning that a dummy word is sent in output; otherwise the MSB is set to 0, meaning that the data word is valid. The data packet is then concluded with the transmission of two footer words containing the same information as the header regarding the event-id number, plus the number of words sent in output. Their MSBs are set to 1, so as to uniquely identify the footer word type.

The outmux block puts in output the 16-bit data words and the signal output-strobe. When this signal is high, CARLOS is transmitting data belonging to a packet; when it is low, CARLOS is not sending useful information to the GOL chip. When the gol-ready signal coming from the GOL chip goes low, meaning that it has lost synchronization with the input clock, CARLOS stops sending data and begins transmission again only when gol-ready goes high. The outmux block also puts in output the 2-bit signal read-data, which is sent to the 2 main FIFOs as a pop signal, and the signal read-trigger, sent to the fifo-trigger block. The outmux block also asserts the event-identifier signal, which is used as a trigger for the event-counter block. The input signal all-fifos-empty puts an end to the data packet transmission, since the end of an event has been reached: in fact, after the input signals data-end1 and data-end0 have gone high, CARLOS waits until both FIFOs are empty in order to assert the all-fifos-empty signal. This triggers the end of an event transmission.
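The packet framing just described can be sketched in C++ as follows (the tag values 1011 and 110 and the all-ones footer MSBs are the ones quoted above; the exact placement of the event-id, half-ladder-id and word-count fields inside the payloads is our assumption, made only for illustration):

    #include <cstdint>
    #include <vector>

    std::vector<uint16_t> buildPacket(uint16_t bunchCnt, uint32_t eventCnt,
                                      uint8_t eventId, uint8_t halfLadderId,
                                      const std::vector<uint16_t>& dataWords) {
        std::vector<uint16_t> pkt;
        auto tagged = [](uint16_t tagBits, int tagLen, uint16_t payload) {
            return static_cast<uint16_t>(
                (tagBits << (16 - tagLen)) |
                (payload & ((1u << (16 - tagLen)) - 1)));
        };
        // Three trigger words: 1011 MSBs + 12-bit payload.
        pkt.push_back(tagged(0b1011, 4, bunchCnt));
        pkt.push_back(tagged(0b1011, 4, (eventCnt >> 12) & 0xFFF)); // MSBs
        pkt.push_back(tagged(0b1011, 4, eventCnt & 0xFFF));         // LSBs
        // Two header words: 110 MSBs + event-id and half-ladder-id.
        pkt.push_back(tagged(0b110, 3, eventId));
        pkt.push_back(tagged(0b110, 3, halfLadderId));
        // Interlaced data words: MSB 0 = valid, MSB 1 + zeros = dummy.
        for (uint16_t w : dataWords) pkt.push_back(w & 0x7FFF);
        if (dataWords.size() % 2) pkt.push_back(0x8000); // pad to even count
        // Two footer words: all-ones MSBs, event-id and word count.
        pkt.push_back(tagged(0b111, 3, eventId));
        pkt.push_back(tagged(0b111, 3,
                             static_cast<uint16_t>(pkt.size() + 1)));
        return pkt;
    }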

4.5.10 The trigger-interface block

The I/O signals are:

– reference-count-trigger: input 8-bit bus;

– L0: input signal;

– L1accept: input signal;

– L2accept: input signal;

– L2reject: input signal;

– dis-trigger: input signal;

– ck: input signal;

– reset: input signal;

– busy: output signal;

– trigger: output signal;

– abort: output signal.

This block accepts as inputs the trigger signals L0, L1accept, L2accept<br />

and L2reject. Follows a brief description <strong>of</strong> how these signals can be


4.5 — CARLOS v3 building blocks<br />

used for accepting or rejecting an event for storage: the L0 signal is<br />

asserted 1.2 µs after the interaction; L1accept signal is asserted 5.5 µs<br />

after the interaction, if it is not asserted in time the event is rejected;<br />

L2accept is asserted after 100 µs from the interaction if the event is<br />

accepted, otherwise a L2reject signal is asserted before 100 µs. It means<br />

that either a L2accept signal or a L2reject signal is asserted.<br />

The trigger-interface block receives these inputs and processes them to build 3 other signals: trigger, busy and abort. The trigger signal is L0 delayed by a number of clock cycles programmable via JTAG, and it is distributed to the PASCAL and AMBRA chips. This is the signal triggering an event data acquisition on the PASCAL chip.

The busy signal is asserted just after L0 and then stays active until 5.5 µs after the interaction. If the signal L1accept is not asserted, busy goes low again; otherwise it stays active until the signal dis-trigger coming from AMBRA is activated. The meaning is the following: while PASCAL is transferring data to AMBRA, the readout system is not ready to accept any other trigger signals, that is, to acquire any other data. The time necessary for the transmission of an event from PASCAL to AMBRA is about 360 µs. Finally, the abort signal that CARLOS sends to AMBRA is asserted when the L1accept signal is not asserted at the prefixed time or when the L2reject signal is asserted. The abort signal causes the data transmission from PASCAL to AMBRA to end, and the data already stored are discarded.
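As a compact illustration, the following Python sketch models the accept/reject decision just described; it is not the CARLOS VHDL, the timing constants come from the text above, and the function and argument names are our own.

```python
# Minimal sketch of the trigger accept/reject protocol described above.
# Times are in microseconds; the names are illustrative, not CARLOS signals.

L1_DEADLINE_US = 5.5    # L1accept must arrive within 5.5 us of the interaction
L2_WINDOW_US = 100.0    # either L2accept or L2reject arrives within 100 us

def event_fate(t_l1accept=None, l2_accepted=False):
    """Return (stored, abort) for one event, given its trigger history."""
    if t_l1accept is None or t_l1accept > L1_DEADLINE_US:
        return False, True   # no timely L1accept: busy released, abort asserted
    if not l2_accepted:
        return False, True   # L2reject: abort, data in AMBRA discarded
    return True, False       # L2accept: the event is kept

print(event_fate(5.0, True))    # (True, False): event stored
print(event_fate(None, False))  # (False, True): rejected at L1
```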

4.5.11 The cmcu block

The I/O signals are:

– tdi: input signal;
– tms: input signal;
– trst: input signal;
– tck: input signal;


– bist-ok-tcked: input signal;
– bist-failure-tcked: input signal;
– ck: input signal;
– reset: input signal;
– reference-count-trigger: output 8-bit bus;
– tdo: output signal;
– state-tcked: output signal;
– reset-pipe: output signal.

The Command Mode Control Unit (cmcu) is the CARLOS internal control unit, remotely controlled via the JTAG port. Serial data coming from the JTAG pin tdi are packed into 8-bit words and interpreted as a very simple program containing commands and operands. Fig. 4.12 shows the CARLOS working states reachable using the JTAG port.

Figure 4.12: CMCU logic state diagram

At power-on CARLOS is put in an IDLE state in which no calculation is performed. Then it can be put in a RESET-PIPELINE state, in which an internal reset signal is asserted and all registers are initialized. The following state is the BIST (Built In Self Test) state, in which CARLOS runs an internal test at working speed to check whether everything is working correctly; depending on the test results, CARLOS then enters the BIST-FAILURE state or the BIST-SUCCESS state. In case of success the 8-bit word sent serially as output on tdo is A0, otherwise the word is 55. In the WRITE-REG state CARLOS prepares to write an internal register with the value read via JTAG in the next state, WRITE-REG-FETCH: this register contains the number of clock cycles of delay to be applied to the incoming L0 signal before passing it to the AMBRA chip. If needed, during the READ-REG stage the CARLOS user can read this value back through the tdo output JTAG pin, so as to check that no errors occurred during the writing phase. Then CARLOS can finally enter the RUNNING stage, in which it is able to accept and process input data streams and to manage the interfaces towards the GOL and TTCrx chips. When CARLOS is not in RUNNING mode the busy signal is set high, meaning that no L0 trigger signal is accepted from the CTP and no data is transmitted to the GOL chip.
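A minimal sketch of such a command-driven state sequence is given below; the transition table is only an assumption drawn from the state names in the text, since the actual diagram of Fig. 4.12 and the JTAG opcodes are not reproduced here.

```python
# Hypothetical sketch of a CMCU-like state machine; the allowed
# transitions below are an assumption, not the actual Fig. 4.12 diagram.

TRANSITIONS = {
    "IDLE":            {"RESET-PIPELINE"},
    "RESET-PIPELINE":  {"BIST", "WRITE-REG", "RUNNING"},
    "BIST":            {"BIST-SUCCESS", "BIST-FAILURE"},
    "BIST-SUCCESS":    {"WRITE-REG", "RUNNING"},
    "BIST-FAILURE":    {"RESET-PIPELINE"},
    "WRITE-REG":       {"WRITE-REG-FETCH"},
    "WRITE-REG-FETCH": {"READ-REG", "RUNNING"},
    "READ-REG":        {"RUNNING"},
    "RUNNING":         {"IDLE"},
}

def step(state, command):
    # Refuse any transition the (assumed) state diagram does not allow.
    if command not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {command}")
    return command

state = "IDLE"
for cmd in ("RESET-PIPELINE", "BIST", "BIST-SUCCESS",
            "WRITE-REG", "WRITE-REG-FETCH", "READ-REG", "RUNNING"):
    state = step(state, cmd)
print(state)  # RUNNING: CARLOS now accepts triggers and feeds the GOL chip
```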

4.5.12 The pattern-generator block

The I/O signals are:

– bist-start: input signal;
– ck: input signal;
– reset: input signal;
– data: output 8-bit bus;
– data-valid: output signal;
– data-end: output signal.

The pattern generator block is part of the BIST utility implemented on CARLOS v3. The BIST [21, 22] is an in-circuit testing scheme for digital circuits in which both test generation and test verification are done by circuitry built into the chip itself. BIST schemes offer three attractive advantages:

1. they offer a solution to the problem of testing large integrated circuits with a limited number of I/O pins;
2. they are useful for high speed testing since they can run at design speed;
3. they do not require expensive external automatic test equipment (ATE).

BIST schemes, in the most general sense, can have any of the following characteristics:

– concurrent or non-concurrent operation: concurrent testing is designed to detect faults during normal circuit operation, while non-concurrent testing requires that normal operation be suspended during testing. In CARLOS v3 non-concurrent operation has been chosen, since we decided to use BIST only to check the correct behavior of the chip when off-line.
– exhaustive or non-exhaustive test design: an exhaustive test of a circuit requires that every intended state of the circuit be shown to exist and that all transitions be demonstrated. For large sequential circuits such as CARLOS this is not practical, so we decided to implement a non-exhaustive testing design.
– deterministic or pseudo-random generation of test vectors: deterministic testing occurs when specific precomputed vectors have to be applied, while pseudo-random testing occurs when random-like test vectors are produced. We chose pseudo-random generation, since its implementation requires much less area than deterministic generation. Pseudo-random generation on CARLOS v3 is performed by the pattern generator block.

The pattern generator block provides a set of 200 pseudo-random test vectors for BIST. These vectors are provided at the same time to both processing channels. The pseudo-random sequence is obtained using a linear feedback shift register, a very simple structure requiring a very small on-chip area.
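As a sketch of how such a generator works, the Python fragment below produces 200 pseudo-random 8-bit vectors with a Fibonacci LFSR; the tap positions and the seed are illustrative assumptions, not the ones actually wired into CARLOS v3.

```python
# Minimal 8-bit Fibonacci LFSR sketch; taps and seed are assumptions.

def lfsr8(seed=0x5A, taps=(7, 5, 4, 3)):
    """Yield successive 8-bit pseudo-random test vectors."""
    state = seed & 0xFF
    while True:
        yield state
        fb = 0
        for t in taps:                       # XOR the tapped bits...
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & 0xFF   # ...and shift the feedback in

gen = lfsr8()
vectors = [next(gen) for _ in range(200)]    # the 200 BIST vectors
print([format(v, "02x") for v in vectors[:8]])
```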

4.5.13 The signature-maker block

The I/O signals are:

– bist-vector: input 16-bit bus;
– ck: input signal;
– reset: input signal;
– bist-strobe: output signal;
– signature: output 16-bit bus.

The signature maker block performs the signature analysis. In signature analysis, the test responses of a system are compacted into a signature using a linear feedback shift register (LFSR). The signature of the device under test is then compared with the expected (reference) signature. If the two match, the device is declared fault-free; otherwise it is declared faulty. Since several thousands of test responses are compacted into a few bits of signature by an LFSR, there is an information loss. As a result, some faulty devices may produce the correct signature. The probability of a faulty device having the same signature as a working device is called the probability of aliasing, and it is approximately 2^-m, where m denotes the number of bits in the signature.

The signature register implemented on CARLOS is 16 bits wide, so the probability of aliasing is 2^-16. The signature maker block takes the 16-bit bist-vector word coming from the outmux block and performs the signature analysis; then, when the FIFOs have been completely emptied, it asserts the bist-strobe signal to indicate that the signature value is ready.
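The compaction itself can be sketched as a multiple-input LFSR (MISR) that folds each 16-bit response into the running signature; the feedback taps and the stimuli below are illustrative assumptions, not the CARLOS polynomial.

```python
# Minimal 16-bit MISR sketch for signature analysis; taps are assumptions.

def signature(responses, taps=(15, 13, 4, 0), width=16):
    """Compact a stream of 16-bit test responses into one signature."""
    mask = (1 << width) - 1
    sig = 0
    for r in responses:
        fb = 0
        for t in taps:
            fb ^= (sig >> t) & 1
        sig = (((sig << 1) | fb) ^ r) & mask  # shift, feed back, mix response
    return sig

reference = signature(range(1000))   # golden signature of a fault-free run
device = signature(range(1000))      # device under test, same stimuli here
print(device == reference)           # True: declared fault-free
print(2 ** -16)                      # aliasing probability for 16-bit signature
```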

Figure 4.13: Digital design flow for CARLOS v3

4.6 Digital design flow for CARLOS v3

Fig. 4.13 shows in some detail the digital design flow we used for the design of CARLOS v3 with the 0.25 µm CERN library. Since it is quite a recent library, we had to face some problems: for instance the small number of standard cells, the lack of 3-state buffers, the lack of worst-case cell models, the fact that only Verilog models of the cells (and not VHDL models) were provided, and so on. The reason for these shortcomings is that up to now very few chips have been realized and tested using this library, so not much characterization work could be done.

We therefore had to learn how to use the Cadence Verilog XL software for post-synthesis simulations, since Synopsys can simulate VHDL models only. Our main difficulty was due to the necessity of using VHDL-written testbenches for logic simulation and Verilog-written ones for netlist simulation: this can be very error-prone, since it is quite difficult to make the two models match exactly.

Besides that, we had to learn how to use Cadence Silicon Ensemble for the place and route job. This is a very difficult job when the standard cells are not completely characterized. We received great help from Marchioro's group, especially concerning the back-end design flow. They suggested that we follow a completely flat approach to the problem, since the chip is very small: the hierarchical approach, i.e. designing the layout of each block and then routing the blocks together, is only worthwhile when dealing with chip complexities one order of magnitude greater than ours.

4.7 CARLOS layout features

Fig. 4.14 shows a picture of the final layout of CARLOS v3, as it has been sent to the foundry. As one can easily observe, it is pad-limited, i.e. the total silicon area is determined by the number of I/O pads (100) and not by the number of standard cells it contains. Adding some extra logic would not imply any additional cost if contained in the area that is now empty. We therefore hope that adding the 2D compression logic will not substantially increase the chip area and, consequently, the production cost. The total area is 16 mm², corresponding to the minimal size the silicon wafer was divided into.

CARLOS v3 is a fairly simple chip if compared to CARLOS v2, with its 300 kgates of logical complexity: in fact it contains only 10 kgates. Nevertheless, it has been designed to test our approach to the new library and to verify that we were able to run through all the design flow steps. Our final check will be the test of the chip itself, in order to verify that everything was correctly designed, so as to have very clear ideas for the design of the final version of CARLOS.

Figure 4.14: CARLOS v3 layout picture

A specific PCB is in the design phase right now: it will contain only the chip itself and the connectors for probing with the Tektronix pattern generator and logic analyzer pods. Unlike CARLOS v2, the chip will be bonded into a PGA package and inserted on the PCB using a ZIF socket. This will allow us to test the 100 samples of the chip using only a few PCB samples.


Chapter 5

Wavelet based compression algorithm

As an alternative to the 1D and 2D compression algorithms conceived at the INFN Section of Torino, our group in Bologna decided to study other compression algorithms that may be used as a second-level compressor on SDD data. After studying the main standard compression algorithms, we decided to focus on a wavelet-based compression algorithm and its performance when used to compress SDD data.

The wavelet based compression algorithm design can be divided into 4 steps, requiring the use of different software tools:

1. choice of the algorithm main features;
2. optimization of the algorithm with respect to SDD data using the Matlab Wavelet Toolbox [23];
3. choice of the architecture for the implementation of the algorithm using Simulink [24];
4. comparison between the wavelet algorithm performance and that of the algorithms implemented on the CARLOS prototypes, in terms of compression ratio and reconstruction error.


5.1 Wavelet based compression algorithm

The idea of compressing SDD data using a multiresolution based compression algorithm comes from the growing success of this technique, both for uni-dimensional and bi-dimensional signal compression. Multiresolution analysis gives an equivalent representation of an input signal in terms of approximation and detail coefficients; these coefficients can then be encoded using standard techniques, such as run length encoding.

An SDD event, i.e. the data coming from a half-SDD, can be analyzed as a uni-dimensional data stream of 64k samples or as a bi-dimensional structure of 256 by 256 elements. Thus the first choice to make is whether to implement a 1D or a 2D multiresolution analysis.

In 1D analysis the signal can be written as:

$$
S = \left( \underbrace{s_1, s_2, \ldots, s_{256}}_{\text{1st anode}},\ \underbrace{s_{257}, s_{258}, \ldots, s_{512}}_{\text{2nd anode}},\ \ldots,\ \underbrace{s_{65281}, s_{65282}, \ldots, s_{65536}}_{\text{256th anode}} \right) \tag{5.1}
$$

In 2D analysis the signal can be written as:

$$
S = \begin{pmatrix}
s_{1,1} & s_{1,2} & \cdots & s_{1,256} \\
s_{2,1} & s_{2,2} & \cdots & s_{2,256} \\
\vdots & \vdots & \ddots & \vdots \\
s_{256,1} & s_{256,2} & \cdots & s_{256,256}
\end{pmatrix}
\begin{matrix} \text{1st anode} \\ \text{2nd anode} \\ \vdots \\ \text{256th anode} \end{matrix} \tag{5.2}
$$

In the case of 1D analysis, once the two decomposition filters H and G have been chosen, the multiresolution analysis can be applied with a number of levels (that is, the number of cascaded filters) between 1 and 16. An orthogonal wavelet decomposition C with 64k coefficients is thus produced: the ratio of the number of approximation coefficients a_i to the number of detail coefficients d_i depends on the number of decomposition levels used:

<br />

5.1 — Wavelet based <strong>compression</strong> algorithm<br />

S = s1,.... ............................ ,s65536<br />

⎛<br />

⎞<br />

<br />

0 decomposition levels<br />

⎝a1,......... ,a32768,d32769,.........<br />

,d65536⎠<br />

1 decomposition level<br />

C =<br />

<br />

⎛ coeffs. app.<br />

coeffs. dett. ⎞<br />

⎝a1,...... ,a16384,d16385,............<br />

,d65536⎠<br />

2 decomposition levels<br />

C =<br />

<br />

⎛ coeffs. app.<br />

coeffs. dett. ⎞<br />

C =<br />

⎝a1,... ,a8192,d8193,................<br />

,d65536⎠<br />

3 decomposition levels<br />

⎛<br />

<br />

coeffs. app.<br />

<br />

coeffs. dett.<br />

.<br />

,d5,................... ,d65536⎠<br />

14 decomposition levels<br />

C = ⎝a1,a2,a3,a4<br />

<br />

⎛ coeffs. app.<br />

coeffs. dett. ⎞<br />

C = ⎝ a1,a2<br />

<br />

⎛<br />

C =<br />

coeffs. app.<br />

⎝ a1<br />

<br />

coeff. app.<br />

,d3,...................... ,d65536⎠<br />

15 decomposition levels<br />

⎞<br />

<br />

coeffs. dett.<br />

⎞<br />

,d2,...................... ,d65536⎠<br />

16 decomposition levels<br />

<br />

coeffs. dett.<br />
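A minimal numpy sketch of this 1D Haar analysis (our own illustration, not the Matlab Toolbox code) is the following: at each level the current approximation is split into new approximation and detail coefficients of half the length, reproducing the layouts of C listed above.

```python
import numpy as np

# Sketch of 1D Haar multiresolution analysis on up to 16 levels.

def haar_analysis(s, levels):
    a = np.asarray(s, dtype=float)
    details = []
    for _ in range(levels):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(d)          # detail of this level; a halves in length
    return a, details              # C = (a, d_level, ..., d_1)

s = np.random.randint(0, 256, 65536)   # one half-SDD event, 64k samples
a, details = haar_analysis(s, 16)
print(a.size, [d.size for d in details[-3:]])   # 1 approximation coeff. left
```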

In the case of 2D analysis, once the two decomposition filters H and G have been chosen, the bi-dimensional decomposition scheme is applied with a number of levels between 1 and 8. First, multiresolution analysis is applied to each row of the 2D signal, then each column resulting from the previous analysis is decomposed using the same number of levels.

Thus the 2D signal (5.2) is transformed into the 2D orthogonal wavelet decomposition, containing 64k coefficients; in this case as well, the ratio of the number of approximation coefficients to the number of detail coefficients depends on the decomposition levels applied:

$$
S = \begin{pmatrix}
s_{1,1} & \cdots & s_{1,256}\\
\vdots & & \vdots\\
s_{256,1} & \cdots & s_{256,256}
\end{pmatrix} \qquad \text{0 decomposition levels}
$$

$$
C = \begin{pmatrix}
a_{1,1} & \cdots & a_{1,128} & d_{1,129} & \cdots & d_{1,256}\\
\vdots & & \vdots & \vdots & & \vdots\\
a_{128,1} & \cdots & a_{128,128} & d_{128,129} & \cdots & d_{128,256}\\
d_{129,1} & \cdots & d_{129,128} & d_{129,129} & \cdots & d_{129,256}\\
\vdots & & \vdots & \vdots & & \vdots\\
d_{256,1} & \cdots & d_{256,128} & d_{256,129} & \cdots & d_{256,256}
\end{pmatrix} \qquad \text{1 decomposition level}
$$

$$\vdots$$

$$
C = \begin{pmatrix}
a_{1,1} & a_{1,2} & d_{1,3} & \cdots & d_{1,256}\\
a_{2,1} & a_{2,2} & d_{2,3} & \cdots & d_{2,256}\\
d_{3,1} & d_{3,2} & d_{3,3} & \cdots & d_{3,256}\\
\vdots & \vdots & \vdots & & \vdots\\
d_{256,1} & d_{256,2} & d_{256,3} & \cdots & d_{256,256}
\end{pmatrix} \qquad \text{7 decomposition levels}
$$

$$
C = \begin{pmatrix}
a_{1,1} & d_{1,2} & \cdots & d_{1,256}\\
d_{2,1} & d_{2,2} & \cdots & d_{2,256}\\
\vdots & \vdots & & \vdots\\
d_{256,1} & d_{256,2} & \cdots & d_{256,256}
\end{pmatrix} \qquad \text{8 decomposition levels}
$$
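For illustration, one level of the separable 2D Haar decomposition can be sketched in numpy as follows (rows first, then columns, as described above); this is our own sketch, not the Toolbox implementation.

```python
import numpy as np

# One level of separable 2D Haar decomposition: rows, then columns.

def haar1d(x):
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.concatenate([a, d])

def haar2d_level(S):
    S = np.apply_along_axis(haar1d, 1, S)   # transform each row
    S = np.apply_along_axis(haar1d, 0, S)   # then each column
    return S                                # quadrants: [[A, Dh], [Dv, Dd]]

S = np.random.randint(0, 256, (256, 256)).astype(float)
C = haar2d_level(S)
print(C[:128, :128].shape)                  # (128, 128) approximation block
```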

Applying multiresolution analysis to SDD data proves to be useful because the approximation coefficients feature high values, since they represent the signal approximation, while the detail coefficients feature values near 0. Thus, in order to obtain compression, the detail coefficients can be eliminated without losing significant information on the input signal.

An easy and effective technique for compressing data after multiresolution analysis is to apply a threshold to every coefficient a_i and d_i. What we expect is that the approximation coefficients a_i remain unchanged, while the detail coefficients d_i are all put to 0. This is useful since the long zero sequences coming from the detail coefficients can be further compressed using the run length encoding technique.

The multiresolution based compression algorithm described so far is a lossy technique, but it can be used in a lossless way by not applying the threshold to the wavelet coefficients.
threshold on wavelet coefficients.<br />

5.1.1 Configuration parameters <strong>of</strong> the multiresolution<br />

algorithm<br />

Some algorithm parameters can be tuned in order to get the best performances<br />

in terms <strong>of</strong> <strong>compression</strong> ratio and reconstruction error. These<br />

parameters are:<br />

– the pair <strong>of</strong> decomposition filters H and G, used to implement the<br />

multiresolution analysis;<br />

– the number <strong>of</strong> dimensions used for the analysis: 1D or 2D;<br />

– the number <strong>of</strong> decomposition levels;<br />

– the threshold value applied to ai and di coefficients.<br />

5.2 Multiresolution algorithm optimization

The multiresolution algorithm optimization has been carried out using the Wavelet Toolbox from Matlab. First, the pair of decomposition filters that, for a fixed value of the threshold, gives the highest number of null coefficients a_i and d_i and the lowest reconstruction error has been chosen; then the other 3 parameters have been evaluated one after the other for optimization.


5.2.1 The Wavelet Toolbox from Matlab

The Wavelet Toolbox is a collection of Matlab functions that, through Matlab command lines and a user-friendly graphical interface, allows the user to apply wavelet techniques to real problems. In particular the Wavelet Toolbox allowed us to:

– perform the multiresolution analysis of a signal and the corresponding synthesis, using a wide variety of decomposition and reconstruction filters;
– treat signals as uni-dimensional or bi-dimensional;
– analyze signals on a variable number of levels;
– apply different threshold levels to the coefficients a_i and d_i obtained.

The wide choice of filters corresponds to the wide number of wavelet families implemented by the Wavelet Toolbox, shown in Tab. 5.1 and in Fig. 2.10, Fig. 2.11 and Fig. 2.12.

In particular the Haar family is composed of the wavelet function ψ(x) and its corresponding scale function φ(x), already discussed in Chapter 2. On the other hand, each of the Daubechies, Symlets and Coiflets families is composed of more than one pair of functions ψ(x) and φ(x): the Daubechies pairs are named db1, ..., db10, the Symlets pairs are named sym2, ..., sym8, while the Coiflets pairs are named coif1, ..., coif5.

Family                          Name identifier
Haar wavelet                    'haar'
Daubechies wavelets             'db'
Symlets                         'sym'
Coiflets                        'coif'
Biorthogonal wavelets           'bior'
Reverse Biorthogonal wavelets   'rbio'

Table 5.1: Wavelet families used for multiresolution analysis
sym2, ... , sym8, while Coiflets family pairs are named coif1, ... , coif5.


The Biorthogonal (bior1.1, ..., bior6.8) and Reverse Biorthogonal (rbio1.1, ..., rbio6.8) families are composed of quartets of functions ψ1(x), φ1(x), ψ2(x) and φ2(x), where the first pair is used for decomposition and the second for reconstruction. Using a particular function of the Wavelet Toolbox, which requires the name of the chosen pair of functions ψ(x) and φ(x), or the name of the quartet ψ1(x), φ1(x), ψ2(x) and φ2(x) when using the Biorthogonal and Reverse Biorthogonal wavelets, it is possible to determine the impulse responses representing, respectively, the low pass filter H and the high pass filter G used for decomposition, and the low pass filter H' and the high pass filter G' used in the reconstruction stage.

Multiresolution analysis and synthesis are computed as described in Chapter 3: in particular, the analysis step is performed with a convolution between the input signal and the filters H and G, followed by decimation, while synthesis is performed with up-sampling, followed by a convolution between the signal and the filters H' and G'.

5.2.2 Choice of the filters

In order to choose the best filters H, G, H' and G' for SDD data compression, ten 64-kbyte SDD events have been analyzed with the Wavelet Toolbox, using the wavelet families shown in Tab. 5.1. Each signal S, interpreted both as uni-dimensional as in Fig. 5.1 and as bi-dimensional as in Fig. 5.2, has been processed in the following way:

– after choosing a pair of functions ψ(x) and φ(x), or the quartet ψ1(x), φ1(x), ψ2(x), φ2(x), the corresponding filter coefficients H, G, H' and G' have been determined;
– the signal S has been analyzed using the filters H and G, obtaining the decomposition coefficients C;
– a threshold th has been applied to the coefficients C, obtaining the modified coefficients Cth;
– the coefficients Cth have been synthesized into the signal R, using the filters H' and G'.

Figure 5.1: Uni-dimensional analysis on 5 levels of the signal S (decomposition at level 5: s = a5 + d5 + d4 + d3 + d2 + d1)

Both in the uni-dimensional and in the bi-dimensional case, the compression performance has been quantified using the percentage P of null coefficients in Cth, while the reconstruction performance has been quantified using the root mean square error E between the original signal S and the signal R obtained after the analysis and synthesis of Cth.

In particular, since the total number of elements in Cth is 65536, in the uni-dimensional case the parameter P can be expressed in the following way:

$$
P = \frac{100 \cdot (\text{number of null coefficients in } C_{th})}{65536} \tag{5.3}
$$

The total number of elements in S and in R is also 65536; so, if s_i is the i-th element of the uni-dimensional signal S and r_i is the i-th element of R, the parameter E can be expressed in the following way:

$$
E = \sqrt{\frac{1}{65536} \sum_{i=1}^{65536} (s_i - r_i)^2} \tag{5.4}
$$

Figure 5.2: Bi-dimensional analysis on 5 levels of the signal S

In the bi-dimensional case P is calculated in the same way while, denoting by s_{i,j} the (i,j)-th element of S and by r_{i,j} the (i,j)-th element of R, the parameter E can be expressed in the following way:

$$
E = \sqrt{\frac{1}{65536} \sum_{i=1}^{256} \sum_{j=1}^{256} (s_{i,j} - r_{i,j})^2} \tag{5.5}
$$
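For reference, the two figures of merit can be computed directly; the numpy sketch below only restates the definitions (5.3)-(5.5) and is not the original analysis code.

```python
import numpy as np

# P: percentage of null coefficients; E: RMS reconstruction error.

def P(C_th):
    C_th = np.asarray(C_th)
    return 100.0 * np.count_nonzero(C_th == 0) / C_th.size

def E(s, r):
    s = np.asarray(s, dtype=float)
    r = np.asarray(r, dtype=float)
    return np.sqrt(np.mean((s - r) ** 2))   # same form in 1D and 2D

print(P([0, 0, 3, 0]))          # 75.0
print(E([1, 2, 3], [1, 2, 5]))  # ~1.15
```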

Even if the parameters P and E are not directly comparable to the results obtained with the compression algorithms implemented on the CARLOS prototypes, they give an important indication of the performance of each filter set used during the analysis. In particular, P gives a rough estimate of how much the coefficients Cth can be compressed using run length encoding, while E can be interpreted as the error introduced in the value associated with each sample coming from the SDD.

The analysis results related to 10 SDD events are shown from Tab. 5.2 to Tab. 5.7. In particular, Tab. 5.2 shows the P and E values related to a 5-level analysis using the Haar filter, both in 1D and 2D, with a threshold value th varying in the range 0-25. The other tables show the P and E values obtained with a 5-level analysis and a threshold th of 25, using filters belonging to the Daubechies (Tab. 5.3), Symlets (Tab. 5.4), Coiflets (Tab. 5.5), Biorthogonal (Tab. 5.6) and Reverse Biorthogonal (Tab. 5.7) families, in the 1D and 2D cases. The uncertainties ∆P and ∆E are reported in terms of the respective orders of magnitude only, since we are only looking for an estimate of these values.

An interesting feature emerging from Tab. 5.2 is the progressive increase of the values P and E with the increase of the threshold value th applied to the coefficients C. The trend of P is easy to understand, considering that applying the threshold th to the decomposition coefficients C means putting to 0 all coefficients smaller than th in absolute value: hence the greater the th value, the greater the parameter P. As for E, the greater the th value, the greater the difference between Cth and the original C, and hence the distortion introduced.

It is to be noticed that for th equal to 0 the parameter P is 9.12, while the parameter E is 1.26e-14; that is, the percentage of null coefficients in Cth and the reconstruction error are both very small. This is quite easy to understand as far as P is concerned, since without a threshold the only null coefficients are a very small fraction of the total number. As for E, not modifying the coefficients C with the threshold ensures a nearly perfect reconstruction of the signal: the value 1.26e-14 comes from the finite precision of the machine performing the analysis and synthesis processes.
performing the analysis and synthesis processes.


Haar
                    1D                   2D
Threshold th      P        E           P        E
 0              9.12   1.26e-14      3.68   2.50e-14
 1             24.68      0.27      22.21      0.28
 2             40.01      0.63      42.63      0.75
 3             58.60      1.64      56.34      1.19
 4             67.08      1.71      67.76      1.67
 5             75.56      2.09      75.50      2.09
 6             79.87      2.38      80.77      2.44
 7             83.56      2.68      84.96      2.77
 8             86.71      2.99      88.21      3.08
 9             88.82      3.23      90.75      3.36
10             90.70      3.48      92.88      3.63
11             92.21      3.72      94.49      3.87
12             93.20      3.89      95.80      4.08
13             94.16      4.07      96.78      4.26
14             94.81      4.21      97.56      4.42
15             95.33      4.34      98.20      4.57
16             95.72      4.44      98.73      4.71
17             96.03      4.54      99.05      4.80
18             96.20      4.60      99.25      4.86
19             96.41      4.67      99.44      4.93
20             96.54      4.72      99.55      4.97
21             96.62      4.76      99.64      5.01
22             96.69      4.79      99.69      5.03
23             96.73      4.81      99.74      5.05
24             96.76      4.83      99.77      5.07
25             96.79      4.85      99.80      5.09

Table 5.2: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on a 5-level basis, using the Haar set of filters derived from the Haar wavelet.


Daubechies
              1D              2D
Filters     P      E        P      E
db1       96.79   4.85    99.80   5.09
db2       96.75   4.82    99.63   5.08
db3       96.73   4.81    99.54   5.07
db4       96.73   4.81    99.48   5.07
db5       96.72   4.81    99.33   5.07
db6       96.71   4.81    99.27   5.07
db7       96.72   4.82    99.20   5.07
db8       96.70   4.81    99.08   5.08
db9       96.69   4.81    98.98   5.09
db10      96.68   4.80    98.98   5.09

Table 5.3: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on a 5-level basis, using the Daubechies sets of filters and a threshold level th equal to 25; the values obtained with db1 are equivalent to the ones obtained with Haar, since the corresponding filters are equivalent.

Symlets
              1D              2D
Filters     P      E        P      E
sym2      96.75   4.82    99.63   5.08
sym3      96.73   4.81    99.54   5.07
sym4      96.74   4.82    99.43   5.07
sym5      96.72   4.81    99.38   5.06
sym6      96.73   4.81    99.33   5.07
sym7      96.70   4.80    99.17   5.06
sym8      96.71   4.80    99.11   5.08

Table 5.4: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on a 5-level basis, using the Symlets sets of filters and a threshold value th equal to 25.
Symlets and a threshold value th equal to 25.


Coiflets
              1D              2D
Filters     P      E        P      E
coif1     96.74   4.82    99.51   5.07
coif2     96.72   4.80    98.32   4.75
coif3     96.72   4.81    99.60   5.06
coif4     96.69   4.80    98.62   5.06
coif5     96.68   4.80    98.29   5.05

Table 5.5: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on a 5-level basis, using the Coiflets sets of filters and a threshold value th equal to 25.

Biorthogonal
              1D              2D
Filters     P      E        P      E
bior1.1   96.79   4.85    99.80   5.09
bior1.3   96.68   4.81    99.48   5.07
bior1.5   96.64   4.82    99.25   5.05
bior2.2   96.28   4.71    98.70   4.94
bior2.4   96.28   4.65    98.56   4.92
bior2.6   96.23   4.62    98.27   4.91
bior2.8   96.21   4.63    97.81   4.91
bior3.1   93.41   5.68    94.15   5.58
bior3.3   94.37   4.84    95.43   5.01
bior3.5   94.70   4.65    96.60   5.10
bior3.7   94.81   4.59    95.13   4.85
bior3.9   94.88   4.56    94.13   4.85
bior4.4   96.75   4.82    99.39   5.07
bior5.5   96.78   4.88    99.46   5.10
bior6.8   96.68   4.79    98.95   5.04

Table 5.6: Mean values of P and E using the Biorthogonal filters


Reverse Biorthogonal
              1D              2D
Filters     P      E        P      E
rbio1.1   96.79   4.85    99.80   5.09
rbio1.3   96.77   4.85    99.57   5.08
rbio1.5   96.75   4.86    99.39   5.06
rbio2.2   96.78   4.92    96.89   4.58
rbio2.4   96.79   4.88    99.47   5.12
rbio2.6   96.77   4.87    99.32   5.11
rbio2.8   96.78   4.88    99.18   5.12
rbio3.1   96.38   8.67    98.76  11.29
rbio3.3   96.72   5.14    99.29   5.39
rbio3.5   96.76   4.95    99.28   5.18
rbio3.7   96.76   4.92    99.09   5.18
rbio3.9   96.74   4.91    98.97   5.20
rbio4.4   96.68   4.80    99.29   5.06
rbio5.5   93.32   4.63    98.56   4.92
rbio6.8   96.71   4.81    99.10   5.08

Table 5.7: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on a 5-level basis, using the Reverse Biorthogonal sets of filters and a threshold value th equal to 25; the values obtained with rbio1.1 are equivalent to the ones obtained with Haar, since the corresponding filters are equivalent.

The common feature of Tab. 5.3, Tab. 5.4, Tab. 5.5, Tab. 5.6 and Tab. 5.7 is the increasing value of P and E with the increase of the th value. Nevertheless, some wavelet families are better suited than others to the compression task: by comparing the values obtained for th = 25, it is evident that the Haar set of filters shows the best performance. In particular, with P = 96.79 and E = 4.85 in the uni-dimensional case and P = 99.80 and E = 5.09 in the bi-dimensional case, the Haar set of filters obtains the highest percentage of null coefficients with an acceptable error.
<strong>of</strong> filters gets the higher percentage <strong>of</strong> null coefficients with an accept-


Family                  Set of filters name                Filter length
Haar                    haar                               2
Daubechies              dbN                                2N
Symlets                 symN                               2N
Coiflets                coifN                              6N
Biorthogonal            bior1.1                            2
                        biorN1.N2 (other than bior1.1)     max(2N1, 2N2) + 2
Reverse Biorthogonal    rbio1.1                            2
                        rbioN1.N2 (other than rbio1.1)     max(2N1, 2N2) + 2

Table 5.8: Length of filters belonging to the different families

The choice of the Haar filters can be supported by other arguments too, concerning the length of the Haar filters H, G, H' and G', i.e. the number of coefficients characterizing their impulse responses. As shown in Tab. 5.8, the filters belonging to the Haar family have the smallest number of coefficients among all the filters, together of course with the equivalent sets db1, bior1.1 and rbio1.1. Since the analysis and synthesis processes consist of successive convolutions between the signal to be analyzed or synthesized and the respective filters, this small number of coefficients allows for a higher execution speed of the analysis and synthesis processes.

5.2.3 Choice of the dimensionality, number of levels and threshold value

Once the Haar set of filters had been chosen, we studied the effect on the P and E parameters of the dimensionality (1D or 2D), of the number of levels used for decomposition (1, 2, ..., 16 in 1D and 1, 2, ..., 8 in 2D) and of the value of the threshold th. Tab. 5.9 and Tab. 5.10 show the analysis of the usual 10 SDD events in 1D and 2D; each table also contains the values of P and E for 1, 3 and 5 levels of decomposition, and for each level a threshold value between 0 and 25 has been adopted.

The first result is that the bi-dimensional analysis produces a higher percentage P of null coefficients than the uni-dimensional one; nevertheless its E values are also higher. For instance, comparing the P and E values for a threshold value th of 35, the 1D analysis on 1 level gives P = 50.01 and E = 1.85, while the 2D analysis gives P = 74.96 and E = 3.96; the same 1D analysis on 3 levels gives P = 87.45 and E = 4.18, versus P = 99.80 and E = 5.09 in the 2D case.

Another result obtained from the tables is that, once it has been decided whether to use 1D or 2D analysis, an increase in the number of decomposition levels determines an increase in the values of the parameters P and E. For instance, by comparing the values in Tab. 5.9 obtained with th equal to 25, it can be noticed that 1D analysis on 1 level gives P = 50.01 and E = 1.85, on 2 levels P = 87.45 and E = 4.18, while on 3 levels P = 96.79 and E = 4.85. The same holds true for 2D analysis and synthesis. Thus we found that the optimal configuration of the multiresolution analysis based algorithm for SDD data is a 2D analysis on the maximum number of decomposition levels, using the Haar set of filters.

As for the threshold th, the parameters P and E increase when th is increased. In order to decide on the th value, we have to be able to quantify the reconstruction error introduced by the wavelet analysis and to compare it with the compression algorithms implemented on CARLOS.

CARLOS.



5.3 Choice of the architecture

The precision of the architecture chosen for the implementation of the multiresolution analysis can strongly affect the percentage P of null coefficients and the reconstruction error E. As an example, it is sufficient to apply both the analysis and the synthesis processes to an input signal without any threshold: the reconstruction error E, though very small, is different from 0, due to the finite precision with which our Pentium II processor performed the calculations.

In order to quantify the influence of the architecture on the algorithm performance we used Simulink, a software tool from Matlab for the design and simulation of complex systems, together with the Fixed-Point Blockset [25], which allows the performance of a given algorithm to be simulated when implemented on different architectures, both in fixed and floating point.

5.3.1 Simulink and the Fixed-Point Blockset

The Fixed-Point Blockset tool [25] is one of the Simulink libraries; it contains blocks performing operations between signals, such as sum, multiplication, convolution and so on, while simulating various types of architectures, both fixed and floating point. This tool is very useful since it allows the designer to study the performance of a given algorithm on different architectures before the actual implementation takes place. For instance, this tool can be successfully used to decide whether a Fourier transform can be implemented with acceptable performance on a fixed-point DSP (Digital Signal Processor) or whether it has to be implemented on a floating-point DSP. The difference is relevant especially for cost reasons, since a floating-point DSP has a much higher cost than a fixed-point one. We used the Fixed-Point Blockset with the same purpose of finding the most suitable architecture before the actual implementation.

Among the various floating and fixed-point architectures handled by the Fixed-Point Blockset, we studied the following ones:

– double precision floating point IEEE 754 standard architecture;
– single precision floating point IEEE 754 standard architecture;
– fractional fixed point.

The IEEE 754 standard architecture is one of the most widespread and is used in most floating-point processors. When double precision is used, the standard requires a 64-bit word in which 1 bit (bit 63) stands for the sign s, 11 bits (bits 62-52) for the exponent e and the remaining 52 bits (bits 51-0) for the mantissa m. The relationship between the binary and the decimal representation is the following one:

$$
\text{decimal value} = (-1)^s \cdot 2^{e-1023} \cdot (1.m)
$$
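The field decomposition can be checked with a few lines of Python; the sketch below (our own illustration) unpacks a double into s, e and m and applies the formula above.

```python
import struct

# Unpack an IEEE 754 double into its sign, exponent and mantissa fields.

def fields(x):
    bits = int.from_bytes(struct.pack(">d", x), "big")
    s = bits >> 63                   # 1 sign bit
    e = (bits >> 52) & 0x7FF         # 11 exponent bits
    m = bits & ((1 << 52) - 1)       # 52 mantissa bits
    return s, e, m

s, e, m = fields(-6.5)
value = (-1) ** s * 2.0 ** (e - 1023) * (1 + m / 2.0 ** 52)
print(s, e, value)                   # 1 1025 -6.5
```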


When single precision is used, a 32-bit word is employed instead, with 1 sign bit, 8 exponent bits and 23 mantissa bits. In the fractional fixed-point architecture, finally, a 32-bit word is used as well: the s bits on the right (b0 to b(s-1)) contain the fractional part of the number, one bit on the left (bs) contains the sign of the number, and the other guard bits (b(s+1) to b31) on the left of the radix point contain the integer part of the number.

It is to be noticed that the double precision floating point IEEE 754 standard architecture features a precision of 2^-52 ≈ 10^-16, single precision IEEE 754 has a precision of 2^-23 ≈ 10^-7, while the fractional fixed point architecture has a precision of 2^-s, i.e. its precision depends on the number of bits used for the fractional part of the number. The study of the influence of the fractional fixed-point architecture on the multiresolution analysis has therefore been carried out by varying the position of the radix point within the 32-bit word.
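The effect of the fractional fixed-point precision can be sketched by simply rounding to multiples of 2^-s, as below; this is an illustration of the representation, not of the Simulink blocks.

```python
import numpy as np

# Quantize to s fractional bits: round to the nearest multiple of 2**-s.

def to_fixed(x, s):
    return np.round(np.asarray(x, dtype=float) * 2.0 ** s) / 2.0 ** s

x = np.array([0.1, 1.26, -3.14159])
for s in (18, 15, 12, 9, 7, 5, 3):
    err = np.max(np.abs(to_fixed(x, s) - x))
    print(f"fixed({s:2d})  max quantization error = {err:.2e}")
```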

5.3.2 Choice of the architecture

Implementing the bi-dimensional multiresolution analysis and synthesis in Simulink is quite a long job, both in terms of design and of simulation time. We therefore decided to implement a uni-dimensional algorithm on 16 decomposition levels, which is a much quicker and simpler job. Besides, it gives a rather good estimate of the performance of the 3 architectures on an algorithm very similar to the one we have chosen.

The Simulink implementation of the multiresolution analysis and synthesis processes is shown in the external blocks of Fig. 5.3: the block on the left performs the 1D analysis of the signal S using the Haar set of filters, while the block on the right applies a threshold to the decomposition coefficients and performs the synthesis of the signal R.

Figure 5.3: Developed Simulink blocks: from left to right, the analysis block, the delay block and the threshold and synthesis block

Figure 5.4: Zoom on the developed analysis block

Figure 5.5: Zoom on the developed threshold and synthesis block

Figure 5.6: Zoom on the developed synthesis block

The analysis block has been implemented as a 16-level cascade, see Fig. 5.4, containing high-pass filter operators (Hi Dec Filter), low-pass filter operators (Low Dec Filter) and Downsample operators. The Hi Dec Filter operators perform the convolution between the incoming signal and the Haar high-pass decomposition filter, the Low Dec Filter operators perform the convolution between the incoming signal and the Haar low-pass decomposition filter, while the Downsample operators perform the decimation of the incoming signal.

Fig. 5.5 shows the threshold and synthesis block, which is subdivided into 3 major sub-blocks: the sub-block on the left applies a threshold to the input stream, the sub-block on the right performs the synthesis of the signal, while the central block, called To Workspace, stores the decomposition coefficients after the application of the threshold, so that this value can be used for calculating the percentage P of null coefficients. The synthesis block has been implemented, in analogy with the analysis block, as a 16-level cascade, see Fig. 5.6, containing Hi Rec Filter operators performing the convolution between the incoming signal and the Haar high-pass reconstruction filter, Low Rec Filter operators performing the convolution between the incoming signal and the Haar low-pass reconstruction filter, FixPt Sum operators performing the sum of the filtered signals, and Upsample operators performing the upsampling of the incoming signals.

Finally, the Delay block shown in Fig. 5.3 has the task of starting the synthesis process only when the analysis job has been completed. It is to be noticed that the analysis, delay and synthesis blocks have been developed starting from simple blocks belonging to the Fixed-Point Blockset, such as filtering, downsampling and upsampling blocks.

After performing the analysis and synthesis of the 10 SDD events with a threshold value equal to 25 for the 3 architectures described above, we have obtained the values shown in Tab. 5.11; as a notation, the double precision floating-point IEEE 754 standard architecture is indicated as ieee754doub, the single precision floating-point IEEE 754 standard architecture as ieee754sing and the fractional fixed-point architecture as fixed(s), where s is the number of bits representing the fractional part of the number.
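As an illustration, the fixed(s) architectures can be emulated by rounding every intermediate result to s fractional bits; the sketch below assumes a simple round-to-nearest rule and ignores overflow of the integer part:

#include <math.h>

/* Quantize x to the fixed(s) grid, i.e. to the nearest multiple of 2^-s. */
float quantize_fixed(float x, int s)
{
    float scale = (float)(1 << s);          /* 2^s */
    return floorf(x * scale + 0.5f) / scale;
}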

Simulink simulations show how the values of P and E depend on the precision of the selected architecture: taking as a reference the P and E values least influenced by the finite precision of the calculations, i.e. those obtained with the ieee754doub architecture, one can notice in the cases ieee754sing, fixed(18), fixed(15), fixed(12) and fixed(9) a slight increase in the error E while P remains constant, whereas in the cases fixed(7), fixed(5) and fixed(3) the discrepancy with respect to the ieee754doub values increases strongly.

The results obtained so far pointed us towards the choice of one of the following architectures: ieee754doub, ieee754sing, fixed(18), fixed(15), fixed(12) and fixed(9). Our choice fell on ieee754sing, as explained in Par. 5.5.

5.4 Multiresolution algorithm performances

For a direct comparison between the performances obtained by the compression algorithms implemented on the CARLOS prototypes and by the multiresolution based algorithm, we developed a FORTRAN subroutine running analysis and synthesis in single precision floating point on a SPARC5 processor. The FORTRAN subroutine can be logically divided into two parts: the first estimates the algorithm's compression performance, the second estimates the reconstruction error on the cluster charge.

The first part of the subroutine performs analysis, threshold th application and synthesis on SDD events containing several charge clusters. After applying analysis and threshold, for each SDD event the reciprocal of the compression ratio is calculated as c−1 = (number of output bits)/(number of input bits), with the assumption that each non-null decomposition coefficient is encoded using two 32-bit words, one representing the value of the coefficient itself, the other representing the number of null coefficients between the current and the previous non-null coefficient. Thus the number of bits entering the algorithm is the number of samples multiplied by 8 bits (64k × 8 = 512k), while the number of bits exiting the algorithm is the number of non-null decomposition coefficients multiplied by the 32 + 32 = 64 bits used to encode each coefficient.
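Under these encoding assumptions the computation of c−1 is straightforward; the following C fragment is a transcription of the quantity the subroutine evaluates, not the FORTRAN code itself:

/* Reciprocal compression ratio for one SDD event: each non-null
   coefficient costs 32 bits for its value plus 32 bits for the run of
   nulls preceding it; each input sample is an 8-bit ADC value. */
double inverse_compression_ratio(const float *coeff, long n_samples)
{
    long nonnull = 0;
    for (long i = 0; i < n_samples; i++)
        if (coeff[i] != 0.0f)
            nonnull++;
    double out_bits = 64.0 * (double)nonnull;    /* 32 + 32 per coefficient */
    double in_bits  = 8.0 * (double)n_samples;   /* 64k x 8 = 512k per event */
    return out_bits / in_bits;                   /* target: <= 46e-3 (c = 22) */
}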

The second part of the FORTRAN subroutine performs analysis, threshold application and synthesis on single-cluster SDD events. After analysis, threshold th application and synthesis, the difference between the coordinates of the cluster charge centroid before compression and after synthesis is computed for each SDD event, as well as the percentage difference between the charge of the cluster before compression and after reconstruction.
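The quantities being compared can be sketched as follows, assuming a simple charge-weighted centroid over the anode–time plane (the actual FORTRAN reconstruction code is not reproduced here):

/* Charge-weighted centroid and total charge of a cluster, computed on
   the n_anodes x n_times array of samples of one SDD event. */
void cluster_centroid(const float *s, int n_anodes, int n_times,
                      float *anode_c, float *time_c, float *charge)
{
    float q = 0.0f, sa = 0.0f, st = 0.0f;
    for (int a = 0; a < n_anodes; a++)
        for (int t = 0; t < n_times; t++) {
            float v = s[a * n_times + t];
            q  += v;
            sa += v * (float)a;
            st += v * (float)t;
        }
    *anode_c = sa / q;   /* centroid along the anode axis */
    *time_c  = st / q;   /* centroid along the drift-time axis */
    *charge  = q;
}

The percentage charge difference is then 100 · (q_after − q_before)/q_before.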

Fig. 5.7, Fig. 5.8, Fig. 5.9, Fig. 5.10, Fig. 5.11 and Fig. 5.12 show the value of the compression parameter c−1 for different threshold th values; in each figure the upper histogram represents the c−1 values for the 500 SDD events analyzed, while the lower histogram represents the c−1 values of the SDD events whose c−1 is less than 46 × 10−3 (c = 22). As the histograms show, the mean c−1 values are lower than our target value c−1 = 46 × 10−3 for each threshold value selected. Thus the multiresolution algorithm can reach an acceptable compression ratio by putting a threshold of 20 on the analyzed coefficients.

As far as the reconstruction error calculation is concerned, up to now we could use only 20 single-cluster events, so the histograms reporting the coordinate and charge differences before and after compression suffer from very poor statistics. For this reason the results we obtained on the reconstruction error are rather qualitative up to now: in particular, performing the analysis on 20 SDD events and using a threshold level th equal to 21, the differences between the centroid coordinates before and after compression are of the order of a µm, whereas the cluster charge shows an underestimation of a few per cent. These qualitative results are of the same order of magnitude as those of the compression algorithms implemented in the CARLOS prototypes.

Figure 5.7: c−1 values for th=20
Figure 5.8: c−1 values for th=21
Figure 5.9: c−1 values for th=22
Figure 5.10: c−1 values for th=23
Figure 5.11: c−1 values for th=24
Figure 5.12: c−1 values for th=25

5.5 Hardware implementation

The hardware we have chosen for the implementation of the wavelet based compression algorithm is a DSP chip from Analog Devices (AD): the ADSP-21160. The DSP belongs to the Single Instruction Multiple Data SHARC family produced by AD and performs calculations both in fixed point and in single precision floating point at the same speed. Our choice fell on this DSP also because of this interesting feature, since it allows us to try two different architectures with a single chip. The chip has the following features:

– 600 MFLOPS (32-bit floating point) peak operation;


– 600 MOPS (32-bit fixed point) peak operation;
– 100 MHz core operation;
– 4 Mbits on-chip dual-ported SRAM;
– division of SRAM between program and data memory is user selectable;
– 14 channels of zero-overhead DMA;
– JTAG standard test access port.

Particularly interesting in this chip is the amount of memory hosted on-chip: 4 Mbits are sufficient to store the algorithm program and at least 2 SDD events (each one requires 512 Kbits). Thus, while processing one SDD event, another one can be fetched into the internal SRAM using the DMA channels, increasing the total throughput.
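Such a double-buffering scheme could look as follows; dma_start_fetch(), dma_wait() and process_event() are hypothetical placeholders for the DMA and compression routines, not the actual ADSP-21160 or VisualDSP API:

#define EVENT_BITS  (512 * 1024)          /* 512 Kbits per SDD event */
#define EVENT_WORDS (EVENT_BITS / 32)     /* as 32-bit words */

static int buf[2][EVENT_WORDS];           /* two event buffers in SRAM */

void dma_start_fetch(int *dst);           /* hypothetical: kick off a DMA transfer */
void dma_wait(void);                      /* hypothetical: wait for DMA completion */
void process_event(const int *ev);        /* hypothetical: wavelet compression */

/* While event i is compressed out of one buffer, event i+1 is fetched
   into the other buffer through a DMA channel. */
void run_pipeline(int n_events)
{
    int cur = 0;
    dma_start_fetch(buf[cur]);            /* prime the first buffer */
    for (int i = 0; i < n_events; i++) {
        dma_wait();                       /* event i is now in buf[cur] */
        if (i + 1 < n_events)
            dma_start_fetch(buf[1 - cur]); /* fetch event i+1 in background */
        process_event(buf[cur]);
        cur = 1 - cur;
    }
}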

The DSP has been bought together with an evaluation board and the VisualDSP integrated development environment, which allows C code to be written and downloaded to the DSP chip. The implementation of the wavelet based compression algorithm on the DSP is still in the design phase, so no data concerning the algorithm speed are available yet for a quantitative comparison with the CARLOS chip prototypes.


Haar 1D

Threshold th | 1 level: P, E | 3 levels: P, E | 5 levels: P, E
0  | 7.78, 3.02e-15 | 9.05, 7.11e-15 | 9.12, 1.26e-14
1  | 17.51, 0.22 | 23.67, 0.26 | 24.68, 0.27
2  | 31.23, 0.65 | 38.11, 0.62 | 40.01, 0.63
3  | 40.09, 1.01 | 55.81, 1.21 | 58.60, 1.64
4  | 44.28, 1.25 | 63.48, 1.56 | 67.08, 1.71
5  | 47.84, 1.52 | 71.20, 2.00 | 75.56, 2.09
6  | 48.78, 1.61 | 74.80, 2.26 | 79.87, 2.38
7  | 49.31, 1.68 | 77.81, 2.52 | 83.56, 2.68
8  | 49.71, 1.74 | 80.38, 2.79 | 86.71, 2.99
9  | 49.78, 1.76 | 82.02, 2.99 | 88.82, 3.23
10 | 49.87, 1.78 | 83.41, 3.19 | 90.70, 3.48
11 | 49.91, 1.79 | 84.50, 3.38 | 92.21, 3.72
12 | 49.94, 1.80 | 85.17, 3.50 | 93.20, 3.89
13 | 49.97, 1.81 | 85.81, 3.64 | 94.16, 4.07
14 | 49.98, 1.82 | 86.25, 3.75 | 94.81, 4.21
15 | 49.98, 1.83 | 86.60, 3.84 | 95.33, 4.34
16 | 49.99, 1.83 | 86.85, 3.92 | 95.72, 4.44
17 | 50.00, 1.84 | 87.02, 3.98 | 96.03, 4.54
18 | 50.00, 1.84 | 87.12, 4.02 | 96.20, 4.60
19 | 50.00, 1.84 | 87.24, 4.07 | 96.41, 4.67
20 | 50.00, 1.84 | 87.32, 4.10 | 96.54, 4.72
21 | 50.01, 1.84 | 87.36, 4.12 | 96.62, 4.76
22 | 50.01, 1.84 | 87.40, 4.14 | 96.69, 4.79
23 | 50.01, 1.84 | 87.42, 4.16 | 96.73, 4.81
24 | 50.01, 1.85 | 87.43, 4.17 | 96.76, 4.83
25 | 50.01, 1.85 | 87.45, 4.18 | 96.79, 4.85

Table 5.9: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on 1, 3 and 5 levels, using the Haar set of filters.



Haar 2D

Threshold th | 1 level: P, E | 3 levels: P, E | 5 levels: P, E
0  | 3.54, 5.32e-15 | 3.67, 1.5e-14 | 3.68, 2.50e-14
1  | 18.90, 0.26 | 22.06, 0.28 | 22.21, 0.28
2  | 36.05, 0.69 | 42.33, 0.74 | 42.63, 0.75
3  | 46.42, 1.07 | 55.90, 1.19 | 56.34, 1.19
4  | 55.25, 1.47 | 67.15, 1.66 | 67.76, 1.67
5  | 60.69, 1.80 | 74.78, 2.07 | 75.50, 2.09
6  | 64.01, 2.06 | 79.95, 2.42 | 80.77, 2.44
7  | 66.46, 2.30 | 84.03, 2.75 | 84.96, 2.77
8  | 68.30, 2.51 | 87.18, 3.05 | 88.21, 3.08
9  | 69.73, 2.70 | 89.64, 3.33 | 90.75, 3.36
10 | 70.95, 2.90 | 91.72, 3.59 | 92.88, 3.63
11 | 71.87, 3.06 | 93.25, 3.82 | 94.49, 3.87
12 | 72.63, 3.22 | 94.51, 4.03 | 95.80, 4.08
13 | 73.20, 3.35 | 95.46, 4.21 | 96.78, 4.26
14 | 73.65, 3.47 | 96.21, 4.36 | 97.56, 4.42
15 | 74.06, 3.59 | 96.84, 4.51 | 98.20, 4.57
16 | 74.38, 3.69 | 97.34, 4.64 | 98.73, 4.71
17 | 74.53, 3.75 | 97.63, 4.72 | 99.05, 4.80
18 | 74.65, 3.80 | 97.82, 4.79 | 99.25, 4.86
19 | 74.76, 3.85 | 98.01, 4.85 | 99.44, 4.93
20 | 74.82, 3.87 | 98.11, 4.89 | 99.55, 4.97
21 | 74.87, 3.90 | 98.20, 4.93 | 99.64, 5.01
22 | 74.91, 3.92 | 98.25, 4.95 | 99.69, 5.03
23 | 74.93, 3.94 | 98.29, 4.97 | 99.74, 5.05
24 | 74.94, 3.95 | 98.32, 4.99 | 99.77, 5.07
25 | 74.96, 3.96 | 98.35, 5.00 | 99.80, 5.09

Table 5.10: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01): the analysis has been performed on 1, 3 and 5 levels, using the Haar set of filters.


Architecture | Precision | P | E
ieee754doub | 2−52 | 99.88 | 5.07
ieee754sing | 2−23 | 99.88 | 5.11
fixed(18) | 2−18 | 99.88 | 5.11
fixed(15) | 2−15 | 99.88 | 5.11
fixed(12) | 2−12 | 99.88 | 5.11
fixed(9) | 2−9 | 99.88 | 5.11
fixed(7) | 2−7 | 99.87 | 6.04
fixed(5) | 2−5 | 99.81 | 12.75
fixed(3) | 2−3 | 99.52 | 89.09

Table 5.11: Mean values of P and E on 10 SDD events (∆P ≈ ∆E ≈ 0.01), obtained with Simulink simulations.


Conclusions

The main goal of this thesis work was the search for compression algorithms and their hardware implementation, to be applied to the data coming out of the Silicon Drift Detectors in the ALICE experiment. ALICE and, in general, the LHC experiments put very stringent constraints on the compression algorithms as far as compression ratio, reconstruction error, speed and flexibility are concerned. For example, the data produced by the SDDs have to be reduced by a factor of 22 in order to satisfy the constraints on disk space for permanent storage. Therefore many standard compression algorithms have been studied in order to find which one could obtain the best trade-off between compression ratio and reconstruction error, i.e. the distortion introduced. It is rather obvious, in fact, that a high compression ratio such as 22 can only be achieved at the expense of some loss of information on the physical charge distribution over the SDD surface.

Three hardware prototypes implementing data compression are presented in this thesis: the front-end chip CARLOS, in versions v1, v2 and v3. Their evolution from version 1 to version 3 reflects the architectural changes in the readout chain that occurred during the 3 years of this work. Three major reasons justify these changes:

– the necessity to work in a radiation environment, forcing us to choose a radiation-tolerant technology;
– the lack of space for the SIU board, forcing us to change the readout architecture;


– the change from a one-dimensional (1D) compression algorithm to a two-dimensional (2D) one, in order to obtain the same compression ratio as in 1D while using lower thresholds, thus losing a smaller amount of physical data.

We plan that CARLOS v4 will be the final version of the chip: it will contain the 2D algorithm and will be designed to be compliant with the new readout architecture. It should be sent to the foundry before the end of 2002.

One of the main features of these chips is that lossy compression can be switched off when needed and turned into lossless compression. Lossless data compression becomes necessary when the compression algorithms implemented on the CARLOS chips are no longer applicable. For example, the 2D compression algorithm does not work well in the presence of a slope on the anodic signal baseline. In this case the on-line compression on the front-end has to be switched off and a second-level compressor in the counting room has to do the job. For this kind of application different compression algorithms have to be studied.

As an alternative to the 1D and 2D algorithms, our group in Bologna decided to study a wavelet based compression algorithm, in order to establish whether it could be useful for a possible second-level data compression. Our simulations proved that the algorithm shows good performances as far as both the compression ratio and the reconstruction error are concerned. We are still working to obtain more quantitative results and, at the same time, an implementation on a DSP is planned for the near future in order to evaluate the compression speed and how many DSPs would be necessary for the task. The use of DSPs in the counting room may be very convenient since, unlike ASICs, they are completely reprogrammable via software if needed; thus as many different compression algorithms as desired can be tried on the input data in order to find the best one.
<strong>data</strong> in order to find the best one.


163
