
COMMUNICATION THEORY

Frans M.J. Willems (1)

Spring 2005

(1) Group Signal Processing Systems, Electrical Engineering Department, Eindhoven University of Technology.


Contents

I Some History

1 Historical facts related to communication
1.1 Telegraphy and telephony
1.2 Wireless communications
1.3 Electronics
1.4 Classical modulation methods
1.5 Fundamental bounds and concepts

II Optimum Receiver Principles

2 Decision rules for discrete channels
2.1 System description
2.2 Decision rules, an example
2.3 The "maximum a-posteriori" decision rule
2.4 The "maximum likelihood" decision rule
2.5 Scalar and vector channels
2.6 Exercises

3 Decision rules for the real scalar channel
3.1 System description
3.2 MAP and ML decision rules
3.3 The scalar additive Gaussian noise channel
3.3.1 Introduction
3.3.2 The MAP decision rule
3.3.3 The Q-function
3.3.4 Probability of error
3.4 Exercises

4 Real vector channels
4.1 System description
4.2 Decision variables, MAP decoding
4.3 Decision regions
4.4 The additive Gaussian noise vector channel
4.4.1 Introduction
4.4.2 Minimum Euclidean distance decision rule
4.4.3 Error probabilities
4.5 Theorem of reversibility
4.6 Exercises

5 Waveform channels, correlation receiver
5.1 System description, additive white Gaussian noise
5.2 Waveform expansions in series of orthonormal functions
5.3 An equivalent vector channel
5.4 A correlation receiver is optimum
5.5 Sufficient statistic
5.6 Exercises

6 Waveform channels, building-blocks
6.1 Another sufficient statistic
6.2 The Gram-Schmidt orthogonalization procedure
6.3 Transmitting vectors instead of waveforms
6.4 Signal space, signal structure
6.5 Receiving vectors instead of waveforms
6.6 The additive Gaussian vector channel
6.7 Processing the received vector
6.8 Relation between waveform and vector channel
6.9 Exercises

7 Matched filters
7.1 Matched-filter receiver
7.2 Direct receiver
7.3 Signal-to-noise ratio
7.4 Parseval relation
7.5 Notes
7.6 Exercises

8 Orthogonal signaling
8.1 Orthogonal signal structure
8.2 Optimum receiver
8.3 Error probability
8.4 Capacity
8.5 Exercises

III Transmitter Optimization

9 Signal energy considerations
9.1 Translation and rotation of signal structures
9.2 Signal energy
9.3 Translating a signal structure
9.4 Comparison of orthogonal and antipodal signaling
9.5 Energy of |M| orthogonal signals
9.6 Upper bound on the error probability
9.7 Exercises

10 Signaling for message sequences
10.1 Introduction
10.2 Definitions, problem statement
10.3 Bit-by-bit signaling
10.3.1 Description
10.3.2 Probability of error considerations
10.4 Block-orthogonal signaling
10.4.1 Description
10.4.2 Probability of error considerations
10.5 Exercises

11 Time, bandwidth, and dimensionality
11.1 Number of dimensions for bit-by-bit and block-orthogonal signaling
11.2 Dimensionality as a function of T
11.3 Remark
11.4 Exercises

12 Bandlimited transmission
12.1 Introduction
12.2 Sphere hardening (noise vector)
12.3 Sphere hardening (received vector)
12.4 The interior volume of a hypersphere
12.5 An upper bound for the number |M| of signal vectors
12.6 A lower bound to the number |M| of signal vectors
12.6.1 Generating a random code
12.6.2 Error probability
12.6.3 Achievability result
12.7 Capacity of the bandlimited channel
12.8 Exercises

13 Wideband transmission
13.1 Introduction
13.2 Capacity of the wideband channel
13.3 Relation between capacities, signal-to-noise ratio
13.4 Exercises

IV Channel Models

14 Pulse amplitude modulation
14.1 Introduction
14.2 Orthonormal pulses: the Nyquist criterion
14.2.1 The Nyquist result
14.2.2 Discussion
14.2.3 Proof of the Nyquist result
14.2.4 Receiver implementation
14.3 Multi-pulse transmission
14.4 Exercises

15 Bandpass channels
15.1 Introduction
15.2 Quadrature multiplexing
15.3 Optimum receiver for quadrature multiplexing
15.4 Transmission of a complex signal
15.5 Dimensions per second
15.6 Capacity of the bandpass channel
15.7 Quadrature amplitude modulation
15.8 Serial quadrature amplitude modulation
15.9 Exercises

16 Random carrier-phase
16.1 Introduction
16.2 Optimum incoherent reception
16.3 Signals with equal energy, receiver implementation
16.4 Envelope detection
16.5 Probability of error for two orthogonal signals
16.6 Exercises

17 Colored noise
17.1 Introduction
17.2 Another result on reversibility
17.3 Additive nonwhite Gaussian noise
17.4 Receiver implementation
17.4.1 Correlation type receiver
17.4.2 Matched-filter type receiver
17.5 Exercises

18 Capacity of the colored noise waveform channel
18.1 Problem statement
18.2 The sub-channels
18.3 Capacity of parallel channels, water-filling
18.4 Back to the non-white waveform channel again
18.5 Capacity of the linear filter channel
18.6 Exercises

V Coded Modulation

19 Coding for bandlimited channels
19.1 AWGN channel
19.2 |A|-Pulse amplitude modulation
19.3 Normalized S/N
19.4 Baseline performance in terms of S/N_norm
19.5 The promise of coding
19.6 Union bound
19.6.1 Sequence error probability
19.6.2 Symbol error probability
19.7 Extended Hamming codes
19.8 Coding by set-partitioning
19.8.1 Construction of the signal set
19.8.2 Distance between the signals
19.8.3 Asymptotic coding gain
19.8.4 Effective coding gain
19.8.5 Coding gains for more complex codes
19.9 Remark
19.10 Exercises

VI Appendices

A An upper bound for the Q-function
B The Fourier transform
B.1 Definition
B.2 Some properties
B.2.1 Parseval's relation
C Impulse signal, filters
C.1 The impulse or delta signal
C.2 Linear time-invariant systems, filters
D Correlation functions, power spectra
D.1 Expectation of an integral
D.2 Power spectrum
D.3 Interpretation
D.4 Wide-sense stationarity
D.5 Gaussian processes
D.6 Properties of S_x(f) and R_x(τ)
E Gram-Schmidt procedure, proof, example
E.1 Proof of result 6.2
E.2 An example
F Schwarz inequality
G Bound error probability orthogonal signaling
H Decibels
I Whitening filters
J Elementary energy E_pam of an |A|-PAM constellation


Part I

Some History


Chapter 1

Historical facts related to communication

SUMMARY: In this chapter we mention some historical facts related to telegraphy and telephony, wireless communication, electronics, and modulation. Furthermore we briefly discuss some classic fundamental bounds and concepts relevant for digital communication.

1.1 Telegraphy and telephony

1800 Alessandro Volta (Italy 1745 - 1827) Announcement of the electric battery (Volta's pile). This battery made constant-current electricity possible. Motivated by the "frog-leg" experiments of Luigi Galvani (Italy 1737 - 1798), anatomy professor at the university of Bologna, Volta discovered that it was possible to construct a battery from a plate of silver and a plate of zinc separated by spongy matter impregnated with a saline solution, repeated thirty or forty times.

1819 Hans Christian Oersted (Denmark 1777 - 1851) Discovery of the fact that an electric current generates a magnetic field. This can be regarded as the first electro-magnetic result. It took twenty years after Volta's invention to realize and prove that this effect exists, although it was known at the time that a stroke of lightning magnetized objects.

1827 Joseph Henry (U.S. 1797 - 1878) Constructed electromagnets using coils of wire.

1838 Samuel Morse (U.S. 1791 - 1872) Demonstration of the electric telegraph in Morristown, New Jersey. The first telegraph line linked Washington with Baltimore. It became operational in May 1844. By 1848 every state east of the Mississippi was linked by telegraph lines. The alphabet that was designed by Morse (a portrait-painter) transforms letters into variable-length sequences (code words) of dots and dashes (see table 1.1). A dash should last roughly three times as long as a dot. Frequent letters (e.g. E) get short code words; less frequently occurring letters (e.g. J, Q, and Y) are represented by longer words. The first transcontinental telegraph line was completed in 1861. A transatlantic cable became operational in 1866; a cable had already been completed in 1858, but it failed after a few weeks.


A . -       G - - .     M - -       S . . .     Y - . - -
B - . . .   H . . . .   N - .       T -         Z - - . .
C - . - .   I . .       O - - -     U . . -     period . - . - . -
D - . .     J . - - -   P . - - .   V . . . -   ? . . - - . .
E .         K - . -     Q - - . -   W . - -
F . . - .   L . - . .   R . - .     X - . . -

Table 1.1: Morse code.

It should be mentioned that in 1833 C.F. Gauss and W.E. Weber demonstrated an electromagnetic telegraph in Göttingen, Germany.

1875 Alexander Graham Bell (Scotland 1847 - 1922 U.S.) Invention of the telephone. Bell was a teacher of the deaf. His invention was patented in 1876 (electro-magnetic telephone). In 1877 Bell established the Bell Telephone Company. Early versions provided service over several hundred miles. Advances in quality resulted e.g. from the invention of the carbon microphone.

1900 Michael Pupin (Yugoslavia 1858 - 1935 U.S.) Obtained a patent on loading coils for the improvement of telephone communications. Adding these Pupin-coils at specified intervals along a telephone line reduced attenuation significantly. The same invention was disclosed by Campbell two days later than Pupin. Pupin sold his patent for $455,000 to AT&T Co. Pupin (and Campbell) were the first to set up a theory of transmission lines. The implications of this theory were not at all obvious to "experimentalists".

1.2 Wireless communications

1831 Michael Faraday (England 1791 - 1867) Discovery of the fact that a changing magnetic field induces an electric current in a conducting circuit. This is the famous law of induction.

1873 James Clerk Maxwell (Scotland 1831 - 1879, England) The publication of "Treatise on Electricity and Magnetism". Maxwell combined the results obtained by Oersted and Faraday into a single theory. From this theory he could predict the existence of electromagnetic waves.

1886 Heinrich Hertz (Germany 1857 - 1894) Demonstration of the existence of electromagnetic waves. In his laboratory at the university of Karlsruhe, Hertz used a spark-transmitter to generate the waves and a resonator to detect them.

1890 Edouard Branly (France 1844 - 1940) Origination of the coherer. This is a receiving device for electromagnetic waves based on the principle that, while most powdered metals are poor direct-current conductors, metallic powder becomes conductive when a high-frequency current is applied. The coherer was further refined by Augusto Righi (Italy 1850 - 1920) at the University of Bologna and by Oliver Lodge (England 1851 - 1940).



1896 Aleksander Popov (Russia 1859 - 1906) Wireless telegraph transmission between two buildings (200 meters) was shown to be possible. Marconi did similar experiments at roughly the same time.

1901 Guglielmo Marconi (Italy 1874 - 1937) attended the lectures of Righi in Bologna and built a first radio telegraph in 1895. He used Hertz's spark transmitter and Lodge's coherer, and added antennas to improve performance. In 1898 he developed a tuning device and was now able to use two radio circuits simultaneously. The first cross-Atlantic radio link was operated by Marconi in 1901: a radio signal originating from Cornwall, England, was received at St. John's, Newfoundland, 1700 miles away. It was important that Marconi applied directional antennas.

1955 John R. Pierce (U.S. 1910 - 2002) Pierce (Bell Laboratories) proposed the use of satellites for communications and did pioneering work in this area. Originally this idea came from Arthur C. Clarke, who already in 1945 suggested using earth-orbiting satellites as relay points between earth stations. The first such satellite, Telstar I, built by Bell Laboratories, was launched in 1962. It served as a relay station for TV programs across the Atlantic. Pierce also headed the Bell-Labs team that developed the transistor and suggested the name for it.

1.3 Electronics

1874 Karl Ferdinand Braun (Germany 1850 - 1918) Observation of rectification at metal contacts to galena (lead sulfide). These semiconductor devices were the forerunners of the "cat's whisker" diodes used for the detection of radio signals (crystal detectors).

1904 John Ambrose Fleming (England 1849 - 1945) Invention of the thermionic diode, "the valve", that could be used as a rectifier. A rectifier can convert trains of high-frequency oscillations into trains of intermittent but unidirectional current. A telephone can then be used to produce a sound with the frequency of the trains of sparks.

1906 Lee DeForest (U.S. 1873 - 1961) Invention of the vacuum triode, a diode with grid control. This invention made practical the cascade amplifier, the triode oscillator and the regenerative feedback circuit. As a result, transcontinental telephone transmission became operational in 1915. A transatlantic cable was not laid until 1953.

1948 John Bardeen, Walter Brattain and William Shockley Development of the theory of the junction transistor. This transistor was first fabricated in 1950. The point-contact transistor was invented in December 1947 at Bell Telephone Labs.

1958 Jack Kilby and Robert Noyce (U.S.) Invention of the integrated circuit.
1958 Jack Kilby and Robert Noyce (U.S.) Invention of the integrated circuit.



1.4 Classical modulation methods

Beginning of the 20th century Carrier-frequency currents were generated by electric arcs or by high-frequency alternators. These currents were modulated by carbon transmitters and demodulated (or converted to audio) by a crystal detector or another kind of rectifier.

1909 George A. Campbell (U.S. 1870 - 1954) Internal AT&T memorandum on the "Electric Wave Filter". Campbell there discussed band-pass filters that would reject frequencies other than those in a narrow band. A patent application filed in 1915 was awarded [2].

1915 Edwin Howard Armstrong (U.S. 1890 - 1954) Invention of regenerative amplifiers and oscillators. Regenerative amplification (using positive feedback to obtain a nearly oscillating circuit) considerably increased the sensitivity of the receiver. DeForest claimed the same invention.

1915 Hendrik van der Bijl, Ralph V.L. Hartley, and Raymond A. Heising In a patent that was filed in 1915, van der Bijl showed how the non-linear portion of a vacuum tube could be utilized to modulate and demodulate. Heising participated in a project (in Arlington, VA) in which a vacuum-tube transmitter based on the van der Bijl modulator was designed. Heising invented the constant-current modulator. Hartley also participated in the Arlington project and designed the receiver. He invented an oscillator that was named after him. In a paper published in 1923, Hartley explained how suppressing the carrier signal and using only one of the sidebands could economize transmitter power and reduce interference.

1915 John R. Carson Publication of a mathematical analysis of the modulation and demodulation process. Description of single-sideband and suppressed-carrier methods.

1918 Edwin Howard Armstrong Invention of the superheterodyne radio receiver. The combination of the received signal with a local oscillation resulted in an audio beat-note.

1922 John R. Carson "Notes on the Theory of Modulation" [3]. Carson compares amplitude modulation (AM) and frequency modulation (FM). He came to the conclusion that FM was inferior to AM from the perspective of bandwidth requirement. He overlooked the possibility that wide-band FM might offer some advantages over AM, as was later demonstrated by Armstrong.

1933 Edwin Howard Armstrong Demonstration of an FM system to RCA. A patent was granted to Armstrong. Despite the larger bandwidth that is needed, FM gives considerably better performance than AM.

1.5 Fundamental bounds and concepts

1924 Harry Nyquist (Sweden 1889 - 1976 U.S.) Shows that the number of resolvable (non-interfering) pulses that can be transmitted per second over a bandlimited channel is proportional to the channel bandwidth [15]. If the bandwidth is W Hz, the number of pulses per second cannot exceed 2W. This is strongly related to the sampling theorem, which says that time is essentially discrete: if a time-continuous signal has bandwidth not larger than W Hz, then it is completely specified by samples taken from it at discrete time instants 1/(2W) seconds apart.

1928 Ralph V.L. Hartley (U.S. 1888 - 1970) Determined the number of distinguishable pulse amplitudes [8]. Suppose that the amplitudes are confined to the interval [−A, +A], and that the receiver can estimate an amplitude reliably only to an accuracy of ±Δ. Then the number of distinguishable pulses is roughly 2(A + Δ)/(2Δ) = A/Δ + 1. Clearly Hartley was concerned with digital communication. He realized that inaccuracy (due to noise) limited the amount of information that could be transmitted.
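As a small worked illustration (the numbers here are chosen purely for illustration and are not taken from the text): for amplitudes confined to [−1, +1] volt and a receiver accuracy of ±0.1 volt, Hartley's count gives 2(A + Δ)/(2Δ) = A/Δ + 1 = 1/0.1 + 1 = 11 distinguishable amplitudes, i.e. about log_2 11 ≈ 3.46 bits per pulse.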

1938 Alec Reeves Invention of Pulse Code Modulation (PCM) for digital encoding of speech signals. In World War II, PCM-encoding allowed transmission of encrypted speech (Bell Laboratories). In [16] Oliver, Pierce and Shannon compared PCM to classical modulation systems.

1939 Homer Dudley Description of a vocoder. This system overthrew the idea that communication requires a bandwidth at least as wide as that of the signal to be communicated.

1942 Norbert Wiener (U.S. 1894 - 1964) Investigated the problem shown in figure 1.1. The linear filter is to be chosen such that its output ŝ(t) is the best mean-square approximation to s(t), given the statistical properties of the processes S(t) and N(t). A drawback of Wiener's result was that modulation did not fit into his model and could not be analyzed.

[Figure 1.1: The communication problem considered by Wiener: the signal s(t), corrupted by additive noise n(t), is passed through a linear filter whose output is the estimate ŝ(t).]

1943 Dwight O. North Discovery of the matched filter. This filter was shown to be the optimum detector of a known signal in additive white noise.

1947 Vladimir A. Kotelnikov (Russia 1908 - ) Analysis of several modulation systems. His noise-immunity work [11] dealt with how to design receivers (but not how to choose the transmitted signals) to minimize the error probability in the case of digital signals, or the mean-square error in the case of analog signals. Kotelnikov discovered the sampling theorem independently of Nyquist in 1933.


[Figure 1.2: Claude E. Shannon, founder of Information Theory. Photo IEEE-IT Soc. Newsl., Summer 1998.]

1948 Claude E. Shannon (U.S. 1916 - 2001) Publication of "A Mathematical Theory of Communication" [20]. Shannon (see figure 1.2) showed that noise does not place an inescapable restriction on the accuracy of communication. He showed that noise properties, channel bandwidth, and restricted signal magnitude can be incorporated into a single parameter C which he called the channel capacity. If the cardinality |M| of the message set M grows as a function of the signal duration T such that

(1/T) log_2 |M| ≈ C − ε,

for some small ε > 0, then arbitrarily high communication accuracy is possible by increasing T, i.e. by taking longer and longer signals (code words). Conversely, Shannon showed that reliable communication is not possible when

(1/T) log_2 |M| ≈ C + ε,

for any ε > 0. Therefore a channel can be considered as a pipe through which bits can reliably be transmitted up to rate C.

Other results of Shannon include the application of Boole's algebra to switching circuits (M.S. thesis, MIT, 1938). It was also Shannon who introduced the sampling theorem to the engineering community in 1949.


Part II

Optimum Receiver Principles


Chapter 2

Decision rules for discrete channels

SUMMARY: In this first chapter we investigate a discrete channel, i.e. a channel with a discrete input and output alphabet. For this discrete channel we determine the optimum receiver, i.e. the receiver that minimizes the error probability P_E. The so-called maximum a-posteriori (MAP) receiver turns out to be optimum. When all messages are equally likely, the optimum receiver reduces to a maximum-likelihood (ML) receiver. In the last section of this chapter we distinguish between discrete scalar and vector channels.

2.1 System description

[Figure 2.1: Elements in a communication system based on a discrete channel: source → m → transmitter → s_m → discrete channel → r → receiver → m̂ → destination.]

We start by considering a very simple communication system based on a discrete channel. It consists of the following elements (see figure 2.1):

SOURCE An information source produces a message m ∈ M = {1, 2, · · · , |M|}, one out of |M| alternatives. Message m occurs with probability Pr{M = m} for m ∈ M. This probability is called the a-priori message probability, and M is the name of the random variable associated with this mechanism.

TRANSMITTER The transmitter sends a signal s_m if message m is to be transmitted. This signal is input to the channel. It assumes values from the discrete channel input alphabet S. The random variable corresponding to the signal is denoted by S. The collection of used signals is s_1, s_2, · · · , s_|M|.

DISCRETE CHANNEL The channel produces an output r that assumes a value from a discrete alphabet R. If the channel input is signal s ∈ S, the output r ∈ R occurs with conditional probability Pr{R = r|S = s}. This channel output is directed to the receiver. The random variable associated with the channel output is denoted by R.

When message m ∈ M occurs, the transmitter chooses s_m as channel input. Therefore

Pr{R = r|M = m} = Pr{R = r|S = s_m} for all r ∈ R.   (2.1)

These conditional probabilities describe the behavior of the transmitter followed by the channel.

RECEIVER The receiver forms an estimate m̂ of the transmitted message (or signal) by looking at the received channel output r ∈ R, hence m̂ = f(r). The random variable corresponding to this estimate is called M̂. We assume here that m̂ ∈ M = {1, 2, · · · , |M|}, thus the receiver has to choose one of the possible messages (and cannot, e.g., declare an error).

DESTINATION The destination accepts the estimate m̂.

Note that we call our system discrete because the channel is discrete: it has a discrete input and output alphabet.

The performance of our communication system is evaluated by considering the probability of error. This performance depends on the receiver, i.e. on the mapping f(·).

Definition 2.1 The mapping f(·) is called the decision rule.

Definition 2.2 The probability of error is defined as

P_E = Pr{M̂ ≠ M}.   (2.2)

We are now interested in choosing a decision rule f(·) that minimizes P_E. A decision rule that minimizes the error probability P_E is called optimum. The corresponding receiver is called an optimum receiver.

The probability of correct (right) decision is denoted by P_C, and obviously P_E + P_C = 1.

2.2 Decision rules, an example

To get more familiar with the properties of our system we study an example first.

Example 2.1 We assume that |M| = 2, i.e. there are two possible messages. Their a-priori probabilities can be found in the following table:

m    Pr{M = m}
1    0.4
2    0.6

The two signals corresponding to the messages are s_1 and s_2, and the conditional probabilities Pr{R = r|S = s_m} are given in the table below for the values of r and m that can occur. There are three values that r can assume, i.e. R = {a, b, c}.

m    Pr{R = a|S = s_m}    Pr{R = b|S = s_m}    Pr{R = c|S = s_m}
1    0.5                  0.4                  0.1
2    0.1                  0.3                  0.6

Note that the row sums are equal to one.

Now suppose that we use the decision rule f(·) that is given by:

r       a    b    c
f(r)    1    1    2

This means that the receiver outputs the estimate m̂ = 1 if the channel output is a or b, and m̂ = 2 if the channel output is c. To determine the probability of error P_E that is achieved by this rule we first compute the joint probabilities

Pr{M = m, R = r} = Pr{M = m} Pr{R = r|M = m}
                 = Pr{M = m} Pr{R = r|S = s_m},   (2.3)

where we used (2.1). We list these joint probabilities in the following table:

m    Pr{M = m, R = a}    Pr{M = m, R = b}    Pr{M = m, R = c}
1    0.20                0.16                0.04
2    0.06                0.18                0.36

Now we can determine the probability of correct decision P_C = 1 − P_E, which is

P_C = Pr{M = 1, R = a} + Pr{M = 1, R = b} + Pr{M = 2, R = c}
    = 0.20 + 0.16 + 0.36 = 0.72.   (2.4)

This decision rule is not optimum. To see why, note that for every r ∈ R a certain message m̂ ∈ M is chosen. In other words, the decision rule selects in each column exactly one joint probability. These selected probabilities are added together to form P_C. Our decision rule selects the joint probability Pr{M = 1, R = b} = 0.16 in the column that corresponds to R = b. A larger P_C is obtained if for output R = b message 2 is chosen instead of message 1. In that case the larger joint probability Pr{M = 2, R = b} = 0.18 is selected. Now the probability of correct decision becomes

P_C = Pr{M = 1, R = a} + Pr{M = 2, R = b} + Pr{M = 2, R = c}
    = 0.20 + 0.18 + 0.36 = 0.74.   (2.5)

We may conclude that, in general, to maximize P_C, we have to choose the m that achieves the largest Pr{M = m, R = r} for the r that was received. This will be made more precise in the following section.
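The computation in Example 2.1 is easy to mechanize. The following minimal Python sketch (not part of the original text) evaluates P_C for any decision rule on this example channel by summing the selected joint probabilities Pr{M = f(r), R = r}:

# Minimal sketch (not from the text): brute-force evaluation of decision rules for Example 2.1.
prior = {1: 0.4, 2: 0.6}                         # a-priori probabilities Pr{M = m}
trans = {1: {'a': 0.5, 'b': 0.4, 'c': 0.1},      # transition probabilities Pr{R = r | S = s_m}
         2: {'a': 0.1, 'b': 0.3, 'c': 0.6}}

def prob_correct(rule):
    """P_C for a decision rule given as a dict mapping each output r to an estimate m."""
    return sum(prior[rule[r]] * trans[rule[r]][r] for r in rule)

print(round(prob_correct({'a': 1, 'b': 1, 'c': 2}), 2))   # 0.72, the suboptimal rule of Example 2.1
print(round(prob_correct({'a': 1, 'b': 2, 'c': 2}), 2))   # 0.74, picking the largest joint probability per column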



2.3 The "maximum a-posteriori" decision rule

We can upper bound the probability of correct decision as follows:

P_C = ∑_r Pr{M = f(r), R = r} ≤ ∑_r max_m Pr{M = m, R = r}.   (2.6)

This upper bound is achieved if for all r the estimate f(r) corresponds to the message m that achieves the maximum in max_m Pr{M = m, R = r}. Hence an optimum decision rule for the discrete channel achieves

Pr{M = f(r), R = r} ≥ Pr{M = m, R = r}, for all m ∈ M,   (2.7)

for all r ∈ R that can be received, i.e. that have Pr{R = r} > 0. Note that it is possible that for some channel outputs r ∈ R more than one decision f(r) is optimum.

Definition 2.3 For a communication system based on a discrete channel the joint probabilities

Pr{M = m, R = r} = Pr{M = m} Pr{R = r|M = m}
                 = Pr{M = m} Pr{R = r|S = s_m},   (2.8)

which are, given the received output value r, indexed by m ∈ M, are called the decision variables. An optimum decision rule f(·) is based on these variables.

RESULT 2.1 (MAP decision rule) To minimize the probability of error P_E, the decision rule f(·) should produce for each received r a message m having the largest decision variable. Hence for r ∈ R that actually can occur, i.e. that have Pr{R = r} > 0,

f(r) = arg max_{m∈M} Pr{M = m} Pr{R = r|S = s_m}.   (2.9)

For such r we can divide all decision variables by the channel output probability Pr{R = r} = ∑_{m∈M} Pr{M = m} Pr{R = r|M = m}. This results in the a-posteriori probabilities of the messages m ∈ M given the channel output r. By Bayes' rule

Pr{M = m} Pr{R = r|S = s_m} / Pr{R = r} = Pr{M = m|R = r}, for all m ∈ M,   (2.10)

therefore the optimum receiver chooses as m̂ a message that has maximum a-posteriori probability (MAP). This decision rule is called the MAP decision rule; the receiver is called a MAP receiver.



Example 2.2 Below are the a-posteriori probabilities that correspond to the example in the previous section (obtained from Pr{R = a} = 0.26, Pr{R = b} = 0.34, and Pr{R = c} = 0.4). Note that the probabilities in a column now add up to one.

m    Pr{M = m|R = a}    Pr{M = m|R = b}    Pr{M = m|R = c}
1    20/26              16/34              4/40
2    6/26               18/34              36/40

From this table it is clear that the optimum decision rule is

r       a    b    c
f(r)    1    2    2

This leads to an error probability

P_E = ∑_m Pr{M = m} Pr{M̂ ≠ m|M = m} = 0.4(0.4 + 0.1) + 0.6 · 0.1 = 0.26,   (2.11)

where Pr{M̂ ≠ m|M = m} = ∑_{r: f(r)≠m} Pr{R = r|S = s_m} is the probability of error conditional on the fact that message m was sent.
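As a hedged companion to Example 2.2 (not part of the original text), the a-posteriori probabilities, the MAP rule, and the error probability (2.11) can be obtained directly from Bayes' rule:

# Minimal sketch (not from the text): MAP rule of Example 2.2 via the a-posteriori probabilities.
prior = {1: 0.4, 2: 0.6}
trans = {1: {'a': 0.5, 'b': 0.4, 'c': 0.1},
         2: {'a': 0.1, 'b': 0.3, 'c': 0.6}}

p_r = {r: sum(prior[m] * trans[m][r] for m in prior) for r in 'abc'}              # Pr{R = r}: 0.26, 0.34, 0.40
posterior = {r: {m: prior[m] * trans[m][r] / p_r[r] for m in prior} for r in 'abc'}
map_rule = {r: max(posterior[r], key=posterior[r].get) for r in 'abc'}            # arg max_m Pr{M = m | R = r}
p_error = sum(prior[m] * sum(trans[m][r] for r in 'abc' if map_rule[r] != m) for m in prior)
print(map_rule, round(p_error, 2))                                                # {'a': 1, 'b': 2, 'c': 2} 0.26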

2.4 The "maximum likelihood" decision rule

When all messages are equally likely, i.e. when

Pr{M = m} = 1/|M| for all m ∈ M = {1, 2, · · · , |M|},   (2.12)

we get for the joint probabilities

Pr{M = m, R = r} = Pr{M = m} Pr{R = r|M = m} = (1/|M|) Pr{R = r|S = s_m}.   (2.13)

Considering (2.7) we obtain:

RESULT 2.2 (ML decision rule) If the a-priori message probabilities are all equal, then in order to minimize the error probability P_E a decision rule f(·) has to be applied that satisfies

f(r) = arg max_{m∈M} Pr{R = r|S = s_m},   (2.14)

for r ∈ R that can occur, i.e. for r with Pr{R = r} > 0. Such a decision rule is called a maximum likelihood (ML) decision rule; the resulting receiver is a maximum likelihood receiver.

Given the received channel output r ∈ R, the receiver chooses a message m̂ ∈ M for which the received r has maximum likelihood. The transition probabilities Pr{R = r|S = s_m} for r ∈ R and m ∈ M are called likelihoods. A receiver that operates like this (no matter whether or not the messages are indeed equally likely) is called a maximum likelihood receiver. It should be noted that such a receiver is less complex than a maximum a-posteriori receiver since it does not have to multiply the transition probabilities Pr{R = r|S = s_m} by the a-priori probabilities Pr{M = m}.



Example 2.3 Consider again the transition probabilities of the example in section 2.2:

m    Pr{R = a|S = s_m}    Pr{R = b|S = s_m}    Pr{R = c|S = s_m}
1    0.5                  0.4                  0.1
2    0.1                  0.3                  0.6

From this table follows the maximum-likelihood decision rule, which is

r       a    b    c
f(r)    1    1    2

When the messages 1 and 2 both have a-priori probability 1/2, the probability of correct decision is

P_C = ∑_m Pr{M = m} Pr{M̂ = m|M = m} = (1/2)(0.5 + 0.4) + (1/2) · 0.6 = 0.75.   (2.15)

Here Pr{M̂ = m|M = m} = ∑_{r: f(r)=m} Pr{R = r|S = s_m} is the probability of a correct decision conditioned on the fact that message m was sent. Note that the maximum likelihood decision rule is optimum here since both messages have the same a-priori probability.
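A corresponding sketch for the ML rule (again not part of the original text): the maximum-likelihood receiver ignores the priors and picks, for each output r, the message with the largest likelihood Pr{R = r|S = s_m}.

# Minimal sketch (not from the text): ML rule of Example 2.3 and its P_C for equally likely messages.
trans = {1: {'a': 0.5, 'b': 0.4, 'c': 0.1},
         2: {'a': 0.1, 'b': 0.3, 'c': 0.6}}

ml_rule = {r: max(trans, key=lambda m: trans[m][r]) for r in 'abc'}   # largest likelihood per column
p_correct = sum(0.5 * trans[m][r] for r, m in ml_rule.items())        # both priors equal to 1/2, as in (2.15)
print(ml_rule, round(p_correct, 2))                                   # {'a': 1, 'b': 1, 'c': 2} 0.75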

2.5 Scalar and vector channels

[Figure 2.2: Elements in a communication system based on a discrete vector channel: source → m → vector transmitter → s_m = (s_m1, s_m2, · · · , s_mN) → discrete vector channel → r = (r_1, r_2, · · · , r_N) → vector receiver → m̂ → destination.]

So far in this chapter we have considered a discrete channel that accepts a single input s ∈ S and produces a single output r ∈ R. This channel could be called a discrete scalar channel. There is also a vector variant of this channel, see figure 2.2. The components of a discrete vector system are described below.

TRANSMITTER Let N be a positive integer. Then the vector transmitter sends a signal vector s_m = (s_m1, s_m2, · · · , s_mN) of N components from the discrete alphabet S if message m is to be conveyed. This signal vector is input to the discrete vector channel. The random variable corresponding to the signal vector is denoted by S. The collection of used signal vectors is s_1, s_2, · · · , s_|M|.

DISCRETE VECTOR CHANNEL This channel produces an output vector r = (r_1, r_2, · · · , r_N) with N components, all assuming a value from a discrete alphabet R. If the channel input was signal s ∈ S^N, the output r ∈ R^N occurs with conditional probability Pr{R = r|S = s}. This channel output vector is directed to the vector receiver. The random variable associated with the channel output is denoted by R.

When message m ∈ M occurs, s_m is chosen by the transmitter as channel input vector. Then the conditional probabilities

Pr{R = r|M = m} = Pr{R = r|S = s_m} for all r ∈ R^N,   (2.16)

describe the behavior of the vector transmitter followed by the discrete vector channel.

RECEIVER The vector receiver forms an estimate m̂ of the transmitted message (or signal) by looking at the received channel output vector r ∈ R^N, hence m̂ = f(r). The mapping f(·) is again called the decision rule.

It will not be a big surprise that we define the decision variables for the discrete vector channel as follows:

Definition 2.4 For a communication system based on a discrete vector channel the decision variables are again the joint probabilities

Pr{M = m, R = r} = Pr{M = m} Pr{R = r|M = m}
                 = Pr{M = m} Pr{R = r|S = s_m},   (2.17)

which are, given the received output vector r, indexed by m ∈ M.

The following result tells us what the optimum decision rule should be for the discrete vector channel:

RESULT 2.3 (MAP decision rule for the discrete vector channel) To minimize the probability of error P_E, the decision rule f(·) should produce for each received vector r a message m having the largest decision variable. Hence for vectors r ∈ R^N that actually can occur, i.e. that have Pr{R = r} > 0,

f(r) = arg max_{m∈M} Pr{M = m} Pr{R = r|S = s_m}.   (2.18)

This decision rule is the MAP decision rule. There is obviously also a "maximum likelihood" version of this result.
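As a hedged sketch of Result 2.3 in action (not part of the original text): if the vector channel is assumed to be memoryless, the likelihood factorizes as Pr{R = r|S = s} = ∏_i Pr{R_i = r_i|S_i = s_i}. For a binary symmetric channel with cross-over probability p < 1/2 and equally likely codewords, the ML (and hence MAP) rule then reduces to choosing the codeword at minimum Hamming distance, which is what exercise 4 below asks you to show.

# Hedged sketch (assumes a memoryless binary symmetric channel and illustrative codewords).
p = 0.1                                            # cross-over probability, 0 < p < 1/2
codewords = {1: (0, 0, 0, 0, 0),                   # illustrative signal vectors s_1 and s_2
             2: (1, 1, 1, 1, 1)}

def likelihood(r, s):
    """Pr{R = r | S = s} = prod_i Pr{R_i = r_i | S_i = s_i} for the memoryless BSC."""
    d = sum(ri != si for ri, si in zip(r, s))      # Hamming distance d_H(r, s)
    return (p ** d) * ((1 - p) ** (len(r) - d))

def ml_decision(r):
    """Equally likely codewords: MAP = ML; for p < 1/2 this is minimum Hamming distance."""
    return max(codewords, key=lambda m: likelihood(r, codewords[m]))

print(ml_decision((0, 1, 0, 0, 1)))                # 1: two positions disagree with s_1, three with s_2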

2.6 Exercises

1. Consider the noisy discrete communication channel illustrated in figure 2.3.

[Figure 2.3: A noisy discrete communication system with input M and output R (transition diagram not reproduced here).]

(a) If Pr{M = 1} = 0.7 and Pr{M = 2} = 0.3, determine the optimum decision rule (assignment of a, b or c to 1, 2) and the resulting probability of error P_E.

(b) There are eight decision rules. Plot the probability of error for each decision rule versus Pr{M = 1} on one graph.

(c) Each decision rule has a maximum probability of error, which occurs for some least favorable a-priori probability Pr{M = 1}. The decision rule whose maximum is smallest is called the minimax decision rule. Which of the eight rules is minimax?

(Exercise 2.12 from Wozencraft and Jacobs [25].)

2. Consider a communication system based on a discrete channel. The number of messages is five, i.e. M = {1, 2, 3, 4, 5}. The a-priori probabilities are Pr{M = 1} = Pr{M = 5} = 0.1, Pr{M = 2} = Pr{M = 4} = 0.2, and Pr{M = 3} = 0.4. The signal corresponding to message m is equal to m, thus s_m = m, for m ∈ M. This signal is the input for the discrete channel.

The channel adds the noise n to the signal, hence the channel output

r = s_m + n.   (2.19)

The noise variable N is independent of the channel input and can have the values −1, 0, and +1 only. It is given that Pr{N = 0} = 0.4 and Pr{N = −1} = Pr{N = +1} = 0.3.

(a) Note that the channel output assumes values in {0, 1, 2, 3, 4, 5, 6}. Determine the probability Pr{R = 2}. For each message m ∈ M compute the a-posteriori probability Pr{M = m|R = 2}.

(b) Note that an optimum receiver minimizes the probability of error P_E. Give for each possible channel output the estimate m̂ that an optimum receiver will make. Determine the resulting error probability.

(c) Consider a maximum-likelihood receiver. Give for each possible channel output the estimate m̂ that a maximum-likelihood receiver will make. Again determine the resulting probability of error.

(Exam Communication Principles, July 5, 2002.)
(Exam Communication Principles, July 5, 2002.)



3. Consider an optical communication system. A binary message is transmitted by switching the intensity λ of a laser "on" or "off" within the time-interval [0, T]. For message m = 1 ("off") the intensity λ = λ_off ≥ 0. For message m = 2 ("on") the intensity λ = λ_on > λ_off. The laser produces photons that are counted by a detector. The number K of photons is a non-negative random variable, and the corresponding probabilities are

Pr{K = k} = ((λT)^k / k!) exp(−λT), for k = 0, 1, 2, · · · .

This probability distribution is known as the Poisson distribution.

The a-priori message probabilities are Pr{M = 1} = p_off and Pr{M = 2} = p_on. It is assumed that 0 < p_on = 1 − p_off < 1.

The detector forms the message estimate m̂ based on the number k of photons that are counted.

(a) Suppose that the detector chooses m̂ such that the probability of error P_E = Pr{M̂ ≠ M} is minimized. For what values of k does the detector choose m̂ = 1 and when does it choose m̂ = 2?

(b) Consider a maximum-likelihood detector. For what values of k does this detector choose m̂ = 1 and when does it choose m̂ = 2 now?

(c) Assume that λ_off = 0. Consider a maximum-a-posteriori detector. For what values of p_on does the detector choose m̂ = 2 no matter what k is? What is in that case the error probability P_E?

(Exam Communication Principles, July 8, 2003.)

[Figure 2.4: A binary symmetric channel with cross-over probability p: input 0 or 1 is received correctly with probability 1 − p and flipped with probability p.]

4. Consider transmission over a binary symmetric channel with cross-over probability 0 < p < 1/2 (see figure 2.4). All messages m ∈ M are equally likely. Each message m now corresponds to a signal vector (codeword) s_m = (s_m1, s_m2, · · · , s_mN) consisting of N binary digits, hence s_mi ∈ {0, 1} for i = 1, . . . , N. Such a codeword (sequence) is transmitted over the binary symmetric channel, and the resulting channel output sequence is denoted by r.

The Hamming distance d_H(x, y) between two sequences x and y, both consisting of N binary digits, is the number of positions at which they differ.

(a) Show that the optimum receiver should choose the codeword that has minimum Hamming distance to the received channel output sequence.

(b) What is the optimum receiver for 1/2 < p < 1?

(c) Assume that the messages are not equally likely and that the cross-over probability is p = 1/2. What is the minimum error probability now? What does the corresponding receiver do?

(Exam Communication Principles, October 6, 2003.)

5. The weather type in Paris is a random variable that is denoted by P. Similarly, the weather type in Marseille is a random variable denoted by M. Both random variables can assume the values {s, c, r}. Here s stands for sunny, c for cloudy, and r for rainy. We assume that the weather type in Marseille M depends on the weather type in Paris P. The joint probability of a certain weather type in Paris together with a certain weather type in Marseille can be found in the table below.

p    Pr{P = p, M = s}    Pr{P = p, M = c}    Pr{P = p, M = r}
s    0.25                0.05                0.00
c    0.20                0.10                0.10
r    0.05                0.15                0.10

The table shows e.g. that the joint probability of cloudy weather in Paris and sunny weather in Marseille is 0.20.

(a) Suppose that you have to predict a combination of weather types for Paris and Marseille such that your prediction is correct with a probability as large as possible. What combination do you choose and what is the probability that your prediction will be wrong?

(b) Suppose that you have to guess the weather type in Marseille knowing the weather type in Paris. How do you decide for each of the three possible types of weather in Paris if you want to be correct with the highest possible probability? How large is the probability that your guess is wrong?

(c) Suppose that you know the weather type in Marseille and that you must suggest two types of weather in Paris such that the probability that at least one of your suggestions is correct is maximal. What is your pair of suggestions for each of the three possible types of weather in Marseille? Compute the probability that both your suggestions are wrong, also for this case.

(Exam Communication Theory, January 12, 2005)


Chapter 3

Decision rules for the real scalar channel

SUMMARY: Here we will consider transmission of information over a channel with a single real-valued input and output. Again the optimum receiver is determined. As an example we investigate the additive Gaussian noise channel, i.e. the channel that adds Gaussian noise to the input signal.

3.1 System description

[Figure 3.1: Elements in a communication system based on a real scalar channel: source → m → transmitter → s_m → real channel → r → receiver → m̂ → destination.]

A communication system based on a channel with real-valued input and output alphabet, i.e. a real scalar channel, does not differ very much from a system based on a discrete channel (see figure 3.1).

SOURCE An information source produces the message m ∈ M = {1, 2, · · · , |M|} with a-priori probability Pr{M = m} for m ∈ M. Again M is the name of the random variable associated with this mechanism.

TRANSMITTER The transmitter now sends a real-valued scalar signal s_m if message m is to be transmitted. The scalar signal is input to the channel. It assumes a value in the range (−∞, ∞). The random variable corresponding to the signal is denoted by S. The collection of used signals is s_1, s_2, · · · , s_|M|.

REAL SCALAR CHANNEL The channel now produces an output r in the range (−∞, ∞). When the input signal is the real-valued scalar s, the channel output is generated according to the conditional probability density function p_R(r|S = s), and thus R is a real-valued scalar random variable. The probability of receiving a channel output r ≤ R < r + dr is equal to p_R(r|S = s)dr for an infinitesimally small interval dr.

When message m ∈ M occurs, signal s_m is chosen by the transmitter as channel input. Then the conditional probability density function

p_R(r|M = m) = p_R(r|S = s_m) for all r ∈ (−∞, ∞),   (3.1)

describes the behavior of the transmitter followed by the real scalar channel.

RECEIVER The receiver forms an estimate m̂ of the transmitted message (or signal) based on the received real-valued scalar channel output r, hence m̂ = f(r). The mapping f(·) is called the decision rule.

DESTINATION The destination accepts the estimate m̂.

3.2 MAP and ML decision rules<br />

For the real scalar channel the probability of a correct decision can be written as
P_C = \int_{-\infty}^{\infty} \Pr\{M = f(r)\}\, p_R(r \mid M = f(r))\, dr.   (3.2)

An optimum decision rule is obtained if, after receiving the scalar R = r, the decision f (r) is<br />

taken in such a way that<br />

Pr{M = f (r)}p R (r|M = f (r)) ≥ Pr{M = m}p R (r|M = m) for all m ∈ M. (3.3)<br />

This leads to the definition of the decision variables given below.<br />

Alternatively we can assume that a value r ≤ R < r + dr was received. Then the decision<br />

variables would be<br />

Pr{M = m, r ≤ R < r + dr} = Pr{M = m} Pr{r ≤ R < r + dr|S = s m }<br />

= Pr{M = m}p R (r|S = s m )dr. (3.4)<br />

Also this reasoning leads to the definition of the decision variables below.<br />

Definition 3.1 For a system based on a real scalar channel the products<br />

Pr{M = m}p R (r|M = m) = Pr{M = m}p R (r|S = s m ) (3.5)<br />

which are, given the received output r, indexed by m ∈ M, are called the decision variables.<br />

The optimum decision rule f (·) is again based on these variables. It is possible that for certain channel outputs r more than one decision f (r) is optimum.



RESULT 3.1 (MAP) To minimize the error probability P E , the decision rule f (·) should be<br />

such that for each received r a message m is chosen with the largest decision variable. Hence<br />

for r that can be received, an optimum decision rule f (·) should satisfy
f(r) = \arg\max_{m \in \mathcal{M}} \Pr\{M = m\}\, p_R(r \mid S = s_m).   (3.6)

Both sides of the inequality (3.3) can be divided by p_R(r) = \sum_{m \in \mathcal{M}} \Pr\{M = m\}\, p_R(r \mid S = s_m), i.e. the density for the value of r that actually did occur. Then we obtain
f(r) = \arg\max_{m \in \mathcal{M}} \Pr\{M = m \mid R = r\},   (3.7)
for r for which p_R(r) > 0. This rule is again called the maximum a-posteriori (MAP) decision rule.

RESULT 3.2 (ML) When all messages have equal a-priori probabilities, i.e. when Pr{M = m} = 1/|M| for all m ∈ M, we observe from (3.6) that the optimum receiver has to choose
f(r) = \arg\max_{m \in \mathcal{M}} p_R(r \mid S = s_m),   (3.8)
for all r with p_R(r) > 0. This rule is referred to as the maximum likelihood (ML) decision rule.
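To make the MAP and ML rules concrete, the short sketch below (an added illustration, not part of the original notes) evaluates the decision variables Pr{M = m} p_R(r|S = s_m) numerically; the Gaussian likelihood used here is only an assumed example density, and the signal values and priors are arbitrary.

```python
import numpy as np

def map_decision(r, signals, priors, sigma):
    """Return the message index m maximizing Pr{M=m} * p_R(r | S=s_m) for a
    scalar observation r, assuming (as an example) Gaussian likelihoods."""
    signals = np.asarray(signals, dtype=float)
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.exp(-(r - signals) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    decision_variables = priors * likelihoods       # one value per message m
    return int(np.argmax(decision_variables)) + 1   # messages are numbered 1..|M|

def ml_decision(r, signals, sigma):
    """ML is MAP with equal a-priori probabilities."""
    uniform = np.ones(len(signals)) / len(signals)
    return map_decision(r, signals, uniform, sigma)

# Example: two messages, signals s_1 = +1 and s_2 = -1, priors 3/4 and 1/4.
print(map_decision(-0.3, [1.0, -1.0], [0.75, 0.25], sigma=1.0))  # -> 1
print(ml_decision(-0.3, [1.0, -1.0], sigma=1.0))                 # -> 2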

3.3 The scalar additive Gaussian noise channel<br />

3.3.1 Introduction<br />

Gaussian noise is probably the most important kind of impairment. Therefore we will investigate<br />

a simple communication situation based on a channel that adds a Gaussian noise sample n to the<br />

real scalar channel input signal s (see figure 3.2).<br />

Figure 3.2: Scalar additive Gaussian noise channel: the output is r = s + n, where the noise n has density p_N(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right).

Definition 3.2 The scalar additive Gaussian noise (AGN) channel adds Gaussian noise N to the input signal S. This Gaussian noise N has variance σ² and mean 0. The probability density function of the noise is defined to be
p_N(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right).   (3.9)
The noise variable N is assumed to be independent of the signal S.
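As a quick illustration (added here, not part of the original definition), the scalar AGN channel can be simulated by drawing the noise from a zero-mean Gaussian; the signal values below are arbitrary example choices.

```python
import numpy as np

def agn_channel(s, sigma, rng=None):
    """Scalar AGN channel of definition 3.2: r = s + n with n ~ N(0, sigma^2),
    independent of the input s. `s` may be a scalar or an array of inputs."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(s, dtype=float)
    return s + sigma * rng.standard_normal(s.shape)

rng = np.random.default_rng(1)
r = agn_channel(np.array([+1.0, -1.0, +1.0]), sigma=1.0, rng=rng)
print(r)  # three noisy channel outputs
```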



3.3.2 The MAP decision rule<br />

We assume e.g. that |M| = 2, i.e. there are two messages and M can be either 1 or 2. The<br />

corresponding two signals s 1 and s 2 are the inputs of our channel. Without loss of generality let<br />

s 1 > s 2 .<br />

The conditional probability density function of receiving R = r given the message m depends<br />

only on the value n that the noise variable N gets. In order to receive R = r when the signal is<br />

s m , the noise variable N should have value r − s m . The noise N is independent of the signal S<br />

(and the message M). Therefore (assuming that dr is infinitely small)<br />

p_R(r \mid S = s_m) = \Pr\{r \leq R < r + dr \mid S = s_m\}/dr
= \Pr\{r - s_m \leq N < r - s_m + dr\}/dr
= p_N(r - s_m)
= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(r - s_m)^2}{2\sigma^2}\right), \quad \text{for } m = 1, 2.   (3.10)

We obtain an optimum (maximum a-posteriori) receiver when ˆm = 1 is chosen if
\Pr\{M = 1\} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(r - s_1)^2}{2\sigma^2}\right) \geq \Pr\{M = 2\} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(r - s_2)^2}{2\sigma^2}\right),   (3.11)
and ˆm = 2 otherwise. The following inequalities are all equivalent to (3.11):
\ln \Pr\{M = 1\} - \frac{(r - s_1)^2}{2\sigma^2} \geq \ln \Pr\{M = 2\} - \frac{(r - s_2)^2}{2\sigma^2},
2\sigma^2 \ln \Pr\{M = 1\} - (r - s_1)^2 \geq 2\sigma^2 \ln \Pr\{M = 2\} - (r - s_2)^2,
2\sigma^2 \ln \Pr\{M = 1\} + 2 r s_1 - s_1^2 \geq 2\sigma^2 \ln \Pr\{M = 2\} + 2 r s_2 - s_2^2,
2 r s_1 - 2 r s_2 \geq 2\sigma^2 \ln \frac{\Pr\{M = 2\}}{\Pr\{M = 1\}} + s_1^2 - s_2^2.   (3.12)

RESULT 3.3 (Optimum receiver for the scalar additive Gaussian noise channel) A receiver that decides ˆm = 1 if
r \geq r^* = \frac{\sigma^2}{s_1 - s_2} \ln \frac{\Pr\{M = 2\}}{\Pr\{M = 1\}} + \frac{s_1 + s_2}{2},   (3.13)
and ˆm = 2 otherwise, is optimum. When the a-priori probabilities Pr{M = 1} and Pr{M = 2} are equal the optimum threshold is
r^* = \frac{s_1 + s_2}{2}.   (3.14)
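The optimum threshold of result 3.3 is easy to compute; the sketch below (an added illustrative aid with arbitrarily chosen numbers) implements (3.13) and the corresponding decision rule, assuming s_1 > s_2 as in the text.

```python
import numpy as np

def optimum_threshold(s1, s2, sigma, p1, p2):
    """Threshold r* of (3.13) for two signals s1 > s2 with priors p1, p2."""
    return sigma ** 2 / (s1 - s2) * np.log(p2 / p1) + (s1 + s2) / 2

def decide(r, s1, s2, sigma, p1, p2):
    """Decide m-hat = 1 if r >= r*, otherwise m-hat = 2 (result 3.3)."""
    return np.where(r >= optimum_threshold(s1, s2, sigma, p1, p2), 1, 2)

# Equal priors: threshold halfway between the signals.
print(optimum_threshold(+1.0, -1.0, 1.0, 0.5, 0.5))    # 0.0
# Priors 3/4 and 1/4: threshold moves away from the more probable signal s1.
print(optimum_threshold(+1.0, -1.0, 1.0, 0.75, 0.25))  # approx -0.5493
```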

Example 3.1 Assume that σ² = 1, s_1 = +1, and s_2 = −1. If the a-priori message probabilities are equal, i.e. when Pr{M = 1} = Pr{M = 2}, we obtain an optimum receiver if we decide ˆm = 1 only for r ≥ r* = (s_1 + s_2)/2. The decision changes exactly halfway between s_1 and s_2. The intervals (−∞, r*) and (r*, ∞) are called decision intervals. In our case, since s_2 = −s_1, the threshold r* = 0. This can also be seen from figure 3.3, where the decision variables
\Pr\{M = 1\} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(r - s_1)^2}{2\sigma^2}\right), \qquad \Pr\{M = 2\} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(r - s_2)^2}{2\sigma^2}\right)   (3.15)
are plotted as a function of r assuming that Pr{M = 1} = Pr{M = 2} = 1/2.

Figure 3.3: Decision variables for equal a-priori probabilities as a function of r.

Figure 3.4: Decision variables as a function of r for a-priori probabilities 1/4 and 3/4.

Next assume that the a-priori probabilities are not equal but let Pr{M = 1} = 3/4 and Pr{M = 2} = 1/4. Now the decision variables change (see figure 3.4) and we must also change the decision rule. It turns out that for r ≥ r* = −(ln 3)/2 ≈ −0.5493 the optimum receiver should choose ˆm = 1 (see again figure 3.4). The threshold r* has moved away from the more probable signal s_1.

Note that, no matter how much the a-priori probabilities differ, there is always a value of r, the threshold that we called r* before, for which (3.11) is satisfied with equality. For r > r* the left side of (3.11) is larger than the right side; for r < r* the right side is larger.



3.3.3 The Q-function<br />

To compute the probability of error we introduce the so-called Q-function (see figure 3.5).<br />

Definition 3.3 (Q-function) This function of x ∈ (−∞, ∞) is defined as
Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\alpha^2}{2}\right) d\alpha.   (3.16)

It is the probability that a Gaussian random variable with mean 0 and variance 1 assumes a value<br />

larger than x.<br />

Figure 3.5: Gaussian probability density function for mean 0 and variance 1. The shaded area corresponds to Q(1).

A useful property of the Q-function is<br />

Q(x) + Q(−x) = 1. (3.17)<br />

In appendix A an upper bound for Q(x) is derived. In table 3.1 we tabulated Q(x) for several<br />

values of x.<br />
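In numerical work the Q-function is usually evaluated through the complementary error function; the relation Q(x) = ½ erfc(x/√2) used below follows directly from definition 3.3 (the code itself is only an added illustration).

```python
from math import erfc, sqrt

def Q(x):
    """Q(x) = P(standard Gaussian > x), via the complementary error function."""
    return 0.5 * erfc(x / sqrt(2.0))

# Reproduces a few entries of table 3.1.
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, Q(x))
```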

3.3.4 Probability of error<br />

We now want to find an expression for the error probability of our scalar system with two messages.<br />

We write<br />

P E = Pr{M = 1} Pr{R < r ∗ |M = 1} + Pr{M = 2} Pr{R ≥ r ∗ |M = 2}, (3.18)



x     Q(x)       x     Q(x)           x     Q(x)
0.00  0.50000    1.5   6.6681·10^−2    5.0   2.8665·10^−7
0.25  0.40129    2.0   2.2750·10^−2    6.0   9.8659·10^−10
0.50  0.30854    2.5   6.2097·10^−3    7.0   1.2798·10^−12
0.75  0.22663    3.0   1.3499·10^−3    8.0   6.2210·10^−16
1.00  0.15866    4.0   3.1671·10^−5   10.0   7.6200·10^−24
Table 3.1: Table of Q(x) for some values of x.

where
\Pr\{R \geq r^* \mid M = 2\} = \int_{r^*}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(r - s_2)^2}{2\sigma^2}\right) dr
= \int_{r^*}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{((r/\sigma) - s_2/\sigma)^2}{2}\right) d(r/\sigma)
= \int_{r^*/\sigma - s_2/\sigma}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\mu^2}{2}\right) d\mu = Q(r^*/\sigma - s_2/\sigma),   (3.19)

and similarly
\Pr\{R < r^* \mid M = 1\} = \int_{-\infty}^{r^*} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(r - s_1)^2}{2\sigma^2}\right) dr
= \int_{-\infty}^{r^*} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{((r/\sigma) - s_1/\sigma)^2}{2}\right) d(r/\sigma)
= \int_{-\infty}^{r^*/\sigma - s_1/\sigma} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\mu^2}{2}\right) d\mu
= 1 - Q(r^*/\sigma - s_1/\sigma)   (*)
= Q(s_1/\sigma - r^*/\sigma).   (3.20)
Note that the last equality (*) follows from (3.17).

RESULT 3.4 (Minimum probability of error) If we combine (3.18), (3.19), and (3.20), we obtain
P_E = \Pr\{M = 1\}\, Q\!\left(\frac{s_1 - r^*}{\sigma}\right) + \Pr\{M = 2\}\, Q\!\left(\frac{r^* - s_2}{\sigma}\right).   (3.21)
If the a-priori message probabilities Pr{M = 1} and Pr{M = 2} are equal then, according to (3.14), we get r* = (s_1 + s_2)/2 and
\frac{s_1 - r^*}{\sigma} = \frac{s_1 - \frac{s_1 + s_2}{2}}{\sigma} = \frac{s_1 - s_2}{2\sigma}, \qquad \frac{r^* - s_2}{\sigma} = \frac{\frac{s_1 + s_2}{2} - s_2}{\sigma} = \frac{s_1 - s_2}{2\sigma},   (3.22)



hence for the minimum probability of error we obtain
P_E = Q\!\left(\frac{s_1 - s_2}{2\sigma}\right).   (3.23)

Example 3.2 For s_1 = +1, s_2 = −1, and σ² = 1, we obtain if Pr{M = 1} = Pr{M = 2} that
P_E = Q(1) ≈ 0.1587.   (3.24)
For Pr{M = 1} = 3/4 and Pr{M = 2} = 1/4 we get that r* = −(ln 3)/2 ≈ −0.5493 and
P_E = \frac{3}{4}\, Q\!\left(1 + \frac{\ln 3}{2}\right) + \frac{1}{4}\, Q\!\left(1 - \frac{\ln 3}{2}\right) ≈ 0.75\, Q(1.5493) + 0.25\, Q(0.4507) ≈ 0.75 \cdot 0.0607 + 0.25 \cdot 0.3261 ≈ 0.1270.   (3.25)
Note that this is smaller than Q(1) ≈ 0.1587, which would be the error probability after ML detection (which is suboptimum here).
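A quick Monte Carlo experiment (an added illustration, not part of the original example) can confirm the numbers of examples 3.1 and 3.2: simulate the AGN channel, apply the threshold rule of result 3.3, and compare the empirical error rate with (3.21).

```python
import numpy as np

rng = np.random.default_rng(0)
s1, s2, sigma = +1.0, -1.0, 1.0
p1, p2 = 0.75, 0.25
n_trials = 200_000

# Draw messages, transmit the corresponding signals, add Gaussian noise.
m = rng.choice([1, 2], size=n_trials, p=[p1, p2])
s = np.where(m == 1, s1, s2)
r = s + sigma * rng.standard_normal(n_trials)

# MAP threshold from (3.13) and the resulting decisions.
r_star = sigma**2 / (s1 - s2) * np.log(p2 / p1) + (s1 + s2) / 2
m_hat = np.where(r >= r_star, 1, 2)

print("empirical P_E:", np.mean(m_hat != m))   # close to 0.1270 from (3.25)
```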

3.4 Exercises<br />

Figure 3.6: Conditional probability density functions p_R(r|M = 1) and p_R(r|M = 2).

1. A communication system is used to transmit one of two equally likely messages, 1 and 2.<br />

The channel output is a real-valued random variable R, the conditional density functions of<br />

which are shown in figure 3.6. Determine the optimum receiver decision rule and compute<br />

the resulting probability of error.<br />

(Exercise 2.23 from Wozencraft and Jacobs [25].)<br />

2. The noise n in figure 3.7a is Gaussian with zero mean, i.e. E[N] = 0. If one of two<br />

equally likely messages is transmitted, using the signals of 3.7b, an optimum receiver<br />

yields P E = 0.01.<br />

(a) What is the minimum attainable probability of error P_E^{min} when the channel of 3.7a is used with three equally likely messages and the signals of 3.7c? And with four

equally likely messages and the signals of 3.7d?<br />

(b) How do the answers to the previous questions change if it is known that E[N] = 1<br />

instead of 0?


Figure 3.7: (a) The additive noise channel r = s + n; (b) two signals at −2 and +2; (c) three signals at −4, 0, and +4; (d) four signals at −4, 0, +4, and +8.

(Exercise 4.1 from Wozencraft and Jacobs [25].)<br />

3. Show that p(n) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{n^2}{2}\right) for −∞ < n < ∞ is a probability density function (non-negative, integrating to 1).

4. Consider a communication channel with an input s that can only assume positive values.<br />

The probability density function of the output R when the input is s is given by<br />

p_R(r \mid S = s) = \begin{cases} \frac{s - |r|}{s^2} & \text{for } 0 \leq |r| \leq s \\ 0 & \text{for } |r| > s, \end{cases}

see figure 3.8.<br />

Figure 3.8: The probability density function p_R(r|S = s): a triangle of height 1/s on the interval [−s, s].

There are two messages i.e. M = {1, 2} and the corresponding signals are s 1 = 1/2 and<br />

s 2 = 2. Let the a-priori probabilities Pr{M = 1} = Pr{M = 2} = 1/2.<br />

(a) Sketch the decision variables as a function of the output r in a single figure. For what<br />

values r does an optimum receiver decide ˆM = 1?



(b) Determine the corresponding error probability P E .<br />

(c) If the a-priori probability Pr{M = 1} is small enough, an optimum receiver will<br />

choose ˆM = 2 for all r. What is the largest value of Pr{M = 1} for which this<br />

happens? What is the error probability for this value of Pr{M = 1}?<br />

(Exam Communication Theory, November 18, 2003)<br />

5. Consider a communication channel with an input s that can only assume positive values.<br />

The probability density function of the output R when the input is s is given by<br />

p_R(r \mid S = s) = \frac{1}{\sqrt{2\pi s^2}} \exp\left(-\frac{r^2}{2s^2}\right).

Note that conditionally on the input s, this R is Gaussian, with mean zero and variance s 2 .<br />

In figure 3.9 this density is plotted for s = 1.<br />

Figure 3.9: The probability density function p_R(r|S = 1).

There are two messages i.e. M = {1, 2} and the corresponding signals are s 1 = 1 and<br />

s 2 = √ e. Let the a-priori probabilities Pr{M = 1} = Pr{M = 2} = 1/2.<br />

(a) Sketch both probability density functions p R (r|S = s 1 ) and p R (r|S = s 2 ) in a single<br />

figure. For what values r does an optimum receiver decide ˆM = 1?<br />

(b) Determine the corresponding error probability P E . Express it in terms of the Q(·)<br />

function.<br />

(c) If the a-priori probability Pr{M = 1} is small enough, an optimum receiver will<br />

choose ˆM = 2 for all r. What is the largest value of Pr{M = 1} for which this<br />

happens? What is the error probability for this value of Pr{M = 1}?<br />

(Exam Communication Theory, January 23, 2004)


Chapter 4<br />

Real vector channels<br />

SUMMARY: In this chapter we discuss the channel whose input and output are vectors of<br />

real-valued components. An important example of such a real vector channel is the additive<br />

Gaussian noise (AGN) vector channel. This channel adds zero-mean Gaussian noise to each<br />

input signal component. All noise samples are assumed to be independent of each other and<br />

have the same variance. For the AGN vector channel we determine the optimum receiver.<br />

If all messages are equally likely this receiver chooses the signal vector that is closest to the<br />

received vector in Euclidean sense. We also investigate the error behavior of this channel.<br />

The last part of this chapter deals with the theorem of reversibility.<br />

4.1 System description<br />

Figure 4.1: A communication system based on a real vector channel: the transmitter maps the message m to the vector signal (s_{m1}, s_{m2}, · · ·, s_{mN}), the channel outputs (r_1, r_2, · · ·, r_N), and the receiver forms the estimate m̂.

In a communication system based on a real vector channel (see figure 4.1) the channel input<br />

and output are vectors with real-valued components.<br />

SOURCE The information source generates the message m ∈ M = {1, 2, · · · , |M|} with a-<br />

priori probability Pr{M = m} for m ∈ M. M is the random variable associated with this<br />

mechanism.<br />

TRANSMITTER The transmitter sends a vector-signal s m = (s m1 , s m2 , · · · , s m N ) consisting<br />

of N components if message m is to be transmitted. Each component assumes a value in<br />

the range (−∞, ∞). The random variable corresponding to the signal vector is denoted<br />


by S. The vector-signal is input to the channel. The collection of used signal vectors is<br />

s 1 , s 2 , · · · , s |M| .<br />

REAL VECTOR CHANNEL The channel produces an output vector r = (r 1 , r 2 , · · · , r N )<br />

consisting of N components. We assume here that all these components assume values<br />

from (−∞, ∞). The conditional probability density function of the channel output vector<br />

r when the input vector s is transmitted is p R (r|S = s). Thus, for S = s, the probability<br />

of receiving an N-dimensional channel output vector R with components R n for n = 1, N<br />

satisfying r n ≤ R n < r n + dr n is equal to p R (r|S = s)dr for an infinitely small dr =<br />

dr 1 dr 2 · · · dr N and r = (r 1 , r 2 , · · · , r N ).<br />

When message m ∈ M is sent, signal s m is chosen by the transmitter as channel input<br />

vector. Then the conditional probability density function<br />

p R (r|M = m) = p R (r|S = s m ) for all r ∈ (−∞, ∞) N , (4.1)<br />

describes the behavior of the cascade of the vector transmitter and the vector channel.<br />

RECEIVER The receiver forms ˆm based on the received real-valued channel output vector r,<br />

hence ˆm = f (r). Mapping f (·) is called the decision rule.<br />

DESTINATION The destination accepts ˆm.<br />

4.2 Decision variables, MAP decoding<br />

The optimum receiver upon receiving r determines which one of the possible messages has<br />

maximum a-posteriori probability. It therefore chooses the decision rule f (r) such that for all r<br />

that actually can be received<br />

Pr{M = f (r)}p R (r|M = f (r)) ≥ Pr{M = m}p R (r|M = m), for all m ∈ M. (4.2)<br />

In other words, upon receiving r, the optimum receiver produces an estimate ˆm that corresponds<br />

to a largest decision variable. Given r, the decision variables for the real vector channel are<br />

Pr{M = m}p R (r|M = m) = Pr{M = m}p R (r|S = s m ) for all m ∈ M. (4.3)<br />

That this is MAP-decoding follows from the Bayes rule which says that
\Pr\{M = m \mid R = r\} = \frac{\Pr\{M = m\}\, p_R(r \mid M = m)}{p_R(r)}   (4.4)

for r with p R (r) = ∑ m∈M Pr{M = m}p R(r|M = m) > 0. Note that p R (r) > 0 for an output<br />

vector r that has actually been received.



4.3 Decision regions<br />

The optimum receiver, upon receiving r, determines the maximum of all decision variables which<br />

are given in (4.3). To compute these decision variables the a-priori probabilities Pr{M = m}<br />

must be known to the receiver and also the conditional density functions p R (r|S = s m ) for all<br />

m ∈ M. This calculation can be carried out for every r in the observation space, i.e. the set of<br />

all possible output vectors r. Each point r in the observation space is therefore assigned to one of<br />

the possible messages m ∈ M, the message that is actually chosen by the receiver. This results<br />

in a partitioning of the observation space in at most |M| regions.<br />

Definition 4.1 Given the decision rule f (·) we can write
I_m = \{r \mid f(r) = m\},   (4.5)
where I_m is called the decision region that corresponds to message m ∈ M.

Note that in the example in subsection 3.3.2 of the previous chapter we considered decision<br />

intervals. A decision region is an N-dimensional generalization of a (one-dimensional) decision<br />

interval.<br />

Figure 4.2: Three two-dimensional signal vectors and their decision regions.

Example 4.1 Suppose (see figure 4.2) that we have three messages, i.e. M = {1, 2, 3}, and the corresponding signal vectors are two-dimensional, i.e. s_1 = (1, 2), s_2 = (2, 1), and s_3 = (1, −3). A possible partitioning of the observation space into the three decision regions I_1, I_2, and I_3 is shown in the figure.



4.4 The additive Gaussian noise vector channel<br />

4.4.1 Introduction<br />

The actual shape of the decision regions is determined by the a-priori message probabilities<br />

Pr{M = m}, the signals s m , and the conditional density functions p R (r|S = s m ) for all m ∈<br />

M. A relatively simple but again quite important situation is the case where the channel adds Gaussian noise to the signal components (see figure 4.3).

Figure 4.3: Additive Gaussian noise vector channel: r = s + n, where the noise vector n has density p_N(n) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{\|n\|^2}{2\sigma^2}\right).

Definition 4.2 For the output of the additive Gaussian noise (AGN) vector channel we have<br />

that<br />

r = s + n, (4.6)<br />

where n = (n 1 , n 2 , · · · , n N ) is an N-dimensional noise vector. We denote this random noise<br />

vector by N. The noise vector N is assumed to be independent of the signal vector S. Moreover<br />

the N components of the noise vector are assumed to be independent of each other. All noise<br />

components have mean 0 and the same variance σ 2 . Therefore the joint probability density<br />

function of the noise vector is given by<br />

p_N(n) = \prod_{i=1,N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n_i^2}{2\sigma^2}\right) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1,N} n_i^2\right).   (4.7)

The notation in the definition can be contracted by noting that
\sum_{i=1,N} n_i^2 = (n \cdot n) = \|n\|^2,   (4.8)
where (a · b) = \sum_{i=1,N} a_i b_i is the dot (inner) product of the vectors a = (a_1, a_2, · · ·, a_N) and b = (b_1, b_2, · · ·, b_N). Thus
p_N(n) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{\|n\|^2}{2\sigma^2}\right).   (4.9)
) N/2 2σ 2



4.4.2 Minimum Euclidean distance decision rule<br />

If the channel output is r and the channel input was s m then the noise vector must have been<br />

n = r − s m . This and the independence of the noise vector and the signal vector yields that<br />

p R (r|M = m) = p R (r|S = s m ) = p N (r − s m |S = s m ) = p N (r − s m ), (4.10)<br />

similar to (3.10) in the previous chapter. Now we can easily determine the decision variables,<br />

one for each m ∈ M:<br />

\Pr\{M = m\}\, p_R(r \mid M = m) = \Pr\{M = m\}\, p_N(r - s_m) = \Pr\{M = m\} \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{\|r - s_m\|^2}{2\sigma^2}\right).   (4.11)

Note that the factor (2πσ 2 ) N/2 is independent of m. Hence maximizing over the decision variables<br />

in (4.11) is equivalent to minimizing<br />

‖r − s m ‖ 2 − 2σ 2 ln Pr{M = m} (4.12)<br />

over all m ∈ M. We obtain a very simple decision rule if all messages are equally likely.<br />

RESULT 4.1 (Minimum Euclidean distance decision rule) If the a-priori message probabilities<br />

are all equal, the optimum receiver has to minimize the squared Euclidean distance<br />

‖r − s m ‖ 2 , over all m ∈ M. (4.13)<br />

In other words the receiver has to choose the message ˆm with corresponding signal vector s ˆm<br />

that is closest in Euclidean sense to the received vector r. Note that this result holds for the<br />

additive Gaussian noise vector channel.<br />
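Result 4.1 and its weighted form (4.12) translate directly into a few lines of code; the sketch below is an added illustration in which the signal vectors, priors, and noise variance are arbitrary example values (the signals are those of example 4.2).

```python
import numpy as np

def vector_map_decision(r, signals, priors, sigma):
    """MAP rule for the AGN vector channel: minimize ||r - s_m||^2 - 2*sigma^2*ln Pr{M=m},
    cf. (4.12). With equal priors this reduces to the minimum Euclidean distance rule."""
    r = np.asarray(r, dtype=float)
    signals = np.asarray(signals, dtype=float)        # shape (|M|, N)
    metric = np.sum((r - signals) ** 2, axis=1) - 2 * sigma**2 * np.log(priors)
    return int(np.argmin(metric)) + 1                 # messages numbered 1..|M|

# The three two-dimensional signals of example 4.2, equal priors.
signals = [(1.0, 2.0), (2.0, 1.0), (1.0, -3.0)]
priors = np.array([1/3, 1/3, 1/3])
print(vector_map_decision((1.4, 1.6), signals, priors, sigma=1.0))  # -> 1
```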

Example 4.2 Consider again (see figure 4.4) the three two-dimensional signal vectors s_1 = (1, 2), s_2 = (2, 1), and s_3 = (1, −3).

The optimum decision regions can be found by drawing the perpendicular bisectors of the sides of the<br />

signal triangle. These are the boundaries of the decision regions I 1 , I 2 , and I 3 (see figure). Note that the<br />

three bisectors have a single point in common.<br />

4.4.3 Error probabilities<br />

The error probability of an additive Gaussian noise vector channel is determined by the location of the signals s_1, s_2, · · ·, s_|M| and, more importantly, the hyperplanes that separate these signal vectors. An error occurs if the noise "pushes" a signal point to the wrong side of a hyperplane. In general it is quite difficult to determine the exact error probability; however, it is easy to compute the probability that the received signal is on the wrong side of a hyperplane.

To study this behavior we investigate a simple example in two dimensions, i.e. N = 2. Consider a signal vector s = (a, b), see figure 4.5. This signal vector is corrupted by the additive Gaussian noise vector n = (n_1, n_2) with independent components that both have mean zero and variance σ². What is now the probability P_I that the received vector r = (r_1, r_2) = (a, b) + (n_1, n_2) is in region I, the region above the line, see the figure?

Figure 4.4: Three two-dimensional signal vectors and the corresponding optimum decision regions for the additive Gaussian noise channel.

Figure 4.5: A signal point (a, b) and a line; the rotated coordinates u_1 (perpendicular to the line) and u_2 (parallel to the line), the rotation angle θ, and the region I above the line are indicated.

To solve this problem we have to change the coordinate system. Let<br />

r 1 = a + u 1 cos θ − u 2 sin θ (4.14)<br />

r 2 = b + u 1 sin θ + u 2 cos θ, (4.15)<br />

where (a, b) is the center of a Cartesian system with coordinates u_1 and u_2, and θ is the rotation angle. Coordinate u_1 is perpendicular to the line in the figure and coordinate u_2 is parallel to it.

Note that
P_I = \int_I \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(r_1 - a)^2 + (r_2 - b)^2}{2\sigma^2}\right) dr_1\, dr_2.   (4.16)

Elementary calculus (see e.g. [26], p. 386, theorem 7.7.13) tells us that
\int_I f(r_1, r_2)\, dr_1\, dr_2 = \int_I f(r_1(u_1, u_2), r_2(u_1, u_2))\, |J|\, du_1\, du_2,   (4.17)
where |J| is the determinant of the Jacobian J, which is
J = \begin{pmatrix} \frac{\partial r_1}{\partial u_1} & \frac{\partial r_1}{\partial u_2} \\ \frac{\partial r_2}{\partial u_1} & \frac{\partial r_2}{\partial u_2} \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.   (4.18)

Note that |J| = cos²θ + sin²θ = 1 and (r_1 − a)² + (r_2 − b)² = u_1² + u_2². Therefore, writing Δ for the distance from (a, b) to the line,
P_I = \int_I \frac{1}{2\pi\sigma^2} \exp\left(-\frac{u_1^2 + u_2^2}{2\sigma^2}\right) du_1\, du_2
= \int_{u_1 = \Delta}^{\infty} \int_{u_2 = -\infty}^{\infty} \frac{1}{2\pi\sigma^2} \exp\left(-\frac{u_1^2 + u_2^2}{2\sigma^2}\right) du_1\, du_2
= \int_{\Delta}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{u_1^2}{2\sigma^2}\right) du_1 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{u_2^2}{2\sigma^2}\right) du_2
= Q\!\left(\frac{\Delta}{\sigma}\right) \cdot 1 = Q\!\left(\frac{\Delta}{\sigma}\right).   (4.19)

The conclusion is that the probability depends only on the distance from the signal point (a, b) to the line. This result carries over to more than two dimensions:

RESULT 4.2 For the additive Gaussian noise vector channel, the probability that the noise pushes a signal to the wrong side of a hyperplane is
P_I = Q\!\left(\frac{\Delta}{\sigma}\right),   (4.20)
where Δ is the distance from the signal point to the hyperplane and σ² is the variance of each noise component. All noise components are assumed to be zero-mean.
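Result 4.2 is easy to verify numerically; the short sketch below (added here as an illustration with arbitrarily chosen numbers) estimates, for a two-dimensional example, the probability that the noise carries a signal point across a line at distance Δ and compares it with Q(Δ/σ).

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)
sigma, delta = 1.0, 1.5          # noise standard deviation and distance to the line
n_trials = 200_000

# Put the signal at the origin and take the line {u : u_1 = delta};
# by the rotational symmetry of the noise this is no loss of generality.
noise = sigma * rng.standard_normal((n_trials, 2))
crossed = noise[:, 0] > delta    # noise pushed the point to the wrong side

print("empirical:", np.mean(crossed))
print("Q(delta/sigma):", 0.5 * erfc(delta / (sigma * sqrt(2.0))))
```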



4.5 Theorem of reversibility<br />

It is interesting to investigate what kind of processing of the output of a channel is allowed, in the<br />

sense that it does not degrade the performance. An important result corresponding to this issue<br />

is stated below.<br />

THEOREM 4.3 (Theorem of reversibility) The minimum attainable probability of error is not<br />

affected by the introduction of a reversible operation at the output of a channel, see figure 4.6.<br />

Figure 4.6: If G is reversible then an optimum decision can also be made from r′ = G(r) (the cascade s → channel → r → G → r′).

Proof: Consider the receiver for r ′ that is depicted in figure 4.7. This receiver first recovers<br />

r = G −1 (r ′ ). This is possible since the mapping G from r to r ′ is reversible. Then an optimum<br />

receiver for r is used to determine ˆm. The receiver constructed in this way for r ′ yields the same<br />

error probability as an optimum receiver that observes r directly, thus a reversible operation does<br />

not (necessarily) lead to an increase of P E .<br />

✷<br />

Figure 4.7: An optimum receiver for r′: the inverse mapping G⁻¹ first recovers r from r′, and an optimum receiver for r then produces m̂.

4.6 Exercises<br />

1. One of four equally likely messages is to be communicated over a vector channel which<br />

adds a (different) statistically independent zero-mean Gaussian random variable with variance<br />

N 0 /2 to each transmitted vector component. Assume that the transmitter uses the<br />

signal vectors shown in figure 4.8 and express the P E produced by an optimum receiver in<br />

terms of the function Q(x).<br />

(Exercise 4.2 from Wozencraft and Jacobs [25].)<br />

2. In the communication system diagrammed in figure 4.9 the transmitted signal s m and the<br />

noises n 1 and n 2 are all random voltages and all statistically independent. Assume that


|M| = 2, i.e. M = {1, 2}, and that
\Pr\{M = 1\} = \Pr\{M = 2\} = 1/2, \qquad s_1 = \sqrt{2 E_s}, \qquad s_2 = 0, \qquad p_{N_1}(n) = p_{N_2}(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right).   (4.21)

Figure 4.8: Signal structure: four signal vectors s_1, s_2, s_3, s_4 placed symmetrically about the origin in two dimensions; the indicated distance is \sqrt{E_s/2}.

Figure 4.9: Communication system: the transmitted signal s_m is observed through two noisy branches, r_1 = s_m + n_1 and r_2 = s_m + n_2, both of which are fed to an optimum receiver that outputs m̂.

Figure 4.10: Detector structure: r_1 and r_2 are added and the sum r_1 + r_2 is applied to a threshold device that outputs m̂.

(a) Show that the optimum receiver can be realized as diagrammed in figure 4.10. Determine<br />

the optimum threshold setting and the value ˆm for r 1 + r 2 larger than the<br />

threshold.


Figure 4.11: Communication system: the receiver observes r_1 = s_m + n_1 and r_2 = n_1 + n_2.

Figure 4.12: Detector structure: r_2 is multiplied by a constant a and the sum r_1 + a·r_2 is applied to a threshold device that outputs m̂.

(b) Express the resulting P E in terms of Q(·).<br />

(c) By what factor would E s have to be increased to yield this same probability of error<br />

if the receiver were restricted to observing only r 1 .<br />

(Exam Communication Principles, July 4, 2001.)<br />

3. In the communication system diagrammed in figure 4.11 the transmitted signal s and the<br />

noises n 1 and n 2 are all random voltages and all statistically independent. Assume that<br />

|M| = 2 i.e. M = {1, 2} and that<br />

\Pr\{M = 1\} = \Pr\{M = 2\} = 1/2, \qquad s_1 = -s_2 = \sqrt{E_s}, \qquad p_{N_1}(n) = p_{N_2}(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right).   (4.22)

(a) Show that the optimum receiver can be realized as diagrammed in figure 4.12 where<br />

a is an appropriately chosen constant.<br />

(b) What is the optimum value for a?<br />

(c) What is the optimum threshold setting?<br />

(d) Express the resulting P E in terms of Q(x).



(e) By what factor would E s have to be increased to yield this same probability of error<br />

if the receiver were restricted to observing only r 1 .<br />

(Exercise 4.6 from Wozencraft and Jacobs [25].)<br />

4. Consider a set of signal vectors {s 1 , s 2 , · · · , s |M| }. Result 4.1 says that if the a-priori<br />

probabilities of the corresponding messages are equal, the minimum Euclidean distance<br />

decision rule is optimum if the channel is an additive Gaussian noise vector channel (definition<br />

4.2). In this case the decision regions can be described by a set of hyperplanes (one<br />

for each message pair). Show that this is also the case if the a-priori probabilities are not<br />

all equal.


Chapter 5<br />

Waveform channels, correlation receiver<br />

SUMMARY: All topics discussed so far are necessary to analyze the additive white Gaussian<br />

noise waveform channel. In a waveform channel continuous-time waveforms are transmitted<br />

over a channel that adds continuous-time white Gaussian noise to them. We show in<br />

this chapter that an optimum receiver for such a waveform channel correlates the received<br />

waveform with all possible transmitted waveforms and makes a decision based on all these<br />

correlations. Together these correlations form a sufficient statistic.<br />

5.1 System description, additive white Gaussian noise<br />

In a communication system based on an additive white Gaussian noise waveform channel (see<br />

figure 5.1) the channel input and output signals are waveforms.<br />

TRANSMITTER The transmitter chooses the waveform s m (t) for 0 ≤ t < T , if message m is<br />

to be transmitted. The set of used waveforms is therefore {s 1 (t), s 2 (t), · · · , s |M| (t)}. The<br />

chosen waveform is input to the channel. We assume here that all these waveforms have<br />

finite-energy, i.e.<br />

E sm<br />

=<br />

∫ T<br />

0<br />

sm 2 (t)dt < ∞ for all m ∈ M, (5.1)<br />

where E sm is the energy of waveform s m (t). To convince yourself that this is a reasonable<br />

definition assume that s m (t) is the voltage across (or the current through) a resistor of 1.<br />

The total energy that is dissipated in this resistor is then equal to E sm .<br />

Figure 5.1: Communication over an additive white Gaussian noise waveform channel: the transmitter sends s_m(t), the channel adds the noise n_w(t), and the receiver observes r(t) = s_m(t) + n_w(t).


WAVEFORM CHANNEL The channel accepts the input waveform s m (t) and adds white Gaussian<br />

noise n w (t) to it, i.e. for the received waveform r(t) we have that<br />

r(t) = s m (t) + n w (t). (5.2)<br />

The noise n_w(t) is supposed to be generated by the random process N_w(t) which is zero-mean (i.e. E[N_w(t)] = 0 for all t), stationary, white, and Gaussian.¹ The autocorrelation function of the noise is
R_{N_w}(t, s) = E[N_w(t) N_w(s)] = \frac{N_0}{2}\, \delta(t - s),   (5.3)
and depends only on the time difference t − s (by the stationarity). The noise has power spectral density
S_{N_w}(f) = \frac{N_0}{2} \quad \text{for } -\infty < f < \infty,   (5.4)
i.e. it is white, see appendix D. Note that f is the frequency in Hz.

RECEIVER The receiver forms the message estimate ˆm based on the observed waveform<br />

r(t), 0 ≤ t < T . The interval [0, T ) is called the observation interval.<br />

The transmission of messages over a waveform channel is what we actually want to study in<br />

the first part of these course notes. It is a model for many digital communication systems (e.g.<br />

a telephone line including the modems on both sides, wireless communication links, recording<br />

channels). In the next sections we will show that the optimum receiver should correlate the<br />

received signal r(t) with all possible transmitted waveforms s m (t), m ∈ M to form a good<br />

message estimate. In a second chapter on waveform communication we will study alternative<br />

implementations of optimum receivers.<br />

GAUSSIAN NOISE is always everywhere. It originates in the thermal fluctuations of all<br />

matter in the universe. It creates randomly varying electromagnetic fields that will produce a

fluctuating voltage across the terminals of an antenna. The random movements of electrons in<br />

e.g. resistors will create random voltages in the electronics of the receiver. Since these voltages<br />

are the sum of many minuscule effects, by the central-limit theorem, they are Gaussian.<br />

5.2 Waveform expansions in series of orthonormal functions<br />

To determine the optimum receiver some new techniques have to be introduced. It is our aim<br />

to transform the waveform channel into vector channel first and then use our knowledge of the<br />

vector channel to determine an optimum receiver. Therefore we first expand the noise process<br />

N w (t) and the signals s m (t) in terms of a countable set of orthonormal functions (waveforms).<br />

Consider a set of functions {f_i(t), i = 1, 2, · · ·} that are orthonormal over the interval [0, T) in the sense that
\int_0^T f_i(t) f_j(t)\, dt = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}   (5.5)

1 One can argue that white noise is always Gaussian, see Helstrom [9] p.36.



Figure 5.2: Example of orthonormal functions: four time-translated pulses f_1(t), f_2(t), f_3(t), f_4(t).

Some well-known sets of orthonormal functions are given in the example that follows.<br />

Example 5.1 In figure 5.2 four functions are shown that are time-translated orthonormal pulses. Note that the energy in a pulse, i.e. \int_0^T f_i^2(t)\, dt, equals 1 for i = 1, 4. In figure 5.3 five functions are shown on a one-second time interval: a pulse with amplitude 1 and four sine and cosine waves whose amplitude is √2. All these functions are again mutually orthogonal and have unit energy. Note that functions like these form the first building blocks that are used in Fourier series.
form the first building blocks that are used in Fourier series.<br />

Now assume² that all outcomes n_w(t) of the random process N_w(t) can be represented in terms of the orthonormal functions f_1(t), f_2(t), · · · as
n_w(t) = \sum_{i=1}^{\infty} n_i f_i(t).   (5.6)
Then the coefficients n_i, i = 1, 2, · · ·, will satisfy
n_i = \int_0^T n_w(t) f_i(t)\, dt.   (5.7)
To see why, substitute (5.6) in the right side of (5.7) and integrate making use of (5.5). Next assume that also all signals can be expressed in terms of the same set of orthonormal functions, i.e.
s_m(t) = \sum_{i=1}^{\infty} s_{mi} f_i(t), \quad \text{for } m \in \{1, 2, \cdots, |\mathcal{M}|\},   (5.8)

² In this discussion we will focus only on the main concepts and do not bother very much about mathematical details such as the existence of such a "complete set" of orthonormal functions, extra conditions on the random process N_w(t) and on the waveforms s_m(t), the convergence of series, the interchanging of orders of summation and integration, etc. For a more precise treatment we refer e.g. to Gallager [7], Helstrom [9], or Lee and Messerschmitt [13].



Figure 5.3: Example of orthonormal functions: first five functions in a Fourier series (the constant 1, √2 cos(2πt), √2 cos(4πt), √2 sin(2πt), and √2 sin(4πt) on a one-second interval).

where the coefficients s_{mi}, m ∈ M, i = 1, 2, · · ·, are given by
s_{mi} = \int_0^T s_m(t) f_i(t)\, dt.   (5.9)
Then also the received signal r(t) = s_m(t) + n_w(t) can be expanded as
r(t) = \sum_{i=1}^{\infty} r_i f_i(t),   (5.10)
with coefficients r_i, i = 1, 2, · · ·, that satisfy
r_i = \int_0^T r(t) f_i(t)\, dt = \int_0^T s_m(t) f_i(t)\, dt + \int_0^T n_w(t) f_i(t)\, dt = s_{mi} + n_i.   (5.11)
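The coefficients (5.9) to (5.11) are just inner products over [0, T), so they can be approximated numerically; the sketch below (an added illustration with an arbitrarily chosen pulse basis and noise level) checks that projecting a received waveform onto orthonormal functions recovers r_i = s_{mi} + n_i.

```python
import numpy as np

T, n_samples = 1.0, 4000
t = np.linspace(0.0, T, n_samples, endpoint=False)
dt = T / n_samples

def inner(x, y):
    """Approximate the inner product integral_0^T x(t) y(t) dt."""
    return float(np.sum(x * y) * dt)

# Example basis: four unit-energy pulses translated in time (cf. figure 5.2);
# amplitude 2 on an interval of length T/4 gives energy 4 * (T/4) = 1.
basis = [np.where((t >= i * T / 4) & (t < (i + 1) * T / 4), 2.0, 0.0) for i in range(4)]

s_coeffs = np.array([0.5, -1.0, 0.0, 2.0])               # an example signal vector
s_wave = sum(c * f for c, f in zip(s_coeffs, basis))      # s(t) = sum_i s_i f_i(t)
rng = np.random.default_rng(3)
r_wave = s_wave + 0.2 * rng.standard_normal(n_samples) / np.sqrt(dt)  # crude white noise

r_coeffs = np.array([inner(r_wave, f) for f in basis])    # r_i = integral r(t) f_i(t) dt
print(r_coeffs)   # close to s_coeffs plus independent Gaussian noise samples
```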

5.3 An equivalent vector channel<br />

An optimum receiver can now make a decision based on the vector r ∞ = (r 1 , r 2 , r 3 , · · · ) instead<br />

of the waveform r(t). By the theorem of reversibility (Theorem 4.3) this does not affect the<br />

smallest possible achievable error probability P E .<br />

Similarly we can assume that instead of the signal s_m(t) the vector s_m^∞ = (s_{m1}, s_{m2}, s_{m3}, · · ·) is transmitted in order to convey the message m ∈ M; the reason for this is that s_m^∞ and s_m(t) are equivalent.



All this implies that we have transformed our waveform channel into a vector channel (with<br />

an infinite number of dimensions however). Since<br />

r^∞ = s_m^∞ + n^∞,   (5.12)

with n ∞ = (n 1 , n 2 , n 3 , · · · ) the resulting vector channel adds noise to the input vector. What<br />

can we say now about the noise components N i , i = 1, 2, · · · ?<br />

Note first that each component results from linear operations on the Gaussian process N w (t).<br />

Therefore the noise components are jointly Gaussian.<br />

For the mean of component N_i for i = 1, 2, · · ·, we find that
E[N_i] = E\!\left[\int_0^T N_w(t) f_i(t)\, dt\right] = \int_0^T E[N_w(t)]\, f_i(t)\, dt = 0.   (5.13)

For the correlation of component N_i, i = 1, 2, · · ·, and component N_j, j = 1, 2, · · ·, we obtain
E[N_i N_j] = E\!\left[\int_0^T\!\!\int_0^T N_w(\alpha) N_w(\beta) f_i(\alpha) f_j(\beta)\, d\alpha\, d\beta\right]
= \int_0^T\!\!\int_0^T E[N_w(\alpha) N_w(\beta)]\, f_i(\alpha) f_j(\beta)\, d\alpha\, d\beta
= \int_0^T\!\!\int_0^T \frac{N_0}{2}\, \delta(\alpha - \beta) f_i(\alpha) f_j(\beta)\, d\alpha\, d\beta
= \int_0^T \frac{N_0}{2}\, f_i(\alpha) f_j(\alpha)\, d\alpha = \frac{N_0}{2}\, \delta_{ij},   (5.14)
and thus the noise variables all have variance N_0/2 and are uncorrelated and thus independent of each other. In our derivation we used the Kronecker delta function which is defined as
\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}   (5.15)

As a consequence of all this our vector channel is an additive Gaussian noise vector channel<br />

as described in definition 4.2.<br />

5.4 A correlation receiver is optimum<br />

In the previous subsection we have seen that our waveform channel is equivalent to an additive<br />

Gaussian noise vector channel (with an infinite number of dimensions). From (4.12) we know<br />

what the decision variables are for this situation. The decision variable for m ∈ M is expressed<br />

as<br />

‖r ∞ − s ∞ m ‖2 − 2σ 2 ln Pr{M = m} = ‖r ∞ − s ∞ m ‖2 − N 0 ln Pr{M = m}, (5.16)



where we have substituted N 0 /2 for the variance σ 2 . The receiver has to minimize (5.16) over<br />

m ∈ M. With (a ∞ · b ∞ ) = ∑ ∞<br />

i=1 a i b i for a ∞ = (a 1 , a 2 , · · · ) and b ∞ = (b 1 , b 2 , · · · ), we<br />

rewrite<br />

\|r^\infty - s_m^\infty\|^2 = ((r^\infty - s_m^\infty) \cdot (r^\infty - s_m^\infty)) = (r^\infty \cdot r^\infty) - 2(r^\infty \cdot s_m^\infty) + (s_m^\infty \cdot s_m^\infty) = \|r^\infty\|^2 - 2(r^\infty \cdot s_m^\infty) + \|s_m^\infty\|^2.   (5.17)

Since \|r^\infty\|^2 does not depend on m, the optimum receiver should maximize
(r^\infty \cdot s_m^\infty) + c_m \quad \text{over } m \in \mathcal{M} = \{1, 2, \cdots, |\mathcal{M}|\},   (5.18)
with
c_m = \frac{N_0}{2} \ln \Pr\{M = m\} - \frac{\|s_m^\infty\|^2}{2}.   (5.19)

Now it would be nice if the receiver could use some simple method to determine the dot<br />

product (r^∞ · s_m^∞) and the squared vector length ‖s_m^∞‖², which are both sums of an infinite

number of components. We will see next that this is possible and in fact quite easy. For the dot<br />

product we find<br />

\int_0^T r(t) s_m(t)\, dt = \int_0^T r(t) \sum_{i=1}^{\infty} s_{mi} f_i(t)\, dt = \sum_{i=1}^{\infty} s_{mi} \int_0^T r(t) f_i(t)\, dt = \sum_{i=1}^{\infty} s_{mi} r_i = (r^\infty \cdot s_m^\infty).   (5.20)
Similarly we get for the squared vector length
E_{s_m} = \int_0^T s_m^2(t)\, dt = \int_0^T s_m(t) \sum_{i=1}^{\infty} s_{mi} f_i(t)\, dt = \sum_{i=1}^{\infty} s_{mi} \int_0^T s_m(t) f_i(t)\, dt = \sum_{i=1}^{\infty} s_{mi}^2 = \|s_m^\infty\|^2.   (5.21)
This leads to an important result.

RESULT 5.1 The optimum receiver for transmission of the signals s_m(t), m ∈ M, over an additive white Gaussian noise waveform channel has to maximize
\int_{-\infty}^{\infty} r(t) s_m(t)\, dt + c_m \quad \text{over } m \in \mathcal{M} = \{1, 2, \cdots, |\mathcal{M}|\}.   (5.22)
Here
c_m = \frac{N_0}{2} \ln \Pr\{M = m\} - \frac{E_{s_m}}{2}.   (5.23)
This receiver is called a correlation receiver (see figure 5.4).
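As an added sketch (using discrete-time approximations of the integrals and arbitrary example waveforms and parameters), the correlation receiver of result 5.1 can be written as follows.

```python
import numpy as np

def correlation_receiver(r, waveforms, priors, N0, dt):
    """Correlation receiver of result 5.1, with integrals approximated by sums.
    `r` is the sampled received waveform, `waveforms[m]` the sampled s_m(t)."""
    correlations = np.array([np.sum(r * s) * dt for s in waveforms])
    energies = np.array([np.sum(s * s) * dt for s in waveforms])
    c = (N0 / 2) * np.log(priors) - energies / 2          # the constants c_m of (5.23)
    return int(np.argmax(correlations + c)) + 1           # messages numbered 1..|M|

# Two example waveforms on [0, T): antipodal pulses.
T, n = 1.0, 1000
t = np.linspace(0.0, T, n, endpoint=False)
dt = T / n
s1, s2 = np.ones(n), -np.ones(n)
N0 = 0.5
rng = np.random.default_rng(4)
r = s1 + rng.standard_normal(n) * np.sqrt(N0 / (2 * dt))  # crude white-noise approximation
print(correlation_receiver(r, [s1, s2], np.array([0.5, 0.5]), N0, dt))  # usually 1
```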


Figure 5.4: Correlation receiver, an optimum receiver for waveforms in white Gaussian noise: r(t) is multiplied by each s_m(t) and integrated over [0, T), the constant c_m is added, and the largest result determines m̂.

5.5 Sufficient statistic<br />

In the previous section we have seen that the receiver can make an optimum decision by looking<br />

only at the correlations<br />

∫ T<br />

0<br />

r(t)s m (t)dt, for all m ∈ M. (5.24)<br />

These |M| correlations together form a so-called sufficient statistic for making an optimum decision.<br />

It is unnecessary to store the received waveform r(t) after having determined the correlations.<br />

Or, to put it in another way, given the correlations the received waveform r(t) is not<br />

relevant anymore.<br />

5.6 Exercises<br />

1. Consider a waveform communication system in which one of two equally likely messages<br />

is transmitted, hence M = {1, 2} and Pr{M = 1} = Pr{M = 2} = 1/2. The waveforms<br />

s 1 (t) and s 2 (t) corresponding to the messages both have energy E, i.e.<br />

\int_0^T s_1^2(t)\, dt = \int_0^T s_2^2(t)\, dt = E,

where [0, T ) is the observation interval. The channel adds zero-mean additive Gaussian<br />

noise with power spectral density S Nw ( f ) = 1 to the transmitted waveform.<br />

(a) Find an expression for the error probability obtained by a correlation receiver when<br />

s 1 (t) = −s 2 (t).<br />

(b) What is the error probability of a correlation receiver when \int_0^T s_1(t) s_2(t)\, dt = 0?


Chapter 6<br />

Waveform channels, building-blocks<br />

SUMMARY: In this chapter we show that when signal waveforms are linear combinations of a set of orthonormal waveforms (building-block waveforms), the waveform channel can be transformed into an AGN vector channel without loss of optimality. To each input

waveform there corresponds a vector in what is called the signal space. It is possible to<br />

make an optimum decision based on the correlations of the channel output waveform with<br />

the building-block waveforms. The difference between this vector of correlations and the<br />

transmitted vector is a noise vector which is Gaussian and spherically distributed.<br />

6.1 Another sufficient statistic<br />

Suppose that the number |M| of waveforms s m (t) is large. Then the receiver has to determine<br />

|M| correlations, which is quite complex. The question that now pops up is whether an optimum decision can be based on a smaller number of correlations. This turns out to be true when

all waveforms s m (t), m ∈ M, can be represented as linear combinations of N orthonormal functions<br />

and N < |M|. We call these orthonormal functions building-block waveforms or building<br />

blocks.<br />

The decomposition of waveforms into building-blocks is also essential when we are interested<br />

in evaluating the error probability of a system.<br />

Definition 6.1 The building-block waveforms {ϕ_i(t), i = 1, N} satisfy by definition
\int_0^T \varphi_i(t) \varphi_j(t)\, dt = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}   (6.1)
Note that such a set of building blocks is just a set of functions orthonormal over the interval [0, T).

We say that a basis of N building-block waveforms exists for a collection s 1 (t), s 2 (t), · · · ,<br />

s |M| (t) of transmitted waveforms if we can decompose these transmitted waveforms as<br />

s_m(t) = \sum_{i=1,N} s_{mi} \varphi_i(t), \quad \text{for } m \in \mathcal{M} = \{1, 2, \cdots, |\mathcal{M}|\}.   (6.2)



Note that the coefficient s_{mi} corresponding to a building block ϕ_i(t) satisfies
\int_0^T s_m(t) \varphi_i(t)\, dt = \int_0^T \sum_{j=1,N} s_{mj} \varphi_j(t) \varphi_i(t)\, dt = \sum_{j=1,N} s_{mj} \int_0^T \varphi_j(t) \varphi_i(t)\, dt = s_{mi}.   (6.3)

Now the correlation of r(t) and the signal s_m(t) for m ∈ M can be expressed as
\int_0^T r(t) s_m(t)\, dt = \int_0^T r(t) \sum_{i=1,N} s_{mi} \varphi_i(t)\, dt = \sum_{i=1,N} s_{mi} \int_0^T r(t) \varphi_i(t)\, dt = \sum_{i=1,N} s_{mi} r_i,   (6.4)
with
r_i = \int_0^T r(t) \varphi_i(t)\, dt \quad \text{for } i = 1, N.   (6.5)

Observe that all the correlations \int_0^T r(t) s_m(t)\, dt, m ∈ M, can be computed from the N correlations \int_0^T r(t) \varphi_i(t)\, dt, i = 1, N. Therefore also these N correlations \int_0^T r(t) \varphi_i(t)\, dt, i = 1, N, are a sufficient statistic since from these N numbers an optimum decision can be made. This is summarized in the following theorem.

RESULT 6.1 If the transmitted waveforms s_m(t), m ∈ M, are linear combinations of the N building-block waveforms ϕ_i(t), i = 1, N, i.e. if
s_m(t) = \sum_{i=1,N} s_{mi} \varphi_i(t), \quad \text{for } m \in \mathcal{M},   (6.6)
then the receiver can make an optimum decision from
r_i = \int_0^T r(t) \varphi_i(t)\, dt, \quad \text{for } i = 1, N.   (6.7)
The numbers r_1, r_2, · · ·, r_N therefore form a sufficient statistic. The optimum receiver produces as estimate the message m that maximizes \sum_{i=1}^{N} r_i s_{mi} + c_m over m ∈ M. Note that c_m is given by (5.23).

All this results in the receiver that is shown in figure 6.1. There we first see a bank of N multipliers and integrators. This bank yields the correlations r_i = \int_0^T r(t) \varphi_i(t)\, dt, i = 1, N. Then a matrix multiplication yields \sum_{i=1,N} r_i s_{mi} = \int_0^T r(t) s_m(t)\, dt for all m ∈ M. Adding the constants c_m and choosing as message estimate the m that achieves the largest sum yields again an optimum receiver.


Figure 6.1: Correlation receiver based on building blocks: a bank of N multipliers and integrators produces r_1, · · ·, r_N, a weighting matrix forms \sum_{i=1}^{N} r_i s_{mi}, the constants c_m are added, and the largest result determines m̂.

6.2 The Gram-Schmidt orthogonalization procedure<br />

It is not so difficult to construct an orthonormal basis, i.e. a set of N building-block waveforms,

for a given set of |M| finite-energy signaling waveforms. The Gram-Schmidt procedure can be<br />

used.<br />

RESULT 6.2 (Gram-Schmidt) For an arbitrary signal collection, i.e. a collection of waveforms<br />

{s 1 (t), s 2 (t), · · · , s |M| (t)} on the interval [0, T ) we can construct a set of N ≤ |M|<br />

building-block waveforms {ϕ 1 (t), ϕ 2 (t), · · · , ϕ N (t)} and find coefficients s mi such that for m =<br />

1, 2, · · · , |M| the signals can be synthesized as s m (t) = ∑ i=1,N s miϕ i (t).<br />

The proof of this result can be found in appendix E. There is also an example there where for three signals we determine a set of building-blocks.

A certain set of signals can be expanded in many different sets of building-block waveforms,<br />

also in sets with a larger dimensionality in general. What remains the same, however, is the geometrical

configuration of the vector representations of the signals (see exercise 3 in this chapter).<br />

The Gram-Schmidt procedure always yields a base with the smallest possible dimension.<br />
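Result 6.2 is constructive; the sketch below (an added illustration working with sampled waveforms, not the appendix-E derivation itself) applies the Gram-Schmidt procedure to a list of sampled signals and returns the building-block waveforms and the coefficient vectors.

```python
import numpy as np

def gram_schmidt(signals, dt, tol=1e-9):
    """Orthonormalize sampled waveforms (entries of `signals`) over [0, T).
    Returns (basis, coeffs) with signals[m] ~= coeffs[m] @ basis."""
    basis, coeffs = [], []
    for s in np.asarray(signals, dtype=float):
        c = [np.sum(s * phi) * dt for phi in basis]    # projections on existing basis
        residual = s - sum(ci * phi for ci, phi in zip(c, basis))
        energy = np.sum(residual ** 2) * dt
        if energy > tol:                               # a new dimension is needed
            basis.append(residual / np.sqrt(energy))
            c.append(np.sqrt(energy))
        coeffs.append(c)
    N = len(basis)
    coeff_matrix = np.array([ci + [0.0] * (N - len(ci)) for ci in coeffs])
    return np.array(basis), coeff_matrix

# Example: three sampled waveforms on [0, 1); the third is a combination of the first two.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
dt = 1.0 / 1000
s1, s2 = np.ones_like(t), np.sqrt(2.0) * np.cos(2 * np.pi * t)
basis, coeffs = gram_schmidt([s1, s2, s1 + 0.5 * s2], dt)
print(basis.shape[0], coeffs)   # two basis functions suffice (N <= |M|)
```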

6.3 Transmitting vectors instead of waveforms<br />

In the previous section we have seen that for each collection of signals s 1 (t), s 2 (t), · · · , s |M| (t)<br />

we can construct a finite orthonormal base such that each waveform s m (t) for m ∈ M is a linear<br />

combination of the building-block waveforms ϕ 1 (t), ϕ 2 (t), · · · , ϕ N (t) that form the base, i.e.<br />

s_m(t) = \sum_{i=1,N} s_{mi} \varphi_i(t).   (6.8)


Figure 6.2: A modulator forms the waveform s_m(t) from the vector s_m: each coefficient s_mi multiplies ϕ_i(t) and the N products are summed.

Hence to each waveform s_m(t) there corresponds a vector consisting of N coefficients

    s_m = (s_m1, s_m2, · · · , s_mN).    (6.9)

Now, instead of the waveform s_m(t), we can say that the vector s_m is transmitted. A modulator (see figure 6.2) performs operation (6.8), which transforms a vector into the transmitted waveform.
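On the same sampled grid the modulator of figure 6.2 reduces to a single matrix-vector product; phi and coeff are taken as produced by the Gram-Schmidt sketch above (an assumption of this illustration).

def modulate(m, coeff, phi):
    # s_m(t_k) = sum_i s_mi * phi_i(t_k), i.e. operation (6.8) on samples
    return coeff[m] @ phi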

6.4 Signal space, signal structure

The next result applies to the set of all transmitted waveforms.

RESULT 6.3 If the waveforms s_1(t), s_2(t), · · · , s_|M|(t) are linear combinations of N building blocks, the collection of waveforms can be represented as a collection of vectors s_1, s_2, · · · , s_|M| in an N-dimensional space. This space is called the signal space. The collection of vectors s_1, s_2, · · · , s_|M| is called the signal structure.

Example 6.1 Consider for m = 1, 2, 3, 4 the set of phase-modulated transmitter waveforms

    s_m(t) = √(2E_s/T) cos(2π f_0 t + mπ/2)  for 0 ≤ t < T, and 0 elsewhere,    (6.10)

where f_0 is an integral multiple of 1/T.

Note that

    cos(2π f_0 t + mπ/2) = cos(2π f_0 t) cos(mπ/2) − sin(2π f_0 t) sin(mπ/2).    (6.11)


Figure 6.3: Four signals s_1, s_2, s_3, s_4 represented as vectors in a two-dimensional space (axes ϕ_1 and ϕ_2). The length of all signal vectors is √E_s.

If we now take as building-block waveforms

    ϕ_1(t) = √(2/T) cos(2π f_0 t), and
    ϕ_2(t) = √(2/T) sin(2π f_0 t),    (6.12)

for 0 ≤ t < T and zero elsewhere, we obtain the following vector representations for the signals:

    s_1 = (0, −√E_s),
    s_2 = (−√E_s, 0),
    s_3 = (0, √E_s),
    s_4 = (√E_s, 0).    (6.13)

These vectors are depicted in figure 6.3. Note that the four waveforms shown in figure 6.4 have the same signal structure (with respect to the basis also shown there) as the signals defined in (6.10). We will learn later in this chapter that both sets of waveforms also have identical error behavior.

Figure 6.4: Another set of waveforms that leads to the vector diagram in figure 6.3. Shown are two rectangular building-block waveforms ϕ_1(t) and ϕ_2(t) of height √(1/τ) on [0, τ) and [τ, 2τ), and the corresponding signals s_1(t), · · · , s_4(t) with amplitudes ±√(E_s/τ).
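The vector representation (6.13) can also be checked numerically by projecting the waveforms (6.10) onto the building blocks (6.12); the values of T, f_0 and E_s below are illustrative choices.

import numpy as np

T, f0, Es, dt = 1.0, 4.0, 2.0, 1e-4        # f0 = 4/T is an integer multiple of 1/T
t = np.arange(0.0, T, dt)
phi1 = np.sqrt(2 / T) * np.cos(2 * np.pi * f0 * t)
phi2 = np.sqrt(2 / T) * np.sin(2 * np.pi * f0 * t)
for m in (1, 2, 3, 4):
    s = np.sqrt(2 * Es / T) * np.cos(2 * np.pi * f0 * t + m * np.pi / 2)
    sm1 = np.sum(s * phi1) * dt            # coordinate along phi_1
    sm2 = np.sum(s * phi2) * dt            # coordinate along phi_2
    print(m, round(sm1, 3), round(sm2, 3)) # e.g. m = 4 gives (sqrt(Es), 0)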


6.5 Receiving vectors instead of waveforms

In the previous sections we have shown that a waveform transmitter actually sends a vector out of an N-dimensional signal space. In this section we will see that the optimum waveform receiver as derived in section 6.1, which operates with the sufficient statistics corresponding to the building blocks, actually receives a vector in this N-dimensional signal space.

Figure 6.5: Recovering the vector r = (r_1, r_2, · · · , r_N) from the received waveform r(t) with a bank of N multipliers (by ϕ_i(t)) and integrators.

Consider the first part (demodulator) of the optimum receiver for waveform communication working with sufficient statistics (see figure 6.5). This receiver observes

    r(t) = s_m(t) + n_w(t),    (6.14)

where s_m(t) for some m ∈ M is the transmitted waveform and n_w(t) is a realization of a white Gaussian noise process. The first part of the receiver determines the N-dimensional vector r = (r_1, r_2, · · · , r_N) whose components are defined as (see figure 6.5)

    r_i = ∫_0^T r(t)ϕ_i(t)dt  for i = 1, 2, · · · , N.    (6.15)

The components of this vector are the sufficient statistics mentioned in section 6.1. So we can say that the optimum receiver bases its decision on the vector r instead of on the waveform r(t).

6.6 The additive Gaussian vector channel

In the previous sections we have seen that without losing optimality we may assume that an N-dimensional vector s_m for some m ∈ M is transmitted and an N-dimensional vector r is received. We will now consider the vector channel that has s_m as input and r as output. Since r(t) = s_m(t) + n_w(t) we have, for i = 1, N,

    r_i = ∫_0^T r(t)ϕ_i(t)dt = ∫_0^T s_m(t)ϕ_i(t)dt + ∫_0^T n_w(t)ϕ_i(t)dt = s_mi + n_i,    (6.16)

with

    n_i = ∫_0^T n_w(t)ϕ_i(t)dt.    (6.17)

Hence again, as in section 5.3, we get

    r = s_m + n,    (6.18)

with r = (r_1, r_2, · · · , r_N), s_m = (s_m1, s_m2, · · · , s_mN), and n = (n_1, n_2, · · · , n_N). Note that the dimension N of the vectors is the dimension of the signal space (the number of building-block waveforms), which is now finite. In section 5.3 the dimension of the vectors was infinite.

What can we say about the noise vector n that disturbs the transmitted vector s_m? Just like before, in section 5.3, the noise components are jointly Gaussian. They result from linear operations on the Gaussian process N_w(t). For the mean of component N_i, i = 1, N, we again find that

    E[N_i] = E[ ∫_0^T N_w(t)ϕ_i(t)dt ] = ∫_0^T E[N_w(t)]ϕ_i(t)dt = 0.    (6.19)

By the orthonormality of the building-block waveforms we get for the correlation of component N_i, i = 1, N, and component N_j, j = 1, N, that

    E[N_i N_j] = E[ ∫_0^T ∫_0^T N_w(α)N_w(β)ϕ_i(α)ϕ_j(β) dα dβ ]
               = ∫_0^T ∫_0^T E[N_w(α)N_w(β)] ϕ_i(α)ϕ_j(β) dα dβ
               = ∫_0^T ∫_0^T (N_0/2) δ(α − β) ϕ_i(α)ϕ_j(β) dα dβ
               = (N_0/2) ∫_0^T ϕ_i(α)ϕ_j(α) dα = (N_0/2) δ_ij,    (6.20)

and thus all N noise variables have variance N_0/2 and are uncorrelated; since they are jointly Gaussian they are also independent of each other.

RESULT 6.4 The joint density function of the N-dimensional noise vector n, which is added to the input of a vector channel derived from a waveform communication system, is

    p_N(n) = (1/(π N_0)^{N/2}) exp(−‖n‖²/N_0),    (6.21)

hence the noise is spherically symmetric and depends on the magnitude but not on the direction of n. The noise projected on each dimension i = 1, N has variance N_0/2.

We may conclude that our channel with input s_m and output r is an additive Gaussian noise (AGN) vector channel as was described in definition 4.2.
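A one-line simulation of this AGN vector channel (the random generator rng is an implementation detail of the sketch):

import numpy as np

def agn_vector_channel(s_m, N0, rng=np.random.default_rng()):
    # each coordinate of s_m is disturbed by i.i.d. zero-mean Gaussian noise
    # of variance N0/2, as stated in result 6.4
    n = rng.normal(0.0, np.sqrt(N0 / 2), size=np.shape(s_m))
    return s_m + n                      # r = s_m + n, as in (6.18)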

6.7 Processing the received vector

Note (see (4.12)) that after having determined the vector r, an optimum vector receiver should search for the message m that minimizes

    ‖r − s_m‖² − 2σ² ln Pr{M = m}.    (6.22)

Note that by result 6.4 we must substitute N_0/2 for σ² here. Inspection now shows that an optimum vector receiver should maximize

    (r · s_m) + (N_0/2) ln Pr{M = m} − ‖s_m‖²/2 = ∑_{i=1,N} r_i s_mi + (N_0/2) ln Pr{M = m} − ‖s_m‖²/2    (6.23)

over m ∈ M. If we note that

    E_sm = ∫_0^T s_m²(t)dt = ∫_0^T s_m(t) ∑_{i=1,N} s_mi ϕ_i(t) dt = ∑_{i=1,N} s_mi ∫_0^T s_m(t)ϕ_i(t)dt = ∑_{i=1,N} s_mi² = ‖s_m‖²,    (6.24)

we can observe that the back end of the receiver shown in figure 6.1 performs exactly as we expect.
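In sampled form the complete decision rule (6.23) is a few lines of Python; the array S holds the signal structure row by row and priors holds the probabilities Pr{M = m} (names chosen for this sketch).

import numpy as np

def map_decide(r, S, priors, N0):
    """r: received vector (N,), S: signal structure (|M|, N), priors: (|M|,)."""
    metrics = S @ r + 0.5 * N0 * np.log(priors) - 0.5 * np.sum(S**2, axis=1)
    return int(np.argmax(metrics))      # maximizer of (6.23)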

6.8 Relation between waveform and vector channel

We have seen that to each set of waveforms there corresponds a finite set of building-block waveforms such that each waveform is a linear combination of these building-block waveforms. We have observed that instead of the waveform s_m(t) the vector s_m is transmitted over the channel. We have also seen that an optimum receiver only needs to consider the projections (correlations) of the received signal r(t) onto the building-block waveforms. The receiver can base its decision on the vector r alone, a sufficient statistic. It is important to note that these conclusions depend on the fact that the channel adds white Gaussian noise to the input waveform.

From the above we may conclude that transmission over an additive white Gaussian noise waveform channel can actually be considered as transmission over an additive Gaussian noise vector channel (see figure 6.6). A vector transmitter produces the vector s_m when message m is generated by the source. This vector is transformed into the waveform s_m(t) by a modulator.


Figure 6.6: Reduction of waveform channel to vector channel. Source → vector transmitter (s_m) → modulator (s_m(t)) → waveform channel (r(t)) → demodulator (r) → vector receiver (m̂); the modulator, waveform channel and demodulator together form the vector channel.

The waveform s_m(t) is input to the channel, which adds white Gaussian noise. The demodulator operates on the received waveform r(t) and forms the vector r. A vector receiver determines from r the estimate m̂ of the message that was sent. The combination modulator - waveform channel - demodulator is an additive Gaussian noise vector channel. In the previous chapter we have seen how an optimum receiver for such a channel can be constructed. The decision function that applies to this situation can be found in (4.12).

We may conclude that, by converting the problem of designing an optimum receiver for waveform communication into that of designing an optimum receiver for an additive white Gaussian noise vector channel, we have solved it. We want to end this chapter with a very important observation:

RESULT 6.5 The error behavior of the waveform communication system is determined only by the collection of signal vectors (the signal structure) and N_0/2. We should realize that, depending on the chosen building-block waveforms, different sets of waveforms having the same signal structure yield the same error performance. Which building-block waveforms are actually chosen by the communication engineer may depend on other factors, e.g. bandwidth requirements, as we will see in later parts of these course notes.

6.9 Exercises<br />

1. For each set of waveforms s 1 (t), s 2 (t), · · · , s |M| (t) we can apply the Gram-Schmidt procedure<br />

to construct an orthonormal base such that each waveform s m (t) for m ∈ M is a<br />

linear combination of the building-block waveforms ϕ 1 (t), ϕ 2 (t), · · · , ϕ N (t). Show that<br />

for a given set of waveforms the Gram-Schmidt procedure results in a base with the smallest<br />

possible dimension N.<br />

2. If we apply the Gram-Schmidt procedure we must assume that the waveforms s_m(t), for m ∈ M, have finite energy. In the construction it is essential that the energy E_θm corresponding to the auxiliary signal θ_m(t) is finite. Show that

       E_θm ≤ ∫_0^T s_m²(t)dt < ∞.    (6.25)


3. Show that, independent of the choice of the building-block waveforms, the length of any signal vector s_m for m ∈ M is constant. Moreover, show that the Euclidean distance between two signal vectors s_m and s_m′ for m, m′ ∈ M is constant. What does this imply for the energy and the expected error probability of the system?

4. (a) Calculate P_E^min when the signal sets (a), (b), and (c) specified in figure 6.7 are used to communicate one of two equally likely messages over a channel disturbed by additive Gaussian noise with S_n(f) = 0.15. Note that the third set is specified by the Fourier transforms (see appendix B) of the waveforms. Use Parseval's equation.

   (b) Also calculate P_E^min for the same three sets when the a-priori message probabilities are 1/4, 3/4.

   (Based on exercise 4.8 from Wozencraft and Jacobs [25].)

Figure 6.7: The signal sets of exercise 4. Sets (a) and (b) are given by rectangular pulses s_1(t), s_2(t) in the time domain; set (c) is specified by the Fourier transforms S_1(f) and S_2(f). S_i(f), the Fourier transform of s_i(t), is real.

5. One of three equally likely messages is to be transmitted over an additive white Gaussian noise channel with S_n(f) = N_0/2 = 1 by means of pulse amplitude modulation. Specifically

       s_1(t) = +2√2 sin(2πt),
       s_2(t) = 0,
       s_3(t) = −2√2 sin(2πt),    (6.26)

   for 0 ≤ t ≤ 1 and zero elsewhere.

   (a) What mathematical operations are performed by the optimum receiver?

   (b) Express the resulting average error probability in terms of Q(·).


   (c) Calculate the minimum attainable average error probability if instead of being equal the message probabilities satisfy

           Pr{M = 1} = Pr{M = 3} = 1/(e + 2)  and  Pr{M = 2} = e/(e + 2),    (6.27)

       where e is the base of the natural logarithm.

   (Exam Communication Principles, July 5, 2002.)

6. One of two equally likely messages is to be transmitted over an additive white Gaussian noise channel with S_n(f) = N_0/2 = 2 using two waveforms s_1(t) and s_2(t). Specifically

       s_1(t) = 2√2 φ_1(t), and
       s_2(t) = 2√2 φ_2(t),    (6.28)

   where φ_1(t) and φ_2(t) are two orthonormal waveforms, i.e.

       ∫_{−∞}^{∞} φ_1²(t)dt = ∫_{−∞}^{∞} φ_2²(t)dt = 1  and  ∫_{−∞}^{∞} φ_1(t)φ_2(t)dt = 0.    (6.29)

   (a) What mathematical operations are performed by an optimum receiver?

   (b) Express the resulting average error probability in terms of Q(·).

   (c) Calculate the minimum attainable average error probability if instead of being equal the message probabilities satisfy

           Pr{M = 1} = 1/(e² + 1)  and  Pr{M = 2} = e²/(e² + 1),    (6.30)

       where e is the base of the natural logarithm.

   (Exam Communication Principles, July 8, 2003.)


Chapter 7<br />

Matched filters<br />

SUMMARY: Here we discuss matched-filter receiver structures for optimum waveform<br />

communication. The optimum receiver can be implemented as a matched-filter receiver,<br />

so a matched filter not only maximizes the signal-to-noise ratio but it can also be used to<br />

minimize the error probability. In the last part of this chapter we consider the Parseval<br />

relation between correlation and dot product.<br />

7.1 Matched-filter receiver<br />

Figure 7.1: A filter with impulse response h_i(t) = ϕ_i(T − t), matched to the building-block waveform ϕ_i(t); input r(t), output u_i(t).

Note that for all i = 1, N the building-block waveforms are such that ϕ_i(t) ≡ 0 for t < 0 and t ≥ T.

We can now replace the N multipliers and integrators by N matched filters and samplers. This can be advantageous in analog implementations, since accurate multipliers are in that case more expensive than filters. Note that appendix C contains some elementary material on filters.

Consider (see figures 7.1 and 7.2) filters with an impulse response h_i(t) = ϕ_i(T − t) for i = 1, N. Note that h_i(t) ≡ 0 for t ≤ 0 (i.e. the filter is causal) and also for t > T. For the output of such a filter we can write

    u_i(t) = r(t) ∗ h_i(t) = ∫_{−∞}^{∞} r(α)h_i(t − α)dα = ∫_{−∞}^{∞} r(α)ϕ_i(T − t + α)dα.    (7.1)


Figure 7.2: A building-block waveform ϕ_i(t) and the impulse response ϕ_i(T − t) of the corresponding matched filter.

At t = T the matched-filter output is

    u_i(T) = ∫_{−∞}^{∞} r(α)ϕ_i(α)dα = ∫_0^T r(α)ϕ_i(α)dα = r_i,    (7.2)

hence we can determine the i-th component of the vector r in this way. Note that r_i is the i-th component of the sufficient statistic corresponding to the building blocks.
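Relation (7.2) is easy to check numerically: convolving a sampled r(t) with the time-reversed building block and reading the output at t = T reproduces the correlation r_i. The waveform and noise level below are arbitrary illustrative choices.

import numpy as np

dt = 1e-3
t = np.arange(0.0, 1.0, dt)                      # T = 1 in this sketch
phi = np.sqrt(2) * np.sin(2 * np.pi * t)         # one orthonormal building block
rng = np.random.default_rng(0)
r = 0.7 * phi + rng.normal(0, 0.3, t.size)       # some received waveform
h = phi[::-1]                                    # matched filter h(t) = phi(T - t)
u = np.convolve(r, h) * dt                       # full convolution on the grid
r_i_conv = u[t.size - 1]                         # sample the output at t = T
r_i_corr = np.sum(r * phi) * dt                  # direct correlation as in (6.15)
print(r_i_conv, r_i_corr)                        # the two numbers agree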

Figure 7.3: Matched-filter receiver based on building blocks. N filters ϕ_i(T − t) are sampled at t = T to produce r_1, · · · , r_N; a weighting matrix forms (r · s_m) = ∑_i r_i s_mi, the constants c_m are added, and the largest sum determines m̂.

Now we have an alternative method for computing the vector r. This results in the receiver shown in figure 7.3. Note that for all m ∈ M

    (r · s_m) = ∑_{i=1,N} r_i s_mi = ∫_0^T r(t)s_m(t)dt.    (7.3)


A filter whose impulse response is a delayed time-reversed version of a signal ϕ_i(t) is called "matched to" ϕ_i(t). A receiver that is equipped with such filters is called a matched-filter receiver.

7.2 Direct receiver

Note that, just like the building blocks, for all m ∈ M the transmitted waveforms are such that s_m(t) ≡ 0 for t < 0 and t ≥ T.

Now for all m ∈ M consider a filter with impulse response s_m(T − t), let the waveform channel output r(t) be the input of all these filters, and sample the |M| filter outputs at t = T. Then we obtain

    ∫_{−∞}^{∞} r(α)s_m(T − t + α)dα |_{t=T} = ∫_{−∞}^{∞} r(α)s_m(α)dα = ∫_0^T r(α)s_m(α)dα.    (7.4)

This gives another method to form an optimum receiver (see figure 7.4). This receiver is called a direct receiver since the filters are matched directly to the signals {s_m(t), m ∈ M}.

Note that a direct receiver is usually more expensive than a receiver with filters matched to the building-block waveforms, since always |M| ≥ N and in practice often even |M| ≫ N. The weighting-matrix operations are not needed here, however.

Figure 7.4: Direct matched-filter receiver. The received waveform is fed to |M| filters s_m(T − t), each sampled at t = T; the constants c_m are added and the largest value determines m̂.

7.3 Signal-to-noise ratio

We have seen in the previous section that a matched-filter receiver is optimum, i.e. it minimizes the error probability P_E. The matched filter is not only optimum in this sense; we will show next that it also maximizes the signal-to-noise ratio.

Figure 7.5: A matched filter maximizes S/N. The input r(t) = s(t) + n_w(t) passes through a filter h(t), whose output u_s(T) + u_n(T) is sampled at t = T.

To see what we mean by this, consider the communication situation shown in figure 7.5. A signal s(t) is assumed to be non-zero only for 0 ≤ t < T. This signal is observed in additive white noise, i.e. the observer receives r(t) = s(t) + n_w(t). The process N_w(t) is a zero-mean white Gaussian noise process with power spectral density S_w(f) = N_0/2 for all −∞ < f < ∞.

The problem is now e.g. to decide whether the signal s(t) was present in the noise or not (or, in a similar setting, to decide whether s(t) or −s(t) was seen). Therefore the observer uses a linear time-invariant filter with impulse response h(t) and samples the filter output at time t = T. For the sampled filter output at time t = T we can write

    u(T) = r(t) ∗ h(t)|_{t=T} = ∫_{−∞}^{∞} r(T − α)h(α)dα = u_s(T) + u_n(T),    (7.5)

with

    u_s(T) = ∫_{−∞}^{∞} s(T − α)h(α)dα, and    (7.6)

    u_n(T) = ∫_{−∞}^{∞} n_w(T − α)h(α)dα,    (7.7)

where u_s(T) is the signal component and u_n(T) the noise component in the sampled filter output.

Definition 7.1 We can now define the signal-to-noise ratio as

    S/N = u_s²(T) / E[U_n²(T)],    (7.8)

i.e. the ratio between signal energy and noise variance.

The noise variance can be expressed as

    E[U_n²(T)] = E[ ∫_{−∞}^{∞} N_w(T − α)h(α)dα ∫_{−∞}^{∞} N_w(T − β)h(β)dβ ]
               = ∫_{−∞}^{∞} ∫_{−∞}^{∞} E[N_w(T − α)N_w(T − β)] h(α)h(β) dα dβ
               = (N_0/2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} δ(β − α) h(α)h(β) dα dβ = (N_0/2) ∫_{−∞}^{∞} h²(α)dα.    (7.9)


RESULT 7.1 If we substitute (7.9) and (7.6) in (7.8) we obtain for the maximum attainable signal-to-noise ratio

    S/N = [ ∫_{−∞}^{∞} s(T − α)h(α)dα ]² / ( (N_0/2) ∫_{−∞}^{∞} h²(α)dα )
        ≤ [ ∫_{−∞}^{∞} s²(T − α)dα · ∫_{−∞}^{∞} h²(α)dα ] / ( (N_0/2) ∫_{−∞}^{∞} h²(α)dα )    (∗)
        = ∫_{−∞}^{∞} s²(T − α)dα / (N_0/2) = ∫_0^T s²(T − α)dα / (N_0/2).    (7.10)

The inequality (∗) in this derivation comes from the Schwarz inequality (see appendix F). Equality is obtained if and only if h(t) = Cs(T − t) for some constant C, i.e. if the filter h(t) is matched to the signal s(t).

Note that the maximum signal-to-noise ratio depends only on the energy of the waveform s(t) and not on its specific shape.

It should be noted that in this section we have demonstrated a weaker form of optimality for the matched filter than the one obtained in the previous chapter. The matched filter is not only the filter that maximizes the signal-to-noise ratio, but can be used for optimum detection as well.
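Result 7.1 can be illustrated numerically: among candidate filters, the matched choice h(t) = s(T − t) attains the largest S/N. The signal s(t) and the competing filter below are arbitrary choices for this sketch.

import numpy as np

dt, N0 = 1e-3, 1.0
t = np.arange(0.0, 1.0, dt)
s = np.where(t < 0.5, 1.0, -0.5)                     # an arbitrary signal on [0, T)

def snr(h):
    us = np.sum(s[::-1] * h) * dt                    # u_s(T) = \int s(T - a) h(a) da
    return us**2 / (N0 / 2 * np.sum(h**2) * dt)      # (7.8) with (7.9)

matched = s[::-1]                                    # h(t) = s(T - t)
mismatched = np.ones_like(s)                         # e.g. a plain integration window
print(snr(matched), snr(mismatched))                 # the matched filter wins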

7.4 Parseval relation

Equation (7.3) demonstrates that a correlation can be expressed as the dot product of the corresponding vectors. We will investigate this phenomenon more closely here. Therefore consider a set of orthonormal waveforms {ϕ_i(t), i = 1, 2, · · · , N} over the interval [0, T) and two waveforms f(t) and g(t) that can be expressed in terms of these building blocks, i.e.

    f(t) = ∑_{i=1,N} f_i ϕ_i(t),
    g(t) = ∑_{i=1,N} g_i ϕ_i(t).    (7.11)

The vector representations that correspond to f(t) and g(t) are

    f = (f_1, f_2, · · · , f_N)  and  g = (g_1, g_2, · · · , g_N).    (7.12)

Now:

RESULT 7.2 (Parseval relation)

    ∫_0^T f(t)g(t)dt = ∫_0^T ∑_{i=1,N} f_i ϕ_i(t) ∑_{j=1,N} g_j ϕ_j(t) dt
                     = ∑_{i=1,N} ∑_{j=1,N} f_i g_j ∫_0^T ϕ_i(t)ϕ_j(t)dt
                     = ∑_{i=1,N} ∑_{j=1,N} f_i g_j δ_ij = ∑_{i=1,N} f_i g_i = (f · g).    (7.13)

This result says that the correlation of f(t) and g(t), which is defined as the integral of their product, is equal to the dot product of the corresponding vectors. Note that this result can be regarded as an analogue of the Parseval relation in Fourier analysis (see appendix B), which says that

    ∫_{−∞}^{∞} f(t)g(t)dt = ∫_{−∞}^{∞} F(f)G^∗(f)df,    (7.14)

with

    F(f) = ∫_{−∞}^{∞} f(t) exp(−j2πft)dt, and
    G(f) = ∫_{−∞}^{∞} g(t) exp(−j2πft)dt.    (7.15)

Here G^∗(f) is the complex conjugate of G(f).

The consequences of result 7.2 are:

• Take g(t) ≡ f(t); then

    ∫_0^T f²(t)dt = (f · f) = ‖f‖².    (7.16)

  This means that the energy of waveform f(t) is simply the square of the length of the corresponding vector f. We therefore also call the squared length of a vector its energy. We have seen before that

    E_sm = ∫_0^T s_m²(t)dt = ‖s_m‖²,    (7.17)

  the energy corresponding to the waveform s_m(t), for m ∈ M = {1, 2, · · · , |M|}.

• Consider

    ∫_0^T r(t)s_m(t)dt = ∫_0^T r(t) ∑_{i=1,N} s_mi ϕ_i(t)dt = ∑_{i=1,N} s_mi ∫_0^T r(t)ϕ_i(t)dt = ∑_{i=1,N} s_mi r_i = (s_m · r).    (7.18)

  This result is similar to the Parseval result 7.2 but not identical, since r(t) ≠ ∑_{i=1,N} r_i ϕ_i(t), i.e. r(t) cannot be expressed as a linear combination of the building-block waveforms. Note that (7.18) is identical to equation (7.3).
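Result 7.2 is easily verified on sampled waveforms; the orthonormal basis and the coefficient vectors below are illustrative choices.

import numpy as np

dt = 1e-4
t = np.arange(0.0, 1.0, dt)
phi = np.vstack([np.ones_like(t),                        # orthonormal on [0, 1)
                 np.sqrt(2) * np.cos(2 * np.pi * t),
                 np.sqrt(2) * np.sin(2 * np.pi * t)])
f_vec = np.array([0.5, -1.0, 2.0])
g_vec = np.array([1.5, 0.3, -0.7])
f_t, g_t = f_vec @ phi, g_vec @ phi                      # synthesize f(t) and g(t)
print(np.sum(f_t * g_t) * dt, f_vec @ g_vec)             # both numbers are (f · g)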


7.5 Notes

Figure 7.6: Dwight O. North, inventor of the matched filter. Photo IEEE-IT Soc. Newsl., Dec. 1998.

The matched filter as a filter for maximizing the signal-to-noise ratio was invented by North<br />

[14] in 1943. The result was published in a classified report at RCA Labs in Princeton. The<br />

name “matched filter” was coined by Van Vleck and Middleton who independently published<br />

the result a year later in a Harvard Radio Research Lab report [24].<br />

7.6 Exercises<br />

1. One of two equally likely messages is to be transmitted over an additive white Gaussian noise channel with S_n(f) = 0.05 by means of binary pulse-position modulation. Specifically

       s_1(t) = p(t)  and  s_2(t) = p(t − 2),    (7.19)

   in which the pulse p(t) is shown in figure 7.7.

(a) What mathematical operations are performed by the optimum receiver?<br />

(b) What is the resulting average error probability?<br />

(c) Indicate two methods of implementing the receiver, each of which uses a single linear<br />

filter followed by a sampler and comparison device. Method I requires that two samples<br />

from the filter output be fed into the comparison device. Method II requires that<br />

just one sample be used. For each method sketch the impulse response of the appropriate<br />

filter and its response to p(t). Which of the methods is most easily extended to<br />

M-ary pulse position modulation, where s m (t) = p(t − 2m + 2), m = 1, 2, · · · , M?


Figure 7.7: A pulse p(t) (triangular, with peak value 2, on the interval [0, 2]).

   (d) Suggest another pair of waveforms that require the same energy as the binary pulse-position waveforms and yield the same average error probability, and a pair that achieves a smaller error probability.

   (e) Calculate the minimum attainable average error probability if

           s_1(t) = p(t) and s_2(t) = p(t − 1).    (7.20)

       Repeat this for

           s_1(t) = p(t) and s_2(t) = −p(t − 1).    (7.21)

   (Exercise 4.10 from Wozencraft and Jacobs [25].)

Figure 7.8: The waveforms s_1(t) and s_2(t), defined on [0, 7], with amplitude level √(E_s/2) for s_1(t) and levels ±√(E_s/7) for s_2(t).

2. Specify a matched filter for each of the signals shown in figure 7.8 and sketch each filter<br />

output as a function of time when the signal matched to it is the input. Sketch the output<br />

of the filter matched to s 2 (t) when the input is s 1 (t).<br />

(Exercise 4.14 from Wozencraft and Jacobs [25].)<br />

3. Consider a transmitter that sends the signal

       s_a(t) = ∑_{k=1,K} a_k h(t − (k − 1)τ),


Figure 7.9: Pulse transmission and detection with a sampled-matched-filter receiver: impulses a_1, · · · , a_K (values ±1 at 0, τ, 2τ, · · ·) drive a filter h(t), white noise n_w(t) is added to s_a(t), and the receiver filters with h(T − t) and samples at t = T + (k − 1)τ. The pulse h(t) and a resulting signal s_a(t) are also shown.

   when the message a = a_1, a_2, · · · , a_K has to be conveyed to the decoder (i.e. pulse transmission, see figure 7.9). Assume that a_k ∈ {−1, +1} for k = 1, K; thus there are 2^K messages. The signal is sent over an additive white Gaussian noise waveform channel. Assume that h(t) = 0 for t < 0 and t ≥ T. The receiver now uses a matched filter h(T − t) and samples its output at t = T + (k − 1)τ, for k = 1, K. These K samples are then processed to form an estimate of the transmitted message (sequence).

   Show that this receiver is optimum if the processing of the samples is done in the right way. What is optimum processing here?

4. One of two equally likely messages is to be transmitted over an additive white Gaussian noise channel with S_n(f) = N_0/2 = 1 by means of binary pulse-position modulation. Specifically

       s_1(t) = p(t),
       s_2(t) = p(t − 2),

   in which the pulse p(t) is shown in figure 7.10.

   (a) Describe (and sketch) an optimum receiver for this case. Express the resulting error probability in terms of Q(·).

(b) Give the implementation of an optimum receiver which uses a single linear filter<br />

followed by a sampler and comparison device. Assume that two samples from the<br />

filter output are fed into the comparison device. Sketch the impulse response of the


Figure 7.10: A rectangular pulse p(t), shown on the interval 0 ≤ t ≤ 2.

appropriate filter. What is the output of the filter at both sample moments when the<br />

filter input is s 1 (t)? What are these outputs for filter input s 2 (t)?<br />

(c) Calculate the minimum attainable average error probability if<br />

s 1 (t) = p(t) and s 2 (t) = p(t − 1).<br />

(Exam Communication Principles, October 6, 2003.)<br />

5. In a communication system based on an additive white Gaussian noise waveform channel<br />

six signals (waveforms) are used. All signals are zero for t < 0 and t ≥ 8. For 0 ≤ t < 8<br />

the signals are<br />

s 1 (t) = 0<br />

s 2 (t) = +2 cos(πt/2)<br />

s 3 (t) = +2 cos(πt/2) + 2 sin(πt/2)<br />

s 4 (t) = +2 cos(πt/2) + 4 sin(πt/2)<br />

s 5 (t) = +4 sin(πt/2)<br />

s 6 (t) = +2 sin(πt/2)<br />

The messages corresponding to the signals all have probability 1/6. The power spectral<br />

density of the noise process N w (t) is N 0 /2 = 4/9 for all f . The receiver observes the<br />

received waveform r(t) = s m (t) + n w (t) in the time interval 0 ≤ t < 8.<br />

(a) Determine a set of building-block waveforms for these six signals. Sketch these<br />

building-block waveforms. Show that they are orthonormal over [0, 8). Give the<br />

vector representations of all six signals and sketch the resulting signal structure.<br />

(b) Describe for what received vectors r an optimum receiver chooses ˆM = 1, ˆM =<br />

2, · · · , and ˆM = 6. Sketch the corresponding decision regions I 1 , I 2 , · · · , I 6 . Give<br />

an expression for the error probability P E obtained by an optimum receiver. Use the<br />

Q(·)-function.<br />

(c) Sketch and specify a matched-filter implementation of an optimum receiver for the<br />

six signals (all occurring with equal a-priori probability). Use only two matched<br />

filters.



(d) Compute the average energy of the signals. We can translate the signal structure<br />

over a vector a such that the average signal energy is minimal. Determine this vector<br />

a. What is the minimal average signal energy? Specify the modified signals s′_1(t), s′_2(t), · · · , s′_6(t) that correspond to this translated signal structure.

(Exam Communication Theory, November 18, 2003.)


Chapter 8<br />

Orthogonal signaling<br />

SUMMARY: Here we investigate orthogonal signaling. In this case the waveforms that correspond to the messages are orthogonal. We determine the average error probability P_E for these signals. It turns out that P_E can be made arbitrarily small by increasing the number of waveforms, as long as the energy per transmitted bit is larger than N_0 ln 2. Note that this is a capacity result!

8.1 Orthogonal signal structure

Consider |M| signals s_m(t), or in vector representation s_m, with a-priori probabilities Pr{M = m} = 1/|M| for m ∈ M = {1, 2, · · · , |M|}. We now define an orthogonal signal set in the following way:

Definition 8.1 All signals in an orthogonal set are assumed to have equal energy and to be orthogonal, i.e.

    s_m = √E_s ϕ_m  for m ∈ M,    (8.1)

where ϕ_m is the unit vector corresponding to dimension m. There are as many building-block waveforms ϕ_m(t) and dimensions in the signal space as there are messages.

The signals that we have defined are indeed orthogonal, since

    ∫_{−∞}^{∞} s_m(t)s_k(t)dt = (s_m · s_k) = E_s (ϕ_m · ϕ_k) = E_s δ_mk  for m ∈ M and k ∈ M.    (8.2)

Note that all signals have energy equal to E_s.

So far we have not been very explicit about the actual signals. However, we can e.g. think of (disjoint) shifts of a pulse (pulse-position modulation, PPM) or sines and cosines with an integer number of periods over [0, T) (frequency-shift keying, FSK).


8.2 Optimum receiver

How does the optimum receiver decide when it receives the vector r = (r_1, r_2, · · · , r_|M|)? It has to choose the message m ∈ M that minimizes the squared Euclidean distance between s_m and r, i.e.

    ‖r − s_m‖² = ‖r‖² + ‖s_m‖² − 2(r · s_m) = ‖r‖² + E_s − 2√E_s r_m.    (8.3)

RESULT 8.1 Since only the term −2√E_s r_m depends on m, an optimum receiver has to choose m̂ such that

    r_m̂ ≥ r_m  for all m ∈ M.    (8.4)

Now we can find an expression for the error probability.
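Before turning to the exact expression, the decision rule (8.4) can be explored with a small Monte Carlo sketch; the sample size and the parameter values below are illustrative choices.

import numpy as np

def orthogonal_pe(M, Es, N0, trials=200_000, rng=np.random.default_rng(1)):
    # by symmetry assume message 1 was sent: r = (sqrt(Es), 0, ..., 0) + noise,
    # each noise coordinate i.i.d. Gaussian with variance N0/2; decide for the
    # largest coordinate and count errors
    r = rng.normal(0.0, np.sqrt(N0 / 2), size=(trials, M))
    r[:, 0] += np.sqrt(Es)
    return np.mean(np.argmax(r, axis=1) != 0)

print(orthogonal_pe(M=16, Es=8.0, N0=1.0))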

8.3 Error probability

To determine the error probability for orthogonal signaling we may assume, because of symmetry, that signal s_1(t) was actually sent. Then

    r_1 = √E_s + n_1,
    r_m = n_m,  for m = 2, 3, · · · , |M|,  with  p_N(n) = ∏_{m=1,|M|} (1/√(π N_0)) exp(−n_m²/N_0).    (8.5)

Note that the noise vector n = (n_1, n_2, · · · , n_|M|) consists of |M| independent components, all with mean 0 and variance N_0/2.

Suppose that the first component of the received vector is α. Then we can write for the correct probability, conditional on the fact that message 1 was sent and that the first component of r is α,

    Pr{M̂ = 1 | M = 1, R_1 = α} = Pr{N_2 < α, N_3 < α, · · · , N_|M| < α} = (Pr{N_2 < α})^{|M|−1} = ( ∫_{−∞}^{α} p_N(β)dβ )^{|M|−1},    (8.6)

and therefore the correct probability is

    P_C = ∫_{−∞}^{∞} p_N(α − √E_s) ( ∫_{−∞}^{α} p_N(β)dβ )^{|M|−1} dα.    (8.7)


We rewrite this correct probability as follows:

    P_C = ∫_{−∞}^{∞} (1/√(π N_0)) exp(−(α − √E_s)²/N_0) ( ∫_{−∞}^{α} · · · dβ )^{|M|−1} dα
        = ∫_{−∞}^{∞} (1/√(2π)) exp(−(α/√(N_0/2) − √E_s/√(N_0/2))²/2) ( ∫_{−∞}^{α} · · · dβ )^{|M|−1} dα/√(N_0/2)
        = ∫_{−∞}^{∞} (1/√(2π)) exp(−(µ − √(2E_s/N_0))²/2) ( ∫_{−∞}^{µ√(N_0/2)} · · · dβ )^{|M|−1} dµ,    (8.8)

with µ = α/√(N_0/2). Furthermore

    ∫_{−∞}^{µ√(N_0/2)} (1/√(π N_0)) exp(−β²/N_0) dβ = ∫_{−∞}^{µ√(N_0/2)} (1/√(2π)) exp(−(β/√(N_0/2))²/2) dβ/√(N_0/2)
                                                    = ∫_{−∞}^{µ} (1/√(2π)) exp(−λ²/2) dλ,    (8.9)

with λ = β/√(N_0/2). Therefore

    P_C = ∫_{−∞}^{∞} p(µ − b) ( ∫_{−∞}^{µ} p(λ)dλ )^{|M|−1} dµ,    (8.10)

with p(γ) = (1/√(2π)) exp(−γ²/2) and b = √(2E_s/N_0). Note that p(γ) is the probability density function of a Gaussian random variable with mean zero and variance 1.

From (8.10) we conclude that for a given |M| the correct probability P_C depends only on b, i.e. on the ratio E_s/N_0. This can be regarded as a signal-to-noise ratio, since E_s is the signal energy and N_0 is twice the variance of the noise in each dimension.

Example 8.1 In figure 8.1 the error probability P E = 1 − P C is depicted for values of log 2 |M| =<br />

1, 2, · · · , 16 and as a function of E s /N 0 .<br />
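Expression (8.10) is also easy to evaluate numerically, e.g. with a simple trapezoidal rule on a truncated µ-axis; the truncation and grid below are implementation choices of this sketch.

import numpy as np
from math import erf, sqrt

def pe_orthogonal(M, Es_over_N0):
    b = sqrt(2 * Es_over_N0)
    mu = np.linspace(-10, b + 10, 4001)
    p = np.exp(-(mu - b) ** 2 / 2) / np.sqrt(2 * np.pi)          # p(mu - b)
    Phi = np.array([0.5 * (1 + erf(x / sqrt(2))) for x in mu])   # \int p(lambda) dlambda
    Pc = np.trapz(p * Phi ** (M - 1), mu)                        # (8.10)
    return 1 - Pc

print(pe_orthogonal(M=16, Es_over_N0=8.0))   # same setting as the Monte Carlo sketch above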

8.4 Capacity

Consider the following experiment. We keep increasing |M| and want to know how we should increase E_s/N_0 such that the error probability P_E gets smaller and smaller. It turns out that it is the energy per bit that counts. We define E_b, i.e. the energy per transmitted bit of information, as

    E_b = E_s / log_2 |M|.    (8.11)

Reliable communication is possible if E_b > N_0 ln 2. More precisely:


Figure 8.1: Error probability P_E for |M| orthogonal signals as a function of E_s/N_0 in dB, for log_2 |M| = 1, 2, · · · , 16. Note that P_E increases with |M| for fixed E_s/N_0.

RESULT 8.2 The error probability for orthogonal signaling satisfies

    P_E ≤ 2 exp(−log_2 |M| [E_b/(2N_0) − ln 2])          for E_b/N_0 ≥ 4 ln 2,
    P_E ≤ 2 exp(−log_2 |M| [√(E_b/N_0) − √(ln 2)]²)      for ln 2 ≤ E_b/N_0 ≤ 4 ln 2.    (8.12)

The proof of result 8.2 can be found in appendix G.

The consequence of (8.12) is that if E_b, i.e. the energy per bit, is larger than N_0 ln 2, we can obtain an arbitrarily small error probability by increasing |M|, the dimensionality of the signal space. This is a capacity result!

Note that the number of bits that we transmit each time is log_2 |M|, and this number grows much more slowly than |M| itself.

Example 8.2 In figure 8.2 we have plotted the error probability P_E as a function of the ratio E_b/N_0. It appears that for ratios larger than ln 2 = −1.5917 dB (this number is called the Shannon limit) the error probability decreases when |M| is made larger.
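The bound (8.12) itself is a one-line computation; the following sketch evaluates it for a few values of log_2 |M| at a fixed E_b/N_0 (the chosen value 2.0 is illustrative).

import numpy as np

def pe_bound(log2M, Eb_over_N0):
    if Eb_over_N0 >= 4 * np.log(2):
        return 2 * np.exp(-log2M * (Eb_over_N0 / 2 - np.log(2)))
    if Eb_over_N0 >= np.log(2):
        return 2 * np.exp(-log2M * (np.sqrt(Eb_over_N0) - np.sqrt(np.log(2))) ** 2)
    return 1.0   # below the Shannon limit the bound gives no guarantee

for k in (4, 8, 16):
    print(k, pe_bound(k, Eb_over_N0=2.0))   # the bound tightens as log_2 |M| grows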


Figure 8.2: Error probability for |M| = 2, 4, 8, · · · , 32768, now as a function of the ratio E_b/N_0 in dB.

8.5 Exercises<br />

1. A Hadamard matrix is a matrix whose elements are ±1. When n is a power of 2, an n × n Hadamard matrix is constructed by means of the recursion:

       H_2 = [ +1 +1
               +1 −1 ],      H_2n = [ +H_n +H_n
                                      +H_n −H_n ].    (8.13)

   Let n be a power of 2 and M = {1, 2, · · · , n}. Consider for m ∈ M the signal vectors s_m = √(E_s/n) h_m, where h_m is the m-th row of the Hadamard matrix H_n.

(a) Show that the signal set {s m , m ∈ M} consists of orthogonal vectors all having<br />

energy E s .<br />

(b) What is the error probability P E if the signal vectors correspond to waveforms that<br />

are transmitted over a waveform channel with additive white Gaussian noise having<br />

spectral density N 0 /2.<br />

(c) What is the advantage of using the Hadamard signal set over the orthogonal set from<br />

definition 8.1 if we assume that the building-block waveforms are in both cases timeshifts<br />

of a pulse? And the disadvantage?


   (d) The matrix

           H*_n = [ +H_n
                    −H_n ]    (8.14)

       defines a set of 2n signals which is called bi-orthogonal. Determine the error probability of this signal set if the energy of each signal is E_s.

A bi-orthogonal “code” with n = 32 was used for an early deep-space mission (Mariner,<br />

1969). A fast Hadamard transform was used as decoding method [6].<br />

2. Consider a communication system based on frequency-shift keying (FSK). There are 8<br />

equiprobable messages, hence Pr{M = m} = 1/8 for m ∈ {1, 2, · · · , 8}. The signal<br />

waveform corresponding to message m is<br />

s m (t) = A √ 2 cos(2πmt), for 0 ≤ t < 1.<br />

For t < 0 and t ≥ 1 all signals are zero. The signals are transmitted over an additive white<br />

Gaussian noise waveform channel. The power spectral density of the noise is S n ( f ) =<br />

N 0 /2 for all frequencies f .<br />

(a) First show that the signals s m (t), m ∈ {1, 2, · · · , 8} are orthogonal 1 . Give the energies<br />

of the signals. What are the building-block waveforms ϕ 1 (t), ϕ 2 (t), · · · , ϕ 8 (t)<br />

that result in the signal vectors<br />

s 1 = (A, 0, 0, 0, 0, 0, 0, 0),<br />

s 2 = (0, A, 0, 0, 0, 0, 0, 0),<br />

· · ·<br />

s 8 = (0, 0, 0, 0, 0, 0, 0, A)?<br />

   (b) The optimum receiver first determines the correlations r_i = ∫_0^1 r(t)ϕ_i(t)dt for i = 1, 2, · · · , 8. Here r(t) is the received waveform, hence r(t) = s_m(t) + n_w(t). For

what values of the vector r = (r 1 , r 2 , · · · , r 8 ) does the receiver decide that message<br />

m was transmitted?<br />

(c) Give an expression for the error probability P E obtained by the optimum receiver.<br />

(d) Next consider a system with 16 messages all having the same a-priori probability.<br />

The signal waveform corresponding to message m is now given by<br />

s m (t) = A √ 2 cos(2πmt), for 0 ≤ t < 1 for m = 1, 2, · · · , 8,<br />

s m (t) = −A √ 2 cos(2π(m − 8)t), for 0 ≤ t < 1 for m = 9, 10, · · · , 16.<br />

For t < 0 and t ≥ 1 these signals are again zero. What are the signal vectors<br />

s 1 , s 2 , · · · , s 16 if we use the building vectors mentioned in part (a) again?<br />

1 Hint: 2 cos(a) cos(b) = cos(a − b) + cos(a + b).


   (e) The optimum receiver again determines the correlations r_i = ∫_0^1 r(t)ϕ_i(t)dt for i = 1, 2, · · · , 8. For what values of the vector r = (r_1, r_2, · · · , r_8) does the receiver now

decide that message m was transmitted? Give an expression for the error probability<br />

P E obtained by the optimum receiver now.<br />

(Exam Communication Principles, October 6, 2003.)


Part III<br />

Transmitter Optimization<br />



Chapter 9<br />

Signal energy considerations

SUMMARY: The collection of vectors that corresponds to the signal waveforms is called the signal structure. If the signal structure is translated or rotated, the error probability need not change. In this chapter we determine the translation vector that minimizes the average signal energy. After that we demonstrate that orthogonal signaling achieves a given error performance with twice as much energy as antipodal signaling. The energy loss of orthogonal sets with many signals can be neglected, however.

9.1 Translation and rotation of signal structures<br />

In the additive white 1 Gaussian noise case, if a signal structure, i.e. the collection of vectors<br />

s 1 , s 2 , · · · , s |M| , is translated or rotated the error probability P E need not change. To see why,<br />

just assume that the corresponding decision regions I 1 , I 2 , · · · , I |M| (which need not be optimum)<br />

are translated or rotated in the same way. Then, because the additive white Gaussian<br />

noise-vector is spherically symmetric in all dimensions in the signal space, and since distances<br />

between decoding regions and signal points did not change, the error probability remains the<br />

same. However, in general, the average signal energy changes if a signal structure is translated.<br />

Rotation about the origin has no effect on the average signal energy.<br />

9.2 Signal energy

Here we want to stress again that

    E_sm = ∫_0^T s_m²(t)dt = ‖s_m‖²,    (9.1)

¹ Although we are concerned with an AGN vector channel, we use the word "white" to emphasize that this vector channel is equivalent to a waveform channel with additive white Gaussian noise.


therefore we only need to know the collection of signal vectors, the signal structure, and the message probabilities to determine the average signal energy, which is defined as

    E_av = ∑_{m∈M} Pr{M = m} E_sm = ∑_{m∈M} Pr{M = m} ‖s_m‖² = E[‖S‖²].    (9.2)

9.3 Translating a signal structure

The smallest possible average error probability of a waveform communication system only depends<br />

on the signal structure, i.e. on the collection of vectors s 1 , s 2 , · · · , s |M| . It does not<br />

change if the entire signal structure is translated. The optimum decision regions simply translate<br />

too. Translation may affect the average signal energy however. We next determine the translation<br />

vector that minimizes the average signal energy.

Figure 9.1: Translation of a signal structure, moving the origin of the coordinate system to a (new axes ϕ′_1, ϕ′_2 centered at a instead of the original axes ϕ_1, ϕ_2).

Consider a certain signal structure. Let E_av(a) denote the average signal energy when the structure is translated over −a, or equivalently when the origin of the coordinate system is moved to a (see figure 9.1); then

    E_av(a) = ∑_{m∈M} Pr{M = m} ‖s_m − a‖² = E[‖S − a‖²].    (9.3)

We now have to find out how we should choose the translation vector a that minimizes E_av(a). It turns out that the best choice is

    a = ∑_{m=1,|M|} Pr{M = m} s_m = E[S].    (9.4)

This follows from considering an alternative translation vector b. The energy of the signal structure after translation by b is

E av (b) = E[‖S − b‖ 2 ] = E[‖(S − a) + (a − b)‖ 2 ]<br />

= E[‖S − a‖ 2 + 2(S − a) · (a − b) + ‖a − b‖ 2 ]<br />

= E[‖S − a‖ 2 ] + 2(E[S] − a) · (a − b) + ‖a − b‖ 2<br />

= E[‖S − a‖ 2 ] + ‖a − b‖ 2<br />

≥ E[‖S − a‖ 2 ], (9.5)<br />

where equality in the third line follows from a = E[S]. Observe that E av (b) is minimized only<br />

for b = a = E[S].<br />

If we do not translate, i.e. when b = 0, the average signal energy<br />

E av (0) = E[‖S − a‖ 2 ] + ‖a‖ 2 , (9.6)<br />

hence we can save ‖a‖ 2 if we translate the center of the coordinate system to a.<br />

RESULT 9.1 This implies that to minimize the average signal energy we should choose the<br />

center of gravity of the signal structure as the origin of the coordinate system. If the center of<br />

gravity of the signal structure a ̸= 0 we can decrease the average signal energy by ‖a‖ 2 by doing<br />

an optimal translation of the coordinate system.<br />

In what follows we will discuss some examples.<br />
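As a first, purely numerical example, the saving ‖a‖² of result 9.1 can be checked directly on a small signal structure; the structure and the probabilities below are arbitrary choices.

import numpy as np

S = np.array([[3.0, 2.0], [-1.0, -2.0]])     # an illustrative signal structure
P = np.array([0.5, 0.5])                     # message probabilities
a = P @ S                                    # center of gravity E[S]
E_before = P @ np.sum(S**2, axis=1)          # E_av(0)
E_after = P @ np.sum((S - a)**2, axis=1)     # E_av(a)
print(E_before, E_after, E_before - E_after, a @ a)   # the saving equals ||a||^2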

9.4 Comparison of orthogonal and antipodal signaling

Let M = {1, 2} and Pr{M = 1} = Pr{M = 2} = 1/2. Consider a first signal set of two orthogonal waveforms (see the two sub-figures in the top row of figure 9.2)

    s_1(t) = √(2E_s) sin(10πt),
    s_2(t) = √(2E_s) sin(12πt),  for 0 ≤ t < 1,    (9.7)

and a second signal set of two antipodal signals (see the bottom-row sub-figures of figure 9.2)

    s_1(t) = √(2E_s) sin(10πt),
    s_2(t) = −√(2E_s) sin(10πt),  also for 0 ≤ t < 1.    (9.8)

Note that binary FSK (frequency-shift keying) is the same as orthogonal signaling, while binary PSK (phase-shift keying) is identical to antipodal signaling.

The vector representations of the signal sets are

    s_1 = (√E_s, 0),
    s_2 = (0, √E_s),    (9.9)


Figure 9.2: The two signal sets of (9.7) and (9.8) in waveform representation (each panel shows s_1(t)/√E_s or s_2(t)/√E_s for 0 ≤ t < 1).

Figure 9.3: Vector representation of the orthogonal (a) and the antipodal signal set (b).


Figure 9.4: Probability of error for binary antipodal and orthogonal signaling as a function of E_s/N_0 in dB. It is assumed that Pr{M = 1} = Pr{M = 2} = 1/2. Observe the difference of 3 dB between the two curves.

for the first (orthogonal) set, and

    s_1 = (√E_s, 0),
    s_2 = (−√E_s, 0),    (9.10)

for the second (antipodal) set.

The average signal energy for both sets is equal to E_s. For the error probabilities in the case of additive white Gaussian noise with power spectral density N_0/2 we get

    P_E^{orthog.} = Q( √(2E_s) / (2√(N_0/2)) ) = Q(√(E_s/N_0)),
    P_E^{antipod.} = Q( 2√E_s / (2√(N_0/2)) ) = Q(√(2E_s/N_0)).    (9.11)

Note that the distance d between the signal points differs by a factor √2 and P_E = Q(d/(2σ)). These error probabilities are depicted in figure 9.4. We observe a difference in the error probabilities of 3.0 dB. For a description of the meaning of decibels etc. we refer to appendix H. By this we mean that, in order to achieve a certain value of P_E, we have to make the signal energy twice as large for orthogonal signaling as for antipodal signaling. The better performance of antipodal signaling relative to orthogonal signaling is best explained by the fact that for antipodal signaling the center of gravity is the origin of the coordinate system, while for orthogonal signaling this is certainly not the case.

We may conclude that binary PSK modulation achieves a better error-performance than binary<br />

FSK. An advantage of FSK modulation is however that efficient FSK receivers can be<br />

designed that do not recover the phase. PSK on the other hand requires coherent demodulation.<br />
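The 3 dB gap in (9.11) is easily tabulated; the sketch below uses the standard relation Q(x) = erfc(x/√2)/2 and a few illustrative values of E_s/N_0 in dB.

import numpy as np
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2))

for EsN0_dB in (0, 5, 10):
    EsN0 = 10 ** (EsN0_dB / 10)
    print(EsN0_dB, Q(sqrt(EsN0)), Q(sqrt(2 * EsN0)))   # orthogonal vs antipodal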

9.5 Energy of |M| orthogonal signals

The average energy E_av of the signals in an orthogonal set (all signals having energy E_s) is

    E_av = E[‖S‖²] = ∑_{m∈M} Pr{M = m} ‖s_m‖² = E_s.    (9.12)

The center of gravity of our orthogonal signal structure is

    a = (1/|M|, 1/|M|, · · · , 1/|M|) √E_s.    (9.13)

We know from (9.6) that

    E_av(0) = E_av(a) + ‖a‖²,    (9.14)

thus, since E_av(0) = E_av,

    E_s = E_av(a) + E_s/|M|,    (9.15)

or

    E_av(a) = E_s (1 − 1/|M|).    (9.16)

Observe that E_av(a) is the average signal energy after translating the coordinate system such that its origin is the center of gravity of the signal structure. For |M| → ∞ the difference between E_av(a) and E_s can be neglected. We therefore conclude that an orthogonal signal set is not optimal in the sense that it needs the smallest possible average signal energy for a given error performance, but the difference diminishes as |M| → ∞.

9.6 Upper bound on the error probability

In general the probability of error P_E of an optimum receiver cannot be computed easily. However, we can use the "union bound" to obtain an upper bound for it. To see how this works, we assume that the a-priori message probabilities are all equal. Then for the error probability P_E^1 when message 1 is sent, we can write

    P_E^1 = Pr{ ⋃_{m∈M, m≠1} (‖R − s_m‖ ≤ ‖R − s_1‖) | M = 1}
          ≤ ∑_{m∈M, m≠1} Pr{‖R − s_m‖ ≤ ‖R − s_1‖ | M = 1}.    (9.17)

Note that in the last step we used the union bound Pr{∪_i E_i} ≤ ∑_i Pr{E_i}, where E_i is an event indexed by i.

Now the total error probability can be upper bounded by

    P_E ≤ ∑_{m∈M} Pr{M = m} P_E^m ≤ ∑_{m∈M} (1/|M|) ∑_{m′∈M, m′≠m} Pr{‖R − s_m′‖ ≤ ‖R − s_m‖ | M = m}.    (9.18)

Also when the a-priori probabilities are not all equal we can use the union bound to upper bound the error probability.
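For equally likely messages on the AGN vector channel each pairwise term in (9.18) equals Q(d/(2σ)) with σ² = N_0/2, so the union bound is straightforward to compute; the signal structure below is an illustrative orthogonal set.

import numpy as np
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2))

def union_bound(S, N0):
    sigma = sqrt(N0 / 2)
    M = len(S)
    total = 0.0
    for m in range(M):
        for mp in range(M):
            if mp != m:
                d = np.linalg.norm(S[mp] - S[m])    # pairwise distance
                total += Q(d / (2 * sigma)) / M     # term of (9.18)
    return total

S = np.sqrt(2.0) * np.eye(4)       # four orthogonal signals with energy Es = 2
print(union_bound(S, N0=1.0))      # upper bound on P_E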

9.7 Exercises<br />

1. Either of the two waveform sets illustrated in figures 9.5 and 9.6 may be used to communicate<br />

one of four equally likely messages over an additive white Gaussian noise channel.<br />

Figure 9.5: Signal set (a): the waveforms s_1(t), s_2(t), s_3(t), s_4(t), each with amplitude √(E_s/2), plotted for 0 ≤ t ≤ 4.

(a) Show that both sets use the same energy.<br />

(b) Exploit the union bound to show that the set of figure 9.6 uses energy almost 3 dB<br />

more effectively than the set of figure 9.5 when a small P E is required.<br />

(Exercise 4.18 from Wozencraft and Jacobs [25].)<br />

2. In the communication system diagrammed in figure 9.7 the transmitted signal-vectors<br />

(s m1 , s m2 ) and the noises n 1 and n 2 are all statistically independent. Assume that |M| = 2


Figure 9.6: Signal set (b): the waveforms s_1(t), s_2(t), s_3(t), s_4(t), each with amplitude √E_s, plotted for 0 ≤ t ≤ 4.

i.e. M = {1, 2} and that<br />

Pr{M = 1} = Pr{M = 2} = 1/2,<br />

(s 11 , s 12 ) = (3 √ E, 2 √ E),<br />

(s 21 , s 22 ) = (− √ E, −2 √ E),<br />

p N1 (n) = p N2 (n) =<br />

1 n2<br />

√ exp(− ).<br />

2πσ 2 2σ 2<br />

(9.19)<br />

[Figure 9.7: Communication system: the transmitter maps m to (s_m1, s_m2); r_1 = s_m1 + n_1 and r_2 = s_m2 + n_2 enter the optimum receiver, which outputs m̂.]

(a) Show that the optimum receiver can be realized as diagrammed in figure 9.8. Determine<br />

α, the optimum threshold setting, and the value ˆm for r 1 + αr 2 larger than the<br />

threshold.<br />

(b) Express the resulting P E in terms of Q(·).


[Figure 9.8: Detector structure: r_1 + αr_2 is formed (r_2 multiplied by α) and applied to a threshold device, which outputs m̂.]

(c) What is the average energy E_av of the signals? By translation of the signal structure we can reduce this average energy without changing the error probability. Determine the smallest possible average signal energy that can be obtained in this way.

(Exam Communication Principles, July 5, 2002.)


Chapter 10<br />

Signaling for message sequences

SUMMARY: In this chapter we will describe the problem of transmitting a continuous<br />

stream of messages. How should we use the available signal power P s in such a way that we<br />

achieve a large information rate R together with a small error probability P E ?<br />

10.1 Introduction<br />

In the previous part we have considered transmission of a single randomly-chosen message<br />

m ∈ M, over a waveform channel during the time-interval 0 ≤ t < T . The problem that was to<br />

be solved there was to determine the optimum receiver, i.e. the receiver that minimizes the error

probability P E for a channel with additive white Gaussian noise. We have found the following<br />

solution:<br />

1. First form an orthonormal base for the set of signals.<br />

2. The optimum receiver determines the finite-dimensional vector representation of the received<br />

waveform, i.e. it projects the received waveform onto the orthonormal signal base.<br />

3. Then using this vector representation of the received waveform, it determines the message<br />

that has the (or a) largest a-posteriori probability. This is done by minimizing expression<br />

(4.12) over all m ∈ M, hence by acting as an optimum vector receiver.<br />

In the present part we shall investigate the more practical situation where we have a continuous<br />

stream of messages that are to be transmitted over our additive white Gaussian noise<br />

waveform channel.<br />

10.2 Definitions, problem statement<br />

Definition 10.1 We assume that every T seconds one out of |M| messages is to be transmitted. The messages are equally likely¹, i.e. Pr{M = m} = 1/|M| for all m ∈ M.
¹ We assume that all messages are equally likely only for the sake of simplicity; this is not essential.


[Figure 10.1: A continuous stream of messages: m_1 ∈ M, m_2 ∈ M, m_3 ∈ M, ... transmitted in the successive intervals [0, T), [T, 2T), [2T, 3T), ....]

The waveform that corresponds to message m ∈ M is s m (t). It is non-zero only for 0 ≤<br />

t < T . If the message in the k-th interval [(k − 1)T, kT ) is m, the signal in that interval is<br />

s k,m (t) = s m (t − (k − 1)T ).<br />

Definition 10.2 For each signal s_m(t), m ∈ M, the available energy is E_s, hence
E_s ≥ ∫_0^T s_m²(t) dt  Joule.   (10.1)
Therefore the available power is
P_s = E_s / T  Joule/second.   (10.2)
Definition 10.3 The transmission rate R is defined as
R = log₂|M| / T  bit/second.   (10.3)
Definition 10.4 The available energy per transmitted bit is now defined as
E_b = E_s / log₂|M|  Joule/bit.   (10.4)
From the above definitions we can deduce for the available energy per transmitted bit that
E_b = (E_s / T) · (T / log₂|M|) = P_s / R,   (10.5)
hence P_s = R E_b.

Our problem is now to determine the maximum rate at which we can communicate reliably<br />

over a waveform channel when the available power is P s . What are the signals that are to<br />

be used to achieve this maximum rate? In the next sections we will first consider two extremal<br />

situations, namely bit-by-bit signaling and block-orthogonal signaling.<br />

10.3 Bit-by-bit signaling<br />

10.3.1 Description<br />

Suppose that in T seconds, we want to transmit K binary digits b 1 b 2 · · · b K . Then<br />

|M| = 2^K,   R = log₂|M| / T = K / T.   (10.6)


[Figure 10.2: A bit-by-bit waveform for K = 5 and message 11010: panels x(t), x(t − 3τ) and s(t); the pulse x(t) has duration τ and energy ∫ x²(t) dt = E_b; time axis τ, 2τ, 3τ, 4τ, 5τ.]

We can realize this by transmitting a signal s(t) that is composed of K pulses x(t) that are shifted in time (see figure 10.2). More precisely
s(t) = Σ_{i=1,K} s_i x(t − (i−1)τ)   with   s_i = −1 if b_i = 0, and s_i = +1 if b_i = 1.   (10.7)
Here x(t) is a pulse with energy E_b and duration τ.
To evaluate the performance of bit-by-bit signaling we first have to determine the building-block waveforms that form the base of our signals. It will be clear that if we take, for i = 1, 2, ..., K,
ϕ_i(t) = x(t − (i−1)τ) / √E_b,   (10.8)
then for i = 1, 2, ..., K and j = 1, 2, ..., K
∫_{−∞}^{∞} ϕ_i(t) ϕ_j(t) dt = δ_ij,   (10.9)
and we have an orthonormal base. Hence the building-block waveforms are the time-shifts over multiples of τ of the normalized pulse x(t)/√E_b, as is shown in figure 10.3. Now we can determine the signal structures that correspond to all the signals. For K = 1, 2, 3 we have shown the signal structures for bit-by-bit signaling in figure 10.4. The structure is always a K-dimensional hypercube.
To see how the decision regions look, note first that
s_1 = (−√E_b, −√E_b, ..., −√E_b).   (10.10)


[Figure 10.3: Our K = 5 building-block waveforms: ϕ_1(t), ..., ϕ_5(t), the τ-shifted copies of the normalized pulse x(t)/√E_b on 0 ≤ t ≤ 5τ.]
[Figure 10.4: Bit-by-bit signal structure for K = 1, 2, 3: the 2^K signal points s_1, ..., s_{2^K} on the axes ϕ_1, ϕ_2, ϕ_3 form a hypercube with side 2√E_b centered at the origin.]


We claim² that the optimum receiver decides m̂ = 1 if
r_i < 0, for all i = 1, ..., K.   (10.11)
For messages other than m = 1 similar arguments show that the decision regions are separated by the hyperplanes r_i = 0 for i = 1, ..., K.
To find an expression for P_E note that the signal hypercube is symmetrical and assume that s_1 was transmitted. No error occurs if r_i = −√E_b + n_i < 0 for all i = 1, ..., K or, in other words, if
n_i < √E_b for all i = 1, ..., K,   (10.12)
hence, noting that σ² = N_0/2, we get
P_C = (1 − Q(√(2E_b/N_0)))^K.   (10.13)
Therefore
P_E = 1 − (1 − Q(√(2E_b/N_0)))^K.   (10.14)
Observe that to estimate bit b_i for i = 1, ..., K, an optimum receiver only needs to consider the received signal r_i in dimension i.
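The behavior of (10.14) is easy to explore numerically. The short sketch below (an added illustration, not from the original notes; the E_b/N_0 value is an arbitrary example) shows how the block error probability grows with the number of bits K per block.

```python
import math

def Q(x):
    # Gaussian tail function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bit_by_bit_PE(EbN0, K):
    # Block error probability (10.14) for K antipodal bits per block
    p_bit = Q(math.sqrt(2.0 * EbN0))
    return 1.0 - (1.0 - p_bit) ** K

EbN0 = 10 ** (9.6 / 10)          # 9.6 dB, an arbitrary example value
for K in (1, 10, 100, 1000):
    print(K, bit_by_bit_PE(EbN0, K))
```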

10.3.2 Probability of error considerations
Note that for bit-by-bit signaling K = RT (see equation (10.6)) and E_b = P_s/R (see equation (10.5)), and substitute this in (10.14), our expression for P_E. We then obtain
P_E = 1 − (1 − Q(√(2P_s/(R N_0))))^{RT}.   (10.15)
Based on this expression for P_E we can now investigate what we should do to improve the reliability of our system.
First note that K = RT ≥ 1. For K = 1 we get P_E = Q(√(2P_s/(R N_0))). There are two conclusions that can be drawn now:
• For fixed average power P_s and fixed rate R the average error probability increases and approaches 1 if we increase T. We therefore cannot improve reliability by increasing T.
• For fixed T the average error probability P_E can only be decreased by increasing the power P_s or by decreasing the rate R.
This seemed to be "the end of the story" for communication engineers before Shannon presented his ideas in [20] and [21].
² To see why, consider a signal s_m for some m ≠ 1. Consider the dimensions i for which s_mi = √E_b. Note that s_1i = −√E_b, hence for r_i < 0 we have (r_i − s_mi)² = (r_i − √E_b)² > (r_i + √E_b)² = (r_i − s_1i)². For dimensions i with s_mi = −√E_b = s_1i we get (r_i − s_mi)² = (r_i − s_1i)². Taking all dimensions i = 1, ..., K together we obtain ‖r − s_m‖² > ‖r − s_1‖² and therefore, as claimed, m̂ = 1 is chosen by an optimum receiver if r_i < 0 for all i = 1, ..., K.



10.4 Block-orthogonal signaling<br />

10.4.1 Description<br />

[Figure 10.5: A block-orthogonal waveform for K = 5 and message 11010, i.e. m = 27: the unit-energy pulse ϕ(t) of duration τ (∫ ϕ²(t) dt = 1) and the signal s_27(t) with ∫ s_27²(t) dt = E_s = T P_s, shown on 0 ≤ t ≤ 32τ.]

Next we will consider block-orthogonal signaling. We again want to transmit K bits in T seconds. We do this by sending one out of 2^K orthogonal pulses every T seconds. If we use pulse-position modulation (PPM) the |M| = 2^K signals are
s_m(t) = √E_s ϕ(t − (m−1)τ), for m = 1, ..., 2^K,   (10.16)
where ϕ(t) has energy 1 and duration not more than τ, with
τ = T / 2^K.   (10.17)
All signals within the block [0, T) are orthogonal and have energy E_s. That is why we call our signaling method block-orthogonal.

10.4.2 Probability of error considerations<br />

To determine the error probability P_E for block-orthogonal signaling we assume that
E_b/N_0 = (1 + ε)² ln 2, for 0 ≤ ε ≤ 1,   (10.18)
i.e. we are willing to spend slightly more than N_0 ln 2 Joule per transmitted bit of information. Now we obtain from (8.12) that
P_E ≤ 2 exp(− log₂|M| [√((1+ε)² ln 2) − √(ln 2)]²) = 2 exp(− log₂|M| ε² ln 2).   (10.19)
Finally substitution of log₂|M| = RT (see (10.3)) yields
P_E ≤ 2 exp(−ε² RT ln 2).   (10.20)
What does (10.18) imply for the rate R? Note that E_b = P_s/R (see (10.5)). This results in
E_b/N_0 = P_s/(R N_0) = (1 + ε)² ln 2,   (10.21)
in other words
R = P_s / ((1+ε)² N_0 ln 2).   (10.22)

Based on (10.20) and (10.22) we can now investigate what we should do to improve the reliability of our system. This leads to the following result.
THEOREM 10.1 For an available average power P_s we can achieve rates R smaller than but arbitrarily close to
C_∞ = P_s / (N_0 ln 2)  bit/second   (10.23)
while the error probability P_E can be made arbitrarily small by increasing T. Observe that C_∞, the capacity, depends only on the available power P_s and the power spectral density N_0/2 of the noise.
This is a Shannon-type result. The reliability can be increased not only by increasing the power P_s or decreasing the rate R but also by increasing the "codeword length" T. It is also important to note that only rates up to the capacity C_∞ can be achieved reliably. We will see later that rates larger than C_∞ are indeed not achievable. Therefore it is correct to use the term capacity here.
One might get the feeling that this could finish our investigations. We have found the capacity of the waveform channel with additive white Gaussian noise. In the next chapter we will see that so far we have ignored an important property of signal sets: their dimensionality.
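The interplay of (10.20), (10.22) and (10.23) is illustrated by the added sketch below (not from the original notes; the power and noise values are arbitrary): for a fixed margin ε the rate stays a fixed fraction of C_∞ while the bound on P_E shrinks as T grows.

```python
import math

Ps, N0 = 1.0, 1e-2               # available power and noise density (arbitrary example)
eps = 0.1                        # margin in (10.18), 0 <= eps <= 1
C_inf = Ps / (N0 * math.log(2.0))                    # capacity (10.23) in bit/s
R = Ps / ((1.0 + eps) ** 2 * N0 * math.log(2.0))     # rate (10.22)

for T in (1.0, 2.0, 5.0, 10.0):
    PE_bound = 2.0 * math.exp(-eps ** 2 * R * T * math.log(2.0))   # bound (10.20)
    print(f"T={T:5.1f} s  R/C_inf={R / C_inf:.3f}  P_E <= {PE_bound:.3e}")
```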

10.5 Exercises<br />

1. Rewrite the bound in (8.12) on P_E in terms of R, T and C_∞. Consider³
E(R) = lim_{T→∞} −(1/T) ln P_E(T, R),   (10.24)
for the best possible signaling methods with rate not smaller than R having signals with duration T. The function E(R) is called the reliability function.
Now compute a lower bound on the reliability function E(R) and draw a plot of this lower bound. It can be shown that E(R) is equal to the lower bound that we have just derived (see Gallager [7]).
³ Just assume that the limit in this definition exists.


Chapter 11<br />

Time, bandwidth, and dimensionality<br />

SUMMARY: Here we first discuss the fact that, to obtain a small error probability P E ,<br />

block-orthogonal signaling requires a huge number of dimensions per second. Then we<br />

show that a bandwidth W can only accommodate roughly 2W dimensions per second.<br />

Hence block-orthogonal signaling is only practical when the available bandwidth is very<br />

large.<br />

11.1 Number of dimensions for bit-by-bit and block-orthogonal<br />

signaling<br />

We again assume that in an interval of duration T we have to transmit K bits.<br />

Then for bit-by-bit signaling we need K = RT dimensions per block. For block-orthogonal signaling we need 2^K = 2^{RT} dimensions per block.
Therefore for bit-by-bit signaling we need K/T = R dimensions per second. For block-orthogonal signaling we need 2^K/T = 2^{RT}/T dimensions per second.
Note that for block-orthogonal signaling the number of dimensions per second explodes as T increases. In the next section we will see that a channel with a finite bandwidth cannot accommodate all these dimensions. Increasing T is however necessary to improve the reliability of a block-orthogonal system, hence a finite bandwidth creates a problem.

11.2 Dimensionality as a function of T<br />

The following theorem will not be proved. For more information on this subject check Wozencraft and Jacobs [25]. The Fourier transform is described briefly in appendix B.
RESULT 11.1 (Dimensionality theorem) Let {φ_i(t), i = 1, ..., N} denote any set of orthogonal waveforms such that for all i = 1, ..., N,
1. φ_i(t) = 0 for t outside [−T/2, T/2),
2. and also
∫_{−W}^{W} |Φ_i(f)|² df ≥ (1 − η_W²) ∫_{−∞}^{∞} |Φ_i(f)|² df   (11.1)
for the spectrum Φ_i(f) of φ_i(t). The parameter W is called the bandwidth (in Hz).
Then the dimensionality N of the set of orthogonal waveforms is upper-bounded by
N ≤ 2WT / (1 − η_W²).   (11.2)

The theorem says that if we require almost all spectral energy of the waveforms to be in the frequency range [−W, W], the number of waveforms cannot be much more than roughly 2WT. In other words, the number of dimensions is not much more than 2W per second.
The "definition" of bandwidth in the dimensionality theorem may seem somewhat arbitrary, but what is important is that the number of dimensions grows no faster than linearly in T.
Note that instead of [0, T) as in previous chapters, we have restricted ourselves here to the interval [−T/2, T/2). The reason for this is that the analysis is easier this way.

Next we want to show the converse statement, i.e. that the number of dimensions N can grow linearly with T, with a coefficient roughly equal to 2W. Therefore consider 2K + 1 orthogonal waveforms that are always zero except for −T/2 ≤ t < T/2. In that case the waveforms are defined as
φ_0(t) = 1
φ_1^c(t) = √2 cos(2π t/T)   and   φ_1^s(t) = √2 sin(2π t/T)
φ_2^c(t) = √2 cos(4π t/T)   and   φ_2^s(t) = √2 sin(4π t/T)
···
φ_K^c(t) = √2 cos(2Kπ t/T)   and   φ_K^s(t) = √2 sin(2Kπ t/T).   (11.3)

We now determine the spectra of all these waveforms.
1. The spectrum of the first waveform φ_0(t) is
Φ_0(f) = ∫_{−T/2}^{T/2} exp(−j2πft) dt = (−1/(j2πf)) (exp(−j2πfT/2) − exp(j2πfT/2)) = sin(πfT) / (πf).   (11.4)
Note that this is the so-called sinc-function. In figure 11.1 the signal φ_0(t) and the corresponding spectrum Φ_0(f) are shown for T = 1.


[Figure 11.1: The signal φ_0(t) (left) and the corresponding spectrum Φ_0(f) (right) for T = 1.]

2. The spectrum of the cosine-waveform φ_k^c(t) for a specified k can be determined by observing that
φ_k^c(t) = φ_0(t) √2 cos(2πkt/T) = φ_0(t) √2 (1/2) (exp(j2πkt/T) + exp(−j2πkt/T)),   (11.5)
hence (see appendix B)
Φ_k^c(f) = (1/√2) (Φ_0(f − k/T) + Φ_0(f + k/T)).   (11.6)
In figure 11.2 the signal φ_5^c(t) and the corresponding spectrum Φ_5^c(f) are shown, again for T = 1.

3. Similarly we can determine the spectrum of the sine-waveform φ_k^s(t) for a specified k.
We now want to find out how much energy is in certain frequency bands. E.g. MATLAB tells us that
∫_{−∞}^{∞} (sin(πfT)/(πf))² df = T   and   ∫_{−1/T}^{1/T} (sin(πfT)/(πf))² df = 0.9028 T,   (11.7)


[Figure 11.2: The signal φ_5^c(t) (left) and the corresponding spectrum Φ_5^c(f) (right) for T = 1.]

hence less than 10 % of the energy of the sinc-function is outside the frequency band [−1/T, 1/T].
Numerical analysis shows that never more than 12 % of the energy of Φ_k^c(f) is outside the frequency band [−(k+1)/T, (k+1)/T] for k = 1, 2, .... Similarly we can show that the spectrum of the sine-waveform φ_k^s(t) never has more than 5 % of its energy outside the frequency band [−(k+1)/T, (k+1)/T] for such k. For large k both percentages approach roughly 5 %.
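These percentages are easy to reproduce numerically. The added sketch below (not part of the original notes) sums the squared sinc-spectrum of (11.4) on a grid and reports the in-band energy fraction for [−1/T, 1/T], which should come out near the value 0.9028 T quoted in (11.7).

```python
import numpy as np

T = 1.0
f = np.linspace(-100.0, 100.0, 2_000_001)     # wide grid approximating (-inf, inf)
df = f[1] - f[0]
spectrum_sq = (T * np.sinc(f * T)) ** 2       # |sin(pi f T)/(pi f)|^2, since np.sinc(x) = sin(pi x)/(pi x)

total = spectrum_sq.sum() * df                # close to T (Parseval)
in_band = spectrum_sq[np.abs(f) <= 1.0 / T].sum() * df
print(total, in_band / total)                 # in-band fraction is roughly 0.90
```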

The frequency band needed to accommodate most of the energy of all 2K + 1 waveforms is [−(K+1)/T, (K+1)/T]. Now suppose that the available bandwidth is W. Then, to have not more than 12 % out-of-band energy, K should satisfy
W ≥ (K + 1)/T.   (11.8)
If, for a certain bandwidth W, we take the largest such K, the number N of orthogonal waveforms is
N = 2K + 1 = 2⌊WT − 1⌋ + 1 > 2(WT − 2) + 1 = 2WT − 3.   (11.9)

RESULT 11.2 There are at least N = 2WT − 3 orthogonal waveforms over [−T/2, T/2] such that less than 12 % of their spectral energy is outside the frequency band [−W, W]. This implies that a fixed bandwidth W can accommodate 2W dimensions per second for large T.

The dimensionality theorem shows that block-orthogonal signaling has a very unpleasant<br />

property that concerns bandwidth. This is demonstrated by the next example.<br />

Example 11.1 For block-orthogonal signaling the number of required dimensions in an interval of T<br />

seconds is 2 RT . If the channel bandwidth is W , the number of available dimensions is roughly 2W T ,<br />

hence the bandwidth W should be such that W ≥ 2 RT /(2T ).


Now consider subsection 10.4.2 and take ε = 0.1. Then (see (10.22)) R = (100/121) C_∞. To get an acceptable P_E we have to take RT ≫ 100 (see (10.20)), hence
W ≥ 2^{RT}/(2T) = (2^{RT}/(2RT)) R ≫ (2^{100}/200) R.   (11.10)
Even for very small values of the rate R, e.g. 1 bit per second, this causes a big problem. We can also conclude that only very small spectral efficiencies R/W can be realized¹.
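To make the explosion concrete, the added sketch below (an illustration, not from the original notes) compares the roughly 2WT dimensions a bandwidth W offers per block with the 2^{RT} dimensions block-orthogonal signaling needs, and prints the bandwidth W ≥ 2^{RT}/(2T) this would require even at a deliberately tiny rate.

```python
R = 1.0                                    # rate in bit/s (deliberately tiny example)
for T in (10.0, 50.0, 100.0):
    dims_needed = 2.0 ** (R * T)           # block-orthogonal: 2^{RT} dimensions per block
    W_needed = dims_needed / (2.0 * T)     # since a bandwidth W offers about 2WT dimensions
    print(f"T={T:5.0f} s  dimensions={dims_needed:.3e}  required W={W_needed:.3e} Hz")
```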

11.3 Remark<br />

In this chapter we have studied building-block waveforms that are time-limited. As a consequence<br />

these waveforms have a spectrum which is not frequency-limited. We can also investigate<br />

waveforms that are frequency-limited. However this implies that these waveforms are not<br />

time-limited anymore. Chapter 14 on pulse modulation deals with such building blocks.<br />

11.4 Exercises<br />

1. Consider the Gaussian pulse x(t) = (√(2πσ²))^{−1} exp(−t²/(2σ²)) and signals such as
s_m(t) = Σ_{i=1,N} s_mi x(t − iτ), for m = 1, 2, ..., |M|,   (11.11)
constructed from successive τ-second translates of x(t). Constrain the inter-pulse interference by requiring that
∫_{−∞}^{∞} x(t − kτ) x(t − lτ) dt ≤ 0.05 ∫_{−∞}^{∞} x²(t) dt, for all k ≠ l,   (11.12)
and constrain the signal bandwidth W by requiring that x(t) have no more than 10 % of its energy outside the frequency interval [−W, W]. Determine the largest possible value of the coefficient c in the equation N = cTW when N ≫ 1.
(Exercise 5.4 from Wozencraft and Jacobs [25].)
¹ Note however that e.g. for ε = 1 the required number of dimensions can be acceptable. However, then the rate R is only one fourth of the capacity P_s/(N_0 ln 2).


Chapter 12<br />

Bandlimited transmission<br />

SUMMARY: Here we determine the capacity of the bandlimited additive white Gaussian<br />

noise waveform channel. We approach this problem in a geometrical way. The key idea is<br />

random code generation (Shannon, 1948).<br />

12.1 Introduction<br />

In the previous chapter we have seen that for a waveform channel with bandwidth W, the number of available dimensions per second is roughly 2W. We have also seen that reliable block-orthogonal signaling requires, already for very modest rates, many more dimensions per second. The reason for this was that, for rates close to capacity, although the error probability decreases exponentially with T, the number of necessary dimensions increases exponentially with T. On the other hand bit-by-bit signaling requires as many dimensions as there are bits to be transmitted, which is acceptable from the bandwidth perspective. But bit-by-bit signaling can only be made reliable by increasing the transmitter power P_s or by decreasing the rate R.

This raises the question whether we can achieve reliable transmission at certain rates R by<br />

increasing T , when both the bandwidth W and available power P s are fixed. The following<br />

sections show that this is indeed possible. We will describe the results obtained by Shannon<br />

[21]. His analysis has a strongly geometrical flavor. These early results may not be as strong as<br />

possible but they certainly are very surprising and give the right intuition.<br />

We will determine the capacity per dimension, not per unit of bandwidth since the definition<br />

of bandwidth is somewhat arbitrary. We will work with vectors as we learned in the previous<br />

part of these course notes. Note that all messages are equiprobable.<br />

Definition 12.1 The energies of all signals are upper-bounded by E_s, which is defined to be the number of dimensions N times E_N, i.e.
‖s_m‖² = Σ_{i=1,N} s_mi² ≤ E_s = N E_N, for all m ∈ M.   (12.1)
E_N is therefore the energy available per dimension.


[Figure 12.1: Hardening of the noise sphere around a signal point: the origin 0, the normalized signal vector s′_m, the normalized noise vector n′, and the received vector r′.]

It is advantageous to consider normalized versions of the signal, noise and received vectors here. These normalized versions are defined in the following way:
Definition 12.2 The normalized version r′ of r is defined by r′ = r/√N. As usual N is the number of components in r, i.e. the number of dimensions. Consequently also R′ = R/√N for the random variables R′ and R that correspond to the realizations r′ and r. Similar definitions hold for the normalized signal vectors s′_m, m ∈ M, and for the normalized noise vector n′.

12.2 Sphere hardening (noise vector)
The random noise vector N has N components, each with mean 0 and variance σ² = N_0/2. The normalized noise vector is N′ = N/√N. Its expected squared length is
E[‖N′‖²] = E[(1/N) Σ_{i=1,N} N_i²] = (1/N) Σ_{i=1,N} E[N_i²] = (1/N) Σ_{i=1,N} N_0/2 = N_0/2.   (12.2)
For the variance of the squared length of N′ we find (using σ² = N_0/2) that
var[‖N′‖²] = (1/N²) var[Σ_{i=1,N} N_i²] = (1/N²) Σ_{i=1,N} var[N_i²]
           = (1/N²) Σ_{i=1,N} (E[N_i⁴] − (E[N_i²])²) = (1/N²) Σ_{i=1,N} (3σ⁴ − (σ²)²) = 2σ⁴/N = (2/N)(N_0/2)².   (12.3)
The second equality in this derivation holds since var[Σ_{i=1,N} X_i] = Σ_{i=1,N} var[X_i] for independent random variables X_1, X_2, ..., X_N. Moreover we have used that E[N_i⁴] = 3σ⁴, see exercise 1 at the end of this chapter. Finally observe that (2/N)(N_0/2)² approaches zero for N → ∞.
Chebyshev's inequality can be used to show that the probability of observing a normalized noise vector with squared length smaller than N_0/2 − ε or larger than N_0/2 + ε, for any fixed ε > 0, approaches zero for N → ∞. More precisely
Pr{ |‖N′‖² − N_0/2| > ε } ≤ var[‖N′‖²] / ε² = (2/N)(N_0/2)² / ε².   (12.4)
RESULT 12.1 We have demonstrated that the normalized received vector r′ is (roughly speaking) on the surface of a hypersphere with radius √(N_0/2) around the normalized signal vector s′_m. Small fluctuations are possible, but for N → ∞ these fluctuations diminish.
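A quick simulation (an added sketch, not part of the original notes; the noise level is an arbitrary example) makes the sphere-hardening effect visible: as N grows, the squared length of the normalized noise vector concentrates around N_0/2.

```python
import numpy as np

rng = np.random.default_rng(0)
N0 = 2.0                                   # noise spectral density, so sigma^2 = N0/2 = 1
sigma = np.sqrt(N0 / 2.0)

for N in (10, 100, 1000, 10000):
    noise = rng.normal(0.0, sigma, size=(5000, N))      # 5000 noise vectors of dimension N
    sq_len = np.sum((noise / np.sqrt(N)) ** 2, axis=1)  # squared lengths of N' = N/sqrt(N)
    print(N, sq_len.mean(), sq_len.std())               # mean stays near N0/2, spread shrinks
```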

12.3 Sphere hardening (received vector)
The received vector r = s_m + n has N components. The normalized received vector is r′ = s′_m + n′ where s′_m = s_m/√N. The squared length of the normalized received vector can now be expressed as
‖r′‖² = ‖s′_m + n′‖² = ‖s′_m‖² + 2(s′_m · n′) + ‖n′‖² = ‖s′_m‖² + (2/N) Σ_{i=1,N} s_mi n_i + ‖n′‖².   (12.5)
Note that the term ‖s′_m‖² is "non-random" for a given message m, while the other two terms in (12.5) depend on the noise and are therefore random variables. For the expected squared length of the random normalized received vector R′ we therefore obtain that
E[‖R′‖²] = ‖s′_m‖² + (2/N) Σ_{i=1,N} s_mi E[N_i] + E[‖N′‖²] = ‖s′_m‖² + N_0/2 ≤(a) E_N + N_0/2.   (12.6)
Here (a) follows from the fact that ‖s′_m‖² = ‖s_m‖²/N ≤ E_s/N = E_N by definition 12.1. It can be shown that also the variance of the squared length of R′ approaches zero for N → ∞. See also exercise 2 at the end of this chapter.
RESULT 12.2 We have demonstrated that the normalized received vector r′ is, roughly speaking, within a sphere with radius √(E_N + N_0/2) and center at the origin of the coordinate system. For a normalized signal vector with energy equal to E_N the received vectors are however on the surface of this hypersphere.


[Figure 12.2: A hypersphere with radius r − ε concentric within a hypersphere of radius r; the difference is a shell with thickness ε.]

12.4 The interior volume of a hypersphere
Consider in N dimensions a hypersphere with radius r. The volume of this sphere is V_r = B_N r^N, where B_N is some constant depending on N (see e.g. appendix 5D in Wozencraft and Jacobs [25]). E.g. B_2 = π and B_3 = 4π/3.
Now also consider a hypersphere with the same center but with a slightly smaller radius r − ε for some ε > 0, see figure 12.2. The volume of this smaller hypersphere is V_{r−ε} = B_N (r − ε)^N. This leads to the following result.
RESULT 12.3 The ratio of the volume of the "interior" of a hypersphere in N dimensions to the volume of the hypersphere itself is
V_{r−ε} / V_r = ((r − ε)/r)^N,   (12.7)
which approaches zero for N → ∞. Consequently for N → ∞ the shell volume V_r − V_{r−ε} constitutes essentially the entire volume V_r of the hypersphere, no matter how small the thickness ε > 0 of the shell is.

12.5 An upper bound for the number |M| of signal vectors<br />

For reliable transmission, the (disjoint) decision regions I m , m ∈ M should not be essentially<br />

smaller than a hypersphere with radius (slightly larger than) √ N 0 /2. This follows from result<br />

12.1. Note that we are referring to normalized vectors. On the other hand we know that all<br />

received vectors are inside a hypersphere of radius (slightly larger than) √ E N + N 0 /2. This was<br />

a consequence of result 12.2. To obtain an upper bound on the number |M| of signal vectors that<br />

allow reliable communication, we can now apply a so-called sphere-packing argument.<br />

RESULT 12.4 For reliable transmission, the number of signals |M| can not be larger than the<br />

volume of the hypersphere containing the received vectors, divided by the volume of the noise


hyperspheres, hence
|M| ≤ B_N (E_N + N_0/2)^{N/2} / (B_N (N_0/2)^{N/2}) = ((E_N + N_0/2) / (N_0/2))^{N/2}.   (12.8)
Here B_N is again the constant in the formula V_r = B_N r^N for the volume V_r of an N-dimensional hypersphere with radius r.
We now have our upper bound on the number |M| of signal vectors. It should be noted that in the book of Wozencraft and Jacobs [25] a more detailed proof can be found.
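The added sketch below (not from the original notes; the signal-to-noise values are arbitrary examples) evaluates the sphere-packing bound (12.8) as a rate per dimension, log₂|M|/N, which is exactly the expression C_N that appears in (12.16) further on.

```python
import math

def max_rate_per_dimension(EN, N0):
    # From (12.8): log2|M| / N <= 0.5 * log2((EN + N0/2) / (N0/2))
    return 0.5 * math.log2((EN + N0 / 2.0) / (N0 / 2.0))

N0 = 2.0
for EN in (1.0, 3.0, 15.0):       # energy per dimension (arbitrary examples)
    print(EN, max_rate_per_dimension(EN, N0))
```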

12.6 A lower bound to the number |M| of signal vectors<br />

Shannon [21] constructed an existence proof to show that there are signal sets that allow reliable communication and that contain surprisingly many signals. His argument involved random coding.
12.6.1 Generating a random code
Fix the number of signal vectors |M| and their length N. To obtain a signal set, choose |M| normalized signal vectors s′_1, s′_2, ..., s′_{|M|} at random, independently of each other, uniformly within a hypersphere centered at the origin having radius √E_N. Consider the ensemble of all signal sets that can be chosen in this way.
12.6.2 Error probability
We are now interested in P_E^{av}, i.e. the error probability P_E averaged over the ensemble of signal sets, i.e.
P_E^{av} = ∫_{s′_1, s′_2, ..., s′_{|M|}} p(s′_1, s′_2, ..., s′_{|M|}) P_E(s′_1, s′_2, ..., s′_{|M|}) d(s′_1, s′_2, ..., s′_{|M|}).   (12.9)
The averaging corresponds to the choice of the signal sets {s′_1, s′_2, ..., s′_{|M|}}. Once we know P_E^{av} we claim that there exists at least one signal set {s′_1, s′_2, ..., s′_{|M|}} with error probability P_E(s′_1, s′_2, ..., s′_{|M|}) ≤ P_E^{av}. Therefore we will first show that P_E^{av} is small enough.
Consider figure 12.3. Suppose that the signal s′_m for some fixed m ∈ M was actually sent. The noise vector n′ is added to the signal vector and hence the received vector turns out to be r′ = s′_m + n′. Now an optimum receiver will decode the message m only if there are no other signals s′_k, k ≠ m, inside a hypersphere with radius ‖n′‖ around r′. This is a consequence of result 4.1, which says that minimum Euclidean distance decoding is optimum if all messages are equally likely.
[Figure 12.3: Random coding situation: the origin 0, the signal vector s′_m, the noise vector n′, the received vector r′, and the "lens" with center C and half-width h formed by the intersection of the noise sphere around r′ with the signal sphere.]


• Note that result 12.3 implies that, with probability approaching one for N → ∞, the signal s′_m was chosen on the surface of a sphere with radius √E_N. This is a consequence of the selection procedure of the signal vectors, which is uniform over the hypersphere with radius √E_N. Therefore we may assume that ‖s′_m‖² = E_N.
• By result 12.2 the normalized received vector r′ is now, with probability approaching one, on the surface of a sphere with radius √(E_N + N_0/2) centered at the origin, and thus ‖r′‖² = E_N + N_0/2.
• Moreover the normalized noise vector n′ is on the surface of a sphere with radius √(N_0/2), hence ‖n′‖² = N_0/2, by the sphere-hardening argument (see result 12.1).
First we have to determine the probability that the signal corresponding to a fixed message k ≠ m is inside the sphere with center r′ and radius ‖n′‖. Therefore we need to know the volume of the "lens" in figure 12.3. Determining this volume is not so easy, but we know that it is not larger than the volume of a sphere with radius h (see figure 12.3) and center coinciding with the center C of the lens. This hypersphere contains the lens. Our next problem is therefore to find out how large h is.
Observing the lengths of s′_m, n′, and r′, we may conclude that the angle between the normalized signal vector s′_m and the normalized noise vector n′ is π/2 (Pythagoras). Now we can use simple geometry to determine the length h:
h = √(E_N · N_0/2) / √(E_N + N_0/2),   (12.10)
and consequently
V_lens ≤ B_N h^N = B_N (E_N (N_0/2) / (E_N + N_0/2))^{N/2}.   (12.11)
The probability that a signal s_k for a specific k ≠ m was selected inside the lens is (by the uniform distribution over the hypersphere with radius √E_N) now given by the volume of the lens divided by the volume of the sphere. This ratio can be upper-bounded as follows:
V_lens / (B_N E_N^{N/2}) ≤ ((N_0/2) / (E_N + N_0/2))^{N/2}.   (12.12)
By the union bound¹ the probability that any signal s_k for k ≠ m is chosen inside the lens is at most |M| − 1 times as large as the ratio (probability) considered in (12.12). Therefore we obtain for the error probability averaged over the ensemble of signal sets
P_E^{av} ≤ |M| ((N_0/2) / (E_N + N_0/2))^{N/2}.   (12.13)
¹ For the K events E_1, E_2, ..., E_K the probability Pr{E_1 ∪ E_2 ∪ ··· ∪ E_K} ≤ Pr{E_1} + Pr{E_2} + ··· + Pr{E_K}.


12.6.3 Achievability result
RESULT 12.5 Fix any δ > 0. Suppose that the number of signals |M| as a function of the number of dimensions N is given by
|M| = 2^{−δN} ((E_N + N_0/2) / (N_0/2))^{N/2},   (12.14)
then
lim_{N→∞} P_E^{av} ≤ lim_{N→∞} 2^{−δN} = 0.   (12.15)
This implies the existence of signal sets, one for each value of N, with |M| as in (12.14), for which lim_{N→∞} P_E = 0.
Note that relation (12.14) makes reliable transmission possible. Since better methods could exist, it serves as a lower bound on |M| when reliable transmission is required.

12.7 Capacity of the bandlimited channel
We can now combine results 12.4 and 12.5. Define
C_N = (1/2) log₂((E_N + N_0/2) / (N_0/2)),   (12.16)
then result 12.4 (upper bound) shows that reliable communication is only possible if the rate R_N per dimension satisfies
R_N = log₂|M| / N ≤ C_N.   (12.17)
On the other hand result 12.5 (lower bound) demonstrates the existence of signal sets that allow reliable communication with rates
R_N ≥ C_N − δ,   (12.18)
for any value of δ > 0. Since we can take δ as small as we like we get the following theorem:
THEOREM 12.6 (Shannon, 1949) Reliable communication is possible over a waveform channel, when the available energy per dimension is E_N and the power spectral density of the noise is N_0/2, for all rates R_N satisfying
R_N < C_N  bit/dimension.   (12.19)
Rates R_N larger than C_N are not realizable with reliable methods. The quantity C_N is called the capacity of the waveform channel in bit per dimension.


Example 12.1 In figure 12.4 we have depicted the behavior of randomly generated codes. In a MATLAB program we have generated codes (signal sets) with R_N = 1 bit/dimension and vector-lengths N = 4, 8, 12, and 16. Note that reliable transmission of R_N = 1 codes is only possible for signal-to-noise ratios E_N/(N_0/2) ≥ 3, or 4.7 dB. These codes are therefore tested with E_N/(N_0/2) between 3 and 11 dB. The resulting error probabilities for vector-lengths 4, 8, 12, and 16 are shown in the figure. For each experiment we have generated 32 codes, and each code was tested with 256 randomly-generated messages. The resulting number of errors was divided by 8192 to give an estimate of the error probability.
[Figure 12.4: Error probability (vertically, from 10⁰ down to 10⁻⁴) as a function of E_N/(N_0/2) in dB (horizontally, 3 to 11) for codes with length N = 4 (∗), 8 (◦), 12 (△), and 16 (□).]
The figure clearly shows that increasing N results in decreasing error probabilities for E_N/(N_0/2) ≥ 5 dB. Simulations for N ≥ 20 turned out to be unfeasible in MATLAB.
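A sketch of such a random-coding experiment is given below (in Python rather than MATLAB, and added here as an illustration rather than the original program): for a given N it draws 2^{R_N N} signal vectors uniformly inside the sphere of radius √(N E_N), transmits random messages over the additive Gaussian noise channel, and decodes by minimum Euclidean distance. The parameter choices are arbitrary examples of the procedure described in the example.

```python
import numpy as np

def random_code_error_rate(N, RN, EN, N0, n_codes=32, n_msgs=256, seed=1):
    rng = np.random.default_rng(seed)
    M = int(round(2 ** (RN * N)))            # number of messages
    sigma = np.sqrt(N0 / 2.0)
    errors = 0
    for _ in range(n_codes):
        # draw M points uniformly inside the sphere of radius sqrt(N*EN)
        g = rng.normal(size=(M, N))
        g /= np.linalg.norm(g, axis=1, keepdims=True)
        radius = np.sqrt(N * EN) * rng.uniform(size=(M, 1)) ** (1.0 / N)
        code = g * radius
        for _ in range(n_msgs):
            m = rng.integers(M)
            r = code[m] + rng.normal(0.0, sigma, size=N)
            m_hat = np.argmin(np.linalg.norm(code - r, axis=1))   # minimum-distance decoding
            errors += (m_hat != m)
    return errors / (n_codes * n_msgs)

# R_N = 1 bit/dimension, E_N/(N_0/2) = 8 (i.e. 9 dB), an arbitrary test point
for N in (4, 8, 12):
    print(N, random_code_error_rate(N, RN=1.0, EN=8.0, N0=2.0))
```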

In the previous chapter we have seen that a waveform channel with bandwidth W can accommodate (roughly) at most 2W dimensions per second. With 2W dimensions per second, the available energy per dimension is E_N = P_s/(2W) if P_s is the available transmitter power. In that case the channel capacity in bit per dimension is
C_N = (1/2) log₂((P_s/(2W) + N_0/2) / (N_0/2)) = (1/2) log₂(1 + P_s/(W N_0)).   (12.20)
Therefore we obtain the following result:
THEOREM 12.7 For a waveform channel with spectral noise density N_0/2, frequency bandwidth W, and available transmitter power P_s, the capacity in bit per second is
C = 2W C_N = W log₂(1 + P_s/(W N_0))  bit/second.   (12.21)
Thus reliable communication is possible for rates R in bit per second smaller than C, while rates larger than C are not realizable with arbitrarily small P_E.
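As a small numerical companion (added here, not in the original notes; the bandwidth, noise density and power values are arbitrary examples), the function below evaluates (12.21) for a few signal-to-noise ratios.

```python
import math

def capacity_bits_per_second(W, Ps, N0):
    # Theorem 12.7: C = W * log2(1 + Ps / (W * N0))
    return W * math.log2(1.0 + Ps / (W * N0))

W, N0 = 3000.0, 2.0e-6            # bandwidth in Hz and noise density (arbitrary)
for Ps in (0.006, 0.06, 0.6):     # transmitter power in Watt
    print(Ps, capacity_bits_per_second(W, Ps, N0))
```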

12.8 Exercises<br />

1. Let N be a random variable with density p_N(n) = (1/√(2πσ²)) exp(−n²/(2σ²)). Show that E[N⁴] = 3σ⁴.
2. Show that the variance of the squared length of the normalized received vector R′ approaches zero for N → ∞.

[Figure 12.5: Water-filling situation: the transmitter sends over two parallel channels, one adding white noise n_{w,a}(t) and one adding white noise n_{w,b}(t); both outputs go to the receiver, which produces m̂.]

3. A transmitter having power P_s = 10 Watt is connected to a receiver by two waveform channels (see figure 12.5). Each of these channels adds white Gaussian noise to its input waveform. The power densities of the two noise processes are N_0^a/2 = 1 and N_0^b/2 = 2 for −∞ < f < ∞, while the frequency bandwidth is W = 1 Hz. What is the total capacity of this parallel channel in bit per second? What happens if P_s = 1 Watt? The power allocation procedure that should be used here is called "water-filling".
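For readers who want to experiment, here is a hedged sketch (added to these notes, not a worked solution of the exercise) of the standard water-filling allocation over parallel additive Gaussian noise channels: power is poured onto the channels with the lowest noise level until the budget P_s is used, and each channel then contributes W log₂(1 + P_i/(W N_0^{(i)})) bit/s. The channel and power values below are arbitrary illustration values, not those of the exercise.

```python
import math

def water_filling(noise_powers, Ps, W=1.0):
    # noise_powers[i] = W * N0_i, the total noise power in channel i.
    # Find the water level mu such that sum(max(mu - n, 0)) = Ps (bisection).
    lo, hi = min(noise_powers), min(noise_powers) + Ps
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        used = sum(max(mu - n, 0.0) for n in noise_powers)
        lo, hi = (mu, hi) if used < Ps else (lo, mu)
    powers = [max(mu - n, 0.0) for n in noise_powers]
    capacity = sum(W * math.log2(1.0 + p / n) for p, n in zip(powers, noise_powers))
    return powers, capacity

# Two channels with noise powers 1 and 3 (arbitrary), total power budgets 4 and 0.5
print(water_filling([1.0, 3.0], Ps=4.0))
print(water_filling([1.0, 3.0], Ps=0.5))
```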


Chapter 13<br />

Wideband transmission<br />

SUMMARY: We determine the capacity of the additive white Gaussian noise waveform<br />

channel in the case of unlimited bandwidth resources. The result that is obtained in this<br />

way shows that block-orthogonal signaling is optimal.<br />

13.1 Introduction<br />

We have seen that with block-orthogonal signaling we can achieve reliable communication at rates
R < P_s / (N_0 ln 2)  bit/second,   (13.1)
provided that we have access to as much bandwidth as we want. We will prove in this chapter

that rates larger than P s /(N 0 ln 2) are not realizable with reliable signaling schemes. Therefore<br />

we may indeed call P s /(N 0 ln 2) the capacity of the wideband waveform channel.<br />

13.2 Capacity of the wideband channel
In the previous chapter the capacity of a waveform channel with frequency bandwidth limited to W was shown to be
W log₂(1 + P_s/(W N_0))  bit/second,   (13.2)
see theorem 12.7. We can easily determine the wideband capacity by letting W → ∞.
[Figure 13.1: The functions ln(1 + S/N) and S/N, and the ratio ln(1 + S/N)/(S/N), plotted against S/N.]
RESULT 13.1 The capacity C_∞ of the wideband waveform channel with additive white Gaussian noise with power density N_0/2 for −∞ < f < ∞, when the transmitter power is P_s, follows from:
C_∞ = lim_{W→∞} W log₂(1 + P_s/(W N_0)) = lim_{W→∞} (P_s/(N_0 ln 2)) · [ln(1 + P_s/(W N_0)) / (P_s/(W N_0))] = P_s/(N_0 ln 2).   (13.3)
Expressed in nats per second this capacity is equal to P_s/N_0.
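The limit in (13.3) converges quite quickly. The added sketch below (not from the original notes; the power and noise values are arbitrary examples) prints W log₂(1 + P_s/(W N_0)) for growing W next to P_s/(N_0 ln 2).

```python
import math

Ps, N0 = 1.0, 1e-3                        # transmitter power and noise density (arbitrary)
C_inf = Ps / (N0 * math.log(2.0))
for W in (1e2, 1e3, 1e4, 1e5, 1e6):
    C = W * math.log2(1.0 + Ps / (W * N0))
    print(f"W={W:9.0f} Hz  C={C:10.1f}  C_inf={C_inf:10.1f} bit/s")
```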

13.3 Relation between capacities, signal-to-noise ratio<br />

Note that we can express the capacity of the bandlimited channel as
C = W log₂(1 + P_s/(W N_0)) = (P_s/(N_0 ln 2)) · [ln(1 + S/N) / (S/N)] = C_∞ · [ln(1 + S/N) / (S/N)],   with S/N = P_s/(W N_0).   (13.4)
Here S/N is the signal-to-noise ratio of the waveform channel. Note that the total noise power is equal to the frequency band 2W times the power spectral density N_0/2 of the noise. We have plotted the ratio ln(1 + S/N)/(S/N) in figure 13.1.


RESULT 13.2 If we note that C = W log₂(1 + S/N), then we can distinguish two cases:
C ≈ C_∞ if S/N ≪ 1,   and   C ≈ W log₂(S/N) if S/N ≫ 1.   (13.5)
The case S/N ≪ 1 is called the power-limited case; there is enough bandwidth. When on the other hand S/N ≫ 1 we speak about bandwidth-limited channels. Increasing the power by a factor of two then does not double the capacity but increases it only by one bit per second per Hz.
Finally we give some other ways to express the signal-to-noise ratio (under the assumption that the number of dimensions per second is exactly 2W):
S/N = P_s/(W N_0) = E_N/(N_0/2) = 2 R_N (E_b/N_0) = (R/W)(E_b/N_0).   (13.6)
13.4 Exercises

1. To obtain result 12.7 we have assumed that the number of dimensions that a channel with<br />

bandwidth W can accommodate per second is 2W . What would happen with the wideband<br />

capacity if instead the channel would give us αW dimensions per second?<br />

2. Find out whether a telephone line channel (W = 3400Hz, C = 34000 bit/sec) is power- or<br />

bandwidth-limited by determining its S/N . Also determine E b /N 0 and R N .<br />

3. For a fixed ratio E b /N 0 , for some values of R/W reliable transmission is possible, for<br />

other values of R/W this is not possible. Therefore the E b /N 0 × R/W - plane can be<br />

divided in a region in which reliable transmission is possible and a region where this is<br />

not possible. Find out what function of R/W describes the boundary between these two<br />

regions.<br />

(The ratio R/W is sometimes called the bandwidth efficiency. Note that R/W = 2R N<br />

where R N is the rate per dimension. Hence a similar partitioning is possible for the<br />

E b /N 0 × R N plane.)<br />

4. A receiver is connected to a transmitter by two channels, a bandlimited channel and a<br />

wide-band channel. The transmitter’s signalling power P s = 90 Watt. The transmitter can<br />

use all its power to communicate only via the bandlimited channel, only via the wide-band<br />

channel, or it may distribute its power over the bandlimited channel and the wide-band<br />

channel.<br />

(a) The bandlimited channel has bandwidth W = 200 Hz. The power spectral density of the noise in this channel is N_0′/2 = 0.015 (Watt/Hz). Assume that the transmitter uses all of its power to communicate over the bandlimited channel. First determine the capacity C_N′ of this channel in bits per dimension. Then determine the capacity C′ of the bandlimited channel in bits per second.


[Figure 13.2: Transmitter (P_s = 90), two parallel channels, namely a bandlimited channel with W = 200 and noise density N_0′/2 = 0.015 and a wide-band channel with noise density N_0″/2 = 0.06, and the receiver.]

(b) The spectral density of the noise in the wide-band channel is N_0″/2 = 0.06 (Watt/Hz). Assume now that the transmitter only communicates via the wide-band channel to the receiver. What is the capacity C″ of this wide-band channel in bits per second?

(c) How should the transmitter split up its power over the bandlimited channel and the<br />

wide-band channel to achieve the maximum total capacity in bits per second? How<br />

large is this maximum total capacity C tot ?<br />

(Exam Communication Theory, January 12, 2005.)


Part IV<br />

Channel Models<br />



Chapter 14<br />

Pulse amplitude modulation<br />

SUMMARY: In this chapter we discuss serial pulse amplitude modulation. This method<br />

provides us with a new channel dimension every T seconds if we choose the pulse according<br />

to the so-called Nyquist criterion. The Nyquist criterion implies that the required bandwidth<br />

W is larger than 1/(2T ). Multi-pulse transmission is also considered in this chapter.<br />

Just as in the single-pulse case a bandwidth of 1/(2T ) Hz per used pulse is required. This<br />

corresponds again to 2W = 1/T dimensions per second.<br />

14.1 Introduction<br />

[Figure 14.1: A serial pulse-amplitude modulation system: the impulses a_0, a_1, ..., a_{K−1} drive the pulse filter p(t); the resulting signal s_a(t) is added to white noise n_w(t) to give r(t); the sketches show the pulse p(t) on [0, 2T] and a signal s_a(t) built from ±1 amplitudes.]

In chapter 11 we could observe that a channel with a bandwidth of W Hz can accommodate roughly 2WT dimensions each T seconds (for large enough T). We considered there building-block waveforms of finite duration; therefore the bandwidth constraint could only be met approximately. Here we will investigate whether it is possible to get a new dimension every 1/(2W) seconds. We allow the building blocks to have a non-finite duration. Moreover all these building-block waveforms are time shifts of a "pulse". The subject of this chapter is therefore serial pulse transmission.
Consider figure 14.1. There we assume that the transmitter sends a signal that consists of amplitude-modulated time shifts of a pulse p(t) by integer multiples k of the so-called modulation interval T, hence
s_a(t) = Σ_{k=0,K−1} a_k p(t − kT).   (14.1)
The vector of amplitudes a = (a_0, a_1, ..., a_{K−1}) consists of symbols a_k, k = 0, ..., K−1, taking values in the alphabet A. We call this modulation method serial pulse-amplitude modulation (PAM).

14.2 Orthonormal pulses: the Nyquist criterion<br />

14.2.1 The Nyquist result<br />

If we have the possibility to choose the pulse p(t) ourselves, we can choose it in such a way that all time shifts of the pulse form an orthonormal base. In that case the pulse p(t) has to satisfy
∫_{−∞}^{∞} p(t − kT) p(t − k′T) dt = 1 if k = k′, and 0 if k ≠ k′,   (14.2)
for integer k and k′. This is equivalent to
∫_{−∞}^{∞} p(τ) p(τ − kT) dτ = p(t) ∗ p(−t)|_{t=kT} = h(kT) = 1 if k = 0, and 0 if k ≠ 0,   (14.3)
where h(t) = p(t) ∗ p(−t). This time-domain restriction on the pulse p(t) is called the zero-forcing (ZF) criterion. Later we will see why.
THEOREM 14.1 The frequency-domain equivalent to (14.3), which is known as the Nyquist criterion for zero intersymbol interference, is
Z(f) = (1/T) Σ_{m=−∞}^{∞} H(f + m/T) = (1/T) Σ_{m=−∞}^{∞} ‖P(f + m/T)‖² = 1 for all f,
where H(f) = P(f)P*(f) = ‖P(f)‖² is the Fourier transform of h(t) = p(t) ∗ p(−t). Note that ‖P(f)‖ is the modulus of P(f). Moreover Z(f) is called the 1/T-aliased spectrum of H(f).
Later in this section we will give the proof of this theorem. First, however, we will discuss it and consider an important consequence of it.


[Figure 14.2: The spectrum H(f) = ‖P(f)‖² corresponding to a pulse p(t) that does not satisfy the Nyquist criterion, together with its 1/T-aliased spectrum Z(f); frequency axis marked at ±1/(2T), ±1/T, ±3/(2T).]
[Figure 14.3: The ideally bandlimited spectrum H(f) = ‖P(f)‖² of height T on [−1/(2T), 1/(2T)] and the corresponding flat Z(f). Note that P(f) is a spectrum with the smallest possible bandwidth satisfying the Nyquist criterion.]

14.2.2 Discussion
• Since p(t) is a real signal, the real part of its spectrum P(f) is even in f, and the imaginary part of this spectrum is odd. Therefore the modulus ‖P(f)‖ of P(f) is an even function of the frequency f.
• If the bandwidth W of the pulse p(t) is strictly smaller than 1/(2T) (see figure 14.2), then the Nyquist criterion, which is based on H(f) = ‖P(f)‖², cannot be satisfied. Thus no pulse p(t) that satisfies a bandwidth-W constraint can lead to orthogonal signaling if W < 1/(2T).
• The smallest possible bandwidth W of a pulse that satisfies the Nyquist criterion is 1/(2T). The "basic" pulse with bandwidth 1/(2T) for which the Nyquist criterion holds has a so-called ideally bandlimited spectrum, which is given by
P(f) = √T if |f| < 1/(2T), and 0 if |f| > 1/(2T),   (14.4)
and √T/2 for |f| = 1/(2T). The corresponding H(f) = ‖P(f)‖² and 1/T-aliased spectrum Z(f) can be found in figure 14.3.
The basic pulse p(t) that corresponds to the ideally bandlimited spectrum is shown in figure 14.4. It is the well-known sinc-pulse
p(t) = (1/√T) sin(πt/T) / (πt/T).   (14.5)


[Figure 14.4: The sinc-pulse that corresponds to T = 1, i.e. p(t) = sin(πt)/(πt).]

[Figure 14.5: A spectrum H(f) with a positive excess bandwidth satisfying the Nyquist criterion, together with its 1/T-aliased spectrum Z(f); frequency axis marked at ±1/(2T), ±1/T, ±3/(2T).]

• A pulse with a bandwidth larger than 1/(2T ) can also satisfy the Nyquist criterion as can<br />

be seen in figure 14.5. Note that the so-called excess bandwidth, i.e. the bandwidth minus<br />

1/(2T ), can be larger than 1/(2T ). Square-root raised-cosine pulses p(t) have a spectrum<br />

H( f ) = |P( f )| 2 that satisfies the Nyquist criterion and an excess bandwidth that can be<br />

controlled.<br />

At the end of this subsection we state the implication of the previous discussion.

RESULT 14.2 The smallest possible bandwidth W of a pulse that satisfies the Nyquist criterion is W = 1/(2T). The sinc-pulse p(t) = (1/√T) sin(πt/T)/(πt/T) has this property. Note that this way of serial pulse transmission leads to exactly 1/T = 2W dimensions per second.

Moreover, observe that unlike before in chapter 11, our pulses and signals are not time-limited!

14.2.3 Proof of the Nyquist result<br />

We will next give the proof of result 14.1, the Nyquist result.<br />

Proof: Since H( f ) is the Fourier transform of h(t), the condition (14.3) can be rewritten as<br />

h(kT ) =<br />

∫ ∞<br />

−∞<br />

H( f ) exp( j2π f kT )d f =<br />

{ 1 if k = 0,<br />

0 if k ̸= 0.<br />

(14.6)


We now break up the integral into parts, one part for each integer m, and obtain

h(kT) = ∑_{m=−∞}^{∞} ∫_{(2m−1)/(2T)}^{(2m+1)/(2T)} H(f) exp(j2πf kT) df
      = ∑_{m=−∞}^{∞} ∫_{−1/(2T)}^{1/(2T)} H(f + m/T) exp(j2πf kT) df
      = ∫_{−1/(2T)}^{1/(2T)} [ ∑_{m=−∞}^{∞} H(f + m/T) ] exp(j2πf kT) df
      = ∫_{−1/(2T)}^{1/(2T)} Z(f) exp(j2πf kT) df,   (14.7)

where we have defined Z(f) by

Z(f) = ∑_{m=−∞}^{∞} H(f + m/T).   (14.8)

Observe that Z(f) is a periodic function of f with period 1/T. Therefore it can be expanded in terms of its Fourier series coefficients ..., z_{−1}, z_0, z_1, z_2, ... as

Z(f) = ∑_{k=−∞}^{∞} z_k exp(j2πk f T),   (14.9)

where

z_k = T ∫_{−1/(2T)}^{1/(2T)} Z(f) exp(−j2πf kT) df.   (14.10)

If we now combine (14.7) and (14.10) we obtain that

T h(−kT) = z_k,   (14.11)

for all integers k. Condition (14.3) now tells us that z_0 = T and that all other z_k are zero. This implies that

Z(f) = T,   (14.12)

or equivalently

∑_{m=−∞}^{∞} H(f + m/T) = T.   (14.13)

✷
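As a frequency-domain illustration of (14.13), the sketch below folds a raised-cosine H(f) = ‖P(f)‖² into its 1/T-aliased spectrum and checks that the result is the constant T. The raised-cosine expression used here is the standard textbook one (it is not derived in these notes), and the roll-off factor and frequency grid are assumed for the example.

```python
import numpy as np

# Numerical check of the Nyquist criterion (14.13): sum_m H(f + m/T) = T for all f.
# Assumed: raised-cosine H(f) with roll-off beta (i.e. a square-root raised-cosine |P(f)|^2).
T, beta = 1.0, 0.35

def H(freq):
    a = np.abs(freq)
    out = np.zeros_like(a)
    flat = a <= (1 - beta) / (2 * T)
    roll = (a > (1 - beta) / (2 * T)) & (a <= (1 + beta) / (2 * T))
    out[flat] = T
    out[roll] = (T / 2) * (1 + np.cos(np.pi * T / beta * (a[roll] - (1 - beta) / (2 * T))))
    return out

f = np.linspace(-0.5 / T, 0.5 / T, 1001, endpoint=False)
Z = sum(H(f + m / T) for m in range(-2, 3))   # aliases m = -2,...,2 are enough here
print(np.max(np.abs(Z - T)))                   # close to 0: the folded spectrum equals T
```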


Figure 14.6: The optimum receiver front-end for detection of serially transmitted orthonormal pulses: the received waveform r(t) is passed through the matched filter p(T_p − t) and sampled at t = T_p + kT, which yields r_k.

14.2.4 Receiver implementation<br />

If we use serial PAM with orthonormal pulses p(t − kT), k = 0, K − 1, then we can use a single matched filter m(t) = p(T_p − t) for the computation of the correlations of the received waveform r(t) with all building-block waveforms, i.e. with all pulses. Assume that the delay T_p is chosen in such a way that m(t) is (effectively¹) causal. The output of this filter m(t) when r(t) is the input signal is

u(t) = ∫_{−∞}^{∞} r(α) m(t − α) dα = ∫_{−∞}^{∞} r(α) p(T_p − t + α) dα.   (14.14)

At time t = T_p + kT we see at the filter output

r_k = u(T_p + kT) = ∫_{−∞}^{∞} r(α) p(α − kT) dα,   (14.15)

which is what the optimum receiver should determine, i.e. the correlation of r(t) with the pulse p(t − kT). This leads to the very simple receiver structure shown in figure 14.6. Processing the samples r_k, k = 1, K, should be done in the usual way.

When there is no noise, r(t) = ∑_{k=0,K−1} a_k p(t − kT). Then at time t = T_p + kT we see at the filter output

r_k = u(T_p + kT) = ∫_{−∞}^{∞} r(α) p(α − kT) dα
    = ∑_{k′=0,K−1} a_{k′} ∫_{−∞}^{∞} p(α − k′T) p(α − kT) dα = a_k,   (14.16)

by the orthonormality of the pulses. The conclusion is that there is no intersymbol interference present in the samples. In other words, the Nyquist criterion forces the intersymbol interference to be zero (zero-forcing (ZF)).

¹ Note that p(t) has a non-finite duration in general.
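A noise-free simulation makes this zero-forcing property concrete. The sketch below (not part of the original notes; 4-PAM amplitudes, T = 1, sinc pulses and sampling directly at t = kT, so the delay T_p is left out, are all assumed for illustration) builds r(t), applies the single matched filter by correlating with p(t − kT), and recovers the amplitudes a_k.

```python
import numpy as np

# Noise-free sketch of the serial-PAM receiver front-end of figure 14.6 (assumed parameters).
T, K, dt = 1.0, 8, 0.01
rng = np.random.default_rng(0)
a = rng.choice([-3.0, -1.0, 1.0, 3.0], size=K)            # 4-PAM amplitudes a_k
t = np.arange(-40.0, 40.0 + K * T, dt)
p = lambda tau: np.sinc(tau / T) / np.sqrt(T)              # orthonormal sinc pulse

r = sum(a[k] * p(t - k * T) for k in range(K))             # r(t) = sum_k a_k p(t - kT), no noise
r_hat = np.array([np.sum(r * p(t - k * T)) * dt for k in range(K)])  # samples r_k of (14.15)
print(np.round(r_hat, 3))                                  # approximately equal to a: no ISI
print(a)
```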



14.3 Multi-pulse transmission<br />

Instead of using shifted versions of a single pulse we can apply the shifts of a set of orthonormal pulses. Consider J pulses p_1(t), p_2(t), ..., p_J(t) which are orthonormal. The shifted versions of these pulses can be used to convey a stream of messages. Again the shifts are over multiples of T, and T is again called the modulation interval. Thus a signal s(t) can be written as

s(t) = ∑_{k=0,K−1} ∑_{j=1,J} a_{kj} p_j(t − kT),   (14.17)

for a_{kj} ∈ A. If we want all pulses and their time-shifts to form an orthonormal basis, then the time-shifts of all pulses should be orthogonal to the pulses p_j(t), j = 1, J. In other words, we require the J pulses to satisfy

∫_{−∞}^{∞} p_j(t − kT) p_{j′}(t − k′T) dt = 1 if j = j′ and k = k′, and 0 elsewhere,   (14.18)

for all j = 1, J, j′ = 1, J, and all integer k and k′. A set of pulses p_j(t) for j = 1, 2, ..., J that satisfies this restriction is

p_j(t) = (1/√T) · sin(πt/(2T))/(πt/(2T)) · cos((2j − 1)πt/(2T)).   (14.19)

For J = 4, and assuming that T = 1, these pulses and their spectra are shown in figure 14.7.<br />

Observe that in this example the total bandwidth of the J pulses is J/(2T ).<br />

RESULT 14.3 The smallest possible bandwidth that is occupied by orthogonal multi-pulse signaling with J pulses every T seconds is J/(2T). Note therefore that bandwidth-efficient serial multi-pulse transmission also leads to exactly 2W dimensions per second.

Proof: (Lee and Messerschmitt [13], p. 266) Just like in the proof of theorem 14.1 we can show that the pulse spectra P_j(f) for j = 1, J must satisfy

(1/T) ∑_{m=−∞}^{∞} P_j(f + m/T) P*_{j′}(f + m/T) = δ_{jj′}.   (14.20)

Now fix a frequency f ∈ [−1/(2T), +1/(2T)) and define for each j = 1, J the vector

P_j = (..., P_j(f − 1/T), P_j(f), P_j(f + 1/T), P_j(f + 2/T), ...).   (14.21)

Here P_j(f + m/T) is the component at position m. With this definition condition (14.20) implies that the vectors P_j, j = 1, J are orthogonal. Assume for a moment that there are fewer than J positions m for which at least one component P_j(f + m/T) for some j = 1, J is non-zero. This would imply that the J vectors P_j, j = 1, J are linearly dependent. Contradiction!

Thus for each frequency interval df ∈ [−1/(2T), 1/(2T)) we may conclude that the J pulses "fill" at least J disjoint intervals of size df of the frequency spectrum. In total the J pulses fill a part J/T of the spectrum. The minimally occupied bandwidth is therefore J/(2T). ✷

Note that the optimum receiver for multi-pulse transmission can be implemented with J matched filters that are sampled every T seconds.


Figure 14.7: Time-domain and frequency-domain plots of the pulses in (14.19) for j = 1, 2, 3, 4 and T = 1.

14.4 Exercises<br />

1. Determine the spectra of the pulses p j (t) for j = 1, 2, · · · , J defined in (14.19). Show<br />

that these pulses satisfy (14.18). Determine the total bandwidth of these J pulses.


Chapter 15<br />

Bandpass channels<br />

SUMMARY: Here we consider transmission over an ideal bandpass channel with bandwidth<br />

2W . We show that building-block waveforms for transmission over a bandpass channel<br />

can be constructed from baseband building-block waveforms having a bandwidth not<br />

larger than W . A first set of orthonormal baseband building-block waveforms can be modulated<br />

on a cosine carrier, a second orthonormal set on a sine carrier, and the resulting<br />

bandpass waveforms are all orthogonal to each other. This technique is called quadrature<br />

multiplexing. We determine the optimum receiver for this signaling method and the capacity<br />

of the bandpass channel. We finally discuss quadrature amplitude modulation (QAM).<br />

15.1 Introduction<br />

So far we have only considered baseband communication. In baseband communication the channel<br />

allows signaling only in the frequency band [−W, W ]. But what should we do if the channel<br />

only accepts signals with a spectrum in the frequency band ±[ f 0 − W, f 0 + W ]? In the present<br />

chapter we will describe a method that can be used for signaling over such a channel. We start<br />

by discussing this so-called bandpass channel. First we give a definition of the ideal bandpass<br />

channel.<br />

Definition 15.1 Consider figure 15.1. The input signal s(t) to the bandpass channel is first sent<br />

through a filter W_0(f). This filter W_0(f) is an ideal bandpass filter with bandwidth 2W and center frequency f_0 > W. It is defined by the transfer function

W_0(f) = 1 if f_0 − W < |f| < f_0 + W, and 0 elsewhere.   (15.1)

The noise process N_w(t) is assumed to be stationary, Gaussian, zero-mean, and white. The noise has power spectral density function

S_{N_w}(f) = N_0/2 for −∞ < f < ∞,   (15.2)

and autocorrelation function R_{N_w}(t, s) = E[N_w(t)N_w(s)] = (N_0/2) δ(t − s). The noise waveform n_w(t) is added to the output of the bandpass filter W_0(f), which results in r(t), the output of the bandpass channel.

Figure 15.1: Ideal bandpass channel with additive white Gaussian noise.

In the next section we will describe a signaling method for a bandpass channel. This method<br />

is called quadrature multiplexing.<br />

15.2 Quadrature multiplexing<br />

In quadrature multiplexing we combine two waveforms with a baseband spectrum into a waveform with a spectrum that matches the bandpass channel. We assume that the two baseband waveforms together contain a single message.

Consider a first set of baseband waveforms {s^c_1(t), s^c_2(t), ..., s^c_{|M|}(t)}, with waveform s^c_m(t) for all m ∈ {1, 2, ..., |M|} having bandwidth smaller than W, i.e. with a spectrum S^c_m(f) ≡ 0 for |f| ≥ W. To this set of baseband waveforms there corresponds a set of building-block waveforms {φ_i(t), i = 1, 2, ..., N_c}. For all i ∈ {1, 2, ..., N_c} the spectrum Φ_i(f) of the corresponding building-block waveform satisfies Φ_i(f) ≡ 0 for |f| ≥ W, since the building-block waveforms are linear combinations of the signal waveforms (Gram-Schmidt procedure, see appendix E).

Also consider a second set of baseband waveforms {s^s_1(t), s^s_2(t), ..., s^s_{|M|}(t)}, with waveform s^s_m(t) for all m ∈ {1, 2, ..., |M|} having bandwidth smaller than W, i.e. S^s_m(f) ≡ 0 for |f| ≥ W. To this second set of waveforms there corresponds a set of building-block waveforms {ψ_j(t), j = 1, 2, ..., N_s}. For all j ∈ {1, 2, ..., N_s} the spectrum Ψ_j(f) of the corresponding building-block waveform satisfies Ψ_j(f) ≡ 0 for |f| ≥ W.

Now, to obtain a passband signal, the two waveforms s^c_m(t) and s^s_m(t) are combined into a signal s_m(t) by modulating them on a cosine resp. sine wave with frequency f_0 and then adding the resulting signals together (see figure 15.2), thus

s_m(t) = s^c_m(t) √2 cos(2π f_0 t) + s^s_m(t) √2 sin(2π f_0 t).   (15.3)

We now make (15.3) more precise. Note that to each message m ∈ M there corresponds a waveform s^c_m(t) = ∑_{i=1,N_c} s^c_{mi} φ_i(t) and a waveform s^s_m(t) = ∑_{j=1,N_s} s^s_{mj} ψ_j(t). Therefore we


write

s_m(t) = s^c_m(t) √2 cos(2π f_0 t) + s^s_m(t) √2 sin(2π f_0 t)
       = ∑_{i=1,N_c} s^c_{mi} φ_i(t) √2 cos(2π f_0 t) + ∑_{j=1,N_s} s^s_{mj} ψ_j(t) √2 sin(2π f_0 t).   (15.4)

Figure 15.2: Quadrature multiplexing: s^c(t) is multiplied by √2 cos 2π f_0 t, s^s(t) by √2 sin 2π f_0 t, and the two products are added to form s(t).

This means that the signal s_m(t) can be regarded as a linear combination of "building-block" waveforms φ_i(t)√2 cos(2π f_0 t) and ψ_j(t)√2 sin(2π f_0 t). To see whether these waveforms are actually building blocks we have to check whether they form an orthonormal set.

To this end consider the Fourier transforms of the building-block waveforms¹ φ_i(t) and ψ_j(t):

Φ_i(f) = ∫_{−∞}^{∞} φ_i(t) exp(−j2π f t) dt,
Ψ_j(f) = ∫_{−∞}^{∞} ψ_j(t) exp(−j2π f t) dt.   (15.5)

Since both sets {φ_i(t), i = 1, N_c} and {ψ_j(t), j = 1, N_s} form an orthonormal base, using the Parseval relation we have for all i and i′ and all j and j′ that

δ_{i,i′} = ∫_{−∞}^{∞} φ_i(t) φ_{i′}(t) dt = ∫_{−∞}^{∞} Φ_i(f) Φ*_{i′}(f) df,
δ_{j,j′} = ∫_{−∞}^{∞} ψ_j(t) ψ_{j′}(t) dt = ∫_{−∞}^{∞} Ψ_j(f) Ψ*_{j′}(f) df.   (15.6)

¹ Note that we do not assume here that the baseband building blocks are limited in time as in the chapters on waveform communication.


Figure 15.3: Spectra after modulation with √2 cos 2π f_0 t and √2 sin 2π f_0 t. For simplicity it is assumed that the baseband spectra are real.

Now we are ready to determine the spectrum Φ_{c,i}(f) of a cosine or in-phase building-block waveform

φ_{c,i}(t) = φ_i(t) √2 cos 2π f_0 t   (15.7)

and the spectrum Ψ_{s,j}(f) of a sine or quadrature building-block waveform

ψ_{s,j}(t) = ψ_j(t) √2 sin 2π f_0 t.   (15.8)

We express these spectra in terms of the baseband spectra Φ_i(f) and Ψ_j(f) (see also figure 15.3):

Φ_{c,i}(f) = ∫_{−∞}^{∞} φ_i(t) √2 cos(2π f_0 t) exp(−j2π f t) dt = (1/√2) (Φ_i(f − f_0) + Φ_i(f + f_0)),

Ψ_{s,j}(f) = ∫_{−∞}^{∞} ψ_j(t) √2 sin(2π f_0 t) exp(−j2π f t) dt = (1/(j√2)) (Ψ_j(f − f_0) − Ψ_j(f + f_0)).   (15.9)
j ( f − f 0 ) − j ( f + f 0 ) ) . (15.9)


Note (see again figure 15.3) that the spectra Φ_{c,i}(f) and Ψ_{s,j}(f) are zero outside the passband ±[f_0 − W, f_0 + W].

We now first consider the correlation between φ_{c,i}(t) and φ_{c,i′}(t) for all i and i′. This leads to

∫_{−∞}^{∞} φ_{c,i}(t) φ_{c,i′}(t) dt = ∫_{−∞}^{∞} Φ_{c,i}(f) Φ*_{c,i′}(f) df
  = (1/2) ∫_{−∞}^{∞} [Φ_i(f − f_0) + Φ_i(f + f_0)] [Φ*_{i′}(f − f_0) + Φ*_{i′}(f + f_0)] df
  = (1/2) ∫_{−∞}^{∞} Φ_i(f − f_0) Φ*_{i′}(f − f_0) df + (1/2) ∫_{−∞}^{∞} Φ_i(f − f_0) Φ*_{i′}(f + f_0) df
    + (1/2) ∫_{−∞}^{∞} Φ_i(f + f_0) Φ*_{i′}(f − f_0) df + (1/2) ∫_{−∞}^{∞} Φ_i(f + f_0) Φ*_{i′}(f + f_0) df
  (a)= (1/2) ∫_{−∞}^{∞} Φ_i(f − f_0) Φ*_{i′}(f − f_0) df + (1/2) ∫_{−∞}^{∞} Φ_i(f + f_0) Φ*_{i′}(f + f_0) df
  (b)= (1/2) δ_{i,i′} + (1/2) δ_{i,i′} = δ_{i,i′}.   (15.10)

Here equality (a) follows from the fact that f_0 > W and the observation that for all i = 1, N_c the spectra Φ_i(f) ≡ 0 for |f| ≥ W. Therefore the cross-terms are zero, see also figure 15.3. Equality (b) follows from (15.6), which holds since the baseband building blocks φ_i(t), i = 1, N_c, form an orthonormal base.

In a similar way we can show that for all j and j′

∫_{−∞}^{∞} ψ_{s,j}(t) ψ_{s,j′}(t) dt = δ_{j,j′}.   (15.11)

What remains to be investigated are the correlations between all in-phase (cosine) building-block waveforms φ_{c,i}(t) for i = 1, N_c and all quadrature (sine) building-block waveforms ψ_{s,j}(t) for j = 1, N_s:

∫_{−∞}^{∞} φ_{c,i}(t) ψ_{s,j}(t) dt = ∫_{−∞}^{∞} Φ_{c,i}(f) Ψ*_{s,j}(f) df
  = (j/2) ∫_{−∞}^{∞} [Φ_i(f − f_0) + Φ_i(f + f_0)] [Ψ*_j(f − f_0) − Ψ*_j(f + f_0)] df
  = (j/2) ∫_{−∞}^{∞} Φ_i(f − f_0) Ψ*_j(f − f_0) df − (j/2) ∫_{−∞}^{∞} Φ_i(f − f_0) Ψ*_j(f + f_0) df
    + (j/2) ∫_{−∞}^{∞} Φ_i(f + f_0) Ψ*_j(f − f_0) df − (j/2) ∫_{−∞}^{∞} Φ_i(f + f_0) Ψ*_j(f + f_0) df
  (c)= (j/2) ∫_{−∞}^{∞} Φ_i(f − f_0) Ψ*_j(f − f_0) df − (j/2) ∫_{−∞}^{∞} Φ_i(f + f_0) Ψ*_j(f + f_0) df
  = 0.   (15.12)


Here equality (c) follows from the fact that f_0 > W and the observation that for all i = 1, N_c the spectra Φ_i(f) ≡ 0 for |f| ≥ W and for all j = 1, N_s the spectra Ψ_j(f) ≡ 0 for |f| ≥ W. Therefore the cross-terms are zero; moreover, the two remaining integrals are equal, so their difference vanishes. See figure 15.3.

RESULT 15.1 We have shown that all in-phase building-block waveforms φ c,i (t) for i = 1, N c<br />

and all quadrature building-block waveforms ψ s, j (t) for j = 1, N s together form an orthonormal<br />

base.<br />

Moreover the spectra Φ_{c,i}(f) and Ψ_{s,j}(f) of all these building-block waveforms are zero outside

the passband ±[ f 0 − W, f 0 + W ]. Therefore none of these building-block waveforms is hindered<br />

by the bandpass filter W 0 ( f ) when they are sent over our bandpass channel.<br />

It is important to note that the baseband building-block waveforms φ i (t) and ψ j (t), for any<br />

i and j, need not be orthogonal. Multiplication of these baseband waveforms by √ 2 cos 2π f 0 t<br />

resp. √ 2 sin 2π f 0 t results in the orthogonality of the bandpass building-block waveforms φ c,i (t)<br />

and ψ s, j (t).<br />

Also note that all the bandpass building block waveforms have unit energy. This follows from<br />

(15.10) and (15.11). Therefore the energy of s m (t) is equal to the squared length of the vector<br />

s_m = (s^c_m, s^s_m) = (s^c_{m1}, s^c_{m2}, ..., s^c_{mN_c}, s^s_{m1}, s^s_{m2}, ..., s^s_{mN_s}).   (15.13)
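The orthonormality claimed in result 15.1 is easy to check numerically. The sketch below (not part of the original notes; W, f_0 and the grid are assumed, and both baseband building blocks are deliberately taken equal) verifies that the cosine- and sine-modulated waveforms have unit energy and are orthogonal, even though the baseband waveforms themselves are not orthogonal.

```python
import numpy as np

# Numerical check of result 15.1 (assumed parameters W = 1, f0 = 5 > W, finite grid).
W, f0, dt = 1.0, 5.0, 1e-3
t = np.arange(-50.0, 50.0, dt)

phi = np.sqrt(2 * W) * np.sinc(2 * W * t)            # unit-energy baseband pulse, bandwidth W
psi = phi.copy()                                      # deliberately identical to phi
phi_c = phi * np.sqrt(2) * np.cos(2 * np.pi * f0 * t)
psi_s = psi * np.sqrt(2) * np.sin(2 * np.pi * f0 * t)

print(round(np.sum(phi_c * phi_c) * dt, 3))   # ~1: unit energy
print(round(np.sum(psi_s * psi_s) * dt, 3))   # ~1: unit energy
print(round(np.sum(phi_c * psi_s) * dt, 3))   # ~0: orthogonal although phi = psi
```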

15.3 Optimum receiver for quadrature multiplexing<br />

The result of the previous section actually states that by applying quadrature multiplexing for a<br />

bandpass channel we obtain a “normal” waveform channel communication problem. Quadrature<br />

multiplexing apparently yields the right building-block waveforms! The building-block waveforms<br />

have spectra that are matched to the bandpass channel. Each message m ∈ M results in<br />

a signal s m (t) that is a linear combination of N c cosine and N s sine building-block waveforms.<br />

Therefore the ideal bandpass filter at the input of the bandpass channel has no effect on the<br />

transmission of the signals.<br />

We have seen before that a waveform channel is actually equivalent to a vector channel. Also<br />

we know how the optimum receiver for a waveform channel should be constructed. If we restrict<br />

ourselves for a moment to the receiver that correlates the received signal r(t) with the building<br />

blocks, we know that it should start with N c + N s multipliers followed by integrators.<br />

∫_{−∞}^{∞} r(t) φ_{c,i}(t) dt = ∫_{−∞}^{∞} r(t) √2 cos(2π f_0 t) φ_i(t) dt = r^c_i, and

∫_{−∞}^{∞} r(t) ψ_{s,j}(t) dt = ∫_{−∞}^{∞} r(t) √2 sin(2π f_0 t) ψ_j(t) dt = r^s_j.   (15.14)

After having determined the received vector r = (r^c, r^s) = (r^c_1, r^c_2, ..., r^c_{N_c}, r^s_1, r^s_2, ..., r^s_{N_s}) we can form the dot products (r · s_m), where the signal vector s_m = (s^c_m, s^s_m) = (s^c_{m1}, s^c_{m2}, ..., s^c_{mN_c}, s^s_{m1}, s^s_{m2}, ..., s^s_{mN_s}):

(r · s_m) = ∑_{i=1,N_c} r^c_i s^c_{mi} + ∑_{j=1,N_s} r^s_j s^s_{mj}.   (15.15)


Figure 15.4: Optimum receiver for quadrature-multiplexed signals that are transmitted over a bandpass channel. The received waveform r(t) is correlated with the N_c + N_s bandpass building-block waveforms, the dot products (r · s_m) are formed with a weighting matrix, the constants c_m are added, and the largest decision variable determines m̂.



Adding the constants c m is now the last step before selecting ˆm as the message m that maximizes<br />

(r · s m ) + c m where (see result (6.1))<br />

c_m = (N_0/2) ln Pr{M = m} − ‖s_m‖²/2,   (15.16)

with

‖s_m‖² = ‖s^c_m‖² + ‖s^s_m‖² = ∑_{i=1,N_c} (s^c_{mi})² + ∑_{j=1,N_s} (s^s_{mj})²
       = ∫_{−∞}^{∞} (s^c_m(t))² dt + ∫_{−∞}^{∞} (s^s_m(t))² dt.   (15.17)

Note that ‖s_m‖² = ∫_{−∞}^{∞} s_m²(t) dt.

All this leads to the receiver implementation shown in figure 15.4.

15.4 Transmission of a complex signal

After the previous section we may think of quadrature multiplexing as a method for transmitting<br />

two real-valued signals sm c (t) and ss m (t). However we can also see quadrature multiplexing as a<br />

technique to transmit a single complex waveform

s^cs_m(t) = s^c_m(t) − j s^s_m(t).   (15.18)

Then we can write

s_m(t) = R[s^cs_m(t) √2 exp(j2π f_0 t)]
       = s^c_m(t) √2 cos(2π f_0 t) + s^s_m(t) √2 sin(2π f_0 t),   (15.19)

where R[x] denotes the real part of x.

The notion of energy of the complex signal s^cs_m(t) is in accordance with the energy that we have used in expressing the constants c_m, hence

∫_{−∞}^{∞} ‖s^cs_m(t)‖² dt = ∫_{−∞}^{∞} (s^c_m(t))² dt + ∫_{−∞}^{∞} (s^s_m(t))² dt.   (15.20)

15.5 Dimensions per second

By applying quadrature multiplexing we cannot obtain more than 4W dimensions per second

from our bandpass channel. By choosing both the baseband building-block waveform sets<br />

{φ i (t), i = 1, N c } and {ψ j (t), j = 1, N s } as large (rich) as possible given the bandwidth constraint<br />

of W Hz, i.e. by taking 2W dimensions per second for each of the building-block waveform<br />

sets, we obtain the upper bound of 4W dimensions per second. Note however that, as usual,<br />

this corresponds to 2 dimensions per Hz per second.



15.6 Capacity of the bandpass channel<br />

In the previous section we saw that the number of dimensions in the case of bandpass transmission<br />

is at most 4W per second. Moreover there exist baseband building-block sets that achieve<br />

this maximum.<br />

RESULT 15.2 Therefore the capacity in bit per dimension of the bandpass channel with bandwidth ±[f_0 − W, f_0 + W] is

C_N = (1/2) log_2(1 + P_s/(2N_0W)),   (15.21)

if the transmitter power is P_s and the noise spectral density is N_0/2 for all f. Consequently the capacity per second is

C = 4W·C_N = 2W log_2(1 + P_s/(2N_0W)) bit/second.   (15.22)

15.7 Quadrature amplitude modulation

In general we take N c = N s = N and the baseband in-phase and quadrature building-block<br />

waveforms equal, i.e. φ i (t) = ψ i (t) for all i = 1, N. Then the two dimensions corresponding<br />

to the passband building-blocks φ i (t) √ 2 cos(2π f 0 t) and ψ i (t) √ 2 sin(2π f 0 t) for i = 1, N are<br />

in some way linked together. This will become clear in chapter 16 on random carrier-phase<br />

communication. We refer to this method as quadrature amplitude modulation (QAM).<br />

15.8 Serial quadrature amplitude modulation<br />

It is important to realize that we could use serial pulse amplitude modulation with a pulse p(t)<br />

satisfying the Nyquist criterion (see theorem 14.1) on both carriers (cosine and sine). A Nyquist

pulse corresponds to an orthonormal basis consisting of time shifts of this pulse. Two such sets<br />

(based on the same pulse p(t)) can be used in a QAM scheme. Note that this signaling method<br />

leads to an extremely simple receiver structure.<br />

Example 15.1 QAM modulation results in two-dimensional signal structures. In serial QAM we can take<br />

as baseband building blocks<br />

φ_1(t) = ψ_1(t) = √(2W) · sin(2πWt)/(2πWt),   (15.23)

and all other baseband building blocks are shifts over multiples of 1/(2W) seconds of this sinc-pulse. Therefore the frequency bandwidth of all building blocks is W. After modulation a bandpass spectrum is obtained, thus the spectra of the signals fit into ±[f_0 − W, f_0 + W]. In each pair of dimensions corresponding to a shift over (k − 1)/(2W) seconds we can observe a two-dimensional part (s^c_k, s^s_k) of the signal vector. This part assumes values in a two-dimensional signal structure.

In figure 15.5 two signal structures are shown. The signal points are placed on a rectangular grid.<br />

This implies that their minimum Euclidean distance is not smaller than a certain value d. It is important to<br />

place the required number of signal points in such a way on the grid that their average energy is minimal.
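As a small illustration of this energy consideration, the sketch below (not part of the original notes; the 16-QAM grid of figure 15.5(a) with minimum distance d is assumed) computes the average energy of the two-dimensional signal structure.

```python
import numpy as np
from itertools import product

# Average energy of the 16-QAM structure of figure 15.5(a) (assumed minimum distance d).
d = 2.0
levels = d * (np.arange(4) - 1.5)                  # per-dimension levels -3d/2, -d/2, d/2, 3d/2
points = np.array(list(product(levels, levels)))   # 16 points (s^c_k, s^s_k)
print(np.mean(np.sum(points**2, axis=1)))          # 2.5 * d^2 = 10.0 for d = 2
```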


Figure 15.5: Two QAM signal structures, (a) 16-QAM, (b) 32-QAM. Note that 32-QAM is not square!

15.9 Exercises<br />

1. A signal structure must contain M signal points. These points are to be chosen on an integer

grid.<br />

Now consider a hypercube and a hypersphere in N dimensions. Both have their center in<br />

the origin of the coordinate system.<br />

We can either choose our signal structure as all the grid points inside the sphere or all the<br />

grid points inside the cube. Assume that the dimensions of the sphere and cube are such<br />

that they contain equally many signal points.<br />

Let M ≫ 1 so that the signal points can be assumed to be uniformly distributed over the<br />

sphere and the cube.<br />

(a) Let N = 2. Find the ratio between the average signal energies of the sphere-structure<br />

and that of the cube-structure. What is the difference in dB?<br />

(b) Let N be even. What is then the ratio for N → ∞? In dB?<br />

Hint: The volume of an N-dimensional sphere is π N/2 R N /(N/2)! for even N.<br />

The computed ratios are called shaping gains.


Chapter 16<br />

Random carrier-phase<br />

SUMMARY: Here we consider carrier modulation under the assumption that the carrier phase is not known to the receiver. Only one baseband signal s^b(t) will be transmitted. First we determine the optimum receiver for this case. Then we consider the implementation of the optimum receiver when all signals have equal energy. We investigate envelope detection and work out an example in which one out of two orthogonal signals is transmitted.

16.1 Introduction<br />

Because of oscillator drift or differences in propagation time it is not always reasonable to assume that the receiver knows the phase of the carrier wave, for instance when quadrature multiplexing is used. In that case coherent demodulation is not possible.

To investigate this situation we assume that the transmitter modulates with the baseband signal¹ s^b(t) a wave with a random phase θ, i.e.

s(t) = s^b(t) √2 cos(2π f_0 t − θ),   (16.1)

where the random phase is assumed to be uniform over [0, 2π), hence

p_Θ(θ) = 1/(2π), for 0 ≤ θ < 2π.   (16.2)

More precisely, for each message m ∈ M, let the signal s^b_m(t) correspond to the vector s_m = (s_{m1}, s_{m2}, ..., s_{mN}), hence

s^b_m(t) = ∑_{i=1,N} s_{mi} φ_i(t).   (16.3)

We assume as before that the spectrum of the signals s^b_m(t) for all m ∈ M is zero outside [−W, W]. Note that the same holds for the building-block waveforms φ_i(t) for i = 1, N. This is since the building-block waveforms are linear combinations of the signals. If we now use the goniometric identity cos(a − b) = cos a cos b + sin a sin b we obtain

s_m(t) = s^b_m(t) √2 cos(2π f_0 t − θ)
       = s^b_m(t) cos θ √2 cos(2π f_0 t) + s^b_m(t) sin θ √2 sin(2π f_0 t)
       = ∑_{i=1,N} s_{mi} cos θ φ_i(t) √2 cos(2π f_0 t) + ∑_{i=1,N} s_{mi} sin θ φ_i(t) √2 sin(2π f_0 t).   (16.4)

¹ Note that there is only one baseband signal involved here. The type of modulation is called double sideband suppressed carrier (DSB-SC) modulation.

If we now turn to the vector approach we can say that, although θ is unknown, this is equivalent<br />

to quadrature multiplexing where vector-signals s c m and ss m are transmitted satisfying<br />

s^c_m = s_m cos θ,
s^s_m = s_m sin θ.   (16.5)

After receiving the signal r(t) = s_m(t) + n_w(t) an optimum receiver forms the vectors r^c and r^s for which

r^c = s^c_m + n^c = s_m cos θ + n^c,
r^s = s^s_m + n^s = s_m sin θ + n^s,   (16.6)

where both n c and n s are independent Gaussian vectors with independent components all having<br />

mean zero and variance N 0 /2. In the next section we will further determine the optimum receiver<br />

for this situation.<br />

16.2 Optimum incoherent reception<br />

To determine the optimum receiver we first write out the decision variable for message m, i.e.

Pr{M = m} p_{R^c,R^s}(r^c, r^s | M = m) = Pr{M = m} ∫_0^{2π} p_Θ(θ) p_{R^c,R^s}(r^c, r^s | M = m, Θ = θ) dθ.   (16.7)

Note that we average over the unknown phase θ. Assume first that all messages are equally likely. Therefore only the integral is relevant. The informative term inside the integral is

p_{R^c,R^s}(r^c, r^s | M = m, Θ = θ) = p_{N^c,N^s}(r^c − s_m cos θ, r^s − s_m sin θ)
  = (1/(πN_0))^N exp( −(‖r^c − s_m cos θ‖² + ‖r^s − s_m sin θ‖²)/N_0 ).   (16.8)

Therefore we investigate

‖r^c − s_m cos θ‖² + ‖r^s − s_m sin θ‖²
  = ‖r^c‖² − 2(r^c · s_m) cos θ + ‖s_m‖² cos² θ + ‖r^s‖² − 2(r^s · s_m) sin θ + ‖s_m‖² sin² θ
  = ‖r^c‖² + ‖r^s‖² − 2(r^c · s_m) cos θ − 2(r^s · s_m) sin θ + ‖s_m‖²,   (16.9)


and note that both ‖r^c‖² and ‖r^s‖² do not depend on the message m and can be ignored. Therefore the relevant part of the decision variable is

∫_θ p_Θ(θ) exp( (2/N_0) [(r^c · s_m) cos θ + (r^s · s_m) sin θ] ) dθ · exp(−E_m/N_0),   (16.10)

where E_m = ‖s_m‖². Note that we have to maximize the decision variable over all m ∈ M.

We next consider (r^c · s_m) cos θ + (r^s · s_m) sin θ. Therefore, for each m ∈ M, we first make a transformation of the matched-filter outputs (r^c · s_m) and (r^s · s_m) from rectangular into polar coordinates (amplitude X_m and phase γ_m). Define (see figure 16.1)

X_m = √((r^c · s_m)² + (r^s · s_m)²),   (16.11)

and let the angle γ_m be such that

(r^c · s_m) = X_m cos γ_m,
(r^s · s_m) = X_m sin γ_m.   (16.12)

Figure 16.1: Polar transformation of matched-filter outputs.

Therefore

(r^c · s_m) cos θ + (r^s · s_m) sin θ = X_m cos γ_m cos θ + X_m sin γ_m sin θ = X_m cos(θ − γ_m),   (16.13)

and

∫_0^{2π} p_Θ(θ) exp( (2/N_0) [(r^c · s_m) cos θ + (r^s · s_m) sin θ] ) dθ
  = ∫_0^{2π} (1/(2π)) exp( (2X_m/N_0) cos(θ − γ_m) ) dθ
  (a)= ∫_0^{2π} (1/(2π)) exp( (2X_m/N_0) cos θ ) dθ = I_0(2X_m/N_0).   (16.14)

Here (a) follows from the periodicity of the cosine. Note that therefore the angle γ m is not<br />

relevant! The integral denoted by I 0 (·) is called the “zero-order modified Bessel function of the<br />

first kind” (see figure 16.2).


Figure 16.2: A plot of I_0(x) = (1/(2π)) ∫_0^{2π} exp(x cos θ) dθ as a function of x.

RESULT 16.1 Combining everything we obtain that the optimum receiver for "random-phase transmission" (incoherent detection) has to choose the message m ∈ M that maximizes

I_0(2X_m/N_0) · exp(−E_m/N_0).   (16.15)

16.3 Signals with equal energy, receiver implementation

If all the signals have the same energy the optimum receiver only has to maximize I_0(2X_m/N_0). However, since I_0(x) is a monotonically increasing function of x this is equivalent to maximizing X_m or its square

X_m² = (r^c · s_m)² + (r^s · s_m)².   (16.16)

How can we now determine e.g. (r c · s m )? We can use the standard approach as shown in<br />

figure 15.4 and correlate r(t) with the bandpass building-block waveforms φ i (t) √ 2 cos(2π f 0 t)<br />

to obtain the components of r c and then form the dot product, but there is also a more direct way.


Note that

(r^c · s_m) = ∑_{i=1,N} r^c_i s_{mi} = ∑_{i=1,N} ( ∫_{−∞}^{∞} r(t) φ_{c,i}(t) dt ) s_{mi}
  = ∑_{i=1,N} ( ∫_{−∞}^{∞} r(t) φ_i(t) √2 cos(2π f_0 t) dt ) s_{mi}
  = ∫_{−∞}^{∞} r(t) √2 cos(2π f_0 t) ∑_{i=1,N} s_{mi} φ_i(t) dt
  = ∫_{−∞}^{∞} r(t) √2 cos(2π f_0 t) s^b_m(t) dt.   (16.17)

A similar result holds for the product (r s · s m ). All this suggests the implementation shown in<br />

figure 16.3. As usual there is also a matched-filter version of this incoherent receiver.<br />

Figure 16.3: Correlation receiver for equal-energy signals with random phase.



16.4 Envelope detection<br />

Fix an m ∈ M. The matched filter for the (bandpass) signal s_m(t) = s^b_m(t)√2 cos(2π f_0 t − θ) has impulse response s_m(T − t) = s^b_m(T − t)√2 cos(2π f_0 (T − t) − θ). Here it is assumed that the signal s_m(t) is zero outside [0, T]. Since the phase θ of the transmitted signal is unknown to the receiver, the impulse response

h_m(t) = s^b_m(T − t) √2 cos(2π f_0 t)   (16.18)

is probably a good matched filter. Let's investigate this!

The response of this filter to r(t) is

u_m(t) = ∫_{−∞}^{∞} r(α) h_m(t − α) dα
       = ∫_{−∞}^{∞} r(α) s^b_m(T − t + α) √2 cos(2π f_0 (t − α)) dα
       = ∫_{−∞}^{∞} r(α) s^b_m(T − t + α) √2 [cos(2π f_0 t) cos(2π f_0 α) + sin(2π f_0 t) sin(2π f_0 α)] dα
       = cos(2π f_0 t) ∫_{−∞}^{∞} r(α) s^b_m(T − t + α) √2 cos(2π f_0 α) dα
         + sin(2π f_0 t) ∫_{−∞}^{∞} r(α) s^b_m(T − t + α) √2 sin(2π f_0 α) dα
       = u^c_m(t) cos(2π f_0 t) + u^s_m(t) sin(2π f_0 t),   (16.19)

with

u^c_m(t) = ∫_{−∞}^{∞} r(α) √2 cos(2π f_0 α) s^b_m(T − t + α) dα,
u^s_m(t) = ∫_{−∞}^{∞} r(α) √2 sin(2π f_0 α) s^b_m(T − t + α) dα.   (16.20)

The signals u^c_m(t) and u^s_m(t) are baseband signals. Why? The reason is that we can regard e.g. u^c_m(t) as the output of the filter with impulse response s^b_m(T − t) when the excitation is r(t)√2 cos(2π f_0 t). It is clear that signal components outside the frequency band [−W, W] cannot pass the filter s^b_m(T − t), since s^b_m(t) is a baseband signal.

We now use the trick of section 16.2 again. Write the matched-filter output as

u_m(t) = X_m(t) cos[2π f_0 t − γ_m(t)],   (16.21)

where

X_m(t) = √((u^c_m(t))² + (u^s_m(t))²)   (16.22)

and the angle γ_m(t) is such that

u^c_m(t) = X_m(t) cos γ_m(t),
u^s_m(t) = X_m(t) sin γ_m(t).   (16.23)


Now X_m(t) is a slowly-varying "envelope" (amplitude) and γ_m(t) a slowly-varying "phase" of the output u_m(t) of the matched filter. By slowly-varying we mean within bandwidth [−W, W]. Next let us consider what happens at t = T. Observe from equations (16.17) that

u^c_m(T) = (r^c · s_m),
u^s_m(T) = (r^s · s_m),   (16.24)

hence

X_m(T) = √((r^c · s_m)² + (r^s · s_m)²).   (16.25)

Therefore, for equal-energy signals, we can construct an optimum receiver by sampling the<br />

envelope of the outcomes of the matched filters h m (t) = s b m (T −t)√ 2 cos(2π f 0 t) for all m ∈ M<br />

and comparing the samples. This leads to the implementation shown in figure 16.4.<br />

Figure 16.4: Envelope-detector receiver for equal-energy signals with random phase.

Note that instead of using a matched filter with (passband) impulse response sm b (T − t)√ 2<br />

cos(2π f 0 t) we can first multiply the received signal r(t) by cos(2π f 0 t) and then use a matched<br />

filter with (baseband) impulse response sm b (T − t).<br />

Example 16.1 For some m ∈ M assume that sm b (t) = 1 for 0 ≤ t ≤ 1 and zero elsewhere. If the random<br />

phase turns out to be θ = π/2 then<br />

s m (t) = s b m (t)√ 2 cos(2π f 0 t − π/2) = s b m (t)√ 2 sin(2π f 0 t). (16.26)<br />

For the matched-filter impulse response, noting that T = 1, we can write<br />

h m (t) = s b m (1 − t)√ 2 cos(2π f 0 t). (16.27)


If we assume that there is no noise, hence r(t) = s_m(t), the output u_m(t) of the matched filter will be

u_m(t) = ∫_{−∞}^{∞} r(α) h_m(t − α) dα
       = ∫_{−∞}^{∞} s^b_m(α) √2 sin(2π f_0 α) s^b_m(1 − t + α) √2 cos(2π f_0 (t − α)) dα.   (16.28)

All these signals are shown in figure 16.5 for f 0 = 7Hz.<br />

Note that the envelope of u m (t) has the triangular shape which is the output of a matched filter for a<br />

rectangular pulse if the input of this matched filter is this rectangular pulse.<br />
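The example can also be reproduced numerically. The sketch below (not part of the original notes; the discretization is assumed, and the envelope is obtained from the analytic signal rather than from explicit quadrature filters) builds the signals of example 16.1, convolves with h_m(t), and samples the envelope at t = T.

```python
import numpy as np
from scipy.signal import hilbert

# Numerical version of example 16.1 (assumed grid): rectangular s^b_m(t) on [0,1],
# theta = pi/2, f0 = 7 Hz, no noise.  The envelope X_m(t) of u_m(t) at t = T should be ~1.
dt, f0, T = 1e-3, 7.0, 1.0
t = np.arange(-0.5, 2.5, dt)
sb = ((t >= 0) & (t <= T)).astype(float)                 # s^b_m(t); note s^b_m(1 - t) = s^b_m(t) here
r = sb * np.sqrt(2) * np.sin(2 * np.pi * f0 * t)         # received signal, random phase pi/2
h = sb * np.sqrt(2) * np.cos(2 * np.pi * f0 * t)         # matched filter h_m(t) of (16.27)

u = np.convolve(r, h) * dt                               # u_m(t), matched-filter output
tu = 2 * t[0] + np.arange(len(u)) * dt                   # time axis of the convolution
X = np.abs(hilbert(u))                                   # envelope X_m(t) via the analytic signal
print(round(X[np.argmin(np.abs(tu - T))], 3))            # ~1.0 although the filter phase is "wrong"
```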

Figure 16.5: Signals s^b_m(t), s_m(t), and u_m(t), and impulse response h_m(t).

16.5 Probability of error for two orthogonal signals<br />

An incoherent receiver does not consider phase information. Therefore it cannot yield as small an error probability as a coherent receiver. To illustrate this we will now calculate the probability of error for white Gaussian noise when one of two equally likely messages is communicated over a system utilizing an incoherent receiver and DSB-SC modulated equal-energy orthogonal baseband signals

s^b_1(t) = √E_s φ_1(t), hence s_1 = (√E_s, 0), and
s^b_2(t) = √E_s φ_2(t), hence s_2 = (0, √E_s).   (16.29)


An optimum receiver (see result 16.1) now chooses m̂ = 1 if and only if

(r^c · s_1)² + (r^s · s_1)² > (r^c · s_2)² + (r^s · s_2)².   (16.30)

Here all vectors are two-dimensional, more precisely

r^c = s_m cos θ + n^c,
r^s = s_m sin θ + n^s.   (16.31)

When M = 1 then the vector components of r^c = (r^c_1, r^c_2) and r^s = (r^s_1, r^s_2) are

r^c_1 = √E_s cos θ + n^c_1,
r^s_1 = √E_s sin θ + n^s_1,
r^c_2 = n^c_2,
r^s_2 = n^s_2,   (16.32)

where n^c = (n^c_1, n^c_2) and n^s = (n^s_1, n^s_2). Now by s_1 = (√E_s, 0) and s_2 = (0, √E_s) the optimum receiver decodes m̂ = 1 if and only if

(r^c_1)² + (r^s_1)² > (r^c_2)² + (r^s_2)².   (16.33)

The noise components are all statistically independent Gaussian variables with density function

p_N(n) = (1/√(πN_0)) exp(−n²/N_0).   (16.34)

Now fix some θ. Assume that (r^c_1)² + (r^s_1)² = ρ². What is now the probability of error given that M = 1 and Θ = θ? Therefore consider

Pr{M̂ = 2 | Θ = θ, M = 1, (R^c_1)² + (R^s_1)² = ρ²} = Pr{(R^c_2)² + (R^s_2)² ≥ ρ²}
  = ∫∫_{α²+β² ≥ ρ²} p_N(α) p_N(β) dα dβ
  = ∫∫_{α²+β² ≥ ρ²} (1/(πN_0)) exp(−(α² + β²)/N_0) dα dβ
  = ∫_ρ^{∞} ∫_0^{2π} (1/(πN_0)) exp(−r²/N_0) r dθ dr
  = exp(−ρ²/N_0).   (16.35)


We obtain the error probability for the case where M = 1 by averaging over all R^c_1 and R^s_1, hence

Pr{M̂ = 2 | Θ = θ, M = 1}
  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} p_{R^c_1,R^s_1}(r^c_1, r^s_1 | Θ = θ, M = 1) exp(−((r^c_1)² + (r^s_1)²)/N_0) dr^c_1 dr^s_1
  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} p_{R^c_1}(r^c_1 | Θ = θ, M = 1) p_{R^s_1}(r^s_1 | Θ = θ, M = 1) exp(−((r^c_1)² + (r^s_1)²)/N_0) dr^c_1 dr^s_1
  = ∫_{−∞}^{∞} p_{R^c_1}(r^c_1 | Θ = θ, M = 1) exp(−(r^c_1)²/N_0) dr^c_1 · ∫_{−∞}^{∞} p_{R^s_1}(r^s_1 | Θ = θ, M = 1) exp(−(r^s_1)²/N_0) dr^s_1.   (16.36)

Consider the first factor. Note that r^c_1 = √E_s cos θ + n^c_1. Therefore

∫_{−∞}^{∞} p_{R^c_1}(r^c_1 | Θ = θ, M = 1) exp(−(r^c_1)²/N_0) dr^c_1
  = ∫_{−∞}^{∞} (1/√(πN_0)) exp(−(α − √E_s cos θ)²/N_0) exp(−α²/N_0) dα.   (16.37)

With m = √E_s cos θ this integral becomes

∫_{−∞}^{∞} (1/√(πN_0)) exp(−(α − m)²/N_0) exp(−α²/N_0) dα
  = ∫_{−∞}^{∞} (1/√(πN_0)) exp(−(2α² − 2αm + m²)/N_0) dα
  = ∫_{−∞}^{∞} (1/√(πN_0)) exp(−(2(α² − αm + m²/4) + m²/2)/N_0) dα
  = (exp(−m²/(2N_0))/√2) ∫_{−∞}^{∞} (1/√(πN_0/2)) exp(−(α − m/2)²/(N_0/2)) dα
  = exp(−m²/(2N_0))/√2 = exp(−E_s cos² θ/(2N_0))/√2.   (16.38)

Combining this with a similar result for the second factor we obtain<br />

Pr{M̂ = 2 | Θ = θ, M = 1} = [exp(−E_s cos² θ/(2N_0))/√2] · [exp(−E_s sin² θ/(2N_0))/√2] = (1/2) exp(−E_s/(2N_0)).   (16.39)

Although we have fixed θ, this probability is independent of θ. Averaging over θ therefore yields

Pr{M̂ = 2 | M = 1} = (1/2) exp(−E_s/(2N_0)).   (16.40)

Based on symmetry we can obtain a similar result for Pr{ ˆM = 1|M = 2}.


RESULT 16.2 Therefore we obtain for the error probability of an incoherent receiver for two equally likely orthogonal signals, both having energy E_s, that

P_E = (1/2) exp(−E_s/(2N_0)).   (16.41)

Note that for coherent reception of two equally likely orthogonal signals we have obtained before that

P_E = Q(√(E_s/N_0)) ≤ (1/2) exp(−E_s/(2N_0)).   (16.42)

Here we have used the bound on the Q-function that was derived in appendix A. Figure 16.6<br />

shows that the error probabilities are comparable however.<br />

Figure 16.6: Probability of error for coherent and incoherent reception of two equally likely orthogonal signals of energy E_s as a function of E_s/N_0 in dB. Note that the probability of error for incoherent reception is the larger of the two.

Another disadvantage of incoherent transmission of two orthogonal signals is that we lose 3 dB relative to antipodal signaling. Also the bandwidth efficiency of random-phase transmission of two orthogonal signals is a factor of two smaller than that of coherent transmission: a transmitted dimension spreads out over two received dimensions.
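The comparison of figure 16.6 can be tabulated with a few lines of code. The sketch below (not part of the original notes; the E_s/N_0 grid is assumed, and the Q-function is evaluated via the Gaussian tail probability) prints both error probabilities.

```python
import numpy as np
from scipy.stats import norm

# Error probabilities for two equally likely orthogonal signals (assumed E_s/N_0 grid):
# coherent P_E = Q(sqrt(E_s/N_0)), incoherent P_E = 0.5 exp(-E_s/(2 N_0)), cf. (16.41)-(16.42).
N0 = 1.0
snr_db = np.arange(-15, 16, 5)
Es = N0 * 10.0 ** (snr_db / 10.0)

Pe_coherent = norm.sf(np.sqrt(Es / N0))        # Q(x) = P(N(0,1) > x)
Pe_incoherent = 0.5 * np.exp(-Es / (2 * N0))

for db, pc, pi in zip(snr_db, Pe_coherent, Pe_incoherent):
    print(f"{db:>4} dB   coherent {pc:.2e}   incoherent {pi:.2e}")
```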


16.6 Exercises

1. Let X and Y be two independent Gaussian random variables with common variance σ². The mean of X is m and Y is a zero-mean random variable. We define the random variable V as V = √(X² + Y²). Show that

p_V(v) = (v/σ²) I_0(mv/σ²) exp(−(v² + m²)/(2σ²)), for v > 0,   (16.43)

and 0 for v ≤ 0. Here I_0(·) is the modified Bessel function of the first kind and zero order. The distribution of V is known as the Rician distribution. In the special case where m = 0, the Rician distribution simplifies to the Rayleigh distribution.

(Exercise 3.31 from Proakis and Salehi [17].)<br />

2. In on-off keying of a carrier-modulated signal, the two possible signals are

s^b_1(t) = √E φ(t), hence s_1 = √E,
s^b_2(t) = 0, hence s_2 = 0.   (16.44)

Note that s_1 and s_2 are signals in one-dimensional "vector" notation. The signals are equiprobable, i.e. Pr{M = 1} = Pr{M = 2} = 1/2.

These signals are transmitted over a bandpass channel with a carrier with random phase θ with p_Θ(θ) = 1/(2π) for 0 ≤ θ < 2π. Power spectral density of the noise is N_0/2 for all frequencies.

An optimum receiver first determines r^c and r^s for which we can write

r^c = s_m cos θ + n^c,
r^s = s_m sin θ + n^s,   (16.45)

where both n^c and n^s are independent zero-mean Gaussian vectors with variance N_0/2.

Now let E/N_0 = 10 dB.

(a) Determine the optimum decision regions I_m for m = 1, 2.
Hint: Note that I_0(12.1571) = 22025.

(b) Determine the expected error probability that is achieved by the optimum receiver.
Hint: Note that ∫_0^{2.7184} x exp(−x²/2) I_0(x√20) dx = 635.72.

3. In on-off keying of a carrier modulated signal the two baseband signals are<br />

s b 1 (t) = Aφ(t), hence s 1 = A<br />

s b 2 (t) = 0, hence s 2 = 0.<br />

Here φ(t) is a baseband building-block waveform and s 1 and s 2 are the baseband signals<br />

in vector-representation (one dimensional). Both signals are equiprobable, hence Pr{M =<br />

1} = Pr{M = 2} = 1/2.


The baseband signals are modulated with a carrier having frequency f_0 and random phase Θ, hence

s_m(t) = s^b_m(t) √2 cos(2π f_0 t − θ).

The resulting signal s 1 (t) or s 2 (t) is transmitted over a bandpass channel, power spectral<br />

density of the additive white Gaussian noise is N 0 /2 for all frequencies f . Assume that<br />

A 2 /N 0 = ln(13/5).<br />

An optimum receiver correlates the received signal r(t) = s m (t) + n w (t) with the bandpass<br />

building-block waveforms φ(t) √ 2 cos(2π f 0 t) and φ(t) √ 2 sin(2π f 0 t) to obtain the<br />

received vector (r c , r s ).<br />

(a) Fix some θ. Express the received vector (r c , r s ) as a signal vector (s c , s s ) plus a<br />

noise vector (n c , n s ). What is the signal vector (s c , s s ) as a function of θ? What are<br />

the statistical properties of the noise vector (n c , n s )?<br />

(b) The random variable Θ can have the values +π/2 and −π/2 only, and Pr{Θ = +π/2} = Pr{Θ = −π/2} = 1/2. What are now the possible signal vectors (s^c, s^s)?

Give the decision variables as function of (r c , r s ) and specify 2 for what (r c , r s ) an<br />

optimum receiver decides ˆM = 2.<br />

(c) Give an expression for the error probability P E that is achieved by an optimum receiver.<br />

² Hint: note that x = ln 5 satisfies 26/5 = exp(x) + exp(−x).


Chapter 17<br />

Colored noise<br />

SUMMARY: So far we have only considered additive white Gaussian noise. In this chapter<br />

we will see that we should use a whitening filter if the channel noise is nonwhite. We will<br />

discuss the corresponding receiver structures.<br />

17.1 Introduction<br />

Figure 17.1: Communication over an additive waveform channel. The noise n(t) is nonwhite and Gaussian.

Consider figure 17.1. There the transmitter sends, depending on the message index m, one of<br />

the signals (waveforms) s 1 (t), s 2 (t), · · · , s |M| (t) over a waveform channel. The channel adds<br />

wide-sense-stationary zero-mean non-white Gaussian noise n(t) with power spectral density<br />

S n ( f ) to this waveform 1 . Therefore the output of the channel r(t) = s m (t) + n(t).<br />

How should an optimum receiver process this output signal r(t) now? We will show next<br />

that we can use a so-called whitening filter to make the colored noise white. Then we proceed as<br />

usual.<br />

17.2 Another result on reversibility<br />

We could try to construct an optimum receiver in two steps (see figure 17.2). First an operation is<br />

performed on the channel output r(t). This yields a new channel output r o (t). Then we construct<br />

an optimum receiver for the channel with input s_m(t) and output r^o(t).

¹ See appendix D.

Figure 17.2: Insertion of an operation between channel output and receiver.

It will be clear that such a two-step procedure cannot result in a smaller average error probability<br />

P E than an optimum one-step procedure would achieve.<br />

If an inverse operation exists which permits r(t) to be reconstructed from r^o(t), however, the theorem of reversibility (theorem 4.3) states that the average error probability need not be increased by a

two-step procedure. The second step could in that case consist of the calculation of r(t) from<br />

r o (t), followed by the optimum receiver for r(t). The theorem of reversibility in its most general<br />

form states:<br />

RESULT 17.1 A reversible operation, transforming r(t) into one or more waveforms, may be<br />

inserted between the channel output and the receiver without affecting the minimum attainable<br />

average error probability.<br />

17.3 Additive nonwhite Gaussian noise<br />

Suppose that we use a filter with impulse response g(t) and transfer function G( f ) to change<br />

the power spectral density S n ( f ) of the Gaussian noise process N(t). Consider figure 17.3. The<br />

filter produces the Gaussian noise process N o (t) at its output.<br />

From equation (D.6) in appendix D we know that<br />

S n o( f ) = S n ( f )G( f )G ∗ ( f ) = S n ( f )|G( f )| 2 . (17.1)<br />

If we use the filter g(t) to whiten the noise, i.e. to produce noise at the output with power spectral<br />

density S n o( f ) = N 0 /2, the filter should be such that S n ( f )|G( f )| 2 = N 0 /2. In appendix I we<br />

have shown how to determine a realizable linear filter with transfer function G( f ), which has a<br />

realizable inverse and for which<br />

|G(f)|² = (N_0/2) · 1/S_n(f). (17.2)

The method in the appendix is applicable whenever S_n(f) can be expressed (or approximated)

as a ratio of two polynomials in f . We will work out an example next.<br />

Figure 17.3: Filtering the noise process N(t).


Example 17.1 Consider (see figure 17.4) noise with power spectral density

S_n(f) = (f² + 4)/(f² + 1). (17.3)

Observe that we can take as transfer function of our whitening filter

G(f) = (f − j)/(f − 2j), (17.4)

then both G(f) and its inverse are realizable (see appendix I).

Figure 17.4: The power spectrum S_n(f) = (f² + 4)/(f² + 1) of the noise and the squared
modulus (f² + 1)/(f² + 4) of the transfer function G(f) of the corresponding whitening filter.
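The whitening condition S_n(f)|G(f)|² = N_0/2 can be checked numerically. The sketch below is a minimal Python illustration, assuming N_0/2 = 1 as implied in the example; it is not part of the derivation.

```python
import numpy as np

def S_n(f):
    """Noise power spectral density of example 17.1."""
    return (f**2 + 4.0) / (f**2 + 1.0)

def G(f):
    """Whitening filter G(f) = (f - j)/(f - 2j), so |G(f)|^2 = (f^2 + 1)/(f^2 + 4)."""
    return (f - 1j) / (f - 2j)

f = np.linspace(-10.0, 10.0, 2001)
flat = S_n(f) * np.abs(G(f))**2      # should equal N_0/2 = 1 for every f
assert np.allclose(flat, 1.0)
print(flat[:3])                       # [1. 1. 1.]
```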

17.4 Receiver implementation<br />

17.4.1 Correlation type receiver<br />

When the whitening filter g(t) has been determined we can construct the optimum receiver. Note<br />

that we actually place the whitening filter at the channel output r(t) as in the left part of figure<br />

17.5 but this is equivalent to the circuit containing two filters g(t) as in the right part of figure<br />

17.5. Therefore the effect of applying the whitening filter g(t) is that we have transformed the<br />

channel into an additive white Gaussian noise waveform channel and that the signals s m (t) are<br />

changed by the whitening filter g(t) into the signals<br />

s_m^o(t) = s_m(t) ∗ g(t) = ∫_{−∞}^{∞} s_m(α) g(t − α) dα   for m ∈ M. (17.5)


Figure 17.5: Two equivalent channel/whitening-filter circuits.

This observation leads to the optimum receiver shown in figure 17.6. This receiver correlates<br />

the output r o (t) of the whitening filter g(t) with the signals sm o (t) for all m ∈ M. The signals<br />

sm o (t) are obtained at the output of filters with impulse response g(t) when s m(t) is input. Note<br />

that the constants c_m^o for m ∈ M are now

c_m^o = (N_0/2) ln Pr{M = m} − (1/2) ∫_{−∞}^{∞} [s_m^o(t)]² dt, (17.6)

since the signal s_m(t) has changed into s_m^o(t).

17.4.2 Matched-filter type receiver

To determine the matched-filter type receiver here, we will do some frequency-domain investigations.<br />

Suppose that for some m ∈ M the Fourier spectrum of the waveform s m (t) is denoted by<br />

S m ( f ). Assume that s m (t) = 0 except for 0 ≤ t ≤ T . The transfer function of the corresponding<br />

matched filter h m (t) = s m (T − t) is then given by<br />

H_m(f) = ∫_{−∞}^{∞} s_m(T − t) exp(−j2πft) dt
       = ∫_{−∞}^{∞} s_m(α) exp(−j2πf(T − α)) dα
       = exp(−j2πfT) ∫_{−∞}^{∞} s_m(α) exp(j2πfα) dα = exp(−j2πfT) S_m^*(f), (17.7)

where S_m^*(f) is the complex conjugate of S_m(f).

Now what is the matched filter corresponding to the signal s_m^o(t)? First note that the spectrum
of s_m^o(t) is

S_m^o(f) = G(f) S_m(f). (17.8)

Suppose that g(t) is only non-zero for 0 ≤ t ≤ T_1. Moreover let all s_m(t) be zero outside the
time interval 0 ≤ t ≤ T_2. Then s_m^o(t) = ∫_{−∞}^{∞} s_m(α) g(t − α) dα can only be non-zero if an α


Figure 17.6: Correlation receiver for the additive nonwhite Gaussian noise channel (filtered-reference receiver).

exists such that 0 ≤ α ≤ T_2 and 0 ≤ t − α ≤ T_1, hence if 0 ≤ t ≤ T_1 + T_2. The transfer function
H_m^o(f) of the filter matched to s_m^o(t) should therefore be

H_m^o(f) = [S_m^o(f)]^* exp(−j2πf(T_1 + T_2))
         = G^*(f) S_m^*(f) exp(−j2πf(T_1 + T_2))
         = G^*(f) exp(−j2πfT_1) · S_m^*(f) exp(−j2πfT_2). (17.9)

This corresponds to a cascade of the filter g(T 1 − t) followed by the filter s m (T 2 − t), one for<br />

each m ∈ M. Thus the filter matched to sm o (t) consists of a part matched to the channel (thus<br />

to the whitening filter) and a part that is matched to the signal s m (t). This leads to the receiver<br />

structure in figure 17.7.<br />

It is interesting to see what the front-end of this receiver does. This consists of the cascade of<br />

the whitening filter g(t) and the part of the matched filter matched to the channel (the whitening<br />

filter). The transfer function of this cascade is<br />

G(f) G^*(f) exp(−j2πfT_1) = |G(f)|² exp(−j2πfT_1)
                          = (N_0/2)/S_n(f) · exp(−j2πfT_1). (17.10)

This cascade may be implemented as a single filter. The effect of this filter is to pass energy
over frequency bands where the noise power is small and to suppress energy over frequency
bands where it is strong.
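To make the filtered-reference receiver of figure 17.6 concrete, the following discrete-time sketch (Python; the whitening filter, the two signals and the noise model are made-up illustrative choices, not taken from the text) whitens the received samples, correlates them with the whitened reference signals, adds the constants c_m^o of (17.6), and selects the largest decision variable.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
N0 = 2.0                                  # noise level: sigma^2 = N0/2 = 1 per whitened sample
g = np.array([1.0, -0.5])                 # hypothetical whitening-filter impulse response (FIR)
signals = np.array([[1, 1, 1, 1, 0, 0],   # hypothetical discrete-time s_1 and s_2
                    [1, -1, 1, -1, 0, 0]], dtype=float)
priors = np.array([0.5, 0.5])

def decide(r):
    r_o = lfilter(g, [1.0], r)                            # whiten the received samples
    s_o = lfilter(g, [1.0], signals, axis=1)              # whitened reference signals s_m^o
    c = 0.5 * N0 * np.log(priors) - 0.5 * np.sum(s_o**2, axis=1)   # constants c_m^o of (17.6)
    metrics = s_o @ r_o + c                               # correlate and add the bias
    return int(np.argmax(metrics)) + 1                    # decided message index

m = 2
colored = lfilter([1.0], g, rng.normal(0.0, 1.0, signals.shape[1]))  # colored noise: white noise filtered by 1/G
r = signals[m - 1] + colored
print("decided message:", decide(r))
```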


Figure 17.7: Matched filter receiver for the additive nonwhite Gaussian noise channel (filtered-signal receiver). The matched-filter outputs are sampled at t = T_1 + T_2.

17.5 Exercises<br />

1. A communication system uses the {ϕ j (t)} shown in figure 17.8 to transmit one of two<br />

equally likely messages by means of the signals s 1 (t) = √ E s ϕ 1 (t), s 2 (t) = √ E s ϕ 2 (t).<br />

The channel is also illustrated in figure 17.8.<br />

(a) What is the probability of error for an optimum receiver when E s /N 0 = 5?<br />

(b) Give a detailed block diagram, including waveshapes in the absence of noise, of the<br />

optimum correlation receiver (filtered-signal receiver).<br />

(Exercise 7.1 from Wozencraft and Jacobs [25].)


Figure 17.8: Building-block waveforms ϕ_1(t) and ϕ_2(t), the communication system, and the channel impulse response h(t).


Chapter 18<br />

Capacity of the colored noise waveform<br />

channel<br />

SUMMARY: In this chapter we determine the capacity of an additive non-white Gaussian<br />

noise waveform channel. We have to find out how to distribute signal power over a number<br />

of parallel channels, each with their own noise power. These channels partition the<br />

spectrum in different frequency bands. It turns out that a "water-filling" method achieves

capacity.<br />

18.1 Problem statement<br />

Figure 18.1: Communication system based on an additive non-white Gaussian noise channel.

Consider the communication system shown in figure 18.1. The channel adds stationary zero-mean
Gaussian noise to the input waveform s(t). The power spectral density S_n(f) of the noise
process N(t) is assumed to be non-white, see figure 18.2. Our problem is to determine the
capacity of this channel when the available transmitter power is P_s. We will solve this problem

by decomposing the colored-noise channel into sub-channels. Each of these sub-channels is a<br />

bandpass channel. We then have to find out how the signal power P s is distributed over these<br />

sub-channels in an optimal way, i.e. to maximize the total capacity of all the sub-channels.<br />

Figure 18.2: Power spectral density S_n(f) of colored noise.

Figure 18.3: Decomposition of our channel into sub-channels, each having bandwidth Δf.

18.2 The sub-channels<br />

Suppose that we divide our channel into K sub-channels, each sub-channel being a bandpass
channel (see figure 18.3). We assume that channel k has a passband ±[(k − 1)Δf, kΔf]
for k = 1, . . . , K. For the moment we assume that Δf and K are suitably chosen.

Assume that the transmitter power used in sub-channel k is P_k. The spectral density of the noise
is not constant within the passband of a sub-channel. Therefore we make a staircase approximation
of the spectral density (see figure 18.3). We assume that the spectral density of the noise over
the entire passband of sub-channel k is S_n(f_k), where f_k = (k − 1/2)Δf is the center frequency
of the passband. Now the capacity of sub-channel k is

C_k = Δf log₂(1 + P_k / (S_n(f_k) · 2Δf))   bit/second. (18.1)

This is a consequence of result 15.2. In the next section we investigate how we should distribute

the signal power over all the sub-channels.


Figure 18.4: K parallel channels with different noise powers.

18.3 Capacity of parallel channels, water-filling<br />

Consider K parallel channels (see figure 18.4). Each channel has the same bandwidth Δf.
We assume that the signal power used in channel k is P_k. The noise power in that channel is
N_k = S_n(f_k) · 2Δf. The total available signal power is P_s, hence we obtain the condition

∑_{k=1,…,K} P_k ≤ P_s. (18.2)

Our problem is to distribute the signal power P_s over all the sub-channels such that the total
capacity is maximal. In other words

C = max_{P_1,P_2,···,P_K} ∑_{k=1,…,K} Δf log₂(1 + P_k/N_k), (18.3)

where condition (18.2) should be satisfied. We can solve this maximization problem using the<br />

following result.<br />

RESULT 18.1 (Gallager [7], p. 87). Let f(α) be a convex-∩ function of α = (α_1, α_2, · · · , α_K)
over the region R of probability vectors α. Assume that the partial derivatives ∂f(α)/∂α_k
are defined and continuous over the region R, with the possible exception that lim_{α_k→0} ∂f(α)/∂α_k
may be +∞. Then the conditions

∂f(α)/∂α_k = λ   for all k such that α_k > 0, (18.4)
∂f(α)/∂α_k ≤ λ   for all k such that α_k = 0, (18.5)



are necessary and sufficient conditions on a probability vector α to maximize f (α) over the<br />

region R.<br />

To find the capacity of a collection of K parallel channels we take the function

f(α) = ∑_{k=1,…,K} Δf log₂(1 + α_k P_s / N_k), (18.6)

i.e. the total capacity C. This is a convex-∩ function of the probability vectors α. R is the region
of all these vectors. The partial derivatives of f(α) are

∂f(α)/∂α_k = (Δf / ln 2) · P_s / (N_k + α_k P_s). (18.7)

Result 18.1 now implies that the probability vector α achieves the maximum if and only if

N_k + α_k P_s = L   for all k such that α_k > 0,
N_k ≥ L             for all k such that α_k = 0, (18.8)

for some power level L. This suggests the following procedure:

RESULT 18.2 The procedure is called "water-filling" (see figure 18.5). Before the water-filling
procedure begins, each channel is filled with noise power only: channel k contains noise power
N_k. Then we start pouring signal power into the channels. The signal power starts filling the
channel with the smallest noise power first, i.e. channel 1 here. When the water level reaches the
second-smallest noise power level N_2 and we keep pouring, both channels 1 and 2 are being
filled, etc. Observe that depending on the total amount of signal power some channels will not
get signal power at all (channel 4 here).

Figure 18.5: Water-filling with four parallel channels; the noise powers N_1, . . . , N_4 and the
signal powers α_1 P_s, α_2 P_s, α_3 P_s fill the channels up to the common level L.

This procedure leads to the optimal distribution of signal power P k = α k P s over all the<br />

channels. From these powers the total capacity C can be computed.<br />
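A small numerical sketch of this water-filling procedure (illustrative Python; the noise powers and the total power below are made-up values) finds the level L by bisection and then evaluates (18.3).

```python
import numpy as np

def water_fill(noise_powers, total_power):
    """Return the per-channel signal powers P_k that maximize sum log2(1 + P_k/N_k)."""
    N = np.asarray(noise_powers, dtype=float)
    lo, hi = N.min(), N.max() + total_power
    for _ in range(100):                     # bisection on the water level L
        L = 0.5 * (lo + hi)
        P = np.maximum(L - N, 0.0)           # fill channels with N_k < L up to level L
        if P.sum() > total_power:
            hi = L
        else:
            lo = L
    return np.maximum(lo - N, 0.0)

N_k = np.array([1.0, 2.0, 3.0, 8.0])         # hypothetical noise powers N_1..N_4
P_s = 5.0                                    # hypothetical total signal power
P_k = water_fill(N_k, P_s)
delta_f = 1.0                                # bandwidth of each sub-channel (Hz)
C = delta_f * np.sum(np.log2(1.0 + P_k / N_k))
print(P_k, C)                                # channel 4 gets no power, as in figure 18.5
```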

18.4 Back to the non-white waveform channel again<br />

Remember that we have decomposed the non-white waveform channel into K sub-channels.
These sub-channels all have bandwidth Δf. We have seen how we should divide up the signal
power over these sub-channels (water-filling procedure, result 18.2).

Now, instead of specifying the signal power in each sub-channel, it is advantageous to consider
the power spectral density P_s(f_k) of the signal in each sub-channel k, hence

P_s(f_k) = P_k / (2Δf) = α_k P_s / (2Δf). (18.9)

Moreover note that we can express the noise power in sub-channel k as N_k = S_n(f_k) · 2Δf. Therefore
a signal power distribution {P_s(f_k), k = 1, . . . , K} achieves the maximum if and only if

S_n(f_k) + P_s(f_k) = L′   for all k such that P_s(f_k) > 0,
S_n(f_k) ≥ L′              for all k such that P_s(f_k) = 0, (18.10)

for some power level L′ such that

∑_{k=1,…,K} P_s(f_k) · 2Δf = P_s. (18.11)

Note that L′ relates to the power level L in the previous section as L′ · 2Δf = L.

Now we are ready to let the bandwidth of the sub-channels Δf ↓ 0 and the number of
sub-channels K → ∞. This leads to the main result in this chapter.

RESULT 18.3 A spectral density function P_s(f) of the signal power achieves the maximum if
and only if

S_n(f) + P_s(f) = L″   for all f such that P_s(f) > 0,
S_n(f) ≥ L″            for all f such that P_s(f) = 0, (18.12)

for some power level L″ such that

∫_{−∞}^{∞} P_s(f) df = P_s. (18.13)

This suggests a water-filling procedure in the spectral domain (see figure 18.6).


Figure 18.6: Water-filling in the spectral domain.

For the capacity of the non-white waveform channel we can now write

C = ∫_{f∈B} (1/2) log₂(1 + P_s(f)/S_n(f)) df
  = ∫_{f∈B} (1/2) log₂(L″/S_n(f)) df   bit/second, (18.14)

where

B = {f | P_s(f) > 0}. (18.15)

Note that L” is the power spectral density level that determines how the spectral power is distributed<br />

over the frequency spectrum.<br />
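The spectral form of water-filling can be approximated on a frequency grid. The sketch below is illustrative only: it uses the spectrum S_n(f) = |f| + 1 of exercise 1 in section 18.6 and an arbitrarily chosen power P_s, determines L″ by bisection, and evaluates the capacity integral (18.14) numerically.

```python
import numpy as np

f = np.linspace(-5.0, 5.0, 10001)           # frequency grid (Hz)
S_n = np.abs(f) + 1.0                       # example noise spectrum, as in exercise 1
P_s = 4.0                                   # hypothetical available signal power (Watt)

lo, hi = S_n.min(), S_n.max() + P_s
for _ in range(100):                        # bisection on the level L''
    L = 0.5 * (lo + hi)
    P_f = np.maximum(L - S_n, 0.0)          # P_s(f) = L'' - S_n(f) where positive, else 0
    if np.trapz(P_f, f) > P_s:
        hi = L
    else:
        lo = L

C = np.trapz(0.5 * np.log2(1.0 + P_f / S_n), f)   # equation (18.14), bit/second
print(L, C)                                 # here L'' = 3, band B = [-2, 2]
```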

18.5 Capacity of the linear filter channel<br />

Figure 18.7: Creating a flat channel changes the spectral density of the noise.

Suppose that the channel filters the input signal s(t) with a filter with impulse response h(t)<br />

and then adds (possibly non-white) Gaussian noise n(t) to the filter output s(t) ∗ h(t). Then,


at the receiver side, the overall channel can be made flat by filtering the channel output r(t) =
s(t) ∗ h(t) + n(t) with an equalizing filter with impulse response g(t) whose transfer function satisfies

G(f) H(f) = 1, (18.16)

thus g(t) is the inverse¹ filter to h(t). The effect of this filter is however that also the noise n(t)
is filtered by this equalizing filter, hence

r_o(t) = r(t) ∗ g(t)
       = (s(t) ∗ h(t) + n(t)) ∗ g(t)
       = s(t) ∗ h(t) ∗ g(t) + n(t) ∗ g(t) = s(t) + n(t) ∗ g(t). (18.17)

The power spectral density of the noise n_o(t) = n(t) ∗ g(t) in r_o(t) is now given by

S_{n_o}(f) = S_n(f) |G(f)|² = S_n(f) / |H(f)|². (18.18)

Note that from the previous sections we know how to determine the capacity of this channel now.<br />

18.6 Exercises<br />

1. Determine the capacity C of the channel with noise spectrum S n ( f ) = | f | + 1 depicted in<br />

figure 18.8 as a function of the available signal power P s .<br />

Figure 18.8: A noise spectrum S_n(f) = |f| + 1.

2. Consider a waveform channel with additive non-white Gaussian noise. The output waveform<br />

r(t) = s(t) + n(t) where s(t) is the input waveform and n(t) the noise.<br />

The power spectral density of the noise is given by S n ( f ) = exp(| f |) for −∞ < f < ∞<br />

(see figure 18.9). Determine the signal power P s as a function of the capacity C of this<br />

channel. As usual C is expressed in bit per second.<br />

(Exam July 4, 2001.)<br />

1 We assume here that this inverse filter is realizable. Therefore the theorem of reversibility 4.3 applies.


Figure 18.9: Power spectral density of the noise S_n(f) = exp(|f|).

3. Consider a waveform channel with additive non-white Gaussian noise. The output waveform<br />

r(t) = s(t) + n(t) where s(t) is the input waveform and n(t) the noise.<br />

The power spectral density of the noise (see figure 18.10) is given by<br />

S_n(f) = 1   for |f| ≤ 1,
         2   for 1 < |f| ≤ 2,
         4   for 2 < |f| ≤ 3,
         ∞   otherwise. (18.19)

Figure 18.10: Power spectral density of the noise S_n(f) in Watt/Hz as a function of the frequency f in Hz.

(Exam July 5, 2002.)


Part V<br />

Coded Modulation<br />



Chapter 19<br />

Coding for bandlimited channels<br />

SUMMARY: Modulation makes it possible to transmit sequences of symbols over a waveform<br />

channel. Here we will investigate coding, i.e. we will study which sequences of symbols<br />

should be used to achieve efficient transmission. We study only coding methods, i.e. techniques<br />

that create distance between sequences. Shaping techniques, i.e. methods that aim<br />

at achieving spherical signal structures will not be discussed.<br />

19.1 AWGN channel<br />

Consider transmission over an additive white Gaussian noise (AWGN) channel. For this channel<br />

Figure 19.1: Additive white Gaussian noise channel; the noise density is
p_N(n_i) = (1/√(πN_0)) exp(−n_i²/N_0).

the i-th channel output r i is equal to the i-th input s i to which noise n i is added, hence<br />

r i = s i + n i , for i = 1, 2, · · · . (19.1)<br />

We assume that the noise variables N i are independent and normally distributed, with mean 0<br />

and variance σ 2 = N 0 /2.<br />

This AWGN-channel models transmission over a waveform channel with additive white<br />

Gaussian noise with power density S Nw ( f ) = N 0 /2 for −∞ < f < ∞ as we have seen<br />

before (see section 6.8).<br />

Note that i is the index of the dimensions. If e.g. the waveform channel has bandwidth W ,<br />

we can get a new dimension every 1/(2W) seconds (see chapter 14). This determines the relation

between the actual time t and the dimension index i but we will ignore this in what follows.<br />


In this chapter we call a dimension a transmission. We assume that the energy “invested” in<br />

each transmission (dimension) is E N . It is our objective to investigate coding techniques for this<br />

channel. First we consider uncoded transmission.<br />

19.2 |A|-Pulse amplitude modulation<br />

In this section we will consider pulse amplitude modulation (PAM) again. In chapter 14 the modulation<br />

properties of this technique were discussed. Here we will investigate the error behavior<br />

of this method which can be regarded as uncoded transmission. We will use (uncoded) PAM as<br />

a reference for evaluating alternative signaling methods, methods that do involve coding.<br />

In the case of PAM, the channel input symbols s_i for i = 1, 2, · · · assume values in the
symbol alphabet (or signal structure or signal constellation)

A = {−|A| + 1, −|A| + 3, · · · , |A| − 1}, (19.2)

scaled by a factor d_0/2. We write

s_i = (d_0/2) a_i   where a_i ∈ A. (19.3)

We will here call a i an elementary PAM-symbol.<br />

The constellation A is called an |A|-PAM constellation. It contains |A| equidistant elementary<br />

symbols centered on the origin and having minimum Euclidean distance 2. Therefore d 0 is<br />

the minimum Euclidean distance between the symbols s i . We assume in this chapter that |A| is<br />

even. In figure 19.2 an |A|-PAM constellation is shown.<br />

Figure 19.2: An |A|-PAM constellation (|A| even) with the |A| elementary symbols
−|A| + 1, . . . , −5, −3, −1, 1, 3, 5, . . . , |A| − 1.

Each symbol value has a probability of occurrence of 1/|A|. Therefore the information rate<br />

per transmission is<br />

R N = log 2 |A| bits per transmission. (19.4)<br />

For the average symbol energy E_N per transmission we now have that

E_N = (d_0²/4) E_pam, (19.5)

where E_pam is the elementary energy of an |A|-PAM constellation, which is defined as

E_pam = (1/|A|) ∑_{a=−|A|+1,−|A|+3,··· ,|A|−1} a². (19.6)


RESULT 19.1 In appendix J it is shown that

E_pam = (|A|² − 1)/3. (19.7)

The result holds both for even and odd |A|.

It now follows from the above result that, since s_i = (d_0/2)a_i, the average symbol energy is

E_N = (d_0²/4) · (|A|² − 1)/3. (19.8)

Example 19.1 Consider the 4-PAM constellation

A = {−3, −1, +1, +3}. (19.9)

The rate corresponding to this constellation is R_N = log₂ 4 = 2 bits per transmission. For the elementary
energy per transmission we obtain

E_pam = (1/4)((−3)² + (−1)² + (+1)² + (+3)²) = (4² − 1)/3 = 5. (19.10)
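Result 19.1 is easy to verify numerically; a minimal sketch (illustrative only):

```python
def pam_energy(A_size):
    """Average energy of the elementary |A|-PAM symbols {-|A|+1, -|A|+3, ..., |A|-1}."""
    symbols = range(-A_size + 1, A_size, 2)
    return sum(a * a for a in symbols) / A_size

for A_size in (2, 3, 4, 8, 16):
    assert abs(pam_energy(A_size) - (A_size**2 - 1) / 3) < 1e-9   # equation (19.7)
print(pam_energy(4))   # 5.0, as in example 19.1
```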

Apart from the rate R_N and the average symbol energy E_N, a third important parameter is
the symbol error probability P_E^s. This error probability is Q((d_0/2)/σ) for the two outer signal points
and 2Q((d_0/2)/σ) for the inner signal points. Therefore

P_E^s = ((2(|A| − 2) + 2)/|A|) · Q((d_0/2)/σ) = (2(|A| − 1)/|A|) · Q((d_0/2)/σ). (19.11)

By (19.8) and the fact that σ² = N_0/2, the square of the argument of the Q-function can be
written as

d_0²/(4σ²) = 3/(|A|² − 1) · E_N/(N_0/2). (19.12)

Therefore we can express the symbol error probability as

P_E^s = (2(|A| − 1)/|A|) · Q(√( 3/(|A|² − 1) · E_N/(N_0/2) ))
      = (2(|A| − 1)/|A|) · Q(√( 3/(|A|² − 1) · S/N )), (19.13)

if we note that the signal-to-noise ratio S/N = E_N/(N_0/2), i.e. the average symbol energy
divided by the variance of the noise in a transmission.

19.3 Normalized S/N<br />

We now want to look more closely at signal-to-noise ratios. Therefore note that reliable transmission<br />

is only possible if the rate R N (in bits per transmission, dimension) is smaller than the


capacity C_N per dimension. If we write the capacity per transmission (see 12.19) as a function
of S/N = E_N/(N_0/2) we obtain

R_N < C_N = (1/2) log₂(1 + S/N). (19.14)

Rewriting this inequality in the reverse direction we obtain the following lower bound for
S/N as a function of the rate R_N:

S/N > 2^{2R_N} − 1 = S/N_min. (19.15)

Definition 19.1 This immediately suggests defining the normalized signal-to-noise ratio as

S/N_norm = S/N / S/N_min = S/N / (2^{2R_N} − 1). (19.16)

This definition allows us to compare coding methods with different rates with each other.<br />

Now it follows from (19.15) that for reliable transmission S/N norm > 1 must hold. Good<br />

signaling methods achieve small error probabilities for S/N norm close to 1 while lousy methods<br />

need a larger S/N norm to realize reliable transmission. It is the objective of the communication<br />

engineer to design systems that achieve acceptable error probabilities for a S/N norm as close as<br />

possible to 1.<br />

19.4 Baseline performance in terms of S/N norm<br />

In a previous section we have determined the symbol error probability of |A|-PAM as a function<br />

of S/N . This symbol error probability can be expressed in terms of S/N norm in a very simple<br />

way as we see soon.<br />

Since we are considering bandlimited transmission in this chapter, we may assume that
S/N ≫ 1. We are aiming at achieving

log₂ |A| = R_N ≈ C_N = (1/2) log₂(1 + S/N) (19.17)

or |A| ≈ √(1 + S/N), and hence we make the assumption that also |A| ≫ 1. Therefore we may
conclude for the symbol error probability for |A|-PAM that

P_E^s ≈ 2Q(√( 3/(|A|² − 1) · S/N ))
     = 2Q(√( 3/(2^{2R_N} − 1) · S/N )) = 2Q(√(3 · S/N_norm)) (19.18)

by definition 19.1 of the normalized signal-to-noise ratio. This symbol error probability for PAM<br />

is depicted in figure 19.3. It will serve as reference performance for the alternative methods that<br />

will be investigated soon. We call it therefore the baseline performance. Note once more that the<br />

baseline performance corresponds to uncoded PAM transmission.
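The baseline curve of figure 19.3 follows directly from (19.18). A short sketch (illustrative; Q is evaluated with scipy's normal survival function):

```python
import numpy as np
from scipy.stats import norm

def Q(x):
    """Tail of the standard normal distribution."""
    return norm.sf(x)

snr_norm_dB = np.arange(0.0, 10.0, 1.0)
snr_norm = 10.0 ** (snr_norm_dB / 10.0)
P_E = 2.0 * Q(np.sqrt(3.0 * snr_norm))        # baseline symbol error probability, eq. (19.18)
for d, p in zip(snr_norm_dB, P_E):
    print(f"{d:4.1f} dB   P_E^s = {p:.2e}")
# Around 9 dB the error probability drops to roughly 1e-6, as observed in section 19.5.
```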


Figure 19.3: Average symbol error probability P_E^s versus S/N_norm for uncoded |A|-PAM, under
the assumption that |A| is large. Horizontally S/N_norm in dB, vertically log₁₀ P_E^s.

19.5 The promise of coding<br />

It can be observed from figure 19.3 that a S/N norm of approximately 9 dB is needed to achieve<br />

a reliable performance when PAM is used. It is more or less common practice to assume that<br />

P_E^s ≈ 10⁻⁶ corresponds to reliable transmission.

Shannon’s results imply that reliable transmission is possible for S/N norm ≈ 1. Therefore<br />

for |A|-PAM the S/N norm is roughly 9 dB, a factor of eight, larger than necessary. There exist<br />

methods that perform up to 9 dB better than our baseline method, i.e. up to 9 dB better than<br />

uncoded PAM.<br />

Why do we prefer small signal-to-noise ratios? In a radio transmission system a smaller S/N<br />

gives the opportunity to use less transmitter power or to apply simpler and/or smaller antennas.<br />

Using less transmitter power makes smaller batteries possible, results in a smaller bill from the<br />

electricity distribution company, or makes dissipation in circuits lower, etc. In line transmission<br />

using smaller transmitter power is also advantageous for the same reasons. Moreover we have<br />

the effect of cross-talk between different users. We can not combat the noise that results from<br />

this cross-talk by increasing the transmitter power. In this case it is better to achieve a better error<br />

performance given the S/N that occurs there.



19.6 Union bound<br />

19.6.1 Sequence error probability<br />

In what follows we will investigate several coding methods and evaluate their performance. Since<br />

we want to compare the performance of the proposed codes with the baseline performance it<br />

would be nice to know the symbol error probabilities of the proposed codes. But since it is not<br />

clear yet what this means, we consider the (sequence) error probability P E of a coding method<br />

first. The union bound can be used to give a good estimate of this error probability. This bound<br />

is based on evaluating the pair-wise error probabilities between pairs of coded sequences.

On the AWGN channel the received sequence r = s + n is the sum of the transmitted coded<br />

sequence (signal) s and the noise vector n consisting of N independent components each having<br />

mean zero and variance σ 2 = N 0 /2. Now consider two coded sequences s ∈ S and s ′ ∈ S<br />

where S is the set of sequences (signals). They differ by the Euclidean distance ‖s − s ′ ‖. If the<br />

sequence s is transmitted, the probability that the received vector r = s + n will be closer to s ′<br />

than to s is given by<br />

Q( (‖s − s′‖/2) / σ ). (19.19)

The probability that r will be closer to s′ than to s for any s′ ≠ s in S, which is precisely the
probability of error P_E(s) for minimum-distance decoding when s is sent, is therefore upper-bounded
by

P_E(s) ≤ ∑_{s′≠s} Q( (‖s − s′‖/2) / σ ). (19.20)

The error probability over all coded sequences s is now upper-bounded by

P_E = ∑_{s∈S} (1/|S|) P_E(s) ≤ ∑_{s∈S} (1/|S|) ∑_{s′≠s} Q( (‖s − s′‖/2) / σ ) = ∑_d K_d Q( (d/2) / σ ), (19.21)

where

K_d = ∑_{s∈S} (# of s′ such that ‖s − s′‖ = d) / |S|, (19.22)

i.e. the average number of signals at distance d from a coded sequence. Inequality (19.21) is<br />

referred to as the union bound for the sequence error probability.<br />

The function Q(x) decays exponentially, not slower than exp(−x²/2). This follows from
appendix A. Therefore, when K_d does not increase too rapidly with d, for small σ the union
bound is dominated by its first term, which is called the union bound estimate:

P_E ≈ K_min Q( (d_min/2) / σ ), (19.23)

where d min is the minimum Euclidean distance between any two sequences in S and K min the<br />

average number of signals at distance d min from a coded sequence. K min is called the error<br />

coefficient.
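For a small signal set, d_min, K_min and the union bound estimate (19.23) can be computed by brute force. A minimal sketch (the four-signal set below is a made-up toy example):

```python
import numpy as np
from scipy.stats import norm

def union_bound_estimate(signals, sigma):
    """Return d_min, the error coefficient K_min, and the estimate (19.23) for a signal set."""
    S = np.asarray(signals, dtype=float)
    dists = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    d_min = dists.min()
    K_min = np.isclose(dists, d_min).sum(axis=1).mean()   # average number of nearest neighbors
    return d_min, K_min, K_min * norm.sf(d_min / 2.0 / sigma)

signals = [[1, 0], [-1, 0], [0, 1], [0, -1]]   # hypothetical biorthogonal signals in 2 dimensions
print(union_bound_estimate(signals, sigma=0.3))
```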



19.6.2 Symbol error probability<br />

To compare a coded system with the uncoded baseline system, we would like to know, instead<br />

of the sequence error probability P E for this coded system, the symbol error probability P s E .<br />

A first problem now appears: What are the symbols? Usually in a coded system a bunch of<br />

bits is transmitted in a single sequence during N transmissions. In the baseline system however,<br />

a symbol is the amount of information that is sent in a single transmission. Therefore in the<br />

coded case we assume also that in each transmission a symbol is sent. Such an abstract symbol<br />

corresponds to a fraction 1/N of the information bits. Thus the coded sequence contains N of<br />

these symbols.<br />

Can we now express the sequence error probability P_E in terms of the error probability P̃_E^s
for these symbols¹, and vice versa? Now we get a second problem: How do these symbol errors
depend on each other? To make life easy we assume that the N symbol errors are independent
of each other. Now we easily obtain

P_E = 1 − (1 − P̃_E^s)^N ≈ N P̃_E^s (19.24)

or

P̃_E^s ≈ P_E / N. (19.25)

If we now use the union bound estimate (19.23) obtained in the previous subsection we get

P̃_E^s ≈ P_E/N ≈ (K_min/N) Q( (d_min/2) / σ ). (19.26)

The ratio K min /N is called the normalized error coefficient (see [6], p. 2399). The symbol error<br />

probability P̃_E^s, which is also referred to as sequence error probability per transmission, can now

be used for comparison with the uncoded PAM case.<br />

19.7 Extended Hamming codes<br />

To construct sets of coded sequences we can make use of a binary error-correcting code. Such a<br />

code C is a set of |C| codewords {c 1 , c 2 , · · · , c |C| }. Each codeword c is a sequence consisting of<br />

N symbols c 1 , c 2 , · · · , c N taking values in the alphabet {0, 1}.<br />

A code can be linear. In that case all codewords are linear combinations of K independent<br />

codewords 2 . Such a linear code contains 2 K codewords.<br />

Definition 19.2 The minimum Hamming distance of a code is defined as

d_{H,min} = min_{c≠c′} d_H(c, c′), (19.27)

where d H (c, c ′ ) is the Hamming distance between two codewords c and c ′ , i.e. the number of<br />

positions at which they differ.<br />

1 We write ˜P E s with a tilde since we are not dealing with concrete symbols.<br />

2 The sum c of two codewords c ′ and c ′′ is the component-wise mod 2 sum of both codewords. A codeword c<br />

times a scalar is the codeword c itself if the scalar is 1 or the all-zero codeword 0, 0, · · · , 0 if the scalar is 0.


So-called extended Hamming codes exist and have parameters

(N, K, d_{H,min}) = (2^q, 2^q − q − 1, 4),   for q = 2, 3, · · · . (19.28)

We will use these codes in the next sections of this chapter, just because it is convenient to do so.<br />

Example 19.2 For q = 3 we obtain a code with parameters (N, K, d H,min ) = (8, 4, 4). Indeed this code<br />

can be constructed from the following K = 4 independent codewords of length N = 8:<br />

c_1 = (1, 0, 0, 0, 0, 1, 1, 1),
c_2 = (0, 1, 0, 0, 1, 0, 1, 1),
c_3 = (0, 0, 1, 0, 1, 1, 0, 1),
c_4 = (0, 0, 0, 1, 1, 1, 1, 0). (19.29)

The code consists of the 16 linear combinations of these 4 codewords. Why is the minimum Hamming<br />

distance d H,min of this code 4? To see this, note that the difference of two codewords c ̸= c ′ is a non-zero<br />

codeword itself. The Hamming distance between two codewords is the number of ones i.e. the Hamming<br />

weight of this difference. The minimal Hamming weight of a nonzero codeword is at least 4. This can be<br />

checked easily.<br />
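The check can indeed be done easily, for instance with the following sketch (illustrative Python) that generates all 16 codewords from (19.29) and determines the minimum Hamming weight:

```python
import numpy as np
from itertools import product

# The four independent codewords of (19.29); the (8,4,4) code is their linear span.
G = np.array([[1, 0, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1, 1, 0, 1],
              [0, 0, 0, 1, 1, 1, 1, 0]])

code = {tuple(np.dot(b, G) % 2) for b in product([0, 1], repeat=4)}
min_weight = min(sum(c) for c in code if any(c))
print(len(code), min_weight)   # 16 codewords, minimum Hamming weight (= d_H,min) 4
```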

19.8 Coding by set-partitioning<br />

19.8.1 Construction of the signal set<br />

The idea of “mapping by set-partitioning” comes from Ungerboeck [22]. He proposed to partition<br />

the constellation A into subsets A 0 and A 1 in such a way that the minimum distance of<br />

the elementary symbols in the subsets becomes larger than that of the elementary symbols in<br />

the original constellation. E.g. an 8-PAM constellation A can be partitioned into a subset A 0<br />

Figure 19.4: Partitioning of the 8-PAM set A = {−7, −5, −3, −1, 1, 3, 5, 7} into the two subsets
A_0 = {−7, −3, 1, 5} and A_1 = {−5, −1, 3, 7}, each containing four elementary symbols and
having a minimum within-subset distance twice as large as that of the original PAM set.

and a subset A 1 both containing four elementary symbols, and such that the minimum distance<br />

between two elementary symbols in A 0 or two symbols in A 1 , i.e. the minimum within-subset



distance, is 4, which is twice as large as the minimum distance between two elementary symbols<br />

in A, which is 2 (see figure 19.4). Note that the minimum distance between any symbol from A 0<br />

and any symbol from A 1 is still 2.<br />

How can set-partitioning lead to more reliable transmission? To see this, we consider the

example-encoder shown in figure 19.5. This encoder transforms 20 binary information digits<br />

b 1 , b 2 , · · · , b 20 into 8 elementary 8-PAM symbols a 1 , a 2 , · · · , a 8 . The digits b 1 , b 2 , b 3 , b 4 are<br />

Figure 19.5: An encoder that maps by set-partitioning.

transformed into a codeword c 1 , c 2 , · · · , c 8 from an (8,4,4) extended Hamming code. Digit c i<br />

together with two information digits b 2i+3 and b 2i+4 are used to form the PAM symbol a i . Therefore<br />

the following table can be used. The table lists a i for all combinations of c i and b 2i+3 , b 2i+4 .<br />

c_i    b_{2i+3} b_{2i+4} = 00    01    11    10
0                         −7    −3    +1    +5
1                         −5    −1    +3    +7

Note that according to this table essentially one out of the four elementary symbols from the set<br />

A_0 is chosen if c_i = 0, or one out of the four elementary symbols from A_1 if c_i = 1. The binary

digits b 2i+3 and b 2i+4 determine which symbol is picked.<br />
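The complete encoder of figure 19.5 then amounts to one codeword computation and eight table look-ups. A minimal sketch (illustrative; it reuses the generator of example 19.2, and the bit pattern in the last line is arbitrary):

```python
import numpy as np

G = np.array([[1, 0, 0, 0, 0, 1, 1, 1],      # generator of the (8,4,4) extended Hamming code
              [0, 1, 0, 0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1, 1, 0, 1],
              [0, 0, 0, 1, 1, 1, 1, 0]])
A0 = {(0, 0): -7, (0, 1): -3, (1, 1): +1, (1, 0): +5}   # subset used when c_i = 0
A1 = {(0, 0): -5, (0, 1): -1, (1, 1): +3, (1, 0): +7}   # subset used when c_i = 1

def encode(bits):
    """Map 20 information bits to 8 elementary 8-PAM symbols, as in figure 19.5."""
    b = list(bits)
    c = np.dot(b[:4], G) % 2                             # b1..b4 -> codeword c1..c8
    return [(A1 if c[i] else A0)[(b[4 + 2*i], b[5 + 2*i])] for i in range(8)]

print(encode([1, 0, 1, 1] + [0, 1] * 8))
```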

19.8.2 Distance between the signals<br />

How about the minimum distance d min between the signals s = s 1 , s 2 , · · · , s 8 that is achieved in<br />

this way? Note that there are 2 20 such signals. Just like before each signal (vector)<br />

s = (d 0 /2)a, (19.30)<br />

thus each signal s is a scaled version of an elementary signal (vector) a = a 1 , a 2 , · · · , a 8 . To determine<br />

the minimum Euclidean distance between two signals s ̸= s ′ we determine the minimum<br />

distance between two elementary signals a ̸= a ′ first. We consider two cases.



1. First consider all 2 16 elementary sequences a that result from a fixed codeword c from<br />

the extended Hamming code. For such a sequence each component a i ∈ A ci for i =<br />

1, 2, · · · , 8. The within-subset distance in subsets A 0 and A 1 is 4. Therefore the minimum<br />

distance between two elementary sequences a and a ′ resulting from the same codeword c<br />

is 4.<br />

2. Next consider two elementary sequences a and a ′ that correspond to two different codewords<br />

c and c ′ from the extended Hamming code. Then there are at least d H,min = 4<br />

positions i ∈ {1, 2, · · · , 8} at which c_i ≠ c_i′. Note that at these positions the elementary
symbol a_i ∈ A_{c_i} while the symbol a_i′ ∈ A_{c_i′}. The minimum distance between any symbol
from A_0 and any symbol from A_1 is 2, and thus at these positions |a_i − a_i′| ≥ 2 and hence

‖a − a′‖² = ∑_{i=1,…,8} (a_i − a_i′)² ≥ 4 · 4 = 16. (19.31)

Therefore the distance between two elementary sequences a and a ′ that correspond to two<br />

different codewords c and c ′ is at least 4.<br />

We conclude that the Euclidean distance between two elementary sequences a and a ′ is at least<br />

4. Therefore the Euclidean distance between two sequences s and s ′ is at least 4(d 0 /2) = 2d 0 .<br />

Hence d min = 2d 0 , which is a factor of two larger than in the uncoded case.<br />

19.8.3 Asymptotic coding gain<br />

We have seen in the previous section that the minimum distance between any two coded sequences<br />

for the Ungerboeck coding method is d_min = 2d_0. In uncoded PAM, our baseline method,

the distance between any two symbols was at least d 0 . We may conclude that by using the idea<br />

of Ungerboeck the minimum Euclidean distance has increased by a factor of two.<br />

Let us now compute the sequence error probability for our Ungerboeck coding technique. Is it<br />

indeed smaller than the baseline behavior for the symbol error probability for the uncoded PAM<br />

case that was described by (19.18)? To answer this, we first note that by the union bound estimate

(19.23) we can write for our coding method that<br />

P_E ≈ K_min Q( (d_min/2) / σ ). (19.32)

Now we express P_E in terms of E_N by noting that again

E_N = (d_0²/4) · (|A|² − 1)/3, (19.33)

where it is important to note that although we use coding, the PAM symbols remain equiprobable.<br />

Moreover<br />

d_min²/(4σ²) = (d_min²/d_0²) · d_0²/(4σ²) = (d_min²/d_0²) · 3/(|A|² − 1) · E_N/(N_0/2)
             = (d_min²/d_0²) · 3/(|A|² − 1) · S/N. (19.34)


Therefore

P_E = K_min Q(√( (d_min²/d_0²) · 3/(|A|² − 1) · S/N ))
    = K_min Q(√( (d_min²/d_0²) · (2^{2R_N} − 1)/(|A|² − 1) · 3 S/N_norm )), (19.35)

where R_N is the rate of the signal set S, i.e.

R_N = log₂|S| / N = K/N + log₂(|A|/2)
    = (2^q − q − 1)/2^q + log₂|A| − 1 = log₂|A| − (q + 1)/2^q. (19.36)

Note that the rate R N approaches log 2 |A| for q → ∞, i.e. if we increase the complexity of the<br />

extended Hamming code.<br />

The argument of the Q(√·)-function in terms of S/N_norm was 3 S/N_norm in the uncoded
PAM case. If we use Ungerboeck coding as we described here, it is
(d_min²/d_0²) · (2^{2R_N} − 1)/(|A|² − 1) · 3 S/N_norm, hence we have achieved an increase of the
argument by a factor G_cod^asymp of

G_cod^asymp = (d_min²/d_0²) · (2^{2R_N} − 1)/(|A|² − 1) = 4 · (2^{2R_N} − 1)/(|A|² − 1). (19.37)

This factor is called the asymptotic (or nominal) coding gain of our coding system with respect<br />

to the baseline performance.<br />

The factor d_min²/d_0² = 4 corresponds to the increase of the squared Euclidean distance
between the sequences. This factor is referred to as distance gain. This increase of the distance,
however, was achieved because we have introduced redundancy by using an error-correcting code.
The factor (2^{2R_N} − 1)/(|A|² − 1) that corresponds to this loss is therefore called redundancy

loss.<br />

Example 19.3 The encoder shown in figure 19.5 is based on the (8, 4, 4) extended Hamming code. The
rate of this extended Hamming code, corresponding to q = 3, is

R_ext.Hamm. = K/N = (2^q − q − 1)/2^q = (2³ − 3 − 1)/2³ = 1/2. (19.38)

Therefore the total rate R_N = R_ext.Hamm. + 2 = 5/2 bits/transmission. The additional 2 bits come from the
fact that the subsets A_0 and A_1 each contain four symbols.

The distance gain of this method is

d_min²/d_0² = 4, (19.39)

which is 10 log₁₀ 4 = 6.02 dB. The redundancy loss however is

(2^{2R_N} − 1)/(|A|² − 1) = (2^{2·5/2} − 1)/(8² − 1) = 31/63, (19.40)

which is 10 log₁₀(31/63) = −3.08 dB. Therefore the asymptotic coding gain for this code is

G_cod^asymp = 10 log₁₀(4 · 31/63) = 6.02 dB − 3.08 dB = 2.94 dB. (19.41)



19.8.4 Effective coding gain<br />

In the previous subsection we have considered only the asymptotic coding gain, i.e. we have<br />

only concentrated on the argument of the Q-function (in the expressions of both P E and ˜P s E )<br />

and ignored the coefficient of it. How large are these coefficients? It can be shown that for the<br />

extended Hamming code with parameter q<br />

K_min(q) = [ (2^q choose 4) + (2^q − 1) · (2^{q−1} choose 2) ] / 2^q (19.42)

is the number of codewords at Hamming distance 4 from a codeword. We have already seen that<br />

the minimum Euclidean distance of our Ungerboeck code is 2d 0 . How many sequences have<br />

distance 2d_0 to a given sequence, in other words how large is our K_min? To see this, suppose that
a sequence s results from the extended Hamming codeword c. Note first that there are at most 2N
signals s′ that result from the same codeword c and have distance 2d_0 from s. Moreover for each
extended Hamming codeword c′ at Hamming distance 4 from c there are at most 2⁴ sequences s′
having distance 2d_0 from s. Therefore in total

K_min ≤ 2N + 2⁴ K_min(q). (19.43)

Note that the normalized error coefficient is K min /N. This normalized error coefficient<br />

K min /N is larger than the coefficient 2 in the uncoded case. What is the effect of this larger<br />

coefficient in dB? In other words, how much do we have to increase the S/N norm , to compensate<br />

for the larger K min /N? Note that we are interested in achieving a symbol error probability<br />

of 10 −6 . To achieve the baseline performance the argument α of Q( √·) has to satisfy<br />

2Q(√α) = 10⁻⁶, (19.44)

in other words α = 23.928. For an alternative coding method with normalized error coefficient
K_min/N we need an argument β that satisfies

(K_min/N) Q(√β) = 10⁻⁶, (19.45)

or an S/N_norm that is a factor β/α larger, to get the same performance. Therefore the coding loss
L that we have to take into account is

L = 10 log₁₀(β/α). (19.46)

In practice it turns out that if the normalized error coefficient K min /N is increased by a factor of<br />

two, we have to accept an additional loss of roughly 0.2dB (at error rate 10 −6 ).<br />

The resulting coding gain, called the effective coding gain, is the difference of the asymptotic<br />

coding gain and the loss, hence<br />

G_cod^eff = G_cod^asymp − L = G_cod^asymp − 10 log₁₀(β/α). (19.47)


Figure 19.6: Rate R_N (bits/transmission), asymptotic coding gain (in dB), normalized error coefficient
K_min/N, and effective coding gain (in dB) versus q.

Example 19.4 Again consider the encoder shown in figure 19.5 which is based on the (8, 4, 4) extended
Hamming code. We can show that K_min(q), i.e. the number of "nearest neighbors" in the extended Hamming
code, for q = 3 is

K_min(3) = [ (2³ choose 4) + (2³ − 1) · (2^{3−1} choose 2) ] / 2³ = 14. (19.48)

This results in the following bound for the error coefficient in the set S of coded sequences:

K_min ≤ 2N + 2⁴ K_min(3) = 2 · 8 + 16 · 14 = 240. (19.49)

The normalized error coefficient K_min/N is therefore at most 240/8 = 30. Now we have to solve

(K_min/N) Q(√β) = 30 · Q(√β) = 10⁻⁶. (19.50)

It turns out that β = 29.159, and thus the loss is 10 log₁₀(29.159/23.928) = 0.86 dB. Therefore the
effective coding gain G_cod^eff = G_cod^asymp − 0.86 dB = 2.94 dB − 0.86 dB = 2.08 dB.
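The numbers in this example, and the table of the next subsection, can be reproduced with a short script (an illustrative sketch; α and β are found numerically from (19.44) and (19.45)):

```python
from math import comb, log2, log10, sqrt
from scipy.stats import norm
from scipy.optimize import brentq

def coding_gains(q, A_size=8, target=1e-6):
    """Rate, asymptotic/effective coding gain of the Ungerboeck/extended-Hamming scheme."""
    N = 2 ** q
    R_N = log2(A_size) - (q + 1) / N                                     # eq. (19.36)
    G_asymp = 10 * log10(4 * (2 ** (2 * R_N) - 1) / (A_size ** 2 - 1))   # eq. (19.37)
    K_min_code = (comb(N, 4) + (N - 1) * comb(N // 2, 2)) / N            # eq. (19.42)
    K_norm = (2 * N + 16 * K_min_code) / N                               # normalized error coefficient
    alpha = brentq(lambda a: 2 * norm.sf(sqrt(a)) - target, 1, 100)
    beta = brentq(lambda b: K_norm * norm.sf(sqrt(b)) - target, 1, 100)
    loss = 10 * log10(beta / alpha)                                      # eq. (19.46)
    return R_N, G_asymp, K_norm, loss, G_asymp - loss

for q in range(2, 7):
    print(q, ["%.2f" % v for v in coding_gains(q)])
```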

19.8.5 Coding gains for more complex codes<br />

For the values 2, 3, · · · , 6 of the parameter q that determines the extended Hamming code we<br />

have computed rates, normalized error coefficients, and asymptotic and effective coding gains of<br />

our Ungerboeck codes. These data can be found in the table below.


q    N    R_N      G_cod^asymp    K_min/N    L (dB)    G_cod^eff
2    4    2.250    1.38 dB        6          0.37      1.01 dB
3    8    2.500    2.94 dB        30         0.86      2.08 dB
4    16   2.687    4.10 dB        142        1.29      2.81 dB
5    32   2.813    4.87 dB        622        1.66      3.21 dB
6    64   2.891    5.35 dB        2606       1.99      3.36 dB

The results from the table are also shown in figure 19.6. It can be observed that the rate R N of<br />

our code tends to 3 bits/transmission if q increases. Moreover we see that the asymptotic coding<br />

gain approaches 6dB. The effective coding gain is considerably smaller however. The reason for<br />

this is that the normalized error coefficient increases very quickly with q.<br />

19.9 Remark<br />

What we did not do is the following. To keep things simple we have only discussed one-dimensional
signal structures. Ungerboeck's method of set-partitioning also applies to two- or
more-dimensional structures.

19.10 Exercises<br />

1. Consider instead of the extended Hamming code a so-called single-parity-check code of<br />

length 4. This code contains 8 codewords and has distance d H,min = 2. A codeword<br />

now has four binary components c 1 , c 2 , c 3 , c 4 and is determined by three information bits<br />

b 1 , b 2 , b 3 .<br />

Figure 19.7: Partitioning a 16-QAM constellation into two subsets A_0 and A_1.

As before we use set-partitioning, but now we partition a 16-QAM constellation into two<br />

subsets A 0 and A 1 (see figure). Note that the signals in these sets are two-dimensional.



Each component c i , i = 1, 2, 3, 4 of the chosen codeword c determines the used subset<br />

A ci , i.e. if c i = 0 then A 0 is used, if c i = 1 we use A 1 . Three binary digits<br />

b 3i+1 , b 3i+2 , b 3i+3 then determine which one of the (two-dimensional) signal points<br />

a 2i−1 , a 2i from the subset A ci is transmitted. Note that N = 8, since four two-dimensional<br />

signals are transmitted.<br />

As before the elementary signal a = a 1 , a 2 , · · · , a 8 is multiplied by d 0 /2 to get the actual<br />

signal s.<br />

(a) What is the minimum Euclidean distance d min now? What is the rate R N of this code<br />

in bits per transmission? Give the asymptotic coding gain of this code (relative to the<br />

baseline performance).


Part VI<br />

Appendices<br />



Appendix A<br />

An upper bound for the Q-function<br />

Note that the Q-function is defined as<br />

Q(x) = ∫_x^∞ (1/√(2π)) exp(−α²/2) dα. (A.1)

The lemma below gives a useful upper bound for Q(x). In figure A.1 the Q-function and the<br />

derived upper bound are plotted.<br />

LEMMA A.1 For x ≥ 0 the Q-function is bounded as

Q(x) ≤ (1/2) exp(−x²/2). (A.2)

Proof: Note that for α ≥ x, and since x ≥ 0,

α² = (α − x)² + 2αx − x²
   ≥ (α − x)² + 2x² − x²
   = (α − x)² + x². (A.3)

Therefore

Q(x) = ∫_x^∞ (1/√(2π)) exp(−α²/2) dα
     ≤ ∫_x^∞ (1/√(2π)) exp(−((α − x)² + x²)/2) dα
     = exp(−x²/2) ∫_x^∞ (1/√(2π)) exp(−(α − x)²/2) dα
     = (1/2) exp(−x²/2). (A.4)                                        ✷


Figure A.1: The Q-function and the derived upper bound.
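A quick numerical check of lemma A.1 (illustrative only; Q is evaluated via the normal survival function):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(0.0, 5.0, 6)
Q = norm.sf(x)                          # the Q-function
bound = 0.5 * np.exp(-x**2 / 2.0)       # upper bound of lemma A.1
assert np.all(Q <= bound)
for xi, q, b in zip(x, Q, bound):
    print(f"x = {xi:.0f}:  Q(x) = {q:.3e}  <=  bound = {b:.3e}")
```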


Appendix B<br />

The Fourier transform<br />

B.1 Definition<br />

THEOREM B.1 If the signal x(t) satisfies the Dirichlet conditions, that is,

1. The signal x(t) is absolutely integrable on the real line, i.e. ∫_{−∞}^{∞} |x(t)| dt < ∞.

2. The number of maxima and minima of x(t) in any finite interval on the real line is finite.

3. The number of discontinuities of x(t) in any finite interval on the real line is finite.

4. When x(t) is discontinuous at t then x(t) = (x(t⁺) + x(t⁻))/2,

then the Fourier transform (or Fourier integral) of x(t), defined by

X(f) = ∫_{−∞}^{∞} x(t) exp(−j2πft) dt, (B.1)

exists and the original signal can be obtained from its Fourier transform by

x(t) = ∫_{−∞}^{∞} X(f) exp(j2πft) df. (B.2)

B.2 Some properties<br />

B.2.1 Parseval’s relation<br />

RESULT B.2 If the Fourier transforms of the signals f (t) and g(t) are denoted by F( f ) and<br />

G( f ) respectively, then<br />

∫_{−∞}^{∞} f(t) g(t) dt = ∫_{−∞}^{∞} F(f) G^*(f) df. (B.3)

184


Proof: Note that for a real signal g(t) the Fourier transform G(f) satisfies G(-f) = G^*(f). Then

    \int_{-\infty}^{\infty} f(t) g(t) dt
      = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} F(f) \exp(j2\pi f t) df \right] \left[ \int_{-\infty}^{\infty} G(f) \exp(j2\pi f t) df \right] dt
      = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} F(f) \exp(j2\pi f t) df \right] \left[ \int_{-\infty}^{\infty} G^*(f') \exp(-j2\pi f' t) df' \right] dt
      = \int_{-\infty}^{\infty} F(f) \left[ \int_{-\infty}^{\infty} G^*(f') \left[ \int_{-\infty}^{\infty} \exp(j2\pi t (f - f')) dt \right] df' \right] df.    (B.4)

We know that

    \int_{-\infty}^{\infty} \exp(j2\pi t (f - f')) dt = \delta(f - f'),    (B.5)

therefore

    \int_{-\infty}^{\infty} f(t) g(t) dt = \int_{-\infty}^{\infty} F(f) \left[ \int_{-\infty}^{\infty} G^*(f') \delta(f - f') df' \right] df = \int_{-\infty}^{\infty} F(f) G^*(f) df.    (B.6)
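Parseval's relation can also be checked numerically once the integrals are replaced by sums over samples. The fragment below is a sketch: the two test signals are arbitrary assumptions, the Fourier integrals are approximated with numpy's FFT using the scaling stated in the comments, and the two sides of (B.3) then agree up to discretization error.

```python
# Discrete sanity check of Parseval's relation (B.3).
# With F = fft(f)*dt and df = 1/(N*dt), sum(F*conj(G))*df approximates the
# frequency-domain integral of F(f)G*(f).
import numpy as np

dt = 1e-3
t = np.arange(0.0, 1.0, dt)
f_sig = np.exp(-((t - 0.5) ** 2) / 0.01)            # a smooth pulse f(t)
g_sig = np.cos(2 * np.pi * 5 * t) * np.exp(-t)      # a damped cosine g(t)

time_side = np.sum(f_sig * g_sig) * dt

F = np.fft.fft(f_sig) * dt
G = np.fft.fft(g_sig) * dt
df = 1.0 / (len(t) * dt)
freq_side = np.real(np.sum(F * np.conj(G)) * df)

print(time_side, freq_side)   # the two values agree up to discretization error
```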


Appendix C

Impulse signal, filters

C.1 The impulse or delta signal

In the mathematical sense (see e.g. [17]) the delta signal δ(t) is not a function but a distribution or a generalized function. A distribution is defined in terms of its effect on another function (usually called a "test function"). The impulse distribution can be defined by its effect on the test function f(t), which is supposed to be continuous at the origin, by the relation

    \int_{-\infty}^{\infty} f(t) \delta(t) dt = f(0).    (C.1)

We call this property the sifting property of the impulse signal. Note that we defined the impulse signal δ(t) by describing its action on the test function f(t) and not by specifying its value for different values of t.

C.2 Linear time-invariant systems, filters

Definition C.1 A system L is linear if and only if for any two legitimate input signals x_1(t) and x_2(t) and any two scalars α_1 and α_2, the linear combination α_1 x_1(t) + α_2 x_2(t) is again a legitimate input, and

    L[\alpha_1 x_1(t) + \alpha_2 x_2(t)] = \alpha_1 L[x_1(t)] + \alpha_2 L[x_2(t)].    (C.2)

A system that does not satisfy this relation is nonlinear.

Definition C.2 The impulse response h(t) of a system L is the response of the system to an impulse input δ(t), thus

    h(t) = L[\delta(t)].    (C.3)

The response of a time-invariant system to a unit impulse applied at time τ, i.e. δ(t - τ), is obviously h(t - τ).

Now we can determine the response of a linear time-invariant system L to an input signal x(t) as follows:

    y(t) = L[x(t)] = L\left[ \int_{-\infty}^{\infty} x(\tau) \delta(t - \tau) d\tau \right] = \int_{-\infty}^{\infty} x(\tau) L[\delta(t - \tau)] d\tau = \int_{-\infty}^{\infty} x(\tau) h(t - \tau) d\tau.    (C.4)

The final integral is called the convolution of the signal x(t) and the impulse response h(t). A linear time-invariant system is often called a filter.
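In discrete time the convolution integral (C.4) becomes a sum, which numpy.convolve evaluates directly. The sketch below uses an illustrative rectangular input and an assumed exponentially decaying impulse response; neither appears in the text.

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 2.0, dt)
x = (t < 0.5).astype(float)           # a rectangular input pulse x(t)
h = np.exp(-5.0 * t)                  # an assumed impulse response h(t)

# discrete approximation of y(t) = integral of x(tau) h(t - tau) dtau
y = np.convolve(x, h)[: len(t)] * dt

print(y.max())                        # the output rises during the pulse and then decays
```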


Appendix D

Correlation functions, power spectra

We will investigate here what the influence is of a filter with impulse response h(t) on the spectrum S_x(f) of the input random process X(t). Then we study the consequences of our findings. Consider figure D.1. The filter produces the random process Y(t) at its output while the input process is X(t).

Figure D.1: Filtering the noise process X(t): the input x(t) passes through the filter h(t), which produces the output y(t).

D.1 Expectation of an integral

For the expected value of the random process Y(t) we can write¹

    m_y(t) = E[Y(t)] = E\left[ \int_{-\infty}^{\infty} X(\alpha) h(t - \alpha) d\alpha \right] = \int_{-\infty}^{\infty} E[X(\alpha)] h(t - \alpha) d\alpha.    (D.1)

The autocorrelation function of the random process is

    R_y(t, s) = E[Y(t)Y(s)] = E\left[ \int_{-\infty}^{\infty} X(\alpha) h(t - \alpha) d\alpha \int_{-\infty}^{\infty} X(\beta) h(s - \beta) d\beta \right]
      = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E[X(\alpha) X(\beta)] h(t - \alpha) h(s - \beta) d\alpha d\beta
      = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} R_x(\alpha, \beta) h(t - \alpha) h(s - \beta) d\alpha d\beta.    (D.2)

¹ Note that we do not assume (yet) that the process X(t) is Gaussian.


D.2 Power spectrum

To study the effect of filtering a random process we assume that R_x(t, s) = R_x(τ) with τ = t - s, i.e. the autocorrelation function R_x(t, s) of the input random process depends only on the difference in time t - s between the sample times t and s. If this condition is satisfied we can investigate the distribution of the mean power of a random process as a function of frequency. Now

    R_y(t, s) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} R_x(\alpha - \beta) h(t - \alpha) h(s - \beta) d\alpha d\beta
              = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} R_x(t - s + \mu - \nu) h(\nu) h(\mu) d\nu d\mu,    (D.3)

with ν = t - α and µ = s - β. Observe that now also R_y(t, s) only depends on the time difference τ = t - s, hence R_y(t, s) = R_y(τ).

Next consider the Fourier transforms S_x(f) and S_y(f) of R_x(τ) and R_y(τ) respectively (see appendix B):

    S_x(f) = \int_{-\infty}^{\infty} R_x(\tau) \exp(-j2\pi f \tau) d\tau   and   R_x(\tau) = \int_{-\infty}^{\infty} S_x(f) \exp(j2\pi f \tau) df,
    S_y(f) = \int_{-\infty}^{\infty} R_y(\tau) \exp(-j2\pi f \tau) d\tau   and   R_y(\tau) = \int_{-\infty}^{\infty} S_y(f) \exp(j2\pi f \tau) df.    (D.4)

Note that these Fourier transforms can only be defined if the correlation functions depend only on the time-difference τ = t - s. Next we obtain

    R_y(\tau) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} R_x(\tau + \mu - \nu) h(\nu) h(\mu) d\nu d\mu
      = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} S_x(f) \exp(j2\pi f (\tau + \mu - \nu)) df \right] h(\nu) h(\mu) d\nu d\mu
      = \int_{-\infty}^{\infty} S_x(f) \left[ \int_{-\infty}^{\infty} h(\nu) \exp(-j2\pi f \nu) d\nu \right] \left[ \int_{-\infty}^{\infty} h(\mu) \exp(j2\pi f \mu) d\mu \right] \exp(j2\pi f \tau) df
      = \int_{-\infty}^{\infty} S_x(f) H(f) H^*(f) \exp(j2\pi f \tau) df,    (D.5)

where in the last step we used that h(t) is real, so that the second bracketed integral equals H^*(f). Hence

    S_y(f) = S_x(f) H(f) H^*(f) = S_x(f) |H(f)|^2.    (D.6)

D.3 Interpretation

First note that the mean square value of the filter output process Y(t) is time-independent if R_x(t, s) = R_x(t - s). This follows from

    E[Y^2(t)] = R_y(t, t) = R_y(t - t) = R_y(0).    (D.7)


Therefore R_y(0) is the expected value of the power that is dissipated in a resistor of 1 Ω connected to the output of the filter, at any time instant. Next note that

    R_y(0) = \int_{-\infty}^{\infty} S_y(f) df = \int_{-\infty}^{\infty} S_x(f) |H(f)|^2 df.    (D.8)

Consider a filter with transfer function H(f) shown in figure D.2. This is a bandpass filter and

    H(f) = 1 for f_1 \leq |f| \leq f_2, and H(f) = 0 elsewhere.    (D.9)

Figure D.2: An ideal bandpass filter: H(f) equals 1 for f_1 ≤ |f| ≤ f_2 and 0 elsewhere.

Then we obtain

    R_y(0) = \int_{-f_2}^{-f_1} S_y(f) df + \int_{f_1}^{f_2} S_y(f) df.    (D.10)

Now let f_1 = f and f_2 = f + Δf. The filter H(f) now only transfers components of X(t) in the frequency band (f, f + Δf) and stops all other components. Since a power spectrum is always even, see section D.6, the expected power at the output of the filter is approximately 2 S_x(f) Δf. This implies that S_x(f) is the distribution of average power in the process X(t) over the frequencies. Therefore we call S_x(f) the power spectral density function of X(t).
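The relation S_y(f) = S_x(f)|H(f)|^2 and this power interpretation can be illustrated numerically. The sketch below filters white noise with an assumed short FIR impulse response and estimates the output spectrum by averaging periodograms; the filter taps, block length and number of blocks are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.25, 0.5, 0.25])            # an assumed lowpass impulse response
N, blocks = 1024, 200

psd_est = np.zeros(N)
for _ in range(blocks):
    x = rng.standard_normal(N)             # white noise samples, S_x(f) = 1
    y = np.convolve(x, h, mode="same")     # the filtered process
    psd_est += np.abs(np.fft.fft(y)) ** 2 / N
psd_est /= blocks

H = np.fft.fft(h, N)
# the averaged periodogram approaches S_x(f)|H(f)|^2 = |H(f)|^2;
# the deviation printed below shrinks as more blocks are averaged
print(np.max(np.abs(psd_est - np.abs(H) ** 2)))
```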

D.4 Wide-sense stationarity

Definition D.1 A random process Z(t) is called wide-sense stationary (WSS) if and only if

    m_z(t) = constant   and   R_z(t, s) = R_z(t - s).    (D.11)

For a WSS process Z(t) it is meaningful to consider its power spectral density S_z(f). For a WSS process Y(t) the expected power can be expressed as in equation (D.7). When a WSS process X(t) is the input of a filter then the filter output process Y(t) is also WSS and its power spectral density S_y(f) is related to the input power spectral density S_x(f) as given by (D.6). These consequences already justify the concept of wide-sense stationarity.


D.5 Gaussian processes

Recall that a Gaussian process Z(t) is completely specified by its mean m_z(t) and autocorrelation function R_z(t, s) for all t and s. Therefore a WSS (wide-sense stationary) Gaussian process Z(t) is also (strict-sense) stationary.

Furthermore note that when a Gaussian process X(t) is the input of a filter then the filter output process Y(t) is also Gaussian. Remember that the output process is WSS when the input process is WSS.

D.6 Properties of S_x(f) and R_x(τ)

Let X(t) be a WSS random process again.

a) First observe that R_x(τ) is a (real) even function of τ. This follows from

    R_x(-\tau) = E[X(t - \tau) X(t)] = E[X(t) X(t - \tau)] = R_x(\tau).    (D.12)

b) Next we show that S_x(f) is a real even function of f. Since R_x(τ) is an even function and sin(2πfτ) an odd function of τ,

    \int_{-\infty}^{\infty} R_x(\tau) \sin(2\pi f \tau) d\tau = 0.    (D.13)

Therefore

    S_x(f) = \int_{-\infty}^{\infty} R_x(\tau) \exp(-j2\pi f \tau) d\tau = \int_{-\infty}^{\infty} R_x(\tau) [\cos(2\pi f \tau) - j \sin(2\pi f \tau)] d\tau = \int_{-\infty}^{\infty} R_x(\tau) \cos(2\pi f \tau) d\tau,    (D.14)

which is even and real.

c) The power spectral density S_x(f) is a non-negative function of the frequency f. To see why this statement is true suppose that S_x(f) is negative for f_1 < |f| < f_2. Then consider an ideal bandpass filter H(f) as given by (D.9). The expected power of the output Y(t) of this filter when the input is the random process X(t) is given by

    E[Y^2(t)] = R_y(0) = 2 \int_{f_1}^{f_2} S_x(f) df < 0,    (D.15)

which yields a contradiction.

d) Without proof we give another property of R_x(τ). It can be shown that

    |R_x(\tau)| \leq R_x(0).    (D.16)


Appendix E

Gram-Schmidt procedure, proof, example

We will only prove result 6.2 for signal sets with signals s_m(t) ≢ 0 for all m ∈ M. The statement should then also hold for sets that do contain all-zero signals.

E.1 Proof of result 6.2

The proof is based on induction.

1. First we show that the induction hypothesis holds for one signal. Therefore consider s_1(t) and take

    \phi_1(t) = \frac{s_1(t)}{\sqrt{E_{s1}}}   with   E_{s1} = \int_0^T s_1^2(t) dt.    (E.1)

Note that by doing so

    \int_0^T \phi_1^2(t) dt = \frac{1}{E_{s1}} \int_0^T s_1^2(t) dt = 1,    (E.2)

i.e. ϕ_1(t) has unit energy (is normal). Since s_1(t) = \sqrt{E_{s1}} \phi_1(t) the coefficient s_{11} = \sqrt{E_{s1}}.

2. Now suppose that the induction hypothesis holds for m - 1 ≥ 1 signals. Thus there exists an orthonormal basis ϕ_1(t), ϕ_2(t), ..., ϕ_{n-1}(t) for the signals s_1(t), s_2(t), ..., s_{m-1}(t) with n - 1 ≤ m - 1. Now consider the next signal s_m(t) and an auxiliary signal θ_m(t) which is defined as follows:

    \theta_m(t) = s_m(t) - \sum_{i=1,n-1} s_{mi} \phi_i(t)    (E.3)

with

    s_{mi} = \int_0^T s_m(t) \phi_i(t) dt   for i = 1, n - 1.    (E.4)

We can distinguish between two cases now:

(a) If θ_m(t) ≡ 0 then s_m(t) = \sum_{i=1,n-1} s_{mi} \phi_i(t) and the induction hypothesis also holds for m signals in this case.

(b) If on the other hand θ_m(t) ≢ 0 then take a new building-block waveform

    \phi_n(t) = \frac{\theta_m(t)}{\sqrt{E_{\theta m}}}   with   E_{\theta m} = \int_0^T \theta_m^2(t) dt.    (E.5)

By doing so

    \int_0^T \phi_n^2(t) dt = \frac{1}{E_{\theta m}} \int_0^T \theta_m^2(t) dt = 1,    (E.6)

i.e. also ϕ_n(t) has unit energy. Moreover for all i = 1, n - 1

    \int_0^T \phi_n(t) \phi_i(t) dt = \frac{1}{\sqrt{E_{\theta m}}} \int_0^T \theta_m(t) \phi_i(t) dt
      = \frac{1}{\sqrt{E_{\theta m}}} \left( \int_0^T s_m(t) \phi_i(t) dt - \sum_{j=1,n-1} s_{mj} \int_0^T \phi_j(t) \phi_i(t) dt \right)
      = \frac{1}{\sqrt{E_{\theta m}}} \left( s_{mi} - \sum_{j=1,n-1} s_{mj} \delta_{ji} \right) = 0,    (E.7)

and therefore ϕ_n(t) is orthogonal to all ϕ_i(t). Note that now

    s_m(t) = \theta_m(t) + \sum_{i=1,n-1} s_{mi} \phi_i(t) = \sqrt{E_{\theta m}} \phi_n(t) + \sum_{i=1,n-1} s_{mi} \phi_i(t) = \sum_{i=1,n} s_{mi} \phi_i(t)    (E.8)

with s_{mn} = \sqrt{E_{\theta m}}. Thus also in this case for m signals there exists an orthonormal basis with n ≤ m building-block waveforms. Therefore the induction hypothesis also holds for m signals.

It should be noted that, when using the Gram-Schmidt procedure, any ordering of the signals other than s_1(t), s_2(t), ..., s_{|M|}(t) will also yield a basis, i.e. a set of building-block waveforms, of the smallest possible dimensionality (see exercise 1 in chapter 6), however in general with different building-block waveforms.
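The procedure above maps directly onto sampled waveforms. The sketch below is a minimal Python implementation; the three piecewise-constant signals are an assumption chosen so that the resulting coefficients reproduce (E.19) of the example in section E.2 below, and they are not copied literally from figure E.1.

```python
import numpy as np

def gram_schmidt(signals, dt=1.0, tol=1e-9):
    """Orthonormalize sampled waveforms; return the basis and the coefficients s_mi."""
    basis, coeffs = [], []
    for s in signals:
        proj = [float(np.sum(s * phi) * dt) for phi in basis]        # projections s_mi, cf. (E.4)
        theta = s - sum(c * phi for c, phi in zip(proj, basis))      # auxiliary signal, cf. (E.3)
        energy = float(np.sum(theta ** 2) * dt)
        if energy > tol:                                             # theta not identically zero
            basis.append(theta / np.sqrt(energy))                    # new building-block waveform, cf. (E.5)
            proj.append(float(np.sqrt(energy)))                      # s_mn = sqrt(E_theta)
        coeffs.append(proj)
    return basis, coeffs

# piecewise-constant signals, one sample per unit interval (dt = 1)
signals = [np.array([2.0, 2.0, -2.0]),
           np.array([-1.0, -3.0, -1.0]),
           np.array([-1.0, 1.0, 3.0])]
basis, coeffs = gram_schmidt(signals)
print(len(basis))   # 2 building-block waveforms suffice for the 3 signals
print(coeffs)       # (2*sqrt(3)), (-sqrt(3), 2*sqrt(2)), (-sqrt(3), -2*sqrt(2))
```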

E.2 An example<br />

We will next discuss an example in which we actually carry out the Gram-Schmidt procedure for<br />

a given set of waveforms.


Figure E.1: An example of the Gram-Schmidt procedure: the waveforms s_1(t), s_2(t), s_3(t), the auxiliary signals θ_2(t) and θ_3(t) ≡ 0, and the building-block waveforms ϕ_1(t) and ϕ_2(t), all piecewise constant on 0 ≤ t ≤ 3.

Example E.1 Consider the three waveforms s_m(t), m = 1, 2, 3, shown in figure E.1. Note that T = 3. We first determine the energy E_1 of the first signal s_1(t):

    E_1 = 4 + 4 + 4 = 12.    (E.9)

Therefore the first building-block waveform ϕ_1(t) is

    \phi_1(t) = \frac{s_1(t)}{\sqrt{E_1}} = \frac{s_1(t)}{2\sqrt{3}},   and thus   s_{11} = 2\sqrt{3}.    (E.10)

Now we continue with the second signal s_2(t). First we determine the projection s_{21} of s_2(t) on the first dimension, i.e. the dimension that corresponds to ϕ_1(t):

    s_{21} = \int_0^T s_2(t) \phi_1(t) dt = \frac{1}{\sqrt{3}} (-1 - 3 + 1) = -\sqrt{3},    (E.11)


hence the auxiliary signal θ_2(t) is now

    \theta_2(t) = s_2(t) + \sqrt{3} \phi_1(t).    (E.12)

Next we determine the energy E_{θ2} of the auxiliary signal θ_2(t):

    E_{\theta 2} = 4 + 4 = 8,    (E.13)

and the second building-block waveform ϕ_2(t) is

    \phi_2(t) = \frac{\theta_2(t)}{\sqrt{E_{\theta 2}}} = \frac{\theta_2(t)}{2\sqrt{2}}.    (E.14)

Now we obtain for the signal s_2(t):

    s_2(t) = \theta_2(t) - \sqrt{3} \phi_1(t) = 2\sqrt{2} \phi_2(t) - \sqrt{3} \phi_1(t),   thus   s_{22} = 2\sqrt{2}.    (E.15)

For the third signal s_3(t) we can now determine the projections s_{31} and s_{32} and the auxiliary signal θ_3(t):

    s_{31} = \int_0^T s_3(t) \phi_1(t) dt = \frac{1}{\sqrt{3}} (-1 + 1 - 3) = -\sqrt{3},
    s_{32} = \int_0^T s_3(t) \phi_2(t) dt = \frac{1}{\sqrt{2}} (-1 - 3) = -2\sqrt{2},   hence
    \theta_3(t) = s_3(t) + \sqrt{3} \phi_1(t) + 2\sqrt{2} \phi_2(t) \equiv 0.    (E.16)

Since this auxiliary signal is always 0 we can express the third signal s_3(t) in terms of ϕ_1(t) and ϕ_2(t) as follows:

    s_3(t) = -\sqrt{3} \phi_1(t) - 2\sqrt{2} \phi_2(t),    (E.17)

hence the coefficients corresponding to s_3(t) are

    s_{31} = -\sqrt{3}   and   s_{32} = -2\sqrt{2}.    (E.18)

We have now obtained three vectors of coefficients, one for each signal:

    s_1 = (s_{11}, s_{12}) = (2\sqrt{3}, 0),
    s_2 = (s_{21}, s_{22}) = (-\sqrt{3}, 2\sqrt{2}),   and
    s_3 = (s_{31}, s_{32}) = (-\sqrt{3}, -2\sqrt{2}).    (E.19)

These vectors are depicted in figure E.2.

Figure E.2: A vector representation of the signals s_m(t), m = 1, 2, 3, of figure E.1, in the plane spanned by ϕ_1 and ϕ_2.


Appendix F

Schwarz inequality

LEMMA F.1 (Schwarz inequality) For two finite-energy waveforms a(t) and b(t) the inequality

    \left( \int_{-\infty}^{\infty} a(t) b(t) dt \right)^2 \leq \int_{-\infty}^{\infty} a^2(t) dt \int_{-\infty}^{\infty} b^2(t) dt    (F.1)

holds. Equality is obtained only if b(t) ≡ C a(t) for some constant C.

Proof: Form an orthonormal expansion for a(t) and b(t), i.e.

    a(t) = a_1 \phi_1(t) + a_2 \phi_2(t),
    b(t) = b_1 \phi_1(t) + b_2 \phi_2(t),    (F.2)

with \int_{-\infty}^{\infty} \phi_i(t) \phi_j(t) dt = \delta_{ij} for i = 1, 2 and j = 1, 2.

Then with a = (a_1, a_2) and b = (b_1, b_2) and the Parseval results (a · b) = \int_{-\infty}^{\infty} a(t) b(t) dt, (a · a) = \int_{-\infty}^{\infty} a^2(t) dt, and (b · b) = \int_{-\infty}^{\infty} b^2(t) dt, we find for the vectors a and b that

    \frac{(a · b)}{\|a\| \|b\|} = \frac{\int_{-\infty}^{\infty} a(t) b(t) dt}{\sqrt{\int_{-\infty}^{\infty} a^2(t) dt} \sqrt{\int_{-\infty}^{\infty} b^2(t) dt}}.    (F.3)

If we now can prove that (a · b)^2 ≤ \|a\|^2 \|b\|^2 we get the Schwarz inequality.

We start by stating the triangle inequality for a and b:

    \|a + b\| \leq \|a\| + \|b\|.    (F.4)

Moreover

    \|a + b\|^2 = (a + b) · (a + b) = \|a\|^2 + \|b\|^2 + 2(a · b),    (F.5)

hence

    \|a\|^2 + \|b\|^2 + 2(a · b) \leq \|a\|^2 + \|b\|^2 + 2\|a\|\|b\|,    (F.6)

and therefore

    (a · b) \leq \|a\| \|b\|.    (F.7)

From this it also follows that

    -(a · b) = (a · (-b)) \leq \|a\| \|-b\| = \|a\| \|b\|,    (F.8)

or

    (a · b) \geq -\|a\| \|b\|.    (F.9)

We therefore may conclude that -\|a\|\|b\| ≤ (a · b) ≤ \|a\|\|b\|, or (a · b)^2 ≤ \|a\|^2 \|b\|^2, and this proves the Schwarz inequality.

Equality in the Schwarz inequality is achieved only when equality in the triangle inequality is obtained. This is the case when (first part of the proof) b = Ca or (second part) -b = Ca for some non-negative C.
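A quick numerical illustration of (F.1) for two sampled finite-energy waveforms is given below; both test waveforms are arbitrary assumptions, and the integrals are approximated by sums over samples.

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 1.0, dt)
a = np.sin(2 * np.pi * 3 * t)                    # an arbitrary finite-energy waveform a(t)
b = np.exp(-t) * np.cos(2 * np.pi * 7 * t)       # an arbitrary finite-energy waveform b(t)

lhs = (np.sum(a * b) * dt) ** 2
rhs = (np.sum(a ** 2) * dt) * (np.sum(b ** 2) * dt)
print(lhs <= rhs, lhs, rhs)                      # equality would require b(t) = C a(t)
```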


Appendix G

Bound error probability orthogonal signaling

Note that the probability of a correct decision satisfies

    P_C = \int_{-\infty}^{\infty} p(\mu - b) (1 - Q(\mu))^{|M|-1} d\mu,    (G.1)

where b = \sqrt{2E_s/N_0}, hence

    P_E = 1 - P_C = \int_{-\infty}^{\infty} p(\mu - b) [1 - (1 - Q(\mu))^{|M|-1}] d\mu.    (G.2)

The term in square brackets is the probability that at least one of the |M| - 1 noise components exceeds µ. By the union bound this probability is not larger than the sum of the probabilities that individual components exceed µ, thus

    1 - (1 - Q(\mu))^{|M|-1} \leq (|M| - 1) Q(\mu) \leq |M| Q(\mu).    (G.3)

Moreover, since this term is a probability we can also write

    1 - (1 - Q(\mu))^{|M|-1} \leq 1.    (G.4)

When µ is small Q(µ) is large and then the unity bound is tight. On the other hand for large µ the bound |M|Q(µ) is tighter. Therefore we split the integration range into two parts, (-∞, a) and (a, ∞):

    P_E \leq \int_{-\infty}^{a} p(\mu - b) d\mu + |M| \int_{a}^{\infty} p(\mu - b) Q(\mu) d\mu.    (G.5)

If we take a ≥ 0 we can use the upper bound Q(µ) ≤ exp(-µ²/2), see the proof in appendix A, and we obtain

    P_E \leq \int_{-\infty}^{a} p(\mu - b) d\mu + |M| \int_{a}^{\infty} p(\mu - b) \exp\left(-\frac{\mu^2}{2}\right) d\mu.    (G.6)


If we denote the first integral by P_1 and the second by P_2, we get

    P_E \leq P_1 + |M| P_2.    (G.7)

To minimize the bound we choose a such that the derivative of (G.6) with respect to a is equal to 0, i.e.

    0 = \frac{d}{da} [P_1 + |M| P_2] = p(a - b) - |M| p(a - b) \exp\left(-\frac{a^2}{2}\right).    (G.8)

This results in

    \exp\left(\frac{a^2}{2}\right) = |M|.    (G.9)

Therefore the value a = \sqrt{2 \ln |M|} achieves (at least a local) minimum of P_E. Note that a ≥ 0 as was required. Now we can consider the separate terms. For the first term we can write

    P_1 = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(\mu - b)^2}{2}\right) d\mu
        = \int_{-\infty}^{a-b} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\gamma^2}{2}\right) d\gamma
        = Q(b - a) \overset{(*)}{\leq} \exp\left(-\frac{(b - a)^2}{2}\right)   if 0 \leq a \leq b.    (G.10)

Here the inequality (*) follows from appendix A. For the second term we get

    P_2 = \int_{a}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(\mu - b)^2}{2}\right) \exp\left(-\frac{\mu^2}{2}\right) d\mu
        \overset{(**)}{=} \exp\left(-\frac{b^2}{4}\right) \int_{a}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-(\mu - b/2)^2\right) d\mu
        = \exp\left(-\frac{b^2}{4}\right) \frac{1}{\sqrt{2}} \int_{a\sqrt{2}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(\gamma - (b/2)\sqrt{2})^2}{2}\right) d\gamma
        = \exp\left(-\frac{b^2}{4}\right) \frac{1}{\sqrt{2}} Q\left(a\sqrt{2} - (b/2)\sqrt{2}\right),    (G.11)

where in the third step we substituted γ = µ√2. Here the equality (**) follows from

    \frac{1}{2}(\mu - b)^2 + \frac{\mu^2}{2} = \frac{\mu^2}{2} - \mu b + \frac{b^2}{2} + \frac{\mu^2}{2} = \mu^2 - \mu b + \frac{b^2}{4} + \frac{b^2}{4} = \left(\mu - \frac{b}{2}\right)^2 + \frac{b^2}{4}.    (G.12)

From (G.11) we get

    P_2 \leq \exp(-b^2/4)                        for 0 \leq a \leq b/2,
    P_2 \leq \exp(-b^2/4 - (a - b/2)^2)          for a \geq b/2.    (G.13)

If we now collect the bounds (G.10) and (G.13) and substitute the optimal value of a found in (G.9) we get

    P_E \leq \exp(-(b - a)^2/2) + \exp(a^2/2) \exp(-b^2/4)                        for 0 \leq a < b/2,
    P_E \leq \exp(-(b - a)^2/2) + \exp(a^2/2) \exp(-b^2/4 - (a - b/2)^2)          for b/2 \leq a \leq b.    (G.14)

Now we note that

    \frac{(a - b)^2}{2} - \left(\frac{b^2}{4} - \frac{a^2}{2}\right) = \left(a - \frac{b}{2}\right)^2 \geq 0.    (G.15)

Therefore for 0 ≤ a < b/2 the first term is not larger than the second one, while for b/2 ≤ a ≤ b both terms are equal. This leads to

    P_E \leq 2 \exp(-b^2/4 + a^2/2)      for 0 \leq a < b/2,
    P_E \leq 2 \exp(-(b - a)^2/2)        for b/2 \leq a \leq b.    (G.16)

Now we rewrite both a and b as

    a = \sqrt{2 \ln |M|} = \sqrt{2 \ln 2} \cdot \sqrt{\log_2 |M|},
    b = \sqrt{2E_s/N_0} = \sqrt{2E_b/N_0} \cdot \sqrt{\log_2 |M|},    (G.17)

where we used the following definition for E_b, i.e. the energy per transmitted bit of information:

    E_b = \frac{E_s}{\log_2 |M|}.    (G.18)

This finally leads to

    P_E \leq 2 \exp(-\log_2 |M| [E_b/(2N_0) - \ln 2])                        for E_b/N_0 \geq 4 \ln 2,
    P_E \leq 2 \exp(-\log_2 |M| [\sqrt{E_b/N_0} - \sqrt{\ln 2}]^2)            for \ln 2 \leq E_b/N_0 \leq 4 \ln 2.    (G.19)
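As a numerical sketch, the exact error probability (G.2) can be evaluated on a grid and compared with the final bound (G.19). In the fragment below p(·) is taken to be the unit-variance Gaussian density, consistent with this appendix, and the chosen values of |M| and E_b/N_0 are arbitrary assumptions.

```python
import numpy as np
from math import erfc, exp, log, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

def exact_PE(M, EbN0):
    # numerical evaluation of (G.2)
    b = sqrt(2.0 * EbN0 * np.log2(M))            # b = sqrt(2 Es/N0) with Es = Eb*log2|M|
    mu = np.linspace(b - 10.0, b + 10.0, 20001)  # integration grid around the mean
    p = np.exp(-0.5 * (mu - b) ** 2) / sqrt(2.0 * np.pi)
    inner = 1.0 - (1.0 - np.array([Q(u) for u in mu])) ** (M - 1)
    return float(np.sum(p * inner) * (mu[1] - mu[0]))

def bound_PE(M, EbN0):
    # the bound (G.19); EbN0 is Eb/N0 on a linear scale, valid for EbN0 >= ln 2
    k = np.log2(M)
    if EbN0 >= 4.0 * log(2.0):
        return 2.0 * exp(-k * (EbN0 / 2.0 - log(2.0)))
    return 2.0 * exp(-k * (sqrt(EbN0) - sqrt(log(2.0))) ** 2)

for M in [16, 64]:
    for EbN0 in [2.0, 4.0]:
        print(M, EbN0, exact_PE(M, EbN0), bound_PE(M, EbN0))
```

In every printed row the bound exceeds the exact value, as it must, and both decrease rapidly once E_b/N_0 is well above ln 2.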


Appendix H

Decibels

Definition H.1 An energy ratio X > 0 can be expressed in decibels as follows:

    X|_{in dB} = 10 \log_{10} X.    (H.1)

It is easy to work with decibels since by doing so we transform multiplications into additions. Table H.1 shows for some energy ratios X the equivalent ratio in dB. Note that 1 dB = 0.1 Bel.

        X      X| in dB
       0.01    -20.0 dB
       0.1     -10.0 dB
       1.0       0.0 dB
       2.0       3.0 dB
       3.0       4.8 dB
       5.0       7.0 dB
      10.0      10.0 dB
     100.0      20.0 dB

Table H.1: Decibels.
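Definition H.1 is a one-liner in code; the short sketch below reproduces the entries of table H.1.

```python
from math import log10

# Definition H.1 applied to the entries of table H.1
for X in [0.01, 0.1, 1.0, 2.0, 3.0, 5.0, 10.0, 100.0]:
    print(f"X = {X:6.2f}  ->  {10.0 * log10(X):6.1f} dB")
```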


Appendix I

Whitening filters

Suppose that the power spectral density of the noise is rational, and can be written as

    S_n(f) = k \frac{N(f)}{D(f)}.    (I.1)

Here the numerator polynomial N(f) is assumed to have degree n, its complex roots are ν_1, ν_2, ..., ν_n, and the coefficient of f^n is one. The roots of N(f) are the zeros of S_n(f). We assume that the denominator polynomial D(f) has degree d, its complex roots are π_1, π_2, ..., π_d, and the coefficient of f^d is one. The roots of D(f) are the poles of S_n(f). Finally k is some constant. Note that if S_n(f) is not rational it can be approximated as a rational function of f.

A power density function is real and even (see appendix D). Therefore the zeros and poles of S_n(f) come in pairs, i.e. each zero ν = a + jb has a counterpart ν' = a - jb and each pole π = a + jb has a counterpart π' = a - jb (see figure I.1). Note that both n and d are even.

Figure I.1: Poles and zeros of S_n(f) in the complex plane: they occur in complex-conjugate pairs.

To construct the whitening filter we number the roots (poles and zeros) of S_n(f) in such a way that the odd-numbered roots have a positive imaginary part and the even-numbered roots have a negative imaginary part. Now we define

    N_{pos}(f) = (f - \nu_1)(f - \nu_3) \cdots (f - \nu_{n-1}),
    N_{neg}(f) = (f - \nu_2)(f - \nu_4) \cdots (f - \nu_n),
    D_{pos}(f) = (f - \pi_1)(f - \pi_3) \cdots (f - \pi_{d-1}),
    D_{neg}(f) = (f - \pi_2)(f - \pi_4) \cdots (f - \pi_d).    (I.2)

The whitening filter G(f) can now be specified by

    G(f) = \sqrt{\frac{N_0/2}{k}} \frac{D_{pos}(f)}{N_{pos}(f)},    (I.3)

thus the numerator of G(f) corresponds to the poles of S_n(f) with a positive imaginary part and the denominator of G(f) corresponds to the zeros of S_n(f) with a positive imaginary part.

How does this filter change the spectral density of the noise? The power spectral density at the filter output is S_n(f)|G(f)|^2 and

    |G(f)|^2 = G(f) G^*(f) = \frac{N_0/2}{k} \frac{D_{pos}(f) D_{pos}^*(f)}{N_{pos}(f) N_{pos}^*(f)} = \frac{N_0/2}{k} \frac{D_{pos}(f) D_{neg}(f)}{N_{pos}(f) N_{neg}(f)} = \frac{N_0/2}{k} \frac{D(f)}{N(f)} = \frac{N_0/2}{S_n(f)},    (I.4)

thus the output power spectral density S_n^o(f) = N_0/2, i.e. the output process is white Gaussian noise.

We next have to verify that both filters G(f) and G^{-1}(f) = 1/G(f) are realizable. The transfer functions of both these filters contain only factors [f - (a + jb)] with b ≥ 0. Let s = j2πf denote the complex frequency; then we can express such a factor as

    [s/(j2\pi) - (a + jb)] = \frac{1}{j2\pi} [s + 2\pi b - j2\pi a].    (I.5)

Note that the roots of these factors are s = -2πb + j2πa and thus have non-positive real part. Therefore none of the poles of the two filters lies in the right-half s-plane. Consequently both G(f) and G^{-1}(f) are realizable, and hence our whitening filter G(f) is reversible.
G( f ) and G −1 ( f ) are realizable and our hence whitening filter G( f ) is reversible.


Appendix J

Elementary energy E_pam of an |A|-PAM constellation

The elementary energy E_pam of an |A|-PAM constellation was defined as

    E_{pam} = \frac{1}{|A|} \sum_{s = -|A|+1, -|A|+3, \ldots, |A|-1} s^2.    (J.1)

We want to express E_pam in terms of the number of symbols |A|. We will only consider the case where |A| is even. A similar treatment can be given for odd |A|. For even |A|

    E_{pam} = \frac{2}{|A|} \sum_{k=1,|A|/2} (2k - 1)^2.    (J.2)

By induction we will show that now

    \sum_{k=1,H} (2k - 1)^2 = \frac{4H^3 - H}{3}.    (J.3)

Note that the induction hypothesis holds for H = 1. Next suppose that it holds for H = h ≥ 1. Now it turns out that

    \sum_{k=1,h+1} (2k - 1)^2 = \sum_{k=1,h} (2k - 1)^2 + (2h + 1)^2
      = \frac{4h^3 - h}{3} + (2h + 1)^2
      = \frac{4h^3 - h + 12h^2 + 12h + 3}{3}
      = \frac{4h^3 + 12h^2 + 12h + 4 - h - 1}{3}
      = \frac{4(h + 1)^3 - (h + 1)}{3},    (J.4)

and thus the induction hypothesis also holds for h + 1 and thus for all H. Therefore we obtain

    E_{pam} = \frac{2}{|A|} \sum_{k=1,|A|/2} (2k - 1)^2 = \frac{2}{|A|} \cdot \frac{4(|A|/2)^3 - |A|/2}{3} = \frac{|A|^2 - 1}{3}.    (J.5)
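The closed form (J.5) is easy to verify against the defining sum (J.1); the short sketch below does this for a few even values of |A|.

```python
def E_pam_direct(A):
    # the defining sum (J.1) over the odd levels -|A|+1, -|A|+3, ..., |A|-1
    return sum(s * s for s in range(-A + 1, A, 2)) / A

for A in [2, 4, 8, 16, 64]:
    print(A, E_pam_direct(A), (A * A - 1) / 3)   # both columns agree
```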


Bibliography

[1] A.R. Calderbank, T.A. Lee, and J.E. Mazo, "Baseband Trellis Codes with a Spectral Null at Zero," IEEE Trans. Inform. Theory, vol. IT-34, pp. 425-434, May 1988.

[2] G.A. Campbell, U.S. Patent 1,227,113, May 22, 1917, "Basic Types of Electric Wave Filters."

[3] J.R. Carson, "Notes on the Theory of Modulation," Proc. IRE, vol. 10, pp. 57-64, February 1922. Reprinted in Proc. IEEE, vol. 51, pp. 893-896, June 1963.

[4] T.M. Cover, "Enumerative Source Encoding," IEEE Trans. Inform. Theory, vol. IT-19, pp. 73-77, Jan. 1973.

[5] G.D. Forney, Jr., R.G. Gallager, G.R. Lang, F.M. Longstaff, and S.U. Qureshi, "Efficient Modulation for Band-Limited Channels," IEEE Journ. Select. Areas Comm., vol. SAC-2, pp. 632-647, Sept. 1984.

[6] G.D. Forney, Jr., and G. Ungerboeck, "Modulation and Coding for Linear Gaussian Channels," IEEE Trans. Inform. Theory, vol. 44, pp. 2384-2415, October 1998.

[7] R.G. Gallager, Information Theory and Reliable Communication, Wiley and Sons, 1968.

[8] R.V.L. Hartley, "Transmission of Information," Bell Syst. Tech. J., vol. 7, pp. 535-563, July 1928.

[9] C.W. Helstrom, Elements of Signal Detection and Estimation, Prentice Hall, 1995.

[10] S. Haykin, Communication Systems, John Wiley & Sons, 4th edition, 2000.

[11] V.A. Kotelnikov, The Theory of Optimum Noise Immunity, Dover Publications, 1960.

[12] R. Laroia, N. Farvardin, and S. Tretter, "On Optimal Shaping of Multidimensional Constellations," IEEE Trans. Inform. Theory, vol. IT-40, pp. 1044-1056, July 1994.

[13] E.A. Lee and D.G. Messerschmitt, Digital Communication, 2nd Edition, Kluwer Academic, 1994.

[14] D.O. North, Analysis of the Factors which determine Signal/Noise Discrimination in Radar, RCA Laboratories, Princeton, Technical Report PTR-6C, June 1943. Reprinted in Proc. IEEE, vol. 51, July 1963.

[15] H. Nyquist, "Certain factors affecting telegraph speed," Bell Syst. Tech. J., vol. 3, pp. 324-346, April 1924.

[16] B.M. Oliver, J.R. Pierce, and C.E. Shannon, "The Philosophy of PCM," Proc. IRE, vol. 36, pp. 1324-1331, Nov. 1948.

[17] J.G. Proakis and M. Salehi, Communication Systems Engineering, Prentice Hall, 1994.

[18] J.G. Proakis, Digital Communications, McGraw-Hill, 4th Edition, 2001.

[19] J.P.M. Schalkwijk, "An Algorithm for Source Coding," IEEE Trans. Inform. Theory, vol. IT-18, pp. 395-399, May 1972.

[20] C.E. Shannon, "A Mathematical Theory of Communication," Bell Syst. Tech. J., vol. 27, pp. 379-423 and 623-656, July and October 1948. Reprinted in the Key Papers on Information Theory.

[21] C.E. Shannon, "Communication in the Presence of Noise," Proc. IRE, vol. 37, pp. 10-21, January 1949. Reprinted in the Key Papers on Information Theory.

[22] G. Ungerboeck, "Channel Coding with Multilevel/Phase Signals," IEEE Trans. Inform. Theory, vol. IT-28, pp. 55-67, Jan. 1982.

[23] S. Verdu, Multiuser Detection, Cambridge University Press, 1998.

[24] J.H. Van Vleck and D. Middleton, "A Theoretical Comparison of Visual, Aural, and Meter Reception of Pulsed Signals in the Presence of Noise," Journal of Applied Physics, vol. 17, pp. 940-971, November 1946.

[25] J.M. Wozencraft and I.M. Jacobs, Principles of Communication Engineering, Wiley, New York, 1965.

[26] S.T.M. Ackermans and J.H. van Lint, Algebra en Analyse, Academic Service, Den Haag, 1976.


Index<br />

a-posteriori message probability, 18<br />

a-priori message probability, 15<br />

additive Gaussian noise channel<br />

scalar, 27<br />

additive Gaussian noise vector channel, 38<br />

aliased spectrum, 120<br />

Armstrong, E.H., 11<br />

asymptotic coding gain, 175<br />

autocorrelation function<br />

of white noise, 47<br />

available energy<br />

per dimension, 104<br />

bandlimited transmission, 104<br />

bandwidth-limited case, 116<br />

Bardeen J., Brattain W., and Shockley W., 10<br />

Bayes rule, 18<br />

Bell, A.G., 9<br />

bi-orthogonal signaling, 80<br />

binary modulation<br />

frequency shift keying (FSK), 85<br />

phase shift keying (PSK), 85<br />

Branly, E., 9<br />

Braun, K.F., 10<br />

building-block waveforms, 53<br />

Campbell, G.A., 9, 11<br />

capacity<br />

of bandlimited waveform channel, 112<br />

of wideband waveform channel, 114<br />

per dimension, 104, 111<br />

carrier-phase, 137<br />

Carson, J.R., 11<br />

center of gravity of the signal structure, 85<br />

channel<br />

additive Gaussian noise<br />

scalar, 27<br />

bandpass, 127<br />

binary symmetric, 23<br />

capacity, 13<br />

discrete, 15<br />

discrete scalar, 20<br />

discrete vector, 20<br />

real scalar, 25<br />

real vector, 35, 36<br />

waveform, 46<br />

Clarke, A.C., 10<br />

code<br />

error-correcting, 172<br />

coding gain<br />

asymptotic, 175<br />

coherent demodulation, 137<br />

correlation, 50<br />

correlation receiver, 50<br />

cross-talk, 170<br />

decibel definition, 202<br />

decision intervals, 29<br />

decision region, 37<br />

definition for real vector channel, 37<br />

decision rule, 16<br />

maximum a-posteriori<br />

for discrete channel, 17<br />

for real scalar channel, 27<br />

maximum likelihood<br />

for discrete channel, 19<br />

for real scalar channel, 27<br />

minimax, 22<br />

minimum Euclidean distance, 39<br />

decision variables<br />

for discrete channel, 18<br />

for real scalar channel, 26<br />


for real vector channel, 36<br />

for the discrete vector channel, 21<br />

DeForest, L., 10<br />

demodulator, 61<br />

destination, 16<br />

diode, 10<br />

direct receiver, 66<br />

discrete, 16<br />

distance gain, 176<br />

dot product, 38<br />

Dudley, H., 12<br />

energy<br />

of vector, 69<br />

envelope detection, 142<br />

equally likely messages, 19<br />

error probability, 16<br />

minimum, for the scalar AGN channel, 30<br />

symbol, 168<br />

union bound, 88<br />

error-correcting code, 172<br />

estimate, 16, 21<br />

excess bandwidth, 122<br />

extended Hamming codes, 172<br />

Faraday, M., 9<br />

Fleming, J.A., 10<br />

frequency shift keying, 75<br />

FSK, 75<br />

Galvani, L., 8<br />

Gauss, C.F. and Weber, W.E., 9<br />

Gram-Schmidt procedure, 55, 192<br />

Hadamard matrix, 79<br />

Hamming codes<br />

extended, 172<br />

Hamming distance, 172<br />

minimum, 172<br />

Hamming-distance, 23<br />

Hartley, R., 12<br />

Henry, J., 8<br />

Hertz, H., 9<br />

hypersphere

interior volume of, 107<br />

information source, 15<br />

inner product, 38<br />

integrated circuit, 10<br />

interval<br />

observation, 47<br />

irrelevant data (noise), 58<br />

Kilby J. and Noyce R., 10<br />

Kotelnikov, V., 12<br />

Kronecker delta function, 50<br />

likelihood, 19<br />

Lodge, O., 9<br />

MAP decision rule<br />

for the discrete vector channel, 21<br />

for discrete channel, 17<br />

for real scalar channel, 27<br />

for the scalar AGN channel, 28<br />

MAP decoding<br />

for real vector channel, 36<br />

Marconi, G., 10<br />

matched filter, 12<br />

matched filters, 64<br />

matched-filter receiver, 64<br />

Maxwell, J.C., 9<br />

message, 15<br />

message sequences, 92<br />

minimax decision rule, 22<br />

minimum Hamming distance, 172<br />

ML decision rule<br />

for discrete channel, 19<br />

for real scalar channel, 27<br />

modulation<br />

amplitude (AM), 11<br />

double sideband suppressed carrier (DSB-<br />

SC), 137<br />

frequency (FM), 11<br />

pulse amplitude (PAM), 119<br />

pulse code (PCM), 12<br />

pulse position, 75<br />

quadrature amplitude (QAM), 127, 135



modulation interval, 120, 125<br />

modulator, 56, 60<br />

Morse<br />

code, 8<br />

telegraph, 8<br />

Morse, S., 8<br />

multi-pulse transmission, 125<br />

normalized signal-to-noise-ratio, 168<br />

North, D.O., 12, 70<br />

Nyquist criterion, 120<br />

Nyquist, H., 11<br />

observation interval, 47<br />

observation space, 37<br />

Oersted, H.K., 8<br />

optimum decision rule, 16<br />

for real channel, 26<br />

for the discrete channel, 18<br />

optimum receiver, 16<br />

for the AGN vector channel, 39<br />

for the scalar AGN channel, 28<br />

orthogonal signaling, 75<br />

capacity, 77<br />

energy, 88<br />

error probability, 76<br />

optimum receiver for, 76<br />

signal structure, 75<br />

orthonormal, 47<br />

orthonormal functions, 47<br />

Parseval relation, 68<br />

PCM, 12<br />

Pierce, J.R., 10<br />

Poisson distribution, 23<br />

Popov, A., 10<br />

power spectral density<br />

of white noise, 47<br />

power-limited case, 116<br />

PPM, 75<br />

probability of correct decision, 16<br />

probability of error<br />

definition, 16<br />

pulse amplitude modulation, 167<br />

pulse transmission, 120<br />

Pupin, M., 9<br />

Q-function, 30<br />

definition of, 30<br />

table of, 31<br />

upper bound for, 182<br />

quadrature amplitude modulation, 127, 135<br />

quadrature multiplexing, 127<br />

random code, 104, 108<br />

real vector channel, 36<br />

receiver, 16<br />

direct, 66<br />

redundancy loss, 176<br />

Reeves, A., 12<br />

reversibility<br />

theorem of, 42<br />

Righi, A., 9<br />

satellites, 10<br />

Schwarz inequality, 197<br />

serial pulse transmission, 120<br />

set-partitioning, 173<br />

Shannon, C.E., 13<br />

shaping gain, 136<br />

signal, 15<br />

signal constellation, 167<br />

signal energy, 83<br />

signal space, 56<br />

signal structure, 56, 61<br />

average energy of, 84<br />

rotation of, 83<br />

translation of, 83<br />

signal vector, 20<br />

signal-to-noise ratio, 66, 115<br />

at output of matched filter, 67<br />

normalized, 168<br />

signaling<br />

antipodal, 85<br />

bit-by-bit, 104<br />

block-orthogonal, 104<br />

orthogonal binary, 85<br />

source, 15



sphere hardening<br />

noise vector, 105<br />

received vector, 106<br />

sphere-packing argument, 107<br />

sufficient statistic, 52<br />

sufficient statistics, 53<br />

symbol error probability, 168<br />

telegraph, 9<br />

telephone, 9<br />

Telstar I, 10<br />

transistor, 10<br />

transmitter, 15<br />

triode, 10<br />

uncoded transmission, 167<br />

union bound, 89, 110<br />

van der Bijl, H., Hartley, R., and Heising, R.,<br />

11<br />

vector channel, 20<br />

vector receiver, 21<br />

vector transmitter, 20<br />

Volta, A., 8<br />

water-filling procedure, 113<br />

waveform channel<br />

relation with vector channel, 60<br />

Wiener, N., 12<br />

zero-forcing (ZF), 124<br />

zero-forcing criterion, 120
