30.11.2012

Code and ciphers: Julius Caesar, the Enigma and the internet


chapter 2

help or a hindrance to the cryptanalyst. In one extreme case a novel of over 50,000 words was written without once using the letter E, but this was done deliberately: the author had tied down the E key on his typewriter so that it couldn’t be used. This is a remarkable feat; here, as a sample, is one sentence from the book:

Upon this basis I am going to show you how a bunch of bright young folks did find a champion; a man with boys and girls of his own; a man of so dominating and happy individuality that Youth is drawn to him as is a fly to a sugar bowl. [2.3]

Even when shown a much longer extract from this book, few people notice anything unusual about it until they are asked to study it very carefully, and even then the majority fail to spot its unique feature.
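A letter count shows concretely why such a text would mislead a cryptanalyst. The short Python sketch below counts the letters of the quoted sentence, treating upper and lower case as the same. The names `SAMPLE` and `letter_counts` are illustrative and do not come from the book. The count for E, normally the most common letter in English, comes out as zero, so pairing the commonest ciphertext letter with E would go wrong from the start.

```python
from collections import Counter

# The sentence from the lipogram quoted above.
SAMPLE = (
    "Upon this basis I am going to show you how a bunch of bright young "
    "folks did find a champion; a man with boys and girls of his own; a man "
    "of so dominating and happy individuality that Youth is drawn to him "
    "as is a fly to a sugar bowl."
)


def letter_counts(text: str) -> Counter:
    """Count the letters A-Z in text, treating upper and lower case as the same."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")


counts = letter_counts(SAMPLE)
print("letters counted:", sum(counts.values()))
print("count for E:", counts["E"])  # 0: the text avoids E entirely
print("five commonest:", counts.most_common(5))
```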

Letter frequencies in languages other than English

A simple substitution cipher in any alphabetic language can be solved by the method above: a frequency count followed by use of the language itself.
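The first half of that method can be done mechanically. The Python below is a minimal illustration and not the book's own procedure. It ranks the ciphertext letters by frequency and pairs them, rank for rank, with an assumed reference order for English (`ENGLISH_ORDER`, a commonly quoted approximation; the English table given earlier in the chapter may order the rarer letters differently). The function names are hypothetical. The resulting key is only a first guess: for a short message several letters will typically be wrong, and the second step, use of the language itself, is what corrects them.

```python
import string
from collections import Counter

# Assumed reference order of English letters, most frequent first.
# A commonly quoted approximation, used here only for illustration.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def rank_letters(text: str) -> str:
    """Return the letters A-Z occurring in text, most frequent first."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    # Ties are broken alphabetically so the ranking is reproducible.
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))


def tentative_key(ciphertext: str, reference: str = ENGLISH_ORDER) -> dict:
    """Pair each ciphertext letter with the reference letter of the same rank."""
    return dict(zip(rank_letters(ciphertext), reference))


def apply_key(ciphertext: str, key: dict) -> str:
    """Substitute letters using key; spaces and punctuation pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in ciphertext.upper())


if __name__ == "__main__":
    # Hypothetical test message, enciphered with a Caesar shift of 3,
    # which is a special case of simple substitution.
    alphabet = string.ascii_uppercase
    shift3 = str.maketrans(alphabet, alphabet[3:] + alphabet[:3])
    plain = "THE CRYPTANALYST COUNTS THE LETTERS AND THEN USES THE LANGUAGE ITSELF"
    cipher = plain.translate(shift3)
    print(cipher)
    print(apply_key(cipher, tentative_key(cipher)))  # a first guess, to be refined
```

Comparing the two printed lines shows which letters the frequency ranking alone got right. The remaining letters are what the cryptanalyst must then fix by recognising likely words.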

Obviously, the cryptanalyst needs at least a moderate knowledge of the language, though with a simple substitution cipher he doesn’t need to be fluent. Equally obviously, the frequency count of letters in a typical sample will vary from one language to another, although the variation between languages with a common base, such as Latin, will be smaller than that between languages with entirely different roots. Not all languages use 26 letters: some use fewer (Italian normally uses only 22); some, such as Russian, use more; and others, such as Chinese, don’t have an alphabet at all. Since Italians normally don’t use K, W or Y, these letters are given a zero frequency, but an Italian text that mentions New York shows that even such letters may appear. In French and German we should really distinguish between vowels with various accents or umlauts, but to simplify the tables below all forms of the same letter were counted together. Thus, in French, E, É, Ê and È were all included in the count for E. Also, numbers were excluded from the count unless they were spelled out, and all non-alphabetic symbols, such as space, comma, full stop, quotes and semicolon, were counted as ‘other’. Upper and lower case letters were treated as the same. With these conventions, Table 2.6 shows the frequency of letters in samples of 1,000 in four European languages. The table of frequencies of letters in English given above is repeated for convenience.
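The counting conventions just described translate fairly directly into code. The following Python sketch is one way to apply them, under stated assumptions. Accents and umlauts are removed with Unicode NFD decomposition (`unicodedata.normalize`), digits are skipped, and every other non-letter falls into an 'other' class. The text does not say how characters such as German ß or French œ were treated, since they do not reduce to a single letter A-Z. Here they are counted as 'other', which is an assumption. The names `fold_letter`, `count_characters` and `per_thousand` are hypothetical.

```python
import unicodedata
from collections import Counter


def fold_letter(ch: str) -> str:
    """Strip accents and umlauts and convert to upper case.

    NFD decomposition splits e.g. 'é' into 'e' plus a combining accent;
    dropping the combining marks leaves the base letter.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.upper()


def count_characters(text: str) -> Counter:
    """Count letters A-Z plus an 'other' class, following the conventions above.

    - all accented forms of a letter count as that letter (É, Ê, È -> E);
    - upper and lower case are treated as the same;
    - digits are skipped entirely (numbers count only if spelled out);
    - everything else (space, punctuation, ...) is counted as 'other'.
    Assumption: characters that do not fold to a single letter A-Z,
    such as 'ß' (upper-cases to 'SS') or 'œ', also go into 'other'.
    """
    counts = Counter()
    for ch in text:
        if ch.isdigit():
            continue
        base = fold_letter(ch)
        if len(base) == 1 and "A" <= base <= "Z":
            counts[base] += 1
        else:
            counts["other"] += 1
    return counts


def per_thousand(counts: Counter) -> dict:
    """Scale raw counts to a sample of 1,000 characters ('other' included)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: 1000 * n / total for key, n in counts.items()}
```

For example, `count_characters("Été 2012, à Noël!")` gives three E's, one each of T, A, N, O and L, and five 'other' characters (three spaces, a comma and an exclamation mark). The four digits are ignored. Passing the result to `per_thousand` puts a sample of any length on a per-1,000 basis, comparable to the samples described for Table 2.6.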
