Kenneth Roe (kennethroe@sbcglobal.net)
Computer Science Masters Project
University of New Haven

Cryptography on the Playstation 3: Brute Force AES Attack

Introduction

Adults typically categorize game consoles as toys. This is not surprising, since that is their intended purpose. Recently, however, the technology used in game consoles has become extremely advanced, even bleeding edge. While the intended purpose of these devices is entertainment, modern game console hardware such as the Sony Playstation 3 has more in common with high end supercomputers than it does with the toys it shares a shelf with. Unfortunately most of this power goes to waste: these machines sit idle most of the day.

Stanford University’s Pande group recognized this wasted potential and found a way to utilize it. The group runs the Folding@Home network, a distributed computing network where people can donate idle processing power to scientists so they can run complex calculations [6]. The group worked with Sony to develop a Folding@Home client for the Playstation 3. Since this client went live, the processing power of their distributed computing network has more than doubled, making it one of the fastest distributed computing networks in the world. The overwhelming success of the Playstation 3 in the protein folding project was the inspiration for this project. Since it does so well at these complex scientific calculations, how would it do at cryptography? Perhaps more importantly, can a typical programmer unlock the potential of the Cell, or is this a beast that can only be tamed by a handful of specialists?

Strategy

This project could have examined the technical specifications of the Playstation 3 and its Cell processor and marveled at how great it would be at cryptography, but that would leave the reader wondering if theory translates to reality, and more importantly it would not be much fun. To see what the Playstation 3 can really do, this project implemented an interesting, computationally intensive application on both the Playstation 3 and on commodity PC hardware to compare their performance. The chosen example was a known plaintext brute force attack on the AES encryption algorithm. This paper does not aim to shoot holes in AES; that would be naive and unachievable. The primary goal of this paper is to teach the reader about a unique new breed of processor and show an interesting application of the Cell processor that highlights its performance potential.

AES

AES is a symmetric block cipher that is the current FIPS standard for protecting electronic data in business and government (FIPS 197) [1]. It is based on the Rijndael cipher but uses a fixed 128 bit block size and only supports 128, 192, or 256 bit keys (Rijndael allows variable key length and block size) [7].

CryptographyOnThePS3.doc, 4/15/2007 5:00 PM

Cryptanalysis

Every cryptographic algorithm is vulnerable to cryptanalysis. There are many different cryptanalysis techniques, but one is possible with every cryptographic algorithm: brute force. The theoretical vulnerability of an algorithm is determined by the effort required to search the full key space; if every possible key is tried, one of them has to be the correct key. Due to their simplistic nature many would consider brute force attacks to be primitive. Primitive or not, brute force is often the most effective cryptanalytic technique. Algorithms are considered “broken” when an attack is found that requires fewer operations than a brute force attack would require to recover the plaintext. These “breaks” show weakness in the algorithm, but are often nothing more than theoretical weaknesses due to unrealistic constraints on the attack, such as requiring a huge number of known plaintext/ciphertext pairs. It is hard for an attacker to get a single known plaintext/ciphertext pair, let alone a large number of them.

The Cell

The heart of the Playstation 3 is the Cell Broadband Engine microprocessor. This processor was jointly developed by Sony, Toshiba, and IBM (STI) [8]. While the purpose of this paper is not to marvel at the microarchitecture of the Cell, understanding what makes the Cell different from typical CPUs is essential to unlocking its power. Traditional microprocessors have a single general purpose processing core. Recently multi-core processors have reached the mainstream market. These are essentially multiple (identical) general purpose processing cores packaged together so they can be installed in a single socket. They are a consolidated version of the shared memory multiprocessor systems that preceded them.

The Cell is different, very different. Instead of trying to build a faster processor by cramming more transistors into general purpose processors, STI decided to improve performance through specialization. The Cell consists of a single general purpose core called the Power Processing Element (PPE) and eight highly specialized 128 bit vector processing units called Synergistic Processing Elements (SPEs). The PPE is capable of general purpose computing; it is the heart of the Cell. The PPE is a 64 bit RISC processing unit based on IBM’s POWER architecture that is capable of running two threads in parallel [9]. The SPEs are the workhorses of the Cell. These are specialized processing units that are built to perform a limited set of operations very quickly. The general programming strategy recommended by IBM is to control the SPEs with the code running on the PPE, and offload all compute intensive work to the SPEs [3].


The PPE and SPEs are not set up in a traditional shared memory multiprocessor configuration. The PPE is linked directly to main memory (256MB). The SPEs each have their own private memory, referred to as the Local Store (256KB). The PPE and SPEs are connected via a high capacity interconnect called the Element Interconnect Bus (EIB). While the PPE and SPEs are connected, the SPEs cannot access main memory directly; they must do so via DMA [9]. This is a unique memory configuration. Each SPE can access its local store very quickly, and since a local store is dedicated to a single SPE there is no contention for it. When SPEs need to work with data that is in main memory, the data must be transferred across the EIB. The PPE and SPEs can both initiate DMA requests, but for efficiency reasons it is preferable to initiate DMA from the SPEs [3]. If an SPE modifies data and wants the PPE to be able to see the change, it needs to write the updated data back to main memory via DMA.

The Cell processor is not exclusive to the Playstation 3; it is also available in high performance servers such as the IBM QS20 [10], dedicated processing boards such as those produced by Mercury Computer [11], and in the Department of Energy’s next supercomputer, the IBM Roadrunner [12]. The Cell processor in the Playstation 3 is a full featured Cell, the only restriction being that one of the eight SPEs is disabled. This was done to improve chip yields; many more processors can pass QC if one of the SPEs is permitted to be defective [13]. The Playstation 3 provides a unique opportunity to gain access to supercomputer technology without paying supercomputer prices (even the relatively cheap QS20 blade server is around $20,000 [10]).

SPE Concepts

The SPEs are what makes the Cell a monster. Their wide 128 bit registers and SIMD instruction set allow huge volumes of data to be processed quickly. The fact that the Cell processor contains eight SPEs does not hurt either (six are available through Linux [13]). To harness the power of the SPEs programmers must understand a few key concepts:

Vector Data Types

Vector data types allow multiple sub-quadword values to be stored in a single 128 bit quadword. The number of values that can fit in a vector varies based on the type of scalar being aggregated into the vector. A vector unsigned char can hold sixteen 8 bit unsigned chars, a vector unsigned int can hold four 32 bit unsigned ints, and so on. The utility of vector data types is not immediately obvious, but will become so in the next section.

SIMD Operations

Vector data types do not do much on their own; they just provide a way to structure data in quadwords so it can be processed using SIMD operations. SIMD stands for Single Instruction Multiple Data. SIMD operations allow one CPU operation to be applied to multiple values in parallel. The figure below shows two lists of four 32 bit unsigned ints being stored in two 128 bit vector unsigned ints. These are added using a SIMD add operation to produce a 128 bit vector unsigned int result. With a single CPU instruction four pairs of values have been added.

[Figure: SIMD add of two vector unsigned ints. Concept derived from [2]]

Intrinsics

Typically a high level language like C would be used to implement an algorithm, and the compiler would be responsible for mapping the C operations to CPU instructions. This works well for some target platforms. A good C compiler can normally come very close to the performance of an assembly implementation. This is not the case with the Cell, however. For the Cell to perform, data needs to be vectorized and SIMD operations need to be used. I expected the compiler to be able to automatically vectorize data (such as scalar arrays) and use SIMD operations, but C code compiled for the Cell performed dismally. I am not sure if this is due to the immaturity of the compiler or if effective auto-vectorization is too much to ask.

Optimizing for the Cell requires the programmer to convert data to vectors and convert operations to intrinsics calls. A vector is a 128 bit chunk of data. The SPEs have 128 bit registers, so any of the vector types can fit in a single register. SPE CPU instructions operate on vectors. IBM has provided a library of SPE intrinsics that allow the programmer to make near direct calls to the SPEs’ SIMD CPU instructions [14]. This allows the programmer to take back some control from the compiler without resorting to raw assembly.

Vector/Intrinsics Example: XOR two 128 bit chunks of data.

Standard C implementation:

unsigned char chunk1[16] = {0xA0,0x03,0x00,0x04,0x13,0xB4,0x00,0x05,0x80,0x66,0xDF,0x01,0x34,0x06,0x80,0x10};
unsigned char chunk2[16] = {0xF4,0x50,0x01,0xA4,0x57,0x23,0x60,0x40,0xF0,0xAA,0x12,0x40,0x01,0xEF,0xC4,0x08};
unsigned char result[16];
int i;

/* XOR the two chunks one byte at a time. */
for (i = 0; i < 16; i++)
{
    result[i] = chunk1[i] ^ chunk2[i];
}



The standard C implementation represents the 128 bits of data as an array of unsigned chars. It then loops through each position and XORs the chunks to produce the result. The compiler will need to interpret the code and decide how to map this to CPU instructions. On a machine with 32 or 64 bit registers multiple CPU instructions would typically be required to XOR two 128 bit chunks of data.

The vectorized version using SPE intrinsics takes control away from the compiler. The 128 bits of data are represented with a vector unsigned char. This is a data type that contains sixteen 8 bit unsigned chars aligned to be stored in a single register. The XOR operation is performed using the spu_xor intrinsic. The documentation for this intrinsic indicates that it will be mapped to a single CPU instruction, XOR [14]. Intrinsics allow the programmer to make low level calls easily from a high level language.

Performance Comparison Strategy

To evaluate the performance of the Playstation 3 and its Cell processor I implemented a brute force attack on AES on the Playstation 3 as well as on commodity PC hardware. A range of keys to search was chosen, and the search was performed on both sets of hardware. Timing was taken for both versions and the results were compared.

Software Design

The x86 and Cell versions of the attack will use the same basic design. There will be a single controller thread and many worker threads.

Controller

The controller is responsible for managing the attack. It breaks up the work to be done into chunks. Worker threads are spawned to process the chunks. The optimal number of worker threads will be the number of cores available on the target CPU.

Pseudocode:

Build a known plaintext, ciphertext pair.
for (each core)
{
    Allocate a portion (1/# cores) of the keyspace to be searched.
    Spawn a worker thread to search the keyspace passing it the range to search
    and the known plaintext, ciphertext pair.
}
Exit when a worker thread finds the key OR all worker threads finish searching
their chunk of the keyspace.

Worker

The worker loops through the range of keys and tests each candidate key. If the key is found it returns the key. When the entire range has been searched the worker returns indicating that the key was not found.

Pseudocode:

for (start of key range TO end of key range)
{
    Encrypt known plaintext to produce candidate ciphertext.
    if (candidate ciphertext = actual ciphertext)
    {
        return the key;
    }
    Move to next key.
}
Return indicating that the key was not found.

AES Implementation - Cell

AES can be optimized for a wide range of architectures. To maximize performance software implementations need to be designed with the target platform in mind. Optimized software implementations are widely available for many types of hardware, from smart cards to supercomputers. However, there is not currently a freely available Cell optimized AES implementation, so I needed to create one.

The AES implementation created for this project is not a full fledged implementation. Only encrypt and key scheduling operations needed to be implemented for the attack. In addition the attack works on 128 bit keys, so the AES implementation only supports 128 bit keys. It would be trivial to turn this into a full fledged Cell optimized AES implementation (if someone else is looking for a project).

The process began with evaluating the available AES implementations compiled for the Cell to determine how well the attack would run without optimization. Performance was much slower than on x86 hardware with all the implementations tested. It became clear that realizing the Cell's potential would require a custom AES implementation. The only question remaining was which of the existing AES implementations to use as a starting point. The “optimized” reference implementation [5] performed significantly better than the base reference implementation [4], but there was not a clear division of the different steps of an AES round, so I found it easier to start my optimization attempts with the base reference implementation. It is necessary to understand how AES works to understand what was done to optimize it for the Cell.

AES Encrypt Operation

The AES encrypt operation takes a plaintext and an expanded key and returns a ciphertext. The key steps in encryption (SubBytes, ShiftRows, MixColumns and AddRoundKey) are outlined in more detail below.

Pseudocode:

[Figure: AES encrypt pseudocode, from [1]]

AddRoundKey

Summary: The AddRoundKey step applies a round key that is derived from the 128 bit encryption key to the input. The 128 bit round key is applied to the 128 bit input by XORing them together.

[Figure: AddRoundKey, from [1]]

Reference Implementation: The reference implementation loops through each byte of the input XORing one at a time with the corresponding byte of the round key.

Cell Optimized Implementation: This operation is very easy to optimize for the Cell. The input is 128 bits, the round key is 128 bits, and the key is applied to the input via the XOR operation to produce a 128 bit output. Conveniently the Cell has a vector intrinsic (spu_xor) that takes two 128 bit vectors, XORs them, and returns an output vector. The intrinsic maps to a single CPU instruction (XOR) so the entire AddRoundKey step can be performed in one operation.

SubBytes

Summary: The SubBytes step takes each byte of the input, uses it as the index for a table lookup, and replaces the byte with the value from the table.

[Figure: SubBytes, from [1]]

Reference Implementation: The s-box is implemented as a 256 entry byte array. The input array is looped through one byte at a time with the current byte used as the index into the 256 entry byte array. The value returned from the table lookup is substituted in one byte at a time.

Cell Optimized Implementation: This step is tricky to optimize. It is necessary to look at what is going on from a higher level without being biased by the reference implementation's approach. The s-box is implemented as a 16 entry quadword array. Lookups are performed using the 5 least significant bits of each byte of the input data as a key to index two of the 16 byte entries of the s-box array at a time. This is repeated 8 times to search the whole 16 entry s-box. The result of these 32 byte s-box lookups is 8 intermediate vectors. These vectors contain the valid substitution values, but they also contain many invalid values since the most significant 3 bits of each input byte were ignored while doing the lookup. The invalid values need to be eliminated and the 8 intermediate vectors need to be consolidated into a single result vector containing only the valid values. This is done using binary tree pruning. The bytes of the intermediate vectors are pruned down to eliminate invalid values based on the value of the three most significant bits that were ignored previously. This is done in stages, eventually leaving a single vector containing the substituted values. This seems overly complex and wasteful compared to the single byte substitutions the reference implementation uses. It is not, however, because the SIMD operations this technique uses perform lookups on 16 bytes at a time. This technique for SIMD table lookups was illustrated in [2].

[Figure: SIMD table lookup using binary tree pruning, from [2]]

ShiftRows

Summary: The ShiftRows step arranges the input as a 4x4 byte array and circular left shifts each row of the array by a varying number of bytes. The first row is not shifted at all, the second by one byte, the third by two bytes, and the fourth by three bytes.

[Figure: ShiftRows, from [1]]

Reference Implementation: In the reference implementation the ShiftRows step is performed by treating the input as a 4x4 array as in the conceptual ShiftRows step. Each of the rows except the first gets looped through. For each row each byte is copied to the destination position in a temporary array. The destination position for each byte is determined by performing a lookup in a table that contains the destination index. After the copies (effectively a circular shift) are performed on a row in the temporary array the row is copied back over the input to form the output.

Cell Optimized Implementation: This is another step that can be performed in a simple and efficient manner on the SPEs. Instead of treating the 16 bytes of input as a 4x4 array, as the conceptual operation and the reference implementation do, the 16 bytes are kept in vector unsigned char form. The row-by-row variable shifts can then be performed by rearranging the bytes. A constant holds the shuffle pattern, which dictates where each byte of the input gets moved to produce the output. A single spu_shuffle intrinsic call performs the entire ShiftRows operation. The spu_shuffle intrinsic maps to a single SHUFB CPU instruction.
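To show what the constant shuffle pattern looks like, here is a portable C sketch that emulates a 16-byte shuffle with a loop. It does not use the SPU intrinsics, so it runs on any machine. The names `shuffle_bytes`, `shift_rows_shuffle` and `SHIFT_ROWS_PATTERN` are hypothetical, and the pattern assumes the column-major state layout of FIPS 197 (index = row + 4 * column); the paper's own constant may be laid out differently. In this sketch, each pattern entry names the input byte that feeds the corresponding output byte.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for a one-instruction byte shuffle:
 * out[i] = in[pattern[i]] for all 16 bytes. */
void shuffle_bytes(const unsigned char in[16],
                   const unsigned char pattern[16],
                   unsigned char out[16])
{
    assert(in != out);                    /* not safe in place */
    for (size_t i = 0; i < 16; i++)
        out[i] = in[pattern[i] & 0x0F];
}

/* ShiftRows as a single constant pattern (assumed column-major state). */
static const unsigned char SHIFT_ROWS_PATTERN[16] = {
    0, 5, 10, 15,   4, 9, 14, 3,   8, 13, 2, 7,   12, 1, 6, 11
};

void shift_rows_shuffle(const unsigned char in[16], unsigned char out[16])
{
    shuffle_bytes(in, SHIFT_ROWS_PATTERN, out);
}
```

On the SPE the loop in `shuffle_bytes` disappears: the whole rearrangement is one shuffle with the pattern held in a register.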

MixColumns<br />

The MixColumns step arranges the 16 input bytes in a 4x4 byte array and applies a function to each column of the array. The function transforms the bytes so that each of the four input bytes in a column affects all four output bytes of that column. This step cannot be explained without getting into some heavy math. The purpose of this paper is to help the reader understand the potential of the Cell and what it takes to unlock it, so a detailed explanation is out of scope. This step, especially in the Cell optimized version, is too complex to be a good example of unlocking the power of the Cell, so it will be skipped. The MixColumns function was heavily optimized for the Cell, and the implementation is creative and interesting. This was the most time-consuming part of AES to optimize. If you want more detail, have a look at the included Cell optimized AES code; it is well documented.

AES Key Expansion

Summary: The key expansion process converts the 128-bit key provided by the user into a set of round keys derived from it. One 128-bit round key is generated for each round, plus one additional round key that is applied before the rounds begin. The number of rounds depends on the key size. Since a 128-bit key is being tested, there are 10 rounds and therefore 11 round keys.

Cell Optimized Implementation: Key expansion was difficult to optimize due to data dependencies. The key provided by the user is used to generate the next 128-bit round key. Those 128 bits are used to generate the next 128 bits, and so on. Not only does each round key depend on the previous 128-bit round key, each 32 bits of a single round key depends on the previous 32 bits of that round key. This means that multiple round keys cannot be generated in parallel, and the parts of a single round key cannot be generated in parallel either. This makes it very difficult to take advantage of SIMD operations. After much agony I came up with an interesting solution. If SIMD vector operations cannot be used to expand a single key, the only effective option is to expand four keys at a time! This technique would not be very useful if the use case required a single key to encrypt many blocks, which is how symmetric ciphers are typically used, but it is a perfect fit for a key search, since a key expansion is performed for every encrypt operation.
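The four-keys-at-a-time idea can be sketched in portable C. Each step of the schedule still depends on the previous one, but the same step is applied to four independent keys, one per 32-bit lane. This is an illustration under stated assumptions, not the paper's SPU code: the names `expand_four_keys`, `LANES` and `SCHED_WORDS` are hypothetical, the inner lane loop stands in for what would be a single vector operation on SIMD hardware, and the S-box is computed from its definition rather than stored as a table to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4        /* independent keys expanded in lock step */
#define SCHED_WORDS 44 /* 11 round keys x 4 words for a 128-bit key */

static uint8_t sbox[256];
static int sbox_ready = 0;

static uint8_t rotl8(uint8_t x, int s)
{
    return (uint8_t)((x << s) | (x >> (8 - s)));
}

/* Build the AES S-box: field inverse followed by the affine map. */
static void init_sbox(void)
{
    uint8_t p = 1, q = 1;
    do {
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1B : 0)); /* p *= 3 */
        q ^= (uint8_t)(q << 1);                                   /* q /= 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80)
            q ^= 0x09;
        sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^
                            rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;
    sbox_ready = 1;
}

static uint32_t sub_word(uint32_t w)
{
    return (uint32_t)sbox[w >> 24] << 24 |
           (uint32_t)sbox[(w >> 16) & 0xFF] << 16 |
           (uint32_t)sbox[(w >> 8) & 0xFF] << 8 |
           (uint32_t)sbox[w & 0xFF];
}

/* keys: LANES consecutive 16-byte keys. w[i][lane]: schedule word i of
 * that lane's key. Word i needs word i-1, so i cannot be parallelized,
 * but the lanes are independent of each other. */
void expand_four_keys(const uint8_t keys[LANES * 16],
                      uint32_t w[SCHED_WORDS][LANES])
{
    static const uint8_t rcon[10] = {0x01, 0x02, 0x04, 0x08, 0x10,
                                     0x20, 0x40, 0x80, 0x1B, 0x36};
    assert(keys != 0 && w != 0);
    if (!sbox_ready)
        init_sbox();
    for (int i = 0; i < 4; i++)
        for (int lane = 0; lane < LANES; lane++) {
            const uint8_t *k = &keys[lane * 16 + 4 * i];
            w[i][lane] = (uint32_t)k[0] << 24 | (uint32_t)k[1] << 16 |
                         (uint32_t)k[2] << 8 | k[3];
        }
    for (int i = 4; i < SCHED_WORDS; i++) {
        for (int lane = 0; lane < LANES; lane++) { /* one vector op on SIMD */
            uint32_t t = w[i - 1][lane];
            if (i % 4 == 0) {
                t = (t << 8) | (t >> 24);          /* RotWord */
                t = sub_word(t) ^ ((uint32_t)rcon[i / 4 - 1] << 24);
            }
            w[i][lane] = w[i - 4][lane] ^ t;
        }
    }
}
```

In a key search the four lanes would hold four consecutive candidate keys, so no expansion work is wasted.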


AES Implementation – x86

There are many x86 optimized AES implementations available. For the comparison I chose the Rijndael Optimized C Code version 3.0 [5]. This implementation was chosen because it is in the public domain, performs well, and is frequently cited in comparisons. The reference implementation of AES could have been used for the comparison, but this would have heavily favored the Cell.

Test Configuration

The goal of this comparison was to compare the Playstation 3/Cell to commodity PC hardware. The test environments were made as similar to each other as possible given the vast differences in architecture between the platforms.

Environment

Commodity PC System:
    Hardware: Athlon 64 X2 3800+ CPU, 2 GB Corsair PC 3200 RAM, Abit AV8 Motherboard
    Software: Fedora Core 6 Linux, GCC 4.1.2 Compiler

Cell System:
    Hardware: Sony Playstation 3
    Software: Fedora Core 6 Linux, IBM Cell SDK 2.1, GCC 4.1.2 Compiler

Test Parameters

Each version of the software was configured to search the range of keys from 0x00000000000000000000000000000000 to 0x0000000000000000000000003C000000. This is 1,006,632,960 keys. Yes, over a billion. The Cell version of the program breaks the keyspace up into six chunks, one for each SPU. The x86 version breaks the keyspace up into two chunks, one for each core. Threads are spawned to process the chunks in parallel until the keyspace has been searched. Arbitrary data was chosen for the plaintext to encrypt. An arbitrary key outside the search range was chosen to ensure that the entire range was searched.
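The chunking described above amounts to dividing a half-open key range evenly among the workers. The sketch below is one way to do it, not the program's actual code: `key_range` and `chunk_for_worker` are hypothetical names, and a 64-bit integer is used because the upper bits of every key in the test range are zero.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range of keys: [first, end). */
typedef struct {
    uint64_t first;
    uint64_t end;
} key_range;

/* Return the slice of [first, end) that worker `index` of `workers`
 * should search. Any remainder is spread over the first workers. */
key_range chunk_for_worker(uint64_t first, uint64_t end,
                           unsigned workers, unsigned index)
{
    assert(workers > 0 && index < workers && first <= end);

    uint64_t total = end - first;
    uint64_t base = total / workers;
    uint64_t extra = total % workers;
    key_range r;

    r.first = first + base * index + (index < extra ? index : extra);
    r.end = r.first + base + (index < extra ? 1 : 0);
    return r;
}
```

With the test range of 0x3C000000 keys, six workers each receive 0x0A000000 keys and two workers each receive 0x1E000000 keys.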

Results<br />

x86:

Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0.
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3c 0x0 0x0 0x0.
Successfully executed in 133.00 seconds.
Keyspace Searched

Cell:

Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0.
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x14 0x0 0x0 0x0.
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x14 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0.
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x28 0x0 0x0 0x0.
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x28 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0.
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3c 0x0 0x0 0x0.
Successfully executed in 47.00 seconds.
Keyspace Searched
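The two timings above can be converted into throughput and a speedup ratio with simple arithmetic. The helpers below are only a worked calculation; the function names are hypothetical, and the inputs are the 1,006,632,960 keys and the 133 second and 47 second run times reported above.

```c
#include <assert.h>

/* Keys tested per second for a run over `keys` keys. */
double keys_per_second(double keys, double seconds)
{
    assert(seconds > 0.0);
    return keys / seconds;
}

/* How many times faster the optimized run is than the baseline. */
double speedup(double baseline_seconds, double optimized_seconds)
{
    assert(optimized_seconds > 0.0);
    return baseline_seconds / optimized_seconds;
}
```

For these runs, that works out to roughly 7.6 million keys per second on the x86 system and roughly 21.4 million keys per second on the Cell, a speedup of about 2.8.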

Analysis<br />

The Cell optimized AES attack searched the key range in 35% of the time that the x86 version took (47 seconds versus 133 seconds). The Cell optimized version was nearly three times as fast! Performance per dollar is also a factor. A 60 GB Playstation 3 and a PC with similar specifications to the test system cost roughly the same (around $500). This allows the Playstation 3 to keep its nearly 3:1 performance advantage when cost is considered.

Depending on the application, it may not be necessary to purchase hardware to harness the power of the Playstation 3. As the Folding @ Home team found, there are plenty of idle CPU cycles that can be utilized. Folding @ Home gets users to donate CPU cycles to a good cause. Perhaps a generic massively parallel computing network of Playstation 3s would work? Businesses and scientists could lease time on the network, and users with idle hardware could get paid to put the hardware they already own to use.

The Cell processor is a truly disruptive technology. Its power is just beginning to be recognized. Hopefully this paper provided a good introduction to the Cell processor, exposed its potential for compute-intensive applications, and provided insight into programming the Cell.

Acknowledgements<br />

I’d like to thank a handful of people and resources that provided assistance and guidance with this paper and project.

Dr. Barun Chandra (Project Advisor)
Neil Costigan (Editor, provider of priceless advice)
IBM (The Cell programming resources they put in the public domain are amazing)
http://ps2dev.org/ (One of the few places to find Cell programmers)
My wife Michele (Put up with me working on this non-stop)

References<br />

[1] National Institute of Standards and Technology, Federal Information Processing Standards Publication 197: Advanced Encryption Standard, 2001.

[2] IBM, Cell Broadband Engine Programming Handbook Version 1.1, 2007.

[3] IBM, Cell Broadband Engine Programming Tutorial Version 2.1, 2007.

[4] P. Barreto, V. Rijmen, "Reference ANSI C code," 2002 March (Version 2.2), Available HTTP: http://www.iaik.tugraz.at/research/krypto/AES/old/~rijmen/rijndael/rijndaelref.zip.

[5] V. Rijmen, A. Bosselaers, P. Barreto, "Optimized ANSI C code for the Rijndael cipher (now AES)," 2000 December (Version 3.0), Available HTTP: http://www.iaik.tugraz.at/research/krypto/AES/old/~rijmen/rijndael/rijndael-fst-3.0.zip.

[6] V. Pande, Stanford University, "Folding@Home Distributed Computing" 2000-2006. [Online]. Available: http://folding.stanford.edu/. [Accessed August 29 2007].

[7] J. Daemen, V. Rijmen, The Design of Rijndael. Verlag: Springer, 2002.

[8] J. Kahle et al., "Introduction to the Cell Multiprocessor," IBM J. Research and Development, Sept. 2005.

[9] C. R. Johns, D. A. Brokenshire, "Introduction to the Cell Broadband Engine Architecture," IBM J. Research and Development, Sept. 2007.

[10] IBM, "IBM BladeCenter QS20 blade with new Cell BE processor offers unique capabilities for graphic-intensive, numeric applications," September 2006, http://www-306.ibm.com/common/ssi/rep_ca/7/897/ENUS106-677/index.html.

[11] Mercury Computer Systems, "Cell Broadband Engine (BE) Processor Solutions," 2007, http://www.mc.com/microsites/cell.

[12] IBM, "IBM to Build World's First Cell Broadband Engine Based Supercomputer," http://www-03.ibm.com/press/us/en/pressrelease/20210.wss.

[13] S. Siewert, "The Cell Broadband Engine chip: High-speed offload for the masses," April 2007, http://www.ibm.com/developerworks/linux/library/pa-soc12/index.html?ca=drs-.

[14] IBM, C/C++ Language Extensions for Cell Broadband Engine Architecture Version 2.4, 2007.
