Kenneth Roe (kennethroe@sbcglobal.net)
Cryptography on the Playstation 3: Brute Force AES Attack
Computer Science Masters Project
University of New Haven
Introduction
Adults typically categorize game consoles as toys. This is not surprising, since that is their
intended purpose. Recently, however, the technology used in game consoles has become
extremely advanced, even bleeding edge. While the intended purpose of these devices is
entertainment, modern game console hardware such as the Sony Playstation 3 has more in
common with high end supercomputers than it does with the toys it shares a shelf with.
Unfortunately most of this power goes to waste: these machines sit idle most of the day.
Stanford University’s Pande group recognized this wasted potential and found a way to utilize it.
This group runs the Folding@Home network, a distributed computing network where people
can donate idle processing power to scientists so they can run complex calculations [6]. The
group worked with Sony to develop a Folding@Home client for the Playstation 3. Since this
client went live, the processing power of their distributed computing network has more than
doubled to become one of the fastest distributed computing networks in the world. The
overwhelming success of the Playstation 3 in the protein folding project was the inspiration for
this project. Since it does so well at these complex scientific calculations, how would it do at
cryptography? Perhaps more importantly, can a typical programmer unlock the potential of the
Cell, or is this a beast that can only be tamed by a handful of specialists?
Strategy<br />
This project could have examined the technical specifications of the Playstation 3 and its Cell
processor and marveled at how great it would be at cryptography, but that would leave the reader
wondering if theory translates to reality, and more importantly it would not be much fun. To see
what the Playstation 3 can really do, this project implemented an interesting, computationally
intensive application on both the Playstation 3 and commodity PC hardware to compare their
performance. The chosen example was a known plaintext brute force attack on the AES
encryption algorithm. This paper does not aim to shoot holes in AES; that would be naive and
unachievable. The primary goal of this paper is to teach the reader about a unique new breed of
processor and show an interesting application of the Cell processor that highlights its
performance potential.
AES
AES is a symmetric block cipher that is the current FIPS standard for protecting electronic data in
business and government (FIPS 197) [1]. It is based on the Rijndael cipher but uses a fixed 128 bit
block size and only supports 128, 192, or 256 bit keys (Rijndael allows variable key length and block size)
[7].
CryptographyOnThePS3.doc
4/15/2007 5:00 PM
Cryptanalysis<br />
Every cryptographic algorithm is vulnerable to cryptanalysis. There are many different
cryptanalysis techniques, but one is possible with every cryptographic algorithm: brute force.
The theoretical vulnerability of an algorithm is determined by the effort required to search the full
key space; if every possible key is tried, one of them has to be the correct key. Due to their
simplistic nature many would consider brute force attacks to be primitive. Primitive or not, brute
force is often the most effective cryptanalytic technique. Algorithms are considered “broken” when an
attack is found that requires fewer operations than a brute force attack would require
to recover the plaintext. These “breaks” show weakness in the algorithm, but are often nothing
more than theoretical weaknesses due to unrealistic constraints on the attack, such as requiring a
huge number of known plaintext/ciphertext pairs. It’s hard for an attacker to get a single known
plaintext/ciphertext pair, let alone a large number of them.
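To put the effort of a full key space search in perspective, here is a minimal sketch that estimates how long an exhaustive search would take. It is not part of the project code: the function names are made up for illustration, and the search rate passed in is a hypothetical figure, not a measurement from this project.

```c
#include <assert.h>

/* Number of keys in a key space of the given bit length, as a double.
   Powers of two are represented exactly while they stay in range. */
double keyspace_size(unsigned bits)
{
    double n = 1.0;
    unsigned i;
    assert(bits <= 1000);   /* stays below the double overflow limit */
    for (i = 0; i < bits; i++)
        n *= 2.0;
    return n;
}

/* Years needed to try every key at an assumed rate in keys per second. */
double years_to_exhaust(unsigned bits, double keys_per_second)
{
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    assert(keys_per_second > 0.0);
    return keyspace_size(bits) / keys_per_second / seconds_per_year;
}
```

With an assumed rate of one billion keys per second, years_to_exhaust(128, 1e9) comes out on the order of 10^22 years. On average the key turns up after searching half the key space, which does not change the conclusion.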
The Cell<br />
The heart of the Playstation 3 is the Cell Broadband Engine microprocessor. This processor was
jointly developed by Sony, Toshiba, and IBM (STI) [8]. While the purpose of this paper is not to
marvel at the microarchitecture of the Cell, understanding what makes the Cell different from
typical CPUs is essential to unlocking its power. Traditional microprocessors have a single
general purpose processing core. Recently multi-core processors have reached the mainstream
market. These are essentially multiple (identical) general purpose processing cores packaged
together so they can be installed in a single socket. They are a consolidated version of the shared
memory multiprocessor systems that preceded them.
The Cell is different, very different. Instead of trying to build a faster processor by cramming
more transistors into general purpose processors, STI decided to improve performance through
specialization. The Cell consists of a single general purpose core called the Power Processing
Element (PPE) and eight highly specialized 128 bit vector processing units called Synergistic
Processing Elements (SPEs). The PPE is capable of general purpose computing; it is the heart of
the Cell. The PPE is a 64 bit RISC processing unit based on IBM’s POWER architecture that is
capable of running two threads in parallel [9]. The SPEs are the workhorses of the Cell. These
are specialized processing units that are built to perform a limited set of operations very quickly.
The general programming strategy recommended by IBM is to control the SPEs with the code
running on the PPE, and offload all compute intensive work to the SPEs [3].
The PPE and SPEs are not set up in a traditional shared memory multiprocessor configuration.
The PPE is linked directly to main memory (256MB). The SPEs each have their own private
memory, referred to as the Local Store (256KB). The PPE and SPEs are connected via a high
capacity interconnect called the Element Interconnect Bus (EIB). While the PPE and SPEs are
connected, the SPEs cannot access main memory directly; they must do so via DMA [9]. This is
a unique memory configuration. The SPEs can access their local store very quickly, and since a
local store is dedicated to a single SPE, that SPE does not need to worry about contention. When SPEs
need to work with data that is in main memory, the data must be transferred across the EIB. The
PPE and SPEs can both initiate DMA requests, but for efficiency reasons it is preferable to
initiate DMA from the SPEs [3]. If an SPE modifies data and wants the PPE to be able to see the
change, it needs to write the updated data back to main memory via DMA.
The Cell processor is not exclusive to the Playstation 3; it is also available in high performance
servers such as the IBM QS20 [10], dedicated processing boards such as those produced by
Mercury Computer [11], and in the Department of Energy’s next supercomputer, the IBM
Roadrunner [12]. The Cell processor in the Playstation 3 is a full featured Cell, the only
restriction being that one of the eight SPEs is disabled. This was done to improve chip yields;
many more processors can pass QC if one of the SPEs is permitted to be defective [13]. The
Playstation 3 provides a unique opportunity to gain access to supercomputer technology without
paying supercomputer prices (even the relatively cheap QS20 blade server is around $20,000
[10]).
SPE Concepts
The SPEs are what make the Cell a monster. Their wide 128 bit registers and SIMD instruction
set allow huge volumes of data to be processed quickly. The fact that the Cell processor contains
eight SPEs does not hurt either (six are available through Linux on the Playstation 3 [13]). To harness the power of the
SPEs programmers must understand a few key concepts:
Vector Data Types<br />
Vector data types allow multiple sub-quadword values to be stored in a single 128 bit quadword.
The number of values that can fit in a vector varies based on the type of scalar being aggregated
into a vector. A vector unsigned char can hold 16 x 8 bit unsigned chars, a vector unsigned int
can hold 4 x 32 bit unsigned ints, and so on. The utility of vector data types is not immediately
obvious, but will become so in the next section.
SIMD Operations
Vector data types do not do much on their own; they just provide a way to structure data in
quadwords so it can be processed using SIMD operations. SIMD stands for Single Instruction
Multiple Data. SIMD operations allow one CPU operation to be applied to multiple values in
parallel. The figure below shows two lists of four 32 bit unsigned ints being stored in two 128
bit vector unsigned ints. These are added using an SIMD add operation to produce a 128 bit
vector unsigned int result. With a single CPU instruction four pairs of values have been added.
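The lane-by-lane behavior of an SIMD add can be modeled in portable C. The sketch below is only a stand-in that runs on any machine: the type vec_uint4 and the function simd_add_model are names invented for this illustration. On an SPE the same work would be expressed with the vector unsigned int type and the spu_add intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the SPE's 128 bit "vector unsigned int":
   four 32 bit lanes packed into one 16 byte value. */
typedef struct { uint32_t lane[4]; } vec_uint4;

/* Models one SIMD add: every lane is added independently and wraps
   modulo 2^32. On an SPE the whole loop would be one spu_add(a, b)
   call from <spu_intrinsics.h>. */
vec_uint4 simd_add_model(vec_uint4 a, vec_uint4 b)
{
    vec_uint4 sum;
    int i;
    assert(sizeof(vec_uint4) == 16);   /* exactly one quadword */
    for (i = 0; i < 4; i++)
        sum.lane[i] = a.lane[i] + b.lane[i];
    return sum;
}
```

Note that no carry moves between lanes: a lane that overflows wraps around on its own without disturbing its neighbors.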
[Figure: SIMD add of two vector unsigned ints. Concept derived from [2]]
Intrinsics
Typically a high level language like C would be used to implement an algorithm, and the
compiler would be responsible for mapping the C operations to CPU instructions. This works
well for some target platforms. A good C compiler can normally come very close to the
performance of an assembly implementation. This is not the case with the Cell, however. For the
Cell to perform, data needs to be vectorized and SIMD operations need to be used. I expected the
compiler to be able to automatically vectorize data (such as scalar arrays) and use SIMD
operations, but C code compiled for the Cell performed dismally. I am not sure if this is due to
the immaturity of the compiler or if effective auto-vectorization is too much to ask.
Optimizing for the Cell requires the programmer to convert data to vectors and convert
operations to intrinsics calls. A vector is a 128 bit chunk of data. The SPEs have 128 bit
registers, so any of the vector types can fit in a single register. SPE CPU instructions operate on
vectors. IBM has provided a library of SPE intrinsics that allow the programmer to make near
direct calls to the SPEs’ SIMD CPU instructions [14]. This allows the programmer to take back
some control from the compiler without resorting to raw assembly.
Vector/Intrinsics Example: XOR two 128 bit chunks of data.
Standard C implementation:
unsigned char chunk1[16] = {0xA0,0x03,0x00,0x04,0x13,0xB4,0x00,0x05,0x80,0x66,0xDF,0x01,0x34,0x06,0x80,0x10};
unsigned char chunk2[16] = {0xF4,0x50,0x01,0xA4,0x57,0x23,0x60,0x40,0xF0,0xAA,0x12,0x40,0x01,0xEF,0xC4,0x08};
unsigned char result[16];
int i;
/* XOR the two chunks one byte at a time */
for (i = 0; i < 16; i++)
    result[i] = chunk1[i] ^ chunk2[i];
The standard C implementation represents the 128 bits of data as an array of unsigned chars. It
then loops through each position and XORs the chunks to produce the result. The compiler will
need to interpret the code and decide how to map this to CPU instructions. On a machine with 32
or 64 bit registers multiple CPU instructions would typically be required to XOR two 128 bit
chunks of data.
The vectorized version using SPE intrinsics takes control away from the compiler. The 128 bits
of data are represented with a vector unsigned char. This is a data type that contains sixteen 8 bit
unsigned chars aligned to be stored in a single register. The XOR operation is performed using
the spu_xor intrinsic. The documentation for this intrinsic indicates that it will be mapped to a
single CPU instruction, XOR [14]. Intrinsics allow the programmer to make low level calls
easily from a high level language.
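SPE intrinsics only compile for the Cell, so the following is a portable sketch of the same idea for a machine with 64 bit registers: process the 128 bits in the widest units available rather than byte by byte. The function name xor128 is invented for this illustration. On an SPE the body would reduce to a single spu_xor call on two vector unsigned char values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* XORs two 16 byte chunks using two 64 bit operations instead of
   sixteen byte sized ones. memcpy keeps the loads and stores well
   defined regardless of alignment. On an SPE the whole body would be
   result = spu_xor(chunk1, chunk2). */
void xor128(const unsigned char a[16], const unsigned char b[16],
            unsigned char out[16])
{
    uint64_t wa[2], wb[2], wr[2];
    assert(a != NULL && b != NULL && out != NULL);
    memcpy(wa, a, 16);
    memcpy(wb, b, 16);
    wr[0] = wa[0] ^ wb[0];
    wr[1] = wa[1] ^ wb[1];
    memcpy(out, wr, 16);
}
```

Because XOR acts on each bit independently, the result is identical to the byte-wise loop regardless of byte order. Only the number of operations changes, from sixteen to two here, and to one on an SPE.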
Performance Comparison Strategy
To evaluate the performance of the Playstation 3 and its Cell processor I implemented a brute
force attack on AES on the Playstation 3 as well as on commodity PC hardware. A range of keys
to search was chosen, and the search was performed on both sets of hardware. Timing was taken
for both versions and the results were compared.
Software Design<br />
The x86 and Cell versions of the attack use the same basic design. There is a single
controller thread and many worker threads.
Controller
The controller is responsible for managing the attack. It breaks up the work to be done into
chunks. Worker threads are spawned to process the chunks. The optimal number of worker
threads is the number of cores available on the target CPU.
Pseudocode:
Build a known plaintext/ciphertext pair.
for (each core)
{
Allocate a portion (1/# cores) of the keyspace to be searched.
Spawn a worker thread to search the keyspace passing it the range to search
and the known plaintext/ciphertext pair.
}
Exit when a worker thread finds the key OR all worker threads finish searching
their chunk of the keyspace.
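The allocation step of the controller can be sketched in C as follows. This is an illustration under stated assumptions rather than the project's code: the names key_range and worker_slice are invented here, and the search range is assumed to be small enough to index with a 64 bit counter (a chosen test range rather than the full 128 bit key space).

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range [start, end) of candidate keys for one worker. */
typedef struct { uint64_t start, end; } key_range;

/* Returns the slice of [0, total_keys) assigned to worker `index` out
   of `workers`. The first (total_keys % workers) workers get one extra
   key, so the slices tile the range with no gaps or overlaps. */
key_range worker_slice(uint64_t total_keys, unsigned workers, unsigned index)
{
    key_range r;
    uint64_t base, extra;
    assert(workers > 0 && index < workers);
    base  = total_keys / workers;
    extra = total_keys % workers;
    r.start = (uint64_t)index * base + (index < extra ? index : extra);
    r.end   = r.start + base + (index < extra ? 1 : 0);
    return r;
}
```

For example, splitting 100 keys across six workers (one per SPE available under Linux) gives four slices of 17 keys followed by two slices of 16 keys.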
Worker:<br />
The worker loops through the range of keys and tests each candidate key. If the key is found it
returns the key. When the entire range has been searched the worker returns indicating that the
key was not found.
Pseudocode:
for (start of key range TO end of key range)
{
Encrypt known plaintext with the candidate key to produce candidate ciphertext.
if (candidate ciphertext = actual ciphertext)
{
return the key;
}
Move to next key.
}
return key not found;
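The worker loop can be sketched in C as below. To keep the example short and runnable, AES is replaced by toy_encrypt, a hypothetical stand-in invented for this illustration that is not a real cipher. The name search_range and the 64 bit key type are likewise assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "encrypt the known plaintext under this key".
   It is NOT AES; it only gives the loop something deterministic to test. */
static uint64_t toy_encrypt(uint64_t plaintext, uint64_t key)
{
    uint64_t x = plaintext ^ key;
    x *= 0x9E3779B97F4A7C15ULL;   /* odd multiplier: invertible mod 2^64 */
    return x ^ (x >> 29);
}

/* Searches [start, end) for a key that maps plaintext to ciphertext.
   Returns 1 and stores the key in *found on success, 0 otherwise. */
int search_range(uint64_t start, uint64_t end,
                 uint64_t plaintext, uint64_t ciphertext, uint64_t *found)
{
    uint64_t key;
    assert(found != 0 && start <= end);
    for (key = start; key < end; key++) {
        if (toy_encrypt(plaintext, key) == ciphertext) {
            *found = key;
            return 1;
        }
    }
    return 0;
}
```

A worker whose range does not contain the key simply runs to the end of its range and returns 0, which is the "key was not found" case described above.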
AES Implementation - Cell
AES can be optimized for a wide range of architectures. To maximize performance, software
implementations need to be designed with the target platform in mind. Optimized software
implementations are widely available for many types of hardware, from smart cards to
supercomputers. There is not currently a freely available Cell optimized AES implementation,
however, so I needed to create one.
The AES implementation created for this project is not a full fledged implementation. Only
encrypt and key scheduling operations needed to be implemented for the attack. In addition, the
attack works on 128 bit keys, so the AES implementation only supports 128 bit keys. It would be
trivial to turn this into a full fledged Cell optimized AES implementation (if someone else is
looking for a project).
The process began with evaluating the available AES implementations compiled for the Cell to
determine how well the attack would run without optimization. Performance was much slower
than on x86 hardware with all the implementations tested. It became clear that realizing the
Cell's potential would require a custom AES implementation. The only question remaining was
which of the existing AES implementations to use as a starting point. The “optimized” reference
implementation [5] performed significantly better than the base reference implementation [4], but
it did not clearly separate the different steps of an AES round, so I found it easier to start
my optimization attempts with the base reference implementation. It is necessary to understand
how AES works to understand what was done to optimize it for the Cell.
AES Encrypt Operation
The AES encrypt operation takes a plaintext and an expanded key and returns a ciphertext. The
key steps in encryption (SubBytes, ShiftRows, MixColumns and AddRoundKey) are outlined in
more detail below.
Pseudocode: [Figure: AES encrypt operation pseudocode, from [1]]
AddRoundKey
Summary: The AddRoundKey step applies a round key that is derived from the 128 bit
encryption key to the input. The 128 bit round key is applied to the 128 bit input by XORing
them together.
[Figure: AddRoundKey step, from [1]]
Reference Implementation: The reference implementation loops through each byte of
the input, XORing it with the corresponding byte of the round key one byte at a time.
Cell Optimized Implementation: This operation is very easy to optimize for the Cell. The
input is 128 bits, the round key is 128 bits, and the key is applied to the input via the XOR
operation to produce a 128 bit output. Conveniently the Cell has a vector intrinsic
(spu_xor) that takes two 128 bit vectors, XORs them, and returns an output vector. The
intrinsic maps to a single CPU instruction (XOR) so the entire AddRoundKey step can be
performed in one operation.
SubBytes
Summary: The SubBytes step takes each byte of the input, uses it as the index for a table
lookup, and replaces the byte with the value from the table.
[Figure: SubBytes step, from [1]]
Reference Implementation: The s-box is implemented as a 256 entry byte array. The
input array is looped through one byte at a time with the current byte used as the index into
the 256 entry byte array. The value returned from the table lookup is substituted in one
byte at a time.
Cell Optimized Implementation: This step is tricky to optimize. It is necessary to look at
what is going on from a higher level without being biased by the reference
implementation's approach. The s-box is implemented as a 16 entry quadword array.
Lookups are performed using the 5 least significant bits of each byte of the input data as a
key to index two of the 16 byte entries of the s-box array at a time. This is repeated 8
times to search the whole 16 entry s-box. The result of these 32 byte s-box lookups is 8
intermediate vectors. These vectors contain the valid substitution values, but they also
contain many invalid values since the most significant 3 bits of each input byte were
ignored while doing the lookup. The invalid values need to be eliminated and the 8
intermediate vectors need to be consolidated into a single result vector containing only the
valid values. This is done using binary tree pruning. The bytes of the intermediate vectors are
pruned down to eliminate invalid values based on the value of the three most significant
bits that were ignored previously. This is done in stages, one per ignored bit, each halving the
number of candidate vectors (8 to 4 to 2 to 1) and eventually leaving a single
vector containing the substituted values. This seems overly complex and wasteful
compared to the single byte substitutions the reference implementation uses. This is not
the case, however, because the SIMD operations this technique uses perform lookups on
16 bytes at a time. This technique for SIMD table lookups was illustrated in [2].
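The lookup and pruning stages can be modeled in scalar C. The sketch below is a behavioral model only, with an invented function name (lookup16_pruned); each inner loop over 16 bytes stands for what would be one vector operation on an SPE. I assume the lookups correspond to shuffles over two quadwords and the pruning to bitwise selects such as spu_sel, but that mapping is an assumption, not taken from the project code.

```c
#include <assert.h>
#include <stdint.h>

/* Looks up 16 bytes at once in a 256 entry table treated as eight
   32 byte sub-tables (two quadwords each), then prunes the eight
   candidates per byte down to one using the three high input bits. */
void lookup16_pruned(const uint8_t table[256], const uint8_t in[16],
                     uint8_t out[16])
{
    uint8_t cand[8][16];   /* 8 intermediate vectors */
    uint8_t s1[4][16];     /* after pruning on bit 5 */
    uint8_t s2[2][16];     /* after pruning on bit 6 */
    int t, i;

    assert(table != 0 && in != 0 && out != 0);

    /* Lookup: index every sub-table with the low 5 bits of each byte. */
    for (t = 0; t < 8; t++)
        for (i = 0; i < 16; i++)
            cand[t][i] = table[32 * t + (in[i] & 0x1F)];

    /* Prune 8 -> 4 on bit 5, 4 -> 2 on bit 6, 2 -> 1 on bit 7. */
    for (t = 0; t < 4; t++)
        for (i = 0; i < 16; i++)
            s1[t][i] = (in[i] & 0x20) ? cand[2 * t + 1][i] : cand[2 * t][i];
    for (t = 0; t < 2; t++)
        for (i = 0; i < 16; i++)
            s2[t][i] = (in[i] & 0x40) ? s1[2 * t + 1][i] : s1[2 * t][i];
    for (i = 0; i < 16; i++)
        out[i] = (in[i] & 0x80) ? s2[1][i] : s2[0][i];
}
```

Counting vector-wide steps, this is 8 lookups plus 7 selects for 16 substitutions, versus 16 separate table lookups in the byte-at-a-time approach.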
[Figure: SIMD table lookup using binary tree pruning, from [2]]
ShiftRows
Summary: The ShiftRows step arranges the input as a 4x4 byte array and circular left
shifts each row of the array by a varying number of bytes. The first row is not shifted at
all, the second by one byte, the third by two bytes, and the fourth by three bytes.
[Figure: ShiftRows step, from [1]]
Reference Implementation: In the reference implementation the ShiftRows step is
performed by treating the input as a 4x4 array as in the conceptual ShiftRows step. Each
of the rows except the first gets looped through. For each row, each byte is copied to its
destination position in a temporary array. The destination position for each byte is
determined by performing a lookup in a table that contains the destination index. After
the copies (effectively a circular shift) are performed on a row in the temporary array, the
row is copied back over the input to form the output.
Cell Optimized Implementation: This is another step that can be performed in a simple
and efficient manner on the SPEs. Instead of treating the 16 bytes of input as a 4x4 array
like the conceptual operation and reference implementation do, the 16 bytes are kept in
vector unsigned char form. The row-by-row variable shifts can be performed by
rearranging the bytes. A constant holds the shuffle pattern, which dictates which byte
of the input supplies each byte of the output. A single spu_shuffle intrinsic call can
perform the entire ShiftRows operation. The spu_shuffle intrinsic maps to a single
SHUFB instruction.
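To make the shuffle idea concrete without SPE hardware, here is a portable sketch that emulates a byte shuffle in plain C. The pattern constant assumes the FIPS-197 column-major byte order (byte i sits in row i % 4, column i / 4). The byte order and constant used in the actual Cell code may differ, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed byte order: FIPS-197 column-major. Entry i names the input byte
 * that supplies output byte i. */
static const uint8_t shift_rows_pattern[16] = {
     0,  5, 10, 15,
     4,  9, 14,  3,
     8, 13,  2,  7,
    12,  1,  6, 11,
};

/* Portable stand-in for a byte shuffle: out[i] = in[pattern[i]].
 * The real SHUFB selects from two source registers; only indices 0-15
 * (the first source) are modelled here. */
static void shuffle_bytes(uint8_t out[16], const uint8_t in[16],
                          const uint8_t pattern[16])
{
    for (int i = 0; i < 16; i++) {
        assert(pattern[i] < 16);
        out[i] = in[pattern[i]];
    }
}

/* The whole ShiftRows step is a single shuffle with a constant pattern. */
static void shift_rows_shuffle(uint8_t out[16], const uint8_t in[16])
{
    shuffle_bytes(out, in, shift_rows_pattern);
}
```

On an SPE the loop disappears: under the same byte-order assumption, the step would be one call of the form spu_shuffle(state, state, pattern) on vector unsigned char operands.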
MixColumns
The MixColumns step arranges the 16 input bytes in a 4x4 byte array and applies a
function to each column of the array. The function transforms the bytes of a column so that
each input byte of the column affects all of that column's output bytes. This step cannot be
explained without getting into some heavy math. The purpose of this paper is to help the reader
understand the potential of the Cell and what it takes to unlock it, so a detailed explanation
of that math is out of scope. This step, especially in the Cell optimized version, is also too
complex to be a good example of unlocking the power of the Cell, so it will be skipped. The
MixColumns function was heavily optimized for the Cell, and the implementation is creative and
interesting. This was the most time-consuming part of AES to optimize. If you want
more detail, have a look at the included Cell optimized AES code; it is well documented.
AES Key Expansion
Summary: The key expansion process converts the 128-bit key provided by the user into a
set of round keys derived from the user key. One 128-bit round key needs to be generated
for each round, plus one additional round key that is applied before the rounds begin. The
number of rounds depends on the key size. Since a 128-bit key is being tested there will
be 10 rounds, and therefore 11 round keys.
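The round and round-key counts follow directly from the key size. As a small sketch (the function names are mine, and the formula is the one that reproduces the FIPS-197 table of 10, 12, and 14 rounds for 128-, 192-, and 256-bit keys):

```c
#include <assert.h>

/* Number of AES rounds for a key of key_bits bits (128, 192 or 256):
 * Nr = Nk + 6, where Nk is the key length in 32-bit words. */
static int aes_rounds(int key_bits)
{
    assert(key_bits == 128 || key_bits == 192 || key_bits == 256);
    return key_bits / 32 + 6;
}

/* One round key per round, plus the one applied before the rounds begin. */
static int aes_round_keys(int key_bits)
{
    return aes_rounds(key_bits) + 1;
}

/* Size of the expanded key schedule in bytes (16 bytes per round key). */
static int aes_schedule_bytes(int key_bits)
{
    return 16 * aes_round_keys(key_bits);
}
```

For the 128-bit keys searched here this gives 10 rounds, 11 round keys, and a 176-byte key schedule per candidate key.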
Cell Optimized Implementation: Key expansion was difficult to optimize due to data
dependencies. The key provided by the user is used to generate the next 128-bit round
key. Those 128 bits are used to generate the next 128 bits, and so on. Not only is each
round key dependent on the previous 128-bit round key, each
32 bits of a single round key is dependent on the previous 32 bits of that round key. This
means multiple round keys cannot be generated in parallel, and the parts of a single round
key cannot be generated in parallel. This makes it very difficult to take advantage of
SIMD operations. After much agony I came up with an interesting solution. If SIMD
vector operations cannot be used to expand a single key, the only effective option is to
expand four keys at a time! This technique would not be very useful if the use case
required a single key to encrypt many blocks, as symmetric ciphers are typically used, but
it is a perfect fit for a key search, since a key expansion is performed for every
encrypt operation.
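The four-keys-at-a-time idea can be sketched in portable C by letting a small struct stand in for a 128-bit vector register, with one 32-bit lane per key. This is not the Cell code that accompanies this paper: the `vec4` type, the function names, and the slow computed S-box are assumptions chosen to keep the example short and self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4    /* keys expanded side by side */
#define WORDS 44   /* 4 words x 11 round keys for AES-128 */

/* Stand-in for a 128-bit vector register: one 32-bit word per key. */
typedef struct { uint32_t lane[LANES]; } vec4;

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int s)
{
    return (uint8_t)((x << s) | (x >> (8 - s)));
}

/* S-box computed from its definition (slow, but avoids a 256-entry table). */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (int c = 1; c < 256 && x != 0; c++)
        if (gf_mul(x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* SubWord(RotWord(w)) from the AES key schedule. */
static uint32_t sub_rot_word(uint32_t w)
{
    w = (w << 8) | (w >> 24);
    return ((uint32_t)sbox((uint8_t)(w >> 24)) << 24) |
           ((uint32_t)sbox((uint8_t)(w >> 16)) << 16) |
           ((uint32_t)sbox((uint8_t)(w >> 8)) << 8) |
            (uint32_t)sbox((uint8_t)w);
}

/* keys[k][j] is big-endian word j of key k. On return, w[i].lane[k] is
 * word i of the schedule for key k. */
static void expand_four_keys(const uint32_t keys[LANES][4], vec4 w[WORDS])
{
    static const uint8_t rcon[10] =
        {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36};

    assert(keys != NULL && w != NULL);
    for (int i = 0; i < 4; i++)
        for (int k = 0; k < LANES; k++)
            w[i].lane[k] = keys[k][i];        /* transpose keys into lanes */

    for (int i = 4; i < WORDS; i++) {         /* serial: w[i] needs w[i-1] */
        for (int k = 0; k < LANES; k++) {     /* lanes are independent */
            uint32_t t = w[i - 1].lane[k];
            if (i % 4 == 0)
                t = sub_rot_word(t) ^ ((uint32_t)rcon[i / 4 - 1] << 24);
            w[i].lane[k] = w[i - 4].lane[k] ^ t;
        }
    }
}
```

The outer loop keeps the serial dependency described above, while the inner loop over lanes has none, so on a SIMD unit each inner loop would collapse into vector operations that advance four candidate keys at once.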
AES Implementation – x86
There are many x86 optimized AES implementations available. For the comparison I chose the
Rijndael Optimized C Code version 3.0 [5]. This implementation was chosen because it is in the
public domain, performs well, and is frequently referenced and cited in comparisons. The
reference implementation of AES could have been used for the comparison, but this would have
heavily favored the Cell.
Test Configuration
The goal of this comparison was to compare the Playstation 3/Cell to commodity PC hardware.
The test environments were set up to match each other as closely as possible given the vast
differences in architecture between the platforms.
Environment
Commodity PC System:
  Hardware:
    Athlon 64 X2 3800+ CPU
    2 GB Corsair PC 3200 RAM
    Abit AV8 Motherboard
  Software:
    Fedora Core 6 Linux
    GCC 4.1.2 Compiler
Cell System:
  Hardware:
    Sony Playstation 3
  Software:
    Fedora Core 6 Linux
    IBM Cell SDK 2.1
    GCC 4.1.2 Compiler
Test Parameters
Each version of the software was configured to search the range of keys from
0x00000000000000000000000000000000 to 0x0000000000000000000000003C000000. This is
1,006,632,960 keys. Yes, over a billion. The Cell version of the program breaks the keyspace up
into six chunks, one for each SPU. The x86 version breaks the keyspace up into two chunks, one
for each core. Threads are spawned to process the chunks in parallel until the keyspace has been
searched. Arbitrary data was chosen for the plaintext to encrypt. An arbitrary key outside
the search range was chosen to ensure that the entire range was searched.
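The keyspace split described above can be sketched as follows. Only the low 64 bits of the 128-bit key are modelled, which is enough for this range, and the `key_range` type and `chunk_for_worker` function are illustrative names rather than the ones used in the test programs.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open range [start, end) of key counter values for one worker.
 * Only the low 64 bits of the 128-bit key are modelled. */
typedef struct { uint64_t start, end; } key_range;

/* Split [0, total) into `workers` contiguous chunks; the last chunk
 * absorbs any remainder when total is not evenly divisible. */
static key_range chunk_for_worker(uint64_t total, unsigned workers, unsigned idx)
{
    key_range r;
    uint64_t size;

    assert(workers > 0 && idx < workers);
    size = total / workers;
    r.start = size * idx;
    r.end = (idx == workers - 1) ? total : r.start + size;
    return r;
}
```

With a total of 0x3C000000 keys, six workers each receive 0x0A000000 keys and two workers each receive 0x1E000000 keys, which matches the chunk boundaries printed in the Results.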
Results<br />
x86:<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0<br />
0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0.<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0<br />
0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3c 0x0 0x0 0x0.<br />
Successfully executed in 133.00 sec<strong>on</strong>ds.<br />
Keyspace Searched<br />
Cell:<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0<br />
0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0.<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x0 0x0<br />
0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x14 0x0 0x0 0x0.<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x14 0x0<br />
0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0.<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0<br />
0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x28 0x0 0x0 0x0.<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x28 0x0<br />
0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0.<br />
Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0<br />
0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3c 0x0 0x0 0x0.<br />
Successfully executed in 47.00 sec<strong>on</strong>ds.<br />
Keyspace Searched<br />
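The two timings above can be turned into throughput figures with a little arithmetic. The sketch below uses only numbers reported in this paper (the key count and the two run times); the helper names are my own.

```c
#include <assert.h>

#define KEYS_SEARCHED 1006632960.0   /* 0x3C000000 keys, from the test setup */

/* Keys tested per second for a run that took `seconds`. */
static double keys_per_second(double seconds)
{
    assert(seconds > 0.0);
    return KEYS_SEARCHED / seconds;
}

/* How many times faster run B was than run A. */
static double speedup(double seconds_a, double seconds_b)
{
    assert(seconds_a > 0.0 && seconds_b > 0.0);
    return seconds_a / seconds_b;
}
```

With 133 seconds for x86 and 47 seconds for the Cell, this works out to roughly 7.6 million keys per second versus roughly 21.4 million keys per second, a speedup of about 2.83.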
Analysis
The Cell optimized AES attack searched the key range in 35% of the time that the x86 version
took (47 seconds versus 133 seconds). The Cell optimized version was nearly three times as fast!
Performance per dollar is also a factor. A 60 GB Playstation 3 and a PC with similar
specifications to the test system cost roughly
the same (around $500). This allows the Playstation 3 to keep its nearly 3:1 performance
advantage when cost is considered. Depending on the application, it may not be necessary to
purchase hardware to harness the power of the Playstation 3. As the Folding @ Home team
found, there are plenty of idle CPU cycles that can be utilized. Folding @ Home gets users to
donate CPU cycles to a good cause. Perhaps a generic massively parallel computing network of
Playstation 3s would work? Businesses and scientists could lease time on the network, and users
with idle hardware could get paid to put the hardware they already own to use. The Cell
processor is a truly disruptive technology. Its power is just beginning to be recognized. Hopefully
this paper provided a good introduction to the Cell processor, exposed its potential for
compute-intensive applications, and provided insight into programming the Cell.
Acknowledgements
I’d like to thank a handful of people/resources that provided assistance and guidance with this
paper/project.
Dr. Barun Chandra (Project Advisor)
Neil Costigan (Editor, provider of priceless advice)
IBM (The Cell programming resources they put in the public domain are amazing)
http://ps2dev.org/ (One of the few places to find Cell programmers)
My wife Michele (Put up with me working on this non-stop)
References
[1] National Institute of Standards and Technology, Federal Information Processing Standards Publication 197: Advanced Encryption Standard, 2001.
[2] IBM, Cell Broadband Engine Programming Handbook Version 1.1, 2007.
[3] IBM, Cell Broadband Engine Programming Tutorial Version 2.1, 2007.
[4] P. Barreto, V. Rijmen, "Reference ANSI C code," 2002 March (Version 2.2), Available HTTP: http://www.iaik.tugraz.at/research/krypto/AES/old/~rijmen/rijndael/rijndaelref.zip.
[5] V. Rijmen, A. Bosselaers, P. Barreto, "Optimized ANSI C code for the Rijndael cipher (now AES)," 2000 December (Version 3.0), Available HTTP: http://www.iaik.tugraz.at/research/krypto/AES/old/~rijmen/rijndael/rijndael-fst-3.0.zip.
[6] V. Pande, Stanford University, "Folding@Home Distributed Computing" 2000-2006. [Online]. Available: http://folding.stanford.edu/. [Accessed August 29 2007].
[7] J. Daemen, V. Rijmen, The Design of Rijndael. Verlag: Springer, 2002.
[8] J. Kahle et al., "Introduction to the Cell Multiprocessor," IBM J. Research and Development, Sept. 2005.
[9] C. R. Johns, D. A. Brokenshire, "Introduction to the Cell Broadband Engine Architecture," IBM J. Research and Development, Sept. 2007.
[10] IBM, "IBM BladeCenter QS20 blade with new Cell BE processor offers unique capabilities for graphic-intensive, numeric applications," September 2006, http://www-306.ibm.com/common/ssi/rep_ca/7/897/ENUS106-677/index.html.
[11] Mercury Computer Systems, "Cell Broadband Engine (BE) Processor Solutions," 2007, http://www.mc.com/microsites/cell.
[12] IBM, "IBM to Build World's First Cell Broadband Engine Based Supercomputer," http://www-03.ibm.com/press/us/en/pressrelease/20210.wss.
[13] S. Siewert, "The Cell Broadband Engine chip: High-speed offload for the masses," April 2007, http://www.ibm.com/developerworks/linux/library/pa-soc12/index.html?ca=drs-.
[14] IBM, C/C++ Language Extensions for Cell Broadband Engine Architecture Version 2.4, 2007.