
NEWS ANALYSIS

Google says its AI chips smoke CPUs, GPUs in performance tests

The TPUs are faster at neural net inference and excel at performance per watt, reveals Blair Hanley Frank

Four years ago, Google was faced with a conundrum: if all its users hit its voice-recognition services for three minutes a day, it would need to double the number of data centres just to handle all of the requests to the machine learning system powering those services.

Rather than buy a bunch of new real estate and servers just for that purpose, the company embarked on a journey to create dedicated hardware for running machine-learning applications like voice recognition. The result was the Tensor Processing Unit (TPU), a chip that is designed to accelerate the inference stage of deep neural networks.

Google recently published a paper laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed.

A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
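As a rough illustration of how those two metrics relate, consider the sketch below. The throughput and power figures are placeholder assumptions chosen to land inside the ranges the paper reports; they are not the measured values from Google's study.

```python
# Placeholder figures for illustration only, not Google's measurements.
tpu_throughput = 90_000   # inferences per second (assumed)
cpu_throughput = 4_000    # inferences per second (assumed)
tpu_power_watts = 75      # assumed board power draw
cpu_power_watts = 150     # assumed board power draw

# Raw speed-up: how many times faster the TPU finishes the same work.
speedup = tpu_throughput / cpu_throughput            # 22.5x here

# Performance per watt: throughput delivered per unit of power, the
# metric that dominates operating cost at data-centre scale.
tpu_perf_per_watt = tpu_throughput / tpu_power_watts   # 1200
cpu_perf_per_watt = cpu_throughput / cpu_power_watts   # ~26.7
efficiency_gain = tpu_perf_per_watt / cpu_perf_per_watt  # ~45x here

print(f"speed-up: {speedup:.1f}x, perf/watt gain: {efficiency_gain:.1f}x")
```

The point of separating the two numbers is that a chip can win on raw speed while losing on efficiency; at Google's scale, the perf-per-watt column is the one that shows up on the electricity bill.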

Driving this sort of performance increase is important for Google, considering the company’s emphasis on building machine learning applications. The gains validate the company’s focus on building machine learning hardware at a time when it’s harder to get massive performance boosts from traditional silicon.

This is more than just an academic exercise. Google has used TPUs in its data centres since 2015, and they’ve been put to use improving the performance of applications including translation and image recognition. The TPUs are particularly useful when it comes to energy efficiency, which is an important metric related to the cost of using hardware at massive scale.

One of the other key metrics for Google’s purposes is latency, which is where the TPUs excel compared to other silicon options. Norm Jouppi, a distinguished hardware engineer at Google, said that machine learning systems need to respond quickly in order to provide a good user experience.

“The point is, the internet takes time, so if you’re using an internet-based server, it takes time to get from your device to the cloud, it takes time to get back,” Jouppi said. “Networking and various things in the cloud, in the data centre, they take some time. So that doesn’t leave a lot of [time] if you want near-instantaneous responses.”
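Jouppi’s point can be made concrete with a simple latency budget. The numbers below are assumptions for illustration, not figures from Google; the structure of the calculation is what matters.

```python
# A toy end-to-end latency budget for an interactive ML request.
# All figures are assumed for illustration, not Google's numbers.
target_response_ms = 100.0     # what users perceive as near-instantaneous
device_to_cloud_ms = 30.0      # network time from device to data centre
cloud_to_device_ms = 30.0      # network time for the response
datacentre_overhead_ms = 15.0  # internal networking, queuing, etc.

# Whatever remains is the budget the inference itself must fit into.
inference_budget_ms = (target_response_ms - device_to_cloud_ms
                       - cloud_to_device_ms - datacentre_overhead_ms)
print(f"inference must finish in {inference_budget_ms:.0f} ms")  # 25 ms
```

Since the network legs are largely fixed, the only place to buy back headroom is the inference step, which is why low, predictable chip latency matters as much as raw throughput.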

Google tested the chips on six different neural network inference applications, representing 95 percent of all such applications in Google’s data centres. The applications tested include DeepMind AlphaGo, the system that defeated Lee Sedol at Go in a five-game match in 2016.

Performance<br />

The company tested the TPUs against hardware that was released at roughly the same time to try to get an apples-to-apples performance comparison. It’s possible that newer hardware would at least narrow the performance gap.

There’s still room for TPUs to improve, too. Pairing the TPU with the GDDR5 memory that’s present in an Nvidia K80 GPU should provide a performance improvement over the existing configuration that Google tested. According to the company’s research, the performance of several applications was constrained by memory bandwidth.

Furthermore, the authors of Google’s paper claim that there’s room for additional software optimisation to increase performance. The writers called out one of the tested convolutional neural network applications (referred to in the paper as CNN1) as a candidate. However, because of existing performance gains from the use of TPUs, it’s not clear if those optimisations will take place. While neural networks mimic the way neurons transmit information in humans, CNNs are modelled specifically on how the brain processes visual information.
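To make that distinction concrete: the core operation of a CNN is a convolution, in which a small filter slides across an image so that each output value depends only on a local patch, much as visual neurons respond to local regions of the visual field. The following is a minimal NumPy sketch of that operation with a toy filter; it is a generic illustration, not code from the paper.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2D convolution as used in deep learning (strictly,
    cross-correlation): slide `kernel` over `image`, taking a weighted
    sum of each local patch. No padding, stride of one."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)
    return out

# A toy 3x3 edge-detecting filter applied to a random "image".
image = np.random.rand(8, 8)
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])
feature_map = conv2d(image, edge_kernel)
print(feature_map.shape)   # (6, 6)
```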

“As CNN1 currently runs more than 70 times faster on the TPU than the CPU, the CNN1 developers are already very happy, so it’s not clear whether or when such optimisations would be performed,” the authors wrote.

TPUs are what’s known in chip lingo as an application-specific integrated circuit (ASIC). They’re custom silicon built for one task, with an instruction set hard-coded into the chip itself. Jouppi said that he wasn’t overly concerned by that, and pointed out that the TPUs are flexible enough to handle changes in machine learning models. “It’s not like it was designed for one model, and if someone comes up with a new model, we’d have to junk our chips or anything like that,” he said.
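One way to see why a fixed instruction set can still serve new models: inference for most neural networks, whatever their architecture, reduces to the same matrix multiply-and-accumulate primitive, with only the weight matrices changing from model to model. A hedged sketch of that idea (the layer sizes and ReLU activation are illustrative assumptions, not details of the TPU):

```python
import numpy as np

def dense_layer(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One fully connected layer: a matrix multiply followed by a
    simple nonlinearity. The multiply is the part an accelerator
    hard-wires."""
    return np.maximum(x @ weights, 0.0)   # ReLU activation

# Two different "models" are just different stacks of weight matrices;
# the primitive operation executed for each layer never changes.
rng = np.random.default_rng(0)
model_a = [rng.standard_normal((16, 32)), rng.standard_normal((32, 8))]
model_b = [rng.standard_normal((16, 64)), rng.standard_normal((64, 8))]

x = rng.standard_normal((1, 16))
for weights in model_a:   # run model A
    x = dense_layer(x, weights)

y = rng.standard_normal((1, 16))
for weights in model_b:   # run model B with the same primitive
    y = dense_layer(y, weights)

print(x.shape, y.shape)   # both (1, 8)
```

Swapping in a new model means loading new weights, not fabricating new silicon, which is the flexibility Jouppi is describing.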

Google isn’t the only company focused on using dedicated hardware for machine learning. Jouppi added that he knows of several start-ups working in the space, and Microsoft has deployed a fleet of field-programmable gate arrays in its data centres to accelerate networking and machine learning applications.
