NEWS ANALYSIS

Google says its AI chips smoke CPUs, GPUs in performance tests

The TPUs are faster at neural net inference and excel at performance per watt, reveals Blair Hanley Frank
Four years ago, Google was faced with a conundrum: if all its users hit its voice-recognition services for three minutes a day, it would need to double the number of data centres just to handle all of the requests to the machine learning system powering those services.
Rather than buy a bunch of new real estate and servers just for that purpose, the company embarked on a journey to create dedicated hardware for running machine learning applications like voice recognition. The result was the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks.
Google recently published a paper laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and performance per watt of power consumed. A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
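To see how a performance-per-watt comparison of this kind is computed, here is a minimal sketch. All figures below are hypothetical placeholders chosen for illustration, not numbers taken from Google's paper:

```python
# Illustrative arithmetic only: the throughput and power figures here are
# made-up placeholders, not measurements from Google's TPU paper.
def perf_per_watt(ops_per_sec, watts):
    """Performance per watt: operations per second per watt consumed."""
    return ops_per_sec / watts

# Hypothetical figures for a CPU baseline and an accelerator.
cpu = perf_per_watt(ops_per_sec=2.6e12, watts=145)  # e.g. ~2.6 TOPS at 145 W
tpu = perf_per_watt(ops_per_sec=92e12, watts=75)    # e.g. ~92 TOPS at 75 W

# Relative perf/watt advantage with these made-up figures (~68x).
print(round(tpu / cpu, 1))
```

With placeholder numbers like these, the accelerator's advantage lands in the same 25 to 80 times range the article describes; the actual ratios depend entirely on the measured workloads.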
Driving this sort of performance increase is important for Google, considering the company's emphasis on building machine learning applications. The gains validate the company's focus on building machine learning hardware at a time when it's harder to get massive performance boosts from traditional silicon.
This is more than just an academic exercise. Google has used TPUs in its data centres since 2015, and they've been put to use improving the performance of applications including translation and image recognition. The TPUs are particularly useful when it comes to energy efficiency, an important metric in the cost of running hardware at massive scale.
One of the other key metrics for Google's purposes is latency, which is where the TPUs excel compared to other silicon options. Norm Jouppi, a distinguished hardware engineer at Google, said that machine learning systems need to respond quickly in order to provide a good user experience.
"The point is, the internet takes time, so if you're using an internet-based server, it takes time to get from your device to the cloud, it takes time to get back," Jouppi said. "Networking and various things in the cloud — in the data centre — they take some time. So that doesn't leave a lot of [time] if you want near-instantaneous responses."
Google tested the chips on six different neural network inference applications, representing 95 percent of all such applications in Google's data centres. The applications tested include DeepMind's AlphaGo, the system that defeated Lee Sedol at Go in a five-game match in 2016.
Performance<br />
The company tested the TPUs against hardware released at roughly the same time to get an apples-to-apples performance comparison. It's possible that newer hardware would at least narrow the performance gap.
There's still room for TPUs to improve, too. Pairing the TPU with the GDDR5 memory found in an Nvidia K80 GPU should deliver a performance improvement over the configuration Google tested; according to the company's research, the performance of several applications was constrained by memory bandwidth.
Furthermore, the authors of Google's paper claim there's room for additional software optimisation to increase performance. The writers called out one of the tested convolutional neural network applications (referred to in the paper as CNN1) as a candidate. However, given the performance gains the TPUs already deliver, it's not clear whether those optimisations will take place. While neural networks mimic the way neurons transmit information in humans, CNNs are modelled specifically on how the brain processes visual information.

"As CNN1 currently runs more than 70 times faster on the TPU than the CPU, the CNN1 developers are already very happy, so it's not clear whether or when such optimisations would be performed," the authors wrote.
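Since the paper singles out a convolutional network, it may help to see the core operation such models are built on: sliding a small filter over an image to detect local visual patterns. The sketch below is a minimal pure-Python 2D convolution for illustration only; real CNNs stack many such passes with learned filter values, which is exactly the kind of dense arithmetic the TPU accelerates:

```python
# Minimal sketch of a 2D convolution, the core operation of a CNN.
# Pure Python for clarity; production systems use optimised kernels.
def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with one image patch.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge detector applied to a tiny image whose right half is bright.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(convolve2d(image, kernel))  # the edge between the halves lights up
```

The output is large exactly where the dark-to-bright boundary sits, which is how stacked convolutional layers build up from edges to textures to whole objects.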
TPUs are what's known in chip lingo as an application-specific integrated circuit (ASIC): custom silicon built for one task, with an instruction set hard-coded into the chip itself. Jouppi said that he wasn't overly concerned by that, and pointed out that the TPUs are flexible enough to handle changes in machine learning models. "It's not like it was designed for one model, and if someone comes up with a new model, we'd have to junk our chips or anything like that," he said.
Google isn't the only company focused on using dedicated hardware for machine learning. Jouppi added that he knows of several start-ups working in the space, and Microsoft has deployed a fleet of field-programmable gate arrays in its data centres to accelerate networking and machine learning applications.
July 2017 www.pcadvisor.co.uk/news 11