
Developing High-Performance Networking Applications with Manycore Processors

Michael Zimmerman, Vice President of Marketing, Tilera
mzimmerman@tilera.com

Ethernet Technology Summit 2013


Ethernet Everywhere Evolution

[Diagram: Ethernet spans Mobile, Infrastructure, and Transport networks, linking NIC, Switch, and Processing]

April 2013

© 2013 Tilera Corporation


And The Winner Is…

But new problems arise…



Ethernet to Compute – The Critical Link

[Diagram: Mobile, Infrastructure, and Transport traffic flows through the Switch to Processing]

• Not programmable
• No intelligent offload
• Network-to-compute bottleneck



Ethernet to Compute: A Missing Networking Block

[Diagram: the compute side (Virtualization, DDIO, Manycore, PCIe Gen 3.0, OpenStack, High Clock) and the network side (SDN, 10G→40G→100G, OpenFlow 1.3, BYOD, VXLAN, DPI) meet across the PCIe/Ethernet boundary]



Ethernet to Compute: Programmability Gap

Compute – programmable, high performance:
• Innovative
• Newer protocols, higher throughput
• Programmable, adaptable

Network – high throughput, but:
• No innovation
• Lack of functionality
• Poor adaptability



NIC Trends – Evolution in the Making

Past to future:
• 10G – Fixed-function ASIC
• 20G – Layer 2 switching, virtualization
• 40G – Layer 3-7 compute offload
• 80G – Compute NIC



Programmable Offload NIC

[Same evolution chart as the previous slide — 10G fixed-function ASIC → 20G Layer 2 switching/virtualization → 40G Layer 3-7 compute offload → 80G compute NIC — highlighting the programmable offload stage]



Tilera Compute Offload Technology

• Memory I/O: up to 400 Gbps across DDR3 controllers
• Coherent cache: up to 23 MB
• Host PCIe: 40 Gbps
• Programmable packet processor: 120 Mpps
• Network I/O: 8 × 10G
• Sea of compute: 9-72 coherent cores



Tilera Compute Offload Technology

• Gx-72: 60 Mpps programmable packet processor
• 40 Gbps PCIe, dual DDR3 controllers
• Under 5 Watt/10Gbps with full L2-7 Linux/C programming



TILE-Gx: A Comprehensive Family of Processors

• Gx-9 (in production): 9 cores, 1-1.2 GHz, 64b architecture, 1 × DDR3-1333, 2 × XAUI, 12 × GbE, 8 PCIe lanes
• Gx-16 (in production): 16 cores, 1-1.2 GHz, 64b architecture, 2 × DDR3-1600, 2 × XAUI, 12 × GbE, 12 PCIe lanes
• Gx-36 (in production): 36 cores, 1.2 GHz, 64b architecture, 2 × DDR3-1600, 4 × XAUI, 16 × GbE, 12 PCIe lanes
• Gx-72 (sampling): 72 cores, 1-1.2 GHz, 64b architecture, 4 × DDR3-1866, 8 × XAUI, 32 × GbE, 24 PCIe lanes

Gx-9/16/36 are pin compatible in a 1265-ball BGA (37.5 × 37.5 mm); the Gx-72 uses a 1935-ball BGA (45 × 45 mm). All family members are SOFTWARE COMPATIBLE.



TILE-Gx72 Manycore System-on-a-Chip

[Diagram: 24 SerDes lanes of PCIe 2.0 (8 + 4 + 4 + 8 lanes, via TRIO), 32 SerDes lanes of network I/O (8 blocks of 4 × GbE SGMII or 10 GbE XAUI, via mPIPE), 4 DDR3 controllers, 2 MiCA engines, and flexible I/O (UART ×2, USB ×2, JTAG, I2C, SPI) surrounding the tile array]

Tiles
• 72 64-bit processor cores
• 1.0 – 1.2GHz
• 23 MBytes total cache
• 100 Tbps iMesh BW

DDR3 RAM
• 4 memory controllers @ 1866

I/O
• >100 Gbps packet I/O
  ‒ 8 ports 10Gb XAUI / double XAUI
  ‒ 32 ports 1 GbE (SGMII)
• 96 Gbps PCIe I/O
  ‒ Two 8-lane + two 4-lane, or six 4-lane
• mPIPE subsystem
  ‒ Wire-speed packet engine
  ‒ 120 Mpps ingress + 120 Mpps egress

Acceleration
• 2 MiCA engines w/ 80 threads:
  ‒ 40 Gbps crypto (AES-SHA, etc.)
  ‒ 45 Ktps RSA PubKey acceleration

45 x 45mm BGA package



TILE-Gx72 is Shipping



72-core, 80Gbps High Throughput Ethernet

Key Features
• Full-length, full-height, two-slot card
• 1 × TILE-Gx72 processor
• 8 × 10G Ethernet ports
• x8 Gen3 connection to PCIe host
  – 64 Gbps peak throughput
  – Via two x8 PCIe ports from the Gx72 through a PLX PCIe switch
• 4 DDR3 memory DIMMs with ECC
  – 64 GBytes of memory w/ 16GB DIMMs

[Diagram: mezzanine card with two quad PHYs feeding eight 1G/10GbE ports, USB, mezzanine connectors, and the PLX PCIe switch]



C/Linux Programmable Ethernet Pipeline

Fully programmable, high-performance compute:
• C/Linux cores, 100s of threads
• 23 MB on-chip cache
• Cache and I/O coherency

High packet throughput (mPIPE, up to 100G):
• C-programmable cores
• Deep packet parser
• Wire-speed classifier
• Load balancing across 100s of threads
• Packet reordering
• Time stamping

[Diagram: processor tiles (L1-I/L1-D/L2 caches on a high-performance, low-latency mesh) sit between TRIO/PCIe (1 × 8 lanes + 3 × 4 lanes) and mPIPE network I/O (4 × 10GE, 16 × 1GE)]



Highest Performance/Watt Architecture

[Diagram: same pipeline as the previous slide — TRIO/PCIe (1 × 8 lanes + 3 × 4 lanes), tiles on the low-latency mesh, mPIPE with 4 × 10GE and 16 × 1GE]

Per-core size (cost) and power, normalized to a 1 GHz core:
• x86 core: ×1200 relative size, 1875 mW
• TILE core: ×330 relative size, 700 mW
• mPIPE: ×1 relative size, 16 mW



iMesh – Low Power Coherent Interconnect

[Diagram: a single tile — processor with 3 execution pipelines, L1-I/L1-D/L2 caches, and a high-performance, low-latency mesh interface]

< 400 mW per tile



iMesh – Highest Performance/Watt

• Homogeneous: scales to 100s of cores
• C/Linux: easy to program
• Lowest power, highest core density



Non-iMesh Architectures

• Don't scale
• Complex programming model
• Not open-source friendly
• Function-specific accelerators, not well suited to high-throughput compute
  ‒ Crypto and compression are exceptions



Trends Pushing L2-7 to the NIC

[Charts: NIC market CAGR, 2013-16, and % of workloads virtualized worldwide (*VMware 2011)]

NIC growth + virtualization = Compute NIC



Virtualizing the Compute Offload

[Diagram: three generations of NIC, each serving guest OSes through a VMM, PCIe, and SR-IOV virtual functions (VFs)]

• SR-IOV NIC (10G Ethernet): a virtual L2 bridge switches traffic between VFs
• Compute Offload (10G/40G Ethernet): an L2-L7 Linux classifier steers traffic to per-VF queues
• Compute NIC (40G/80G Ethernet): the L2-L7 Linux classifier plus Linux/Java compute engines on the NIC itself



End-to-End Deep Programmable Dataplane

[Diagram: VMs attach through PCIe to VFs; an OpenFlow-controlled L2-L7 Linux classifier in the NIC extends the programmable dataplane over 10G/40G Ethernet toward the TOR switch and router — a path that today is not programmable]



SDNing the Compute Offload

Functions controlled in the NIC via OpenFlow:

1) Flow assignment to queues
2) Bandwidth management per flow
3) VM-to-flow assignment
4) Security policies
5) Metering and monitoring
6) Local switching and routing
7) Metadata extraction
8) Future: iSCSI and other protocol processing

[Diagram: guest OSes over the VMM attach through PCIe to VFs; the OpenFlow-controlled L2-L7 Linux classifier sits above 10G/40G Ethernet in the compute offload]



Summary

• 10G/40G, virtualization, and granular flow management drive the need for deep programmable dataplane offload
• Adding C/Linux programmability (i.e. manycore) to the NIC is inevitable
• NIC power and cost must be supported by a scalable architecture
• The NIC is the interconnect between Ethernet and compute (PCIe), hence an integral part of SDN

