
The process of adapting HPC technologies to the embedded space has recently been described as high-performance embedded computing (HPEC). Several vendors in COTS computing, including Curtiss-Wright, use the term HPEC to mean embedded HPC. Just as HPC is synonymous with the historical term "supercomputing," HPEC systems are the SWaP-constrained variant of supercomputers. In the defense computing market, the highest performing OpenVPX systems, from vendors like Curtiss-Wright, fit 28 Intel CPUs (112 cores) in a 16-slot chassis, interconnected with a 224 GB/sec dual-star system fabric (Figure 2). But it's not only about CPUs, buses and interconnects. HPEC is about being able to run the same software that is used in HPC.
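To make that software continuity concrete, the following minimal sketch shows the kind of code an HPC cluster runs and an HPEC system is expected to run unchanged. It uses the standard MPI interface common on top500-class machines; the buffer size and rank roles are illustrative choices, not details drawn from any particular system described here.

/* Minimal MPI point-to-point example: the same source compiles and runs
 * unchanged on an HPC cluster or an OpenVPX HPEC system, provided an MPI
 * implementation exists for the underlying fabric. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double buf[1024];                       /* illustrative payload size */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        /* Rank 0 sends a block of data to rank 1. */
        MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received 1024 doubles from rank 0\n");
    }

    MPI_Finalize();
    return 0;
}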

Fabric Discontinuity – Software Continuity

HPC is dominated by Ethernet and InfiniBand, while HPEC 6U OpenVPX computing has been and continues to be dominated by RapidIO. This apparent discontinuity has been one of the major roadblocks to bringing HPC technologies to the HPEC world, as the fabric has traditionally had a major impact on software architecture.

The first thing to consider is why stick with RapidIO in the face of other reasonably good options? The answer is simple: RapidIO dominates telecommunications DSP computing, which faces many of the same constraints as military DSP. Even better, RapidIO is backed by a volume commercial market. IDT, the leading RapidIO switch vendor, just announced that it has shipped 2.5 million RapidIO switches. RapidIO has a dominant position in the DSP processing that is essential to 4G and 3G wireless base stations, and has captured virtually 100% of the 3G market in China, the fastest growing telecom market. To put it another way, when you talk on your cell phone, there is something like a 90% chance that the bits that represent your voice are at some point transmitted between two DSP processors over a RapidIO link.

There are a number of reasons why RapidIO makes sense in the context of HPEC OpenVPX computing:

• … saving SWaP and cost.
• … performance
• While InfiniBand remains a leading fabric choice in HPC, it is a point technology in OpenVPX HPEC. Unlike alternatives such as Ethernet and RapidIO, InfiniBand is not anticipated (per simulation) to run reliably at 10 GHz over existing OpenVPX technology. It will require a connector change, which is a fairly involved and slow-moving process for an organization like VITA.

Figure 2: Curtiss-Wright showcasing a 224 GB/s dual-star fabric with 28 Intel CPUs (112 cores) in a mere 16-slot chassis.

There were two major challenges in getting RapidIO working in the Intel environment. The first was a classic interconnect problem. PowerPC processors supported RapidIO natively, but Intel did not, so a bridge was needed. The IDT Tsi721 provided this critical piece of technology. The Tsi721 converts from PCIe to RapidIO and vice versa, and provides full line-rate bridging at 20 Gbaud. Using the Tsi721, designers can develop heterogeneous systems that leverage the peer-to-peer networking performance of RapidIO while building on multiprocessor clusters that may only be PCIe enabled. Applications that need large amounts of data transferred efficiently without processor involvement can use the Tsi721's full line-rate block DMA and messaging engines.
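The offload pattern is easier to see in code. The sketch below is purely illustrative: the rio_dma_* calls are hypothetical placeholder stubs standing in for whatever interface a vendor's bridge driver actually exposes, and the destination ID and RapidIO address are invented for the example.

/* Sketch of an offloaded block-DMA transfer across a PCIe-to-RapidIO bridge.
 * The rio_dma_* functions are hypothetical stand-ins for a vendor driver;
 * the point is the pattern: post a descriptor, let the engine move the
 * data, wait for completion, with no per-byte CPU involvement. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* --- Hypothetical driver interface (placeholders, not a real API) ------- */
typedef struct { int mport_id; } rio_dma_chan;

static rio_dma_chan *rio_dma_open(int mport_id)
{
    static rio_dma_chan ch;                 /* stub: real code opens a device */
    ch.mport_id = mport_id;
    return &ch;
}

static int rio_dma_post_write(rio_dma_chan *ch, uint16_t dest_id,
                              const void *src, uint64_t rio_addr, size_t len)
{
    /* Stub: a real driver would build a DMA descriptor here and hand it to
     * the bridge's block-DMA engine, leaving the CPU free while the engine
     * streams the payload onto the RapidIO fabric. */
    (void)ch; (void)dest_id; (void)src; (void)rio_addr; (void)len;
    return 0;
}

static int rio_dma_wait(rio_dma_chan *ch)
{
    (void)ch;                               /* stub: poll or sleep on an IRQ */
    return 0;
}
/* ------------------------------------------------------------------------ */

int main(void)
{
    static double buf[4096];                /* payload staged in host memory */
    rio_dma_chan *ch = rio_dma_open(0);     /* local mport 0 */

    /* Destination ID 5 and RapidIO address 0x100000 are illustrative. */
    if (rio_dma_post_write(ch, 5, buf, 0x100000, sizeof buf) != 0 ||
        rio_dma_wait(ch) != 0) {
        fprintf(stderr, "DMA transfer failed\n");
        return 1;
    }
    printf("queued %zu bytes for offloaded transfer\n", sizeof buf);
    return 0;
}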

The second major challenge related to RapidIO was software. RapidIO isn't used in HPC, so it doesn't run the same software as those large cluster-based systems in the top500 that use fabrics like Ethernet and InfiniBand. InfiniBand vendors encountered these same market constraints while trying to grow beyond their niche. It's hard to "fight" Ethernet. However, Ethernet wasn't appropriate for the highest performance HPC systems because of the CPU and/or silicon overhead associated with TCP offload. The answer came in the form of new protocols and new software.

OpenFabrics Alliance

The OpenFabrics Alliance (OFA) was formed to promote Remote Direct Memory Access (RDMA) functionality, which allows Ethernet silicon to move packets from the memory of one compute node to the memory of another with very little CPU intervention. There are competing protocols to do this, but wisely, the OFA created a unified software layer called OFED, which is supported by Intel, Chelsio, Mellanox and the other members of the Ethernet RDMA ecosystem. OFED is used in business, research and scientific environments that require highly efficient networks, storage connectivity and parallel computing.

The OpenFabrics Enterprise Distribution (OFED) is open-source software for RDMA and kernel bypass applications. One of the things that traditionally slowed Ethernet down and wasted the CPU was the need to copy a packet payload numerous times before it was shipped out the Ethernet interface (Figure 3). RDMA has eliminated these redundant copies.
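As a rough illustration of the zero-copy model OFED exposes, the sketch below registers a local buffer and posts an RDMA write using the libibverbs API. It assumes the queue pair is already connected and that the remote address and rkey were exchanged out of band, so it shows only the data-path fragment, not the full connection setup.

/* Zero-copy RDMA write via libibverbs: the adapter reads `buf` directly from
 * application memory and places it in the remote node's memory, with no
 * intermediate payload copies and almost no CPU involvement. `qp` is assumed
 * to be an already-connected queue pair; remote_addr and rkey are assumed to
 * have been exchanged out of band. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stddef.h>

static int rdma_write_example(struct ibv_pd *pd, struct ibv_cq *cq,
                              struct ibv_qp *qp, void *buf, size_t len,
                              uint64_t remote_addr, uint32_t rkey)
{
    /* Register the local buffer so the adapter may DMA from it directly. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr = {
        .wr_id               = 1,
        .sg_list             = &sge,
        .num_sge             = 1,
        .opcode              = IBV_WR_RDMA_WRITE,
        .send_flags          = IBV_SEND_SIGNALED,
        .wr.rdma.remote_addr = remote_addr,
        .wr.rdma.rkey        = rkey,
    };

    struct ibv_send_wr *bad = NULL;
    if (ibv_post_send(qp, &wr, &bad)) {      /* hand the descriptor to the NIC */
        ibv_dereg_mr(mr);
        return -1;
    }

    /* Spin until the work completion arrives, then release the registration. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;
    ibv_dereg_mr(mr);
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}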

