the Engineers' Guide to VME, VPX & VXS 2013
The process of adapting HPC technologies to the embedded space has recently been described as high-performance embedded computing (HPEC). Several vendors in COTS computing, including Curtiss-Wright, use the term HPEC to mean embedded HPC. Just as HPC is synonymous with the historical term "supercomputing," HPEC systems are the SWaP-constrained variant of supercomputers. In the defense computing market, the highest-performing OpenVPX systems, from vendors like Curtiss-Wright, fit 28 Intel CPUs (112 cores) in a 16-slot chassis, interconnected with a 224 GB/sec dual-star system fabric (Figure 2). But it's not only about CPUs, buses and interconnects. HPEC is about being able to run the same software that is used in HPC.
Fabric Discontinuity – Software Continuity
HPC is dominated by Ethernet and InfiniBand, while HPEC 6U OpenVPX computing has been and continues to be dominated by RapidIO. This apparent discontinuity has been one of the major roadblocks to bringing HPC technologies to the HPEC world, as the fabric has traditionally had a major impact on software architecture.

The first question to consider is why stick with RapidIO in the face of other reasonably good options? The answer is simple: RapidIO dominates telecommunications DSP computing, which faces many of the same constraints as military DSP. Even better, RapidIO is backed by a volume commercial market. IDT, the leading RapidIO switch vendor, recently announced that it has shipped 2.5 million RapidIO switches. RapidIO has a dominant position in the DSP processing that is essential to 4G and 3G wireless base stations, and it has captured virtually 100% of the 3G market in China, the fastest-growing telecom market. To put it another way, when you talk on your cell phone, there is something like a 90% chance that the bits that represent your voice are at some point transmitted between two DSP processors over a RapidIO link.
There are a number of reasons why RapidIO makes sense in the context of HPEC OpenVPX computing, among them saving SWaP and cost, and performance. And while InfiniBand is a leading fabric choice in HPC, it is a point technology in OpenVPX HPEC. Unlike alternatives such as Ethernet and RapidIO, InfiniBand is not anticipated (per simulation) to run reliably at 10 GHz over existing OpenVPX technology. It would require a connector change, which is a fairly involved and slow-moving process for an organization like VITA.

Figure 2: Curtiss-Wright showcasing a 224 GB/s dual-star fabric with 28 Intel CPUs (112 cores) in a mere 16-slot chassis.
There were two major challenges in getting RapidIO working in the Intel environment. The first was a classic interconnect problem. PowerPC processors supported RapidIO natively, but Intel did not, so a bridge was needed. The IDT Tsi721 provided this critical piece of technology. The Tsi721 converts from PCIe to RapidIO and vice versa and provides full line-rate bridging at 20 Gbaud. Using the Tsi721, designers can develop heterogeneous systems that leverage the peer-to-peer networking performance of RapidIO while at the same time using multiprocessor clusters that may only be PCIe enabled. Applications that require large amounts of data transferred efficiently without processor involvement can use the Tsi721's full line-rate block DMA and messaging engines.
The second major challenge related to RapidIO was software. RapidIO isn't used in HPC, so it doesn't run the same software as the large cluster-based systems in the Top500 that use fabrics like Ethernet and InfiniBand. InfiniBand vendors encountered these same market constraints while trying to grow beyond their niche. It's hard to "fight" Ethernet. However, Ethernet wasn't appropriate for the highest-performance HPC systems because of the CPU and/or silicon overhead associated with TCP offload. The answer came in the form of new protocols and new software.
OpenFabrics Alliance
The OpenFabrics Alliance (OFA) was formed to promote Remote Direct Memory Access (RDMA) functionality that allows Ethernet silicon to move packets from the memory of one compute node to the memory of another with very little CPU intervention. There are competing protocols to do this, but wisely, the OFA created a unified software layer called OFED, which is supported by Intel, Chelsio, Mellanox and the other members of the Ethernet RDMA ecosystem. OFED is used in business, research and scientific environments that require highly efficient networks, storage connectivity and parallel computing.
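Real RDMA requires OFED-capable hardware, but the copy-elimination idea it promotes can be illustrated on ordinary sockets. The minimal Python sketch below contrasts the traditional path, where sendall() copies a user buffer into kernel socket buffers, with sendfile(), where the kernel moves file pages to the socket without the extra user-space copy. This is an analogy only: RDMA goes further by also bypassing the remote CPU. Addresses and sizes here are arbitrary.

```python
import socket
import tempfile
import threading

def recv_exact(conn, n):
    """Read exactly n bytes from a connected socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return bytes(buf)

payload = b"x" * 65536

# Stage the payload in a file so the kernel can ship it with sendfile().
src = tempfile.TemporaryFile()
src.write(payload)
src.flush()

listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
result = []

def serve():
    conn, _ = listener.accept()
    result.append(recv_exact(conn, len(payload)))
    conn.close()

t = threading.Thread(target=serve)
t.start()

sock = socket.create_connection(("127.0.0.1", port))
# Copying path would be: sock.sendall(payload) — the payload crosses the
# user/kernel boundary as an extra copy.
# Reduced-copy path: the kernel sends file pages directly to the socket.
sock.sendfile(src, 0, len(payload))
sock.close()
t.join()
received = result[0]
```

Both paths deliver identical bytes; the difference, as with RDMA versus plain TCP, is how many times the CPU touches them along the way.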
The OpenFabrics Enterprise Distribution (OFED) is open-source software for RDMA and kernel-bypass applications. One of the things that traditionally slowed Ethernet down and wasted the CPU was the need to copy a packet payload numerous times before it was shipped out the Ethernet interface (Figure 3). RDMA has eliminated
www.eecatalog.com/vme