[Figure: average AMDC latency (us) plotted against OPRA feed rate (Mbps).]

Fig. 10. Graph showing latency of the AMDC portion of the overall system latency with increasing throughput

V. RELATED WORK

An alternative hardware-accelerated market data architecture is the Exegy Ticker Plant [5]. In this approach, incoming market data enters the system via a conventional Ethernet card. Processing is then augmented with an FPGA accelerator on the processor bus. This has the advantage of a higher-throughput bus, although this isn’t needed even for a saturated gigabit Ethernet link.

Another accelerated feed processing approach is provided by the ActivFeed MPU [6]. This uses an XtremeData FPGA accelerator which is placed in a processor socket of the host system, and communicates using HyperTransport. However, network integration is via an Infiniband bridge, rather than by direct connection to the Ethernet, and processing latencies are quoted as “end-to-end latency surpass[es] 100 us”, compared to the 20 us latencies reported here.

The idea of packet processing using graphs of composable nodes has been used in a number of previous systems. In the Click system [7] a graph of processing nodes is defined using a C++ interface in software, which is then compiled into an FPGA design. The NetFPGA project (http://www.netfpga.org/) offers a library of packet processing components, which communicate using a common protocol, but requires the user to manually connect together each of the protocol wires between them. In contrast, the approach developed here allows the graph to be described directly in Handel-C [3], so no extra compiler passes are required. The connections between components are also specified as abstract data-paths: all protocol-specific details of the connection are hidden from the programmer.

VI. CONCLUSION

This paper presents a method that allows processing of market data feeds using FPGAs, providing the ability to process extremely large numbers of messages per second, while also minimising the latency between arrival of network packets and their delivery to their intended target in software. This is achieved by eliminating the operating system networking stack: all message processing and filtering is applied in an FPGA, which is then able to push messages directly into the memory space of software threads via FPGA-initiated DMA. As well as reducing latency due to the OS stack, this also reduces both the programming burden and performance overhead for software components, as messages are provided as fully decoded memory structures, rather than serialised messages which must be parsed.

This approach has been implemented in the Celoxica AMDC accelerator card, which incorporates two gigabit Ethernet ports and a Xilinx Virtex-5 LX110T FPGA, connected to a host computer over the PCIe bus. Tests performed using the OPRA-FAST compressed data feed format have shown that an AMDC accelerated system can support a message throughput of 5.5 million messages per second, 12 times the current real-world rate, while the complete system rebroadcasts at least 99% of packets with a latency of less than 26 us. The hardware portion of the design has a constant latency, irrespective of throughput, of 4 us.

Currently the proposed architecture only accelerates incoming market data. In the future, accelerated Ethernet transmission will be examined. This will include unicast and multicast UDP offload, TCP/IP offload, and market order execution.

REFERENCES

[1] T. J. Todman, G. A. Constantinides, S. J. E. Wilton, O. Mencer, W. Luk, and P. Y. K. Cheung, “Reconfigurable computing: architectures and design methods,” IEE Proc. Computing and Digital Techniques, vol. 152, no. 2, pp. 193–207, 2004.
[2] K. Houstoun, FIX Adapted for STreaming - FAST Protocol Technical Overview, 2006.
[3] Handel-C Language Reference, http://www.celoxica.com, Celoxica Ltd., 1999.
[4] G. W. Morris and M. Aubury, “Design space exploration of the European option benchmark using Hyperstreams,” in FPL, 2007, pp. 5–10.
[5] S. T. A. Center, “Exegy ticker plant with infiniband,” STAC Report, July 2007.
[6] “Activefeed MPU: Accelerate your market data,” http://www.activfinancial.com/docs/ActivFeedMPU.pdf, 2007.
[7] K. C, G. Brebner, and G. Schelle, “Mapping a domain specific language to a platform FPGA,” in Proc. IEEE Design Automation Conference, 2004, pp. 924–927.

Authorized licensed use limited to: Imperial College London. Downloaded on October 12, 2009 at 14:27 from IEEE Xplore. Restrictions apply.
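
As a rough check on the performance figures quoted in the conclusion, the following Python sketch works out three quantities the paper does not state explicitly: the real-world message rate implied by the "12 times" headroom, the average time available per message at peak throughput, and the share of the 26 us bound taken by the 4 us hardware portion. The function and constant names are illustrative assumptions, and the derived values are back-of-envelope estimates rather than measurements from the paper.

```python
# Figures quoted in the conclusion of the paper.
PEAK_MSGS_PER_SEC = 5.5e6        # supported message throughput
HEADROOM_FACTOR = 12             # "12 times the current real-world rate"
HARDWARE_LATENCY_US = 4.0        # constant latency of the hardware portion
SYSTEM_LATENCY_BOUND_US = 26.0   # bound met by at least 99% of packets


def implied_real_world_rate(peak: float, factor: float) -> float:
    """Message rate (msgs/s) implied by the quoted headroom factor."""
    return peak / factor


def per_message_budget_ns(msgs_per_sec: float) -> float:
    """Average time available per message at a given rate, in nanoseconds."""
    return 1e9 / msgs_per_sec


def hardware_share(hardware_us: float, total_us: float) -> float:
    """Hardware latency as a fraction of a given end-to-end latency."""
    return hardware_us / total_us


if __name__ == "__main__":
    rate = implied_real_world_rate(PEAK_MSGS_PER_SEC, HEADROOM_FACTOR)
    budget = per_message_budget_ns(PEAK_MSGS_PER_SEC)
    share = hardware_share(HARDWARE_LATENCY_US, SYSTEM_LATENCY_BOUND_US)
    print(f"Implied real-world rate: {rate:,.0f} msgs/s")
    print(f"Per-message budget at peak: {budget:.1f} ns")
    print(f"Hardware share of the 26 us bound: {share:.1%}")
```

Because 26 us is an upper bound for 99% of packets rather than a mean, the computed share is a lower bound on the hardware fraction for those packets; under that reading, most of the remaining latency would lie outside the FPGA portion of the design.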
