12.07.2015 Views

The Wishbone II transaction bus - Military Embedded Systems


Hardware: Systems – big and small

The Wishbone II transaction bus: Another grade of speed

By Uros Platise

Wishbone specifications have been released by OpenCores and Silicore with the aim of providing a standard IP core interconnection scheme that fulfills the requirements of modern System-on-Chip (SoC) designs, including CPUs, DMA engines, memory interfaces, peripheral interfaces, and so on. The andEuros company has used the Wishbone specification since its inception and has developed an improved version of the Wishbone bus, called Wishbone II, which proposes an advanced pipelined architecture where read and write transactions are separated and the bus acts as a transaction bus. In this way, multiple transactions can take place at the same time, removing all latencies along the path as well as the bus stalls caused by read-modify-write (RMW) cycles, thanks to a new per-cell locking concept. The ultimate benefit, of course, is that bus throughput is increased to the maximum.

Design and development of large-scale FPGA/ASIC SoC designs have forced designers to implement a modular architecture with a standardized module interface that connects various IP modules in any possible configuration. One of the most popular interconnect architectures, the Wishbone B.3 bus, was released by OpenCores (www.opencores.org). In a similar way, Altera has introduced its own interconnect scheme called Avalon Bus (www.altera.com), around which SOPC Builder and Nios (II) systems are built.
Xilinx has also introduced its own bus called the On-Chip Peripheral Bus, combined with the Processor Local Bus (www.xilinx.com).

These interconnect architectures are single-transaction master/slave oriented, meaning that a CPU requesting a word from a given address stalls itself and the path (bus) to the destination until the word is received. Many bus cycles are lost in this way, giving lower actual data throughput than expected despite the relatively high system bus frequency. Even with fast burst reads and writes introduced by special signals, bus cycles are still lost until the first word is received, at the additional cost of duplicating the burst logic on both sides, source and destination. Bus stalling is more evident when accessing slower modules with greater latencies. In these cases, system performance degrades dramatically; for example, a 100 MHz system may see its throughput fall as low as a few MB per second.

That is why there was a desperate need to develop bus architectures employing new concepts.
Some new signals have been introduced to support new transaction bus concepts based on the Wishbone B.3 architecture, overcoming latency issues while maintaining backward compatibility.

Wishbone II transaction bus concept

In our proposed bus, transactions are represented by a transaction vector containing:
• Source (module) address
• Destination (module) address
• Operator
• Data

Source and destination addresses define the path; the operator describes one or more operations to be executed along the path and/or at the destination address; and some operations require supplemental data to complete the transaction. An actual implementation requires additional handshaking signals.

Transaction vectors are placed onto a transaction bus, which transports the vector from source to destination and executes bus-oriented operations as requested by the vector. Once the transaction vector is placed (sent), the source has no further responsibility and the transaction bus takes complete control over it. The source is then ready to issue the next transaction vector. Multiple tasks or requests may be issued beforehand, one per bus cycle, which reduces the need for prediction logic at the destination module to support the various kinds of burst reads or writes.

There are two kinds of transactions:
• Independent
• Dependent (when their order is important)

To support dependent transactions, the transaction bus must never change the order of already placed transactions.
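The transaction vector and the ordering rule above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `TransactionVector`, `Operator`, `TransactionBus`, `place`, and `deliver` are hypothetical and do not come from the Wishbone II proposal, and a real bus would be described in an HDL with handshaking signals rather than method calls.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, Optional


class Operator(Enum):
    """Hypothetical operator encoding; the article does not define one."""
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class TransactionVector:
    source: int                 # source (module) address
    destination: int            # destination (module) address
    operator: Operator          # operation to execute at the destination
    data: Optional[int] = None  # supplemental data, e.g. the word to write


class TransactionBus:
    """In-order FIFO model: vectors leave in exactly the order they were placed."""

    def __init__(self, depth: int) -> None:
        self._depth = depth
        self._fifo: Deque[TransactionVector] = deque()

    def place(self, vector: TransactionVector) -> bool:
        """Accept a vector if a FIFO line is free; the result models the acknowledge."""
        if len(self._fifo) >= self._depth:
            return False  # bus full: the source must retry later
        self._fifo.append(vector)
        return True

    def deliver(self) -> Optional[TransactionVector]:
        """Hand the oldest vector to its destination, or None if the bus is empty."""
        return self._fifo.popleft() if self._fifo else None
```

Because the model only ever appends at one end and removes from the other, dependent transactions (for example a write followed by a read of the same address) cannot overtake each other.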
The transaction bus features a fully acknowledged mechanism to accept new transaction vectors, execute internal forwarding, and deliver them to the destination module. The transparent architecture presents itself as a simple input-output black box; however, the implementation is based on a multi-pipelined structure where each (FIFO) line holds one transaction vector.

The Wishbone II transaction bus proposes only four basic operations:
• Single read
• Single write
• Cell lock
• Bus lock

Single reads and writes are issued by modules, whereas cell and bus locking operations are in the transaction bus domain. Burst reads and burst writes are accomplished by issuing a stream of read or write transactions. RMW cycles are supported through the bus or, even better, can be facilitated using the new cell locking concept, which instead of stalling the complete SoC bus locks only a single memory cell or multiple memory cells to a given owner. These cells cannot be accessed by others until they are unlocked.

Wishbone II signals

A Wishbone II transaction vector is derived from the Wishbone B.3 specification by introducing the following new signals:
WB_ACW   Write Acknowledge
WB_ACR   Read Acknowledge
WB_TGA   Address Tag in both directions
WB_ALK   Address Lock

In the text that follows, the prefix WB may be changed to WBM to denote a master interface or to WBS to denote a slave interface, or the prefix can be left blank to describe any master or slave interface. Input signals are suffixed with _I and output signals with _O. The proposed bus discards the Wishbone B.3 ACK signal since its functionality is now split between the ACR and ACW signals. Complete basic signal descriptions for master and slave are listed in Table 1.
New signals are marked with an asterisk (*).

Table 1
DESCRIPTION                              MASTER       SLAVE
Data from master to slave (Data)         WBM_DAT_O    WBS_DAT_I
Data from slave to master (Data)         WBM_DAT_I    WBS_DAT_O
Slave (Destination) Address              WBM_ADR_O    WBS_ADR_I
Transaction Strobe (Handshaking)         WBM_STB_O    WBS_STB_I
Destination Operation (Operator)         WBM_WE_O     WBS_WE_I
Bus Lock (Operator)                      WBM_LOCK_O   WBS_LOCK_I
Write Acknowledge (Handshaking) *        WBM_ACW_I    WBS_ACW_O
Read Acknowledge (Handshaking) *         WBM_ACR_I    WBS_ACR_O
Address Tag Write (Source) *             WBM_TGA_O    WBS_TGA_I
Address Tag Read (Destination) *         WBM_TGA_I    WBS_TGA_O
Address Lock (Operator) *                WBM_ALK_O    WBS_ALK_I

[Figure 1: MASTER connected to SLAVE (memory) through one pipeline stage (1/z) on the write path and one pipeline stage (1/z) on the read path]

Wishbone II bus transactions

Write transactions

A write transaction is almost identical to the write transaction given in the Wishbone B.3 specification, except that Wishbone II uses the ACW signal to acknowledge a write cycle. Read requests are identical to write transactions except that the destination operation signal WE is set to read.

Read transactions

A read transaction is composed of two transactions:
• Read request transaction issued by the source
• Read response transaction issued by the destination

A read request is sent by the master module, representing the source, by first issuing a write transaction with the destination operation WE set to read. The master should set the Address Tag Write vector to identify the read response. (If there is a single master, this is not necessary.)
The read request transaction is acknowledged in the same way as the write transaction. The destination completes the transaction by returning a separate read response transaction, marked by the acknowledge signal ACR and carrying valid data and Address Tag Read information. Address Tag Read is a copy of the Address Tag Write.

Figure 1 shows an example system with one pipeline stage on the write (input) and read (output) paths between the source (master) and destination (slave) devices. The system has 1 cycle of latency in each direction; therefore, a request-response loop takes at least 2 wait cycles. The slave (memory) may also perform some internal management such as refresh, which adds to the total number of wait states.
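To see why separating requests from responses matters, the following sketch counts bus cycles for a run of reads under a deliberately simplified model. It assumes a fixed round-trip latency only (2 cycles for the one-stage-per-direction system above) and ignores memory access time, refresh, and arbitration, so it will not reproduce the cycle counts of any particular timing diagram. The function names are hypothetical.

```python
def blocking_cycles(n_reads: int, latency: int) -> int:
    """Single-transaction bus: each read holds the bus until its word returns.

    Every read costs one request cycle plus the full round-trip latency.
    """
    return n_reads * (1 + latency)


def pipelined_cycles(n_reads: int, latency: int) -> int:
    """Transaction bus: one request is issued per cycle.

    Responses overlap with later requests, so the round-trip latency is
    paid only once, after the last request has been issued.
    """
    if n_reads == 0:
        return 0
    return n_reads + latency


if __name__ == "__main__":
    for n in (1, 3, 1000):
        print(n, blocking_cycles(n, 2), pipelined_cycles(n, 2))
```

Under these assumptions the blocking bus approaches one word every 3 cycles for long runs, while the pipelined bus approaches one word per cycle.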


Figure 2 depicts a transaction bus data flow diagram for the given example, with three read request transactions placed by the master as AD0, AD1, and AD2, and the associated returned read response transactions as DO0, DO1, and DO2. The signal WE is assumed to be cleared for all three transactions to indicate read operations. Transactions AD0 and AD1 are burst transactions, meaning that AD1 = AD0 + 1, and AD2 is an independent transaction triggered in the meantime, for example by an external interrupt that loads its interrupt vector.

Each read request transaction is acknowledged by the ACW signal, and the returned read response transaction is marked (acknowledged) by the ACR signal. Note that the latency may differ from one transaction to the next, due to other higher priority master(s), memory refresh functions, and so on. In this example, AD0 is immediately acknowledged but it takes 3 wait cycles to return DO0; AD1 is acknowledged 1 cycle later while DO1 is returned in only 2 wait cycles; and DO2 again takes 3 wait cycles. All three transactions are completed in 9 cycles; theoretically, without the two illustrative wait cycles, they would complete in only 7 cycles. The same scenario using the Wishbone B.3 specification is shown in Figure 3.

Again, AD0 and AD1 are bursts, AD1 = AD0 + 1, and AD2 is an independent request.
All three transactions are completed in 12 cycles, decreasing performance by 41 percent (against a minimum of 7 cycles in Wishbone II), even at the additional silicon cost of a memory burst logic implementation on both sides: source and destination.

For a continuous burst, Wishbone II would perform with 0 wait cycles (latency is completely removed) and with absolutely no loss (again 0 wait cycles) at the slave side for issuing the first word when more than one master coexists in the system. To illustrate, for a system running at 150 MHz, long bursts with a fixed latency of 2 cycles would yield a Wishbone II bandwidth of 150 Mwords/s (one word per cycle), and a Wishbone B.3 bandwidth of only 50 Mwords/s (one word every 3 cycles).

[Figure 2: Wishbone II timing diagram. Signals: CLK; DAT, ADR, STB, WE, TGA, LOCK, ALK carrying AD0, AD1, AD2; STB; ACW; ACR; DAT, TGA carrying DO0, DO1, DO2]

[Figure 3: Wishbone B.3 timing diagram. Signals: CLK; DAT, ADR, STB, WE carrying AD0, AD1, AD2; STB; ACK; DAT carrying DO0, DO1, DO2]

Read-modify-write cycles and exclusive bus/address locking

A read-modify-write cycle can be made using the bus LOCK signal by issuing the read request with the LOCK signal set, waiting for the read response, following it with a write, and finally releasing the LOCK.
To avoid stalling the complete bus, Wishbone II introduces a per-cell memory locking feature using the ALK signal, which is used in almost the same way as the Wishbone LOCK signal, except that it does not stall the complete bus but grants exclusive permissions to a given module distinguished by the source TGA.

Wishbone II races into the future

The Wishbone II bus proposes an advanced transaction bus-oriented architecture for SoC designs for FPGAs and ASICs, in which write and read operations are handled as separate write and read transactions. Each transaction is stored in a single line, and the multi-pipeline architecture acts as a FIFO buffer transporting multiple transactions from and to multiple source and destination modules. An advanced locking mechanism prevents the complete bus from stalling due to RMW cycles by using a temporary per-cell locking mechanism. In this way, overall design data throughput is increased to the maximum while the design successfully integrates slow- and high-speed, low- and high-latency peripherals and CPUs.
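The per-cell locking idea can be illustrated with a small software model. This is a sketch under assumptions, not the Wishbone II implementation: the class `CellLockMemory` and its method names are hypothetical, the `tag` argument stands in for the source TGA, the `lock` and `unlock` flags stand in for asserting and releasing ALK, and returning `None` or `False` for a refused access is an assumption, since the article does not say how a blocked module is told to wait.

```python
from typing import Dict, List, Optional


class CellLockMemory:
    """Memory whose cells can be locked to one owner, identified by its tag."""

    def __init__(self, size: int) -> None:
        self._cells: List[int] = [0] * size
        self._owner: Dict[int, int] = {}  # address -> tag of the lock owner

    def _allowed(self, address: int, tag: int) -> bool:
        """A cell is accessible if it is unlocked or locked by the same tag."""
        owner = self._owner.get(address)
        return owner is None or owner == tag

    def read(self, address: int, tag: int, lock: bool = False) -> Optional[int]:
        """Return the word, or None if another owner holds the cell."""
        if not self._allowed(address, tag):
            return None
        if lock:
            self._owner[address] = tag  # start of a read-modify-write
        return self._cells[address]

    def write(self, address: int, tag: int, value: int, unlock: bool = False) -> bool:
        """Store the word; return False if another owner holds the cell."""
        if not self._allowed(address, tag):
            return False
        self._cells[address] = value
        if unlock:
            self._owner.pop(address, None)  # end of a read-modify-write
        return True
```

In this model a module with tag 1 can lock cell 4, modify it, and unlock it while a module with tag 2 keeps reading and writing every other cell, which is the property that distinguishes cell locking from a bus-wide LOCK.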


Uros Platise has been R&D manager for more than 10 years at andEuros, specializing in electronics, robotics, and software engineering. His expertise includes FPGA architectures, price/performance optimizations, communication protocols, sensor networks, and so forth. He will receive his PhD in Isotropic Networks from JSI in Ljubljana, Slovenia. Uros can be contacted at uros@andEuros.org.

andEuros • +385-52-777-341 • www.andEuros.org/erd

To find out more about Wishbone II, visit www.andEuros.com/erd.

© 2008 OpenSystems Publishing. Not Licensed for distribution. Visit opensystems-publishing.com/reprints for copyright permissions.
