12.07.2015 Views

AXI master modeling - Carbon Design Systems

AXI master modeling - Carbon Design Systems

AXI master modeling - Carbon Design Systems

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

A two-phased multimedia SoCdesign optimization using ESL toolsHoon Oh, Youngkil Park, Byungchul Hong, Chulho Shin and Derek KoLG Electronics{hoon.oh, youngkil.park, byungchul.hong, chulho.shin and derek.ko}@lge.comLG ElectronicsSystem IC Business Team


Motivation1 (related to <strong>master</strong>s and interconnects)Master<strong>AXI</strong> CompatibleInterconnectMaster 1 M… M… M… M… M… Master 20<strong>AXI</strong> CompatibleInterconnect a1<strong>AXI</strong> CompatibleInterconnect b1<strong>AXI</strong> CompatibleInterconnect a…<strong>AXI</strong> CompatibleInterconnect bN<strong>AXI</strong> CompatibleInterconnect aM- Many IP blocks under development- more than 20 <strong>master</strong>s : memorybandwidth- hungry and latencysensitive multimedia applicationspecific<strong>master</strong>s.-A CPU that runs at 700MHz andGPU for 3D graphics acceleration.- Visibility & configurability ?DDR3MemoryControllerDDR3MemoryDDR3 MemoryController 1DDR3Memory 1DDR3 MemoryController …DDR3Memory …DDR3 MemoryController NDDR3Memory N- Optimal architecture forinterconnect? (BW, operationfrequency, bit-width of theinterconnect, gate count, cascadedepth …..)- What kind of arbitration & QoSscheme needed?


Motivation3 (other challenges)Limited resources for system architecturingLess than two manpower available except memory controller designNo accumulation of ESL library models of each <strong>master</strong> IPchicken-and-egg problemTop system architecturing requires each <strong>master</strong>’s accurate BW and latency information.Each <strong>master</strong> requires it’s available BW and latency constraintsWhat is the precedence? Master IP design or Top architecturing?In this talk, we demonstrate a two-phased approach to tackle all these problems.


Our approachesKey objectives :Maximize external memoryBW(bandwidth)Keep memory access latency as short asrequired for latency sensitive <strong>master</strong>s.For minimal resources (people andtime),we decided to use ESL tools:<strong>Carbon</strong> <strong>Design</strong> System’s two major ESLtools to help tackle our problem<strong>Carbon</strong> Model Studio to generate Cmodels from RTL codes<strong>Carbon</strong> SoC <strong>Design</strong>erto integrate the generated Cmodels for constructing oursystemto explore various architecturesto simulate the systemto analyze simulation results<strong>Carbon</strong> ModelDatabase FileRTL Code<strong>Carbon</strong> Model Studio:Compiler<strong>Carbon</strong> Model Studio:Component GeneratorSoC <strong>Design</strong>er Component<strong>Carbon</strong> SoC <strong>Design</strong>er<strong>Carbon</strong> ModelRefer to appendix for more detailsabout carbon tools.


Basic concept1 st phase 2 nd phase<strong>AXI</strong> TransactionParameterRTL dump(<strong>AXI</strong> TransactionMonitor)FPGA dump(<strong>AXI</strong> TransactionMonitor)CTG (Configurable Traffic Generator.):Our traffic generator that can behaveaccording to the given set of behavioralparameters for memory accessesSysetmCCTGRDTGFDTGRDTG (RTL Dump file-based TrafficGenerator):<strong>AXI</strong> traces were captured from RTLsimulation<strong>Carbon</strong>C ModelInterconnectFDTG (FPGA Dump file-based TrafficGenerator):<strong>AXI</strong> traces were captured from FPGAemulation<strong>Carbon</strong>C ModelMemoryController<strong>Carbon</strong>C ModelDDR Memory


1 st phase : <strong>AXI</strong> <strong>master</strong> <strong>modeling</strong>We defined many parameters:<strong>AXI</strong> protocols : bit width, interface clock, burstlength, transfer size….Addressing setup : multiple outstanding,accessing address range, increment/randomaddressing ratio….Bandwidth control : BW quantity, active BWsection, latency limit….Profiling Information : Profiling on/off, BWprofiling, latency profiling, profiling range.There are more parameters….The parameters that specifying behavior ofeach <strong>master</strong> were collected from <strong>master</strong>block designer.CTG Componentin SoC <strong>Design</strong>erCTG ParametersettingRefer to Appendix about the parameter table& complex addressing examples.


1 st phase : <strong>AXI</strong> compliant interconnect IPC model which converted from RTL by carbon model studio.<strong>AXI</strong> Master 0<strong>AXI</strong> Slave 0<strong>AXI</strong> Master 1<strong>AXI</strong> Slave 1•••<strong>AXI</strong> Master 9<strong>AXI</strong>compatibleInterconnectIP•••<strong>AXI</strong> Slave 5Create<strong>Carbon</strong>Componentclk-inRegister InterfaceRTL<strong>Carbon</strong> Component


1 st phase : LG’s own DDR3 memory controller & DDR3 modelAPB Drivercomponentfor registersetting<strong>Carbon</strong>DDR3 MemoryControllerCompoent<strong>Carbon</strong>DDR3 MemoryComponent


1 st phase : Top simulation environment capture.Masters(CTG)<strong>AXI</strong> BusArchitectureAPBDriverMemoryControllerDDR3Memory


1 st phase : CTG simulation environment [Monitoring, WaveView, Profiling]Monitoring<strong>AXI</strong> BUSBreak PointWaveView<strong>AXI</strong> BUSProfiling<strong>AXI</strong> BUS


BW Profiling capture 1 : Only one information enabled.VDECGPU


Multi-dimensional Latency Profiling capture.1. valid base,including overlapped latencyDisplay Engine 02. ready base,including overlapped latencyRead3. valid base,excluding overlapped latency4. ready base,excluding overlapped latency1. valid base,including overlapped latencyWrite2. ready base,including overlapped latency3. valid base,excluding overlapped latency4. ready base,excluding overlapped latencyRefer to Appendix about multi-dimensional latency profiling.


1 st phase : System Optimization.Master IP optimizationReduced unfavorable <strong>AXI</strong> transactions such as non-burst and random addressingIncreased <strong>master</strong>’s buffer size to increase multiple outstanding transactionsDecreased <strong>master</strong>’s buffer size which exceeding maximum multiple outstanding limit.Supplemented reduced operation mode according to use case scenario.(ex. changing 10bit -> 8bit processing)<strong>AXI</strong> compatible Interconnect IP optimizationIterative redesigning while considering bandwidth, latency, arbitration, QoS (Quality ofService)., operation frequency, gate count ...etc.To harmonize with the memory controllerDDR3 memory controller redesign & optimizationTo support bandwidth critical & latency critical <strong>master</strong>sTo find out optimal arbitration & QoS schemeTo find out optimal buffer size & <strong>AXI</strong> interface port numberTo control DDR3 physical memory effectively according to the <strong>master</strong>’s request


2 nd phase : 1 st phase’s problem & solutions.Accuracy problems : <strong>Design</strong>er provided parameters was not fullyaccurateMasters easy to predict access patternsdependent on fixed video resolutionMasters hard to predict access patternsdependent on Operating System & device driverhaving variable BW according to the inputs streamOutsourcing IPsFor the 2 nd phase, we decided to replay real <strong>AXI</strong> transaction traceswhichdumped from RTL simulationcaptured from FPGA board emulation via <strong>AXI</strong> transaction capture & dumplogic


2 nd phase : RDTG (RTL Dump file-based Traffic Generator)<strong>AXI</strong>Master(RTL)VM(Virtual Master)VMS0 S1 S2<strong>AXI</strong> InterconnectM0<strong>AXI</strong> Transactioncapture & dump<strong>AXI</strong> Transactioncapture & dump<strong>AXI</strong> Transactioncapture & dumpVMVM0 123DDR3 Memory controller16DDR3 MEMORYTraces were captured from <strong>AXI</strong> interfaces of IP block and used as <strong>AXI</strong> traffics for ESLsimulation.


2 nd phase : FDTG (FPGA Dump file-based Traffic Generator)CPUGPU<strong>AXI</strong> Transactioncapture & dumpMEMORYCONTROLLER1<strong>AXI</strong> Transactioncapture & dumpMEMORYCONTROLLER2DDR3 MEMORY1DDR3 MEMORY2The reason for using 2 nd DDR3 memory controller is not to disturb 1 st DDR3’smemory operation


ConclusionThis approach presentsarchitectural insights and visibilitybandwidth profiling, multi-dimensional latency profiling, transactionstatisticsfaster reconfiguration & simulation time. (10~100 times faster than RTLsimulation time)minimum resources in ESL-related workby saving SystemC <strong>modeling</strong> time for IP blocks.This approach can be applied even without RTL implementationsolution for the “chicken-and-egg problem”Thanks to this two-phased approach,memory throughput improved up to 19% over initial design in best case.achieved 88% effective DDR3 memory utilization In best casefinished top architecturing even before all IP’s RTL is ready.accomplished another chip’s top architecture design very quickly by usingthis project’s result.


References.<strong>Carbon</strong> Model Studio User Documentation by<strong>Carbon</strong> SoC <strong>Design</strong>er User Guide.http://www.carbondesignsystems.com/AMBA® <strong>AXI</strong> Protocol v1.0 byhttp://www.arm.com/


Appendix


<strong>Carbon</strong> Model StudioYou can use <strong>Carbon</strong> Model Studio tocreate a <strong>Carbon</strong> SoC <strong>Design</strong>ercompatiblecomponent from the <strong>Carbon</strong>Model, which includes the simulation anddebugging interfaces.Using this process, system engineers usethe component in SoC <strong>Design</strong>er to build asimulatable system and perform thefollowing tasks:• Architectural exploration• System and software validation anddebug• Embedded software development<strong>Carbon</strong> ModelDatabase FileRTL Code<strong>Carbon</strong> Model Studio:Compiler<strong>Carbon</strong> Model Studio:Component Generator<strong>Carbon</strong> ModelSoC <strong>Design</strong>er Component<strong>Carbon</strong> SoC <strong>Design</strong>er


<strong>Carbon</strong> SoC <strong>Design</strong>er<strong>Carbon</strong> SoC <strong>Design</strong>er is a simulation environment for easy <strong>modeling</strong> and fastsimulation of integrated systems-on-chip with multiple cores, peripherals andmemories using C++.The SoC <strong>Design</strong>er systems can be used as standalone simulation models orintegrated into HW simulation or HW/SW co-simulation tools.<strong>Carbon</strong> SoC <strong>Design</strong>er is fully based on the SystemC language. While cycle-basedsimulation is strongly recommended for speed and ease of debugging, SoC<strong>Design</strong>er supports running any event-driven SystemC simulations.


1 st phase : <strong>AXI</strong> <strong>master</strong> <strong>modeling</strong> : CTG ParametersCTG_nameMAX_WID/MAX_RIDRequired BWParametersAddressingparametersProfilingParametersOthersCTG component namemaximum number of <strong>AXI</strong> ID within CTG.All parameters can be set differently per ID.CTG is the <strong>AXI</strong> Master generating <strong>AXI</strong> transaction as the speed of bandwidth which was set by user.WBPS / RBPSOPER_FREQAWSIZE/ ARSIZErequired Bandwidth parameter (unit : bps(bit per second))operating frequency of CTG(unit : of parameter is Hz)maximum number of data bytes to transfer in each beat, or data transfer, within a burst.<strong>AXI</strong> Master( Video encoder/ Video decoder/ GPU/ CPU) has his special address pattern.W_OUTSTANDING /R_OUTSTANDINGRANDOM_WEN /RADDOM_RENAWLEN /ARLENAWBURST /ARBURSTnumbers of <strong>AXI</strong> commands issuing in succession without waiting of previous transaction done.enable parameter when the following address of issuing command is random compared to pastaddress of it.Burst length, The burst length gives the exact number of transfers in a burst.Burst type parameterthis parameter is equal to burst type signal of the <strong>AXI</strong> protocolWhen we simulate Platform in ESL Level, we get Profiling information.The following profiling informationa) Measured Bandwidth displayb) Measured latency displayc) the section which <strong>AXI</strong> transaction occurs. ……………………..PROFILE_EN... There are more parameters…….The profiling enable parameter


1 st phase : <strong>AXI</strong> <strong>master</strong> <strong>modeling</strong> : CTG Capability ExamplesAddress pattern parameter examples. (‘b’ means burst length)[wr.incr 4*b8@id0] & [rd.random 8*b4@id1] (‘&’ means it occurs simultaneously)random_wen = 0, w_outstanding = 4, awlen = 7, id = 0random_ren = 1, r_outstanding = 8, arlen = 3, id = 1[wr.incr 4*b8@id0] + [rd.random 8*b4@id0] (‘+’ means it occurs successively)random_wen = 0, w_outstanding = 4, awlen = 7, id = 0random_ren = 1, r_outstanding = 8, arlen = 3, id = 0Complex address pattern is possible.[wr.incr 5 * (b16 + b4) @ id0] & [rd.incr 1 *(b10 + b3)@id1] & [wr.random 16*b2@id2]random_wen = 0, w_outstanding = 5, awlen0 = 15, awlen1 = 3, id = 0random_ren = 0, r_outstanding = 1, arlen0 = 9, arlen1 = 2 , id = 1random_wen = 1, w_outstanding = 16, awlen = 1, id = 2[wr.incr 5 * (b16 + b4) @ id0] + [rd.incr 1 *(b10 + b3)@id0] + [wr.random 16*b2@id0]random_wen = 0, w_outstanding = 5, awlen0 = 15, awlen1 = 3, id = 0random_ren = 0, r_outstanding = 1, arlen0 = 9, arlen1 = 2 , id = 0random_wen = 1, w_outstanding = 16, awlen = 1, id = 0


Latency : Multi-dimensional latency profilingLG defined 4 types of latency for latency profiling.1. valid base, including overlapped latency2. ready base, including overlapped latency3. valid base, excluding overlapped latency4. ready base, excluding overlapped latency


Latency : Definition of valid base & ready base : read exampleRead Burst 4 exampleACLKT0 T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13ARADDRADDR(A)ARVALIDARREADYRVALIDRDATAD(A0) D(A1) D(A2) D(A3)RLASTRREADYready base latencyvalid base latency


Latency : Definition of valid base & ready base : write exampleWrite Burst 4 example


Latency : including overlapped latency & excluding overlapped latency2 overlapping read bursts.cmd B’s valid base including overlapped latency is ‘9’ (T12-T3)cmd B’s valid base excluding overlapped latency is ‘3’ (T12-T9)

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!