An Automatic Approach to Generate Haste Code from Simulink ...


input and each output is listed as an input or an output channel respectively. In the body of the procedure only the interface operations are performed: inputs are read and outputs are generated by the external function associated with the block itself. Note the order of execution: the inputs are collected in parallel, and the outputs are generated only once all of them are available.

    & sim_sum2 = proc (
        & Y1 ! chan VECTOR_17
        & A1 ? chan VECTOR_16
        & A2 ? chan VECTOR_16
      ).
    begin
      & v_A1 : var VECTOR_16
      & v_A2 : var VECTOR_16
    |
      forever do
        // input acquisition
        ( A1 ? v_A1 || A2 ? v_A2 )
        // output generation
        // (sim_sum2_f is imported)
        ; Y1 ! sim_sum2_f( .A1( v_A1 ), .A2( v_A2 ) )
      od
    end

Figure 10. Haste shell for a 2-input adder.

5.2.2. Sampling Blocks. Sampling blocks can have two different implementations: synchronized with a global clock, in order to slow down the circuit operation (so that it operates at a given sampling time), or completely asynchronous (see Sec. 4.2). In both modes the input data rate can differ from the output one. Using these blocks it is possible to build a multi-rate system in which the data rate is increased (using a unit delay block) or decreased (using a zero order hold block). Figure 11 shows the Haste description of such blocks.

5.2.3. RTL Processing Part / Parametric RTL Description. Each block has a set of parameters that can be configured so that the module can deal with different scenarios (serial or parallel input/output representation, different data widths, ...), and all these parameters can be configured in the VHDL description. For each block, an HDL file is generated with all the desired parameters set, together with an RTL Compiler script that synthesizes it into a Verilog netlist.

5.3. Simulink to CodeSimulink Conversion

The typical approach used to develop a design that should be converted into hardware is to build a diagram using CodeSimulink blocks from the start.
The advantage of starting with CodeSimulink blocks instead of Simulink blocks is that their simulation behavior matches that of their hardware implementation. Since the CodeSimulink block set is one-to-one compatible with the standard Simulink one, we also provide a conversion utility which automatically converts a pure Simulink model into a CodeSimulink one by setting the parameters needed for the implementation according to the simulation results of the model.

(a)

    & sim_ud = proc (
        & Y1 ! chan VECTOR_16
        & A1 ? chan VECTOR_16
      ).
    begin
      & v_A1 : var VECTOR_16
    |
      forever do
        // output generation (oversampled)
        for 5 do ( Y1 ! v_A1 ) od
        // input acquisition
        ; A1 ? v_A1
      od
    end

(b)

    & sim_zoh = proc (
        & Y1 ! chan VECTOR_16
        & A1 ? chan VECTOR_16
      ).
    begin
      & v_A1 : var VECTOR_16
    |
      forever do
        // input acquisition
        for 5 do ( A1 ? v_A1 ) od
        // output generation (undersampled)
        ; Y1 ! v_A1
      od
    end

Figure 11. Haste description of a “unit delay” block (a) and of a “zero order hold” block (b), both with an over-/undersampling ratio of 5.

5.4. System Description

Now that we have introduced the structure of each block in the design, we explain how the whole system is described. The main Haste file is composed of different sections (see Fig. 12):
• the definition of the types used across the design;
• the definition of the system interface;
• the import of the external RTL functions;
• the Haste declaration of each block;
• the block instances and their connections.

6. Case Study: a Commercial Audio CODEC

To test our methodology we apply it to a Simulink model of a commercial audio CODEC. The model describes one of the two channels in a stereo audio chip implementing a Sigma-Delta modulator [17].
    & VECTOR_16 = type [0..2^16-1]
    & VECTOR_17 = type [0..2^17-1]
    & VECTOR_32 = type [0..2^32-1]

    & datapath = main proc (
        & O ! chan VECTOR_32
        & A ? chan VECTOR_16
        & B ? chan VECTOR_16
      ).
    begin
      // Internal channel declaration
      & Y1_6 : chan VECTOR_16 broad
      // ...
      // External function declaration
      & Sum = func (
          & A1 ? var VECTOR_16
          & A2 ? var VECTOR_16
        ): VECTOR_16. import
      // ...
      // Haste shell description of each block
      & Sum_sh = proc (
          & Y1 ! chan VECTOR_17
          & A1 ? chan VECTOR_16
          & A2 ? chan VECTOR_16
        ).
        begin
          & v_A1 : var VECTOR_16 := 0
          & v_A2 : var VECTOR_16 := 0
        |
          forever do
            ( A1 ? v_A1 || A2 ? v_A2 )
            ; Y1 ! Sum( .A1(v_A1), .A2(v_A2) )
          od
        end
      // ...
    |
      // Block instance and connection
      // ...
      || Sum_sh ( .Y1( Y1_6 ), .A1( Y1_8 ), .A2( Y1_3 ) )
      // ...
    end

Figure 12. Example of the Haste code generated for the main procedure.

This model is quite complex, since it is composed of about 150 blocks, including about 30 16-bit-wide multiplications by constant values, 15 8-bit-wide multipliers, and 30 16-bit-wide adders. It has been used to develop a hand-written implementation in Haste. Thanks to the collaboration with an industrial partner, we had access to the synthesis results of this asynchronous hand-written version and could compare them with those of the Haste version generated by our tool. The comparisons for both versions are based on optimized pre-layout netlists mapped onto the same technology library. The results of this analysis are reported in Tab. 4.

Table 4. Synthesis result comparisons of the same Simulink model in different implementations. The designs have been implemented using a 180 nm technology library.
    Design              Hand written    Automatically generated
    Tool                TiDE 5.2        TiDE 5.2      TiDE 6.0
    Sequential (µm²)    32018           89792         11632
    Logic (µm²)         138244          357368        152468
    Total (µm²)         173694          468746        164100
    Overhead            -               +170%         -5.5%
    Coding time         about 1 week    20 minutes

In this table we compare the hand-written Haste code with two versions of the automatically generated one: the first has been passed through version 5.2 of the TiDE flow, while the second has been processed with the new pre-release version (6.0). Unfortunately it was not possible to compile the hand-written version with the TiDE 6.0 flow, since it no longer supports some low-level constructs available in the old release.

We can notice a number of differences between the three versions. The designs are not architecturally the same, since the number of registers differs among them. This is due to the code generated (or written):
• for the hand-written code, most of the blocks in the Simulink model have been implemented using Haste functions [15]. The number of blocks for which the designer decided to insert registers is small compared to the total number of blocks.
• for the TiDE 5.2 version, each block has registers on its inputs, which results in a high overhead, since many of them are not required.
• for the TiDE 6.0 version, the compiler automatically decides the minimum number of registers required for the described circuit.

For the reasons above, we can conclude that at the moment the automatically generated code compiled with TiDE 6.0 represents the lower bound with respect to the number of registers. On the other hand, the same design compiled with version 5.2 is the upper bound, since the granularity at the Simulink level is very fine.

Since our work was targeted at the TiDE 6.0 version, the results shown in Tab. 4 are promising.
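The overhead figures of Tab. 4 can be cross-checked with a few lines of arithmetic. The sketch below is written in Python rather than Haste and assumes that the overhead row is the relative change of the total area with respect to the hand-written design; the names `HAND_WRITTEN_TOTAL`, `GENERATED_TOTALS` and `area_overhead_percent` are ours and purely illustrative.

```python
# Total areas in µm² as reported in Tab. 4.
HAND_WRITTEN_TOTAL = 173694  # hand-written Haste code, TiDE 5.2
GENERATED_TOTALS = {
    "TiDE 5.2": 468746,  # automatically generated code, TiDE 5.2
    "TiDE 6.0": 164100,  # automatically generated code, TiDE 6.0
}


def area_overhead_percent(generated: int, reference: int) -> float:
    """Relative area change of `generated` with respect to `reference`, in percent.

    Assumption: this is how the overhead row of Tab. 4 was obtained.
    """
    return 100.0 * (generated - reference) / reference


overheads = {
    tool: area_overhead_percent(total, HAND_WRITTEN_TOTAL)
    for tool, total in GENERATED_TOTALS.items()
}

for tool, percent in overheads.items():
    print(f"{tool}: {percent:+.1f}%")
# TiDE 5.2: +169.9%
# TiDE 6.0: -5.5%
```

Under this assumption the computed values agree with the reported +170% and -5.5% after rounding.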
The implementation achieved with this new flow (see footnote 2) requires less area than the hand-written counterpart.

Footnote 2. At the moment TiDE 6.0 is not complete; some operations still have to be performed by hand, but the optimizations performed by the tool are stable and will not change significantly with the official tool release.

In order to guarantee circuit equivalence, we simulate the TiDE 5.2 netlist of the hand-written code and that of the automatically generated code with the same testbench. Since we did not have access to the testbenches used to develop the original version, we had to create a new testbench based on the data streams derived from the Simulink simulation. Because we are still working on a feature which generates input patterns directly from the