13.07.2015 Views

An Automatic Approach to Generate Haste Code from Simulink ...



Table 2. Comparison of block implementations using channels or shared variables, with or without state. (These results refer to a different implementation of the design depicted in Fig. 3 and are in number of gates.)

Implementation choices               Area [µm²]
Registers  Channels  Variables       Memory    C-gates   Total
X          X         -               15857.6   1441.0    54829.1
X          -         X               15857.6   490.6     54140.9
-          X         -               0         1134.4    44470.9
-          -         X               0         367.9     43683.8

Table 3. Comparison of different coding styles for the design depicted in Fig. 3.

Design     Tuple               Registers           Area [µm²]
           Not Used  Used      Not Used  Used      Total/C-gates
Datapath   X         -         -         X         11454.9/4804.4
           -         X         -         X         11883.8/4792.2
           X         -         X         -         4067.3/438.3
           -         X         X         -         3670.3/254.5

cheap, but they require explicit synchronization between readers and writers in order to avoid missed or duplicated data, since registers are shared between the writer and the readers.

Channels, on the other hand, automatically synchronize the input and output actions of modules running in parallel and thereby guarantee a correct timing relationship between the read and write actions. Their implementation is more expensive in terms of area than that of a shared variable (around 1.5% more; see Tab. 2 for further details). To keep the conversion of the Simulink model to Haste straightforward, we would like to avoid explicit synchronization between modules. Therefore we choose to use channels instead of shared variables.

A channel is a communication mechanism shared between different objects, with at least one transmitter and at least one receiver. The implementation of a channel relies on the bundled-data approach and consists of a data part and a control part. The control part takes care of the communication protocol and the required delay matching of the data part.

The simplest way to describe how the blocks in a Simulink diagram communicate is to use a separate channel for each input/output. This solution is straightforward to implement, but it can be more expensive, since every input has its own control logic.

Haste allows the user to group data channels together, thereby sharing handshake control circuitry. Such a multiple-data channel is called a tuple channel. This solution requires less area. However, it can introduce deadlock, because all the input communications are synchronized together and therefore cannot complete individually.

[Figure 4. Example of a Simulink model that can lead to a deadlock (see Fig. 5). Block labels: I: o!I; A: i?v ; o!A(v); B: i?[[ao,io]] ; o!B(ao, io); O: i?v]

A typical example is the one depicted in Fig. 5: block A needs a complete handshake on its inputs before it can compute; block I needs to wait until all the blocks fed by its output have captured its value before continuing. For this reason, before concluding the communication with A, I needs to wait for the completion of the communication with B. However, B cannot finish its communication with I until it receives data from A, and this can never happen, since A cannot compute until it finishes its input communication with I. So the system is stuck waiting for a condition that will never occur.

4.1.2. Functions or Procedures. A module in Haste can be described as a fully combinational block or as a block with registers (Fig. 6). Data-flow networks usually do not include stages (since data is processed from input to output continuously). However, decoupling stages (i.e. registers or latches) can be required in order to increase system throughput. The results presented in Tab. 3 show a large difference in area between the two implementations. Since there is a design trade-off between area and speed, the desired implementation can be chosen.

4.1.3. Register Placement. As previously mentioned, Simulink models do not have the concept of registers that is usual in digital design. Most standard blocks perform operations regardless of the concept of time. Only a few blocks are related to timing events. We will come back to these blocks later.

Registers are necessary to achieve performance, but we have to decide where to insert them. Since each Simulink block has only one output, whereas it can have more than one input, it is natural to insert registers on the output in order to optimize area. Such an implementation is difficult to describe in the Haste language: data received from one or more input channels has to be stored in registers, which results in latching the inputs. In the present version of the TiDE flow (5.2), the compiler puts registers where the designer has inserted them in the Haste description. In the future release (6.0), the compiler can optimize the number of registers automatically, given the required number of decoupling stages. For this reason we
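The tuple-channel deadlock described above for blocks I, A and B can be illustrated with a small wait-for graph. The following Python sketch is not Haste code and is not taken from the paper: the event names (A.compute, I.done, B.recv_tuple, and so on) and the function find_cycle are hypothetical, chosen only for illustration. It assumes that each pending action waits for the actions listed next to it, and it treats a cycle in that graph as a deadlock.

```python
def find_cycle(waits_for):
    """Return one cycle of the wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2           # unvisited / on current path / finished
    colour = {node: WHITE for node in waits_for}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in waits_for.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:              # dep is already on the path: cycle found
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(waits_for):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Assumed model of the tuple-channel case: B reads [[ao, io]] as one tuple,
# so B can only acknowledge I once A has also delivered ao.
tuple_channel = {
    "A.compute": ["I.done"],          # A needs a complete input handshake with I
    "I.done": ["B.recv_tuple"],       # I waits until B has captured its value
    "B.recv_tuple": ["A.send"],       # the tuple completes only when ao arrives
    "A.send": ["A.compute"],          # A can send only after computing
}

# Assumed model of the separate-channel case: B acknowledges io on its own.
separate_channels = {
    "A.compute": ["I.done"],
    "I.done": ["B.recv_io"],
    "B.recv_io": [],                  # no dependency on A
    "B.recv_ao": ["A.send"],
    "A.send": ["A.compute"],
}

tuple_cycle = find_cycle(tuple_channel)
separate_cycle = find_cycle(separate_channels)
print("tuple channel:", tuple_cycle)
print("separate channels:", separate_cycle)
```

Under these assumptions the tuple-channel graph contains a cycle through A, I and B, matching the argument in the text, while the separate-channel graph has none, because B can complete its handshake with I without waiting for A.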
