Sequential Logic Synthesis with Retiming in Encounter ... - CiteSeerX
Sequential Logic Synthesis with Retiming in Encounter RTL Compiler (RC)
Christoph Albrecht¹, Shrirang Dhamdhere¹, Suresh Nair¹, Krishnan Palaniswami², Sascha Richter¹
¹ Cadence Design Systems, ² Focus Semiconductor
Session Track: Digital IC Design
Session Number: 2.3
Relevant Cadence Products: Encounter RTL Compiler (RC), Encounter Conformal Logic Equivalence Checker (LEC)
Abstract
Typical ASIC designs are highly unbalanced with respect to the timing criticality of their combinational logic paths. This is mainly due to the ad-hoc manual specification of the design at the register transfer level (RTL), which does not use any information about sequential timing criticality. Traditional logic synthesis does not support "borrowing" of timing slack across registers, and the optimization is restricted by the fixed positions of the registers. This may result in a suboptimal solution, a loss of performance, and unnecessary area and power consumption.

This paper explains the concepts of clock scheduling and retiming used by Encounter RTL Compiler (RC) to optimize across register boundaries. Retiming is a structural transformation that changes the positions of the registers without modifying the input-output behavior of the circuit. The reader will understand how the area, the number of registers, or the delay of the design is minimized. Computational results show the tradeoff between these objectives.

Practical applications are discussed: registers may have different control signals, such as enable signals or reset signals. This leads to the multiclass retiming problem and the reset line justification problem.

Retiming used to be a difficult challenge for equivalence checking. Together with Encounter Conformal Logic Equivalence Checker (LEC), however, verification is now simple: RC writes out checkpoint netlist files and one script, which LEC then processes to automatically verify the golden RTL against the final netlist.

We present a case study showing how retiming was used by Focus Semiconductor, a division of Focus Enhancements, on a 1.5 M instance UWB baseband chip. Retiming substantially improved the Quality of Results (QoR) and helped to meet the design objectives.
CDNLive! Silicon Valley 2006 1
1 Introduction
Traditional combinational logic synthesis focuses all optimization effort on the combinational paths between the registers. It does not support any tradeoff between tight paths and loose paths when these are separated by registers.

To motivate the use of sequential logic synthesis with retiming, we discuss the slack distribution of a typical ASIC design.

Figure 1: Slack distribution of a typical ASIC design.

Figure 1 shows the slack distribution, more specifically the distribution of the setup slacks of a late-mode analysis after synthesis. For each slack interval on the x-axis, the figure shows the number of combinational paths whose slack lies within that interval. The design has a worst negative slack of -529 ps.

Figure 2: Slack distribution of the same ASIC design as in Figure 1, this time with optimized clock latencies.

Figure 2 shows an optimized slack distribution for the same design. The netlist was not changed, only the clock latencies at the registers. The latencies were computed with a slack balancing algorithm, which we discuss later. The number of critical paths has decreased drastically. Only a small fraction of the paths have a negative slack. In this case it was not possible to improve the worst negative slack, because the worst path in this design runs from a primary input to a primary output.

Figures 1 and 2 demonstrate the optimization potential that becomes available when the registers are unlocked instead of being kept fixed as hard boundaries that constrain the synthesis optimization algorithms. With the optimized clock latencies, many paths become uncritical. The additional slack can be used to downsize the combinational gates or even to use a different logic structure with smaller area and power consumption.
While clock scheduling was not able to reduce the worst negative slack for this specific design, it was able to improve the slack of the side paths. These are either combinational paths that start at the primary input of the critical path and end at a register, or paths that start at a register and end at the primary output of the critical path. This is helpful for the synthesis optimization algorithms in RC, which can improve the slack of a critical path by using the slack of its side paths.
In this paper we discuss the two sequential optimization techniques, clock scheduling and retiming, and show how RC combines them. The paper is organized as follows:

In Section 2 we discuss clock scheduling, which is also known as useful skew. It changes the latencies of the clock signal but does not change the logic. The different latencies need to be realized by a sophisticated clock network.

In Section 3 we describe retiming. Retiming is a structural transformation: while it does not change the combinational gates, it modifies the netlist by moving the registers forward and backward through the logic.

RC can use clock scheduling as an intermediate step to drive the logic synthesis and optimization process. Ultimately, it realizes the different latencies by retiming, so that a conventional zero-skew or limited-skew backend flow can place the design, construct the clock network, and route the nets. This is described in Section 4.

In practice, retiming can be constrained by registers that have different control signals (for example, enable signals or asynchronous set or reset signals). Section 5 discusses these constraints.

In Section 6 we discuss the automatic verification flow with LEC.

In the last section we present a case study showing how retiming was used on a UWB baseband chip from Focus Semiconductor.
2 Clock Scheduling
The following figure shows how the worst slack of a design can be improved by changing the clock latencies: buffers are added to the clock distribution network, which delays the switching time of the register. In this case the worst slack is improved from -2 ns to 0 ns and the design meets the timing requirements. If the clock latency of the capturing register of a combinational path is increased, the slack of the path increases by the same amount. If, on the other hand, the clock latency of the capturing register is decreased, the slack of the path decreases. Increasing the clock latency of the launching register decreases the slack, and decreasing it has the opposite effect.
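The four rules follow directly from the setup check of a single path. The sketch below assumes an idealized model without clock-to-q delay, setup time, or clock uncertainty; the helper `setup_slack` and the 7 ns path delay are hypothetical and only illustrate the direction of each effect.

```python
def setup_slack(period, delay, launch_latency=0.0, capture_latency=0.0):
    """Idealized setup slack of one combinational path (all values in ns).

    Data launched at launch_latency arrives at launch_latency + delay and
    is required by the next capturing edge at period + capture_latency.
    Clock-to-q, setup time, and clock uncertainty are ignored here.
    """
    arrival = launch_latency + delay
    required = period + capture_latency
    return required - arrival


base = setup_slack(5.0, 7.0)                                 # -2.0
later_capture = setup_slack(5.0, 7.0, capture_latency=2.0)   # 0.0
earlier_capture = setup_slack(5.0, 7.0, capture_latency=-1.0)  # -3.0
later_launch = setup_slack(5.0, 7.0, launch_latency=1.0)     # -3.0
earlier_launch = setup_slack(5.0, 7.0, launch_latency=-1.0)  # -1.0
print(base, later_capture, earlier_capture, later_launch, earlier_launch)
```

Every latency change shifts the slack by exactly the same amount, with the sign determined by whether the latency sits on the capturing or the launching side of the path.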
Figure 3: The worst slack is improved by adjusting the clock latencies. (Target clock period: 5 ns; worst slack without clock latencies: -2 ns; worst slack with clock latencies: 0 ns.)
A linear programming formulation
The clock scheduling problem can be formulated as a linear program. This was first done by Fishburn in 1990 [1]. Let T be the clock period, which is to be minimized. Furthermore, let $l_i$ be the latency of the clock signal arriving at register i, and let $d_{ij}$ be the maximum delay of all combinational paths from register i to register j. The linear program is

$$\min\; T \quad \text{subject to} \quad l_i + d_{ij} \le l_j + T \quad \text{for all combinational paths } (i, j).$$

The difference between the two sides of each inequality, $s_{ij} = l_j + T - l_i - d_{ij}$, is the slack of the path (i, j). Should the design have constrained primary inputs or outputs, we can represent all these inputs and outputs by one dummy register that, without loss of generality, has a clock latency of zero. Hence, we can assume that even in this case the linear program has the form above.
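For a fixed T, every constraint $l_i - l_j \le T - d_{ij}$ is a difference constraint, so feasibility can be checked with a shortest-path computation. The sketch below uses Bellman-Ford, which is one standard way to do this and not necessarily what Fishburn or RC does. As test data it uses the register-to-register graph of the example in Figures 4 and 5 further down; the edge directions are our reconstruction from the slack values printed in Figures 6 and 7 and should be treated as an assumption, as should the names `NODES`, `DELAYS`, and `latencies_for_period`.

```python
# Register-to-register graph of Figure 5: (from, to) -> maximum delay.
# Edge directions reconstructed from the slacks in Figures 6 and 7 (assumption).
NODES = ["a", "b", "c", "d"]
DELAYS = {
    ("a", "b"): 6, ("a", "c"): 9, ("c", "a"): 9, ("d", "a"): 7,
    ("b", "d"): 9, ("d", "b"): 11, ("c", "d"): 5, ("d", "c"): 9,
}


def latencies_for_period(nodes, delays, period):
    """Find latencies l with l_i + d_ij <= l_j + T for a fixed T.

    Each constraint l_i - l_j <= T - d_ij becomes an edge j -> i with
    weight T - d_ij. Bellman-Ford from a virtual source (distance 0 to
    every node) yields a feasible assignment, or None if a negative
    cycle proves that the period T is infeasible.
    """
    dist = {v: 0.0 for v in nodes}
    edges = [(j, i, period - d) for (i, j), d in delays.items()]
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist
    return None  # still relaxing after |V| passes: negative cycle


lat = latencies_for_period(NODES, DELAYS, period=10)
print(lat)                                               # {'a': 0.0, 'b': 0.0, 'c': 0.0, 'd': -1.0}
print(latencies_for_period(NODES, DELAYS, period=9.5))   # None
```

The schedule found for T = 10 differs from the one shown later in Figure 6, which illustrates that the linear program generally has many optimal solutions. Bisection over T with this check would find the minimum period, but the cycle characterization below gives it directly.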
This is a very special linear program, and it can be solved efficiently with combinatorial algorithms. It can be proved that the minimum clock period achievable by clock scheduling equals the maximum mean delay over all cycles in the register-to-register timing graph, where the mean delay of a cycle is its total delay divided by its number of edges. The register-to-register timing graph contains a node for every register and an edge whenever there is a combinational path between two registers; the weight of the edge is the maximum delay of these paths.
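The maximum mean cycle can be computed with Karp's minimum mean cycle algorithm, applied here with maximization instead of minimization. This is a textbook algorithm shown as a sketch, not a description of RC's implementation. The graph is again the reconstructed example graph of Figure 5 (edge directions assumed, see above), and `max_mean_cycle` is a hypothetical name.

```python
import math

# Reconstructed graph of Figure 5 (edge directions are an assumption).
NODES = ["a", "b", "c", "d"]
DELAYS = {
    ("a", "b"): 6, ("a", "c"): 9, ("c", "a"): 9, ("d", "a"): 7,
    ("b", "d"): 9, ("d", "b"): 11, ("c", "d"): 5, ("d", "c"): 9,
}


def max_mean_cycle(nodes, delays):
    """Karp's algorithm adapted to the maximum mean cycle.

    D[k][v] is the largest total delay of a walk with exactly k edges
    that ends at v; walks may start anywhere, so D[0][v] = 0.
    Returns -inf for an acyclic graph.
    """
    n = len(nodes)
    D = [{v: 0.0 for v in nodes}]
    for k in range(1, n + 1):
        row = {v: -math.inf for v in nodes}
        for (u, v), d in delays.items():
            if D[k - 1][u] > -math.inf:
                row[v] = max(row[v], D[k - 1][u] + d)
        D.append(row)
    best = -math.inf
    for v in nodes:
        if D[n][v] == -math.inf:
            continue  # no walk of length n ends at v
        worst = min((D[n][v] - D[k][v]) / (n - k)
                    for k in range(n) if D[k][v] > -math.inf)
        best = max(best, worst)
    return best


print(max_mean_cycle(NODES, DELAYS))  # 10.0, from the cycle b -> d -> b
```

Removing the two edges between b and d leaves the cycle a -> c -> a with mean (9 + 9)/2 = 9 as the bottleneck.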
In general, the linear program does not have a unique solution, and not every solution that minimizes the clock period is equally desirable. For example, consider the ASIC design whose two slack distributions are shown in Figure 1 and Figure 2. The worst negative slacks of the two distributions are equal, and so are the clock periods at which the chip can operate without failure, yet the distribution in Figure 2 has far fewer critical paths.
Clock scheduling optimally balancing the slack
In the following we discuss how to compute a clock schedule with a specific property, which we call optimally balanced slack. As a result of this property, many paths are uncritical and have a lot of slack. This part is more theoretical; readers with limited time may skip it, because the following sections are more important for practical use.
We consider a small example circuit with four registers, a, b, c, and d, shown in Figure 4.
Figure 4: Example circuit with combinational gates and four registers. The numbers specify the delay of the gates.
From the circuit we can construct the register-to-register timing graph shown in the following figure. The graph has one node for each of the four registers and an edge between two nodes whenever there is a combinational path between the corresponding registers. Each edge is annotated with the maximum delay of these combinational paths.
Figure 5: Register-to-register timing graph for the circuit in Figure 4.
Without clock latencies, the minimum feasible clock period for this circuit equals the maximum delay of the combinational paths, in this case T = 11. By increasing the clock latency of register b to +1, the clock period can be decreased to T = 10. This is the minimum clock period that can be achieved by clock scheduling, because with these latencies the two paths (b,d) and (d,b), which form a cycle with mean delay (9 + 11)/2 = 10, both have a slack of zero. Figure 6 shows the register-to-register timing graph with the latency +1 at register b. In addition to the combinational delays, we also show the slacks for the clock period T = 10 in brackets.
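The bracketed slacks of Figure 6 follow from $s_{ij} = l_j + T - l_i - d_{ij}$. The sketch below recomputes them; the edge directions of the graph are our reconstruction (an assumption) and `slacks` is a hypothetical helper.

```python
# Reconstructed graph of Figure 5 (edge directions are an assumption).
DELAYS = {
    ("a", "b"): 6, ("a", "c"): 9, ("c", "a"): 9, ("d", "a"): 7,
    ("b", "d"): 9, ("d", "b"): 11, ("c", "d"): 5, ("d", "c"): 9,
}


def slacks(delays, latency, period):
    """Slack s_ij = l_j + T - l_i - d_ij of every edge (i, j)."""
    return {(i, j): latency[j] + period - latency[i] - d
            for (i, j), d in delays.items()}


fig6 = slacks(DELAYS, {"a": 0, "b": 1, "c": 0, "d": 0}, period=10)
for edge, s in sorted(fig6.items()):
    print(edge, s)
```

The two zero slacks belong to (b,d) and (d,b); the three edges with slack 1, namely (a,c), (c,a), and (d,c), are the other critical edges drawn in red in Figure 6.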
Figure 6: A clock schedule (latency +1 at register b, clock period T = 10; edges labeled with delay and, in brackets, slack) applied to the registers such that the worst incoming slack equals the worst outgoing slack for every register. The edges corresponding to the critical paths with a slack smaller than or equal to 1 are shown in red.
The clock schedule shown in Figure 6 has the property that for every register the worst incoming slack equals the worst outgoing slack. Changing the clock latency of a single register alone does not give an improvement, since the worst slack of the paths starting or ending at that register can only get worse.

Figure 6 shows one critical edge in red, the edge (d,c), that is not part of a critical cycle. It is possible to increase the slack of this edge by increasing the clock latencies of registers a and c simultaneously. This does not affect the two critical edges (c,a) and (a,c). The result is shown in Figure 7, in which the worst incoming slack equals the worst outgoing slack for every subset of the registers. Note that before, in Figure 6, the worst outgoing slack of registers a and c together is 5, whereas their worst incoming slack is only 1.
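For a graph this small, the subset property can be checked by brute force. The sketch enumerates every proper, non-empty register subset and compares its worst incoming and outgoing slack. It is exponential in the number of registers, which is exactly why an algorithm such as the one cited below is needed for real designs. The reconstructed edge directions are an assumption, and all names are hypothetical.

```python
from itertools import combinations

# Reconstructed graph of Figure 5 (edge directions are an assumption).
DELAYS = {
    ("a", "b"): 6, ("a", "c"): 9, ("c", "a"): 9, ("d", "a"): 7,
    ("b", "d"): 9, ("d", "b"): 11, ("c", "d"): 5, ("d", "c"): 9,
}


def slacks(delays, latency, period):
    """Slack s_ij = l_j + T - l_i - d_ij of every edge (i, j)."""
    return {(i, j): latency[j] + period - latency[i] - d
            for (i, j), d in delays.items()}


def unbalanced_subsets(delays, latency, period):
    """List the register subsets whose worst incoming slack differs
    from their worst outgoing slack (brute force, tiny graphs only)."""
    s = slacks(delays, latency, period)
    regs = sorted(latency)
    bad = []
    for size in range(1, len(regs)):
        for subset in combinations(regs, size):
            inside = set(subset)
            incoming = [v for (i, j), v in s.items() if i not in inside and j in inside]
            outgoing = [v for (i, j), v in s.items() if i in inside and j not in inside]
            if incoming and outgoing and min(incoming) != min(outgoing):
                bad.append(subset)
    return bad


fig6 = {"a": 0, "b": 1, "c": 0, "d": 0}
fig7 = {"a": 2, "b": 1, "c": 2, "d": 0}
print(unbalanced_subsets(DELAYS, fig6, 10))  # [('a', 'c'), ('b', 'd')]
print(unbalanced_subsets(DELAYS, fig7, 10))  # []
```

The subset {b, d} appears only because it is the complement of {a, c}: the incoming edges of a set are the outgoing edges of its complement.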
Figure 7: An optimally balanced clock schedule (latencies +2 at registers a and c, +1 at register b; clock period T = 10): the worst incoming slack equals the worst outgoing slack for every subset of the registers.
The clock schedule underlying Figure 2, in which the number of critical paths has decreased so drastically, has exactly this property. It is computationally too expensive to check all subsets of the registers explicitly, because there are exponentially many of them. Nevertheless, the efficient minimum mean balance algorithm by Young, Tarjan and Orlin [3] can find such a solution by iteratively finding critical cycles and contracting them.
For synthesis operations it is helpful if the side paths of a critical path have additional slack, because this slack can be used to reduce the delay of the critical path. An example of such a synthesis operation is Shannon decomposition, shown in the following figure.
Figure 8: A critical path becomes short and fast using Shannon decomposition.
If only one path, starting at a point a and ending at a point x, is critical and all other paths ending at x are uncritical, then the fanin logic of x can be duplicated: in one copy the value of a is tied to zero, in the other it is tied to one. The outputs of the two copies feed a multiplexer that selects the correct value for x depending on the value of a. The constant values of a are propagated to simplify the logic. After this transformation, the path from a to x passes only through the select input of the multiplexer and is therefore very short and fast.
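Shannon decomposition relies on the identity x = a ? f(1, ...) : f(0, ...). The sketch below builds the two cofactor copies and the multiplexer for a hypothetical fanin function `f` and checks the identity exhaustively; it illustrates the logic identity only, not how RC restructures a netlist.

```python
from itertools import product


def f(a, b, c, d):
    """Hypothetical fanin cone of x; a is the late-arriving critical input."""
    return ((a ^ b) & c) | (a & d)


def shannon(func):
    """Return x(a, *rest) = a ? func(1, *rest) : func(0, *rest).

    The two cofactors do not depend on a, so they can be evaluated
    before a arrives; a only drives the select line of the multiplexer.
    """
    def cofactor1(*rest):
        return func(1, *rest)   # copy of the cone with a tied to 1
    def cofactor0(*rest):
        return func(0, *rest)   # copy of the cone with a tied to 0
    def x(a, *rest):
        return cofactor1(*rest) if a else cofactor0(*rest)  # 2:1 multiplexer
    return x


x = shannon(f)
assert all(x(*bits) == f(*bits) for bits in product((0, 1), repeat=4))
```

After constant propagation, the cofactors of this particular `f` simplify to `(~b & c) | d` and `b & c`, both of which are available before a arrives.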
Limitations of clock scheduling
Clock scheduling has limitations. Changing the clock latencies may increase the number of hold violations. The hold constraint ensures that data signals do not arrive too early at the data input pin of the register at the end of the path: a new signal must not arrive before the register has safely captured the previous value. A large number of hold violations can lead to an enormous number of hold buffers, which need to be added at the end of the flow. Furthermore, due to process variations the final delay of the paths on the fabricated chip can deviate from the computed delay, which limits the use of clock scheduling even more. For example, it is not possible to have a long combinational path whose delay equals ten times the clock period and to satisfy the timing constraints by adjusting the latencies of the clock signals at the launching and receiving registers. On such a path, ten different data signals would be in flight at the same time, and each of them would need to arrive at the receiving register at the right time. If the combinational delay of the path were only 10% smaller on the fabricated chip due to process variations, the signal would arrive too early, resulting in a hold violation. As the delay could also increase, it is not possible to fix this hold violation by adding delay with hold buffers.
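The ten-period example can be checked with simple arithmetic (our own illustration of the numbers in the text; $d$ is the nominal path delay and $d'$ the delay on the fabricated chip):

```latex
d = 10T, \qquad d' = (1 - 0.1)\,d = 9T, \qquad d - d' = T
```

The data therefore arrives one full clock period early, at the capture edge intended for the data launched one cycle before it. If the delay is instead 10% larger ($d' = 11T$), the data arrives one period late, which is why hold buffers cannot repair the first case without breaking the second.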
Nevertheless, RC can internally use large positive and negative clock latencies and optimize the combinational logic with these latencies. In the end, the latencies are realized by retiming, that is, by moving the registers through the combinational logic. The latencies are then bounded only by the number and the movement of the registers.
3 Retiming
Retiming is a powerful sequential optimization technique that overcomes the limitations of clock scheduling. Retiming moves the registers across the combinational logic to improve performance without changing the input/output behavior of the circuit.

The following figure shows how the slack of a circuit can be improved by retiming. It is the same circuit to which we applied clock scheduling in Figure 3. The registers are retimed backward, against the direction of signal propagation.
Figure 9: The worst slack is improved by retiming the registers backward against the direction of the signal propagation. (Target clock period: 5 ns; worst slack before retiming: -2 ns; worst slack after retiming: 0 ns.)
This example shows that retiming changes the number of registers. In this case the number of registers increases; in other cases it can also decrease. RC minimizes the clock period as its first objective. Among all retiming solutions that achieve the minimum clock period, RC finds the one with the minimum number of registers. In addition, RC has an option to minimize the number of registers without increasing the current clock period.
Any retiming can be achieved by a sequence of two kinds of elementary retiming steps. Forward retiming removes the registers at the inputs of a gate and creates new registers at its outputs. Backward retiming does the opposite: it removes the registers at the outputs and creates a new register at each input. The two retiming steps are shown in the following figure.
Figure 10: Registers retimed forward and backward over an AND gate.
For forward retiming, each input of the gate must be driven by a register. Similarly, for backward retiming, the gate must not drive any combinational gate, only registers.

In order to ensure equivalent input/output behavior of the circuit, retiming cannot change the number of registers on any loop or on any path from a primary input to a primary output. Both elementary operations guarantee this. It may still be possible to retime registers forward or backward over a gate for which the condition above does not hold in the original circuit, but then the condition first has to be established by elementary retiming steps applied to other gates.
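Both elementary steps and the invariant can be expressed in the edge-weight notation common in the retiming literature. Each wire u -> v carries w(u, v) registers, and a retiming assigns an integer r(v) to every gate, giving w_r(u, v) = w(u, v) + r(v) - r(u). A backward step over gate v is r(v) = +1 and a forward step is r(v) = -1; a step is legal only if no wire ends up with a negative register count. Summed around a loop, or along a path whose end points (primary inputs and outputs) keep r = 0, the r terms cancel. The netlist and all function names below are hypothetical, and this sketch does not describe RC internals.

```python
# Hypothetical netlist: (u, v) -> number of registers on the wire u -> v.
# "in"/"out" stand for a primary input and output; their r value stays 0.
W = {("in", "g1"): 1, ("g2", "g1"): 1, ("g1", "g2"): 0, ("g2", "out"): 0}


def retime(weights, r):
    """Apply a retiming r (gate -> int): w_r(u, v) = w(u, v) + r(v) - r(u).

    Raises ValueError if a wire would need a negative number of
    registers, i.e. the requested step is not legal on this netlist.
    """
    new = {}
    for (u, v), w in weights.items():
        wr = w + r.get(v, 0) - r.get(u, 0)
        if wr < 0:
            raise ValueError(f"illegal move: wire {u}->{v} would get {wr} registers")
        new[(u, v)] = wr
    return new


def forward(weights, gate):
    """Forward step: one register off every input, one onto every output."""
    return retime(weights, {gate: -1})


def backward(weights, gate):
    """Backward step: one register off every output, one onto every input."""
    return retime(weights, {gate: +1})


def registers_on(weights, nodes):
    """Number of registers along a path or loop given as a node list."""
    return sum(weights[(u, v)] for u, v in zip(nodes, nodes[1:]))


W1 = forward(W, "g1")   # legal: both inputs of g1 are driven by registers
W2 = forward(W1, "g2")  # legal only now; on W the wire g1 -> g2 has no register
for w in (W, W1, W2):
    assert registers_on(w, ["g1", "g2", "g1"]) == 1        # the loop
    assert registers_on(w, ["in", "g1", "g2", "out"]) == 1  # PI-to-PO path
assert backward(W2, "g2") == W1                             # steps are inverses
```

Calling `forward(W, "g2")` directly raises `ValueError`, which mirrors the text: the step over g2 becomes possible only after the step over g1 has placed a register on its input.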
Constants and dangling logic (logic that does not drive anything) are an exception. Constant propagation, as part of the RC synthesis operations, simplifies any logic driven by a constant unless the gates are preserved by an attribute. Similarly, dangling logic is removed. However, should such logic be preserved, retiming is able to create or remove registers at constants and dangling logic.

The following figure shows an example in which retiming cannot improve the critical path because no elementary retiming step is possible:
Figure 11: An example in which retiming cannot improve the clock period because the register cannot be moved forward (primary inputs A and B, primary output C).
Depending on the clk-to-q delay of the register, the critical path goes from the register to the primary output C. If the primary inputs are unconstrained, the critical path starts at the register in any case. Checking only the slack at the data input pin and at the output pin of the register, the user may wonder why the register was not moved forward. This is not possible, because there is no register directly following primary input B, so the gate it feeds does not have a register on every input.
Efficient algorithms for retiming have been developed and published. We refer the interested reader to the fundamental paper by Leiserson and Saxe published in 1991 [2], in which the problem of finding a retiming that realizes a given clock period and minimizes the number of registers is formulated and solved as a minimum cost flow problem. Polynomial-time algorithms have been developed for this problem. A comprehensive book about timing in general, and clock scheduling and retiming in particular, is the recent book by S. Sapatnekar [5].
Relationship between clock scheduling and retiming
The two sequential optimization techniques, clock scheduling and retiming, are related: it can be proved that the clock period achievable by clock scheduling (ignoring any hold constraints) is a lower bound on the clock period that can be achieved by retiming [3]. It can also be proved that retiming can almost achieve this clock period: the minimum clock period achievable by retiming is at most the minimum clock period achievable by clock scheduling plus the maximum delay of all gates.
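Written as one inequality (the symbols are introduced here only for compactness: $T_{\mathrm{cs}}$ and $T_{\mathrm{ret}}$ are the minimum clock periods achievable by clock scheduling and by retiming, and $d_g$ is the delay of gate $g$):

```latex
T_{\mathrm{cs}} \;\le\; T_{\mathrm{ret}} \;\le\; T_{\mathrm{cs}} + \max_{g} d_g
```

Intuitively, the gap of at most one gate delay arises because a register can only sit between gates, whereas a clock latency can take any value.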
Given a clock schedule, a retiming can be computed as follows. Find a register with the maximum positive clock latency. Decrease its clock latency until the incoming slack is zero. If the slack is already zero, perform a backward retiming over the gate driving the register. The new registers added in front of the gate get a clock latency equal to the latency of the original registers minus the delay of the gate. This procedure is repeated until the clock latency of each register is smaller than half the delay of the gate driving the register. Then a similar procedure is applied to the registers with the minimum negative clock latency: the registers are moved forward, and their clock latency is increased by the delay of the gate, until the clock latency of each register is larger than the negative of half the delay of the gate driven by the register. If the clock latency of every register is then set to zero, the retimed circuit has a clock period that is at most the clock period of the original circuit with clock scheduling plus the maximum delay of all gates.
4 The global sequentially driven synthesis flow in RC
RC combines the two sequential optimization techniques, clock scheduling and retiming, in the global sequential synthesis flow shown in the following figure.
Figure 12: The global sequentially driven synthesis flow in RC (sequentially driven synthesis and optimization, combining combinational synthesis with clock scheduling, followed by retiming and a final combinational synthesis step).
The logic synthesis and optimization algorithms are tightly interlinked with clock scheduling. Clock scheduling computes clock latencies that improve the clock period and the slack of the combinational paths. The synthesis algorithms can use the slack of side paths to further improve critical paths. In the next step, retiming moves the registers through the combinational logic. It minimizes the clock period and, as a second objective, the number of registers. Finally, retiming is followed once more by combinational synthesis. This is necessary because the loads of the gates have changed as the registers were moved.

RC performs these steps automatically. The user only has to set the attribute “retime” to true for either the top design or the subdesigns for which retiming should be performed and then call the “synthesize” command.
5 Special cases for retiming
In this section we describe special cases for retiming that arise from control signals at the registers, which may constrain the movement of the registers. First we discuss the retiming of registers with enable signals. Then we describe the case in which registers with an enable signal are implemented by a simple register with a multiplexer feedback loop. Finally, we discuss asynchronous set and reset signals.
Retiming of registers with different enable signals
In practice, retiming can be constrained because the registers in the circuit may have different control signals, for example enable signals. Retiming cannot combine registers which have different control signals. Figure 13 shows an example: to improve the timing, the two registers should be combined and retimed backward, but this is not possible because they receive different enable signals. RC can combine and retime registers forward or backward only if they receive the same enable signals.
CDNLive! Silicon Valley 2006 9
Figure 13: The two registers cannot be moved backward because they receive different enable signals (enable 1 and enable 2).
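The restriction can be expressed as a compatibility test on the registers that retiming wants to combine. The sketch below uses a hypothetical `Register` record; comparing the clock as well as the enable is an assumption added for completeness, and the test is only a necessary condition, not RC's full legality check.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Register:
    name: str
    clock: str
    enable: Optional[str] = None  # None: register loads on every clock edge


def can_merge(registers: Sequence[Register]) -> bool:
    """True if all registers share the same clock and enable signal, a
    necessary condition for combining them into one register when they
    are retimed across a gate."""
    if not registers:
        return False
    first = registers[0]
    return all((r.clock, r.enable) == (first.clock, first.enable) for r in registers)


# The situation of Figure 13: same clock, different enables.
r1 = Register("r1", clock="clock", enable="enable 1")
r2 = Register("r2", clock="clock", enable="enable 2")
print(can_merge([r1, r2]))  # False
print(can_merge([r1, Register("r3", clock="clock", enable="enable 1")]))  # True
```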
Multiplexer feedback loop<br />
Registers with an enable signal can also be implemented by a simple register and a multiplexer. This may be an advantage for retiming because the registers can then be merged even though their enable signals are different. However, it may also constrain the register movement and increase the number of registers. Figure 14 shows a case in which the register count grows: a pipeline design with three stages of registers at the primary outputs, where each enable is realized by a multiplexer. When the registers are retimed into the combinational logic (applying only the elementary retiming steps in Figure 10), one register has to remain in each loop with a multiplexer. Furthermore, registers pile up at the select lines of the multiplexers.
Figure 14: Registers with enable (enable 1, enable 2, enable 3) can be implemented by a simple register and a multiplexer. This may increase the register count when the registers are moved backward.
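The two implementations behave identically at the register output; the difference that matters for retiming is structural. In the multiplexer form the register sits in a loop through the multiplexer, and that loop must keep a register wherever the surrounding logic moves. The minimal cycle-by-cycle models below (hypothetical function names, initial value 0 assumed) make the equivalence explicit.

```python
def enable_register(d_seq, en_seq, init=0):
    """Register with built-in enable: loads d when en is 1, otherwise holds."""
    q, out = init, []
    for d, en in zip(d_seq, en_seq):
        out.append(q)  # value visible during this cycle
        if en:
            q = d
    return out


def mux_feedback_register(d_seq, en_seq, init=0):
    """Plain register whose D input is a multiplexer: en ? d : q."""
    q, out = init, []
    for d, en in zip(d_seq, en_seq):
        out.append(q)
        mux = d if en else q  # feedback path from the register output
        q = mux
    return out


d = [1, 0, 1, 1, 1, 0, 1]
en = [1, 0, 1, 0, 0, 1, 1]
print(enable_register(d, en))  # [0, 1, 1, 1, 1, 1, 0]
print(enable_register(d, en) == mux_feedback_register(d, en))  # True
```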
If the registers instead have a built-in enable signal, which moves together with the registers, rather than a loop with a multiplexer, the number of registers after retiming is smaller.
If the registers with the multiplexers are at the primary inputs and have to be moved forward, the problem is different: only the last register can be retimed forward. Retiming more registers forward would require additional registers at the select lines of the multiplexers.
By default, RC uses registers with enable logic built into the register. Only if the variable "hdl_ff_keep_feedback" is true does RC use simple registers in a loop with a multiplexer. The results depend on the structure of the design and can differ drastically between the two implementations.
Retiming of registers with asynchronous set and reset signals
Retiming of registers with asynchronous set or reset signals is more involved. When these registers are retimed forward or backward through the combinational logic, their new reset values must be computed. Moving these registers forward through the combinational logic is simple: the reset values are propagated through the logic. Figure 15 shows an example.
Figure 15: The registers are retimed forward. The reset values are propagated to the registers in the new locations.
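Forward propagation only requires evaluating each gate on the reset values at its inputs, in topological order. The sketch below uses a hypothetical two-input gate table and netlist format; it illustrates the rule and is not meant to reproduce the values of Figure 15.

```python
# Hypothetical gate library evaluated on 0/1 values.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}


def propagate_reset(netlist, reset_values):
    """netlist: (output_net, gate, input_nets) tuples in topological order.
    reset_values: reset values of the registers before the move.
    Returns the value of every net; a register retimed forward to a net
    takes that net's value as its new reset value."""
    values = dict(reset_values)
    for out, gate, ins in netlist:
        values[out] = GATES[gate](*(values[n] for n in ins))
    return values


netlist = [("n1", "AND", ("r1", "r2")), ("n2", "OR", ("n1", "r3"))]
values = propagate_reset(netlist, {"r1": 1, "r2": 1, "r3": 0})
print(values["n1"], values["n2"])  # 1 1
```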
Moving registers backward is more complicated. First, all the registers driven by the gate need to have the same reset value. Second, the reset values of the new registers that drive the inputs of the gate are not unique. A naive approach that moves the registers over one gate at a time and picks arbitrary reset values does not work: a wrong choice can later prevent the registers from being retimed backward over another gate because their reset values differ. Hence, a global problem must be solved: which 0/1 reset values for the registers in the new locations, when propagated through the logic, yield the given reset values at the registers in the original locations? This problem can be transformed into a satisfiability problem. It is very similar to verifying that two netlists are equivalent, where the question is: do 0/1 values exist for the registers and primary inputs such that propagating these values through the logic results in different values at an input of a register or at a primary output?
Sometimes no 0/1 reset values exist for the registers in the new locations such that propagating these values forward yields the given values at the original locations. Figure 16 shows an example: no valid reset values exist if the registers are moved further backward. RC moves registers with asynchronous set or reset backward only as far as valid reset values for the registers exist.
Figure 16: It is not possible to find reset values for the registers in the new locations such that propagating these values results in the given values for the registers in the original locations.
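The justification problem can be made concrete with a tiny search. The text formulates it as a satisfiability problem; the sketch below plainly substitutes exhaustive enumeration for a SAT solver, which is exponential and only suitable for a handful of registers. The netlist format and names are hypothetical. It assigns all new registers at once, which is the global view described above, and returns None when, as in Figure 16, no valid reset values exist.

```python
from itertools import product

# Hypothetical gate library evaluated on 0/1 values.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def justify_reset(netlist, new_regs, required):
    """Find 0/1 reset values for the registers at `new_regs` such that
    forward propagation through `netlist` (topologically ordered
    (output_net, gate, input_nets) tuples) reproduces `required`, the reset
    values at the original register locations. Returns a dict or None."""
    for bits in product((0, 1), repeat=len(new_regs)):
        values = dict(zip(new_regs, bits))
        for out, gate, ins in netlist:
            values[out] = GATES[gate](*(values[n] for n in ins))
        if all(values[net] == v for net, v in required.items()):
            return dict(zip(new_regs, bits))
    return None


# One AND gate whose output register resets to 0: several solutions exist.
and_gate = [("n", "AND", ("a", "b"))]
print(justify_reset(and_gate, ["a", "b"], {"n": 0}))  # {'a': 0, 'b': 0}

# Two gates sharing inputs: AND = 1 forces a = b = 1, which makes XOR = 0.
shared = [("n1", "AND", ("a", "b")), ("n2", "XOR", ("a", "b"))]
print(justify_reset(shared, ["a", "b"], {"n1": 1, "n2": 1}))  # None
```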
If all the registers that retiming needs to merge and move forward or backward receive equivalent control signals, and if the reset line justification problem is also solvable, then retiming is more powerful than clock scheduling. Combinational paths can be extremely long, with a delay of several times the clock period: if there are sufficient registers at the beginning or end of such paths, retiming can move these registers into the combinational logic and still achieve the target clock period. As discussed earlier, clock scheduling is limited because hold constraints must also be satisfied. If the path delays and their variations are too large, at some point it becomes impossible to satisfy the hold constraints together with the setup constraints.
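The sketch below shows, in the lag formulation of Leiserson and Saxe [2], why a path several times longer than the clock period is no obstacle once enough registers are available. Each gate v gets an integer lag r(v), and an edge u -> v carrying w registers carries w + r(v) - r(u) registers after retiming; the retiming is legal only if no count becomes negative. The circuit, delays, lags, and function names are hypothetical; the I/O nodes keep lag 0 so the input-to-output latency is unchanged, and counting registers per edge ignores register sharing at fanouts.

```python
def apply_lags(edges, lag):
    """edges: list of (u, v, w) with w registers on the connection u -> v.
    lag: dict gate -> r(gate); nodes not listed (e.g. I/O) have lag 0.
    Returns the retimed edges with w_r = w + r(v) - r(u)."""
    retimed = []
    for u, v, w in edges:
        w_r = w + lag.get(v, 0) - lag.get(u, 0)
        if w_r < 0:
            raise ValueError(f"illegal retiming: {w_r} registers on {u}->{v}")
        retimed.append((u, v, w_r))
    return retimed


def clock_period(delay, edges):
    """Longest sum of gate delays along paths that use only register-free
    edges (assumed acyclic, as in any legal synchronous circuit)."""
    preds = {v: [] for v in delay}
    for u, v, w in edges:
        if w == 0:
            preds[v].append(u)
    arrival = {}

    def arr(v):
        if v not in arrival:
            arrival[v] = delay[v] + max((arr(u) for u in preds[v]), default=0)
        return arrival[v]

    return max(arr(v) for v in delay)


def register_count(edges):
    return sum(w for _, _, w in edges)


# Three gates of delay 5 with three pipeline registers at the output.
delay = {"in": 0, "g1": 5, "g2": 5, "g3": 5, "out": 0}
edges = [("in", "g1", 0), ("g1", "g2", 0), ("g2", "g3", 0), ("g3", "out", 3)]
print(clock_period(delay, edges), register_count(edges))  # 15 3

retimed = apply_lags(edges, {"g2": 1, "g3": 2})
print(clock_period(delay, retimed), register_count(retimed))  # 5 3
```

The combinational path through g1, g2 and g3 has a delay of 15, three times the retimed period of 5, yet the three registers placed at the output are enough to reach that period without adding registers.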
The only drawback of retiming is that it may increase the number of registers, and for some designs the increase can be significant. However, RC can also decrease the number of registers. For larger designs that have only one critical part, RC can usually improve the clock period and decrease the number of registers at the same time: in the uncritical parts the register locations are very flexible, so the registers can be moved and possibly merged.
6 An automated verification flow<br />
Retiming used to pose fundamental hurdles for equivalence checking. Proving that two netlists are equivalent when one was generated from the other through both combinational synthesis and retiming is a problem of enormous complexity. To address this verification challenge, RC writes out checkpoint files (Verilog netlists) that describe the design at a particular stage. When retiming is used, RC can write out checkpoint files before and after retiming, as shown in Figure 17.
Figure 17: The automated synthesis and verification flow with checkpoint files generated by RC and read by LEC. RC: read RTL, combinational synthesis, write pre-retiming checkpoint netlist, retiming, write post-retiming checkpoint netlist, combinational synthesis, write final netlist. LEC: equivalence check 1 (combinational) between the initial RTL and the pre-retiming checkpoint, equivalence check 2 (retiming) between the pre- and post-retiming checkpoints, and equivalence check 3 (combinational) between the post-retiming checkpoint and the final netlist.
Along with each checkpoint file, RC also generates a corresponding "dofile", a command script for the Conformal Logic Equivalence Checker (LEC). Equivalence between the RTL and the final netlist is established through a series of verification steps, which compare the initial RTL with the first checkpoint file, each checkpoint file with the next one, and the last checkpoint file with the final netlist. The appropriate dofile sets up the verification of the corresponding stages as shown in Figure 17. Conformal verifies each step under the assumption that either only combinational synthesis operations were performed or only registers were moved by retiming.
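Why retiming defeats a purely combinational comparison can be seen on a very small example. In the sketch below, the register after an AND gate is moved backward onto the two gate inputs, so registers no longer correspond one to one, yet the output sequences still match cycle by cycle when the new reset values are chosen correctly. The check uses random simulation, a hypothetical stand-in that can expose a mismatch but, unlike a formal equivalence checker such as LEC, cannot prove equivalence. All names are illustrative.

```python
import random


def original(stimulus, reset=0):
    """One register after the AND gate: out(t) = q, then q <= a & b."""
    q, out = reset, []
    for a, b in stimulus:
        out.append(q)
        q = a & b
    return out


def retimed(stimulus, reset_a, reset_b):
    """The register moved backward: one register on each AND input."""
    qa, qb, out = reset_a, reset_b, []
    for a, b in stimulus:
        out.append(qa & qb)
        qa, qb = a, b
    return out


def matches_by_simulation(reset_a, reset_b, cycles=1000, seed=1):
    """Compare both circuits on the same random input sequence."""
    rng = random.Random(seed)
    stimulus = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(cycles)]
    return original(stimulus) == retimed(stimulus, reset_a, reset_b)


print(matches_by_simulation(0, 1))  # True: 0 AND 1 reproduces reset value 0
print(matches_by_simulation(1, 1))  # False: the first output cycle differs
```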
7 Case study: Retiming for a UWB baseband chip from Focus Enhancements
As a case study, we describe how retiming in RC was used by Focus Semiconductor, a division of Focus Enhancements, for the dual-PHY UWB baseband chip MADRAS. This chip supports a proprietary Focus (Turbo) mode and a WiMedia mode which is compliant with the Multiband OFDM Alliance (MBOA). The Focus mode is more powerful than the MBOA mode: its ratio of bandwidth to distance is about 2x greater. The chip is designed in a TSMC 0.13 µm CMOS process technology with an analog front end. It has about 4 million transistors, which correspond to approximately 1.5 million instances.
The Synchronization Module has a three-stage hierarchical datapath implementation. Each stage is composed of a finite impulse response (FIR) filter, which required datapath optimization support from RC. The Synchronization Peak Finder Module contains a divider which is used to normalize the synchronization threshold. Enough pipeline registers were added at the inputs and outputs of the block, and RC then rebalanced the combinational paths by retiming the registers into the combinational logic.
The Coarse Equalization Module consists of a Media Access Controller (MAC) and scratchpad memory. Retiming was also used for this module: pipeline registers were added at the primary inputs and outputs, and retiming automatically moved these registers into the logic and rebalanced the delays of the combinational paths.
The Fine Equalization and Tracking Modules use a similar MAC and memory, which made retiming necessary for these modules as well.
A top-down sequential synthesis flow with retiming
The design consists of a 600K-instance top-level block, FPT, which was synthesized top-down. The "retime" attribute was set on 16 submodules, corresponding to about 45% of the total logic and 49% of the registers. The following table shows all the modules for which the retime attribute was set to true in the automatic "synthesize -retime" flow.
                                   number of registers         clock period (ps)
subdesign   gates    PIs    POs    before  after   change      before   after   change
block_1     51,667   738    571    2,589   2,558   -1.20%      12,908   3,248   -74.80%
block_2     13,893   266    234    1,766   2,042   15.60%      13,119   3,384   -74.20%
block_3     28,017   880    895    8,283   6,990   -15.60%     6,583    3,176   -51.80%
block_4     2,577    65     66     141     327     131.90%     6,724    3,142   -53.30%
block_5     17,646   407    54     380     639     68.20%      5,489    3,748   -31.70%
block_6     8,345    503    175    388     520     34.00%      9,044    4,407   -51.30%
block_7-a   7,680    597    77     1,269   1,473   16.10%      5,484    3,249   -40.70%
block_7-b   7,748    597    77     1,269   1,416   11.60%      5,484    3,369   -38.60%
block_7-c   7,716    597    77     1,269   1,420   11.90%      5,451    3,422   -37.20%
block_7-d   7,772    597    77     1,269   1,392   9.70%       5,446    3,457   -36.50%
block_7-e   7,778    597    77     1,269   1,446   13.90%      5,465    3,366   -38.40%
block_7-f   7,789    597    77     1,269   1,445   13.90%      5,459    3,380   -38.10%
block_8     7,163    141    71     1,088   1,128   3.70%       8,421    5,300   -37.10%
block_9     28,841   411    170    1,500   1,392   -7.20%      12,291   5,693   -53.70%
block_10    18,009   440    135    2,862   3,035   6.00%       9,195    4,427   -51.90%
block_11    88,925   1,683  1,700  6,694   5,897   -11.90%     5,212    4,573   -12.30%
Average     19,472   569    283    2,081   2,070   -0.60% (1)  7,611    3,834   -49.60% (2)
(1) percentage change of the average number of registers before and after retiming
(2) percentage change of the average clock period before and after retiming
The table shows the number of combinational gates, the number of primary inputs (PIs), and the number of primary outputs (POs). The next three columns show the number of registers before and after retiming and the percentage change. The last three columns show the clock period in picoseconds before and after retiming and the percentage change.
The table shows that retiming can both increase and decrease the number of registers. Overall, the number of registers decreases by 0.6%. The clock period improves for every subdesign. For many of the subdesigns a large decrease of the clock period is expected, because pipeline registers were added at either the primary inputs or the primary outputs.
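The Average row can combine the subdesigns in two different ways, which is easy to check from the clock period columns. The short computation below (variable names are illustrative) compares the percentage change of the average clock period, from 7,611 ps to 3,834 ps, with the mean of the sixteen per-subdesign percentage changes.

```python
# Clock period (ps) before and after retiming, copied from the table.
before = [12908, 13119, 6583, 6724, 5489, 9044, 5484, 5484,
          5451, 5446, 5465, 5459, 8421, 12291, 9195, 5212]
after = [3248, 3384, 3176, 3142, 3748, 4407, 3249, 3369,
         3422, 3457, 3366, 3380, 5300, 5693, 4427, 4573]


def mean(xs):
    return sum(xs) / len(xs)


# Percentage change of the average period vs. average of per-block changes.
change_of_average = 100 * (mean(after) - mean(before)) / mean(before)
average_of_changes = mean([100 * (a - b) / b for b, a in zip(before, after)])
print(f"{change_of_average:.1f}% {average_of_changes:.1f}%")  # -49.6% -45.1%
```

The first figure matches the -49.6% in the Average row. Averaging the per-subdesign changes instead gives about -45.1%, because the subdesigns with the longest initial clock periods, which also show the largest reductions, weigh more in the average period.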
Conclusion<br />
With increasing demands for faster designs and shorter time-to-market, it is important for designers to look for efficient optimization techniques. Retiming in Encounter RTL Compiler is one very powerful technique that can achieve substantial improvements in performance.
In this paper we have described how RTL Compiler uses clock scheduling in a sequentially driven synthesis flow and then performs retiming, minimizing the clock period and the number of registers. We have discussed special cases of retiming: registers with enable signals, registers with a multiplexer feedback loop, and registers with asynchronous set and reset signals.
With RTL Compiler it is easy to perform retiming, and the direct link to Conformal Logic Equivalence Checking provides a complete verification solution.
References<br />
[1] J. P. Fishburn, "Clock Skew Optimization," IEEE Transactions on Computers, vol. 39, pp. 945-951, July 1990.
[2] C. Leiserson and J. Saxe, "Retiming Synchronous Circuitry," Algorithmica, vol. 6, pp. 5-35, 1991.
[3] N. E. Young, R. E. Tarjan, and J. B. Orlin, "Faster Parametric Shortest Path and Minimum Balance Algorithms," Networks, vol. 21, pp. 205-221, 1991.
[4] S. S. Sapatnekar and R. B. Deokar, "Utilizing the Retiming-Skew Equivalence in a Practical Algorithm for Retiming Large Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 10, October 1996.
[5] S. S. Sapatnekar, Timing, Kluwer Academic Publishers, Boston, MA, 2004.
Appendix: Encounter RTL Compiler commands for retiming
Automatic synthesis with retiming
It is easy to use retiming in RC: only the attribute "retime" needs to be set to true for the design or subdesign which should be retimed. During synthesis, the design or subdesign is then processed automatically by the sequentially driven synthesis flow with retiming described in Section 4.
set_attr retime true $subdesign   ;# $subdesign: the design or subdesign to retime (placeholder)
synthesize -to_mapped
Manual retiming flow
This flow can be used when specific modules need to be retimed. It can also serve as an exploratory tool to see what retiming can do for a subdesign in a mapped design. The first step, "retime -prepare", prepares the design for retiming, and "retime -min_delay" performs the actual retiming. Even though "retime -min_delay" performs a local mapping of the logic immediately around the flops, it is recommended to follow it with an incremental synthesis or, preferably, a global synthesis, depending on the granularity of the changes.
retime -prepare $subdesign     ;# $subdesign: a subdesign or the top design (placeholder)
retime -min_delay $subdesign
synthesize -to_mapped -incr    ;# drop -incr for a global re-synthesis
Manual retiming flow minimizing the number of registers
This flow explicitly tries to minimize the number of registers and thus the area. It should be used only for a design which has positive slack.
synthesize -to_mapped
retime -min_area $subdesign    ;# $subdesign: a subdesign or the top design (placeholder)
synthesize -to_mapped -incr    ;# -incr is optional
Attributes<br />
set_attr dont_retime true $flop
    Do not retime the specified register.
set_attr retime_hard_region true $subdesign
    Retiming cannot move registers into or out of the subdesign.
set_attr boundary_opto false $subdesign
    Disable boundary optimization (constant propagation and rewiring of equivalent signals across hierarchy) and preserve the input and output pins of the subdesign. This makes ECOs on the block easier and might be necessary for formal verification.
set_attr retime_async_reset true
    Enable retiming of flops with asynchronous set or reset signals. By default, such registers are excluded from retiming. The runtime may increase if registers need to be moved backward.
set_attr retime_optimize_reset true
    In combination with retime_async_reset, optimize the reset logic by replacing asynchronous flops with simple flops wherever possible.
For more information refer to the Encounter RTL Compiler User Guide, chapter 9, "Retiming the Design".
Interface to Conformal Logic Equivalence Checker (LEC)
The checkpoint files of the automatic verification flow described in Section 6, and the corresponding dofiles for LEC, are generated by RC if the checkpoint attributes are set as shown below.
set_attribute checkpoint_flow true      ;# enable the checkpoint flow
set_attribute library my_library.lib
read my_design.v
elaborate
set_attribute checkpoint_netlist_naming_style \
    "my_chk_dir/chk_%d.v" /designs/my_top
set_attribute checkpoint_dofile_naming_style \
    "my_chk_dir/chk_%d_to_chk_%d.do" /designs/my_top
read_sdc my_constraints.sdc
set_attr retime true my_top
synthesize -to_mapped
write -m > final.v                      ;# final mapped netlist
write_do_lec -revised final.v > final.do
To run LEC:
lec -ultra -Dofile hdl_to_chk_01.do
lec -ultra -Dofile chk_01_to_chk_02.do
lec -ultra -Dofile final.do
For more information refer to the document "Interfacing between RTL Compiler and Conformal".