
Sequential Logic Synthesis with Retiming in Encounter RTL Compiler (RC)

Christoph Albrecht¹, Shrirang Dhamdhere¹, Suresh Nair¹, Krishnan Palaniswami², Sascha Richter¹

¹ Cadence Design Systems, ² Focus Semiconductor

Session Track: Digital IC Design

Session Number: 2.3

Relevant Cadence Products: Encounter RTL Compiler (RC), Encounter Conformal Logic Equivalence Checker (LEC)

Abstract

Typical ASIC designs are highly unbalanced with respect to the timing criticality of their combinational logic paths. This is mainly due to the ad-hoc manual design specification of the register transfer level (RTL), which does not use any information regarding the sequential timing criticality. Traditional logic synthesis does not support “borrowing” of timing slack across registers, and the optimization is restricted by fixed positions of the registers. This may result in a suboptimal solution, a loss of performance, and unnecessary area and power consumption.

This paper explains the concept of clock scheduling and retiming used by Encounter RTL Compiler (RC) to optimize across register boundaries. Retiming is a structural transformation which changes the positions of the registers without modifying the input-output behavior of the circuit. The reader will understand how the area, the number of registers, or the delay of the design is minimized. Computational results show the tradeoff between these objectives.

Practical applications are discussed: Registers may have different control signals, enable signals, or reset signals. This leads to the multiclass retiming problem and the reset line justification problem.

Retiming used to be a difficult challenge for equivalence checking. However, together with Encounter Conformal Logic Equivalence Checker (LEC) the verification is now simple: RC writes out checkpoint netlist files and one script, which LEC can then process to automatically verify the golden RTL against the final netlist.

We present a case study showing how retiming was used by Focus Semiconductor, a division of Focus Enhancements, on a 1.5 M instance UWB baseband chip. Retiming substantially improved the Quality of Results (QoR) and helped to meet the design objectives.

CDNLive! Silicon Valley 2006 1


1 Introduction

Traditional combinational logic synthesis focuses all the optimization efforts on the combinational paths between the registers. It does not support any tradeoff between tight paths and loose paths when these are separated by registers.

To motivate the use of sequential logic synthesis with retiming, we will discuss the slack distribution of a typical ASIC design.

Figure 1: Slack distribution of a typical ASIC design.

Figure 1 shows the slack distribution, more specifically the distribution of the setup slacks of a late-mode analysis after synthesis. For each slack interval on the x-axis, the number of combinational paths which have a slack value within that interval is shown. The design has a worst negative slack of -529 ps.

Figure 2: Slack distribution of the same ASIC design for which the slack distribution is shown in Figure 1, however this time with optimized clock latencies.

Figure 2 shows an optimized slack distribution for the same design. The netlist was not changed, only the clock latencies at the registers. The latencies were computed with a slack balancing algorithm which we will discuss later. The number of critical paths has decreased drastically. Only a small fraction of the paths have a negative slack. In this case it was not possible to improve the worst negative slack, because the worst path in this design is a path from a primary input to a primary output.

The two figures, Figure 1 and Figure 2, demonstrate the optimization potential which becomes available when the registers are unlocked and no longer kept fixed as hard boundaries that constrain the synthesis optimization algorithms. With the optimized clock latencies, many paths become uncritical. The additional slack can be used to downsize the combinational gates or even to use a different logic structure that has smaller area and power consumption.

While clock scheduling was not able to reduce the worst negative slack for this specific design, it was able to improve the slack of the side paths. These are either combinational paths that start at the primary input of the critical path and end at a register, or paths that start at a register and end at a primary output. This is helpful for the synthesis optimization algorithms in RC. RC is able to improve the slack of a path by using slack of the side paths.

In this paper we discuss the two sequential optimization techniques, clock scheduling and retiming, and show how the combination of both these techniques is used in RC. The paper is organized as follows:

In Section 2 we discuss clock scheduling. Clock scheduling is also known as useful skew. It changes the latencies of the clock signal but does not change the logic. The different latencies need to be realized by a sophisticated clock network.

In Section 3 we describe retiming. Retiming is a structural transformation. While retiming does not change the combinational gates, it modifies the netlist by moving the registers forward and backward in the logic.

RC can use clock scheduling as an intermediate step to drive the logic synthesis and optimization process. Ultimately, it realizes the different latencies by retiming so that a conventional zero or limited skew backend flow can place the design, construct the clock network, and route the nets. This is described in Section 4.

In practice, retiming can be constrained by registers that have different control signals (for example, enable signals, asynchronous set or reset signals). Section 5 discusses these constraints.

In Section 6 we discuss the automatic verification flow with LEC.

In the last section we present a case study of how retiming was used on a UWB baseband chip from Focus Semiconductor.

2 Clock Scheduling

The following figure shows how the worst slack of a design can be improved by changing the clock latencies: Buffers are added to the clock distribution network and the switching time of the register is delayed. In this case the worst slack is improved from -2 ns to 0 ns and the design meets the timing requirements. If the clock latency of the capturing register of a combinational path is increased, the slack of the combinational path increases by the same amount. If, on the other hand, the clock latency of the capturing register is decreased, the slack of the combinational path decreases. Increasing the clock latency of the launching register decreases the slack of the path, and decreasing that latency increases it.

Figure 3: The worst slack is improved by adjusting the clock latencies. Target clock period: 5 ns. Worst slack without clock latencies: -2 ns. Worst slack with clock latencies: 0 ns.
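The effect of the latencies on the setup slack can be written down directly. The following is a minimal sketch under a simplified timing model (setup time and clock-to-q delay are ignored); the function name `setup_slack` and the numeric values are illustrative assumptions and are not taken from the figure.

```python
def setup_slack(period, delay, launch_latency=0.0, capture_latency=0.0):
    """Setup slack of one combinational path (simplified model).

    The data leaves the launching register at launch_latency, needs
    `delay` to propagate, and must arrive before the capturing register
    switches at capture_latency + period.
    """
    return (capture_latency + period) - (launch_latency + delay)


# Hypothetical path: 7 ns of logic against a 5 ns clock period.
print(setup_slack(5.0, 7.0))                        # no latencies
print(setup_slack(5.0, 7.0, capture_latency=2.0))   # capture clock delayed by 2 ns
print(setup_slack(5.0, 7.0, launch_latency=1.0))    # launch clock delayed by 1 ns
```

Delaying the capture clock adds to the slack of the incoming path, while delaying the launch clock subtracts from the slack of the outgoing path, which is why a latency change at one register always trades slack between its incoming and outgoing paths.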



A linear programming formulation

The clock scheduling problem can be formulated as a linear program. This was first done by Fishburn in 1990 [1]. Let $T$ be the clock period, which is to be minimized. Furthermore, let $l_i$ be the latency of the clock signal arriving at register $i$, and let $d_{ij}$ be the maximum delay of all combinational paths from register $i$ to register $j$.

$$\min T \quad \text{subject to} \quad l_i + d_{ij} \le l_j + T \quad \text{for all combinational paths } (i, j).$$

For each constraint, the difference between the two sides of the inequality, $l_j + T - l_i - d_{ij}$, is the slack. Should the design have constrained primary inputs or outputs, we can represent all these inputs and outputs by one dummy register that can have, without loss of generality, a clock latency of zero. Hence, we can assume that even in this case the linear program has the form above.

The linear program is a very special linear program and it can be solved efficiently with combinatorial algorithms. It can be proved that the minimum clock period achievable by clock scheduling is equal to the maximum average path delay of all cycles in the register-to-register timing graph. The register-to-register timing graph contains a node for every register and an edge whenever there is a combinational path between the registers, with a weight equal to the maximum delay of these paths.
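One direction of this statement follows directly from the constraints. As a short worked step, let $C$ be an arbitrary cycle of the timing graph and $|C|$ its number of edges (notation introduced here only for illustration). Summing the constraint over the edges of $C$ cancels all latencies, because every register on the cycle appears once as a launching and once as a capturing register:

```latex
\sum_{(i,j) \in C} \left( l_i + d_{ij} \right) \le \sum_{(i,j) \in C} \left( l_j + T \right)
\quad\Longrightarrow\quad
\sum_{(i,j) \in C} d_{ij} \le |C| \, T
\quad\Longrightarrow\quad
T \ge \frac{1}{|C|} \sum_{(i,j) \in C} d_{ij}
```

Hence no clock schedule can achieve a period below the largest mean cycle delay. That this bound is actually attained is the less obvious part of the proof and is not shown here.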

In general, the linear program does not have a unique solution, and an arbitrary solution that minimizes the clock period is usually not desirable. For example, we examined the ASIC design for which the two different slack distributions are shown in Figure 1 and Figure 2. The worst negative slacks of the two slack distributions are equal, and so are the clock periods at which the chips can operate without failure, yet the distribution in Figure 2 has far fewer critical paths.

Clock scheduling optimally balancing the slack

In the following we discuss how it is possible to compute a clock schedule with a specific property which we call optimally balanced slack. As a result of this property many paths are uncritical and have a lot of slack. This part is more theoretical; readers with limited time may skip it, because the following sections are more important for practical use.

We consider a small example circuit with four registers, a, b, c, and d, shown in Figure 4.

Figure 4: Example circuit with combinational gates and four registers. The numbers specify the delay of the gates.

From the circuit we can construct the register-to-register timing graph which is shown in the following figure. The graph has one node for each of the four registers and an edge between two nodes whenever there is a combinational path between the corresponding registers. Associated with the edges is the maximum delay of the combinational paths.



Figure 5: Register-to-register timing graph for the circuit in Figure 4.

Without clock latencies, the minimum feasible clock period for this circuit is equal to the maximum delay of the combinational paths, in this case T = 11. By increasing the clock latency for the register b to +1, the clock period can be decreased to T = 10. This is the minimum clock period which can be achieved by clock scheduling, because with these latencies the two paths (b,d) and (d,b) have a slack of zero: they form a cycle with delays 9 and 11, whose average is 10. Figure 6 shows the register-to-register timing graph with the latency +1 at register b. In addition to the combinational delays we also show the slacks for the clock period T = 10 in brackets.

Figure 6: A clock schedule applied to the registers such that the worst incoming slack equals the worst outgoing slack for every register. The edges corresponding to the critical paths with a slack smaller than or equal to 1 are shown in red. (Register b has latency +1; clock period T = 10; each edge is labeled with its delay and, in brackets, its slack.)
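The numbers of this example can be reproduced with a few lines of Python. The sketch below is illustrative only: the edge directions in `EDGES` are reconstructed from the delays and slacks quoted for Figure 6 (an assumption), and all function names are hypothetical. It evaluates the slacks of a given schedule and finds the minimum period by treating the constraints as difference constraints, which is one possible combinatorial approach and not necessarily the one used in RC.

```python
# Register-to-register timing graph of the example.
# Edge directions are reconstructed from the slacks quoted for Figure 6
# (an assumption; the figure itself is not reproduced here).
EDGES = {
    ("a", "b"): 6, ("a", "c"): 9, ("c", "a"): 9, ("d", "a"): 7,
    ("b", "d"): 9, ("d", "b"): 11, ("c", "d"): 5, ("d", "c"): 9,
}
REGISTERS = sorted({r for edge in EDGES for r in edge})


def slacks(period, latency):
    """Slack l_j + T - l_i - d_ij of every edge (i, j)."""
    return {(i, j): latency[j] + period - latency[i] - d
            for (i, j), d in EDGES.items()}


def feasible(period):
    """Check whether latencies with l_i + d_ij <= l_j + T exist.

    The constraints are difference constraints l_i - l_j <= T - d_ij.
    Bellman-Ford style relaxation, starting with all potentials at 0,
    stops changing exactly when there is no negative cycle.
    """
    potential = {r: 0.0 for r in REGISTERS}
    for _ in range(len(REGISTERS)):
        changed = False
        for (i, j), d in EDGES.items():
            bound = potential[j] + period - d   # l_i <= l_j + T - d_ij
            if potential[i] > bound + 1e-9:
                potential[i] = bound
                changed = True
        if not changed:
            return True
    return False


def min_period(tol=1e-6):
    """Binary search for the smallest feasible clock period."""
    lo, hi = 0.0, float(max(EDGES.values()))   # hi is feasible with zero latencies
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


figure6 = slacks(10, {"a": 0, "b": 1, "c": 0, "d": 0})
print(figure6)
print(round(min_period(), 3))
```

With zero latencies the worst slack at T = 11 is zero, and with latency +1 at register b the edges (b,d) and (d,b) reach zero slack at T = 10, in line with the text above.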

The clock schedule shown in Figure 6 has the property that for every register the worst incoming slack is equal to the worst outgoing slack. Changing the clock latency of one single register alone does not give an improvement, since the worst slack of all the paths starting or ending at the register can only get worse.

Figure 6 shows that there is one critical edge in red, the edge (d,c), which is not part of a critical cycle. It is possible to increase the slack of this edge by increasing the clock latency of the registers a and c simultaneously. This does not affect the two critical edges (c,a) and (a,c). The result is shown in Figure 7. In this figure the worst incoming slack equals the worst outgoing slack for every subset of the registers. Note that before, in Figure 6, the worst outgoing slack for the registers a and c together is equal to 5 whereas the worst incoming slack is only 1.

Figure 7: An optimally balanced clock schedule: The worst incoming slack equals the worst outgoing slack for every subset of the registers. (Registers a and c have latency +2, register b has latency +1; clock period T = 10.)
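On an example this small, the balance property can be checked by brute force over all register subsets. The following sketch is an illustration under the same assumption as before: the edge directions in `EDGES` are reconstructed from the quoted slacks, and the helper names are hypothetical. It compares the schedule of Figure 6 with that of Figure 7.

```python
from itertools import combinations

# Timing graph of the example (edge directions reconstructed from the
# quoted slacks; an assumption, since the figure is not reproduced here).
EDGES = {
    ("a", "b"): 6, ("a", "c"): 9, ("c", "a"): 9, ("d", "a"): 7,
    ("b", "d"): 9, ("d", "b"): 11, ("c", "d"): 5, ("d", "c"): 9,
}
REGISTERS = ("a", "b", "c", "d")
PERIOD = 10


def worst_slacks(subset, latency):
    """Worst slack entering and worst slack leaving a set of registers.

    Assumes the subset has at least one incoming and one outgoing edge,
    which holds for every proper nonempty subset of this graph.
    """
    incoming, outgoing = [], []
    for (i, j), d in EDGES.items():
        slack = latency[j] + PERIOD - latency[i] - d
        if i not in subset and j in subset:
            incoming.append(slack)
        elif i in subset and j not in subset:
            outgoing.append(slack)
    return min(incoming), min(outgoing)


def unbalanced_subsets(latency):
    """All proper, nonempty register subsets whose worst slacks differ."""
    result = []
    for size in range(1, len(REGISTERS)):
        for subset in combinations(REGISTERS, size):
            worst_in, worst_out = worst_slacks(set(subset), latency)
            if worst_in != worst_out:
                result.append(subset)
    return result


figure6 = {"a": 0, "b": 1, "c": 0, "d": 0}
figure7 = {"a": 2, "b": 1, "c": 2, "d": 0}
print(worst_slacks({"a", "c"}, figure6))   # (incoming, outgoing)
print(unbalanced_subsets(figure6))
print(unbalanced_subsets(figure7))
```

Enumerating subsets is only practical for a handful of registers; it is meant to make the definition concrete, not to suggest an algorithm.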



The clock schedule shown in Figure 2, in which the number of critical paths has decreased so drastically, has exactly this property. It is computationally too expensive to consider all subsets of the registers, because there are exponentially many of them. Nevertheless, the efficient minimum mean balance algorithm by Young, Tarjan, and Orlin [3] can find such a solution by iteratively finding critical cycles and contracting them.

For synthesis operations it is helpful if the side paths of a critical path have additional slack. The slack can be used to reduce the delay of the critical path. An example of such a synthesis operation is Shannon decomposition, shown in the following figure.

Figure 8: A critical path becomes short and fast using Shannon decomposition.

If only one path starting at a point a and ending at a point x is critical and all other paths ending at x are uncritical, then the fanin logic of x can be duplicated: in one copy the value of a is permanently set to zero, and in the other it is set to one. The two outputs of the replicated logic feed a multiplexer that chooses the right value for x depending on the value of a. The constant values for a are propagated to simplify the logic. After this transformation the path from a to x is very short and hence very fast.
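The transformation can be illustrated on a small Boolean function. The function `f` below is a hypothetical stand-in for the fanin logic of x and is not taken from the paper; the sketch builds the two cofactors by hand and checks exhaustively that the multiplexed version computes the same value.

```python
from itertools import product


def f(a, b, c, d):
    """Hypothetical fanin logic of x; `a` is the late-arriving signal."""
    return (a and b) or ((not a) and (c != d)) or (b and c and d)


def f_shannon(a, b, c, d):
    """Same function after Shannon decomposition with respect to a.

    Both cofactors depend only on the early signals b, c, d and can be
    evaluated before a arrives; a only drives the final multiplexer.
    """
    cofactor_0 = (c != d) or (b and c and d)   # f with a = 0, constants propagated
    cofactor_1 = b                             # f with a = 1: b or (b and c and d) = b
    return cofactor_1 if a else cofactor_0


equivalent = all(bool(f(*v)) == bool(f_shannon(*v))
                 for v in product([False, True], repeat=4))
print(equivalent)
```

The price of the shorter path from a to x is the duplicated logic, which is why the side paths need spare slack and why the transformation is applied only to critical signals.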

Limitations of clock scheduling

Clock scheduling has limitations. Changing the clock latencies may increase the number of hold violations. The hold constraint ensures that data signals do not arrive too early at the data input pin of the register at the end of the path: the signal has to arrive after the register has closed. A high number of hold violations can lead to an enormous number of hold buffers, which need to be added at the end of the flow. Due to process variations, the final delay of the paths on the fabricated chip can deviate from the computed delay. This limits the use of clock scheduling further. For example, it is not possible to have a long combinational path with a combinational delay equal to ten times the clock period and to realize the timing constraints by adjusting the latencies of the clock signals at the launching and receiving registers. On such a combinational path there would be 10 different data signals at the same time. These signals need to arrive at the receiving register at the right time. If the combinational delay of the path were only 10% smaller on the final fabricated chip due to process variations, the signal would arrive too early and this would result in a hold time violation. As the delay could also increase, it is not possible to fix this hold violation by adding additional delay with hold buffers.
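The ten-period example can be checked numerically. The sketch below uses a simplified timing model (zero setup and hold times, zero clock-to-q delay) and hypothetical function names; the capture latency of nine clock periods is an assumption chosen so that the nominal setup slack is exactly zero.

```python
def setup_slack(period, max_delay, launch, capture):
    """Data must arrive before the capturing edge one period later."""
    return (capture + period) - (launch + max_delay)


def hold_slack(min_delay, launch, capture):
    """New data must not arrive before the capturing register has closed."""
    return (launch + min_delay) - capture


T = 1.0              # clock period (arbitrary unit)
nominal = 10 * T     # combinational delay of the long path
capture = 9 * T      # assumed latency of the receiving register

for scale in (0.85, 1.0, 1.05):
    delay = nominal * scale
    print(scale,
          setup_slack(T, delay, 0.0, capture),
          hold_slack(delay, 0.0, capture))
```

In this model the path delay has to stay within a window of one clock period, here between 9 T and 10 T. A path that is too fast violates hold and a path that is too slow violates setup, so padding the path with hold buffers cannot make it robust against variation in both directions.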

Nevertheless, RC can internally use large positive and negative clock latencies and optimize the combinational logic with these latencies. In the end, the latencies are realized by retiming and moving the registers through the combinational logic. The latencies are only bounded by the number and the movement of the registers.



3 Retiming

Retiming is a powerful sequential optimization technique which overcomes the limitations of clock scheduling. Retiming moves the registers across the combinational logic to improve the performance without changing the input/output behavior of the circuit.

The following figure shows how the slack of a circuit can be improved by retiming. It is the same circuit to which we applied clock scheduling in Figure 3. The registers are retimed backward against the direction of the signal propagation.

Figure 9: The worst slack is improved by retiming the registers backward against the direction of the signal propagation. Target clock period: 5 ns. Worst slack before retiming: -2 ns. Worst slack after retiming: 0 ns.

This example shows that retiming changes the number of registers. In this case, the number of registers increases. However, the number of registers can also decrease. RC minimizes the clock period as a first objective. Among all possible retiming solutions that achieve the minimum clock period, RC finds the solution with the minimum number of registers. In addition, RC has the option to minimize the number of registers without increasing the current clock period.

Any retiming can be achieved by a sequence of elementary retiming steps of two kinds: Forward retiming removes the registers at the inputs of a gate and creates new registers at the outputs. Backward retiming does the opposite: It removes the registers at the outputs and creates a new register at each input. The two retiming steps are shown in the following figure.

Figure 10: Registers retimed forward and backward over an AND gate.

For forward retiming it is necessary that each input of the gate is driven by a register. Similarly, for backward retiming the gate must not drive any combinational gate but only registers. Of course, it may still be possible to retime registers forward or backward over a gate if this condition does not hold for the original circuit, but the condition then has to be established first by elementary retiming steps applied to other gates.

In order to ensure equivalent input/output behavior of the circuit, retiming cannot change the number of registers on any loop or on any path from a primary input to a primary output. This is guaranteed by the two operations.
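The two elementary steps and the invariant on loops and input-to-output paths can be made concrete with the lag formulation from the retiming literature [2], in which every gate gets an integer lag and the register count on an edge from u to v changes by r(v) - r(u). The sketch below is an illustration on a hypothetical three-gate circuit; the names are assumptions and it does not describe RC's internal data structures.

```python
def retime(weights, r):
    """Apply a retiming r (gate -> integer lag) to register counts.

    weights maps an edge (u, v) to the number of registers on it.
    r[v] = +1 moves one register backward over gate v (from its outputs
    to its inputs), r[v] = -1 moves one forward. Gates missing from r,
    including the primary inputs and outputs, keep lag 0.
    """
    new = {(u, v): w + r.get(v, 0) - r.get(u, 0) for (u, v), w in weights.items()}
    if min(new.values()) < 0:
        raise ValueError("illegal retiming: a required register is missing")
    return new


def registers_on(path, weights):
    """Number of registers along a sequence of vertices."""
    return sum(weights[edge] for edge in zip(path, path[1:]))


# Hypothetical circuit: a loop g1 -> g2 -> g3 -> g1 with two registers.
circuit = {("in", "g1"): 0, ("g1", "g2"): 0, ("g2", "g3"): 1,
           ("g3", "g1"): 1, ("g3", "out"): 0}
loop = ["g1", "g2", "g3", "g1"]
io_path = ["in", "g1", "g2", "g3", "out"]

backward = retime(circuit, {"g2": 1})   # register moves from g2's output to its input
print(backward[("g1", "g2")], backward[("g2", "g3")])
print(registers_on(loop, circuit), registers_on(loop, backward))
print(registers_on(io_path, circuit), registers_on(io_path, backward))
```

Because the lags cancel around any loop and are fixed to zero at the primary inputs and outputs, the register counts on loops and on input-to-output paths cannot change. Trying `retime(circuit, {"g2": -1})` raises an error, since the input of g2 is not driven by a register and forward retiming is therefore not possible.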

Constants and dangling logic (logic that does not drive anything) are an exception. Constant propagation as part of the RC synthesis operations simplifies any logic driven by a constant, unless the gates are preserved by an attribute. Similarly, dangling logic is removed. However, should this logic be preserved, retiming is able to create or remove registers at constants and dangling logic.

The following figure shows an example in which retiming cannot improve the critical path because no elementary retiming step is possible:

Figure 11: An example in which retiming cannot improve the clock period because the register cannot be moved forward.

Depending on the clk-to-q delay of the register, the critical path goes from the register to the primary output C. If the primary inputs are unconstrained, then the critical path starts at the register in any case. Just checking the slack at the data input pin and the output pin of the register, the user may wonder why the register was not moved forward. This is not possible, because there is no register directly following the primary input B.

Efficient algorithms for retiming have been developed and published. We refer the interested reader to the fundamental paper by Leiserson and Saxe published in 1991 [2], in which the problem of finding a retiming realizing a given clock period and minimizing the number of registers is formulated and solved as a minimum cost flow problem. Polynomial time algorithms have been developed for this problem. A comprehensive treatment of timing in general, and of clock scheduling and retiming in particular, is the recent book by S. Sapatnekar [5].

Relationship between clock scheduling and retiming

The two sequential optimization techniques, clock scheduling and retiming, are related: It can be proved that the clock period achievable by clock scheduling (ignoring any hold constraints) is a lower bound on the clock period that can be achieved by retiming [3]. It can also be proved that retiming can almost achieve this clock period: The minimum clock period achievable by retiming is at most the minimum clock period achievable by clock scheduling plus the maximum delay of all gates.
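Written as a formula, with $T_{\mathrm{cs}}$ for the minimum clock period achievable by clock scheduling, $T_{\mathrm{ret}}$ for the minimum achievable by retiming, and $d_{\max}$ for the maximum delay of all gates (symbols introduced here only for illustration), the two statements combine to:

```latex
T_{\mathrm{cs}} \;\le\; T_{\mathrm{ret}} \;\le\; T_{\mathrm{cs}} + d_{\max}
```

The gap of at most one gate delay reflects that registers can only be placed at gate boundaries, whereas clock latencies can take arbitrary values.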

If a clock schedule is given, a retiming can be computed as follows: Find a register with the maximum positive clock latency. Decrease the clock latency until the incoming slack is zero. If the slack is already zero, perform a backward retiming over the gate driving the register. The new registers added in front of the gate get a clock latency equal to the latency of the original registers minus the delay of the gate. This procedure is repeated until the clock latency of each register is smaller than half the delay of the gate driving the register. Then a similar procedure is applied to registers with the minimum negative clock latency. The registers are moved forward and the clock latency is increased by the delay of the gate until the clock latency of each register is larger than the negative value of half the delay of the gate driven by the register. If the clock latency of every register is then set to zero, the retimed circuit has a clock period that is at most the clock period of the original circuit with clock scheduling plus the maximum delay of all gates.



4 The global sequentially driven synthesis flow in RC

RC combines the two sequential optimization techniques, clock scheduling and retiming, in a global sequential synthesis flow shown in the following figure.

Figure 12: The global sequentially driven synthesis flow in RC. Flow steps: sequentially driven synthesis and optimization (combinational synthesis together with clock scheduling), then retiming, then combinational synthesis.

The logic synthesis and optimization algorithms are tightly interlinked with clock scheduling. Clock scheduling computes clock latencies which improve the clock period and the slack of the combinational paths. The synthesis algorithms can use slack of side paths to further improve critical paths. In the next step, retiming moves the registers through the combinational logic. It minimizes the clock period and, as a second objective, minimizes the number of registers. Finally, retiming is followed once more by combinational synthesis. This is necessary because the loads of the gates have changed as the registers were moved.

RC performs these steps automatically. The user only has to set the attribute “retime” to true for either the top design or the subdesigns for which retiming should be performed and then call the “synthesize” command.

5 Special cases for retiming<br />

In this section we describe special cases for retiming that arise from control signals at the registers, which may constrain the movement of the registers. First we discuss the retiming of registers with enable signals. Then we describe the case in which registers with an enable signal are implemented by a simple register with a multiplexer feedback loop. Finally, we discuss asynchronous set and reset signals.<br />

Retiming of registers with different enable signals<br />

In practice, the retiming of the registers can be constrained: the registers in the circuit may have different control signals, for example enable signals. Retiming cannot combine registers that have different control signals. Figure 13 shows an example. To improve the timing, the two registers should be combined and retimed backward. However, this is not possible because the two registers receive different enable signals. RC can combine and retime registers forward or backward only if they receive the same enable signals.<br />

CDNLive! Silicon Valley 2006 9


Figure 13: The two registers cannot be moved backward because they receive different enable signals.<br />
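The merge condition can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not RC's data model: the `Register` class and its fields are hypothetical, and control signals are compared by net name.<br />

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    # Hypothetical fields for illustration only.
    name: str
    clock: str
    enable: str  # name of the net driving the enable pin


def mergeable_groups(registers):
    """Group register names by their (clock, enable) control signals."""
    groups = defaultdict(list)
    for reg in registers:
        groups[(reg.clock, reg.enable)].append(reg.name)
    return dict(groups)


def can_retime_together(registers):
    """True only if all registers receive identical control signals."""
    return len(mergeable_groups(registers)) == 1


# Situation of Figure 13: same clock, different enable signals.
fig13 = [Register("reg_a", "clock", "enable_1"),
         Register("reg_b", "clock", "enable_2")]
same_enable = [Register("reg_a", "clock", "enable_1"),
               Register("reg_b", "clock", "enable_1")]

print(can_retime_together(fig13))        # registers must stay where they are
print(can_retime_together(same_enable))  # registers may be combined and moved
```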

Multiplexer feedback loop<br />

Registers with an enable signal can also be implemented by a simple register and a multiplexer. This may be an advantage for retiming because the registers can then be merged even though the enable signals are different. It may, however, also constrain the register movement and increase the number of registers.<br />

Figure 14 shows a case in which the number of registers becomes larger. It is a pipeline design with three stages of registers at the primary outputs. The enable is realized by a multiplexer. When the registers are retimed into the combinational logic (applying only the elementary retiming steps in Figure 10), one register has to remain in each loop with the multiplexer. Furthermore, registers pile up at the select lines of the multiplexer.<br />

Figure 14: Registers with enable can be implemented by a simple register and a multiplexer. This may increase the register count when the registers are moved backward.<br />

If the registers have a built-in enable signal, which can be moved along with the registers, instead of a loop with a multiplexer, then the number of registers after retiming is smaller.<br />

If the registers with the multiplexers are at the primary inputs and have to be moved forward, the problem is different: only the last register can be retimed forward. To retime more registers forward, additional registers would be needed at the select lines of the multiplexers.<br />

By default RC uses registers that have enable logic built into the register. Only if the variable “hdl_ff_keep_feedback” is true does RC use simple registers in a loop with a multiplexer. The results depend on the structure of the design and can differ drastically.<br />
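The two implementations discussed in this subsection behave identically cycle by cycle, which the following Python sketch illustrates with a simple behavioral model. The function names and the stimulus are hypothetical; the model covers only functional behavior and says nothing about how either structure is retimed.<br />

```python
def enable_register(d_seq, en_seq, init=0):
    """Register with built-in enable: loads d when en is 1, otherwise holds."""
    q = init
    trace = []
    for d, en in zip(d_seq, en_seq):
        if en:
            q = d
        trace.append(q)
    return trace


def mux_feedback_register(d_seq, en_seq, init=0):
    """Plain register whose input is a 2:1 multiplexer fed back from its output."""
    q = init
    trace = []
    for d, en in zip(d_seq, en_seq):
        mux_out = d if en else q  # multiplexer selects new data or the old value
        q = mux_out               # plain register samples the multiplexer output
        trace.append(q)
    return trace


data = [1, 0, 1, 1, 0]
enable = [1, 0, 0, 1, 1]

print(enable_register(data, enable))
print(mux_feedback_register(data, enable))
```

Both traces are the same, but in the second form the feedback loop is visible as ordinary logic, which is why one register has to remain inside each loop during retiming.<br />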

Retiming of registers with asynchronous set and reset signals<br />

Retiming of registers with asynchronous set or reset signals is more involved. When these registers are retimed forward or backward through the combinational logic, it is necessary to compute the new reset values. Moving these registers forward through the combinational logic is simple: the reset values are propagated through the logic. Figure 15 shows an example.<br />



Figure 15: The registers are retimed forward. The reset values are propagated to the registers in the new locations.<br />
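Forward propagation of reset values amounts to evaluating the gate on the reset values of the registers at its inputs. The Python sketch below shows this for a few two-input gates; the gate functions and the chosen values are illustrative assumptions and do not reproduce the circuit of Figure 15.<br />

```python
def and2(a, b):
    return a & b


def or2(a, b):
    return a | b


def nand2(a, b):
    return 1 - (a & b)


def retime_forward(gate, input_reset_values):
    """Reset value of the single register that replaces the input registers.

    When the registers at all inputs of a gate are moved to its output,
    the new register must reset to the gate's output for the old reset values.
    """
    return gate(*input_reset_values)


print(retime_forward(and2, (1, 0)))   # 0
print(retime_forward(or2, (1, 0)))    # 1
print(retime_forward(nand2, (1, 1)))  # 0
```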

Moving registers backward is more complicated. First, all the registers driven by the gate need to have the same reset values. Second, the reset values of the new registers that drive the inputs of the gate are not unique. A naive approach that moves the registers over the gates one gate at a time and chooses arbitrary reset values does not work: the wrong reset values could be chosen such that the registers later cannot be retimed backward over a gate because their reset values differ. Hence, it is necessary to solve a global problem: what are the required 0/1 reset values for the registers in the new locations such that propagating these values through the logic results in the given reset values at the registers in the original locations? This problem can be transformed into a satisfiability problem. It is very similar to verifying that two netlists are equivalent, in which we ask the question: do 0/1 values exist for the registers and primary inputs such that propagating these values through the logic results in different values at an input of a register or at a primary output?<br />

Sometimes no 0/1 reset values exist for the registers in the new locations such that propagating these values forward would result in the given values at the original locations. The following figure shows an example. In this case no valid reset values would exist if the registers were moved further backward. RC can move registers with asynchronous set or reset backward only as far as valid reset values for the registers exist.<br />

Figure 16: It is not possible to find reset values for the registers in the new locations such that propagating these values results in the given values for the registers in the original locations.<br />
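The justification problem for backward retiming can be illustrated on a tiny cone of logic. The Python sketch below replaces the satisfiability solver with exhaustive enumeration, which is only practical for a handful of inputs; the gate functions and required values are hypothetical and do not correspond to Figure 16.<br />

```python
from itertools import product


def and2(bits):
    return bits[0] & bits[1]


def or2(bits):
    return bits[0] | bits[1]


def justify_backward(outputs, n_inputs):
    """Enumerate reset values for the registers in the new (input) locations.

    outputs  -- list of (function, required_reset_value) pairs, one per register
                in the original location; each function maps the tuple of input
                bits to the value seen at that original register.
    Returns all input assignments that reproduce every required reset value.
    """
    solutions = []
    for bits in product((0, 1), repeat=n_inputs):
        if all(fn(bits) == required for fn, required in outputs):
            solutions.append(bits)
    return solutions


# One AND gate whose output register resets to 0: three valid choices, not unique.
not_unique = justify_backward([(and2, 0)], n_inputs=2)

# AND and OR share both inputs; AND must reset to 1 and OR to 0: no solution,
# so the registers cannot be moved backward over these gates.
impossible = justify_backward([(and2, 1), (or2, 0)], n_inputs=2)

print(not_unique)
print(impossible)
```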

If all the registers that retiming needs to merge and move either forward or backward receive equivalent control signals, and if the reset line justification problem is also solvable, then retiming is more powerful than clock scheduling. It is possible to have extremely long combinational paths whose delay is several times the clock period. If there are sufficient registers at the beginning or end of the paths, retiming can move these registers into the combinational logic and still achieve the target clock period. Earlier we saw that clock scheduling is limited because hold constraints need to be considered. If the delays of the paths as well as the variations of the path delays are too large, it becomes impossible at some point to satisfy the hold constraints together with the setup constraints.<br />
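How registers from the end of a long path can be spread into the logic is sketched below for the simplest possible case. The sketch assumes a single linear chain of gates with integer delays and uses a greedy cut rule; it is not the retiming algorithm used by RC, which operates on general circuits. The delays and the target period are hypothetical.<br />

```python
def place_registers(gate_delays, period):
    """Greedily cut a linear gate chain into stages no longer than `period`.

    Returns the list of gate indices i such that a register is placed
    directly in front of gate i.
    """
    cuts = []
    stage_delay = 0
    for i, delay in enumerate(gate_delays):
        if delay > period:
            raise ValueError("a single gate exceeds the target period")
        if stage_delay + delay > period:
            cuts.append(i)       # start a new stage in front of this gate
            stage_delay = 0
        stage_delay += delay
    return cuts


# Hypothetical path: total delay 18, three times the target period of 6.
delays = [3, 2, 4, 1, 3, 2, 3]
cuts = place_registers(delays, period=6)
print(cuts)  # indices of the gates that get a register at their input
```

For this chain, three registers taken from the end of the path are enough to meet the target period, provided that many pipeline registers are available there.<br />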

Retiming may increase the number of registers. This is the only drawback. For some designs the increase can be significant. However, RC can also decrease the number of registers. Usually, for larger designs that have only one critical part, RC can improve the clock period as well as decrease the number of registers: in the uncritical parts the locations of the registers are very flexible, and hence the registers can be moved and possibly merged.<br />



6 An automated verification flow<br />

Retiming used to pose fundamental hurdles for equivalence checking. Proving that two netlists are equivalent when one was generated from the other through combinational synthesis as well as through retiming is a problem of enormous complexity. To address these verification challenges, RC writes out checkpoint files (Verilog netlists) that describe the design at a particular stage. When retiming is used, RC can write out the checkpoint files before and after retiming, as shown in the following diagram.<br />

Figure 17: The automated synthesis and verification flow with checkpoint files generated by RC and read by LEC. RC steps: read RTL, combinational synthesis, write checkpoint file, retiming, write checkpoint file, combinational synthesis, write final netlist. LEC steps: equivalence check 1 (combinational) between the initial RTL and the pre-retiming checkpoint netlist, equivalence check 2 (retiming) between the pre-retiming and post-retiming checkpoint netlists, and equivalence check 3 (combinational) between the post-retiming checkpoint netlist and the final netlist.<br />

Along with each checkpoint file, RC also generates a corresponding “dofile”, a command script used by Conformal Logic Equivalence Checker (LEC). Equivalence between the RTL and the final netlist is established through a series of verification steps that compare the initial RTL with the first checkpoint file, each checkpoint file with the next checkpoint file, and the last checkpoint file with the final netlist. The appropriate dofile sets up the verification of the corresponding stages as shown in the diagram. Conformal verifies the equivalence under the assumption that either only combinational synthesis operations were performed or only the registers were moved by retiming operations.<br />
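The chain of pairwise checks can be written down as a short Python sketch. It only models the bookkeeping described above: each stage is compared with its predecessor, and each comparison is labeled with the single kind of transformation assumed between the two. The stage names follow the diagram; the data structure and the function `plan_checks` are hypothetical and are not part of RC or LEC.<br />

```python
# Each stage: (netlist description, kind of transformation that produced it).
stages = [
    ("initial RTL", None),
    ("pre-retiming checkpoint netlist", "combinational"),
    ("post-retiming checkpoint netlist", "retiming"),
    ("final netlist", "combinational"),
]


def plan_checks(stages):
    """Pair every stage with its predecessor and record the assumed transformation."""
    checks = []
    for (golden, _), (revised, kind) in zip(stages, stages[1:]):
        checks.append((golden, revised, kind))
    return checks


checks = plan_checks(stages)
for number, (golden, revised, kind) in enumerate(checks, start=1):
    print(f"equivalence check {number} ({kind}): {golden} vs. {revised}")
```

Because every step involves only one kind of transformation, none of the individual checks has to handle combinational changes and register movement at the same time.<br />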

7 Case study: Retiming for a UWB baseband chip from Focus Enhancements<br />

As a case study we describe how retiming in RC was used by Focus Semiconductor, a division of Focus Enhancements, for the dual-PHY UWB baseband chip MADRAS. This chip supports a proprietary Focus (Turbo) mode and a WiMedia mode that is compliant with the Multiband OFDM Alliance (MBOA). The Focus mode is more powerful than the MBOA mode: the ratio of the bandwidth versus the distance is about 2x greater. The chip is designed in a 0.13 µm TSMC CMOS process technology with an analog front end. It has about 4 million transistors, which correspond to approximately 1.5 million instances.<br />

The Synchronization Module has a three-stage hierarchical datapath implementation. Each stage is composed of a finite impulse response (FIR) filter, which required datapath optimization support from RC.<br />

The Synchronization Peak Finder Module contains a divider that is used to normalize the synchronization threshold. A sufficient number of pipeline registers was added at the inputs and outputs of the block. RC then rebalances the combinational paths by retiming the registers into the combinational logic.<br />



The Coarse Equalization Module consists of a Media Access Controller (MAC) and scratchpad memory. Retiming was also used for this module. Pipeline registers were added at the primary inputs and outputs, and retiming automatically moved these registers into the logic and rebalanced the delay of the combinational paths.<br />

The Fine Equalization Module and the Tracking Module use a similar MAC and memory, which made retiming necessary for these modules as well.<br />

A top-down sequential synthesis flow with retiming<br />

The design consists of a 600K-instance top-level block, FPT, which was synthesized top-down. The “retime” attribute was set on 16 submodules corresponding to about 45% of the total logic and 49% of the registers. The following table shows all the modules for which the retime attribute was set to true in the automatic “synthesize -retime” flow.<br />

| subdesign | gates | PIs | POs | registers before | registers after | registers change | clock period before (ps) | clock period after (ps) | clock period change |
|---|---|---|---|---|---|---|---|---|---|
| block_1 | 51,667 | 738 | 571 | 2,589 | 2,558 | -1.20% | 12,908 | 3,248 | -74.80% |
| block_2 | 13,893 | 266 | 234 | 1,766 | 2,042 | 15.60% | 13,119 | 3,384 | -74.20% |
| block_3 | 28,017 | 880 | 895 | 8,283 | 6,990 | -15.60% | 6,583 | 3,176 | -51.80% |
| block_4 | 2,577 | 65 | 66 | 141 | 327 | 131.90% | 6,724 | 3,142 | -53.30% |
| block_5 | 17,646 | 407 | 54 | 380 | 639 | 68.20% | 5,489 | 3,748 | -31.70% |
| block_6 | 8,345 | 503 | 175 | 388 | 520 | 34.00% | 9,044 | 4,407 | -51.30% |
| block_7-a | 7,680 | 597 | 77 | 1,269 | 1,473 | 16.10% | 5,484 | 3,249 | -40.70% |
| block_7-b | 7,748 | 597 | 77 | 1,269 | 1,416 | 11.60% | 5,484 | 3,369 | -38.60% |
| block_7-c | 7,716 | 597 | 77 | 1,269 | 1,420 | 11.90% | 5,451 | 3,422 | -37.20% |
| block_7-d | 7,772 | 597 | 77 | 1,269 | 1,392 | 9.70% | 5,446 | 3,457 | -36.50% |
| block_7-e | 7,778 | 597 | 77 | 1,269 | 1,446 | 13.90% | 5,465 | 3,366 | -38.40% |
| block_7-f | 7,789 | 597 | 77 | 1,269 | 1,445 | 13.90% | 5,459 | 3,380 | -38.10% |
| block_8 | 7,163 | 141 | 71 | 1,088 | 1,128 | 3.70% | 8,421 | 5,300 | -37.10% |
| block_9 | 28,841 | 411 | 170 | 1,500 | 1,392 | -7.20% | 12,291 | 5,693 | -53.70% |
| block_10 | 18,009 | 440 | 135 | 2,862 | 3,035 | 6.00% | 9,195 | 4,427 | -51.90% |
| block_11 | 88,925 | 1,683 | 1,700 | 6,694 | 5,897 | -11.90% | 5,212 | 4,573 | -12.30% |
| Average | 19,472 | 569 | 283 | 2,081 | 2,070 | -0.60% (1) | 7,611 | 3,834 | -49.60% (2) |

(1) percentage change of the average number of registers before and after retiming<br />

(2) average of the percentage change of the clock period before and after retiming<br />

The table shows the number of combinational gates, the number of primary inputs (PIs), and the number of primary outputs (POs). The next three columns show the number of registers before and after retiming and the percentage change. The last three columns show the clock period in picoseconds before and after retiming and the percentage change.<br />

The table shows that retiming can both increase and decrease the number of registers. Overall the number of registers decreases by 0.6%. The clock period always improves. For many of the subdesigns a large decrease of the clock period is expected because pipeline registers were added at either the primary inputs or the primary outputs.<br />
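The two summary statistics named in the footnotes can be recomputed from the rows of the table. The Python sketch below does this for the register counts and the clock periods; the data are copied from the table, while the function names are chosen here for illustration.<br />

```python
# (subdesign, registers before, registers after, clock before [ps], clock after [ps])
ROWS = [
    ("block_1", 2589, 2558, 12908, 3248),
    ("block_2", 1766, 2042, 13119, 3384),
    ("block_3", 8283, 6990, 6583, 3176),
    ("block_4", 141, 327, 6724, 3142),
    ("block_5", 380, 639, 5489, 3748),
    ("block_6", 388, 520, 9044, 4407),
    ("block_7-a", 1269, 1473, 5484, 3249),
    ("block_7-b", 1269, 1416, 5484, 3369),
    ("block_7-c", 1269, 1420, 5451, 3422),
    ("block_7-d", 1269, 1392, 5446, 3457),
    ("block_7-e", 1269, 1446, 5465, 3366),
    ("block_7-f", 1269, 1445, 5459, 3380),
    ("block_8", 1088, 1128, 8421, 5300),
    ("block_9", 1500, 1392, 12291, 5693),
    ("block_10", 2862, 3035, 9195, 4427),
    ("block_11", 6694, 5897, 5212, 4573),
]


def pct_change(before, after):
    return 100.0 * (after - before) / before


def change_of_averages(rows, before_col, after_col):
    """Percentage change between the column averages (the 1/n factors cancel)."""
    before = sum(row[before_col] for row in rows)
    after = sum(row[after_col] for row in rows)
    return pct_change(before, after)


def average_of_changes(rows, before_col, after_col):
    """Mean of the per-subdesign percentage changes."""
    changes = [pct_change(row[before_col], row[after_col]) for row in rows]
    return sum(changes) / len(changes)


reg_change = change_of_averages(ROWS, 1, 2)
clk_change_of_avg = change_of_averages(ROWS, 3, 4)
clk_avg_of_change = average_of_changes(ROWS, 3, 4)

print(f"registers, change of averages:    {reg_change:.1f}%")
print(f"clock period, change of averages: {clk_change_of_avg:.1f}%")
print(f"clock period, average of changes: {clk_avg_of_change:.1f}%")
```

With the table values, the register figure comes out at about -0.6%. For the clock period, the change of the column averages is about -49.6%, whereas the mean of the per-subdesign changes is about -45.1%, so the value in the Average row appears to correspond to the former.<br />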



Conclusion<br />

With increasing demands for faster designs and shorter time-to-market, it is important for designers to look for efficient optimization techniques. Retiming in Encounter RTL Compiler is a very powerful technique that can achieve substantial improvements in performance.<br />

In this paper we have described how RTL Compiler uses clock scheduling in a sequentially driven synthesis flow and then performs retiming, minimizing the clock period and the number of registers. We have discussed special cases of retiming: registers with enable signals, registers with a multiplexer feedback loop, and registers with asynchronous set and reset signals.<br />

With RTL Compiler it is easy to perform retiming, and the direct link to Conformal Logic Equivalence Checking provides a complete verification solution.<br />



Appendix: Encounter RTL Compiler commands for retiming<br />

Automatic synthesis with retiming<br />

It is easy to use retiming in RC: only the attribute “retime” needs to be set to true for the design or subdesign that should be retimed. Then, during synthesis, the design or subdesign is processed automatically by the sequentially driven synthesis flow with retiming as described in Section 4.<br />

set_attr retime true [subdesign]<br />

synthesize -to_mapped<br />

Manual retiming flow<br />

This flow can be used when one or more specific modules need to be retimed. It can serve as an exploratory tool to see the impact of retiming on a subdesign in a mapped design. The first step, “retime -prepare”, prepares the design for retiming, and “retime -min_delay” performs the actual retiming. Even though “retime -min_delay” performs a local mapping of the logic immediately around the flops, it is recommended to follow it with an incremental synthesis or, preferably, a global synthesis, depending on the granularity of the changes.<br />

retime -prepare [subdesign | design]<br />

retime -min_delay [subdesign | design]<br />

synthesize -to_mapped [-incr]<br />

Manual retiming flow minimizing the number of registers<br />

This flow explicitly tries to minimize the number of registers and thus the area. It should be used only for a design that has positive slack.<br />

synthesize -to_mapped<br />

retime -min_area [subdesign | design]<br />

synthesize -to_mapped [-incr]<br />

Attributes<br />

set_attr dont_retime true [flop]<br />

Do not retime the register specified.<br />

set_attr retime_hard_region true [subdesign]<br />

Retiming cannot move registers into or out of the “subdesign”.<br />

set_attr boundary_opto false [subdesign]<br />

Disable boundary optimization (constant propagation and rewiring of equivalent signals across hierarchy) and preserve the input and output pins of a subdesign. This enables easier ECO for the blocks and might be necessary for formal verification.<br />

set_attr retime_async_reset true<br />

Enable retiming on flops with asynchronous set or reset signals. The runtime may increase if registers need to be moved backward. By default, registers with asynchronous set or reset signals are excluded from retiming.<br />

set_attr retime_optimize_reset true<br />

If this attribute is used in combination with the previous attribute, the reset logic is optimized by replacing asynchronous flops with simple flops wherever possible.<br />

For more information refer to the Encounter RTL Compiler User Guide, chapter 9, “Retiming the Design”.<br />



Interface to Conformal Logic Equivalence Checker (LEC)<br />

The checkpoint files of the automatic verification flow described in Section 6 and the corresponding dofiles for LEC are generated by RC if the checkpoint attributes are set as shown below.<br />

set_attribute checkpoint_flow true<br />

set_attribute library my_library.lib<br />

read my_design.v<br />

elaborate<br />

set_attribute checkpoint_netlist_naming_style "my_chk_dir/chk_%d.v" /designs/my_top<br />

set_attribute checkpoint_dofile_naming_style "my_chk_dir/chk_%d_to_chk_%d.do" /designs/my_top<br />

read_sdc my_constraints.sdc<br />

set_attr retime true my_top<br />

synthesize -to_mapped<br />

write -m > final.v<br />

write_do_lec -revised final.v > final.do<br />

To run LEC<br />

lec -ultra -Dofile hdl_to_chk_01.do<br />

lec -ultra -Dofile chk_01_to_chk_02.do<br />

lec -ultra -Dofile final.do<br />

For more information refer to the document “Interfacing between RTL Compiler and Conformal”.<br />
