DDR4 Design Considerations - EEWeb

EEWeb PULSE TECH ARTICLE 

signals. Therefore, the STOP signal catches the START signal in the 16th delay comparator (2nd row, 3rd column).

As can be seen in Figure 6, the propagation delay of the first configuration remains constant over the first and second columns, but after that it starts to decrease linearly due to a change in the capacitive load (data dependent delay). This is an undesired effect that must be avoided to prevent non-linear behavior. By contrast, with the second configuration, each delay element in the X delay chain achieves a more uniform propagation delay that is almost data independent.

Although this S-R latch configuration increases the nominal propagation delay, this is not a major issue, since the delay can be readjusted by sizing the delay elements. Therefore, we use the second configuration in our SR_LATCH library component.
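A common way to quantify the non-linearity that a data dependent delay introduces is the differential and integral non-linearity (DNL/INL) of the TDC bins. The sketch below is a generic illustration of those two definitions, not the analysis behind Figure 6; the function name `dnl_inl` and the bin widths fed to it are hypothetical, chosen only to contrast a uniform set of bins with one whose widths shrink after the second bin.

```python
from itertools import accumulate
from statistics import mean


def dnl_inl(bin_widths_ps):
    """Return (DNL, INL) lists in LSB for a sequence of TDC bin widths.

    DNL_k = w_k / mean(w) - 1, and INL_k is the running sum of DNL up to bin k.
    """
    lsb = mean(bin_widths_ps)                     # average bin width taken as 1 LSB
    dnl = [w / lsb - 1.0 for w in bin_widths_ps]  # per-bin deviation in LSB
    inl = list(accumulate(dnl))                   # accumulated deviation in LSB
    return dnl, inl


# Hypothetical bin widths in ps (illustration only, not data read from Figure 6)
uniform = [9.0] * 8
drifting = [9.0, 9.0, 8.8, 8.6, 8.4, 8.2, 8.0, 7.8]

for name, widths in (("uniform", uniform), ("drifting", drifting)):
    dnl, inl = dnl_inl(widths)
    print(name, "max |DNL| =", round(max(map(abs, dnl)), 3),
          "max |INL| =", round(max(map(abs, inl)), 3), "LSB")
```

A data independent delay element keeps the bins uniform, so both DNL and INL stay at zero; a load dependent drift shows up directly as non-zero DNL and a growing INL.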

Figure 6: Data dependent analysis for both configurations (propagation delay in ps, axis from 65 to 85 ps, versus delay stage X1-X2 to X9-X10 of the X delay chain; series: Config 1, Nom. Config 1, Config 2, Nom. Config 2)

This concludes the basic description of our TDC delay comparator. However, as discussed in chapter 6, the TDC matrix introduces a large number of dummy S-R latches, so we replaced them with simpler structures that present roughly the same capacitive load, saving area and power. Hence, a new library component, called DUMMY, was created.

5.4. Readout encoder 

Finally, we considered the 32-to-5 encoder needed for the readout circuit. Since the available testbench already provided a fairly good encoder, we decided to focus our effort on the matrix optimization.
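The article does not describe the reused encoder in detail. As a hedged behavioral sketch, assume the 32 comparator outputs, ordered by comparator number, form a thermometer code: comparators before the catch point read 0 and those from the catch point onward read 1 (this polarity is an assumption). A 32-to-5 priority encoder then returns the index of the first 1 as a 5-bit code. The function `encode_32_to_5` is hypothetical and is not the testbench encoder.

```python
def encode_32_to_5(caught):
    """Behavioral 32-to-5 priority encoder (hypothetical model).

    caught[k] == 1 means comparator k + 1 saw STOP catch START (assumed polarity).
    Returns (valid, code): code is the 0-based index of the first 1 and fits in 5 bits.
    """
    if len(caught) != 32:
        raise ValueError("expected 32 comparator outputs")
    for k, bit in enumerate(caught):
        if bit:
            return True, k
    # No comparator fired: assumed here to mean the interval is out of range
    return False, 0


# STOP catches START in comparator 16 -> 0-based index 15
valid, code = encode_32_to_5([0] * 15 + [1] * 17)
print(valid, format(code, "05b"))  # True 01111
```

The `valid` flag keeps the "nothing fired" case distinguishable from a catch in comparator 1, since both would otherwise encode to 00000.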

6. TUNING THE DESIGN 

Having described the main design features in the previous chapter, we now present the strategies we followed to arrive at our current design.

6.1. Area minimization 

Since our design uses a large number of transistors, area minimization was a critical goal, and it also allowed us to improve the design performance. In particular, we focused on the TDC matrix structure and devised several ways to reduce its area.

6.1.1. Y chain load reduction 

We realized that, in each row of the matrix, at most five of the ten devices are actually useful for delay comparison. Hence, the Y delay chain was excessively loaded, which penalized both the acquisition time and the circuit area for a given resolution.

As presented in the previous chapter, our S-R latch configuration makes the circuit almost data independent. This property allowed us to disconnect the unused S-R latches from the Y chain and tie those inputs to ground without harming circuit performance. Figure 7 shows the TDC matrix after this optimization step, where red dots mark the S-R latches that were disconnected from the Y delay chain. Three dummy S-R latch structures still remained to balance the capacitive load.

Figure 7: Y delay chain optimization (TDC matrix with axes Delay x and Delay y; delay comparators numbered column by column):

1   8   15   22   29
2   9   16   23   30
3   10   17   24   31
4   11   18   25   32
5   12   19   26
6   13   20   27
7   14   21   28

As a result, the Y chain load was greatly reduced, which in turn reduced the propagation delay of each delay element in the chain. Furthermore, thanks to our S-R latch structure, the propagation delay of each delay element in the X delay chain remained almost constant.
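Figure 7 numbers the 32 delay comparators column by column over 7 rows and 5 columns. The small helper below (hypothetical, for illustration only) converts a comparator number into its (row, column) position under that column-major assumption. It reproduces the earlier statement that comparator 16 sits in the 2nd row, 3rd column, and it shows that the 7 x 5 grid leaves three positions (rows 5 to 7 of the last column) without a numbered comparator.

```python
ROWS = 7            # rows in the Figure 7 matrix
COLS = 5            # columns in the Figure 7 matrix
N_COMPARATORS = 32  # numbered delay comparators


def position(n):
    """Return the 1-based (row, column) of delay comparator n, numbered column by column."""
    if not 1 <= n <= N_COMPARATORS:
        raise ValueError(f"comparator number must be in 1..{N_COMPARATORS}")
    return (n - 1) % ROWS + 1, (n - 1) // ROWS + 1


# Rebuild the Figure 7 layout; cells without a comparator are shown as '--'
grid = [["--"] * COLS for _ in range(ROWS)]
for n in range(1, N_COMPARATORS + 1):
    r, c = position(n)
    grid[r - 1][c - 1] = f"{n:2d}"
for row in grid:
    print(" ".join(row))

print(position(16))  # (2, 3)
```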

6.1.2. Dummy structures 

After the Y chain optimization, most of the S-R latches had one of their inputs tied to ground. Looking at their gate-level schematic, we drew the following conclusions:

• Since one NAND gate has an input tied to ground, its output is permanently at logic 1. This output drives one of the inputs of the other NAND gate in the component.

• Removing the former NAND gate, along with its associated inverter, reduces the circuit area while keeping the propagation delay of the X chain delay elements constant.

• However, since these structures do not switch, this optimization does not affect dynamic power consumption.
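The first two bullets can be checked with a behavioral model of a cross-coupled NAND S-R latch with active-low inputs S# and R#. The sketch assumes, for illustration, that S# is the grounded input (the text does not say which one; the argument is symmetric), and the helper names `nand` and `settle_sr_latch` are hypothetical. With S# = 0 the S-side NAND always outputs 1, so the remaining NAND reduces to an inverter of R#, whatever the previous latch state was.

```python
def nand(a, b):
    """Two-input NAND on logic levels 0/1."""
    return 0 if (a and b) else 1


def settle_sr_latch(s_n, r_n, q, q_n, max_steps=10):
    """Iterate a cross-coupled NAND S-R latch (active-low S#, R#) until it is stable.

    Q = NAND(S#, Q#) and Q# = NAND(R#, Q); returns the settled (Q, Q#).
    """
    for _ in range(max_steps):
        new_q = nand(s_n, q_n)
        new_q_n = nand(r_n, new_q)
        if (new_q, new_q_n) == (q, q_n):
            break
        q, q_n = new_q, new_q_n
    return q, q_n


# Tie S# to ground and sweep R# from every possible previous state
for r_n in (0, 1):
    for q0, qn0 in ((0, 0), (0, 1), (1, 0), (1, 1)):
        q, q_n = settle_sr_latch(0, r_n, q0, qn0)
        print(f"R#={r_n} start=({q0},{qn0}) -> Q={q}, Q#={q_n}")
```

In every case Q settles to 1 and Q# to NOT R#, so the grounded-side NAND is redundant and the surviving gate sees a constant 1 on one input.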

With these ideas in mind, we obtained the following structure for our DUMMY component, which is essentially a capacitive load (Figure 8):

6.2. Resolution optimization 

As shown in section 3.2, our TDC resolution is given by 

the following relationships: 

30 EEWeb | Electrical Engineering Community 

Visit www.eeweb.com 

Figure 8: Dummy component (transistor level; schematic between VDD and GND with input R#)

NOTE: A further area optimization is easily achieved by removing one of the pMOS transistors in the NAND gate, since it is always OFF. However, because we realized this only later, we could not include it in the submitted TDC design.

6.1.3. S-R latch optimization 

Our final area improvement concerned the active S-R latches themselves. Since we only used one of the latch outputs, we removed the unused output together with its associated inverter. This modification did not produce any noticeable change in the propagation delay of the delay elements.

Figure 9: Final S-R latch component (transistor level; schematic between VDD and GND with inputs S# and R#)

The following table summarizes the relevant parameters needed for a target resolution between 5 and 10 ps:

Figure 10: TDC resolution chart 

Therefore, simply by sizing the delay elements in both delay chains, we can achieve any required resolution.

However, compared with the linear Vernier delay line architecture, our TDC architecture has two major drawbacks:

• Our resolution depends entirely on the propagation delay of each delay element, as can be seen in the table above. Ignoring this fact causes the system to behave unpredictably and in a highly non-linear way. Hence, larger delay elements are needed when aiming for high resolution.

• Due to the matrix configuration, each delay element has to drive a larger load. Again, sizing becomes a constraint for a given resolution, limiting our design space.
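For comparison, the standard linear Vernier delay line relation is summarized below. This is textbook background on the reference architecture, not one of the section 3.2 relationships for our matrix. The symbols are assumptions introduced here: START leads STOP by $\Delta t$, START runs through a slow chain with stage delay $\tau_s$, and STOP through a fast chain with $\tau_f < \tau_s$; STOP catches START at the first stage $k$ where it has made up the initial gap.

```latex
% Linear Vernier delay line (background relation, symbols assumed as in the lead-in)
% \Delta t = t_{\mathrm{stop}} - t_{\mathrm{start}} > 0, \quad \tau_f < \tau_s
\begin{align}
  t_{\mathrm{stop}} + k\,\tau_f \le t_{\mathrm{start}} + k\,\tau_s
    \iff \Delta t \le k\,(\tau_s - \tau_f) \\
  k^{\star} = \left\lceil \frac{\Delta t}{\tau_s - \tau_f} \right\rceil,
    \qquad \mathrm{LSB} = \tau_s - \tau_f
\end{align}
```

For instance, hypothetical stage delays of $\tau_s = 80$ ps and $\tau_f = 71$ ps would give a 9 ps LSB. In the linear line only the difference of the two stage delays sets the resolution, whereas, as the first bullet states, the matrix resolution depends on the propagation delay of each individual delay element.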

For these reasons, we aimed for a 9 ps resolution, which is slightly better than the one proposed in the midterm report and does not require excessively large transistors.
