DDR4 Design Considerations - EEWeb

EEWeb PULSE TECH ARTICLE 

Continued from Part 1 

5. SYSTEM OVERVIEW 

After choosing the TDC structure, we started working 

on our design implementation following a top-down 

approach. From our point of view, and due to the large 

degree of freedom available for this design, this was the 

most logical starting point. 

5.1. Matrix structure 

Choosing an appropriate matrix structure was the first 

design issue we had to face. As already mentioned, there 

are two main considerations to take into account: 

• Minimizing the dummy structures within the matrix, since they consume power and add to the overall design area.

• Delay stage loading, since a homogeneous load for every delay stage makes the design easier and the resolution easier to control.

As discussed before, a bigger matrix yields a larger 

number of dummy structures but a more homogeneous 

capacitive load for both X and Y delay chains, as the 

row-column ratio approaches 1 and the matrix becomes 

square, making resolution easier to set. A smaller matrix 

will have the opposite effect, thus yielding a smaller 

number of dummy structures but making resolution harder 

to set. 

Keeping these two design parameters in mind, we 

derived the mathematical expressions for calculating 

them. Afterwards, we built the following table which 

summarizes all the possible solutions for our 32 stage 2-D 

Vernier TDC design, allowing us to analyze the problem 

and find the best solution: 

From this table we can conclude the following:

Figure 1: 2-D Vernier matrix figures of merit 

To achieve a square matrix, and therefore a homogeneous 

load for every delay element, we need a 17×17 matrix. 

However, this structure will yield an unacceptable number 

of dummy structures (88.93% of the matrix will consist 

of dummy structures). Hence this matrix layout was 

discarded. 

There is an interesting set of solutions, yielding a minimum 

number of columns (10 columns for 5, 6 and 7 rows). 

Naturally, the number of dummy structures increases 

with the number of rows. However, we finally chose the 7×10 matrix configuration, as shown in Figure 2, since the increase in dummy structures is not very large and its row-column ratio is the closest to 100% among those three configurations.
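The two figures of merit discussed above can be sketched numerically. The formulas below are an assumption, not the authors' published expressions: every cell beyond the 32 required stages is counted as a dummy, and the row-column ratio is the shorter side over the longer side. This assumption does reproduce the 88.93% quoted for the 17×17 matrix; the values for the 10-column candidates are computed under the same assumption and are not taken from the authors' table. The function name is illustrative.

```python
N_STAGES = 32  # number of TDC stages, as stated in the text


def matrix_figures(rows, cols, n_stages=N_STAGES):
    """Return (dummy percentage, row-column ratio percentage).

    Assumption: every matrix cell that is not one of the n_stages
    measuring cells is a dummy structure.
    """
    cells = rows * cols
    if cells < n_stages:
        raise ValueError("matrix too small for the requested number of stages")
    dummy_pct = 100.0 * (cells - n_stages) / cells
    ratio_pct = 100.0 * min(rows, cols) / max(rows, cols)
    return dummy_pct, ratio_pct


# Candidate sizes mentioned in the text.
for rows, cols in [(5, 10), (6, 10), (7, 10), (17, 17)]:
    dummy_pct, ratio_pct = matrix_figures(rows, cols)
    print(f"{rows}x{cols}: {dummy_pct:.2f}% dummies, "
          f"row-column ratio {ratio_pct:.0f}%")
```

Under this assumption the trade-off is visible directly: moving from 5 to 7 rows raises the dummy share while bringing the row-column ratio closer to 100%.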

Figure 2: 7x10 TDC matrix structure (axes: Delay x and Delay y; cells numbered 1 to 32)

5.2. Delay chains

As seen in the previous chapters, the Vernier delay line 

architecture uses two different delay chains. While in the 

linear Vernier delay line architecture these chains only 

differ in the nominal propagation delay of each element 

(i.e. the propagation delay measured when the other 

input signal is tied to ground), 2-D Vernier delay chains 

also differ in the number of elements they are made of. 

For instance, a 32 stage linear Vernier TDC would need two delay chains of 32 delay elements each, while our 2-D Vernier TDC would need only one chain of 10 delay elements (X delay chain) and another one of 7 delay elements (Y delay chain).

For the delay stages within the chains, we decided to 

use non-inverting buffer structures as the main building 

blocks. These structures have a longer propagation delay than a single inverter, but they provide the delay

comparison and encoding stages with a very simple 

time information format. For this purpose, we created 

two different components within our library, called BUF_X 

and BUF_Y. 

Since the Y delay chain has to be faster than the X delay chain but its capacitive load per delay element is, by construction, larger than that of the X chain, it makes sense to start by increasing the size of the BUF_Y transistors. However, since we are only dealing with low-to-high transitions, this can be achieved by increasing only the size of the pMOS transistor in the second inverter within the BUF_Y structure. For the BUF_X component, we initially set both the pMOS and nMOS transistors to minimum size.

Besides setting the minimum size, we added an extra delay element at the end of both delay chains. This final delay element was left open (it actually drives a 1 GΩ resistor to avoid Cadence WARNING messages), and its only purpose is to equalize the capacitive load of every element within the chain.

We also included an extra input delay stage on both chains, called FIX_DELAY within our library. These structures were used during the first design tests to feed the delay chains with signals that do not depend on the rise and fall times of the inputs.

While FIX_DELAY elements remained unchanged through 

the design process, BUF_X and BUF_Y buffers were 

resized and optimized to achieve the desired resolution. 
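As a rough illustration of why the BUF_X and BUF_Y delays set the resolution, the following sketch models the linear Vernier principle: if START travels through the slower chain and STOP through the faster one, STOP gains one resolution step per stage. The routing of START to the X chain, the 60 ps and 50 ps buffer delays (chosen to give a 10 ps step) and the function name are all assumptions made for illustration. This is a behavioral sketch of the linear case, not the authors' circuit or the 2-D arrangement.

```python
def vernier_stage(dt_ps, tau_x_ps, tau_y_ps, n_stages):
    """Return the first stage (1-based) at which STOP catches up with START.

    dt_ps      -- time by which START leads STOP at the chain inputs
    tau_x_ps   -- delay of one slow (X) element, hypothetical value
    tau_y_ps   -- delay of one fast (Y) element, hypothetical value
    Returns None if the time difference exceeds the chain's range.
    """
    resolution_ps = tau_x_ps - tau_y_ps
    if resolution_ps <= 0:
        raise ValueError("the X chain must be slower than the Y chain")
    lead_ps = dt_ps
    for stage in range(1, n_stages + 1):
        lead_ps -= resolution_ps  # STOP gains one step per stage
        if lead_ps <= 0:
            return stage
    return None


print(vernier_stage(160, 60, 50, 32))  # 16
```

Only the difference between the two buffer delays matters for the step size, which is why resizing BUF_X and BUF_Y is enough to tune the resolution.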

5.3. Delay comparators 

The time difference between the START and STOP signals is measured using several memory elements, which capture the moment when the START signal is overtaken by the STOP signal. Following this principle, the TDC generates a 32-bit pseudo thermometer code, in which the delay information is held in the position of the transition from 1 to 0. Finally, this code is passed to the 5-bit encoding circuit.
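The encoding step can be sketched as locating the 1-to-0 transition in the 32-bit word. The convention below (the code is the index of the first 0 that follows a 1) is an assumption; this part of the article does not describe how the authors' encoder maps positions to 5-bit values or how it handles the "pseudo" irregularities of the code. The function name is illustrative.

```python
def encode_transition(bits):
    """Return the 0-based index of the first 0 that follows a 1.

    bits is a sequence of 0/1 values. Returns None when no 1 -> 0
    transition exists (for example, all ones or all zeros).
    """
    for i in range(len(bits) - 1):
        if bits[i] == 1 and bits[i + 1] == 0:
            return i + 1
    return None


# Example: 16 ones followed by 16 zeros.
code = encode_transition([1] * 16 + [0] * 16)
print(code, format(code, "05b"))  # 16 10000
```

With a 32-bit input the returned index lies between 1 and 31, so it fits in 5 bits under this convention.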

Choosing among all the available memory elements for this task, we followed the recommendations given in [1] and used a NAND gate based S-R latch as the basic delay comparison element. The main advantage of this structure is its symmetry with respect to the S and R signals, which helps us achieve a more homogeneous capacitive loading for both the X and Y delay chains.

In addition, we included an inverting buffer at each output, as recommended in [1], making this device less sensitive to output loading variations and protecting the design against unwanted non-linear behavior.
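The behavior of a NAND-based S-R latch with active-low inputs can be sketched at the logic level as follows. This is a zero-delay model of the bare cross-coupled NAND pair only: the inverting output buffers, transistor sizing and real timing are not modeled, and the way the rising edges are applied in the usage example is an assumption about how such a latch acts as an arrival-order detector, not a description of the authors' testbench.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def sr_latch_nand(s_n, r_n, q, q_n):
    """Settle a cross-coupled NAND latch with active-low inputs.

    s_n, r_n -- active-low set and reset inputs (S#, R#)
    q, q_n   -- current outputs (Q, Q#)
    Returns the settled (Q, Q#) pair.
    """
    for _ in range(4):  # a few passes are enough for the loop to settle
        new_q = nand(s_n, q_n)
        new_q_n = nand(r_n, new_q)
        if (new_q, new_q_n) == (q, q_n):
            break
        q, q_n = new_q, new_q_n
    return q, q_n


print(sr_latch_nand(0, 1, 0, 1))  # set:   (1, 0)
print(sr_latch_nand(1, 0, 1, 0))  # reset: (0, 1)
print(sr_latch_nand(1, 1, 1, 0))  # hold:  (1, 0)
print(sr_latch_nand(0, 0, 0, 1))  # both inputs low: (1, 1)

# Arrival-order use: both inputs start low, then rise one after the other.
state = sr_latch_nand(0, 0, 0, 1)       # (1, 1) while both are low
state = sr_latch_nand(1, 0, *state)     # S# rises first -> (0, 1)
state = sr_latch_nand(1, 1, *state)     # R# rises later, state is held
print(state)                            # (0, 1)
```

Because the two NAND gates are identical and cross-coupled, swapping the roles of S# and R# simply mirrors the outputs, which is the symmetry the text relies on.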


Figure 3: Delay comparator (gate level; signal labels S#, R# and Q#)

Figure 4: S-R latch truth table

Special care has to be taken when connecting the feedback and input signals to the NAND pull-down network because of data-dependent delay. Indeed, the nominal propagation

delay of each element within the chain can be affected 

by the S-R latch current state, introducing non-linear 

effects. In particular, for this TDC architecture, this effect 

becomes quite significant since there are several S-R 

latches connected to the same delay element output. 

Figure 5 shows the two possible configurations. 

Figure 5: Delay comparator (transistor level), Config 1 and Config 2 (labels: VDD, GND, S#, R#, FB1, FB2)

We obtained some interesting results while testing both configurations, which are shown in Figure 6. In particular, this figure shows the propagation delay for each delay element within the X delay chain for both configurations. For this purpose we used a 10 ps resolution configuration and a time delay of 160 ps between the START and STOP
