Xcell Journal: The authoritative journal for programmable ... - Xilinx
written in a programmable latency buffer to match the trigger latency. A number of data verification checks are carried out. The data is finally formatted into 64-bit words and written into FIFOs. From there, it is read out by the event builder on the board.
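The readout steps above (a programmable-depth latency buffer followed by packing into 64-bit words) can be modelled in a few lines. This is a hypothetical Python model, not the board's firmware. It assumes 16-bit input words from the 2-byte data path and packs four of them per 64-bit word with the first word in the low bits. It also leaves out the data verification checks, whose details the article does not give.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional


class LatencyBuffer:
    """Hypothetical model of a programmable latency buffer: a delay line
    whose depth (in clock cycles) is chosen at run time to match the
    trigger latency."""

    def __init__(self, latency: int) -> None:
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        # Pre-fill with None so the first `latency` outputs are empty slots.
        self.pipe: Deque[Optional[int]] = deque([None] * latency, maxlen=latency)

    def step(self, word: int) -> Optional[int]:
        """Clock one word in; return the word written `latency` cycles ago."""
        oldest = self.pipe[0]
        self.pipe.append(word)  # maxlen makes the deque drop the oldest entry
        return oldest


def pack64(words16: Iterable[int]) -> Iterator[int]:
    """Pack 16-bit words into 64-bit words, first word in bits [15:0]
    (an assumed ordering). A trailing group of fewer than four words
    is not emitted."""
    acc, n = 0, 0
    for w in words16:
        acc |= (w & 0xFFFF) << (16 * n)
        n += 1
        if n == 4:
            yield acc
            acc, n = 0, 0


if __name__ == "__main__":
    buf = LatencyBuffer(latency=3)
    print([buf.step(w) for w in [0x11, 0x22, 0x33, 0x44, 0x55]])
    # prints [None, None, None, 17, 34]
    print(hex(next(pack64([0x1111, 0x2222, 0x3333, 0x4444]))))
    # prints 0x4444333322221111
```

A deque with `maxlen` behaves like a shift register here. Changing the `latency` argument plays the role of reprogramming the buffer depth.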
Without going into the details of the functionality, we will focus on the issues we faced (and solved) in getting the real hardware to produce correct data, particularly those involving the RocketIO transceivers.
Much of what we learned came through trial and error. The main issue was related to the reference clock, which we’ll describe in detail in the next section.
The other significant issue we faced was the alignment of the K character within the 2-byte data path of the received data. We were initially using the Gigabit_Ethernet primitive in half-rate mode for a 2-byte data path, but we observed that not all of the channels put the K character in the same place within the 2-byte word, and there was no way to force this alignment in the Gigabit_Ethernet primitive (the ALIGN_COMMA_MSB parameter of this primitive is set to FALSE by default). Because our protocol expected the K character always to appear in the LSB of the word, we switched to the GT_CUSTOM primitive, where we could force the alignment and then swap the K character into the LSB of the data. The simulations showed perfect alignment, but in real hardware some of the channels were still getting misaligned.
A colleague referred us to the design note on 32-bit word comma alignment in the RocketIO transceiver user guide. Although that scheme is usually needed only for a 4-byte data path, we implemented a similar scheme for our 2-byte data path, and it fixed our misalignment problem.
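The byte-lane realignment idea can be illustrated with a small behavioural model. This sketch is hypothetical and is not the scheme from the user guide. It assumes that bits [15:8] of each received word hold the earlier of the two bytes, and it locks on the first K character it sees. If that K landed in the MSB lane, the model shifts the stream by one byte, using a one-word delay register and a byte multiplexer, so that every K ends up in the LSB lane. The value 0xBC used below is the data byte of the K28.5 comma.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Word16:
    """One 2-byte receive word. ASSUMPTION: `hi` (bits 15:8) is the byte
    received first and `lo` (bits 7:0) the byte received second."""
    hi: int
    lo: int
    k_hi: bool = False  # K-character flag for the hi byte
    k_lo: bool = False  # K-character flag for the lo byte


def realign_k_to_lsb(words: Iterable[Word16]) -> Iterator[Word16]:
    """Hypothetical realigner: lock on the first K character, then emit
    words in which every K sits in the LSB (lo) lane. Words that arrive
    before the first K are discarded."""
    shifted: Optional[bool] = None  # unknown until the first K is seen
    prev: Optional[Word16] = None
    for w in words:
        if shifted is None:
            if w.k_lo:
                shifted = False  # already aligned: pass words through
            elif w.k_hi:
                shifted = True   # K in MSB lane: shift stream by one byte
        if shifted is False:
            yield w
        elif shifted is True and prev is not None:
            # One-word delay + byte mux: the previous lo byte moves to the
            # hi lane, the current hi byte (which may hold K) to the lo lane.
            yield Word16(prev.lo, w.hi, prev.k_lo, w.k_hi)
        prev = w


if __name__ == "__main__":
    raw = [Word16(0x11, 0x22),
           Word16(0xBC, 0x33, k_hi=True),
           Word16(0xBC, 0x44, k_hi=True)]
    for aligned in realign_k_to_lsb(raw):
        print(hex(aligned.hi), hex(aligned.lo), aligned.k_lo)
    # prints "0x22 0xbc True" then "0x33 0xbc True"
```

Locking once and keeping the shift fixed mirrors how a hardware realigner holds its mux setting until the link is reset. The model does not handle re-locking after loss of alignment.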
Clock, Programming, and JTAG<br />
We cannot overemphasize the need for a high-quality reference clock. Besides satisfying all of the criteria specified in the RocketIO user guide, we made sure that our reference clock was as clean as we could possibly get (see Figure 2).
We used a quartz-based phase-locked loop (QPLL) circuit developed at CERN to provide our system with a low-jitter clock source (100 ps peak-to-peak jitter). We found that many of the performance problems of the RocketIO devices could be traced to a noisy or jittery reference clock. If you are using RocketIO transceivers on both halves of the chip, it is much better to have two reference clocks. We believe this helps even if you are running the RocketIO transceivers in half-rate mode (as we are).
Figure 2 – Clock jitter measurement
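A peak-to-peak figure such as the 100 ps quoted above is commonly derived from the time interval error (TIE) of each clock edge relative to an ideal clock. The article does not describe how Figure 2 was obtained, so the following Python function is only a sketch of that common calculation. It assumes edge timestamps in integer picoseconds and an ideal clock that starts at the first measured edge and runs at a known nominal period.

```python
from typing import Sequence


def tie_peak_to_peak(edges_ps: Sequence[int], period_ps: int) -> int:
    """Peak-to-peak time interval error of measured clock edges.

    edges_ps:  timestamps of successive rising edges, in picoseconds.
    period_ps: nominal clock period, in picoseconds.
    The ideal clock is assumed to be aligned to the first measured edge.
    """
    if len(edges_ps) < 2:
        raise ValueError("need at least two edges")
    t0 = edges_ps[0]
    # Deviation of each edge from where an ideal clock would place it.
    tie = [t - (t0 + k * period_ps) for k, t in enumerate(edges_ps)]
    return max(tie) - min(tie)


if __name__ == "__main__":
    # Four edges of a nominal 10 ns clock: TIE = [0, +50, -20, 0] ps
    print(tie_peak_to_peak([0, 10_050, 19_980, 30_000], 10_000))  # prints 70
```

A measurement instrument would typically estimate the ideal clock from the data (for example, with a linear fit) rather than assume the nominal period, so that a small frequency offset is not reported as jitter.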
Another aspect of our clocking scheme was to pass the reference clock through a global clock buffer after the input global differential clock buffer. We observed improved stability, and FPGA Editor showed a more uniform distribution of the reference clock.
Also, though not directly related to the high-speed transceivers, we found that independent post-configuration DCM reset logic (usually recommended if you have an external feedback clock) is useful even when using internal feedback. It solved a problem in which the DCMs sometimes failed to lock after reconfiguration. Xilinx Technical Support helped us find the solution (Xilinx Answer Record 14425).
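Independent post-configuration reset logic of this kind can be pictured as a small state machine. The Python model below is a hypothetical sketch, not the circuit described in the Answer Record. It holds the DCM reset high for a fixed number of cycles after configuration, releases it, and waits for the LOCKED output. If lock does not arrive within a timeout, or is lost later, it pulses reset again. The cycle counts are illustrative only.

```python
from enum import Enum, auto


class State(Enum):
    HOLD = auto()    # driving RST high
    WAIT = auto()    # RST released, waiting for LOCKED
    LOCKED = auto()  # DCM locked, RST stays low


class DcmResetSequencer:
    """Cycle-based model of a hypothetical post-configuration DCM reset.
    reset_cycles and lock_timeout are illustrative, not vendor values."""

    def __init__(self, reset_cycles: int = 3, lock_timeout: int = 10) -> None:
        self.reset_cycles = reset_cycles
        self.lock_timeout = lock_timeout
        self.state = State.HOLD  # start in reset right after configuration
        self.count = 0

    def step(self, locked: bool) -> bool:
        """Advance one clock cycle given the DCM LOCKED input;
        return the RST value to drive during this cycle."""
        if self.state is State.HOLD:
            self.count += 1
            if self.count >= self.reset_cycles:
                self.state, self.count = State.WAIT, 0
            return True
        if self.state is State.WAIT:
            if locked:
                self.state = State.LOCKED
            else:
                self.count += 1
                if self.count >= self.lock_timeout:
                    self.state, self.count = State.HOLD, 0  # retry the reset
            return False
        # State.LOCKED: return to reset if lock is ever lost.
        if not locked:
            self.state, self.count = State.HOLD, 0
        return False


if __name__ == "__main__":
    seq = DcmResetSequencer(reset_cycles=2, lock_timeout=3)
    # The DCM never locks here, so the reset is retried after the timeout.
    print([seq.step(locked=False) for _ in range(6)])
    # prints [True, True, False, False, False, True]
```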
As for programming and JTAG, we used the same group of EPROMs to configure eight of the nine FPGAs. One of these FPGAs is the master and provides the configuration clock for all of the devices in the chain. The ninth FPGA has a different pinout and its own separate EPROM.
All of the devices are connected in the same JTAG chain, which reduced reprogramming time, particularly during the test stages. We found that a pull-up resistor is needed on the TDO output of each Xilinx device, something we hope Xilinx will integrate in future devices. We also use the JTAG chain to check the board interconnections after assembly.
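Boundary-scan interconnect testing of this kind drives test patterns from one device's output cells and captures them at the receiving pins. The article does not say which patterns were used. The Python sketch below shows a simple walking-ones pass against a hypothetical board model, in which a function maps the driven vector to the captured vector. A real test would typically add a walking-zeros pass to help tell wired-AND shorts apart from opens.

```python
from typing import Callable, Dict, List

# Hypothetical board model: takes the bits driven onto N nets by boundary-scan
# output cells and returns the bits captured by the receiving input cells.
Board = Callable[[List[int]], List[int]]


def walking_ones(board: Board, n_nets: int) -> Dict[int, List[int]]:
    """Drive a single 1 on each net in turn (all others 0) and record,
    for each pattern, which nets captured a value different from the one
    driven. An empty result means this pass detected no fault."""
    failures: Dict[int, List[int]] = {}
    for i in range(n_nets):
        driven = [1 if j == i else 0 for j in range(n_nets)]
        captured = board(driven)
        bad = [j for j in range(n_nets) if captured[j] != driven[j]]
        if bad:
            failures[i] = bad
    return failures


def faulty_board(driven: List[int]) -> List[int]:
    """Example board: nets 1 and 2 shorted (wired-OR), and net 3 open
    and reading as 0."""
    captured = list(driven)
    captured[1] = captured[2] = driven[1] | driven[2]
    captured[3] = 0
    return captured


if __name__ == "__main__":
    print(walking_ones(faulty_board, 4))  # prints {1: [2], 2: [1], 3: [3]}
```

The short shows up as each shorted net echoing the other's 1, while the open net fails only when it is the one being driven.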
Conclusion<br />
In this article, we’ve shown the advantages of using embedded deserializers instead of discrete components on a large project. By using nine 456-pin FPGAs to do the same job as 105 TQFPs, we saved time in both the design and debugging phases. The approach is also flexible, because the FPGAs are reprogrammable, and it is more economical in the long term.
We are currently considering migrating to a larger Xilinx device as the processing demands on the FPGAs increase. We are therefore studying the new devices available and how such a migration would affect our PCB design, particularly the routing of the high-speed lines.
We believe that by following high-speed design rules, such as clean clock distribution, power supply filtering, and good routing of the internal reference clocks, it is possible to achieve a successful design in a reasonable time. For more information, please write to us at jc.silva@cern.ch or adarsh.jain@cern.ch.
Winter 2004 Xcell Journal 91