Automotive Innovators Hit High Gear in - Xilinx

More documents

Recommendations

Info

XCELLENCE IN AUTOMOTIVE & ISM sync _ inp 1 6 Y_in 1 2 Y_in 2 3 Y_in 3 4 Y_in 4 5 Y_in 5 1 sel 2 A0 3 A1 4 A2 5 A3 6 A4 7 A5 8 A6 9 A7 10 A8 sel in 0 in 1 in 2 in 3 in 4 in 5 in 6 in 7 in 8 synch counter and reset for the memory addressing, some simple binary logic and registers to implement appropriate delays. The blue shading and the Xilinx logo indicate the System Generator primitive blocks, each of which corresponds to optimized synthesizable HDL code. Through this graphical interface, System Generator allows the algorithm developer to easily implement a DSP algorithm in hardware without having any knowledge of HDL coding techniques. We use the second main subsystem, the 5x5 FIR kernel (shown in blue in Figure 3), to implement the convolution operation. It receives in input a block of 25 pixels from the line buffer block, multiplies them by the 25 coefficients of the Gaussian mask and accumulates the result into a register. We show the details of the implementation in Figure 5. To fit the performance of the tar- sync In 1 4 tapsdelay 4 counter reset _out Out 1 Out 2 In 1 Out 3 Out 4 4 tapsdelay 0 Out 1 Out 2 In 1 Out 3 Out 4 4 tapsdelay 1 Out 1 Out 2 In 1 Out 3 Out 4 4 tapsdelay 2 Out 1 Out 2 In 1 Out 3 Out 4 4 tapsdelay 3 out mux 9 A (latency =2) Out 1 Out 2 Out 3 Out 4 1 Out A 11 B0 12 B1 13 B2 14 B3 15 B4 16 B5 17 B6 18 B7 19 B8 sel A0 A1 A2 A3 A4 A5 A6 A7 A8 B0 B1 B2 B3 B4 B5 B6 B7 B8 C0 C1 C2 C3 C4 C5 C6 addr z -2 ROM A addr z -2 ROM B addr z -2 ROM C z -3 Delay 2 Out A Out B Out C 3 mux 9 (latency = 2 ) sel in 0 in 1 in 2 in 3 in 4 in 5 in 6 in 7 in 8 out mux 9 B (latency =2) 2 Out B Sel Sel Sel z -3 Delay 3 coeff _from _rom mux _out DSP 48 Macro (latency = 4 ) A coeff _from _rom mux _out DSP 48 Macro (latency = 4 ) B coeff _from _rom mux _out DSP 48 Macro (latency = 4 ) C 20 C0 21 C1 22 C2 23 C3 24 C4 25 C5 26 C6 P P P sel in 0 in 1 in 2 in 3 in 4 in 5 in 6 in 7 in 8 d en d en d en z q -1 Register 1 z q -1 Register 2 z q -1 Register 4 mux 9 C latency =3 out get device and to achieve a specified sample rate of 9 million samples per second (MSPS), we chose to use three multiplyaccumulators (which we implemented using the FPGA’s DSP48 programmable Multiply Accumulator functional units) in parallel. We dedicated each of them, at most, to nine multiplications, via time-division multiplexing (see the larger System Generator blocks in the top diagram of Figure 5). We split and stored the set of 25 coefficients of the mask in three ROMs that we implemented as distributed RAM (another memory resource the FPGA provides). In our implementation, we chose to fix the coefficients of the mask at compile time. However, we could have easily updated the design to accept these values as real-time inputs. In this way we could dynamically change the run-time intensity of the filtering based on environmental conditions. (Implementation Note: Because of the isotropic shape of the Gaussian, the kernel could be decomposed in two separable masks and the filtering could be obtained as two consecutive convolutions with 1x5 and 5x1 masks along the horizontal and vertical directions. Even though this approach would reduce the FPGA resources required, we didn’t adopt it to avoid losing generality and readability of the design.) Another powerful feature found in System Generator is the ability to apply a preload MATLAB function to customize the design before compilation. By setting parameters of the System Generator module (such as image resolution, FIR kernel values and the number of computational precision bits) and initializing the workspace signals we used during the run-time simulation, we can quickly and easily experiment with different processing 24 Xcell Journal Fourth Quarter 2008 3 Out C ↓9 z -1 Down Sample ↓9 z -1 Down Sample 1 ↓9 z -1 Down Sample 2 Figure 5 – System Generator implementation of FIR (convolution) filter a b z a + b -1 AddSub 1 z -1 Delay a b z a + b -1 AddSub 3 1 sel z cast -1 Convert 1 1 filtered [a:b] Slice 1 [a:b] Slice 2 in 0 3 in 1 4 in 2 5 in 3 6 in 4 7 in 5 8 in 6 9 in 7 10 in 8 z -1 Delay 1 sel d0 d1z d2 d3 -1 sel d0 d1 z d2 d3 -1 Mux Mux 1 z Delay -1 sel d0 d1 z -1 d2 d3 Mux6 out 1
methods and sets of input data. Furthermore, after we completed the simulation, a Stop MATLAB function is called to display the results by computing some useful information and comparing the fixed-point results against the floatingpoint reference ones. This methodology allows algorithm developers to closely analyze any portion of the hardware implementation and compare it to the original software model to verify compliance. System Generator FPGA Synthesis Results Those developing driver assistance systems must implement their designs at a cost level appropriate for high-volume production. The die resources needed to achieve a certain level of processing performance will define the size of the FPGA device they require and, therefore, its cost. In our lane departure warning preprocessor implementation, we target the XA Spartan-3A DSP 3400, currently the largest device available in the Xilinx automotive product line. We took this approach to support future planned development activities with this model, but analysis of the resources consumed for the preprocessing function clearly shows that this design would fit into a much smaller device. The following table reports the resources occupied by the GNR block on the XA Spartan-3A DSP 3400 device. The estimation assumes gray-level input images at VGA resolution at a 30-Hz frame rate (which implies an input data rate of 9.2 MSPS). DSP48s 3 out of 126 2% BRAMs 3 out of 126 2% Slices 525 out of 23872 1% From the perspective of timing performance, the GNR design runs at 168.32-MHz clock frequency and accepts an input data rate of up to 18.72 MSPS. Total resources needed for the entire lane detection preprocessing subsystem are summarized below: DSP48s 12 out of 126 9% BRAMs 16 out of 126 12% Slices 2594 out of 23872 10% The corresponding timing-performance analysis showed a clock frequency of 128.24 MHz, with a maximum input data rate of 14.2 MSPS. Given these resource requirements, we estimated the preprocessing function would even fit into an XA Spartan-3E 500— roughly one-seventh the density of the XA Spartan-3A 3400A device. Results and Future Work Figure 6 provides a sample of the performance of our LDW system, including an FPGA-based image-preprocessing function for lane-marking candidate extraction. You can see the input frame in the two images on the right. The pair of images on the left illustrate the performance of the preprocessing function we implemented in the FPGA. The picture at the top left represents the magnitude of the edge detection function after thresholding. The one at the lower left is taken after the edge-thinning and lane-marking pattern search processes. Clearly, our LDW preprocessor is very effective at taking a roadway scene and reducing the data to only the primary lane-marking candidates. The yellow and red lines, respectively, in the top and lower-right images represent instantaneous and tracked esti- Figure 6 – LDW processing model outputs XCELLENCE IN AUTOMOTIVE & ISM mations of the lane boundaries based on a simple, straight-line roadway model. In order to accurately predict the trajectory of the vehicle with respect to the lane boundaries, our future LDW system will use a curvature model. We are currently adopting a parabolic lane model in the object space, assuming the width of the lane is locally constant and is located on a flat ground plane. We can describe a parabolic lane model by creating four parameters incorporating the position, the angle and the curvature. Using a robust fitting technique, it is possible to estimate the four parameters for each frame of the video sequence. Noise, changing light, camera jitter, missing lane markings and tar strips could weaken the model extraction. To compensate for this information gap and make this phase more robust and reliable, the system needs a tracking stage. Tracking can be done using the Kalman filter in the space of the parameters of the lane model. The extraction and the tracking of the lane model are the next two stages we will implement in the FPGA soon. For this job we plan to use the AccelDSP synthesis tool, another Xilinx high-level DSP design tool. Because it supports a linear algebra library, we can use AccelDSP to implement a four- or six-state Kalman filter. Fourth Quarter 2008 Xcell Journal 25
Page 1 and 2: Xcell Issue 66 Fourth Quarter 2008
Page 4 and 5: Xcell journal PUBLISHER Mike Santar
Page 6 and 7: VIEWPOINTS Letter from the Publishe
Page 8 and 9: COVER STORY Driver Assistance Revs
Page 10 and 11: COVER STORY Awareness Warning Blind
Page 12 and 13: COVER STORY As developers seek to u
Page 14 and 15: COVER STORY Video Interface & Conve
Page 16 and 17: XPERT OPINION Driver Assistance Sys
Page 18 and 19: XPERT OPINION billion by 2012. Clea
Page 20 and 21: XCELLENCE IN AUTOMOTIVE & ISM Build
Page 22 and 23: XCELLENCE IN AUTOMOTIVE & ISM featu
Page 26 and 27: XCELLENCE IN AUTOMOTIVE & ISM Accel
Page 28 and 29: XCELLENCE IN AUTOMOTIVE & ISM Secur
Page 30 and 31: XCELLENCE IN AUTOMOTIVE & ISM produ
Page 32: XCELLENCE IN AUTOMOTIVE & ISM Video
Page 35 and 36: Let’s examine some of the challen
Page 37 and 38: then map or connect the clock-capab
Page 39 and 40: Timing Parameter Min (ns) Max (ns)
Page 41 and 42: The Xilinx design team recently dev
Page 43 and 44: For each RTL module, we created an
Page 45 and 46: At Missing Link Electronics, a comp
Page 47 and 48: simulation library. You can do this
Page 49 and 50: y Michael Seedman Designer michael@
Page 51 and 52: Finding a puck on the screen was li
Page 53 and 54: At the end of every frame we store
Page 55 and 56: the File menu. FPGA Editor will ask
Page 57 and 58: Figure 3 - The BUFG output net prop
Page 59 and 60: Embedded for Success The Xilinx Cus
Page 61 and 62: support in the EDK’s Base System
Page 63 and 64: “I designed a circuit that contro
Page 65 and 66: Xilinx ® Evaluation Kit Virtex ®
Page 68: Proven Xilinx ® Power Solutions Te

Automotive Innovators Hit High Gear in - Xilinx

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?