29.01.2015 Views

Embedded Software for SoC - Grupo de Mecatrônica EESC/USP


Interactive Ray Tracing on Reconfigurable SIMD MorphoSys

6. LOCAL STACK FOR PARALLEL SHADING

After all rays have found their closest intersection points and intersected objects, the ray-tracing algorithm calculates the color using the Phong shading model [2, 3]. Shadow and reflection rays are then generated and traverse the BSP tree in turn, as described in Section 3.

However, during this process the intersection points and intersected objects can differ from ray to ray. This data cannot be saved to the FB and fetched back later: because of the limited bandwidth, it would have to be fetched one RC at a time, leaving all but one RC idle and wasting cycles. To address this problem, local RC RAM is used to emulate a stack. Besides the stack, each RC has one vector that stores the current intersection point and intersected object. During the ray-object intersection process, when an object is found to be closer to the eye than the one already in the vector, its data replaces the data in the vector; otherwise, the vector is left unchanged. When a new recursion starts, the vector is pushed onto the stack. When the recursion returns, the data is popped from the stack back into the vector. In this way, the object data and intersection point required for shading are always available for every ray. The overhead of saving and restoring this data is very small compared with the whole shading process. This process is illustrated in Figure 12-7.
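The push/pop discipline described above can be sketched in software. The following Python model is only an illustration under stated assumptions, not the MorphoSys implementation: the names `Hit`, `RCLocalState`, `consider`, `enter_recursion`, and `leave_recursion` are hypothetical, closeness is modeled as a scalar distance, and resetting the vector after a push is an assumption the text does not state.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Hit:
    """Contents of the per-RC vector: the closest hit found so far."""
    distance: float = float("inf")      # inf means "no hit yet" (assumption)
    point: Optional[Point] = None       # intersection point
    object_id: Optional[int] = None     # intersected object


@dataclass
class RCLocalState:
    """Hypothetical model of one RC's local RAM: one vector plus a stack."""
    current: Hit = field(default_factory=Hit)
    stack: List[Hit] = field(default_factory=list)

    def consider(self, distance: float, point: Point, object_id: int) -> None:
        """Replace the vector only if the candidate object is closer."""
        if distance < self.current.distance:
            self.current = Hit(distance, point, object_id)

    def enter_recursion(self) -> None:
        """Push the vector onto the stack before tracing a secondary ray."""
        self.stack.append(self.current)
        self.current = Hit()  # assumed reset so the secondary ray starts fresh

    def leave_recursion(self) -> None:
        """Pop the saved hit back into the vector when recursion returns."""
        self.current = self.stack.pop()


# Each of the 64 RCs keeps its own state, so no RC waits on shared memory.
rcs = [RCLocalState() for _ in range(64)]
for i, rc in enumerate(rcs):
    rc.consider(10.0 + i, (0.0, 0.0, 10.0 + i), object_id=i)  # primary hit
    rc.enter_recursion()                                      # reflection ray
    rc.consider(3.0, (1.0, 0.0, 3.0), object_id=100 + i)      # secondary hit
    rc.leave_recursion()                                      # restore primary hit
```

After the loop, every RC again holds its own primary hit in `current`, which is the property the shading step relies on: each ray's intersection data is restored locally, without any RC having to wait for a serialized fetch.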

7. MEMORY UTILIZATION

SIMD processing on 64 RCs demands high memory bandwidth. For example, up to 64 different data items may be required concurrently in MorphoSys. Fortunately, this is not a problem in our design. Our implementation guaran-
