
Toward A Multicore Architecture for Real-time Ray-tracing

question was to decide whether to duplicate or share the kd-tree data structure. Our 8-core prototype hardware system does not scale, and our simulator for the proposed hardware is too slow to run the full application to completion. Only the model could reveal that the duplication overhead is manageable, thus relaxing the coherence requirements for the architecture. To develop the model, we combined data gathered on the optimized prototype using tools like Pin and Valgrind with simulation-based measurements of cache behavior and instruction frequency. With this co-designed approach, we have shown that the ray-tracing-based Copernican universe of applications and general visibility-centric 3D graphics is feasible. However, this work represents only a first cut at such a system design; there is more to explore in the application, architecture, and evaluation details.

Application: Razor is the first implementation of an aggressive software design incorporating many new ideas, some of which have worked better than others. With further iterative design, algorithm development, and performance tuning, we believe a ten-fold performance improvement in the software is possible. Non-rendering tasks in a game environment, and irregular applications more generally, map well to the architecture and require further exploration.

Architecture: There is potential for further architectural enhancements. First, basic blocks are quite long, so data-flow ISAs and/or wider SIMD could provide higher efficiency. Second, ISA specialization beyond SIMD, targeted at shading and texture computations, could provide significant performance improvements. Improvements to the memory system will be the most effective, and 3D-integrated DRAM could significantly increase performance and reduce system power [17].
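The analytical model discussed above combines measured instruction counts with simulated cache behavior to project performance where neither the prototype nor the simulator can. The sketch below is a minimal, hypothetical first-order model in that spirit, not the model used for Copernicus. It assumes that each thread alternates between issuing instructions and stalling on cache misses, and that a fine-grained multithreaded core overlaps one thread's stalls with other threads' work. The names (`Workload`, `Chip`, `cycles_per_ray`, `frames_per_second`) and all parameter values are illustrative assumptions, not figures from this paper. Sweeping the miss latency is a small sensitivity study showing why the memory system matters once multithreading can no longer hide stalls.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    """Hypothetical per-ray statistics (in practice: profiling plus cache simulation)."""
    instructions_per_ray: float  # e.g. counted with a binary-instrumentation tool
    misses_per_ray: float        # last-level cache misses per ray, from simulation
    rays_per_frame: float        # primary plus secondary rays per frame


@dataclass(frozen=True)
class Chip:
    """Hypothetical tiled multicore configuration."""
    cores: int
    threads_per_core: int
    freq_hz: float
    base_cpi: float              # cycles per instruction when every access hits
    miss_latency: float          # cycles per miss to DRAM


def cycles_per_ray(w: Workload, c: Chip) -> float:
    """Effective core cycles per ray on a fine-grained multithreaded core.

    Per ray, a thread issues for `compute` cycles and waits `stall` cycles.
    With T threads the core can overlap stalls with other threads' compute,
    but it can never retire a ray faster than `compute` cycles.
    """
    compute = w.instructions_per_ray * c.base_cpi
    stall = w.misses_per_ray * c.miss_latency
    return max(compute, (compute + stall) / c.threads_per_core)


def frames_per_second(w: Workload, c: Chip) -> float:
    """Projected frame rate, assuming rays are spread evenly over all cores."""
    rays_per_second = c.cores * c.freq_hz / cycles_per_ray(w, c)
    return rays_per_second / w.rays_per_frame


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the trend;
    # they are not Copernicus measurements.
    w = Workload(instructions_per_ray=2000, misses_per_ray=20, rays_per_frame=4e6)
    c = Chip(cores=128, threads_per_core=4, freq_hz=2e9, base_cpi=1.0, miss_latency=100)
    for latency in (100, 200, 400, 800):
        fps = frames_per_second(w, replace(c, miss_latency=latency))
        print(f"miss latency {latency:4d} cycles -> {fps:6.1f} fps")
```

In this sketch the projected frame rate stays flat while the threads can hide the miss latency, then falls as `(compute + stall) / threads_per_core` overtakes the compute time. A fuller model would also need terms for coherence traffic and for the cost of duplicating the kd-tree.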
Physically scaling the architecture by varying the number of tiles, and by frequency and voltage scaling, to meet power and area budgets provides a rich design space to be explored.

Evaluation: Our analytical model enables accurate performance projections and can even be used for sensitivity studies. In addition, it can be extended to accommodate other Copernican architectures, like Intel Larrabee [30].

Comparison to GPUs and Beyond Ray-tracing: The processor organization in Copernicus is fundamentally different from that of conventional GPUs, which provide a primitive memory system abstraction while deferring scene geometry management to the CPU. Architecturally, the hardware Z-buffer is replaced with a flexible memory system and a software spatial data structure for visibility tests. This support enables scene management and rendering on a single computational substrate. We believe GPUs are likely to evolve toward such a model over time, potentially with a different implementation. For example, secondary rays could be hybridized with Z-buffer rendering. Our system is a particular point in the architecture design space that can support ray tracing as one of potentially several workloads.

8. Conclusions

Modern rendering systems live in a Ptolemaic Z-buffer universe that is beginning to pose several problems in providing significant visual quality improvements. We show that a Copernican universe centered around applications and sophisticated visibility algorithms with ray-tracing is possible, and that the architecture and application challenges can be addressed through full-system co-design. In this paper, we describe our system, called Copernicus, which includes several co-designed hardware and software innovations. Razor, the software component of Copernicus, is a highly parallel, multigranular, locality-aware ray tracer.
The hardware architecture is a large-scale tiled multicore processor with private L2 caches, fine-grained ISA specialization tuned to the workload, multi-threading for hiding memory access latency, and limited (cluster-local) cache coherence. This organization represents a unique design point that favors data redundancy and recomputation over synchronization, and thus scales easily to hundreds of cores.

The methodology used for this work is of interest in its own right. We developed a novel evaluation methodology that combines software implementation and analysis on current hardware, architecture simulation of the proposed hardware, and analytical performance modeling for the full hardware-software platform. Our results show that if the projected improvements in software algorithms are obtained, we can sustain real-time ray-tracing on a future 240 mm² chip in 22 nm technology. The mechanisms and the architecture are not strictly limited to ray-tracing, as future systems that must execute irregular applications on large-scale single-chip parallel processors are likely to have similar requirements.

Acknowledgment

We thank Paul Gratz, Boris Grot, Simha Sethumadhavan, the Vertical group, and the anonymous reviewers for comments, the Wisconsin Condor project and UW CSL for their assistance, and the Real-Time Graphics and Parallel Systems Group for benchmark scenes and for their prior work on Razor. Many thanks to Mark Hill for several valuable suggestions. Support for this research was provided by NSF CAREER award #0546236 and by Intel Corporation.

References

[1] C. Benthin, I. Wald, M. Scherbaum, and H. Friedrich, “Ray Tracing on the CELL Processor,” in Interactive Ray Tracing, 2006, pp. 15–23.
[2] J. Bigler, A. Stephens, and S. Parker, “Design for parallel interactive ray tracing systems,” in Interactive Ray Tracing, 2006, pp. 187–196.
[3] D. Brooks, P. Bose, V. Srinivasan, M. K. Gschwind, P. G. Emma, and M. G. Rosenfield, “New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors,” IBM J. Res. Dev., vol. 47, no. 5-6, pp. 653–670, 2003.
