
Improvements on the kd-tree


László Szécsi and Balázs Benedek
Department of Control Engineering and Information Theory
Budapest University of Technology and Economics, Budapest, Hungary

Abstract

Improvements of the kd-tree structure for ray shooting are discussed, with special respect to animation. The extremely effective cost model is analysed. A memory representation using very few pointers is described and examined. Acceleration possibilities using frame-to-frame coherence are listed for different types of animating objects, and an algorithm for parallel tracing of multiple kd-trees is presented.

Categories and Subject Descriptors (according to ACM CCS): I.3.6 [Computer Graphics]: Graphics data structures and data types

1. Introduction

The kd-tree structure is one of the most potent acceleration schemes for global illumination algorithms. Such trees have been used and thoroughly studied for a long time, and recently a method providing fine results has been found to estimate the time for the ray traversal, useful for building a near-optimal tree. Furthermore, it is a driving aim to find faster algorithms for global illumination image synthesis, and reduce the time cost for the existing ones.
In order to achieve that, a straightforward effort is to pre-process the scene in some way, making fast image creation possible at the price of a larger one-time calculation. This approach is promising if multiple images of the same scene are needed, typically in case of animation. Although the presently available speed of these algorithms is, despite recent advances [6], still insufficient, interactive animation could be a long-term objective. Naturally, we would have to calculate the image for every frame of animation using the information from pre-processing and previous frames to the maximum extent possible.

As the construction of accelerating data structures, including the kd-tree, is expensive, and any alteration of the scene implies the change of the kd-tree, it is critical to construct and re-build the tree rapidly without increasing the cost of the ray traversal. Furthermore, the data structure should help to reveal and re-compute the differences between consecutive frames of animation.

First, we investigate the algorithms for kd-tree construction and ray traversal.
We elaborate on the linear cost function, and show why it achieves near-optimal results seemingly in contradiction with its inaccuracy. We analyse the memory need and presumable cache-coherence of possible kd-tree representations, with special emphasis on using pre-allocated memory segments instead of dynamic allocation.

In order to calculate the animation frames fast, it is not worth rebuilding the unchanged parts of the kd-tree. Therefore, static and moving objects should be stored in separate trees. We describe the operation of this dual tree architecture, and compare the overhead and acceleration gains. We will elaborate on questions arising about the dynamic tree that is built for every frame. If the animation sequence consists of complex object hierarchies moving together, the subdivision of these rigid objects should not be repeated for every frame. We describe the method of transformed sub-trees, and measure transformation overhead and the decrease of the time taken by kd-tree construction.

We also investigate issues about the traversal of the dual kd-tree structure. We introduce an algorithm for the parallel traversal of multiple trees.
We compare its speed to several other methods, both for scenes with and without rigid objects.

2. Previous Work

2.1. The role of the kd-tree in global illumination algorithms

Most image synthesis procedures modelling physical light transport make extensive use of the calculation of ray-scene intersections. The naïve implementation, determining the hits for every primitive, is unacceptably slow, in contrast to the results achieved by space subdivision. In the latter case, we only need to traverse cells along the ray and only compute intersections for promising candidates. Best results among the spatial subdivision schemes are delivered by the BSP and kd-trees. The kd-tree we use in this article is a binary, non-balanced, spatial subdivision data structure, with axis-aligned cutting planes associated to its non-leaf nodes, and subsets of scene objects stored in the leaf nodes.

The power of the structure lies in its flexibility. Cutting planes can be positioned depending on the location of the scene objects, so at the cost of some calculation the solution resulting in an optimal traversal time can be chosen. The cutting planes being axis aligned is a minor limitation, as arbitrarily positioned planes may produce a better tree, but finding the optimum would be less efficient. Furthermore, storing the data describing axis-aligned cutting planes requires less memory space, and it is far easier to compute the ray-plane intersection.

2.2. Traversal along a ray

During the image synthesis a large number of ray-scene intersections have to be computed. Compared to the one-time construction of the tree, the difference of scale is such that in most cases it is worth accepting any construction cost just to speed up traversal.

The sequential ray traversal algorithm is based on the spatial proximity search using the kd-tree. First we take the origin of the ray, and locate the cell containing it by walking down the tree from its root. Within the cell found, we carry out all intersection tests with the objects belonging to the cell. If no intersection within the cell was found, we proceed to the next cell. In order to find it, we use the same method as before. We calculate the point where the ray leaves the cell, which is exactly where it enters the next. We translate it a tiny bit further along the ray to resolve ambiguity, and repeat the whole process using the spatial proximity search with this next point. We have to remark that the algorithm may skip cells of extremely little or zero width.
Although these may seem useless at first sight, they can legitimately appear in kd-trees for scenes where there are numerous axis-aligned polygons. This may be the case with geometrical scenes, typically boxes and rooms. Another drawback of this algorithm is that it starts from the root of the tree for every new cell, though it is very probable that two cells following each other are near each other in the structure. Therefore one node could be visited many times.

The recursive ray traversal algorithm eliminates the main drawbacks of the sequential ray traversal algorithm and visits every node and leaf only once [2]. We check if the ray intersects the volumes corresponding to the left and right sub-trees. The sub-trees are traversed in the very same way, if necessary, starting with the one nearer to the origin. To terminate the recursion the leaves of the tree are handled in the same manner as above. The implementation of the algorithm needs a traversal stack to store data about the sub-trees that need to be processed later.

Whichever algorithm we use, we will walk through the leaf cells along the ray, and test possible intersections for the segment inside the cell.
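The stack-based recursive traversal can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sphere`, `Leaf`, `Inner` and `traverse` names are hypothetical, spheres stand in for arbitrary primitives, and the near/far ordering follows the commonly used front-to-back scheme for axis-aligned splitting planes. The ray direction is assumed to be non-zero.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Vec = Tuple[float, float, float]


@dataclass
class Sphere:
    center: Vec
    radius: float

    def hit(self, origin: Vec, direction: Vec) -> Optional[float]:
        """Smallest ray parameter t >= 0 at which the ray meets the sphere."""
        oc = [o - c for o, c in zip(origin, self.center)]
        a = sum(d * d for d in direction)
        b = 2.0 * sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - self.radius ** 2
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
            if t >= 0.0:
                return t
        return None


@dataclass
class Leaf:
    objects: List[Sphere]


@dataclass
class Inner:
    axis: int                      # 0, 1 or 2
    split: float                   # position of the axis-aligned cutting plane
    left: Union["Inner", Leaf]     # half-space with coordinate <= split
    right: Union["Inner", Leaf]    # half-space with coordinate >= split


def traverse(root, origin: Vec, direction: Vec, eps: float = 1e-9):
    """Return (t, object) for the closest hit along the ray, or None."""
    stack = [(root, 0.0, math.inf)]          # sub-trees still to be processed
    while stack:
        node, t0, t1 = stack.pop()
        while isinstance(node, Inner):
            o, d = origin[node.axis], direction[node.axis]
            below = o < node.split or (o == node.split and d <= 0.0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            t_split = (node.split - o) / d if d != 0.0 else math.inf
            if t_split > t1 or t_split <= 0.0:
                node = near                   # segment stays on the near side
            elif t_split < t0:
                node = far                    # segment lies entirely on the far side
            else:
                stack.append((far, t_split, t1))   # visit the far side later
                node, t1 = near, t_split
        best = None
        for obj in node.objects:              # leaf: test only the segment [t0, t1]
            t = obj.hit(origin, direction)
            if t is not None and t0 - eps <= t <= t1 + eps:
                if best is None or t < best[0]:
                    best = (t, obj)
        if best is not None:
            return best                       # closest hit in the first non-empty cell
    return None
```

Because the far sub-tree is pushed before descending into the near one, leaf cells are visited front to back, and the search can stop at the first cell that yields a hit inside its own segment.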
If intersections were found, the closest is taken, otherwise the ray has to be followed further. Consequently, the objective is to have a minimal number of objects in a cell, and if a ray intersects a cell, it should, with high probability, also intersect an object within. This, pushed to its extreme, is accomplished when all objects are delimited by six fitting cutting planes. However, if the bounding boxes of the objects overlap, as in most scenes, then such cuts may intersect several objects, adding them to both child volumes, resulting in superfluously large lists in the leaves, and worse-than-optimal traversal time.

2.3. Constructing a kd-tree and possible decision-making heuristics

The tree can be built in a recursive way. Processing a volume involves the choice and storage of the cutting plane, and the processing of the two new sub-volumes. The decision to make is where to place the cutting plane, and whether it is worth subdividing the volume at all. This may be based on some heuristic scheme, or an estimation of the resulting traversal cost.

The first, most obvious method is to cut the volume into two equal halves, using the spatial median, similarly to the octree approach where we care little about the position of the objects when subdividing a volume. The resulting tree will of course not be balanced, and it is easy to construct a scene where this method comes near to useless.
Similarly to the octree, spatial median subdivision performs well in case of evenly distributed objects.

Another simple and more promising approach is to make both sub-volumes contain the same number of objects. The position with this property is called the object median. To find it, we have to do a 'select and partition' median search. This can be considered a modified version of the 'quick sort' algorithm that only sorts the partition containing the halving element of the array. This simpler procedure will also separate the array into elements smaller and greater than the median, and outperforms 'quick sort'. As the resulting tree would be balanced, its representation could be simple and compact. Furthermore, a balanced kd-tree can be considered to be optimal for several tasks, such as proximity search. However, in ray casting, we do not only need to find an object, but to follow a ray through several cells intersected. Therefore, the probability of a sub-volume being hit by a ray plays an important role in the expected time cost of the rendering algorithm. The object median method disregards that aspect. The unfortunate consequence for the optimal tree is that we have to discard the concept of balancedness, and will have to find the means to store a non-balanced tree in a compact way.

Although simple cut heuristics produce inferior traversal times, fast construction and compact data structure are advantages. Therefore, they may have some relevance if the structure is to be built in real time, despite the fact that in globally illuminated animation the traversal cost tends to be the bottleneck. As the tree construction time rapidly increases with the number of objects, but the traversal time for scenes large enough is constant, it cannot be ruled out that the situation may change, especially in the case of vertex-based animations with a very high polygon count. The compact memory representation used for the balanced tree is definitely to be used somehow in the more sophisticated methods.

3. Improvement of the cost function

3.1. Previous work

A way to find the optimal cut is to consider all reasonable cuts, including cutting off empty space and termination of the build, and choose the one that produces the shortest expected traversal time. To achieve this we need to estimate that time.
Havran proposed the following function, linear with respect to the number of objects in the sub-volumes:

$$C_V = \frac{1}{SA(V)} \left[ SA(\mathrm{leftChild}(V)) \, (N_L + N_{SP}) + SA(\mathrm{rightChild}(V)) \, (N_{SP} + N_R) \right] \qquad (1)$$

where $C_V$ is the cost corresponding to volume $V$, $SA(V)$ is the surface area of volume $V$, and $N_L$, $N_R$, $N_{SP}$ are the number of objects completely in the left and right sub-volumes, and the number of objects intersected by the splitting plane, respectively.

This means that the expected time for the traversal of a volume is the time needed to carry out the naïve intersection test for all objects, multiplied by the probability of a ray hitting the volume. This probability, considering that the volumes are convex, equals the ratio of the surface areas. Obviously, the estimate given by this function does not equal the actual time cost, as the created volumes will be subdivided further, and not handled with the naïve algorithm. Havran also identified this problem and proposed some ideas for the solution. He stated that the optimal cost function depends on the distribution of the objects in the actual scene to a great extent, and thus for a better estimate the cost must be measured in some way.
Although it is possible to build the tree and compute the cost precisely, doing this every time the function should be evaluated would lead to computational explosion of the construction algorithm. Therefore, in order to obtain a more effective function, the scene should be characterised by values that are easily determined, and influence the cost function.

3.2. Non-linear cost estimate

In a recent article we have shown that for scenes with a large number of random objects, kd-tree traversal is done in constant time. How can this be reconciled with the linear estimation? How can Havran's method provide outstanding results despite this contradiction? If we are low in the tree, near the leaves, and it is true that the sub-volumes will go through little to no further subdivision, then the linear estimation is of course perfect. On the other hand, if we are near the root of the tree, meaning that the constant time traversal statement holds for the sub-trees, then the expected traversal time is independent of the cut. Therefore, where the linear estimate would fail, the position of the cut is not so important at all.
However, it is possible to construct a more accurate cost estimate, if we are able to account for the gain from separating the elements and cutting off empty space. To calculate that exactly would be hopelessly expensive, but by simply changing the linear function to a slightly better fitting one, we may eliminate some of the inaccuracy on higher levels of the tree. It is of course imperative to keep the linearity in the lower regions where it works perfectly. Let us suppose that a cut improves the time by a factor of $q < 1$ on average, and that a cell containing $n_0$ elements is not worth dividing any more. Actually, that means that a cell may contain $n_0$ objects on average. Using these assumptions, the cost of traversal, without the adjustment for the probability of the volume being hit, is given by the following equation. This function is to be applied to the number of objects in the sub-volumes in (1):

$$f(n) = n \cdot q^{\log_2 (n - n_0)} \qquad (2)$$

The value of $n_0$ is relatively easy to find, and will be determined by the primitive geometry. The value $q$, however, is quite an abstraction. It includes both cuts between objects and cutting off empty space. Actually, it corresponds more to the subdivision potential of the volume than to the obscure concept of cost reduction achieved by a single cut. Still, it is not harmful to overestimate both $n_0$ and $q$, as that will get us nearer to the original linear estimate.
Therefore, the formula for the expected number of intersection tests introduced in our previous article [3] can be applied to determine a probable upper bound for the traversal cost of the tree that is being built, providing a value for $q$. Naturally, significantly better results are only expected for large scenes with high primitive count, as the linear function is less accurate, and the guess for $q$ is better in those cases. The previous equation can further be written as:

$$f(n) = n \cdot (n - n_0)^{\log_2 q} \qquad (3)$$

$$f(n) = (n - n_0)^{1 + \log_2 q} + n_0 \cdot (n - n_0)^{\log_2 q} \qquad (4)$$

As $(n - n_0)^{\log_2 q} \le 1$, the cost may be over-estimated as:

$$f(n) = n \cdot (n - n_0)^{\log_2 q} \le (n - n_0)^{1 + \log_2 q} + n_0 \qquad (5)$$
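As an illustration, the following sketch plugs a per-volume object-count function into the cost of equation (1). It assumes the reading of equation (2) as f(n) = n · q^(log2(n − n0)) and of the over-estimate (5) as (n − n0)^(1 + log2 q) + n0; the function names are hypothetical. Below n0 the over-estimate falls back to the linear count, as the text goes on to do, and 0.5 < q < 1 is assumed so that the exponent stays positive.

```python
import math


def split_cost(sa_parent, sa_left, sa_right, n_left, n_right, n_split, f=lambda n: n):
    """Equation (1); with the default identity f it is the linear estimate."""
    left_term = sa_left * f(n_left + n_split)
    right_term = sa_right * f(n_split + n_right)
    return (left_term + right_term) / sa_parent


def f_nonlinear(n, n0, q):
    """Equation (2) under the assumed reading; only defined for n > n0."""
    return n * q ** math.log2(n - n0)


def f_overestimate(n, n0, q):
    """Over-estimate (5) for n >= n0, plain linear count below n0.

    Assumes 0.5 < q < 1 so that the exponent 1 + log2(q) is positive.
    """
    if n < n0:
        return float(n)
    return (n - n0) ** (1.0 + math.log2(q)) + n0


# Unit cube (surface area 6) cut in the middle: each half has surface area 4.
linear = split_cost(6.0, 4.0, 4.0, n_left=3, n_right=5, n_split=2)
adjusted = split_cost(6.0, 4.0, 4.0, 3, 5, 2, f=lambda n: f_overestimate(n, 4, 0.9))
```

With these hypothetical numbers the adjusted cost comes out below the linear one, because the larger sub-volume is credited with the expected gain from further subdivision.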


Should $n$ be smaller than $n_0$, which is possible as $n_0$ is an average value, we stick to the linear estimate:

$$f(n) = \begin{cases} n & \text{if } n < n_0 \\ (n - n_0)^{1 + \log_2 q} + n_0 & \text{if } n \ge n_0 \end{cases} \qquad (6)$$

3.3. Results

Although Havran pointed out the difficulties in constructing a general cost function, his investigations have also shown that for a specific scene, the time of traversal as a function of the number of patches usually deviates little from a logarithmic curve. Therefore, it is expected that if the best coefficients are found, the rendering time for a scene may be decreased. However, as explained above, the linear estimate performs beyond expectations, so only little speed-up will occur.

We have summarised run times for Bi-directional Path Tracing [4] image computation for several scenes. Time is given in seconds everywhere.

Scene:                 House
Number of patches:     24737
Measured n_0:          3.2
Linear cost run-time:  65.42

n_0    q = 0.9    q = 0.95    q = 0.98
4      61.51      57.12       57.40
6      68.16      64.48       65.12
8      70.08      64.32       64.89

For other scenes with fewer objects we obtained less significant results. It is also to be remarked that the time taken by the BPT algorithm may also be influenced by factors like the actual paging coherence. Obviously, there is little improvement, and the method we used to determine $q$ is still not general enough to be used for every scene.
Firstly, this is due to the fact that the formula for traversal time given in our previous article [3] is only true for a scene large enough. Secondly, the concept of the cost reduction caused by a cut may be better than the linear one, but is still not perfect. Havran's measurements would rather suggest a logarithmic curve. Therefore, it would be worth testing a few other forms of cost functions. However, the use of the $n_0$ value to characterise the object distribution, or tessellation of the scene, would probably be a valuable concept.

4. Tree representation, memory usage and cache coherence

4.1. Previous work

The single most important objective is to assure fast traversal. First, this means that we have to be able to find the children of a node quickly, and second, we have to retrieve the data from memory fast. In order to achieve this, we should have as much data as possible corresponding to nearby cells in the cache. That means we have to use the minimum amount of storage space per node, store nearby nodes next to each other, and still be able to find the children for a node fast enough.

A powerful solution to store a complete tree is the compact representation, where every node is an element of an array, and the start of the children-array can be found by multiplying the parent index with the number of children per node.
This representation does not use any pointers, and therefore it needs only the minimum amount of memory, but we need to know the number of nodes in advance. Furthermore, for an unbalanced tree we need a compact structure as deep as the deepest branch of the tree, and a large part of the array will be empty, causing a tremendous waste of memory. Further on, we will call this phenomenon fragmentation.

Another approach is to have separately allocated nodes and use pointers to find children. Two pointers per node may multiply the memory need, and locality is not automatically assured. The optimal solution has to use some pointers to account for the problems with the naïve compact representation, while keeping its advantages.

Such a mid-way solution is proposed by Havran [2]. It uses small compact trees, connected by pointers. This limits fragmentation due to the non-balancedness to the sub-trees, and if their memory representation fits into a cache-line, cache coherence is utilised well (see Figure 1). However, a large number of pointers are still used, and dynamic allocation of sub-trees is assumed.

4.2. kd-tree in pre-allocated memory

The concept of compact sub-trees may be developed further. More pointers may be eliminated, if such a sub-tree is considered a node of another compact tree, where the nodes have twice as many children as the number of nodes on the last level of the sub-tree. These super-nodes do not have to be connected by pointers, as they also can be stored in a compact structure. This way, pointers are completely eliminated, cache coherence is assured, but the fragmentation is just as costly an issue as in case of the simple compact tree.

The problem is that we have to store an unbalanced tree. If the super-nodes are connected by pointers, as Havran suggests, then the fragmentation will be limited to the sub-trees. However, the compact solution can also be improved to handle unbalanced trees. Whenever a branch terminates before the depth of the pre-allocated memory, the would-be children of the leaves become roots of free-space trees. These "holes" can later be used to contain the parts of the tree that would stretch over the pre-allocated space. This means that only the nodes on the very last level need to have pointers, actually pointing somewhere back into the array.
This mapping is demonstrated in Figure 2. It is of course possible and desirable to use the cache-line sized sub-trees mapping (see Figure 1) to store the resulting balanced tree for better cache performance.
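A small hand-built example may make the mapping more concrete. The sketch below is one possible reading of the scheme, not the authors' code: the `Node` layout, the kind tags and the helper names are hypothetical, and 0-based indexing is assumed, so the children of slot i sit at 2i+1 and 2i+2. A node on the last level of the pre-allocated array is either a leaf or a reference back into a hole left under an earlier leaf.

```python
from dataclasses import dataclass
from typing import Any, List

INNER, LEAF, REDIRECT, UNUSED = range(4)


@dataclass
class Node:
    kind: int = UNUSED
    # (axis, split) for INNER, object list for LEAF, array index for REDIRECT
    payload: Any = None


def resolve(nodes: List[Node], i: int) -> int:
    """Follow a last-level reference to the slot where the node really lives."""
    while nodes[i].kind == REDIRECT:
        i = nodes[i].payload
    return i


def child(nodes: List[Node], i: int, right: bool) -> int:
    """Implicit addressing: no child pointers are stored in inner nodes."""
    return resolve(nodes, 2 * i + 1 + int(right))


def leaves_in_order(nodes: List[Node], i: int = 0) -> list:
    """Depth-first, left-to-right list of leaf payloads."""
    i = resolve(nodes, i)
    if nodes[i].kind == LEAF:
        return [nodes[i].payload]
    return (leaves_in_order(nodes, child(nodes, i, False))
            + leaves_in_order(nodes, child(nodes, i, True)))


def build_example() -> List[Node]:
    depth = 4                                    # pre-allocated levels
    nodes = [Node() for _ in range(2 ** depth - 1)]
    nodes[0] = Node(INNER, (0, 0.0))
    nodes[1] = Node(LEAF, "A")         # early leaf: slots 3 and 4 become free-space roots
    nodes[2] = Node(INNER, (1, 0.0))
    nodes[5] = Node(INNER, (2, 0.0))
    nodes[6] = Node(LEAF, "E")
    nodes[11] = Node(LEAF, "B")        # last level: a leaf ...
    nodes[12] = Node(REDIRECT, 3)      # ... or a reference back into the array
    nodes[3] = Node(INNER, (0, 1.0))   # the overflowing sub-tree reuses the hole under "A"
    nodes[7] = Node(LEAF, "C")
    nodes[8] = Node(LEAF, "D")
    return nodes
```

In this example a tree that is five levels deep along one branch fits into a four-level array with a single stored index, while five of the fifteen slots remain unused.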


Figure 1: Mapping of a tree into an array using cache-line-sized sub-trees.

Figure 2: Mapping of an unbalanced tree into an array using pointers on the last level. Darkened nodes are leaves.

The number of these pointers could be reduced further if we make use of the fact that leaves on the last level do not have children. That way, a node on the last level is either a leaf or a reference to the actual position of the child node. This representation allows the mapping of an unbalanced tree, with all the needed pointers, into a single pre-allocable array. This structure, which is also compatible with the cache-line mapping, is depicted in Figure 3.

Naturally, we need a good estimate of the number of nodes to be able to allocate memory in advance. Fortunately, we know that a kd-tree uses at most 6n splitting planes. This also means a maximum of 6n leaves. Adding the worst-case number of pointers, which is exactly the number of nodes on the last level, we conclude that an array with 24n elements suffices (6n cuts, 6n leaves and 12n pointers).

A node itself has to be as tiny as possible. The above structure assumes that the description of a splitting plane for a non-leaf node, a pointer to the list of objects for a leaf, and a pointer to another node for a redirect node all fit into one element of the array. As the plane is described by a floating-point number that need not be very precise, all of these take up only a few bytes. We need one extra bit to distinguish between leaves and non-leaf nodes.

Figure 3: kd-tree using the minimal number of pointers. Dark nodes are leaves, hatched nodes are unused.

4.3. Estimation of the number of necessary pointers

The above figure for the memory need can be decreased further if we do not account for the worst case and use somewhat less memory. Should the tree exceed its predefined node count, we will have to terminate the build. As compromising the tree construction algorithm would prove very costly during traversal, a secure size should be chosen, with practically zero chance of overflow. However, if we make use of the fact that no pointers are needed for leaves, a lower figure for the number of pointers may be found. Obviously, if the lowest level of the tree contained only pointers, as the previously given upper bound suggests, every single node above would be referenced.
This is impossible for two reasons: the nodes that belong to the tree originating from the root are not referenced and, more significantly, leaves are never referenced.

To derive an exact number, let us introduce the following notation. Let X be the number of pointers on the last level, and N the number of pre-allocated nodes. If the array is not full, the number of pointers is irrelevant. We are only interested in the case when every node is used as a cut, a leaf or a pointer. Therefore, the number of leaves L, the number of cuts C, and the number of pointers X add up to the size of the array:

\[ L + C + X = N \tag{7} \]

The numbers of leaves and cuts are equal:

\[ 2C + X = N \tag{8} \]

Pointers may only reference cut nodes, and no node can be referenced more than once:

\[ C \ge X \tag{9} \]

Substituting this back, we get:

\[ 3X \le N \tag{10} \]

Therefore, in the worst case not half of the nodes are needed for pointers, only one third. This way, the upper bound for the array is 18n, where 6n nodes each are reserved for the cuts, the leaves and the pointers. Although this may still be a rough over-estimation for some simple scenes, going even lower may have disadvantageous effects. First of all, if the array is filled, the above-mentioned termination could occur. Secondly, even if the array is not filled, a deeper tree will allow longer branches without using pointers. Therefore, over-allocation slightly increases compactness and traversal speed. Setting the array size to 18n allows for the storage of worst-case kd-trees and a speed-effective representation of simple ones.

In the following table the numbers of leaves and pointers for various scenes are listed. Obviously, the number of pointers remains below the worst-case bound. This representation definitely uses fewer pointers than the previous solutions, which should result in better cache coherence and faster traversal.

Scene         Patches    Nodes   Leaves  Pointers
Cornell box      3968     9037     4296       444
Beethoven        2636    23140     9883      3373
Random           3515    39389    16981      5426
Tea             10025    54392    23874      6643
Chickens        16467   115455    49094     17266
House           24737   156469    65540     25388

5. kd-trees for animation

5.1. Separation of dynamic and static objects

Rebuilding the whole kd-tree for every frame is obviously very expensive and also superfluous. If the objects are classified as static objects, which stay at a fixed position during the animation, and dynamic objects, which may move, we can build two different kd-trees. This has several advantages. First of all, the time of the kd-tree construction is dramatically reduced. It also becomes possible to shoot rays only into the dynamic kd-tree, thereby identifying changes of the scene along previous shooting or gathering paths. It is interesting to examine how the data structure could help in making use of frame-to-frame coherence. However, in the dual kd-tree structure traversal will be slower. Theoretically, if both trees contain a large number of objects, the traversal time would be independent of the size of the tree, therefore separating them could double the time cost. This would of course be unacceptable, and should be addressed.

Obviously, the fewer objects there are in the dynamic kd-tree, the faster it can be built. The moving objects in an animation sequence can usually be separated into sets of primitives that move together.
This is even more characteristic of scenes with rigid bodies, where the primitives of a higher-level object are static relative to each other. Reconstructing the kd-tree from the primitives would not take any advantage of this property. The kd-tree for the rigid objects can be built in advance, but if the objects are rotated, the splitting planes will no longer be axis-aligned, and such a structure cannot be used as a sub-tree of the dynamic kd-tree.

The solution is to pre-compute the kd-trees for the rigid objects, attach them to these objects, and define the intersection test for an object as shooting the ray into its kd-tree. If the objects are translated, rotated or transformed in any other way, then the ray must be transformed into the model space [5], in which the sub-kd-tree is axis-aligned. This way the dynamic tree will be built of a few rigid objects and not many more primitives. The reconstruction of the data structure between frames will take very little time, and the traversal overhead caused by the dual tree structure will be minimal.

However, several questions arise.
In order to build a kd-tree of transformed objects, the extremes along every axis have to be found. Computing a bounding box for a set of points is straightforward but may be unacceptably expensive for a large number of vertices. Furthermore, as an intersection test for such a high-level object is costly, a cheaper pre-filtering would be useful. Both problems are addressed by pre-computing a bounding object that is easy to transform. An ellipsoid, being a quadratic surface, is the most appropriate. If the smallest enclosing ellipsoid for the vertices of the object is calculated, it can be transformed appropriately for every frame. Its extremes may be used to determine the bounding box, and an intersection test with a quadratic object can be used to filter out a huge number of non-intersecting rays. The algorithm used to determine the smallest enclosing ellipsoid is based on linear programming [7] and runs in O(n) time [1].

5.2. Synchronous traversal of the dual tree

We have mentioned above that the traversal cost for two trees may be double the cost for one tree of twice as many objects. This is, however, a worst-case assumption, and can be avoided in several ways. First of all, the formerly described use of compound rigid objects will decrease the size and traversal cost of the dynamic tree.
Secondly, it is obvious that if we have found an intersection in the dynamic tree, the search in the static tree may be limited to the segment of the ray between the origin and the intersection point. That is, we do not test in areas occluded by dynamic objects. This simple modification will result in traversal times very close to the one-tree case, especially if the dynamic objects are rarely occluded by static ones. However, it is not always possible to identify rigid objects, and the visibility relation between the dynamic and static objects may not be so clear-cut for some animation sequences. Therefore, we introduce a cost-effective traversal algorithm for multiple overlapping kd-trees, especially useful if a large number of independently moving primitives are stored in the kd-tree.

Basically, the cell boundaries of a kd-tree separate a traversing ray into segments. A traversal algorithm will identify those segments, and will compute intersection tests on every segment in order. If the objects are stored in multiple kd-trees, multiple segmentations exist. The task is to find an optimal order of the segments, so that no segment further than the first valid intersection is examined. That means that if any point of segment A is nearer to the origin of the ray than any point of segment B, then A must precede B in the traversal order. A known recursive algorithm, described in detail by Havran [2], is extended in the following way:

1. Set up a search interval for every tree as the entire ray.
2. Choose the 'non-terminated' tree for which the minimum point of the search interval is the nearest to the origin.
3. Traverse the chosen tree using the recursive algorithm. A separate traversal stack and a current node identifier have to be maintained for every tree. Continue until a leaf is reached.
4. If a leaf is being processed, test for intersections, and update the global closest intersection found if necessary. Set the search interval to the segment of the ray intersected by the volume of the next node to be processed according to the recursive traversal algorithm. If the traversal stack is empty, or a valid intersection was found, mark the tree as 'terminated'.
5. If a valid intersection was already found, and the search interval for every tree is entirely further than the closest intersection, terminate, and return the found intersection.
6. If all the trees are marked 'terminated', there was no intersection with the ray in any of the trees; return without a result.
7. Continue with step 2.

Figure 4: The parallel traversal of two kd-trees partitioning the same space, containing different sets of objects. The cells are numbered to indicate the order of their processing. Non-processed cells along the ray are marked with 0.

Compared to the sequential solution, where the trees are traversed one after another on the interval limited by previously found intersections, we spare the traversal of the ray segments between the nearest intersection and those farther intersections that would have been found in previously traversed trees. For the dual tree structure we have two options: traverse the dynamic tree and then the static tree, or traverse both in parallel. If the nearest intersection is in the static tree, then the parallel algorithm will not investigate the segment between the static and dynamic intersection points. In the opposite case the same number of tests is carried out. However, we have to remark that there is some overhead because of some additional administration and weaker cache coherence, a result of handling several kd-trees simultaneously.

5.3. Results

Scenes have been divided into a static and a dynamic part to test the algorithm. Three cases were examined:

One tree: All the patches, static or dynamic, are stored in a common kd-tree.
Sequential: Static and dynamic patches are stored in separate trees. When calculating a ray-scene intersection, first we traverse only the dynamic tree. Thereafter the static tree is tested, but only on the ray segment between the origin and the intersection point in the dynamic tree.
Parallel: Static and dynamic patches are stored in separate trees. The parallel traversal algorithm is used for ray-scene intersections.

Figure 5: One of the test scenes. The two standing chickens are considered static, the other two, those above the ground, are dynamic.

The tests were run with two different kd-tree construction routines. We rendered an image using Bi-directional Path Tracing [4]. We used a test scene of large static and dynamic objects with a high primitive count, which we believe simulates a frame of an animation sequence well. In the other test scene, both the static and dynamic patches were generated at random. With the first version, we obtained satisfying results showing that the parallel traversal is faster than the sequential one. The execution times, specified in seconds, are listed in the table that follows.
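As a point of reference for the "Parallel" case, the scheduling logic of the numbered steps in Section 5.2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the recursive traversal of each kd-tree is abstracted as an iterator that yields the leaf cells along one ray in front-to-back order, each as a tuple `(t_enter, t_exit, hits)`, where `hits` holds the ray parameters of the intersections reported by the objects of that leaf. The function name `parallel_trace` and this data model are hypothetical. When the trees run out, the sketch simply returns the closest hit found so far, or `None`.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# One leaf cell along the ray: (t_enter, t_exit, ray parameters of hits).
Segment = Tuple[float, float, List[float]]


def parallel_trace(trees: Iterable[Iterable[Segment]]) -> Tuple[Optional[float], int]:
    """Return (closest hit parameter or None, number of leaves visited)."""
    iters = [iter(tree) for tree in trees]
    heap = []  # one pending entry per non-terminated tree, keyed by t_enter

    def push_next(index: int) -> None:
        # Step 4: the search interval becomes the next cell of this tree;
        # a tree with no further cell is 'terminated' (never pushed again).
        seg = next(iters[index], None)
        if seg is not None:
            heapq.heappush(heap, (seg[0], index, seg[1], seg[2]))

    # Step 1: every tree starts with its first cell along the ray.
    for index in range(len(iters)):
        push_next(index)

    closest: Optional[float] = None
    visited = 0
    while heap:
        # Step 2: pick the tree whose search interval starts nearest.
        t_enter, index, t_exit, hits = heapq.heappop(heap)
        # Step 5: all pending intervals start at or beyond t_enter, so
        # nothing nearer than the closest hit can be found any more.
        if closest is not None and t_enter >= closest:
            break
        # Steps 3 and 4: process the leaf, accept only hits inside the cell.
        visited += 1
        valid = [t for t in hits if t_enter <= t <= t_exit]
        if valid:
            nearest = min(valid)
            if closest is None or nearest < closest:
                closest = nearest
            # A valid intersection terminates this tree: do not advance it.
        else:
            push_next(index)
    return closest, visited


# Hypothetical example: the static tree hits at t = 4, the dynamic one at t = 7.
static_tree = [(0.0, 2.0, []), (2.0, 5.0, [4.0]), (5.0, 9.0, [])]
dynamic_tree = [(0.0, 3.0, []), (3.0, 6.0, []), (6.0, 9.0, [7.0])]
result = parallel_trace([static_tree, dynamic_tree])
```

In this example the dynamic leaf that contains the farther hit at t = 7 is never visited, whereas the sequential order (dynamic tree first, then the static tree up to t = 7) would process all three dynamic leaves and two static ones.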


Scene              One tree  Sequential  Parallel
Chickens              94.97      132.26    124.46
Random triangles      60.69       81.12     74.79

However, when we switched to the second construction routine, which provided better traversal times, the results were not so convincing, showing that the parallel traversal may be inferior to the sequential one. We examined the situation, and found that the administrative overhead of the algorithm was compensated in the first case, but not in the second, as the proportionally equivalent speed-up of the faster algorithm meant less time reduction. However, after optimisation of the parallel traversal, the overhead was successfully decreased:

Scene              One tree  Sequential  Parallel
Chickens              73.82       99.58     90.85
Random triangles      49.38       55.03     53.33

Consequently, we may state that the relative performance of the traversal algorithms depends strongly on the implementation. Our measurements show that, with proper optimisation, the parallel method is more effective.

6. Conclusions

We have presented three improvements on the kd-tree concept. We have analysed the linear cost function and drafted a possible way to develop a more accurate one. The concept is far from complete; further research would be necessary to find a truly scene-independent way to approximate the cost function without actually building the tree.

We have described a memory representation for the kd-tree that uses pre-allocated memory and a minimal number of pointers. We have also managed to give an upper bound for the size of the needed memory, which is tight enough for practical application. Although memory representation is closely related to implementation issues and cache architecture, the concept generally allows for little storage space, a low number of pointers, and high cache coherence.

We have also examined the overhead resulting from the separation of scene objects into two kd-trees. As we have shown, this can be reduced using the parallel traversal algorithm, provided that the administration overhead and cache coherence loss are kept low. Further research may investigate the possibility of distributed computing of the traversal. It is also an important issue to develop methods for globally illuminated animation that make use of frame-to-frame coherence and the separation of dynamic objects. Then, the positive and negative effects on one-time and every-frame traversal times should be evaluated.

References

1. Bernd Gärtner and Sven Schönherr. Smallest enclosing ellipses - fast and exact. 1997.
2. Vlastimil Havran. Heuristic Ray Shooting Algorithms. PhD thesis, Czech Technical University, Prague, 2000.
3. Szirmay-Kalos, Havran, Benedek, and Szécsi. On the efficiency of ray-shooting acceleration schemes. SCCG 2002, 2002.
4. L. Szirmay-Kalos. Számítógépes grafika. ComputerBooks, Budapest, 1999.
5. L. Szirmay-Kalos (editor). Theory of Three Dimensional Computer Graphics. Akadémia Kiadó, Budapest, 1995. http://www.iit.bme.hu/~szirmay.
6. Ingo Wald, Thomas Kollig, Carsten Benthin, Alexander Keller, and Philipp Slusallek. Interactive global illumination. Technical report, Computer Graphics Group, Saarland University, 2002. Available at http://www.openrt.de/Publications.
7. Emo Welzl. Smallest enclosing disks (balls and ellipsoids). New Results and New Trends in Computer Science (H. Maurer, ed.), pages 359–370, 1991.
