Improvements on the kd-tree

László Szécsi and Balázs Benedek
Department of Control Engineering and Information Theory
Budapest University of Technology and Economics, Budapest, Hungary

Abstract

Improvements of the kd-tree structure for ray shooting are discussed, with special regard to animation. The extremely effective cost model is analysed. A memory representation using very few pointers is described and examined. Acceleration possibilities using frame-to-frame coherence are listed for different types of animated objects, and an algorithm for the parallel tracing of multiple kd-trees is presented.

Categories and Subject Descriptors (according to ACM CCS): I.3.6 [Computer Graphics]: Graphics data structures and data types

1. Introduction

The kd-tree structure is one of the most potent acceleration schemes for global illumination algorithms. Such trees have been used and thoroughly studied for a long time, and recently a method has been found that estimates the ray traversal time well enough to guide the construction of a near-optimal tree. Furthermore, it is a driving aim to find faster algorithms for global illumination image synthesis, and to reduce the time cost of the existing ones. A straightforward way to achieve that is to pre-process the scene, making fast image creation possible at the price of a larger one-time computation. This approach is promising if multiple images of the same scene are needed, typically in the case of animation.
Although the presently available speed of these algorithms is, despite recent advances [6], still insufficient, interactive animation could be a long-term objective. Naturally, we would have to calculate the image for every frame of the animation, using the information from pre-processing and from previous frames to the maximum extent possible. As the construction of accelerating data structures, including the kd-tree, is expensive, and any alteration of the scene implies a change of the kd-tree, it is critical to construct and rebuild the tree rapidly without increasing the cost of the ray traversal. Furthermore, the data structure should help to reveal and recompute the differences between consecutive frames of the animation.

First, we investigate the algorithms for kd-tree construction and ray traversal. We elaborate on the linear cost function, and show why it achieves near-optimal results, seemingly in contradiction with its inaccuracy. We analyse the memory need and the presumable cache coherence of possible kd-tree representations, with special emphasis on using pre-allocated memory segments instead of dynamic allocation.

In order to calculate the animation frames fast, it is not worth rebuilding the unchanged parts of the kd-tree. Therefore, static and moving objects should be stored in separate trees. We describe the operation of this dual tree architecture, and compare the overhead and the acceleration gains. We will elaborate on questions arising about the dynamic tree that is built for every frame. If the animation sequence consists of complex object hierarchies moving together, the subdivision of these rigid objects should not be repeated for every frame. We describe the method of transformed sub-trees, and measure the transformation overhead and the decrease of the time taken by kd-tree construction.

We also investigate issues concerning the traversal of the dual kd-tree structure. We introduce an algorithm for the parallel traversal of multiple trees.
We compare its speed to several other methods, for scenes both with and without rigid objects.

2. Previous Work

2.1. The role of the kd-tree in global illumination algorithms

Most image synthesis procedures modelling physical light transport make extensive use of the calculation of ray-scene intersections. The naïve implementation, which tests every primitive for a hit, is unacceptably slow compared to the results achieved by space subdivision. In the latter case, we only need to traverse the cells along the ray and compute intersections only for promising candidates. The best results among the spatial subdivision schemes are delivered by BSP trees and kd-trees. The kd-tree we use in this article is a binary, non-balanced spatial subdivision data structure, with axis-aligned cutting planes associated with its non-leaf nodes, and subsets of the scene objects stored in the leaf nodes.

The power of the structure lies in its flexibility. Cutting planes can be positioned depending on the location of the scene objects, so at the cost of some calculation the solution resulting in an optimal traversal time can be chosen. That the cutting planes are axis-aligned is a minor limitation: arbitrarily positioned planes may produce a better tree, but finding the optimum would be less efficient. Furthermore, storing the data describing axis-aligned cutting planes requires less memory, and the ray-plane intersection is far easier to compute.

2.2. Traversal along a ray

During image synthesis a large number of ray-scene intersections have to be computed. Compared to the one-time construction of the tree, this is such a difference of scale that in most cases it is worth paying any construction cost just to speed up the traversal.

The sequential ray traversal algorithm is based on the spatial proximity search using the kd-tree. First we take the origin of the ray, and locate the cell containing it by walking down the tree from its root. Within the cell found, we carry out all intersection tests with the objects belonging to the cell. If no intersection is found within the cell, we proceed to the next cell. In order to find it, we use the same method as before. We calculate the point where the ray leaves the cell, which is exactly where it enters the next.
We translate it a tiny bit further along the ray to resolve the ambiguity, and repeat the whole process using the spatial proximity search with this next point. We have to remark that the algorithm may skip cells of extremely small or zero width. Although these may seem useless at first sight, they can rightfully appear in kd-trees for scenes where there are numerous axis-aligned polygons. This may be the case with geometrical scenes, typically boxes and rooms. Another drawback of this algorithm is that it starts from the root of the tree for every new cell, although it is very probable that two consecutive cells are near each other in the structure. Therefore one node may be visited many times.

The recursive ray traversal algorithm eliminates the main drawbacks of the sequential ray traversal algorithm and visits every node and leaf only once [2]. We check whether the ray intersects the volumes corresponding to the left and right sub-trees. The sub-trees are traversed in the very same way, if necessary, starting with the one nearer to the origin. To terminate the recursion, the leaves of the tree are handled in the same manner as above. The implementation of the algorithm needs a traversal stack to store data about the sub-trees that have to be processed later.

Whichever algorithm we use, we will walk through the leaf cells along the ray, and test possible intersections for the segment inside the cell. If intersections are found, the closest is taken; otherwise the ray has to be followed further. Consequently, the objective is to have a minimal number of objects in a cell, and if a ray intersects a cell, it should, with high probability, also intersect an object within. This, pushed to its extreme, is accomplished when every object is delimited by six tightly fitting cutting planes.
However, if the bounding boxes of the objects overlap, as in most scenes, then such cuts may intersect several objects, adding them to both child volumes, which results in superfluously long lists in the leaves and a worse-than-optimal traversal time.

2.3. Constructing a kd-tree and possible decision-making heuristics

The tree can be built recursively. Processing a volume involves the choice and storage of the cutting plane, and the processing of the two new sub-volumes. The decision to make is where to place the cutting plane, and whether it is worth subdividing the volume at all. This may be based on some heuristic scheme, or on an estimate of the resulting traversal cost.

The first, most obvious method is to cut the volume into two equal halves, using the spatial median, similarly to the octree approach where we care little about the position of the objects when subdividing a volume. The resulting tree will of course not be balanced, and it is easy to construct a scene where this method is nearly useless. Similarly to the octree, spatial median subdivision performs well in the case of evenly distributed objects.

Another simple and more promising approach is to make both sub-volumes contain the same number of objects. The position with this property is called the object median. To find it, we have to do a 'select and partition' median search. This can be considered a modified version of the 'quick sort' algorithm that only sorts the partition containing the halving element of the array. This simpler procedure will also separate the array into elements smaller and greater than the median, and outperforms 'quick sort'. As the resulting tree would be balanced, its representation could be simple and compact.
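The 'select and partition' search for the object median can be sketched as follows. This is only a minimal illustration under stated assumptions: each object is reduced to a single coordinate along the candidate axis (for instance its centroid, which the text does not prescribe), the pivot is simply the middle element of the current range, and the name `select_median` is hypothetical.

```python
from typing import List


def select_median(values: List[float]) -> float:
    """Partially order `values` in place and return its median element.

    Only the partition that contains the middle index is processed further,
    which is what distinguishes selection from a full quick sort.
    """
    lo, hi = 0, len(values) - 1
    k = len(values) // 2  # index of the halving element
    while lo < hi:
        pivot = values[(lo + hi) // 2]  # assumption: middle element as pivot
        i, j = lo, hi
        while i <= j:
            while values[i] < pivot:
                i += 1
            while values[j] > pivot:
                j -= 1
            if i <= j:
                values[i], values[j] = values[j], values[i]
                i += 1
                j -= 1
        if k <= j:      # the median lies in the left partition
            hi = j
        elif k >= i:    # the median lies in the right partition
            lo = i
        else:           # the median is already in its final position
            break
    return values[k]
```

After the call, every element before the middle index is not greater than the returned value and every element after it is not smaller, so the two halves can be handed directly to the two sub-volumes.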
Furthermore, a balanced kd-tree can be considered optimal for several tasks, such as proximity search. However, in ray casting, we do not only need to find an object, but also to follow a ray through the several cells it intersects. Therefore, the probability of a sub-volume being hit by a ray plays an important role in the expected time cost of the rendering algorithm. The object median method disregards that aspect. The unfortunate consequence for the optimal tree is that we have to discard the concept of balancedness, and have to find the means to store a non-balanced tree in a compact way.

Although simple cut heuristics produce inferior traversal times, fast construction and a compact data structure are advantages. Therefore, they may have some relevance if the structure is to be built in real time, despite the fact that in globally illuminated animation the traversal cost tends to be the bottleneck. As the tree construction time increases rapidly with the number of objects, while the traversal time for sufficiently large scenes is constant, it cannot be excluded that the situation may change, especially in the case of vertex-based animations with a very high polygon count. The compact memory representation used for the balanced tree should definitely be exploited in some form in the more sophisticated methods.

3. Improvement of the cost function

3.1. Previous work

A way to find the optimal cut is to consider all reasonable cuts, including cutting off empty space and terminating the build, and to choose the one that produces the shortest expected traversal time. To achieve this we need to estimate that time. Havran proposed the following function, linear with respect to the number of objects in the sub-volumes:

\[ C_V = \frac{1}{SA(V)} \Big[ SA(\mathit{leftChild}(V)) \, (N_L + N_{SP}) + SA(\mathit{rightChild}(V)) \, (N_{SP} + N_R) \Big] \tag{1} \]

where $C_V$ is the cost corresponding to volume $V$, $SA(V)$ is the surface area of volume $V$, and $N_L$, $N_R$ and $N_{SP}$ are the number of objects completely in the left and right sub-volumes, and the number of objects intersected by the splitting plane, respectively.

This means that the expected time for the traversal of a volume is the time needed to carry out the naïve intersection test for all of its objects, multiplied by the probability of a ray hitting the volume. This probability, considering that the volumes are convex, equals the ratio of the surface area of the sub-volume to that of the enclosing volume.
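To make the linear estimate concrete, the following sketch evaluates it for one candidate cut of an axis-aligned box. It is a minimal illustration only: the box representation (a pair of corner points) and the function names `surface_area`, `split_box` and `linear_cost` are assumptions made for this example, and the object counts are supplied by the caller rather than computed from geometry.

```python
from typing import Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner); assumed layout


def surface_area(box: Box) -> float:
    """Surface area of an axis-aligned box."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def split_box(box: Box, axis: int, pos: float) -> Tuple[Box, Box]:
    """Cut `box` with the plane perpendicular to `axis` at coordinate `pos`."""
    lo, hi = box
    left_hi = list(hi)
    left_hi[axis] = pos
    right_lo = list(lo)
    right_lo[axis] = pos
    return (lo, tuple(left_hi)), (tuple(right_lo), hi)


def linear_cost(box: Box, axis: int, pos: float,
                n_left: int, n_right: int, n_split: int) -> float:
    """Linear cost of a cut: object counts weighted by hit probabilities.

    Objects straddling the plane (n_split) are counted in both children.
    """
    left, right = split_box(box, axis, pos)
    weighted = (surface_area(left) * (n_left + n_split)
                + surface_area(right) * (n_split + n_right))
    return weighted / surface_area(box)
```

For a unit cube cut in the middle with three objects on the left, one on the right and two straddling the plane, the estimate is (4·5 + 4·3)/6, about 5.33, compared with 6 for leaving the volume undivided, so this cut would be considered worthwhile.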
Obviously, the estimate given by this function does not equal the actual time cost, as the created volumes will be subdivided further, and not handled with the naïve algorithm. Havran also identified this problem and proposed some ideas for its solution. He stated that the optimal cost function depends to a great extent on the distribution of the objects in the actual scene, and thus for a better estimate the cost must be measured in some way. Although it is possible to build the tree and compute the cost precisely, doing this every time the function has to be evaluated would lead to a computational explosion of the construction algorithm. Therefore, in order to obtain a more effective function, the scene should be characterised by values that are easily determined and that influence the cost function.

3.2. Non-linear cost estimate

In a recent article we have shown that for scenes with a large number of random objects, kd-tree traversal is done in constant time. How can this be reconciled with the linear estimate? How can Havran's method provide outstanding results despite this contradiction? If we are low in the tree, near the leaves, and it is true that the sub-volumes will go through little to no further subdivision, then the linear estimate is of course perfect. On the other hand, if we are near the root of the tree, meaning that the constant time traversal statement holds for the sub-trees, then the expected traversal time is independent of the cut. Therefore, wherever the linear estimate fails, the position of the cut is not so important anyway. However, it is possible to construct a more accurate cost estimate if we are able to account for the gain from separating the elements and cutting off empty space. To calculate that exactly would be hopelessly expensive, but by simply changing the linear function to a slightly better fitting one, we may eliminate some of the inaccuracy on the higher levels of the tree.
It is of course imperative to keep the linearity in the lower regions where it works perfectly. Let us suppose that a cut improves the time by a factor of $q < 1$ on average, and that a cell containing $n_0$ elements is not worth dividing any more. In effect, that means that a cell may contain $n_0$ objects on average. Under these assumptions, the cost of traversal, without the adjustment for the probability of the volume being hit, is given by the following equation. This function is to be applied to the number of objects in the sub-volumes in (1):

\[ f(n) = n \cdot q^{\log_2 (n - n_0)} \tag{2} \]

The value of $n_0$ is relatively easy to find, and will be determined by the primitive geometry. The value $q$, however, is quite an abstraction. It includes both cuts between objects and cutting off empty space. In fact, it corresponds more to the subdivision potential of the volume than to the obscure concept of the cost reduction achieved by a single cut. Still, it is not harmful to overestimate both $n_0$ and $q$, as that will get us nearer to the original linear estimate. Therefore, the formula for the expected number of intersection tests introduced in our previous article [3] can be applied to determine a probable upper bound for the traversal cost of the tree that is being built, providing a value for $q$. Naturally, significantly better results are only expected for large scenes with a high primitive count, as the linear function is less accurate, and the guess for $q$ is better, in those cases. The previous equation can further be written as:

\[ f(n) = n \cdot (n - n_0)^{\log_2 q} \tag{3} \]

\[ f(n) = (n - n_0)^{1 + \log_2 q} + n_0 \cdot (n - n_0)^{\log_2 q} \tag{4} \]

As $(n - n_0)^{\log_2 q} < 1$, the cost may be over-estimated as:

\[ f(n) = n \cdot (n - n_0)^{\log_2 q} < (n - n_0)^{1 + \log_2 q} + n_0 \tag{5} \]
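As a numerical illustration, the over-estimate can be evaluated with a few lines of code. The sketch assumes the bound has the form (n - n0)^(1 + log2 q) + n0 for n at or above n0, and that the plain linear value n is used below n0; the function name `adjusted_object_count` and the restriction of q to the interval (0.5, 1] are choices made for this example only.

```python
import math


def adjusted_object_count(n: float, n0: float, q: float) -> float:
    """Non-linear replacement for the object count n in the cost function.

    Assumed form: n below n0 (linear region), otherwise
    (n - n0) ** (1 + log2(q)) + n0.
    """
    if not 0.5 < q <= 1.0:
        # Assumption of this sketch: keeps the exponent in (0, 1],
        # so the estimate stays monotonic and continuous at n0.
        raise ValueError("q is expected in the interval (0.5, 1]")
    if n < n0:
        return n
    return (n - n0) ** (1.0 + math.log2(q)) + n0
```

With q = 1 the exponent becomes 1 and the function reduces to the linear estimate, which matches the remark that overestimating q moves the result towards the original linear function. Smaller values of q flatten the curve for large n while leaving the region below n0 untouched.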

Should $n$ be smaller than $n_0$, which is possible as $n_0$ is an average value, we stick to the linear estimate:

\[ f(n) = \begin{cases} n & \text{if } n < n_0 \\ (n - n_0)^{1 + \log_2 q} + n_0 & \text{if } n \ge n_0 \end{cases} \tag{6} \]

3.3. Results

Although Havran pointed out the difficulties of constructing a general cost function, his investigations have also shown that for a specific scene, the traversal time as a function of the number of patches usually deviates little from a logarithmic curve. Therefore, it is expected that if the best coefficients are found, the rendering time for a scene may be decreased. However, as explained above, the linear estimate performs better than expected, so only a small speed-up will occur.

We have summarised run times for Bi-directional Path Tracing [4] image computation for several scenes. Time is given in seconds everywhere.

Scene                   House
Number of patches       24737
Measured n_0            3.2
Linear cost run-time    65.42

n_0    q = 0.9    q = 0.95    q = 0.98
4      61.51      57.12       57.40
6      68.16      64.48       65.12
8      70.08      64.32       64.89

For other scenes with fewer objects we obtained less significant results. It should also be remarked that the time taken by the BPT algorithm may be influenced by factors like the actual paging coherence. Obviously, there is little improvement, and the method we used to determine $q$ is still not general enough to be used for every scene. Firstly, this is due to the fact that the formula for the traversal time given in our previous article [3] only holds for a sufficiently large scene. Secondly, the concept of the cost reduction caused by a cut may be better than the linear one, but it is still not perfect. Havran's measurements would rather suggest a logarithmic curve. Therefore, it would be worth testing a few other forms of cost functions. However, the use of the $n_0$ value to characterise the object distribution, or tessellation, of the scene would probably be a valuable concept.

4. Tree representation, memory usage and cache coherence

4.1. Previous work

The single most important objective is to assure fast traversal. First, this means that we have to be able to find the children of a node quickly, and second, we have to retrieve the data from memory fast. In order to achieve this, we should have as much data corresponding to nearby cells in the cache as possible. That means we have to use the minimum amount of storage space per node, store nearby nodes next to each other, and still be able to find the children of a node fast enough.

A powerful solution to store a complete tree is the compact representation, where every node is an element of an array, and the start of the children-array can be found by multiplying the parent index by the number of children per node. This representation does not use any pointers, and therefore it needs only the minimum amount of memory, but we need to know the number of nodes in advance. Furthermore, for an unbalanced tree we need a compact structure as deep as the deepest branch of the tree, and a large part of the array will be empty, causing a tremendous waste of memory. Further on, we will call this phenomenon fragmentation.

Another approach is to have separately allocated nodes and use pointers to find children. Two pointers per node may multiply the memory need, and locality is not automatically assured. The optimal solution has to use some pointers to account for the problems with the naïve compact representation, while keeping its advantages.

Such a mid-way solution is proposed by Havran [2]. It uses small compact trees, connected by pointers. This limits the fragmentation due to non-balancedness to the sub-trees, and if their memory representation fits into a cache line, cache coherence is utilised well (see Figure 1). However, a large number of pointers are still used, and dynamic allocation of sub-trees is assumed.

4.2. kd-tree in pre-allocated memory

The concept of compact sub-trees may be developed further. More pointers may be eliminated if such a sub-tree is considered a node of another compact tree, where the nodes have twice as many children as the number of nodes on the last level of the sub-tree. These super-nodes do not have to be connected by pointers, as they can also be stored in a compact structure. This way, pointers are completely eliminated and cache coherence is assured, but fragmentation is just as costly an issue as in the case of the simple compact tree.

The problem is that we have to store an unbalanced tree. If the super-nodes are connected by pointers, as Havran suggests, then the fragmentation will be limited to the sub-trees. However, the compact solution can also be improved to handle unbalanced trees. Whenever a branch terminates before the depth of the pre-allocated memory, the would-be children of the leaves become roots of free-space trees. These "holes" can later be used to contain the parts of the tree that would stretch beyond the pre-allocated space. This means that only the nodes on the very last level need to have pointers, which actually point somewhere back into the array. This mapping is demonstrated in Figure 2. It is of course possible and desirable to use the cache-line-sized sub-tree mapping (see Figure 1) to store the resulting balanced tree for better cache performance.
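The idea of storing an unbalanced tree in a single pre-allocated array, with last-level slots pointing back into unused "holes", can be sketched as follows. This is an illustrative reading rather than the exact layout of the figures: it assumes heap-style indexing (the children of slot i are slots 2i+1 and 2i+2), a hypothetical `Node` record with a kind tag, and a `REDIRECT` kind for last-level slots that reference the position where the node is really stored.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Node kinds; the names are illustrative assumptions, not taken from the text.
UNUSED, CUT, LEAF, REDIRECT = range(4)


@dataclass
class Node:
    kind: int = UNUSED
    axis: int = 0        # CUT: axis the splitting plane is perpendicular to
    split: float = 0.0   # CUT: position of the plane along that axis
    ref: int = -1        # LEAF: id of the object list; REDIRECT: index of the real node


def resolve(nodes: List[Node], i: int) -> int:
    """Follow a redirect slot to the slot where the node is actually stored."""
    while nodes[i].kind == REDIRECT:
        i = nodes[i].ref
    return i


def find_leaf(nodes: List[Node], point: Sequence[float]) -> int:
    """Walk down from the root; return the object-list id of the leaf containing `point`."""
    i = resolve(nodes, 0)
    while nodes[i].kind == CUT:
        node = nodes[i]
        child = 2 * i + 1 if point[node.axis] < node.split else 2 * i + 2
        i = resolve(nodes, child)
    return nodes[i].ref


def example_tree() -> List[Node]:
    """A 15-slot array (4 levels) holding a tree that is 5 levels deep."""
    nodes = [Node() for _ in range(15)]
    nodes[0] = Node(CUT, axis=0, split=0.5)
    nodes[1] = Node(LEAF, ref=0)              # early leaf: slots 3 and 4 become holes
    nodes[2] = Node(CUT, axis=1, split=0.5)
    nodes[5] = Node(CUT, axis=0, split=0.75)
    nodes[6] = Node(LEAF, ref=1)
    nodes[11] = Node(LEAF, ref=2)             # last level: a leaf
    nodes[12] = Node(REDIRECT, ref=3)         # last level: reference into the hole
    nodes[3] = Node(CUT, axis=1, split=0.25)  # the subtree continues inside the hole
    nodes[7] = Node(LEAF, ref=3)
    nodes[8] = Node(LEAF, ref=4)
    return nodes
```

In the example, the branch below slot 12 would not fit into four levels, so it is continued in slot 3, a would-be child of the early leaf in slot 1. Only that single last-level slot carries an index; all other parent-child links are implicit in the array positions.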

The number of these pointers could be reduced further if we make use of the fact that leaves on the last level do not have children. That way, a node on the last level is either a leaf, or a reference to the actual position of the child node. This representation allows the mapping of a non-balanced tree, with all the needed pointers, into a single, pre-allocatable array. This structure, also compatible with the cache-line mapping, is depicted in Figure 3.

Naturally, we need to have a good estimate of the number of nodes to be able to allocate memory in advance. Fortunately, we know that a kd-tree uses 6n splitting planes at most. This also means a maximum of 6n leaves. Adding the worst-case number of pointers, which is exactly the number of nodes on the last level, we conclude that an array with 24n elements suffices.

A node itself has to be as tiny as possible. The above structure assumes that the description of a splitting plane for a non-leaf node, a pointer to the list of objects for a leaf, and a pointer to another node for a redirect node all fit into one element of the array. As the plane is described by a not necessarily precise floating-point number, all these take up only a few bytes. We need one extra bit to distinguish between leaves and non-leaf nodes.

Figure 1: Mapping of a tree into an array using cache-line-sized sub-trees.

Figure 2: Mapping of an unbalanced tree into an array using pointers on the last level. Darkened nodes are leaves.

Figure 3: kd-tree using the minimal number of pointers. Dark nodes are leaves, hatched nodes are unused.

4.3. Estimation of the number of necessary pointers

The above figure for the memory need can be decreased further if we do not account for the worst case, and use somewhat less memory. Should the tree exceed its predefined node count, we will have to terminate the build.
As compromising the tree construction algorithm will prove to be very costly during traversal, a safe size should be chosen, with practically zero chance of overflow. However, if we make use of the fact that no pointers are needed for leaves, a lower figure for the number of pointers may be found. Obviously, if the lowest level of the tree contained only pointers, as the previously given upper bound suggests, every single node above would be referenced. This is impossible for two reasons: the nodes that belong to the tree originating from the root are not referenced, and, more significantly, leaves are never referenced.

To derive an exact number let us introduce the following notation. Let $X$ be the number of pointers on the last level, and $N$ the number of pre-allocated nodes. If the array is not full, the number of pointers is irrelevant. We are only interested in the case when every node is used as a cut, a leaf or a pointer. Therefore, the number of leaves $L$, the number of cuts $C$, and the number of pointers $X$ add up to the size of the array:

\[ L + C + X = N \tag{7} \]

The numbers of leaves and cuts are taken to be equal (a binary tree has exactly one more leaf than it has cuts, which is negligible here):

\[ 2C + X = N \tag{8} \]

Pointers may only reference cut nodes, and no node can be referenced more than once:

\[ C \ge X \tag{9} \]

Substituting this back, we get:

\[ 3X \le N \tag{10} \]

Therefore, in the worst case not half of the nodes are needed for pointers, only one third. This way, the upper bound for the array is 18n, where 6n nodes each are reserved for the cuts, the leaves and the pointers. Although this may still be a rough over-estimate for some simple scenes, going even lower may have disadvantageous effects. First of all, if the array is filled, the above mentioned termination could occur. Secondly, even if the array is not filled, a deeper tree will allow longer branches without using pointers. Therefore, over-allocation slightly increases compactness and traversal speed. Setting the array size to 18n will allow for the storage of worst-case kd-trees, and a speed-effective representation of simple ones.

In the following table the numbers of leaves and pointers for various scenes are listed. Obviously, the number of pointers remains below the worst-case bound. This representation definitely uses fewer pointers than the previous solutions, which should result in better cache coherence and faster traversal.

Scene          Patches    Nodes     Leaves    Pointers
Cornell box    3968       9037      4296      444
Beethoven      2636       23140     9883      3373
Random         3515       39389     16981     5426
Tea            10025      54392     23874     6643
Chickens       16467      115455    49094     17266
House          24737      156469    65540     25388

5. kd-trees for animation

5.1. Separation of dynamic and static objects

Rebuilding the whole kd-tree for every frame is obviously very expensive and also superfluous. If the objects are classified as static objects, staying at a fixed position during the animation, and dynamic objects, which may move, we can build two different kd-trees. This has various advantages. First of all, the time of the kd-tree construction is dramatically reduced. It also becomes possible to shoot rays only into the dynamic kd-tree, thereby identifying changes of the scene along previous shooting or gathering paths. It is interesting to examine how the data structure could help in making use of frame-to-frame coherence. However, in the dual kd-tree structure traversal will be slower.
Theoretically, if both trees contain a large number of objects, the traversal time is independent of the size of the tree, therefore separating them could double the time cost. This would of course be unacceptable, and should be addressed.

Obviously, the fewer objects there are in the dynamic kd-tree, the faster it can be built. The moving objects in an animation sequence can usually be separated into sets of primitives that move together. This is even more characteristic of scenes with rigid bodies, where the primitives of a higher-level object are static relative to each other. Reconstructing the kd-tree from the primitives would not take any advantage of this property. The kd-tree for the rigid objects can be built in advance, but if the objects are rotated, the splitting planes would not be axis-aligned any more, and such a structure could not be used as a sub-tree of the dynamic kd-tree.

The solution is to pre-compute the kd-trees for the rigid objects, attach them to these objects, and define the intersection test for an object as the ray shot into its kd-tree. If the objects are translated, rotated or transformed in any other way, then the ray must be transformed into the model space [5], in which the sub-kd-tree is axis-aligned. This way the dynamic tree will be built of a few rigid objects, and not many more primitives. The reconstruction of the data structure between frames will be done in very little time, and the traversal overhead due to the dual tree structure will be minimal.

However, several questions arise. In order to build a kd-tree of transformed objects, the extremes along every axis have to be found. Computing a bounding box for a set of points is straightforward but may be unacceptably expensive for a large number of vertices. Furthermore, as an intersection test for such a high-level object is costly, a cheaper pre-filtering would be useful. Both problems are addressed by pre-computing a bounding object that is easy to transform.
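The transformation of a ray into the model space of a rigid object, as used for the pre-built sub-trees described in this subsection, can be sketched as follows. The sketch assumes that the object is placed in the world by a rotation matrix R followed by a translation t (both names are hypothetical), and it uses an analytic unit-sphere test purely as a placeholder for shooting the ray into the object's own axis-aligned kd-tree.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def rotate_inverse(R: Sequence[Sequence[float]], v: Vec) -> Vec:
    """Apply the inverse of rotation R, i.e. multiply by its transpose."""
    return tuple(sum(R[r][c] * v[r] for r in range(3)) for c in range(3))


def to_model_space(origin: Vec, direction: Vec,
                   R: Sequence[Sequence[float]], t: Vec) -> Tuple[Vec, Vec]:
    """Express a world-space ray in the model space of an object placed by x -> R x + t."""
    shifted = tuple(o - ti for o, ti in zip(origin, t))
    return rotate_inverse(R, shifted), rotate_inverse(R, direction)


def hit_unit_sphere(origin: Vec, direction: Vec) -> Optional[float]:
    """Placeholder for the model-space kd-tree traversal: nearest hit with a unit sphere."""
    a = sum(d * d for d in direction)
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    disc = b * b - a * c
    if disc < 0.0:
        return None
    s = (-b - math.sqrt(disc)) / a
    return s if s >= 0.0 else None


def intersect_rigid_object(origin: Vec, direction: Vec,
                           R: Sequence[Sequence[float]], t: Vec) -> Optional[float]:
    """Ray parameter of the nearest hit, computed entirely in model space."""
    model_origin, model_direction = to_model_space(origin, direction, R, t)
    return hit_unit_sphere(model_origin, model_direction)
```

Because a rigid transformation preserves lengths and the direction vector is not renormalised, the ray parameter found in model space is also valid along the world-space ray, so it can be compared directly with hits found in other trees. For general (non-rigid) transformations the full inverse matrix would be needed instead of the transpose.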
An ellipsoid, being a quadratic surface, is the most appropriate bounding object. If the smallest enclosing ellipsoid for the vertices of the object is calculated, it can be transformed appropriately for every frame. Its extremes may be used to determine the bounding box, and an intersection test with a quadratic object can be used to filter out a huge number of non-intersecting rays. The algorithm used to determine the smallest enclosing ellipsoid is based on linear programming [7] and runs in $O(n)$ time [1].

5.2. Synchronous traversal of the dual tree

We have mentioned above that the traversal cost for two trees may be double the cost for one tree of twice as many objects. This is, however, a worst-case assumption, and can be avoided in several ways. First of all, the use of compound rigid objects described above will decrease the size and traversal cost of the dynamic tree. Secondly, it is obvious that if we have found an intersection in the dynamic tree, the search in the static tree may be limited to the segment of the ray between the origin and the intersection point. That is, we do not test in areas occluded by dynamic objects. This simple modification will result in traversal times very close to the one-tree case, especially if the dynamic objects are rarely occluded by static ones. However, it is not always possible to identify rigid objects, and the visibility relation between the dynamic and static objects may not be so clear-cut for some animation sequences. Therefore, we introduce a cost-effective traversal algorithm for multiple overlapping kd-trees, especially useful if a large number of independently moving primitives are stored in the kd-tree.

Basically, the cell boundaries of a kd-tree separate a traversing ray into segments. A traversal algorithm will identify those segments, and will compute intersection tests on every segment in order. If the objects are stored in multiple kd-trees, multiple segmentations exist. The task is to find an optimal order of the segments, so that no segment further than the first valid intersection is examined. That means, if any point of segment A is nearer to the origin of the ray than any point of segment B, then A must precede B in the traversal order. A known recursive algorithm, described in detail by Havran [2], is extended in the following way:

1. Set up a search interval for every tree as the entire ray.
2. Choose the 'non-terminated' tree for which the minimum point of the search interval is nearest to the origin.
3. Traverse the chosen tree using the recursive algorithm. A separate traversal stack and a current node identifier have to be maintained for every tree. Continue until a leaf is reached.
4. If a leaf is being processed, test for intersections, and update the global closest intersection found if necessary. Set the search interval to the segment of the ray intersected by the volume of the next node to be processed according to the recursive traversal algorithm. If the traversal stack is empty, or a valid intersection was found, mark the tree as 'terminated'.
5. If a valid intersection has already been found, and the search interval for every tree is entirely further than the closest intersection, terminate, and return the intersection found.
6. If all the trees are marked 'terminated', there was no intersection with the ray in any of the trees; return without a result.
7. Continue with step 2.

Figure 4: The parallel traversal of two kd-trees partitioning the same space, containing different sets of objects. The cells are numbered to indicate the order of their processing. Non-processed cells along the ray are marked with 0.

Compared to the sequential solution, where the trees are traversed one after another on the interval limited by previously found intersections, we spare the traversal of the ray segments between the nearest intersection and those further intersections that would have been found in previously traversed trees. Speaking about the dual tree structure, we have two options: traverse the dynamic tree and then the static tree, or traverse them in parallel. If the nearest intersection is in the static tree, then the parallel algorithm will not investigate the segment between the dynamic and static intersection points. In the opposite case the same number of tests are carried out. However, we have to remark that there is some overhead because of additional administration and weaker cache coherence, a result of handling several kd-trees simultaneously.

5.3. Results

Scenes have been divided into a static and a dynamic part to test the algorithm. Three cases were examined:

One tree: All the patches, static or dynamic, are stored in a common kd-tree.
Sequential: Static and dynamic patches are stored in separate trees. When calculating a ray-scene intersection, first we traverse only the dynamic tree. Thereafter the static tree is tested, but only on the ray segment between the origin and the intersection point in the dynamic tree.
Parallel: Static and dynamic patches are stored in separate trees. The parallel traversal algorithm is used for ray-scene intersections.

Figure 5: One of the test scenes. The two standing chickens are considered static, the other two, those over the ground, are dynamic.

The tests were run with two different kd-tree construction routines. We rendered an image using Bi-directional Path Tracing [4]. We used a test scene of large static and dynamic objects with a high primitive count, which, we believe, simulates a frame of an animation sequence well. In the other test scene, both the static and dynamic patches were generated at random. With the first construction routine, we obtained satisfying results showing that the parallel traversal is faster. Execution times are specified in seconds:

Scene               One tree   Sequential   Parallel
Chickens               94.97       132.26     124.46
Random triangles       60.69        81.12      74.79

However, when we switched to the second construction routine, which provided better traversal times, the results were less convincing, showing that the parallel traversal may be inferior to the sequential one. We examined the situation and found that the administrative overhead of the algorithm was compensated in the first case but not in the second: the same proportional speed-up applied to a faster traversal yields a smaller absolute time reduction. After optimisation of the parallel traversal, however, the overhead was successfully decreased:

Scene               One tree   Sequential   Parallel
Chickens               73.82        99.58      90.85
Random triangles       49.38        55.03      53.33

Consequently, we may state that the relative performance of the traversal algorithms depends strongly on the implementation. Our measurements show that, with proper optimisation, the parallel method is faster than the sequential one.

6. Conclusions

We have presented three improvements on the kd-tree concept. We have analysed the linear cost function and outlined a possible way to develop a more accurate one. The concept is far from complete; further research would be necessary to find a truly scene-independent way of approximating the cost function without actually building the tree.

We have described a memory representation for the kd-tree that uses pre-allocated memory and a minimal number of pointers. We have also given an upper bound for the size of the memory needed, which is tight enough for practical application. Although memory representation is closely related to implementation issues and cache architecture, the concept generally allows for little storage space, a low number of pointers, and high cache coherence.

We have also examined the overhead resulting from the separation of scene objects into two kd-trees. As we have shown, this can be reduced using the parallel traversal algorithm, provided that the administration overhead and the loss of cache coherence are kept low. Further research may investigate the possibility of distributed computing of the traversal. It is also an important issue to develop methods for globally illuminated animation that make use of frame-to-frame coherence and the separation of dynamic objects. The positive and negative effects on one-time and every-frame traversal times should then be evaluated.

References

1. Bernd Gärtner and Sven Schönherr. Smallest enclosing ellipses - fast and exact. 1997.
2. Vlastimil Havran. Heuristic Ray Shooting Algorithms. PhD thesis, Czech Technical University, Prague, 2000.
3. Szirmay-Kalos, Havran, Benedek, and Szécsi. On the efficiency of ray-shooting acceleration schemes. SCCG 2002, 2002.
4. L. Szirmay-Kalos. Számítógépes grafika. ComputerBooks, Budapest, 1999.
5. L. Szirmay-Kalos (editor). Theory of Three Dimensional Computer Graphics. Akadémia Kiadó, Budapest, 1995. http://www.iit.bme.hu/~szirmay.
6. Ingo Wald, Thomas Kollig, Carsten Benthin, Alexander Keller, and Philipp Slusallek. Interactive global illumination. Technical report, Computer Graphics Group, Saarland University, 2002. Available at http://www.openrt.de/Publications.
7. Emo Welzl. Smallest enclosing disks (balls and ellipsoids). New Results and New Trends in Computer Science (H. Maurer, ed.), pages 359–370, 1991.
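
As an illustration of the parallel traversal steps listed before the Results section, the following minimal Python sketch traces one ray through several kd-trees at once. It is not the implementation measured in this paper. The names `Leaf`, `Inner`, `leaves_front_to_back`, `hit_sphere` and `parallel_trace` are hypothetical, primitives are simplified to spheres, ray directions are assumed to have unit length, and a recursive generator per tree stands in for the separate traversal stack and current node identifier.

```python
import math
from dataclasses import dataclass

@dataclass
class Leaf:
    prims: list            # spheres stored as (center, radius); an assumption of this sketch

@dataclass
class Inner:
    axis: int              # splitting axis: 0, 1 or 2
    split: float           # position of the splitting plane
    left: object           # child below the plane
    right: object          # child above the plane

def leaves_front_to_back(node, o, d, t0, t1):
    """Yield (t0, t1, prims) for the leaf cells pierced by the ray, nearest first."""
    if isinstance(node, Leaf):
        yield t0, t1, node.prims
        return
    a = node.axis
    if d[a] == 0.0:                      # ray parallel to the splitting plane
        child = node.left if o[a] < node.split else node.right
        yield from leaves_front_to_back(child, o, d, t0, t1)
        return
    ts = (node.split - o[a]) / d[a]      # ray parameter at the splitting plane
    first, second = (node.left, node.right) if d[a] > 0 else (node.right, node.left)
    if ts >= t1:
        yield from leaves_front_to_back(first, o, d, t0, t1)
    elif ts <= t0:
        yield from leaves_front_to_back(second, o, d, t0, t1)
    else:
        yield from leaves_front_to_back(first, o, d, t0, ts)
        yield from leaves_front_to_back(second, o, d, ts, t1)

def hit_sphere(sphere, o, d):
    """Smallest t >= 0 at which the unit-direction ray meets the sphere, else None."""
    c, r = sphere
    oc = [o[k] - c[k] for k in range(3)]
    b = sum(oc[k] * d[k] for k in range(3))
    disc = b * b - (sum(x * x for x in oc) - r * r)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return next((t for t in (-b - root, -b + root) if t >= 0.0), None)

def parallel_trace(trees, o, d, t_max, visited=None):
    """Return (t, sphere) for the closest hit over all trees, or None."""
    cursors = [leaves_front_to_back(root, o, d, 0.0, t_max) for root in trees]
    start = [0.0] * len(trees)           # step 1: start of each search interval
    alive = [True] * len(trees)          # False once a tree is terminated
    best = None
    while any(alive):                    # step 6: stop when every tree is terminated
        # step 2: non-terminated tree whose search interval starts nearest
        i = min((k for k in range(len(trees)) if alive[k]), key=start.__getitem__)
        if best is not None and start[i] >= best[0]:
            break                        # step 5: all intervals lie beyond the hit
        cell = next(cursors[i], None)    # step 3: advance this tree to its next leaf
        if cell is None:
            alive[i] = False             # traversal of this tree is exhausted
            continue
        t0, t1, prims = cell
        if visited is not None:
            visited.append((i, t0, t1))
        for p in prims:                  # step 4: intersection tests in the leaf
            t = hit_sphere(p, o, d)
            if t is not None and t0 <= t <= t1:
                alive[i] = False         # valid hit: this tree is terminated
                if best is None or t < best[0]:
                    best = (t, p)
        start[i] = t1                    # the next cell of this tree starts here
    return best

static_sphere = ((8.0, 0.0, 0.0), 1.0)
dynamic_sphere = ((4.5, 0.0, 0.0), 0.5)
static_tree = Inner(0, 6.0, Leaf([]), Leaf([static_sphere]))
dynamic_tree = Inner(0, 3.0, Leaf([]), Leaf([dynamic_sphere]))
cells = []
hit = parallel_trace([static_tree, dynamic_tree],
                     (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 10.0, cells)
print(hit, cells)
```

In this small example the dynamic sphere is hit first, so the far cell of the static tree, which starts beyond the hit, is never visited. This is the saving that the parallel scheme aims for. The bookkeeping in `start` and `alive` corresponds to the administrative overhead discussed in the Results section.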
