Improvements on the kd-tree

László Szécsi and Balázs Benedek / Improvements on the kd-tree

Should $n$ be smaller than $n_0$, which is possible as $n_0$ is an average value, we stick to the linear estimate:

\[
f(n) =
\begin{cases}
n & \text{if } n < n_0 \\
(n - n_0)^{1 + \log_2 q} + n_0 & \text{if } n \ge n_0
\end{cases}
\tag{6}
\]

3.3. Results

Although Havran pointed out the difficulties of constructing a general cost function, his investigations have also shown that, for a specific scene, the traversal time as a function of the number of patches usually deviates little from a logarithmic curve. Therefore, it is expected that if the best coefficients are found, the rendering time for a scene may be decreased. However, as explained above, the linear estimate performs better than expected, so only a small speed-up will occur.

We have summarised run times for Bi-directional Path Tracing [4] image computation for several scenes. All times are given in seconds.

Scene:                 House
Number of patches:     24737
Measured n_0:          3.2
Linear cost run-time:  65.42

n_0    q = 0.9    q = 0.95    q = 0.98
4      61.51      57.12       57.40
6      68.16      64.48       65.12
8      70.08      64.32       64.89

For other scenes with fewer objects we obtained less significant results. It should also be noted that the time taken by the BPT algorithm may be influenced by factors such as the actual paging coherence. Obviously, there is little improvement, and the method we used to determine $q$ is still not general enough to be used for every scene. Firstly, this is because the formula for traversal time given in our previous article [3] only holds for a sufficiently large scene. Secondly, the concept of the cost reduction caused by a cut may be better than the linear one, but it is still not perfect. Havran's measurements would rather suggest a logarithmic curve. Therefore, it would be worth testing a few other forms of cost functions. However, using the $n_0$ value to characterise the object distribution, or tessellation, of the scene would probably be a valuable concept.

4. Tree representation, memory usage and cache coherence

4.1. Previous work

The single most important objective is to assure fast traversal. First, this means that we have to be able to find the children of a node quickly, and second, we have to retrieve the data from memory fast. In order to achieve this, we should keep as much of the data belonging to nearby cells in the cache as possible. That means we have to use the minimum amount of storage space per node, store nearby nodes next to each other, and still be able to find the children of a node fast enough.

A powerful solution to store a complete tree is the compact representation, where every node is an element of an array, and the start of the children-array can be found by multiplying the parent index by the number of children per node. This representation does not use any pointers, and therefore it needs only the minimum amount of memory, but we need to know the number of nodes in advance. Furthermore, for an unbalanced tree we need a compact structure as deep as the deepest branch of the tree, and a large part of the array will be empty, causing a tremendous waste of memory. From here on, we will call this phenomenon fragmentation.

Another approach is to have separately allocated nodes and use pointers to find children. Two pointers per node may multiply the memory requirement, and locality is not automatically assured. The optimal solution has to use some pointers to account for the problems with the naïve compact representation, while keeping its advantages.

Such a mid-way solution is proposed by Havran [2]. It uses small compact trees, connected by pointers. This limits the fragmentation caused by the lack of balance to the sub-trees, and if their memory representation fits into a cache line, cache coherence is utilised well (see Figure 1). However, a large number of pointers are still used, and dynamic allocation of sub-trees is assumed.

4.2. kd-tree in pre-allocated memory

The concept of compact sub-trees may be developed further. More pointers may be eliminated if such a sub-tree is considered a node of another compact tree, where the nodes have twice as many children as the number of nodes on the last level of the sub-tree. These super-nodes do not have to be connected by pointers, as they can also be stored in a compact structure. This way, pointers are completely eliminated and cache coherence is assured, but fragmentation is just as costly an issue as in the case of the simple compact tree.

The problem is that we have to store an unbalanced tree. If the super-nodes are connected by pointers, as Havran suggests, then the fragmentation will be limited to the sub-trees. However, the compact solution can also be improved to handle unbalanced trees. Whenever a branch terminates before the depth of the pre-allocated memory, the would-be children of the leaves become roots of free-space trees. These "holes" can later be used to contain the parts of the tree that would stretch beyond the pre-allocated space. This means that only the nodes on the very last level need to have pointers, which actually point somewhere back into the array. This mapping is demonstrated in Figure 2. It is of course possible and desirable to use the cache-line-sized sub-tree mapping (see Figure 1) to store the resulting balanced tree for better cache performance.
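The hole-reuse idea described above can be illustrated with a small sketch. It is not the authors' implementation: the names `build_array` and `find_leaf`, the tuple encoding of nodes, and the policy of taking the most recently created hole are all assumptions made for this example. The array is 1-based, so the children of slot i sit in slots 2i and 2i + 1, and a cut that would fall on the last level is replaced by a redirect into a free hole.

```python
def build_array(tree, depth):
    """Store a binary tree in a fixed array of 2**depth - 1 slots.

    Slots are 1-based; the children of slot i are slots 2*i and 2*i + 1.
    Input nodes are ("leaf", payload) or ("cut", plane, left, right).
    """
    size = 2 ** depth - 1
    slots = [None] * (size + 1)   # slots[0] is never used
    holes = []                    # roots of free sub-trees left below leaves
    pending = []                  # cuts that landed on the last level

    def place(i, node):
        stack = [(i, node)]
        while stack:
            i, node = stack.pop()
            if node[0] == "leaf":
                slots[i] = node
                # Would-be children of an early leaf become free sub-trees.
                # A hole on the last level cannot hold a cut, so skip those.
                holes.extend(h for h in (2 * i, 2 * i + 1) if 2 * h <= size)
            elif 2 * i > size:
                # A cut on the last level has no room for children here.
                pending.append((i, node))
            else:
                _, plane, left, right = node
                slots[i] = ("cut", plane)
                stack.append((2 * i, left))
                stack.append((2 * i + 1, right))

    place(1, tree)
    while pending:
        i, node = pending.pop()
        if not holes:
            raise MemoryError("pre-allocated array is full")
        target = holes.pop()
        slots[i] = ("ptr", target)   # redirect back into the array
        place(target, node)
    return slots


def find_leaf(slots, go_left):
    """Walk from the root; go_left(plane) picks the branch at each cut."""
    i = 1
    while True:
        kind, value = slots[i]
        if kind == "leaf":
            return value
        i = value if kind == "ptr" else (2 * i if go_left(value) else 2 * i + 1)


# A degenerate five-level tree over the real line, stored in four levels.
chain = ("cut", 1, ("leaf", "a"),
         ("cut", 2, ("leaf", "b"),
          ("cut", 3, ("leaf", "c"),
           ("cut", 4, ("leaf", "d"), ("leaf", "e")))))
slots = build_array(chain, 4)
x = 3.5
print(find_leaf(slots, lambda plane: x < plane))
```

A plain compact array would need 2^5 - 1 = 31 slots for this five-level tree. In the sketch, 15 slots suffice because the last cut is redirected into the free sub-tree below the first leaf.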
Figure 1: Mapping of a tree into an array using cache-line-sized sub-trees.

Figure 2: Mapping of an unbalanced tree into an array using pointers on the last level. Darkened nodes are leaves.

The number of these pointers could be reduced further if we make use of the fact that leaves on the last level do not have children. That way, a node on the last level is either a leaf or a reference to the actual position of the child node. This representation allows the mapping of a non-balanced tree, together with all the needed pointers, into a single, pre-allocable array. This structure, also compatible with the cache-line mapping, is depicted in Figure 3. Naturally, we need to have a good estimate of the number of nodes to be able to allocate memory in advance. Fortunately, we know that a kd-tree uses 6n splitting planes at most. This also means a maximum of 6n leaves. Adding the worst-case number of pointers, which is exactly the number of nodes on the last level, we conclude that an array with 24n elements suffices.

Figure 3: kd-tree using the minimal number of pointers. Dark nodes are leaves, hatched nodes are unused.

A node itself has to be as tiny as possible. The above structure assumes that the description of a splitting plane for a non-leaf node, a pointer to the list of objects for a leaf, and a pointer to another node for a redirect node all fit into an element of the array. As the plane is described by a floating-point number that does not have to be precise, all these take up only a few bytes. We need one extra bit to distinguish between leaves and non-leaf nodes.

4.3. Estimation of the number of necessary pointers

The above figure for the memory requirement can be decreased further if we do not account for the worst case and use somewhat less memory. Should the tree exceed its predefined node count, we will have to terminate the build.
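The following is a minimal sketch of building against a fixed node budget. The names `NodePool`, `try_alloc` and `build_tree` are hypothetical, the median split over one-dimensional points merely stands in for the real cost-based subdivision, and "terminating the build" is interpreted here as leaving the remaining cells as leaves. The default budget of 24 slots per object follows the worst-case bound above, under the assumption that n counts the objects.

```python
class NodePool:
    """Counts slots of a pre-allocated node array (hypothetical helper)."""

    def __init__(self, n_objects, slots_per_object=24):
        self.capacity = slots_per_object * n_objects
        self.used = 0

    def try_alloc(self, count):
        """Reserve `count` slots; return the first index, or None if full."""
        if self.used + count > self.capacity:
            return None
        first = self.used
        self.used += count
        return first


def build_tree(points, pool):
    """Build a toy 1-D tree, stopping subdivision when the pool is full."""
    if pool.try_alloc(1) is None:        # slot for the root
        raise MemoryError("no room for the root node")
    return _subdivide(sorted(points), pool)


def _subdivide(points, pool):
    # A cut needs two child slots; without them the cell stays a leaf.
    if len(points) <= 1 or pool.try_alloc(2) is None:
        return ("leaf", points)
    mid = len(points) // 2
    return ("cut", points[mid],
            _subdivide(points[:mid], pool),
            _subdivide(points[mid:], pool))


roomy = NodePool(8)                      # worst-case budget: 24 * 8 slots
full_tree = build_tree(range(8), roomy)

tight = NodePool(8, slots_per_object=1)  # deliberately too small
cut_short = build_tree(range(8), tight)
print(roomy.used, tight.used)
```

With the tight budget the right half of the scene is never subdivided, which shows why an undersized array degrades the tree rather than merely saving memory.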
As compromising the tree construction algorithm will prove to be very costly during traversal, a secure size should be chosen, with practically zero chance of overflow. However, if we make use of the fact that no pointers are needed for leaves, a lower figure for the number of pointers may be found. Obviously, if the lowest level of the tree contained only pointers, as the previously given upper bound suggests, every single node above would be referenced. This is impossible for two reasons: the nodes that belong to the tree originating from the root are not referenced, and, more significantly, leaves are never referenced.

To derive an exact number, let us introduce the following notation. Let $X$ be the number of pointers on the last level, and $N$ the number of pre-allocated nodes. If the array is not full, the number of pointers is irrelevant. We are only interested in the case when every node is used as a cut, a leaf or a pointer. Therefore, the number of leaves $L$, the number of cuts $C$, and the number of pointers $X$ add up to the size of the array:

\[ L + C + X = N \tag{7} \]

The numbers of leaves and cuts are equal:

\[ 2C + X = N \tag{8} \]

Pointers may only reference cut nodes, and no node can be referenced more than once:

\[ C \ge X \tag{9} \]

Substituting this back, we get:

\[ 3X \le N \tag{10} \]

Therefore, in the worst case it is not half of the nodes that are needed for pointers, only one third. This way, the upper bound for