László Szécsi and Balázs Benedek / Improvements on the kd-tree

hits for every primitive is unacceptably slow, in contrast to the results achieved by space subdivision. In the latter case, we only need to traverse the cells along the ray and compute intersections only for promising candidates. Among the spatial subdivision schemes, the best results are delivered by BSP trees and kd-trees. The kd-tree we use in this article is a binary, non-balanced spatial subdivision data structure, with axis-aligned cutting planes associated with its non-leaf nodes and subsets of the scene objects stored in its leaf nodes.

The power of the structure lies in its flexibility. Cutting planes can be positioned depending on the location of the scene objects, so at the cost of some calculation the solution resulting in an optimal traversal time can be chosen. The cutting planes being axis-aligned is a minor limitation: arbitrarily positioned planes may produce a better tree, but finding the optimum would be less efficient. Furthermore, storing the data describing axis-aligned cutting planes requires less memory, and the ray-plane intersection is far easier to compute.

2.2. Traversal along a ray

During image synthesis a large number of ray-scene intersections have to be computed. Compared to the one-time construction of the tree, this is such a difference of scale that in most cases it is worth paying almost any construction cost to speed up traversal.

The sequential ray traversal algorithm is based on spatial proximity search using the kd-tree. First we take the origin of the ray and locate the cell containing it by walking down the tree from its root. Within the cell found, we carry out all intersection tests with the objects belonging to the cell. If no intersection is found within the cell, we proceed to the next cell. In order to find it, we use the same method as before. We calculate the point where the ray leaves the cell, which is exactly where it enters the next. We translate this point a tiny bit further along the ray to resolve the ambiguity, and repeat the whole process using the spatial proximity search with the new point. We have to remark that the algorithm may skip cells of extremely small or zero width.
Although these may seem useless at first sight, they can legitimately appear in kd-trees for scenes with numerous axis-aligned polygons. This may be the case with geometrical scenes, typically boxes and rooms. Another drawback of this algorithm is that it starts from the root of the tree for every new cell, although it is very probable that two consecutive cells are near each other in the structure. Therefore one node may be visited many times.

The recursive ray traversal algorithm eliminates the main drawbacks of the sequential ray traversal algorithm and visits every node and leaf only once [2]. We check if the ray intersects the volumes corresponding to the left and right sub-trees. The sub-trees are traversed in the very same way, if necessary, starting with the one nearer to the origin. To terminate the recursion, the leaves of the tree are handled in the same manner as above. The implementation of the algorithm needs a traversal stack to store data about the sub-trees that remain to be processed later.

Whichever algorithm we use, we will walk through the leaf cells along the ray, and test possible intersections for the segment inside the cell.
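The stack-based form of the recursive traversal can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` layout, the function name `leaves_along_ray`, and the convention that `axis == -1` marks a leaf are all hypothetical. The function only lists the leaves in the order the ray pierces them; a ray caster would test the objects of each leaf in turn and stop at the first hit lying inside the current cell segment.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical kd-tree node; axis == -1 marks a leaf."""
    axis: int = -1
    split: float = 0.0
    left: Optional["Node"] = None    # sub-volume below the cutting plane
    right: Optional["Node"] = None   # sub-volume above the cutting plane
    objects: Tuple[str, ...] = ()    # only meaningful in leaves


def leaves_along_ray(root: Node, origin: Sequence[float],
                     direction: Sequence[float],
                     t_min: float, t_max: float) -> List[Tuple[str, ...]]:
    """Return the object lists of the leaves pierced by the ray segment
    [t_min, t_max], ordered from near to far. Each node is visited once."""
    order = []
    stack = [(root, t_min, t_max)]   # sub-trees still to be processed
    while stack:
        node, t0, t1 = stack.pop()
        while node.axis >= 0:
            o, d = origin[node.axis], direction[node.axis]
            # The child on the same side of the plane as the origin is nearer.
            if o < node.split or (o == node.split and d <= 0.0):
                near, far = node.left, node.right
            else:
                near, far = node.right, node.left
            if d == 0.0:             # ray parallel to the cutting plane
                node = near
                continue
            t_split = (node.split - o) / d
            if t_split >= t1 or t_split <= 0.0:
                node = near          # segment stays on the near side
            elif t_split <= t0:
                node = far           # segment lies entirely on the far side
            else:
                stack.append((far, t_split, t1))   # postpone the far child
                node, t1 = near, t_split
        order.append(node.objects)
    return order
```

For a tree cut first at x = 0.5 and then, on the right side, at y = 0.5, a ray travelling in the +x direction at y = 0.75 visits the left leaf and then the upper right leaf, without ever returning to the root.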
If intersections are found, the closest is taken; otherwise the ray has to be followed further. Consequently, the objective is to have a minimal number of objects in a cell, and if a ray intersects a cell, it should, with high probability, also intersect an object within it. Pushed to its extreme, this is accomplished when every object is delimited by six tightly fitting cutting planes. However, if the bounding boxes of the objects overlap, as in most scenes, then such cuts may intersect several objects, adding them to both child volumes, resulting in superfluously long lists in the leaves and worse-than-optimal traversal time.

2.3. Constructing a kd-tree and possible decision-making heuristics

The tree can be built recursively. Processing a volume involves the choice and storage of the cutting plane, and the processing of the two new sub-volumes. The decisions to make are where to place the cutting plane, and whether it is worth subdividing the volume at all. This may be based on some heuristic scheme, or on an estimate of the resulting traversal cost.

The first, most obvious method is to cut the volume into two equal halves, using the spatial median, similarly to the octree approach, where we care little about the position of the objects when subdividing a volume. The resulting tree will of course not be balanced, and it is easy to construct a scene where this method is close to useless.
Similarly to the octree, spatial median subdivision performs well in the case of evenly distributed objects.

Another simple and more promising approach is to make both sub-volumes contain the same number of objects. The position with this property is called the object median. To find it, we have to do a 'select and partition' median search. This can be considered a modified version of the 'quick sort' algorithm that only sorts the partition containing the halving element of the array. This simpler procedure will also separate the array into elements smaller and greater than the median, and outperforms 'quick sort'. As the resulting tree would be balanced, its representation could be simple and compact. Furthermore, a balanced kd-tree can be considered to be optimal for several tasks, such as proximity search. However, in ray casting, we do not only need to find an object, but also to follow a ray through the several cells it intersects. Therefore, the probability of a sub-volume being hit by a ray plays an important role in the expected time cost of the rendering algorithm. The object median method disregards that aspect. The unfortunate consequence for the optimal tree
is that we have to discard the concept of balancedness, and will have to find the means to store a non-balanced tree in a compact way.

Although simple cut heuristics produce inferior traversal times, fast construction and a compact data structure are advantages. Therefore, they may have some relevance if the structure is to be built in real time, despite the fact that in globally illuminated animation the traversal cost tends to be the bottleneck. As the tree construction time increases rapidly with the number of objects, while the traversal time for sufficiently large scenes is constant, it cannot be excluded that the situation may change, especially in the case of vertex-based animations with very high polygon counts. The compact memory representation used for the balanced tree should definitely be reused in some form in the more sophisticated methods.

3. Improvement of the cost function

3.1. Previous work

A way to find the optimal cut is to consider all reasonable cuts, including cutting off empty space and termination of the build, and choose the one that produces the shortest expected traversal time. To achieve this we need to estimate that time.
Havran proposed the following function, linear with respect to the number of objects in the sub-volumes:

$$C_V = \frac{1}{SA(V)} \left[ SA(\mathrm{leftChild}(V)) \, (N_L + N_{SP}) + SA(\mathrm{rightChild}(V)) \, (N_{SP} + N_R) \right] \qquad (1)$$

where $C_V$ is the cost corresponding to volume $V$, $SA(V)$ is the surface area of volume $V$, and $N_L$, $N_R$, $N_{SP}$ are the number of objects completely in the left and right sub-volumes, and the number of objects intersected by the splitting plane, respectively.

This means that the expected time for the traversal of a sub-volume is the time needed to carry out the naïve intersection test for all of its objects, multiplied by the probability of a ray hitting that sub-volume. As the volumes are convex, this probability equals the ratio of the surface area of the sub-volume to that of $V$. Obviously, the estimate given by this function does not equal the actual time cost, as the created volumes will be subdivided further, and not handled with the naïve algorithm. Havran also identified this problem and proposed some ideas for the solution. He stated that the optimal cost function depends to a great extent on the distribution of the objects in the actual scene, and thus for a better estimate the cost must be measured in some way.
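The linear estimate of equation (1) can be evaluated for one candidate cut as in the sketch below. It is only an illustration under stated assumptions: the representation of a volume as a pair of corner points and the helper names `surface_area`, `split_box` and `linear_cost` are hypothetical and do not come from the original work.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as (min_corner, max_corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def split_box(box, axis, pos):
    """Cut the box with the plane `axis = pos`; return (left, right) boxes."""
    lo, hi = box
    left_hi = list(hi)
    left_hi[axis] = pos
    right_lo = list(lo)
    right_lo[axis] = pos
    return (tuple(lo), tuple(left_hi)), (tuple(right_lo), tuple(hi))


def linear_cost(box, axis, pos, n_left, n_split, n_right):
    """Equation (1): object counts weighted by surface-area ratios.

    n_left and n_right count objects lying completely on one side;
    n_split counts objects intersected by the splitting plane, which
    therefore contribute to both children.
    """
    left, right = split_box(box, axis, pos)
    return (surface_area(left) * (n_left + n_split)
            + surface_area(right) * (n_split + n_right)) / surface_area(box)
```

For the unit cube cut in half along x with three objects on the left, two on the right and one straddling the plane, both children have surface area 4 against 6 for the parent, so the estimate is (4·4 + 4·3)/6 = 28/6. A builder would evaluate this for every candidate plane and keep the minimum.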
Although it is possible to build the tree and compute the cost precisely, doing this every time the function is evaluated would lead to a computational explosion of the construction algorithm. Therefore, in order to obtain a more effective function, the scene should be characterised by values that are easily determined and that influence the cost function.

3.2. Non-linear cost estimate

In a recent article we have shown that for scenes with a large number of random objects, kd-tree traversal is done in constant time. How can this be reconciled with the linear estimate? How can Havran's method provide outstanding results despite this contradiction? If we are low in the tree, near the leaves, and it is true that the sub-volumes will go through little to no further subdivision, then the linear estimate is of course perfect. On the other hand, if we are near the root of the tree, meaning that the constant-time traversal statement holds for the sub-trees, then the expected traversal time is independent of the cut. Therefore, where the linear estimate would fail, the position of the cut is not so important at all.
However, it is possible to construct a more accurate cost estimate if we are able to account for the gain from separating the elements and cutting off empty space. To calculate that exactly would be hopelessly expensive, but by simply changing the linear function to a somewhat better fitting one, we may eliminate some of the inaccuracy on the higher levels of the tree. It is of course imperative to keep the linearity in the lower regions, where it works perfectly. Let us suppose that a cut improves the time by a factor of $q < 1$ on average, and that a cell containing $n_0$ elements is not worth dividing any more. Actually, that means that a cell may contain $n_0$ objects on average. Using these, the cost of traversal, without the adjustment for the probability of the volume being hit, is given by the following equation. This function is to be applied to the numbers of objects in the sub-volumes in (1):

$$f(n) = n \cdot q^{\log_2 (n - n_0)} \qquad (2)$$

The value of $n_0$ is relatively easy to find, and will be determined by the primitive geometry. The value $q$, however, is quite an abstraction. It includes both cuts between objects and cutting off empty space. Actually, it corresponds more to the subdivision potential of the volume than to the obscure concept of cost reduction achieved by a single cut. Still, it is not harmful to overestimate both $n_0$ and $q$, as that will get us nearer to the original linear estimate.
Therefore, the formula for the expected number of intersection tests introduced in our previous article [3] can be applied to determine a probable upper bound for the traversal cost of the tree that is being built, providing a value for $q$. Naturally, significantly better results are only expected for large scenes with high primitive counts, as the linear function is less accurate, and the guess for $q$ is better, in those cases. The previous equation can further be written as:

$$f(n) = n \cdot (n - n_0)^{\log_2 q} \qquad (3)$$

$$f(n) = (n - n_0)^{1 + \log_2 q} + n_0 \cdot (n - n_0)^{\log_2 q} \qquad (4)$$

As $(n - n_0)^{\log_2 q} \le 1$, the cost may be over-estimated as:

$$f(n) = n \cdot (n - n_0)^{\log_2 q} \le (n - n_0)^{1 + \log_2 q} + n_0 \qquad (5)$$
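The chain from (2) to (5) can be checked numerically with the short sketch below. It assumes that the argument of the logarithm is $n - n_0$, which is the reading under which (4) is an exact rewriting of (3), and that $n > n_0$ and $0 < q \le 1$. The function names and the sample values of $n_0$ and $q$ are hypothetical, chosen only for illustration.

```python
import math


def f_power(n, n0, q):
    """Equation (2): n * q ** log2(n - n0), defined for n > n0."""
    return n * q ** math.log2(n - n0)


def f_exact(n, n0, q):
    """Equation (3): the same value written as n * (n - n0) ** log2(q)."""
    return n * (n - n0) ** math.log2(q)


def f_split(n, n0, q):
    """Equation (4): equation (3) with n expanded as (n - n0) + n0."""
    c = math.log2(q)
    return (n - n0) ** (1.0 + c) + n0 * (n - n0) ** c


def f_upper(n, n0, q):
    """Equation (5): over-estimate obtained by bounding (n - n0) ** log2(q) by 1."""
    return (n - n0) ** (1.0 + math.log2(q)) + n0


if __name__ == "__main__":
    n0, q = 4, 0.8   # illustrative values only
    for n in (5, 10, 100, 10000):
        print(n, f_exact(n, n0, q), f_upper(n, n0, q))
```

For $q \le 1$ the over-estimate never exceeds the linear count $n$, and it approaches $n$ as $q$ approaches 1, which matches the remark that overestimating $q$ brings the result nearer to the original linear estimate.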