László Szécsi and Balázs Benedek / Improvements on the kd-tree

the array is 18n, where 6n nodes each are reserved for the cuts, the leaves and the pointers. Although this may still be a rough over-estimation for some simple scenes, going even lower may have disadvantageous effects. First of all, if the array is filled, the termination mentioned above could occur. Secondly, even if the array is not filled, a deeper tree allows longer branches without using pointers. Therefore, over-allocation slightly increases compactness and traversal speed. Setting the array size to 18n allows the storage of worst-case kd-trees and a speed-efficient representation of simple ones.

The following table lists the numbers of nodes, leaves and pointers for various scenes. The number of pointers clearly remains below the worst-case bound. This representation uses fewer pointers than the previous solutions, which should result in better cache coherence and faster traversal.

| Scene       | Patches | Nodes  | Leaves | Pointers |
|-------------|---------|--------|--------|----------|
| Cornell box | 3968    | 9037   | 4296   | 444      |
| Beethoven   | 2636    | 23140  | 9883   | 3373     |
| Random      | 3515    | 39389  | 16981  | 5426     |
| Tea         | 10025   | 54392  | 23874  | 6643     |
| Chickens    | 16467   | 115455 | 49094  | 17266    |
| House       | 24737   | 156469 | 65540  | 25388    |

5. kd-trees for animation

5.1. Separation of dynamic and static objects

Rebuilding the whole kd-tree for every frame is obviously very expensive and also superfluous. If the objects are classified as static objects, which stay at a fixed position during the animation, and dynamic objects, which may move, we can build two different kd-trees. This may have several advantages. First of all, the time of kd-tree construction is dramatically reduced. It also becomes possible to shoot rays only into the dynamic kd-tree, thereby identifying changes of the scene along previous shooting or gathering paths. It is interesting to examine how the data structure could help exploit frame-to-frame coherence. However, traversal is slower in the dual kd-tree structure. Theoretically, if both trees contain a large number of objects, the traversal time would be roughly independent of the size of the tree; therefore, separating them could double the time cost. This would of course be unacceptable and must be addressed.

Obviously, the fewer objects there are in the dynamic kd-tree, the faster it can be built. The moving objects in an animation sequence can usually be separated into sets of primitives that move together.
This is even more characteristic of scenes with rigid bodies, where the primitives of a higher-level object are static relative to each other. Reconstructing the kd-tree from the primitives would not take any advantage of this property. The kd-tree for a rigid object can be built in advance, but if the object is rotated, the splitting planes are no longer axis-aligned, and such a structure cannot be used as a sub-tree of the dynamic kd-tree. The solution is to precompute the kd-trees for the rigid objects, attach them to these objects, and define the intersection test for an object as a ray shot into its kd-tree. If the objects are translated, rotated or transformed in any other way, then the ray must be transformed into the model space [5], in which the sub-kd-tree is axis-aligned. This way the dynamic tree will be built of a few rigid objects rather than many more primitives. The reconstruction of the data structure between frames will take very little time, and the traversal overhead caused by the dual tree structure will be minimal. However, several questions arise.
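Before turning to these questions, the model-space intersection test described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the world-to-model transform is assumed to be stored as a 3×4 affine matrix, the names `Ray`, `to_model_space` and `slab_test` are hypothetical, and a ray/box slab test against the root cell of the object's axis-aligned sub-kd-tree stands in for the full sub-tree traversal. The transformed direction is deliberately not renormalized, so a ray parameter t denotes the same point in world and in model space.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical representation: a row-major 3x4 affine matrix [A | b]
# that maps a world-space point p to the model-space point A p + b.
Affine = Sequence[Sequence[float]]


@dataclass
class Ray:
    origin: Vec3
    direction: Vec3  # not required to be unit length


def transform_point(m: Affine, p: Vec3) -> Vec3:
    # Points are affected by both the linear part and the translation.
    return tuple(m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3]
                 for i in range(3))


def transform_vector(m: Affine, v: Vec3) -> Vec3:
    # Directions are affected by the linear part only.
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2]
                 for i in range(3))


def to_model_space(world_to_model: Affine, ray: Ray) -> Ray:
    # The direction is NOT renormalized, so the ray parameter t names the
    # same point in world and in model space.
    return Ray(transform_point(world_to_model, ray.origin),
               transform_vector(world_to_model, ray.direction))


def slab_test(ray: Ray, lo: Vec3, hi: Vec3,
              t_max: float = float("inf")) -> Optional[Tuple[float, float]]:
    # Ray vs. axis-aligned box; stands in for entering the root cell of the
    # object's precomputed sub-kd-tree. Returns (t_enter, t_exit) or None.
    t0, t1 = 0.0, t_max
    for o, d, a, b in zip(ray.origin, ray.direction, lo, hi):
        if d == 0.0:
            # Ray parallel to this slab: either inside it or a miss.
            if o < a or o > b:
                return None
            continue
        ta, tb = (a - o) / d, (b - o) / d
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return None
    return t0, t1


# Hypothetical instance: the cube [-1, 1]^3 rotated 90 degrees about z and
# moved to x = 10 (world = R * model + (10, 0, 0)); this is the inverse.
world_to_model = [[0.0, 1.0, 0.0, 0.0],
                  [-1.0, 0.0, 0.0, 10.0],
                  [0.0, 0.0, 1.0, 0.0]]
world_ray = Ray((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
local_ray = to_model_space(world_to_model, world_ray)
print(slab_test(local_ray, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)))  # (4.5, 5.5)
```

The ray enters and leaves the instance at t = 4.5 and t = 5.5 whether the test is done in model space against the original cube or in world space against the transformed one, so hits found inside a sub-kd-tree can be compared directly with hits found elsewhere in the scene. The optional `t_max` argument shows how an already known closer hit would cut the model-space search short.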
In order to build a kd-tree of transformed objects, the extremes along every axis have to be found. Computing a bounding box for a set of points is straightforward but may be unacceptably expensive for a large number of vertices. Furthermore, as an intersection test for such a high-level object is costly, a cheaper pre-filtering would be useful. Both problems are addressed by precomputing a bounding object that is easy to transform. An ellipsoid, being a quadratic surface, is the most appropriate choice. If the smallest enclosing ellipsoid of the object's vertices is calculated, it can be transformed appropriately for every frame. Its extremes may be used to determine the bounding box, and an intersection test with this quadratic surface can filter out a large number of non-intersecting rays. The algorithm used to determine the smallest enclosing ellipsoid is based on linear programming [7] and runs in O(n) time [1].

5.2. Synchronous traversal of the dual tree

We have mentioned above that the traversal cost for two trees may be double the cost for one tree of twice as many objects. This is, however, a worst-case assumption and can be avoided in several ways. First of all, the use of compound rigid objects described above decreases the size and traversal cost of the dynamic tree.
Secondly, if we have found an intersection in the dynamic tree, the search in the static tree can obviously be limited to the segment of the ray between the origin and the intersection point. That is, we do not test in areas occluded by dynamic objects. This simple modification results in traversal times very close to the one-tree case, especially if the dynamic objects are rarely occluded by static ones. However, it is not always possible to identify rigid objects, and the visibility relation between the dynamic and static objects may not be as clear-cut for some animation sequences. Therefore, we introduce a cost-effective traversal algorithm for multiple overlapping kd-trees, especially useful if a large number of independently moving primitives are stored in the kd-tree.

Basically, the cell boundaries of a kd-tree separate a traversing ray into segments. A traversal algorithm identifies those segments and performs intersection tests on every segment in order. If the objects are stored in multiple kd-trees, multiple segmentations exist. The task is to find an
optimal order of the segments, so that no segment farther than the first valid intersection is examined. That is, if every point of segment A is nearer to the origin of the ray than every point of segment B, then A must precede B in the traversal order. A known recursive algorithm, described in detail by Havran [2], is extended in the following way:

1. Set up a search interval for every tree as the entire ray.
2. Choose the 'non-terminated' tree whose search interval starts nearest to the origin.
3. Traverse the chosen tree using the recursive algorithm. A separate traversal stack and a current node identifier have to be maintained for every tree. Continue until a leaf is reached.
4. If a leaf is being processed, test for intersections, and update the global closest intersection found if necessary. Set the search interval to the segment of the ray intersected by the volume of the next node to be processed according to the recursive traversal algorithm. If the traversal stack is empty, or a valid intersection was found, mark the tree as 'terminated'.
5. If a valid intersection was already found, and the search interval for every tree lies entirely farther than the closest intersection, terminate and return the found intersection.
6. If all the trees are marked 'terminated', there was no intersection with the ray in any of the trees; return without a result.
7. Continue with step 2.

Figure 4: The parallel traversal of two kd-trees partitioning the same space, containing different sets of objects. The cells are numbered to indicate the order of their processing. Non-processed cells along the ray are marked with 0.

Compared to the sequential solution, where the trees are traversed one after another on the interval limited by previously found intersections, we spare the traversal of the ray segments between the nearest intersection and the farther intersections that would have been found in previously traversed trees. For the dual tree structure, we have two options: traverse the dynamic tree and then the static tree, or traverse both in parallel. If the nearest intersection is in the static tree, then the parallel algorithm will not investigate the segment between the dynamic and static intersection points. In the opposite case, the same number of tests is carried out. However, we have to remark that there is some overhead because of additional administration and weaker cache coherence, a result of handling several kd-trees simultaneously.

5.3. Results

Scenes have been divided into a static and a dynamic part to test the algorithm. Three cases were examined:

One tree: All the patches, static or dynamic, are stored in a common kd-tree.
Sequential: Static and dynamic patches are stored in separate trees. When calculating a ray-scene intersection, we first traverse only the dynamic tree. Thereafter, the static tree is tested, but only on the ray segment between the origin and the intersection point found in the dynamic tree.
Parallel: Static and dynamic patches are stored in separate trees. The parallel traversal algorithm is used for ray-scene intersections.

Figure 5: One of the test scenes. The two standing chickens are considered static; the other two, those above the ground, are dynamic.

The tests were run with two different kd-tree construction routines. We rendered an image using bi-directional path tracing [4]. We used a test scene of large static and dynamic objects with a high primitive count, which, we believe, simulates a frame of an animation sequence well. In the other test scene, both the static and dynamic patches were generated randomly. With the first version, we obtained satisfactory results showing that the parallel traversal is faster. Execution times are specified in seconds: