10 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. XXX, NO. XX, XXXXXXX 2001
MAXTIME is set to 50,000 for all workloads. The value used in the R-trees for now was 100,000. Each workload is described by the distribution used to generate the object lifespans, the number of different oids, the total number of changes in the evolution $n$ (object additions and deletions), the total number of object additions $N_B$, and the total number of queries.
4.3 Experiments
First, the behavior of all implementations was tested using a basic Uniform workload. The number of lifespans per object follows a uniform distribution between 20 and 40. The total number of distinct oids was $|U| = 8,000$, the number of real changes was $n = 466,854$, and there were $N_B = 237,606$ object additions. Hence, the average number of lifespans per oid was $N_B/|U| \approx 30$ (we refer to this workload as Uniform-30). The number of queries was 115,878.
Fig. 3a presents the average number of pages accessed per query by all methods. The PPLH methods have the best performance, about two pages per query. The ALH approach uses more query I/O (about 1.5 times as much in this example) because of the larger buckets it creates. The MVBT also uses more I/O (about 1.75 times as much) than the PPLH approaches since a tree path is traversed per query. The Ri uses more I/Os per query than the MVBT (about 11.5 I/Os), mainly due to tree node overlapping and larger tree height (its height relates to the total number of oid lifespans while the MVBT's height corresponds to the alive oids at the time specified by the query). The Rp tree has the worst query performance (an average of 28.3 I/Os per query). The performance of the R-tree methods has been truncated in Fig. 3a to fit the graph.
While using a separate dimension for the two endpoints of a lifespan interval allows for better clustering (see also the space usage in Fig. 3c), it makes it more difficult to check whether an interval contains a query time instant.
Fig. 3b shows the average number of I/Os per update. The best update performance was given by the PPLH-s method. In PPLH-l, the NT array implementation inside each page limits the actual page area assigned for storing oids and, thus, increases the number of pages used per bucket. The MVBT update costs more than PPLH-s since the MVBT traverses a tree for each update (instead of quickly finding the location of the updated element through hashing). The update of Ri follows; it costs more than the MVBT since the size of the tree traversed is related to all oid lifespans (while the size of the MVBT structure traversed is related to the number of alive oids at the time of the update). The Rp tree has a higher update cost than the Ri because of the overhead of storing an interval as two points. The ALH had the worst update processing since all lifespans with the same oid are placed in the same bucket, creating large buckets that must be searched serially.
The space consumed by each method appears in Fig. 3c. The ALH approach uses the smallest space since it stores a single record per oid lifespan and uses "controlled" splits with high utilization. The PPLH-s method also has very good space utilization, very close to ALH. The R-tree methods follow; Rp uses slightly less space than the Ri because paginating intervals (putting them into bounding rectangles) is more demanding than with points. Note that, similarly to ALH, both R* methods use a single record per oid lifespan; the additional space is mainly because the average R-tree page utilization is about 65 percent. PPLH-l uses more space than PPLH-s because the NT array implementation reduces page utilization.
The MVBT has the largest space requirements, about twice the space of the ALH and PPLH-s methods.
To consider the effect of lifespan distribution, all approaches were compared using five additional workloads (called the exponential, step, normal, Poisson, and uniform-consecutive). These workloads had the same number of distinct oids ($|U| = 8,000$), the same number of queries (115,878), and similar $n$ ($\approx 0.5$M) and $N_B/|U|$ ($\approx 30$) parameters.
The Exponential workload generated the $n_k$ lifespans per oid using an exponential distribution with probability density function $f(x) = \lambda \exp(-\lambda x)$ and mean $1/\lambda = 30$. The total number of changes was $n = 487,774$, the total number of object additions was $N_B = 245,562$, and $N_B/|U| \approx 30.7$.
In the Step workload, the number of lifespans per oid follows a step function. The first 500 oids have four lifespans, the next 500 have eight lifespans, and so on, i.e., for every 500 oids, the number of lifespans advances by four. In this workload, we had $n = 540,425$, $N_B = 272,064$, and $N_B/|U| \approx 34$.
The Normal workload used a normal distribution with $\mu = 30$ and $\sigma^2 = 25$. Here, the parameters were $n = 470,485$, $N_B = 237,043$, and $N_B/|U| \approx 29.6$.
For the Poisson workload, the start of the first lifespan for every oid was generated randomly between time instants 1 and 500. The length of a lifespan was generated using a Poisson distribution with mean 1,100. Each next start time for a given oid was also generated by a Poisson distribution with mean value 500. For this workload, we had $n = 498,914$, $N_B = 251,404$, and $N_B/|U| \approx 31$. The main characteristic of the Poisson workload is that the number of alive oids over time can vary from a very small number to a large proportion of $|U|$, i.e., there are time instants where the number of alive oids is some hundreds and other time instants where almost all distinct oids are alive.
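The way the number of lifespans per oid could be drawn for the Uniform-30, Exponential, Step, and Normal workloads can be sketched as follows. This is only an illustration under stated assumptions, not the generator used in the experiments: the function names, the fixed seed, the rounding to the nearest integer, and the floor of one lifespan per oid are all hypothetical choices, since the text does not say how real-valued samples were discretized.

```python
import random

NUM_OIDS = 8000  # |U| in the text


def step_counts(num_oids=NUM_OIDS, group=500, increment=4):
    """Step workload: the first 500 oids get 4 lifespans, the next 500 get 8, and so on."""
    return [increment * (k // group + 1) for k in range(num_oids)]


def sampled_counts(draw, num_oids=NUM_OIDS, seed=0):
    """Draw one lifespan count per oid.

    Rounding and the minimum of 1 are assumptions of this sketch.
    """
    rng = random.Random(seed)
    return [max(1, round(draw(rng))) for _ in range(num_oids)]


def uniform_counts(**kw):
    # Uniform-30: integer counts between 20 and 40, inclusive
    return sampled_counts(lambda rng: rng.randint(20, 40), **kw)


def exponential_counts(**kw):
    # expovariate takes the rate lambda, so mean 1/lambda = 30
    return sampled_counts(lambda rng: rng.expovariate(1 / 30), **kw)


def normal_counts(**kw):
    # variance 25 corresponds to a standard deviation of 5
    return sampled_counts(lambda rng: rng.gauss(30, 5), **kw)


def average(counts):
    """Average number of lifespans per oid, i.e., N_B / |U|."""
    return sum(counts) / len(counts)
```

For the deterministic Step workload, `sum(step_counts())` gives 272,000 lifespans, an average of exactly 34 per oid, which is close to the reported $N_B = 272,064$. The sampled workloads should each average roughly 30 lifespans per oid, with the exact totals depending on the seed and on the discretization assumptions above.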
The special characteristic of the Uniform-consecutive workload is that it contains objects with multiple but consecutive lifespans. This scenario occurs when objects are updated frequently during their lifetime. Each update is seen as the deletion of the object followed by the insertion of the updated object at the same time. Since the object retains its oid through updates, this process creates consecutive lifespans for the same object (the end of one lifespan is the start of the next lifespan). This workload was based on the Uniform-30 workload and had $n = 468,715$, $N_B = 236,155$, and $N_B/|U| \approx 30$. An object has a single lifetime which is cut into consecutive lifespans. The start_times of an object's lifespans are chosen uniformly.
Fig. 4 presents the query, update, and space performance under the new workloads. The results resemble those of the Uniform-30 workload. For brevity, we have excluded the R-tree-based methods from the remaining discussion (as they consistently had much worse query performance; the interested reader can find the detailed performance in [17]). As before, the PPLH-s approach has the best overall performance, using slightly more space than the "minimal" space of ALH. PPLH-l has the same query performance as PPLH-s but has a higher update cost and uses more space. Note that
KOLLIOS AND TSOTRAS: HASHING METHODS FOR TEMPORAL DATA 11
Fig. 3. (a) Query, (b) update, and (c) space performance for all implementations on a uniform workload with 8K oids, $n \approx 0.5$M, and $N_B/|U| \approx 30$.