6.830 Problem Set 2 (2009) - MIT Database Group

6.830 Problem Set 2 Solutions

1. [5 points]: Query 1:

    SELECT r1.name
    FROM researchers AS r1, researchers AS r2, grants, grant_researchers AS gr
    WHERE grants.pi = r2.id
    AND grants.id = gr.grantid
    AND gr.researcherid = r1.id
    AND r1.org = 10
    AND r1.org != r2.org;

Answer:

a. Plan (children are indented under their parent):

    hash
      index nested loops
        hash
          seq scan gr
          filter org=10
            seq scan r1
        index scan grants.id
      seq scan r2

b. Working from the bottom up, the query scans and filters r1 because using the index on r1.id would require many random seeks into r1 to test org=10. There is no predicate on grant_researchers, so the only choice is to scan it sequentially. Hash join is in general the best choice when there is no need for output in sorted order and no obvious index-based plan.

It is somewhat unclear why it chooses an index-nested-loops join between the joined r1/gr table and grants: it estimates that it will do 1929 index lookups, which sounds expensive relative to building a hash table on the grants table. This appears to be a bad plan choice.

Again, it chooses a hash join for the top-most join, because that is faster than the random seeks that would be required to use r2's index, and a large fraction of r2's pages will be examined.

c. 1586

d. 28

e. The counts are only wrong in the top-most join (e.g., in the check of r1.org != r2.org).

f. The plan looks reasonable, except for the choice of index-nested-loops for the middle join.
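
The trade-off behind the middle join in part (b) can be sketched on toy data. This is a minimal illustration, not the database system's implementation: the rows are made up, and the function names `index_nested_loops_join` and `hash_join` are hypothetical. Both strategies return the same rows; the difference is that the index-nested-loops join performs one index probe per outer row, while the hash join makes one sequential pass over grants to build its table.

```python
from collections import defaultdict

# Hypothetical toy rows (made up for illustration):
# outer_rows stands in for the r1/gr join output as (grantid, researcherid),
# grants holds (id, pi) rows.
outer_rows = [(1, 10), (2, 11), (2, 12), (4, 13)]
grants = [(1, 100), (2, 101), (3, 102), (4, 103), (5, 104)]


def index_nested_loops_join(outer, index):
    """For each outer row, probe an existing index on grants.id."""
    lookups = 0
    out = []
    for grantid, researcherid in outer:
        lookups += 1  # one index probe (a random seek on disk) per outer row
        for pi in index.get(grantid, []):
            out.append((grantid, researcherid, pi))
    return out, lookups


def hash_join(outer, inner):
    """Scan the inner table once to build a hash table, then probe it."""
    table = defaultdict(list)
    rows_scanned = 0
    for gid, pi in inner:  # single sequential pass over all of grants
        rows_scanned += 1
        table[gid].append(pi)
    out = [(g, r, pi) for g, r in outer for pi in table.get(g, [])]
    return out, rows_scanned


# Stand-in for the pre-built index on grants.id.
grants_index = defaultdict(list)
for gid, pi in grants:
    grants_index[gid].append(pi)

inl_result, index_lookups = index_nested_loops_join(outer_rows, grants_index)
hj_result, build_rows = hash_join(outer_rows, grants)
```

In the plan for Query 1 the optimizer estimated 1929 such probes; whether that beats one sequential pass over grants depends on how expensive a random seek is relative to a sequential read, which this in-memory sketch does not model.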

2. [10 points]: Query 2:

    SELECT r.name, gr.grantid
    FROM grant_researchers AS gr, grant_programs AS gp, grant_fields AS gf, researchers AS r
    WHERE gr.grantid = gp.grantid
    AND gr.grantid = gf.grantid
    AND gr.researcherid = r.id
    AND gf.fieldid < 200
    AND gp.programid < 200
    ORDER BY gr.grantid;

Answer:

a. Plan (children are indented under their parent):

    merge
      sort on grantid
        hash
          seq scan researchers
          seq scan gr
      merge on gp.grantid = gf.grantid
        index scan gp
        index scan gf

b. This is a very interesting plan. First, it is a bushy plan, which joins gp and gf (using a merge join), gr and r (using a hash join), and then joins those two results (using a merge join). Merge joins are selected because they produce data in grantid order. Note that it joins gp and gf, which do not actually have a join predicate in the above query; surprisingly, the join condition is on gp.grantid = gf.grantid, which is a predicate the system is able to infer from foreign key relationships and the conditions gr.grantid = gf.grantid and gr.grantid = gp.grantid.

Hash join is used between r and gr because hash join is generally the best way to combine two large tables without a selective predicate between them (as above). There is no way to use a sort-merge join for this join to produce that data in grantid order. It is somewhat unusual that it chooses to perform this join first; it seems that first producing a restricted version of gr joined with gf and gp, and then joining that with r, might be more efficient. The sort is needed to get the data in grantid order for the outer merge join.

c. 20514

d. 14606

e. Its estimate of the hash join size is perfect. It slightly overestimates the size of the middle merge join (actual = 9986, estimated = 11894). This probably accounts for the error in its overall estimate of the final join size. In general, this is not a bad estimate of the total cost.

f. As noted above, this is a surprisingly sophisticated plan. A better plan might be to first join gf and gp with gr, and then join that result with r, using a sort-merge join, since that would have produced an intermediate table with an estimated 11894 results, instead of the 13496 rows in the original gr table. However, the plan that is chosen seems reasonable given that this reduction is not particularly significant.
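
The sizes quoted in parts (e) and (f) can be put in relative terms with a short calculation. This is only a worked-arithmetic sketch using the row counts stated in the answers; the variable names are illustrative.

```python
# Row counts quoted in the answers to parts (e) and (f).
estimated_merge_rows = 11894  # optimizer's estimate for the middle merge join
actual_merge_rows = 9986      # rows that join actually produced
gr_rows = 13496               # rows in the original gr table

# How far the estimate is above the actual size, relative to the actual size.
overestimate = (estimated_merge_rows - actual_merge_rows) / actual_merge_rows

# How much smaller the estimated intermediate result is than gr itself.
reduction = (gr_rows - estimated_merge_rows) / gr_rows

print(f"overestimate: {overestimate:.1%}")
print(f"reduction: {reduction:.1%}")
```

The estimate is about 19% above the actual size, and the alternative plan would shrink the input to the final join by only about 12%, which is consistent with calling the reduction not particularly significant.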