11.07.2015 Views

A preliminary examination of code review processes in open source ...

A preliminary examination of code review processes in open source ...

A preliminary examination of code review processes in open source ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Figure 6: First column <strong>in</strong>dicates percentage <strong>of</strong> commits <strong>review</strong>edper year. Second column <strong>in</strong>dicates percentage <strong>of</strong> commits<strong>review</strong>ed by two or more <strong>review</strong>ers.defect densities can be solely credited to post-commit <strong>review</strong>. Unlikeproprietary projects, testers and users have access to and canpost bugs about the current Apache <strong>source</strong>. Furthermore, it is difficultto obta<strong>in</strong> an accurate measure <strong>of</strong> defect density (and given thatthis is a report I have not had time to <strong>review</strong> the literature on defectdensities <strong>in</strong> <strong>open</strong> <strong>source</strong> projects). A more recent study by Reason<strong>in</strong>gInc. 15 <strong>in</strong>dicated that the Apache project has a defect densitycomparable to proprietary s<strong>of</strong>tware.It must be noted that this type <strong>of</strong> post-commit <strong>review</strong> does not scaleto large projects with less modular architectures than Apache’s. Itwould appear (although I have not read the literature on peer andextreme programm<strong>in</strong>g) that post-commit <strong>review</strong> is, <strong>in</strong> effect, a type<strong>of</strong> distributed peer programm<strong>in</strong>g. In order to <strong>review</strong> <strong>code</strong> <strong>in</strong> sucha short time period, <strong>review</strong>ers must be <strong>in</strong>timately familiar with the<strong>code</strong>, and they must understand the effect <strong>of</strong> the <strong>code</strong> on the systemas a whole. This type <strong>of</strong> distributed peer programm<strong>in</strong>g is potentiallymore efficient than traditional co-located peer programm<strong>in</strong>g(where one developer controls the <strong>in</strong>put devices and the other commentsand helps <strong>in</strong> navigation), because the peer must only exam<strong>in</strong>ewhat the other developer feels is the correct solution, not every step,<strong>in</strong>clud<strong>in</strong>g <strong>in</strong>valid ones, to the solution. If the peer does not feel thepatch is correct, they can discuss the patch and give the developernavigational and other help. Furthermore, the advice is not limitedto a s<strong>in</strong>gle peer. Future experimentation is needed to validate theseresults5.8 How many <strong>review</strong>ers <strong>review</strong> a s<strong>in</strong>gle patch?Pre-commit <strong>review</strong> Although we do not know how many <strong>in</strong>dividualsun<strong>of</strong>ficially <strong>review</strong> a patch (i.e., discuss the patch on the mail<strong>in</strong>glist), we do know how many <strong>review</strong>ers <strong>of</strong>ficially <strong>review</strong> patches thatwere accepted. For the lifetime <strong>of</strong> the project, 56% <strong>of</strong> patches were<strong>review</strong>ed by one <strong>in</strong>dividual, 82% <strong>of</strong> <strong>review</strong>s were performed byone or two <strong>review</strong>ers, and only 18% <strong>of</strong> <strong>review</strong>s had more than two<strong>review</strong>ers. Figure 6 shows the percentage <strong>of</strong> commits that were <strong>review</strong>ed<strong>in</strong> each year and the number <strong>of</strong> these commits that <strong>in</strong>volvedmore than one <strong>review</strong>er. Aga<strong>in</strong>, it is obvious that certa<strong>in</strong> years hadhigh levels <strong>of</strong> <strong>in</strong>volvement <strong>in</strong> the <strong>review</strong> process while other did not(Compare with figure 3.Post-commit <strong>review</strong>. The median number <strong>of</strong> responses is two(mean 3.8). Normally, the first email describes the problem withthe patch (is a response to the commit) and the second email is theresponse address<strong>in</strong>g the patch problem. Exam<strong>in</strong><strong>in</strong>g 80% <strong>of</strong> the <strong>review</strong>s<strong>in</strong> ascend<strong>in</strong>g order revealed that these patches received 4 orfewer responses (not <strong>in</strong>clud<strong>in</strong>g the <strong>in</strong>itial problem/<strong>review</strong> email).This f<strong>in</strong>d<strong>in</strong>g shows that not only does the process <strong>of</strong> <strong>review</strong> happenquickly (see above), but the fix to the (small) patch is usually15 The results can be obta<strong>in</strong>ed at http://www.reason<strong>in</strong>g.com/downloads.html December 2005obvious and does not require a great deal <strong>of</strong> discussion. S<strong>in</strong>ce thenumber <strong>of</strong> <strong>in</strong>dividuals who discuss a patch must be equal to or lessthan the number <strong>of</strong> email messages, we expect that usually onlytwo people discusses any given patch: the <strong>review</strong>er and the contributor.There are a few patches that have a large number <strong>of</strong> replies,the maximum number <strong>of</strong> responses is 80. Exam<strong>in</strong><strong>in</strong>g this emaildiscussion, we discovered that the majority <strong>of</strong> the responses werenot related to the patch, but some other problem that the patch hadmade obvious to the group. We expect that this type <strong>of</strong> tangentialdiscussion or some controversial issue will be found <strong>in</strong> otherpatches with high response rates. Future work is required to validatethese results and to determ<strong>in</strong>e the time between <strong>review</strong> andresponse to the problems found by the <strong>review</strong>.5.9 Patch K<strong>in</strong>dWhat k<strong>in</strong>ds <strong>of</strong> non-<strong>source</strong> <strong>code</strong> patches are <strong>review</strong>ed? How doesthe k<strong>in</strong>d <strong>of</strong> patch affect the <strong>review</strong>?These questions are ma<strong>in</strong>ly left to future work. However, as wenoted <strong>in</strong> the patch statement section, Apache requires documentationto be submitted for new features and major changes. S<strong>in</strong>cedocumentation is not critical to the function<strong>in</strong>g <strong>of</strong> the product, weexpect that documentation is only <strong>review</strong>ed when it comes from anexternal <strong>source</strong>, and then only m<strong>in</strong>imally. Translations to other languagesusually require at least two translators to perform the task.This process serves as <strong>review</strong>.6. HYPOTHESESBefore develop<strong>in</strong>g hypotheses we need to fix some <strong>of</strong> the problems<strong>in</strong> our data, such as resolv<strong>in</strong>g the names <strong>in</strong> the mail<strong>in</strong>g list and deal<strong>in</strong>gwith a number <strong>of</strong> m<strong>in</strong>or <strong>in</strong>consistencies, which are <strong>in</strong> less than10% <strong>of</strong> our data. It would also be ideal to present our results toa group <strong>of</strong> <strong>in</strong>terested researchers to obta<strong>in</strong> a different perspectiveon these f<strong>in</strong>d<strong>in</strong>gs before develop<strong>in</strong>g hypotheses. We leave the developmentand test<strong>in</strong>g (on other projects) <strong>of</strong> hypotheses to futurework.7. CONCLUSION AND FUTURE WORKThis report is a first attempt at understand<strong>in</strong>g the <strong>code</strong> <strong>review</strong> <strong>processes</strong>used by <strong>open</strong> <strong>source</strong> projects. In exam<strong>in</strong><strong>in</strong>g the “<strong>of</strong>ficial”<strong>code</strong> <strong>review</strong> statements <strong>of</strong> four projects we found <strong>in</strong>terest<strong>in</strong>g similarities,such as the request for small, complete patches. We als<strong>of</strong>ound that differences <strong>in</strong> commit policy appeared to dictate thelevel <strong>of</strong> patch <strong>review</strong>. The amount, quality, and type <strong>of</strong> test<strong>in</strong>gappeared to be related to how easy it is to automate tests. Fewerautomated tests appears to force a project to do more formal <strong>code</strong><strong>review</strong>. Through qualitative <strong>exam<strong>in</strong>ation</strong> <strong>of</strong> development artifacts,we discovered a number <strong>of</strong> anecdotal patterns. The most <strong>in</strong>terest<strong>in</strong>gpattern was the ”committer as mediator” pattern. This patternoccurs when a core group member acts as a knowledgeable mediatorbetween those who report bugs and will try out new patches(lazy testers) and those who will <strong>in</strong>vest the energy to create a patch.We quantitatively answered our research questions by extract<strong>in</strong>gthe Apache project’s developer and commit mail<strong>in</strong>g lists <strong>in</strong>to adatabase. Surpris<strong>in</strong>gly, the core group has changed dramaticallyfrom 1996 to 2005, with a s<strong>in</strong>gle, different top developer dom<strong>in</strong>at<strong>in</strong>gthe number <strong>of</strong> commits for two year periods. These changesaffected the social and s<strong>of</strong>tware development <strong>processes</strong> <strong>of</strong> the community,mak<strong>in</strong>g lifetime analysis less mean<strong>in</strong>gful. We found thatthe core <strong>review</strong><strong>in</strong>g group was similar, but not exactly the same asthe committ<strong>in</strong>g core group. Pre-commit <strong>review</strong>ers appear to have

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!