11.07.2015 Views

A preliminary examination of code review processes in open source ...

A preliminary examination of code review processes in open source ...

A preliminary examination of code review processes in open source ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

5.1 Core Group ChangesThe Apache core group has changed dramatically over the lifetime<strong>of</strong> the project. Of the core group that Mockus et al. [4] studiedbetween February 1996 and May 1999, only one member is stillpart <strong>of</strong> the top 20 committers <strong>in</strong> 2005; four members rema<strong>in</strong> <strong>in</strong> theperiphery <strong>of</strong> the project. This change is not a recent phenomenon,as many <strong>of</strong> the lead<strong>in</strong>g Apache developers <strong>in</strong> 1998 made way forother developers <strong>in</strong> 1999 and left the core group <strong>in</strong> 2000. Exam<strong>in</strong><strong>in</strong>gthe top committer for each year, one notices that this <strong>in</strong>dividualrema<strong>in</strong>s on top for two years and gradually fades out over the nextfew years. In every <strong>in</strong>stance, the top committer has at least one <strong>of</strong>his top two years with more than double the number <strong>of</strong> commits <strong>of</strong>the second highest committer. It will be <strong>in</strong>terest<strong>in</strong>g to use a metricthat takes the movement <strong>of</strong> all developers, not just the top developer,<strong>in</strong>to account. We expect that these changes lead to an <strong>in</strong>flux<strong>of</strong> new and <strong>in</strong>novative ideas as well as changes <strong>in</strong> the developmentprocess. The orig<strong>in</strong>al statistics that we gathered were project lifetimestatistics, from 1997 to 2004 (1996 and 2005 are <strong>in</strong>complete),but due to core group changes, these may be mislead<strong>in</strong>g. The task<strong>of</strong> analyz<strong>in</strong>g and <strong>in</strong>terpret<strong>in</strong>g <strong>in</strong>dividual years is a much more <strong>in</strong>volvedand difficult task than analyz<strong>in</strong>g the lifetime <strong>of</strong> the project.Where possible, we present yearly and monthly data.5.2 Types <strong>of</strong> ReviewAnother complication is that the Apache project has many differenttypes <strong>of</strong> <strong>review</strong>. The three identified types are pre-commit (formal<strong>review</strong>-then-commit), post-commit (commit-then-<strong>review</strong>), and secondary<strong>review</strong>. Pre-commit <strong>review</strong> is only recorded when a patch iscommitted; this means that <strong>review</strong>ed revisions and rejected patchesare not recorded. Post-commit <strong>review</strong> is never formally recorded,but we can deduce when it occurs by look<strong>in</strong>g <strong>in</strong> the mail<strong>in</strong>g list forreplies to a committed patch. S<strong>in</strong>ce a reply to a patch likely representssomeone f<strong>in</strong>d<strong>in</strong>g a problem with the patch, we only know howmany post-commit <strong>review</strong>s conta<strong>in</strong>ed problems. We do not knowhow many post-commit patches were <strong>review</strong>ed with no problemsfound. Furthermore, we do not know how many different <strong>in</strong>dividualspost-commit <strong>review</strong>ed a patch. Secondary <strong>review</strong> occurs whenan <strong>in</strong>dividual does not <strong>review</strong> the patch themselves, but reads thecomments <strong>of</strong> the <strong>review</strong>er, and makes an additional comment themselves.This type <strong>of</strong> <strong>review</strong> is only recorded as replies to a patch <strong>in</strong>the mail<strong>in</strong>g list; it can apply to both types <strong>of</strong> <strong>review</strong>s and is ma<strong>in</strong>lyleft to future work.5.3 Accept? Reject?What percentage <strong>of</strong> <strong>review</strong>ed patches are accepted? Rejected?Over the lifetime <strong>of</strong> the project, 5747 patches were submitted forpre-commit <strong>review</strong>. Of these patches we were able to trace only2522 to patch commits (44%). We expect that many patches weresubmitted more than once when they were ignored or if they requiredpatch revisions. To determ<strong>in</strong>e why 56% <strong>of</strong> pre-commit <strong>review</strong>edpatches were rejected, we <strong>in</strong>tend to do more detailed dataanalysis <strong>in</strong>clud<strong>in</strong>g the use <strong>of</strong> more sophisticated message thread<strong>in</strong>gtechniques and manual classification <strong>of</strong> a smaller time period.Over the lifetime <strong>of</strong> the project, 9% <strong>of</strong> post-<strong>review</strong>ed commits arefound to have problems. We assume, based on qualitative <strong>in</strong>formationand post-commit <strong>review</strong> frequency (see below), that closeto 100% <strong>of</strong> the patches are post-commit <strong>review</strong>ed. This impliesthat 91% <strong>of</strong> patches are accepted. Interest<strong>in</strong>gly, 5% <strong>of</strong> pre-commit<strong>review</strong>ed patches are found to still conta<strong>in</strong> a problem when postcommit<strong>review</strong> is performed. We believe that this is because pre-Figure 1: The cumulative distribution <strong>of</strong> pre-commit <strong>review</strong>sby year<strong>review</strong> patches are generally larger, more complex, and <strong>of</strong>ten submittedby external contributors.5.4 Reviewer CharacteristicsWho performs the <strong>review</strong>?Pre-commit <strong>review</strong>. Over the lifetime <strong>of</strong> the Apache project, <strong>review</strong>swere performed by 130 <strong>in</strong>dividuals; however, many <strong>of</strong> thesepeople only <strong>review</strong>ed a s<strong>in</strong>gle patch. In 1999, Mockus et al. [4]found that the Apache project had a core group size <strong>of</strong> 15 developers;this core group made 83% <strong>of</strong> the commits. Analyz<strong>in</strong>g thesame time period, we found that the top 15 <strong>review</strong>ers performed93% <strong>of</strong> the <strong>review</strong>s. Dur<strong>in</strong>g this time, there were 55 <strong>in</strong>dividual<strong>review</strong>ers and approximately 300 [4] <strong>in</strong>dividual 14 patch contributors.It would appear that the <strong>review</strong> group is a subset <strong>of</strong> the coregroup. Indeed, 10 <strong>review</strong>ers are responsible for 84% <strong>of</strong> the <strong>review</strong>s.In figure 1, it can be seen that between 1997 and 2000 thecore <strong>review</strong>s rema<strong>in</strong>s small. However, from 2001 to 2002, the <strong>review</strong>group grows to 20 <strong>review</strong>ers perform<strong>in</strong>g 84% <strong>of</strong> the <strong>review</strong>s.In 2004, the <strong>review</strong> group appears to have shrunk to 12 <strong>review</strong>ers.We are not certa<strong>in</strong> about the cause <strong>of</strong> this apparent fluctuation <strong>in</strong><strong>review</strong>er group size. Currently, we cannot correlate it with commitcore group size because we have not resolved the patch submitternames. However, figure 2 shows the total number <strong>of</strong> commits (<strong>in</strong>flateds<strong>in</strong>ce submitter name is not resolved) and <strong>review</strong>s over thelifetime <strong>of</strong> the project. This figure demonstrates that over the lifetime<strong>of</strong> the project, the group <strong>of</strong> committers and <strong>review</strong>s is almostthe same size. The core group size is much larger, with 80% <strong>of</strong> thecommits done by 26 <strong>in</strong>dividuals, than Mockus’s orig<strong>in</strong>al f<strong>in</strong>d<strong>in</strong>g <strong>of</strong>15 developers (likely from core group changes).The previous results perta<strong>in</strong> only to pre-commit <strong>review</strong>s. S<strong>in</strong>cepost-commit and secondary <strong>review</strong> require name resolution on themail<strong>in</strong>g list (over 100,000 emails), we leave the determ<strong>in</strong>ation <strong>of</strong>who performs these <strong>review</strong>s to future work.Are the top developers (committers) also the top <strong>review</strong>ers?Pre-commit <strong>review</strong>. We exam<strong>in</strong>ed the lifetime top committers and<strong>review</strong>ers to determ<strong>in</strong>e if the same <strong>in</strong>dividuals are on top <strong>in</strong> bothroles. The group size varied from five to 20 people, each time we14 Exam<strong>in</strong><strong>in</strong>g Mockus’ scripts revealed that resolution <strong>of</strong> names wasonly performed on core group members, thus <strong>in</strong>flat<strong>in</strong>g the totalnumber <strong>of</strong> contributors, but hav<strong>in</strong>g little effect on the percentagebased size <strong>of</strong> the core group


Figure 2: The cumulative distribution <strong>of</strong> <strong>review</strong>s and commitsfor the lifetime <strong>of</strong> the project.Common Group Size Percent3 5 605 10 5013 15 8716 20 80Table 1: Lifetime top <strong>review</strong>ers and committers. The first column<strong>in</strong>dicates which <strong>in</strong>dividuals are common to both roles. Thesecond column is the total number <strong>of</strong> <strong>in</strong>dividuals (e.g., top five).The f<strong>in</strong>al column is the percentage <strong>of</strong> common <strong>in</strong>dividuals.compare the <strong>review</strong> and commit group to determ<strong>in</strong>e how many <strong>in</strong>dividualsare common to both roles. Table 1 shows that at least 80%<strong>of</strong> <strong>in</strong>dividuals are common to both roles with a group size between15 and 20 <strong>in</strong>dividuals. Surpris<strong>in</strong>gly, the percentage does not cont<strong>in</strong>ueto <strong>in</strong>crease as the core group size <strong>in</strong>creases from 15 to 20 (orfrom 20 to 26). This <strong>in</strong>dicates that the <strong>review</strong> group is not necessarilya subset <strong>of</strong> the commit group. The two roles are even less relatedwhen we exam<strong>in</strong>e the top <strong>review</strong>ers and committers. Table 1 illustratesthat 60% or less <strong>of</strong> the <strong>in</strong>dividuals are common to both roleswith a groups size <strong>of</strong> 10 to 15 <strong>in</strong>dividuals. Additionally, results regard<strong>in</strong>gthe top five <strong>review</strong>ers and committers <strong>in</strong>dicate that the <strong>review</strong>ershave been with the project for an average <strong>of</strong> 7 years whilethe committers have been with the project for average <strong>of</strong> 4.75 years.The longevity <strong>of</strong> <strong>review</strong>ers compared to committers <strong>in</strong>dicates thatstrong <strong>review</strong>ers stay with the project longer and potentially coachthe newer developers. Further study <strong>of</strong> shorter time periods and<strong>in</strong>dividual people must be done before def<strong>in</strong>ite conclusions can bedrawn.5.5 Review CyclesWhat is the frequency <strong>of</strong> <strong>review</strong>?The median number <strong>of</strong> pre-commit <strong>review</strong>s that were accepted is262 per year (mean 272). The median number <strong>of</strong> <strong>review</strong>s per monthis 18 (mean 22). The actual number is likely higher, s<strong>in</strong>ce theseresults do not account for rejected patches.The median number <strong>of</strong> post-commit <strong>review</strong>s that conta<strong>in</strong>ed a problemis 258 per year (mean 304). The median number <strong>of</strong> <strong>review</strong>s permonth is 17 (mean 23).Figure 3 shows the fluctuations <strong>of</strong> two <strong>review</strong> types at monthly <strong>in</strong>-Figure 5: The top <strong>in</strong>dividual <strong>review</strong>ers and the two years theywere the top <strong>review</strong>er.tervals. S<strong>in</strong>ce more <strong>review</strong>s will likely lead to more pre-commit<strong>review</strong> acceptance and post-commit <strong>review</strong>s that f<strong>in</strong>d problems,this figure adequately shows the fluctuation <strong>in</strong> both <strong>review</strong> types.It appears that few months had high levels <strong>of</strong> both types <strong>of</strong> <strong>review</strong>.The years 2000, 2001, and 2002 (Apache 2.0 was released<strong>in</strong> 2002) had a susta<strong>in</strong>ed <strong>in</strong>crease <strong>in</strong> commits over other years (Seefigure 4. These three years also have the highest levels <strong>of</strong> postcommit<strong>review</strong>s and the lowest levels <strong>of</strong> pre-commit <strong>review</strong>s. Qualitatively,we know that pre-commit <strong>review</strong>ed patches are larger thanpost-commit <strong>review</strong>ed patches. This size difference would producemore commits <strong>in</strong> years with a large number <strong>of</strong> post-commit <strong>review</strong>s.We discuss other reasons for these fluctuations later. Formalcorrelational tests (e.g., Pearson correlations) need to be performedbefore we can draw def<strong>in</strong>ite conclusions.When are <strong>review</strong>s performed?The Apache policies state that changes to pre-release <strong>code</strong> can bepost-commit <strong>review</strong>ed, while changes to post-release <strong>code</strong> must bepre-commit <strong>review</strong>ed. We have not exam<strong>in</strong>ed the prevalence <strong>of</strong>each type <strong>of</strong> <strong>review</strong> <strong>in</strong> pre- and post release <strong>in</strong> each <strong>of</strong> the Apachebranches. However, <strong>exam<strong>in</strong>ation</strong> <strong>of</strong> figure 3 shows that the release<strong>of</strong> Apache 1.3 <strong>in</strong> June <strong>of</strong> 1998 and Apache 2.0 <strong>in</strong> April <strong>of</strong> 2002were followed by an immediate decrease <strong>in</strong> post-commit <strong>review</strong>and a more gradual and less obvious <strong>in</strong>crease <strong>in</strong> pre-commit <strong>review</strong>.It would be <strong>in</strong>terest<strong>in</strong>g to mark all releases on figure 3.Apache policy also states that pre-commit <strong>review</strong> must be performedon new features and complex changes, while the post-commit <strong>review</strong>policy applies to all other patches. We do not believe that thisis always applied <strong>in</strong> practice. Figure 5 show how <strong>of</strong>ten each type<strong>of</strong> <strong>review</strong> was performed for patches submitted by the top committer.It appears impossible that the top committer for 1999 and2000 only contributed seven patches that required pre-commit <strong>review</strong>(i.e., were complex or were new features). Look<strong>in</strong>g throughthe mail<strong>in</strong>g list we found that this <strong>in</strong>dividual was aggressive towardhis <strong>review</strong>ers and toward people he pre-commit <strong>review</strong>ed. We believethat the type <strong>of</strong> <strong>review</strong> performed is more dependent on theculture <strong>of</strong> the core group than on the type <strong>of</strong> patch.5.6 Merit-based trustHow does merit-based trust between actors affect the <strong>review</strong>? Aremore trusted <strong>in</strong>dividuals <strong>review</strong>ed less <strong>of</strong>ten? How much feedbackis provided <strong>in</strong> the <strong>review</strong>?


Figure 3: The blue l<strong>in</strong>e represents pre-commit <strong>review</strong>s. The red l<strong>in</strong>e represents post-commit <strong>review</strong>s.Figure 4: The number <strong>of</strong> raw commits per year.


The Apache policy <strong>of</strong> post-commit <strong>review</strong> applies only to trusteddevelopers with commit privileges. This policy is left to the <strong>in</strong>terpretation<strong>of</strong> the <strong>in</strong>dividual and the current understand<strong>in</strong>g <strong>in</strong> thecore group (See 5. All <strong>in</strong>dividuals who do not have commit privileges(i.e., are not <strong>in</strong> the core group) must be formally pre-commit<strong>review</strong>ed. In future studies, we would like to understand how thetrust affects the number <strong>of</strong> comments made <strong>in</strong> a <strong>review</strong>. We expectthat newer, promis<strong>in</strong>g developers would be given more feedbackthan older developers with whom the <strong>review</strong>er has an establishedrelationship.5.7 Time and Patch SizeHow long do <strong>review</strong>s take to perform? How does the patch sizeaffect the <strong>review</strong>?Pre-commit <strong>review</strong> requires match<strong>in</strong>g patches submitted to the mail<strong>in</strong>glist with committed patches. This match<strong>in</strong>g is difficult (rejectedand resubmitted patches) and error prone; it is left to future work.However, we understand qualitatively that the more formal precommit<strong>review</strong> takes longer than post-commit <strong>review</strong>.Post-commit <strong>review</strong>s. The follow<strong>in</strong>g quotations are from the discussionthat ultimately led to Apache’s acceptance <strong>of</strong> the committhen-<strong>review</strong>(post-commit) policy. They illustrate many <strong>of</strong> the problemsassociated with pre-commit <strong>review</strong>. These quotes also qualitativelyexpla<strong>in</strong> the sudden drop <strong>in</strong> pre-commit <strong>review</strong> experienced<strong>in</strong> early 1998 (See figure 3).“This is a beta, and I’m tired <strong>of</strong> post<strong>in</strong>g bloody one-l<strong>in</strong>e bug fixpatches and wait<strong>in</strong>g for votes.” “Mak<strong>in</strong>g a change to apache, regardless<strong>of</strong> what the change is, is a very heavyweight affair.”, top<strong>review</strong>er <strong>in</strong> 1997 compla<strong>in</strong><strong>in</strong>g about the <strong>review</strong> process <strong>in</strong> January<strong>of</strong> 1998.“Dur<strong>in</strong>g the 1.2.5 release [before the acceptance <strong>of</strong> commit-then<strong>review</strong>],[name removed] noted how many people are just +1<strong>in</strong>g[approv<strong>in</strong>g] <strong>code</strong> without really look<strong>in</strong>g at it. I’ve done it. I’m surewe all have.”, part <strong>of</strong> discussion <strong>in</strong> January <strong>of</strong> 1998.”I th<strong>in</strong>k the people do<strong>in</strong>g the bulk <strong>of</strong> the committ<strong>in</strong>g appear veryaware <strong>of</strong> what the others are committ<strong>in</strong>g. I’ve seen enough cases<strong>of</strong> hard to spot typos be<strong>in</strong>g po<strong>in</strong>ted out with<strong>in</strong> hours <strong>of</strong> a commit.”,part <strong>of</strong> discussion <strong>in</strong> January <strong>of</strong> 1998.Post-commit <strong>review</strong>s happen very soon after the patch is committedand are done surpris<strong>in</strong>gly fast (even faster than the last quotewould have us believe). Table 2) shows that 46% <strong>of</strong> post-commit<strong>review</strong>s occurred <strong>in</strong> less than an hour and 84% occurred <strong>in</strong> lessthan 12 hours. To have response times like these, the Apache groupmust watch the commit mail<strong>in</strong>g list very closely; this is why webelieve that members <strong>of</strong> the core group <strong>review</strong> close to 100% <strong>of</strong>the patches. Additionally, given the short time frame, the size <strong>of</strong>the patch is likely very small when compared to longer-term <strong>review</strong>sthat exam<strong>in</strong>e a larger fix or problem. Mockus et al. [4] foundthat Apache had smaller patch sizes than the proprietary projectsthey exam<strong>in</strong>ed; however, they did not understand why the patcheswere smaller. Furthermore, figures 4 and 3 <strong>in</strong>dicate that the yearswith the largest number <strong>of</strong> commits also had higher levels <strong>of</strong> postcommit<strong>review</strong>. S<strong>in</strong>ce the patch is likely small, the <strong>review</strong> does nottake very long, so the <strong>review</strong> does not tax the <strong>review</strong>er’s attention.Over the lifetime <strong>of</strong> the project, 9% <strong>of</strong> all commits received a postcommit<strong>review</strong> response (i.e., there was problem with the patch). IfTime period Cumulative percentless 1 hour 45.8%less 12 hours 83.8%less 1 day 91.3%less 1 week 97.9%more 1 week 99.6Table 2: The cumulative percent <strong>of</strong> responses to commits <strong>in</strong> agiven time frame. Notice that over 90% <strong>of</strong> problems are found<strong>in</strong> less than a s<strong>in</strong>gle day.we assume that all commits are <strong>review</strong>ed (high frequency and qualitatively),then this means that 9% <strong>of</strong> the time the <strong>review</strong>er founda problem <strong>in</strong> the patch, and 91% <strong>of</strong> the time the <strong>review</strong>er did notf<strong>in</strong>d a problem <strong>in</strong> the patch. If pre-commit <strong>review</strong> was used on allpatches (as is the case with Mozilla), then n<strong>in</strong>e times out <strong>of</strong> ten thedeveloper would wait for a <strong>review</strong> only to hear that his or her patchwas f<strong>in</strong>e (false negative).Traditionally, views toward <strong>code</strong> <strong>review</strong> require patches to conta<strong>in</strong>a complete solution to a given problem or new feature. Reviewsare taken very seriously and only performed at regular and usuallylong <strong>in</strong>tervals. This long <strong>review</strong> cycle allows problems to becomeembedded <strong>in</strong> the <strong>code</strong>, mak<strong>in</strong>g their removal difficult. In contrast,post-commit <strong>review</strong> requires solutions (or partial solutions) tosmall (or part <strong>of</strong> larger) problems. S<strong>in</strong>ce the <strong>code</strong> is immediatelycommitted, the contributor and other developers can start us<strong>in</strong>g this<strong>code</strong>. If there is a problem <strong>in</strong> the <strong>code</strong>, it is noticed very soon afterit is committed, and is relatively easy to remove, as there will befewer dependencies on the problematic <strong>code</strong>.Mockus et al. [4] found that unlike proprietary projects, the Apacheproject’s defect density rema<strong>in</strong>ed the same before and after a release.They mention <strong>in</strong> pass<strong>in</strong>g that this result “may <strong>in</strong>dicate thatfewer defects are <strong>in</strong>jected <strong>in</strong>to the <strong>code</strong>, or that other defect-f<strong>in</strong>d<strong>in</strong>gactivities such as <strong>in</strong>spections are conducted more frequently or moreeffectively.” [4] Our quantitative result shows that <strong>in</strong>deed there arelikely fewer defects <strong>in</strong>jected <strong>in</strong>to the <strong>code</strong> because <strong>review</strong>s are conductedvery frequently on small patches. There are efficiency ga<strong>in</strong>sbecause <strong>review</strong>s are performed on small patches, because developersdo not have to wait for a <strong>review</strong> (90% <strong>of</strong> the time the patchpasses the <strong>review</strong>), and due to the relative ease and speed withwhich bad patches are caught and removed from the <strong>code</strong> (a vetoedpatch must be either reversed, or an immediate fix accepted bythe <strong>review</strong>er).When compared to proprietary projects, Apache’s pre-release defectdensities are lower, but Apache’s post-release defect densities(the same amount as pre-release defects) are higher. This differencecan be expla<strong>in</strong>ed by our f<strong>in</strong>d<strong>in</strong>g that only 8.7% <strong>of</strong> problematicpatches, which are found by post-commit <strong>review</strong>, are discoveredafter a s<strong>in</strong>gle day has passed. This result also makes <strong>in</strong>tuitive sensefor two reasons. First, the <strong>review</strong>er has already <strong>review</strong>ed and acceptedthe patch and will likely use other defect detection techniquesbefore re-<strong>review</strong><strong>in</strong>g a patch. Second, as more time elapses,the number <strong>of</strong> small, fragmented patches to exam<strong>in</strong>e <strong>in</strong>creases,mak<strong>in</strong>g it difficult to f<strong>in</strong>d the <strong>of</strong>fend<strong>in</strong>g patch (tool support couldhelp). Defects that make it through post-commit <strong>review</strong> are mostlyfound by other defect detection techniques. S<strong>in</strong>ce post-release defectdensities <strong>in</strong> Apache are high, it would appear that other <strong>open</strong><strong>source</strong> defect detection strategies are less effective than traditionalones. However, we do not feel that the credit for low post-release


Figure 6: First column <strong>in</strong>dicates percentage <strong>of</strong> commits <strong>review</strong>edper year. Second column <strong>in</strong>dicates percentage <strong>of</strong> commits<strong>review</strong>ed by two or more <strong>review</strong>ers.defect densities can be solely credited to post-commit <strong>review</strong>. Unlikeproprietary projects, testers and users have access to and canpost bugs about the current Apache <strong>source</strong>. Furthermore, it is difficultto obta<strong>in</strong> an accurate measure <strong>of</strong> defect density (and given thatthis is a report I have not had time to <strong>review</strong> the literature on defectdensities <strong>in</strong> <strong>open</strong> <strong>source</strong> projects). A more recent study by Reason<strong>in</strong>gInc. 15 <strong>in</strong>dicated that the Apache project has a defect densitycomparable to proprietary s<strong>of</strong>tware.It must be noted that this type <strong>of</strong> post-commit <strong>review</strong> does not scaleto large projects with less modular architectures than Apache’s. Itwould appear (although I have not read the literature on peer andextreme programm<strong>in</strong>g) that post-commit <strong>review</strong> is, <strong>in</strong> effect, a type<strong>of</strong> distributed peer programm<strong>in</strong>g. In order to <strong>review</strong> <strong>code</strong> <strong>in</strong> sucha short time period, <strong>review</strong>ers must be <strong>in</strong>timately familiar with the<strong>code</strong>, and they must understand the effect <strong>of</strong> the <strong>code</strong> on the systemas a whole. This type <strong>of</strong> distributed peer programm<strong>in</strong>g is potentiallymore efficient than traditional co-located peer programm<strong>in</strong>g(where one developer controls the <strong>in</strong>put devices and the other commentsand helps <strong>in</strong> navigation), because the peer must only exam<strong>in</strong>ewhat the other developer feels is the correct solution, not every step,<strong>in</strong>clud<strong>in</strong>g <strong>in</strong>valid ones, to the solution. If the peer does not feel thepatch is correct, they can discuss the patch and give the developernavigational and other help. Furthermore, the advice is not limitedto a s<strong>in</strong>gle peer. Future experimentation is needed to validate theseresults5.8 How many <strong>review</strong>ers <strong>review</strong> a s<strong>in</strong>gle patch?Pre-commit <strong>review</strong> Although we do not know how many <strong>in</strong>dividualsun<strong>of</strong>ficially <strong>review</strong> a patch (i.e., discuss the patch on the mail<strong>in</strong>glist), we do know how many <strong>review</strong>ers <strong>of</strong>ficially <strong>review</strong> patches thatwere accepted. For the lifetime <strong>of</strong> the project, 56% <strong>of</strong> patches were<strong>review</strong>ed by one <strong>in</strong>dividual, 82% <strong>of</strong> <strong>review</strong>s were performed byone or two <strong>review</strong>ers, and only 18% <strong>of</strong> <strong>review</strong>s had more than two<strong>review</strong>ers. Figure 6 shows the percentage <strong>of</strong> commits that were <strong>review</strong>ed<strong>in</strong> each year and the number <strong>of</strong> these commits that <strong>in</strong>volvedmore than one <strong>review</strong>er. Aga<strong>in</strong>, it is obvious that certa<strong>in</strong> years hadhigh levels <strong>of</strong> <strong>in</strong>volvement <strong>in</strong> the <strong>review</strong> process while other did not(Compare with figure 3.Post-commit <strong>review</strong>. The median number <strong>of</strong> responses is two(mean 3.8). Normally, the first email describes the problem withthe patch (is a response to the commit) and the second email is theresponse address<strong>in</strong>g the patch problem. Exam<strong>in</strong><strong>in</strong>g 80% <strong>of</strong> the <strong>review</strong>s<strong>in</strong> ascend<strong>in</strong>g order revealed that these patches received 4 orfewer responses (not <strong>in</strong>clud<strong>in</strong>g the <strong>in</strong>itial problem/<strong>review</strong> email).This f<strong>in</strong>d<strong>in</strong>g shows that not only does the process <strong>of</strong> <strong>review</strong> happenquickly (see above), but the fix to the (small) patch is usually15 The results can be obta<strong>in</strong>ed at http://www.reason<strong>in</strong>g.com/downloads.html December 2005obvious and does not require a great deal <strong>of</strong> discussion. S<strong>in</strong>ce thenumber <strong>of</strong> <strong>in</strong>dividuals who discuss a patch must be equal to or lessthan the number <strong>of</strong> email messages, we expect that usually onlytwo people discusses any given patch: the <strong>review</strong>er and the contributor.There are a few patches that have a large number <strong>of</strong> replies,the maximum number <strong>of</strong> responses is 80. Exam<strong>in</strong><strong>in</strong>g this emaildiscussion, we discovered that the majority <strong>of</strong> the responses werenot related to the patch, but some other problem that the patch hadmade obvious to the group. We expect that this type <strong>of</strong> tangentialdiscussion or some controversial issue will be found <strong>in</strong> otherpatches with high response rates. Future work is required to validatethese results and to determ<strong>in</strong>e the time between <strong>review</strong> andresponse to the problems found by the <strong>review</strong>.5.9 Patch K<strong>in</strong>dWhat k<strong>in</strong>ds <strong>of</strong> non-<strong>source</strong> <strong>code</strong> patches are <strong>review</strong>ed? How doesthe k<strong>in</strong>d <strong>of</strong> patch affect the <strong>review</strong>?These questions are ma<strong>in</strong>ly left to future work. However, as wenoted <strong>in</strong> the patch statement section, Apache requires documentationto be submitted for new features and major changes. S<strong>in</strong>cedocumentation is not critical to the function<strong>in</strong>g <strong>of</strong> the product, weexpect that documentation is only <strong>review</strong>ed when it comes from anexternal <strong>source</strong>, and then only m<strong>in</strong>imally. Translations to other languagesusually require at least two translators to perform the task.This process serves as <strong>review</strong>.6. HYPOTHESESBefore develop<strong>in</strong>g hypotheses we need to fix some <strong>of</strong> the problems<strong>in</strong> our data, such as resolv<strong>in</strong>g the names <strong>in</strong> the mail<strong>in</strong>g list and deal<strong>in</strong>gwith a number <strong>of</strong> m<strong>in</strong>or <strong>in</strong>consistencies, which are <strong>in</strong> less than10% <strong>of</strong> our data. It would also be ideal to present our results toa group <strong>of</strong> <strong>in</strong>terested researchers to obta<strong>in</strong> a different perspectiveon these f<strong>in</strong>d<strong>in</strong>gs before develop<strong>in</strong>g hypotheses. We leave the developmentand test<strong>in</strong>g (on other projects) <strong>of</strong> hypotheses to futurework.7. CONCLUSION AND FUTURE WORKThis report is a first attempt at understand<strong>in</strong>g the <strong>code</strong> <strong>review</strong> <strong>processes</strong>used by <strong>open</strong> <strong>source</strong> projects. In exam<strong>in</strong><strong>in</strong>g the “<strong>of</strong>ficial”<strong>code</strong> <strong>review</strong> statements <strong>of</strong> four projects we found <strong>in</strong>terest<strong>in</strong>g similarities,such as the request for small, complete patches. We als<strong>of</strong>ound that differences <strong>in</strong> commit policy appeared to dictate thelevel <strong>of</strong> patch <strong>review</strong>. The amount, quality, and type <strong>of</strong> test<strong>in</strong>gappeared to be related to how easy it is to automate tests. Fewerautomated tests appears to force a project to do more formal <strong>code</strong><strong>review</strong>. Through qualitative <strong>exam<strong>in</strong>ation</strong> <strong>of</strong> development artifacts,we discovered a number <strong>of</strong> anecdotal patterns. The most <strong>in</strong>terest<strong>in</strong>gpattern was the ”committer as mediator” pattern. This patternoccurs when a core group member acts as a knowledgeable mediatorbetween those who report bugs and will try out new patches(lazy testers) and those who will <strong>in</strong>vest the energy to create a patch.We quantitatively answered our research questions by extract<strong>in</strong>gthe Apache project’s developer and commit mail<strong>in</strong>g lists <strong>in</strong>to adatabase. Surpris<strong>in</strong>gly, the core group has changed dramaticallyfrom 1996 to 2005, with a s<strong>in</strong>gle, different top developer dom<strong>in</strong>at<strong>in</strong>gthe number <strong>of</strong> commits for two year periods. These changesaffected the social and s<strong>of</strong>tware development <strong>processes</strong> <strong>of</strong> the community,mak<strong>in</strong>g lifetime analysis less mean<strong>in</strong>gful. We found thatthe core <strong>review</strong><strong>in</strong>g group was similar, but not exactly the same asthe committ<strong>in</strong>g core group. Pre-commit <strong>review</strong>ers appear to have


een with the project for a longer period <strong>of</strong> time. We found thatthree different types <strong>of</strong> <strong>review</strong> exist: pre-commit, post-commit, andsecondary <strong>review</strong>. Pre-commit <strong>review</strong> was similar to traditional formal<strong>review</strong>, while post-commit <strong>review</strong> appears unique to OSS development.Post-commit <strong>review</strong> allows trusted developers to commitpatches without first be<strong>in</strong>g <strong>review</strong>ed. When the patch is later<strong>review</strong>ed, if it is found to conta<strong>in</strong> an error, it must be fixed immediatelyor removed. Most errors (91.3% <strong>of</strong> the time) are found<strong>in</strong> less than a day allow<strong>in</strong>g no time for a problematic patch to becomeembedded <strong>in</strong> the <strong>code</strong>. The process <strong>of</strong> post-commit <strong>review</strong>borrows elements <strong>of</strong> peer programm<strong>in</strong>g, but is potentially moreefficient because peers only exam<strong>in</strong>es each other’s f<strong>in</strong>al and bestattempts. Post-commit <strong>review</strong> generally had a discussion <strong>of</strong> fouremails <strong>in</strong>dicat<strong>in</strong>g that likely only the <strong>review</strong>er and committer were<strong>in</strong>volved. With pre-commit <strong>review</strong> few patches were <strong>review</strong>ed bymore than two <strong>in</strong>dividuals.Throughout this paper we have mentioned areas we believe deservefuture work; however, the most <strong>in</strong>terest<strong>in</strong>g are the post-commit <strong>review</strong>process, the changes to the core group, and the relationshipbetween test<strong>in</strong>g, <strong>review</strong>, and the number <strong>of</strong> defects <strong>in</strong> the s<strong>of</strong>twareproduct.8. REFERENCES[1] A. Capiluppi, P. Lago, and M. Morisio. Characteristics <strong>of</strong><strong>open</strong> <strong>source</strong> projects. volume 26, pages 317–327. SeventhEuropean Conference on S<strong>of</strong>tware Ma<strong>in</strong>tenance andReeng<strong>in</strong>eer<strong>in</strong>g, March 2003.[2] D. M. German. Decentralized <strong>open</strong> <strong>source</strong> global s<strong>of</strong>twaredevelopment, the GNOME experience. Journal <strong>of</strong> S<strong>of</strong>twareProcess: Improvement and Practice, 8(4):201–215, 2004.[3] D. M. German. Us<strong>in</strong>g s<strong>of</strong>tware trails to reconstruct theevolution <strong>of</strong> s<strong>of</strong>tware. Journal <strong>of</strong> S<strong>of</strong>tware Ma<strong>in</strong>tenance andEvolution: Research and Practice, 16(6):367–384, 2004.[4] A. Mockus, R. T. Field<strong>in</strong>g, and J. Herbsleb. Two case studies<strong>of</strong> <strong>open</strong> <strong>source</strong> s<strong>of</strong>tware development: Apache and Mozilla.ACM Transactions on S<strong>of</strong>tware Eng<strong>in</strong>eer<strong>in</strong>g andMethodology, 11(3):1–38, July 2002.[5] E. S. Raymond. The Cathedral and the Bazaar. O’Reilly andAssociates, 1999.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!