11.07.2015 Views

A preliminary examination of code review processes in open source ...


5.1 Core Group Changes

The Apache core group has changed dramatically over the lifetime of the project. Of the core group that Mockus et al. [4] studied between February 1996 and May 1999, only one member is still part of the top 20 committers in 2005; four members remain in the periphery of the project. This change is not a recent phenomenon, as many of the leading Apache developers in 1998 made way for other developers in 1999 and left the core group in 2000. Examining the top committer for each year, one notices that this individual remains on top for two years and gradually fades out over the next few years. In every instance, the top committer has at least one of his top two years with more than double the number of commits of the second highest committer. It will be interesting to use a metric that takes the movement of all developers, not just the top developer, into account. We expect that these changes lead to an influx of new and innovative ideas as well as changes in the development process. The original statistics that we gathered were project lifetime statistics, from 1997 to 2004 (1996 and 2005 are incomplete), but due to core group changes, these may be misleading.
Analyzing and interpreting individual years is much more involved and difficult than analyzing the lifetime of the project. Where possible, we present yearly and monthly data.

5.2 Types of Review

Another complication is that the Apache project has many different types of review. The three identified types are pre-commit (formal review-then-commit), post-commit (commit-then-review), and secondary review. Pre-commit review is only recorded when a patch is committed; this means that reviewed revisions and rejected patches are not recorded. Post-commit review is never formally recorded, but we can deduce when it occurs by looking in the mailing list for replies to a committed patch. Since a reply to a patch likely represents someone finding a problem with the patch, we only know how many post-commit reviews contained problems. We do not know how many post-commit patches were reviewed with no problems found. Furthermore, we do not know how many different individuals post-commit reviewed a patch.
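The deduction described above (a reply to a committed patch signals a post-commit review that found a problem) can be sketched as follows. This is a minimal illustration, not the scripts used in the study: the subject prefix `cvs commit:` used to recognize commit notifications and the function name `commits_with_replies` are assumptions made for the example, and only direct replies (via the `In-Reply-To` header) are matched.

```python
from email import message_from_string

# Hypothetical marker for commit notification mails; the paper does not
# say how commit messages are identified on the mailing list.
COMMIT_SUBJECT_PREFIX = "cvs commit:"


def commits_with_replies(raw_messages):
    """Return (commit_ids, replied_ids) for a list of raw RFC 822 messages.

    A commit is counted as post-commit reviewed (with a problem found)
    when at least one message names it in its In-Reply-To header.
    """
    messages = [message_from_string(raw) for raw in raw_messages]

    # Message-IDs of all commit notifications.
    commit_ids = {
        msg["Message-ID"].strip()
        for msg in messages
        if msg["Message-ID"]
        and (msg["Subject"] or "").lower().startswith(COMMIT_SUBJECT_PREFIX)
    }

    # Commits that are the direct parent of at least one reply.
    replied_ids = {
        msg["In-Reply-To"].strip()
        for msg in messages
        if msg["In-Reply-To"] and msg["In-Reply-To"].strip() in commit_ids
    }
    return commit_ids, replied_ids


if __name__ == "__main__":
    sample = [
        "Message-ID: <c1@example.org>\nSubject: cvs commit: src/main.c\n\npatch\n",
        "Message-ID: <c2@example.org>\nSubject: cvs commit: src/util.c\n\npatch\n",
        "Message-ID: <r1@example.org>\nIn-Reply-To: <c1@example.org>\n"
        "Subject: Re: cvs commit: src/main.c\n\nThis breaks the build.\n",
    ]
    commits, replied = commits_with_replies(sample)
    print(len(replied), "of", len(commits), "commits received a reply")
```

Because only direct replies are matched, a sketch like this would miss replies deeper in a thread and cannot tell how many individuals reviewed a commit, which mirrors the limitations noted in the text.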
Secondary review occurs when an individual does not review the patch themselves, but reads the comments of the reviewer, and makes an additional comment themselves. This type of review is only recorded as replies to a patch in the mailing list; it can apply to both types of reviews and is mainly left to future work.

5.3 Accept? Reject?

What percentage of reviewed patches are accepted? Rejected?

Over the lifetime of the project, 5747 patches were submitted for pre-commit review. Of these patches we were able to trace only 2522 to patch commits (44%). We expect that many patches were submitted more than once when they were ignored or if they required patch revisions. To determine why 56% of pre-commit reviewed patches were rejected, we intend to do more detailed data analysis including the use of more sophisticated message threading techniques and manual classification of a smaller time period.

Over the lifetime of the project, 9% of post-reviewed commits are found to have problems. We assume, based on qualitative information and post-commit review frequency (see below), that close to 100% of the patches are post-commit reviewed. This implies that 91% of patches are accepted. Interestingly, 5% of pre-commit reviewed patches are found to still contain a problem when post-commit review is performed.
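The percentages quoted in this section follow from simple ratios, reproduced here as a small check. The counts are taken from the text; the helper name `percent` and the variable names are illustrative only. Note that the 56% figure counts patches that could not be traced to a commit, which, as the text points out, includes resubmissions and is therefore an upper bound on true rejections.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Pre-commit review: counts reported in the text.
submitted = 5747  # patches submitted for pre-commit review
committed = 2522  # patches traced to a commit

traced = percent(committed, submitted)  # share traced to a commit
untraced = 100 - traced                 # share not traced to any commit

# Post-commit review: 9% of commits drew a reply reporting a problem.
# The acceptance figure relies on the text's assumption that close to
# 100% of patches are post-commit reviewed.
problem_rate = 9
accepted_post = 100 - problem_rate

if __name__ == "__main__":
    print(f"pre-commit traced to commits: {traced}%")
    print(f"pre-commit not traced:        {untraced}%")
    print(f"post-commit accepted:         {accepted_post}%")
```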
We believe that these remaining problems occur because pre-review patches are generally larger, more complex, and often submitted by external contributors.

Figure 1: The cumulative distribution of pre-commit reviews by year

5.4 Reviewer Characteristics

Who performs the review?

Pre-commit review. Over the lifetime of the Apache project, reviews were performed by 130 individuals; however, many of these people only reviewed a single patch. In 1999, Mockus et al. [4] found that the Apache project had a core group size of 15 developers; this core group made 83% of the commits. Analyzing the same time period, we found that the top 15 reviewers performed 93% of the reviews. During this time, there were 55 individual reviewers and approximately 300 [4] individual patch contributors (footnote 14). It would appear that the review group is a subset of the core group. Indeed, 10 reviewers are responsible for 84% of the reviews.

In figure 1, it can be seen that between 1997 and 2000 the core review group remains small. However, from 2001 to 2002, the review group grows to 20 reviewers performing 84% of the reviews. In 2004, the review group appears to have shrunk to 12 reviewers. We are not certain about the cause of this apparent fluctuation in reviewer group size.
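Statements such as "the top 15 reviewers performed 93% of the reviews" or "10 reviewers are responsible for 84% of the reviews" amount to reading points off a cumulative distribution of reviews per reviewer. The sketch below shows one way such figures could be computed; it is not taken from the study's scripts, the function names are hypothetical, and the input is assumed to be a non-empty mapping from an already resolved reviewer name to a review count.

```python
from collections import Counter


def top_n_share(review_counts, n):
    """Fraction of all reviews performed by the n most active reviewers."""
    total = sum(review_counts.values())
    top = sum(count for _, count in Counter(review_counts).most_common(n))
    return top / total


def core_group_size(review_counts, share_percent):
    """Smallest number of top reviewers covering share_percent of reviews.

    share_percent is an integer (e.g. 84) so the threshold comparison
    stays in exact integer arithmetic.
    """
    total = sum(review_counts.values())
    running = 0
    ranked = Counter(review_counts).most_common()
    for size, (_, count) in enumerate(ranked, start=1):
        running += count
        if running * 100 >= share_percent * total:
            return size
    return len(review_counts)


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the paper.
    counts = {"alice": 50, "bob": 30, "carol": 15, "dave": 5}
    print(top_n_share(counts, 2))       # share held by the top two
    print(core_group_size(counts, 84))  # reviewers needed to reach 84%
```

Applying the same functions to per-year counts would give the yearly group sizes that figure 1 summarizes, provided names are resolved consistently across years.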
Currently, we cannot correlate this fluctuation with commit core group size because we have not resolved the patch submitter names. However, figure 2 shows the total number of commits (inflated since submitter name is not resolved) and reviews over the lifetime of the project. This figure demonstrates that over the lifetime of the project, the groups of committers and reviewers are almost the same size. With 80% of the commits done by 26 individuals, the core group is much larger than Mockus's original finding of 15 developers (likely because of core group changes).

The previous results pertain only to pre-commit reviews. Since post-commit and secondary review require name resolution on the mailing list (over 100,000 emails), we leave the determination of who performs these reviews to future work.

Are the top developers (committers) also the top reviewers?

Pre-commit review. We examined the lifetime top committers and reviewers to determine if the same individuals are on top in both roles. The group size varied from five to 20 people, each time we

14 Examining Mockus' scripts revealed that resolution of names was only performed on core group members, thus inflating the total number of contributors, but having little effect on the percentage based size of the core group
