13.07.2015 Views

Process mining software repositories - FLOSShub


and evolution of the system should be significant. Therefore, we require such a developer to have more revisions in the version control system than average (of all the developers having events in the version control system). Moreover, core members must have added files to and modified files in the version control system.
• Active developers regularly contribute new features and fix bugs, i.e., during their activity period they should have Ticket-closed, VCS: A and VCS: M events. We interpret “regularly” as at least one event every month. Furthermore, to distinguish active developers and core members we require the continuous activity period of active developers to be shorter than thirty-six months.
• Peripheral developers are those with sporadic or irregular contribution to the system functionality. To characterize these developers we require them to have both VCS: A and VCS: M events, but no longer require the events to be present every month.
• A bug fixer is a developer who fixes bugs discovered either by herself or by a different developer. This means that bug fixers should have Ticket-closed and VCS: M events, and should not belong to the other categories.
• A bug reporter is the open-source counterpart of a tester. Bug reporters discover bugs and report them, but do not modify the code. Bug reporting occurs either using a bug tracker or a mailing list: samples of the mailing list messages in a number of projects have been inspected, confirming that the mailing lists are indeed often used for reporting bugs. Hence, bug reporters have Ticket-created or Mail thread created events, but no events related to version control.
• A reader goes beyond using the system and inspects the code to understand how the system works.
• A passive user is attracted by the functionality of an open source system, but does not attempt to contribute to it.
As activities of readers and passive users are often not reflected in software repositories, we do not consider these classes further.
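
The role rules listed above can be read as a small decision procedure. The following Python sketch is one possible reading, not the authors' implementation: the `Profile` record, its field names, and the order in which the rules are checked are assumptions made for illustration. The project leader rule is not part of this excerpt, so it is not modeled here.

```python
from collections import namedtuple

# Hypothetical per-developer summary; the field names are illustrative only.
Profile = namedtuple("Profile", [
    "activities",     # set of event labels recorded for the developer
    "revisions",      # number of revisions in the version control system
    "period_months",  # length of the continuous activity period, in months
    "every_month",    # True if each month of that period has at least one event
])

REPORTING = {"Ticket-created", "Mail thread created"}


def has_vcs_events(profile):
    """Assumption: every version control event label starts with 'VCS:'."""
    return any(label.startswith("VCS:") for label in profile.activities)


def classify(profile, average_revisions):
    """Return a role name, or None if no rule in this sketch applies."""
    added = "VCS: A" in profile.activities
    modified = "VCS: M" in profile.activities
    closed = "Ticket-closed" in profile.activities

    if added and modified:
        # Active developer: regular contributions over a period
        # shorter than thirty-six months.
        if closed and profile.every_month and profile.period_months < 36:
            return "active developer"
        # Core member: more revisions than the average developer.
        if profile.revisions > average_revisions:
            return "core member"
        # Otherwise the contribution is sporadic or irregular.
        return "peripheral developer"
    if closed and modified:
        return "bug fixer"
    if profile.activities & REPORTING and not has_vcs_events(profile):
        return "bug reporter"
    return None


# Toy input, invented for illustration.
example = Profile({"Ticket-created", "Mail reply"}, 0, 1, True)
role = classify(example, average_revisions=10)
```

Checking the active developer rule before the core member rule is a design choice of this sketch; it mirrors the statement that the thirty-six month bound is what separates the two roles.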
Indeed, the only information available about the behavior of passive users is the number of times the executables were downloaded, but as registration is often not a prerequisite for downloading we cannot distinguish between different passive users. Similarly, registration-free access to the source code does not allow us to distinguish between different readers. Therefore, we only consider the categories: bug reporter, bug fixer, peripheral developer, active developer, core member and project leader.

2) System under investigation: We have chosen to study aMSN, a free and open source instant messaging application and a clone of Windows Live Messenger. Unlike Windows Live Messenger, aMSN supports Macintosh and UNIX/Linux in addition to the Windows platform. At the moment of writing aMSN has been downloaded more than 38 million times, making it the 20th most popular SourceForge project of all time. To analyse aMSN we have considered seven bug repositories (bugs, feature requests, patches, plugins, skins, support requests and translations), three mail archives (commits, devel and lang) and one Subversion repository located at https://amsn.svn.sourceforge.net/svnroot/amsn/. We have focused on the period from February 26, 2002 until July 09, 2010. In total, the repositories contained 3137 bug reports, 34947 mail messages and 12062 revisions. The aMSN project also has a discussion forum. However, the data of this forum cannot be used in the current implementation of FRASR.

The data from the software repositories has been exported using the developer case and the data source specific binding for each data source. The developer matching has been calculated automatically using the simple heuristics mentioned in Section II-C.
Furthermore, we assume that the time stamps are synchronous, i.e., when time stamps from two repositories are equal, the points in ‘real time’ at which they were recorded do not differ significantly.

3) Results: Using the exported log in combination with the ProM Dotted Chart visualization and a spreadsheet application, the developers were assigned to one of the available roles. Figure 4 presents a part of a Dotted Chart visualization used in the classification. Green dots correspond to mail events such as Mail thread created and Mail reply, black to Ticket-created, red to other bug tracker events, blue to addition of files in the version control system, and finally white to other events of the version control system (modifications, deletions and renames). The size of a dot represents the number of events occurring in the same week, and a color mixture corresponds to events of different kinds occurring in the same week.

By inspecting Figure 4 we clearly see that the developer in the first line is represented by a sequence of overlapping white, red and blue dots, starting at the very beginning of the project. According to the classification rules above this developer (Alvaro J. Iradier Muro/airadier²) will be classified as the project leader. Furthermore, we observe that some developers are represented by long sequences of overlapping dots, as is, for instance, the case for Alaoui Youness/kakaroto and Boris Faure/billiob. These developers are core members of the project. Shorter sequences represent active developers, such as Arieh Schneier/lio lion and Tom Jenkins/bluetit. Finally, disconnected dots are characteristic for the sporadic activity of peripheral developers. This is, for instance, the case for Harry Vennik/thaven.

Visual inspection of Figure 4 provides qualitative results.
Additional quantitative results have been obtained by exporting the relation between the developers and activities, expressed by a so-called ProM “originator by activity matrix”, to the spreadsheet application and performing simple counting. In this way, out of 1725 developers we have identified 1443 bug reporters, 3 bug

² The developer names are derived from the information of the associated developer aliases. This includes, for example, a username and a name associated with an email address.
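
The counting step described in the Results paragraph can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions: the event log is taken to be a list of (developer, activity) pairs, the developer names are invented toy data, and version control labels are assumed to start with "VCS:". It does not reproduce the ProM export or the spreadsheet used in the study.

```python
from collections import Counter, defaultdict


def originator_by_activity(events):
    """Build a developer -> activity -> event count mapping.

    `events` is an iterable of (developer, activity) pairs; this layout
    is an assumption, not the format of the exported log.
    """
    matrix = defaultdict(Counter)
    for developer, activity in events:
        matrix[developer][activity] += 1
    return matrix


def count_bug_reporters(matrix):
    """Count developers who reported bugs but have no version control events."""
    reporting = {"Ticket-created", "Mail thread created"}
    total = 0
    for row in matrix.values():
        activities = set(row)
        touches_vcs = any(label.startswith("VCS:") for label in activities)
        if activities & reporting and not touches_vcs:
            total += 1
    return total


# Toy log with invented developer names.
events = [
    ("alice", "Ticket-created"),
    ("alice", "Mail reply"),
    ("bob", "VCS: M"),
    ("bob", "Ticket-created"),
    ("carol", "Mail thread created"),
]
matrix = originator_by_activity(events)
reporters = count_bug_reporters(matrix)
```

In this toy log, `bob` is excluded from the bug reporter count because he also has a version control event, which matches the rule that bug reporters have no events related to version control.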
