2012 19th Asia-Pacific Software Engineering Conference

Comparison of Seven Bug Report Types: A Case-Study of Google Chrome Browser Project

Sangeeta Lal
Jaypee Institute of Information Technology, Noida, India
Email: sangeeta@jiit.ac.in

Ashish Sureka
Indraprastha Institute of Information Technology (IIIT-D), New Delhi, India
Email: ashish@iiitd.ac.in

Abstract—Bug reports submitted to an issue tracking system can belong to different categories such as crash, regression, security, cleanup, polish, performance and usability. A deeper understanding of the properties and features of the various categories of bug reports can have implications for improving software maintenance processes, tools and practices. We identify several metrics and characteristics that serve as dimensions along which the various types of bug reports can be compared. We perform a case study on the Google Chromium Browser open-source project and conduct a series of experiments to calculate these metrics. We present a characterization study comparing the different types of bug reports on metrics such as: statistics on close time, number of stars, number of comments, discriminatory and frequent words for each class, entropy across reporters, entropy across components, opening and closing trends, and continuity and debugging-efficiency performance characteristics. The calculated metrics show the similarities and differences between the seven types of bug reports along these dimensions.

Index Terms—Mining Software Repositories, Software Maintenance, Issue Tracking System, Mining Bug Archives

I. INTRODUCTION

Bug reports submitted to an issue tracking system are assigned different types of labels categorizing bugs into various classes. An issue is generally labeled with or assigned to the software component associated with the bug report [1]. Labels can be assigned to categorize an issue as a defect or a feature enhancement request [2]. Labels can also be used to further categorize an issue into subcategories such as performance, security, crash or usability [3], [4]. The focus of the work presented in this paper is an empirical analysis and comparison of seven different types of bug reports (crash, regression, security, cleanup, polish, performance and usability). The work is motivated by the need to increase our understanding of the various types of bug reports, which has practically useful applications in building automated software engineering tools such as automatic bug classification and triaging, and task assignment based on expertise and work complexity.

We report our analysis on the following types of bug reports (the definitions are taken from Apple's bug reporting best practices and the Google Chromium bug reporting guidelines):

1) Regression: Features that worked in prior releases and are now broken.
2) Security: Potential security exposures.
3) Crash: Bugs which cause a machine to crash, resulting in an irrecoverable hang or loss of data.
4) Performance: Issues that reduce the performance or responsiveness of an application.
5) Usability: A cosmetic issue or an issue with the usability of an application.
6) Polish and Cleanup: Minor fixes and issues (an exact definition is not given by the project; this is our interpretation from the bug report data).

Apple bug reporting best practices: https://developer.apple.com/bugreporter/bugbestpractices.html
Chromium bug labels: http://www.chromium.org/for-testers/bug-reporting-guidelines/chromium-bug-labels

The research aims of the work presented in this paper are the following:

1) The broad objective is to investigate similarities and differences between different types of bug reports (cleanup, crash, polish, performance, regression, security and usability) across multiple dimensions within the same project, in order to increase our understanding of the different bug report types.
2) The specific aim is to conduct an in-depth empirical analysis on a publicly available dataset from a popular open-source project and to compare and contrast the seven types of bug reports on metrics such as: mean time to repair, milestone changes, priority, component and reporter entropy, frequently occurring terms, and debugging process quality, efficiency and continuity.

II. RELATED WORK AND RESEARCH CONTRIBUTIONS

In this section, we review the work most closely related to the study presented in this paper. Table I lists five research papers in chronological order of publication year and characterizes them by bug report type, paper reference, experimental dataset and study objective. As shown in Table I, qualitative and quantitative analysis of various bug report types (such as crash, performance, security and usability) is an area that has attracted the attention of several researchers.


TABLE I
CLOSELY RELATED WORK LISTED IN CHRONOLOGICAL ORDER

# | Bug Type | Study | Dataset | Study Objectives
1 | Usability | Twidale and Nichols (2005) [5] | Mozilla and GNOME | Usability bug report analysis to characterize how developers address and resolve issues
2 | Security | Gegick et al. (2010) [3] | Cisco software system | Automatic identification (statistical modeling) of security bugs based on natural language information present in bug reports
3 | Performance, Security | Zaman et al. (2011) [4] | Firefox project | Comparison between security and performance bugs on various aspects
4 | Crash | Khomh et al. (2011) [6] | Firefox project | Triaging of crash-types based on an entropy region graph
5 | Performance | Zaman et al. (2012) [7] | Mozilla Firefox and Google Chrome | A qualitative study of performance bugs across four dimensions (Impact, Context, Fix and Fix validation)

TABLE II
GOOGLE CHROME BROWSER EXPERIMENTAL DATASET DETAILS

First Issue ID: 2 | CRASH bug reports: 3778
Last Issue ID: 111889 | REGRESSION bug reports: 2938
Reporting date of first issue: 30-08-2008 | SECURITY bug reports: 398
Reporting date of last issue: 29-01-2012 | CLEAN bug reports: 800
Issues downloaded: 93951 | POLISH bug reports: 261
Issues [state=closed and status=fixed, verified, duplicate]: 53110 | PERFORMANCE bug reports: 169
Issues unable to download: 17938 | USABILITY bug reports: 38

Twidale and Nichols [5] investigate usability bug reports from the Mozilla and GNOME projects and conduct a characterization study that throws light on how developers address and resolve issues concerning user interfaces and interaction design. They perform a qualitative study of the nature of usability discussions in OSS projects, the differences between commercial and open-source projects, and the patterns of discourse emerging across projects [5].

Gegick et al. [3] present a technique to automatically identify security bugs by text-mining the natural language information present in a bug report. Their research shows that security bugs are often mislabeled as non-security due to a lack of security domain knowledge. They derive a statistical model from a labeled training dataset and then apply the induced model to a test dataset. Their approach was able to identify a high percentage of security bugs that had been labeled as non-security [3].

Khomh et al. [6] present a method for triaging crash-types based on the concept of an entropy region graph and introduce a technique that, in addition to frequency counts, also captures the distribution of the occurrences of crash-types among the users of a system.

Zaman et al. [4] perform an empirical analysis of security and performance bugs (non-functional bug types) from the Firefox open-source project and compare various aspects of the two types of bugs. Their study [4] reveals insights on aspects such as: how often the two types of bugs are tossed and reopened, triage time, characteristics of bug fixes, the number of developers involved and the number of files impacted.

Zaman et al. [7] investigate performance bug reports of Mozilla Firefox and Google Chrome across four dimensions (Impact, Context, Fix and Fix validation), study several characteristics of performance bugs, and make recommendations on how to improve the process of identifying, tracking and fixing performance bugs.

In the context of the existing literature and closely related work, this paper makes the following unique and novel contributions:

1) A first empirical study comparing seven different types of bug reports (cleanup, crash, polish, performance, regression, security and usability) on a popular open-source project (Google Chromium Browser) across multiple dimensions (MTTR, priority, frequently occurring terms, bug fixing process quality, continuity and efficiency performance characteristics, component and reporter entropy, number of stars, comments, description length and milestone changes).
2) The results and insights presented in this paper answer several research questions, such as the degree of difference in metrics like mean time to repair, priority and milestone changes across the seven types of bug reports. The study provides evidence of correlation between certain terms and bug report types. Empirical results demonstrating similarities and differences in bug fixing process quality, continuity and efficiency across the different bug report types are presented.

III. EXPERIMENTAL DATASET

We conduct experiments on publicly available data from an open-source project so that our experiments can be replicated and our results can be used for benchmarking or comparison. Google Chromium is a popular and widely used open-source browser (http://www.chromium.org), and the Google Chromium bug database has also been used by other researchers for mining software repositories experiments (http://2011.msrconf.org/msr-challenge.html). We download bug report data from the Google Chromium issue tracker (http://code.google.com/p/chromium/issues/list) using the issue tracker Java APIs (http://code.google.com/p/support/wiki/IssueTrackerAPI).


TABLE III
COMPONENT AND REPORTER ENTROPY VALUES FOR THE GOOGLE CHROME BROWSER PROJECT

Entropy (frequency) | CRASH | REGR. | SECUR. | CLEAN | POLISH | PERF. | USAB.
Component | 0.552 (2922) | 0.666 (2808) | 0.577 (330) | 0.6345 (761) | 0.637 (246) | 0.777 (155) | 0.764 (36)
Reporter | 0.890 (2468) | 0.871 (2365) | 0.781 (365) | 0.895 (744) | 0.905 (214) | 0.932 (148) | 0.976 (24)

Fig. 1. Component and reporter entropy graphs showing the position of the seven types of bug reports [Google Chrome Browser project].

Table II summarizes the dataset (years 2008 to 2012) used in our experiments. As shown in Table II, we download bug reports from the start date (30 August 2008) until 29 January 2012. We were not able to download all the bug reports in the specified range due to permission-denied or not-available errors. Table II shows the count of closed bug reports that are resolved as fixed, verified, or duplicate, as well as the number of bug reports of each of the seven bug types. As bug labeling is not a mandatory step in the bug fixing process for the Google Chrome project, not every bug report is labeled as crash, regression, security, cleanup, polish, performance or usability. Hence, we check the labels of each bug report and include it in our experimental dataset only if it is tagged (labeled manually by developers) with one of the pre-defined categories. As shown in Table II, the total number of bug reports which are closed, verified or duplicate and belong to at least one of the pre-defined categories (crash, regression, security, cleanup, polish, performance and usability) is 8030 (a single bug report can carry multiple labels). The Google Chrome project team members classify bug reports by assigning meaningful labels (http://code.google.com/p/support/wiki/IssueTracker). A bug report can have multiple labels, for example, a label for the bug type (such as security, performance or usability), milestone changes, status or priority. We perform a manual inspection of Google Chrome issues to identify the labels (and their variations) that correspond to the seven types of bug reports. For example, we categorize a bug report as type crash if it has a "Crash", "Type-Crash" or "Stability-Crash" keyword in the label field. We extract bug report priority by checking for the presence of the label "pri" in a given bug report. Milestone change information is extracted using the keywords "movedfrom-", "movedfrom", "moved from m" or "bulkmove". Other standard fields such as bug report state, status and the time taken to fix the bug are directly accessible through the issue tracker API.
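
The label-matching step described above can be sketched as follows. This is a minimal illustration, not the tooling used in the study: the dictionary structure of a downloaded report, the field name "labels" and any label spellings beyond those quoted above are assumptions made for the sake of the example.

```python
# Minimal sketch of the label-based categorization described above (not the
# study's actual scripts). Assumes each downloaded report is a dict whose
# "labels" field is a list of label strings; field names are illustrative.
CRASH_LABELS = {"crash", "type-crash", "stability-crash"}
OTHER_TYPES = ("regression", "security", "cleanup", "polish", "performance", "usability")

def bug_types(report):
    """Return the set of bug-type categories suggested by a report's labels."""
    labels = {label.lower() for label in report.get("labels", [])}
    types = {"crash"} if labels & CRASH_LABELS else set()
    types.update(t for t in OTHER_TYPES if any(t in label for label in labels))
    return types

def priority(report):
    """Return the first priority label (e.g. 'Pri-1'), if any."""
    return next((l for l in report.get("labels", []) if l.lower().startswith("pri")), None)

def has_milestone_move(report):
    """True if the report carries at least one milestone-change keyword."""
    text = " ".join(report.get("labels", [])).lower()
    return any(k in text for k in ("movedfrom", "moved from m", "bulkmove"))
```
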
IV. EMPIRICAL ANALYSIS AND RESULTS

Each of the following subsections focuses on one particular dimension, or a set of coherent metrics, across which the seven types of bug reports are compared and contrasted.

A. Entropy across Components and Reporters

We compare the various types of bug reports based on a metric called "entropy". Khomh et al. [6] define a term called crash-entropy, which is employed in their proposed technique to prioritize bug reports for the purpose of triaging. They argue that bug reports linked to crash-types should be prioritized not only by raw frequency (the number of bug reports for the specific crash-type) but also by the number of users affected (the distribution across users). Khomh et al. [6] define crash entropy as a metric that quantifies the distribution of the occurrences of the various crash-types across users: two crash types may have the same number of bug reports while their distributions across users are different.

We apply the same core concept (measuring entropy) in the context of this study and present our insights. However, there is a difference between the application of the concept in the work by Khomh et al. [6] and in our work: our objective is to measure, compare and contrast the entropy of the seven types of bug reports across components and across reporters. The formula for entropy (as used in [6] and [8]) is shown in Equation 1:

H_n(BT) = -\sum_{i=1}^{n} p_i \log_n(p_i)    (1)



In Equation 1, BT denotes the bug type (one of the seven categories: crash, regression, security, cleanup, polish, performance and usability), and n is the total number of unique components or reporters (depending on whether the metric is applied to components or to reporters) in the dataset. H_n(BT) denotes the entropy for a specific bug type, and p_i is the probability of a specific component or reporter (depending on the application) belonging to the respective bug type, with p_i ≥ 0 and \sum_{i=1}^{n} p_i = 1. The entropy value can vary from 0 (minimal) to 1 (maximal). If the probability of all components or reporters is the same (a uniform distribution), then the entropy is maximal, because the p_i value is the same for all n. At the other extreme, if only one component or reporter is associated with a particular bug type, then the entropy is minimal (a value of 0). The interpretation is that when the entropy is low for a specific bug type, a small set of components or reporters is associated with that bug type.

Table III displays the component and reporter entropy and the frequency values for all seven types of bug reports. We notice that not every bug report mentions the component name, so we were not able to extract the component and reporter names for all bug reports in the experimental dataset; Table III shows results for the bug reports for which we were able to extract the component and reporter names (frequencies are given in brackets). The component entropy for crash and security bugs is 0.552 and 0.577 respectively, which is relatively lower than the component entropy for performance (0.777) and usability (0.764) bugs. For the security bug type, 242 out of 330 reports belong to only 2 of 13 components: there are 13 unique components assigned to the 330 security bug reports, and 73.3% of the bug reports originate from only 2 of those 13 components. For the crash bug type, 2617 (89.5%) of 2922 bug reports belong to 3 of 14 components. Performance bug reports are distributed across multiple components: Internals (68/155), WebKit (36/155), UI (17/155), BrowserUI (15/155), Build (14/155), Misc (4/155) and Feature (1/155). The reporter entropy for security bugs (0.781) is relatively lower than for the other types of bugs because three reporters (security...@gtempaccount.com, cev...@chromium.org and aba...@chromium.org) reported 48, 61 and 22 bugs respectively, out of the 365 bug reports for which we were able to extract the reporter name.

Khomh et al. [6] define the entropy graph for triaging crash bugs; we apply the same concept in our study to better understand the similarities and differences between the characteristics of the various types of bug reports. Figure 1 presents the component and reporter entropy graphs derived from Table III. The x-axis represents the frequency (number of incidents) bracket and the y-axis represents the entropy bracket. As shown in Figure 1, performance and usability bugs fall into the category of relatively high entropy and relatively low number of incidents. Regression and crash bugs have high frequency but fall in the middle bracket of reporter entropy. We believe that the entropy graph presented in Figure 1 can be used by the project team to understand the impact of each type of bug report across both dimensions of frequency and coverage (distribution across components and distribution across reporters).
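
As a concrete reading of Equation 1, the following sketch computes the normalized component (or reporter) entropy of one bug type. The list-of-names input format is an assumption made for illustration, not the study's implementation.

```python
import math
from collections import Counter

def normalized_entropy(names, n):
    """Equation 1: `names` holds one component (or reporter) name per bug
    report of a given type; `n` is the number of unique components (or
    reporters) in the whole dataset. Returns a value in [0, 1]."""
    if n < 2 or not names:
        return 0.0
    total = len(names)
    h = 0.0
    for count in Counter(names).values():
        p = count / total
        h -= p * math.log(p, n)  # log base n keeps the maximum at 1
    return h

# A bug type concentrated in one component has entropy 0; a bug type spread
# evenly over all n components has entropy 1.
print(normalized_entropy(["WebKit"] * 10, n=13))                 # 0.0
print(normalized_entropy(["C%d" % i for i in range(13)], n=13))  # 1.0
```
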
B. Bug Fixing Process Quality and Performance Characteristics

1) Bug Opening and Closing Trend and Continuity: Francalanci et al. [9] present an analysis of the performance characteristics (such as continuity and efficiency) of the bug fixing process. They identify performance indicators (the bug opening and closing trends) that reflect the characteristics and quality of the bug fixing process, and we apply their concepts in our study. They define the bug opening trend as the cumulative number of opened and verified bugs over time; the closing trend is defined as the cumulative number of bugs that are resolved and closed over time. We plot the opening and closing trends for the various types of bugs and investigate the similarities and differences in their characteristics. Figures 2, 3 and 4 display the opening and closing trends for crash, cleanup and security bug reports, and Figures 5, 6 and 7 display the trends for performance, polish and regression bug reports. At any instant of time, the difference between the two curves (the interval) gives the number of bugs that are open at that instant. We notice that the debugging process is of high quality, as there is no uncontrolled growth of unresolved bugs (the curve for the closing trend grows nearly as fast, or has the same slope, as the curve for the opening trend) across all bug types [9], [10].

All opening and closing trend graphs are plotted on the same scale, so the differences between their characteristics are visible. There is a noticeable difference between the trends for crash and regression bug reports in contrast to the other types: the slope of the crash and regression curves is relatively steep in comparison to the other curves. It is interesting to note that for security bugs the opening and closing trend curves almost overlap, which shows that the number of security bugs open at any instant of time is, in general, very small. For regression bugs, the curve is steep (demonstrating a relatively large number of incoming and closed bugs) and yet the interval between the opening and closing curves is small (the two curves stay close to each other). The graph for usability bugs is not plotted due to limited space, but we observe the same characteristics for usability bugs as exhibited by performance and polish bugs.

2) Mean Time to Repair and Release Date: We study the mean time to repair (MTTR) a bug to understand the quality of the bug fixing process across the different categories. MTTR is computed as the amount of time required to close a bug (the difference between the bug reporting timestamp and the bug closing timestamp) and is measured in hours. The results in Table IV indicate that, among the seven bug types, the MTTR of crash bug reports is in general the lowest, while that of performance and usability bugs is very high. Even though the numbers of crash and regression bug reports are relatively higher than those of the other types (refer to Table II), the close time of crash and regression bug reports is relatively low (refer to Table IV). There is a noticeable difference and clear separation of
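
A minimal sketch of the MTTR computation and of the curve interval described in this section is given below; the timestamp field names and the ISO date format are assumptions for illustration, not the issue tracker API's actual schema.

```python
from datetime import datetime
from statistics import quantiles

def mttr_hours(report):
    """Close time in hours: closing timestamp minus reporting timestamp."""
    opened = datetime.fromisoformat(report["opened"])  # field names assumed
    closed = datetime.fromisoformat(report["closed"])
    return (closed - opened).total_seconds() / 3600.0

def open_bugs_at(reports, instant):
    """Interval between the opening and closing curves at a given time:
    reports opened on or before `instant` that are not yet closed."""
    opened = sum(1 for r in reports if datetime.fromisoformat(r["opened"]) <= instant)
    closed = sum(1 for r in reports if datetime.fromisoformat(r["closed"]) <= instant)
    return opened - closed

def five_number_summary(values):
    """Min, Q1, median, Q3, max, as used in the summary tables."""
    q1, med, q3 = quantiles(sorted(values), n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Hypothetical usage, assuming reports grouped by bug type:
# for bug_type, reports in reports_by_type.items():
#     print(bug_type, five_number_summary([mttr_hours(r) for r in reports]))
```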


Fig. 8. Bug Type: Crash. Fig. 9. Bug Type: Cleanup. Fig. 10. Bug Type: Security.
Fig. 11. Bug Type: Performance. Fig. 12. Bug Type: Usability. Fig. 13. Bug Type: Regression.

TABLE V
EXAMPLES OF BUG TYPES FROM THE GOOGLE CHROMIUM BROWSER PROJECT (CRASH, PERFORMANCE, USABILITY)

Type | Issue ID | Text Snippet from Bug Report
Crash | 24200 | It is currently ranked #1 (based on the relative number of reports in the release). There have been 68 reports from 50 clients.
Crash | 25823 | the last step crashed on 4 times
Crash | 36951 | is overflowing with the below crash. There are about 60 crashes almost all of which
Crash | 40472 | Chrome crashed for more than 5 times with same stack trace
Crash | 101544 | It crashes every time in less than a minute
Performance | 104 | Slow scrolling
Performance | 3066 | CPU maxes out, scrolling is laggy
Performance | 11341 | tends to load some pages slightly slower than
Performance | 32040 | The script takes much longer to load and parse when there is more HTML on the page before it
Usability | 25767 | If other people are also having trouble seeing this button, maybe it should be renamed or highlighted somehow?
Usability | 83939 | in many common Zoom use cases, the interface remains difficult to see and/or interact with
Usability | 92027 | feature must be introduced in Chromium (and Chrome Browser & OS) to improve multilingualism
Usability | 114402 | we need to make some decisions about the use of this font

C. Frequent Terms and Description Length

Tables V and VI display text snippets from some of the bug reports belonging to the various categories. Terms indicating the bug type appear in these snippets, for example slow, lag and longer for performance bugs, and risk, malicious, exploit and corruption for security bugs. We conduct experiments to identify terms that are frequent in, or unique (linguistic indicators) to, each of the seven bug categories. We extract all unique terms (the corpus vocabulary) from the titles and descriptions of all bug reports present in our experimental dataset and calculate the probability of each unique term with respect to each bug category using a Naive Bayes classifier (LingPipe, http://alias-i.com/lingpipe/docs/api/com/aliasi/classify/NaiveBayesClassifier.html). Table VII shows the top 20 terms for each bug category, sorted by their probability of occurrence in the respective category. Manual inspection of Table VII clearly shows a correlation between certain terms and bug categories. For example, for crash, regression and security bugs, "crash", "regress" and "secur" are the highest-probability terms respectively. Performance bug reports frequently contain terms such as "perf", "slow", "time" and "cpu" (indicators of system performance). Similarly, the most frequent terms in usability bug reports, such as "window", "user", "zoom", "menu" and "click", capture the notion of user interface issues. Our experiment shows that the words present in bug report titles and descriptions are not random: we see an association between terms and bug types.
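
The term-probability computation can be illustrated with a plain-Python stand-in for the LingPipe Naive Bayes estimator used in the study; the input format is assumed, and the stemming and stop-word removal that the reported terms suggest were applied are omitted here.

```python
import re
from collections import Counter

def top_terms(docs_by_type, k=20):
    """Laplace-smoothed P(term | bug type) over title+description text,
    returning the k highest-probability terms per category."""
    counts = {t: Counter(re.findall(r"[a-z]+", " ".join(docs).lower()))
              for t, docs in docs_by_type.items()}
    vocab = set().union(*counts.values())
    ranked = {}
    for t, c in counts.items():
        total = sum(c.values())
        prob = {w: (c[w] + 1) / (total + len(vocab)) for w in vocab}
        ranked[t] = sorted(prob, key=prob.get, reverse=True)[:k]
    return ranked

# Hypothetical usage:
# terms = top_terms({"crash": crash_texts, "security": security_texts})
# print(terms["crash"])
```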


TABLE VI
EXAMPLES OF BUG TYPES FROM THE GOOGLE CHROMIUM BROWSER PROJECT (POLISH, SECURITY, CLEANUP, REGRESSION)

Type | Issue ID | Text Snippet from Bug Report
Polish | 94114 | Several infobars are lacking punctuation
Polish | 51429 | Bookmarks bubble (and extensions installed bubble) should have white background
Polish | 89591 | Misaligned text in HTML file input control
Security | 1208 | puts users at more risk from websites trying to spoof a file's type
Security | 23693 | if the user is duped into clicking "Create application shortcut" on a malicious page
Security | 23979 | We consider some file extensions to be malicious
Security | 24733 | because a given page can deterministically take down the whole browser
Security | 74665 | This can lead to memory corruption and is probably exploitable too if you are clever
Cleanup | 88098 | Use javascript objects to make the ui code cleaner and easier to modify
Cleanup | 30151 | Toolstrip is obsolete, and these tests should be removed when toolstrip is removed
Cleanup | 68882 | src/gfx uses wstring all over. It should use string16
Regression | 115321 | Tab crashes when coming out of the Full screen for the Sublime videos
Regression | 115934 | Info bar offers to save a wrong password

TABLE VII
FREQUENT TERMS [CHROME BROWSER]

Top K | CRASH | REGR. | SECUR. | CLEAN | POLISH | PERF. | USAB.
1 | crash | regress | secur | should | should | regress | page
2 | chrome | crash | crash | remov | bookmark | page | window
3 | report | page | chrome | test | button | mac | user
4 | stack | tab | us | us | window | perf | chrome
5 | signatur | chrome | memori | code | text | time | should
6 | webcor | window | corrupt | move | tab | sync | zoom
7 | browser | bookmark | file | chrome | menu | test | open
8 | const | doesnt | audit | webkit | bar | releas | doe
9 | Tab | browser | browser | api | dialog | cycler | manag
10 | Int | open | webcor | need | mac | startup | menu
11 | render | work | page | browser | page | slow | bookmark
12 | std | render | access | content | chrome | tab | provid
13 | mac | broken | window | add | need | linux | url
14 | page | new | webkit | base | icon | new | line
15 | ... | text | free | clean | drag | veri | dialog
16 | googl | bar | sandbox | extens | us | chrome | select
17 | intern | show | render | refactor | manag | load | command
18 | open | button | url | view | open | us | click
19 | char | click | bypass | file | select | theme | us
20 | unsign | std | open | renam | close | cpu | plugin

TABLE VIII
DESCRIPTION LENGTH (CHARACTERS)

 | CRASH | REGR. | SECUR. | CLEAN | POLISH | PERF. | USAB.
Min | 7 | 5 | 22 | 3 | 11 | 54 | 64
Q1 | 556 | 350 | 328 | 117.5 | 234.75 | 304.25 | 404
Median | 1021 | 550 | 743 | 225 | 396.5 | 537.5 | 593
Q3 | 2909 | 859 | 1436 | 422.5 | 603 | 899.5 | 876
Max | 20057 | 19875 | 16233 | 9551 | 1837 | 3719 | 2886

We calculate the description length (the number of characters in the description) for each bug report in the experimental dataset. Table VIII displays the five-number summary statistics of description length for the seven types of bug reports. The data in Table VIII reveals that the description length of crash and security bugs is the highest, whereas the description length of cleanup bug reports is the lowest. Bug reports belonging to the crash type mostly consist of system-generated crash reports, and hence their descriptions are large compared to the other bug types, where users have to provide the entire content manually.

D. Correlation Between Regression and Other Bug Types

A regression bug (one of the seven types of bug reports in the evaluation dataset) can also belong to other categories (such as crash, polish and performance). We conduct experiments to identify the relation between regression bugs and the other types of bugs. The motivation behind the experiment is to identify the bug types which are more prone to cause regression bugs. We extract all bug-type labels present in regression bug reports and then compute co-occurrence percentages; Table IX shows the results. Table IX reveals a relatively high overlap between regression bugs and crash bugs (nearly 7% of all regression bugs are crash bugs); however, manual observation of 100 (fixed/verified) regression bugs shows a different picture. We discover that, out of these 100 regression bugs, only 16 belong to the crash type while 43 belong to the usability type. Furthermore, there are 31 bugs which could be categorized as any of usability, polish or cleanup (based on our interpretation of the bug description and discussion).
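
The co-occurrence figures of Table IX can be sketched as below, reusing the hypothetical `bug_types` label-matching helper from Section III; this is an illustration of the computation, not the authors' script.

```python
def regression_overlap(reports):
    """%RF of Table IX: share of regression-labelled reports that also
    carry each of the other bug-type labels."""
    regression = [r for r in reports if "regression" in bug_types(r)]
    denom = len(regression) or 1
    others = ("crash", "security", "cleanup", "polish", "performance", "usability")
    return {t: 100.0 * sum(1 for r in regression if t in bug_types(r)) / denom
            for t in others}
```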


TABLE IX
PERCENTAGE OF BUGS IN EACH BUG TYPE WHICH ALSO HAVE THE REGRESSION LABEL (%RF: % OF REGRESSION BUGS)

Bug Type | Num | %RF
CRASH | 207 | 7%
SECURITY | 0 | 0%
CLEAN | 0 | 0%
POLISH | 1 | 0.034%
PERFORMANCE | 15 | 0.51%
USABILITY | 0 | 0%

TABLE X
PERCENTAGE OF BUGS HAVING STATUS "WON'T FIX" IN EACH BUG TYPE (MM: MEDIAN MTTR, IN HOURS)

Bug Type | Num | %WF | %CL | MM
CRASH | 1145 | 60.42% | 25.51% | 1320.91
REGR. | 501 | 26.44% | 15.35% | 390.97
SECUR. | 47 | 2.48% | 11.38% | 24.25
CLEAN | 103 | 5.43% | 12.67% | 4276.29
POLISH | 60 | 3.17% | 20.27% | 1643.94
PERF. | 68 | 3.59% | 30.09% | 1561.34
USAB. | 12 | 0.63% | 25.53% | 3195.29

TABLE XI
PERCENTAGE OF BUGS FOUND TO BE DUPLICATES IN EACH BUG TYPE AND THE TIME REQUIRED FOR THEIR RESOLUTION (MM: MEDIAN MTTR, IN HOURS)

Bug Type | Dupl. (%) | MM
CRASH | 1783 (47.2%) | 26.15
REGRES. | 775 (26.4%) | 23.52
SECUR. | 31 (7.8%) | 17.52
CLEAN | 39 (4.9%) | 35.21
POLISH | 40 (15.3%) | 24.13
PERF. | 24 (14.2%) | 362.305
USAB. | 16 (42.1%) | 1143.67

TABLE XII
PERCENTAGE OF BUGS VERIFIED IN EACH BUG TYPE (MM: MEDIAN MTTR, IN HOURS)

Bug Type | Num (%) | MM
CRASH | 476 (12.26%) | 167.8
REGR. | 855 (29.10%) | 151.83
SECUR. | 51 (12.81%) | 369.71
CLEAN | 64 (8%) | 562.17
POLISH | 91 (34.86%) | 474.95
PERF. | 33 (19.53%) | 768.39
USAB. | 6 (15.79%) | 6398.59

We believe that a large number of regression bugs are usability bugs because they are often challenging to uncover with test cases, as a usability bug may be user specific: a usability feature that is a bug for one user may be perfectly fine for another, or a specific bug may arise only after a particular series of steps performed by a specific user [12].

E. Status of Closed Bugs

A closed bug can have different statuses, such as verified, duplicate or WontFix. An analysis of the different closed-bug statuses can help in understanding the software project process. For example, a large percentage of duplicate bug reports indicates either that bug reporters are not able to find the bug already reported in the issue tracking system or that a large number of users are affected by that bug. A large percentage of WontFix statuses among closed feature requests indicates a mismatch between users' needs and developers' priorities. Table X displays the absolute number and percentage of bug reports in each category with the WontFix label. In Table X, "Num" is the actual count of WontFix bugs of each bug type, "%WF" is the share of all WontFix bugs that fall in each category, and "%CL" is the percentage of closed bugs of each category that are WontFix. According to the Google Chromium bug reporting guidelines (http://www.chromium.org/for-testers/bug-reporting-guidelines), the label WontFix is assigned to issues that cannot be reproduced, that work as intended, or that are obsolete; issues tagged as WontFix are closed without taking any action. As shown in Table X, the percentage of WontFix bugs varies across the types of bug reports. We observe that a large percentage (60.42%) of WontFix bugs belong to the crash type. We perform a manual inspection of the crash bug reports that were not fixed to identify the underlying reasons, and notice that inability to reproduce a bug is one of the major reasons for assigning the label WontFix to crash bug reports. We also notice cases where a crash bug report is closed because it has become obsolete and no longer occurs in a build or milestone different from the version on which the bug was observed. Regression bug reports also contribute a significant percentage (26.44%) of the WontFix bugs, whereas the contribution of the remaining five types of bug reports (security, cleanup, performance, polish and usability) is comparatively very small (less than 6%).

We calculate the number of duplicate bugs in each bug category, motivated by the fact that a large fraction (25-30%) of all reported bugs are duplicates [13]-[15]. A large number of duplicate bugs in a particular bug type can be an indication that a large number of users are affected, which can have implications for bug prioritization. Duplicate bug report detection has received considerable attention from the software engineering research community [11], [14]. Table XI shows that crash bugs have the highest number of duplicates (1783, i.e., 68.2% of all bugs labeled as duplicate). The results show that 47.2%, 42.1% and 26.4% of all crash, usability and regression bugs, respectively, are duplicates. Table XI also shows that the median MTTR of duplicate usability bugs is fairly high compared to the other categories. The research community can use these results to tune duplicate detection approaches and tools, and further research can be done on handling duplicate bugs according to the category they belong to. We also study the number of bugs verified in each bug category. In the Chrome project a bug fix can be verified by the reporter or by the testing team (http://www.chromium.org/for-testers/bug-reporting-guidelines). We calculate the number of bugs verified in each category to identify whether reporters or the testing team show more interest in verifying bugs of a particular type. Table XII shows that the percentage of verified bugs is higher for the polish and regression categories than for the other bug types.


TABLE XIII
NUMBER OF STARS

 | CRASH | REGR. | SECUR. | CLEAN | POLISH | PERF. | USAB.
Min | 1 | 0 | 0 | 0 | 0 | 0 | 1
Q1 | 3 | 1 | 1 | 1 | 1 | 2 | 2
Median | 7 | 2 | 1 | 1 | 2 | 2 | 3
Q3 | 12 | 5 | 2 | 2 | 3 | 5 | 9
Max | 179 | 647 | 59 | 28 | 40 | 78 | 391

TABLE XIV
NUMBER OF COMMENTS

 | CRASH | REGR. | SECUR. | CLEAN | POLISH | PERF. | USAB.
Min | 0 | 0 | 1 | 1 | 1 | 1 | 2
Q1 | 1 | 1 | 10 | 3 | 5 | 6 | 5
Median | 2 | 2 | 15 | 5 | 7 | 8 | 8
Q3 | 3 | 5 | 22 | 8.5 | 12 | 14 | 26
Max | 219 | 647 | 81 | 71 | 76 | 59 | 126

TABLE XV
MILESTONE CHANGES

 | CRASH | REGR. | SECUR. | CLEAN | POLISH | PERF. | USAB.
BRs with at least one MovedFrom/BulkMove | 2137 (56%) | 2034 (69%) | 37 (9.29%) | 151 (18.9%) | 165 (63.2%) | 93 (55%) | 33 (86.8%)
Avg. | 1.048 | 1.25 | 1 | 1.093 | 1 | 1.064 | 1.121
Std. Dev. | 0.350 | 0.209 | 0 | 0.420 | 0 | 0.383 | 0.477

F. Number of Stars and Amount of Discussion

In addition to the bug reporter and the bug fixer, several developers or users contribute to the threaded discussion and are interested in activities related to a bug report. We compute the number of stars and the number of comments associated with each bug report; our interest is in measuring the number of developers or other persons showing interest in, and collaborating on, the different bug types. In the Google Chrome project, any user can star an issue (http://code.google.com/p/support/wiki/IssueTracker), indicating that user's interest in the specific issue, and the issue tracking system automatically sends an email notification (triggered by a change in the status of the issue) to all users who starred it. We extract both the number of stars and the number of comments for all bug reports using the programming APIs. Table XIII reveals that, in general, crash, regression, performance and usability bugs receive more stars than security, cleanup and polish bugs. The Q2 (median) and Q3 values of the number of stars for crash bugs are the highest among the seven types of bugs.

The Google issue tracker provides a facility for users to discuss and communicate with each other by posting comments: the issue tracker is not just a database for archiving bugs but also serves as a central point of communication for the project team [16]. We calculate the number of comments posted for each bug report and report the minimum, Q1, median, Q3 and maximum values in Table XIV. We observe that, in general, the number of comments per bug report is relatively higher for security bug reports than for the other bug reports, while the number of comments per bug report (based on the Q2 and Q3 values) for crash and regression bug reports is the lowest. A large number of persons interested in a bug (a large star count or many comments) is an indication of bug popularity and may impact bug fixing time [17]. Panjer [17] found that bugs having fewer than four comments (less discussion) get fixed faster than other bugs. Our empirical results show a similar phenomenon: crash and regression bugs have lower Q2 values for the number of comments and lower MTTR compared to the other bug categories (refer to Tables XIV and IV).

G. Milestone Change Frequency

Table XV shows the absolute numbers and percentages of milestone changes for the seven types of bug reports. We notice that, in general, the percentage of milestone changes is high: for crash, regression, polish, performance and usability bug reports it exceeds 50%. We performed a manual inspection of developer comments on bug reports undergoing a milestone-change event to understand the reasons behind milestone changes, and we notice comments mentioning approval of a milestone change when the reported bug is not a blocker for the currently assigned milestone. According to the official blog of the Chromium project team (http://blog.chromium.org/2010/07/release-early-release-often.html), Google Chrome follows a release-early, release-often policy (a short release cycle: once every six weeks). Another blog post (http://blog.assembla.com/assemblablog/tabid/12618/bid/36341/Secrets-of-rapid-release-cycles-from-the-Google-Chrome-team.aspx) mentions that, because of the rapid release cycle, if a feature is not ready it is moved to the next release (new release milestones are created and tickets are moved).


A study by Baysal et al. [18] compares the lifespan of major releases of the Firefox and Chrome browsers and characterizes Chrome as a fast-evolving system with short version cycles. As shown in Table XV, the percentage of milestone changes is the highest for usability bugs and the lowest for cleanup bugs. As illustrated in Table XV, we observe different characteristics across the different types of bugs.

V. THREATS TO VALIDITY

The empirical analysis presented in this paper is based on a dataset from only one project (Google Chrome Browser); experimental analysis of several diverse projects is needed to generalize our results. Labeling a bug report with its type (security, performance, crash, etc.) is not mandatory in the Google Chrome project, and we observe that a large number of bug reports are not assigned any type label. In our dataset, 8030 bug reports (a substantial number, though not the complete dataset) are labeled with at least one of the bug types, so most of the results are based on 15.12% of all closed (fixed/verified/duplicate) bug reports. We also observe that the absolute numbers of performance and usability bugs in our experimental dataset are small. Our future research direction is to extend the current work by conducting experiments on a more diverse dataset.

VI. CONCLUSIONS AND FUTURE WORK

In this work, we perform an empirical study of seven different bug report categories (crash, regression, security, polish, cleanup, usability, performance) and compare them using different metrics. The following are some of the main results of the work:

• The median MTTR of crash and regression bugs is the lowest.
• Performance and usability bugs belong to the high-entropy, low-incident region.
• A large number of regression bugs are usability bugs.
• Terms present in bug report titles and descriptions are related to the bug type.
• The description length of crash bugs is the largest.
• A large number of crash bugs are resolved as WontFix or duplicate.
• Fixing security bugs involves the highest amount of discussion.
• The milestone change percentage is the highest for usability bugs and the lowest for cleanup bugs.
• The bug fixing process is of high quality across all bug types (for the Google Chrome project).

In this paper, we report experimental results on a dataset from the Google Chrome browser project. We plan to extend the current work by conducting similar experiments on diverse datasets to generalize, compare and contrast findings across multiple projects. Our empirical analysis opens many research questions for the software engineering research community, as it shows that different bug categories have different characteristics and that further research is needed to understand the characteristics of each bug type.

ACKNOWLEDGMENT

The work presented in this paper is partially supported by the Department of Science and Technology (DST, India) FAST grant awarded to Ashish Sureka.

REFERENCES

[1] A. Sureka, "Learning to classify bug reports into components," in TOOLS (50), ser. Lecture Notes in Computer Science, C. A. Furia and S. Nanz, Eds., vol. 7304. Springer, 2012, pp. 288-303.
[2] G. Antoniol, K. Ayari, M. Di Penta, F. Khomh, and Y.-G. Guéhéneuc, "Is it a bug or an enhancement?: a text-based approach to classify change requests," in Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, ser. CASCON '08. New York, NY, USA: ACM, 2008, pp. 23:304-23:318.
[3] M. Gegick, P. Rotella, and T. Xie, "Identifying security bug reports via text mining: An industrial case study," in Proc. 7th Working Conference on Mining Software Repositories (MSR 2010), May 2010, pp. 11-20.
[4] S. Zaman, B. Adams, and A. E. Hassan, "Security versus performance bugs: a case study on Firefox," in Proceedings of the 8th Working Conference on Mining Software Repositories, ser. MSR '11. New York, NY, USA: ACM, 2011, pp. 93-102.
[5] M. B. Twidale and D. M. Nichols, "Exploring usability discussions in open source development," in Proceedings of the 38th Annual Hawaii International Conference on System Sciences, ser. HICSS '05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 198.3-.
[6] F. Khomh, B. Chan, Y. Zou, and A. E. Hassan, "An entropy evaluation approach for triaging field crashes: A case study of Mozilla Firefox," in Proceedings of the 2011 18th Working Conference on Reverse Engineering, ser. WCRE '11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 261-270.
[7] S. Zaman, B. Adams, and A. E. Hassan, "A qualitative study on performance bugs," in Proceedings of the 9th IEEE Working Conference on Mining Software Repositories (MSR), Zurich, Switzerland, June 2012.
[8] C. E. Shannon, "A mathematical theory of communication," SIGMOBILE Mob. Comput. Commun. Rev., vol. 5, pp. 3-55, January 2001.
[9] C. Francalanci and F. Merlo, "Empirical analysis of the bug fixing process in open source projects," in OSS, ser. IFIP, B. Russo, E. Damiani, S. A. Hissam, B. Lundell, and G. Succi, Eds., vol. 275. Springer, 2008, pp. 187-196.
[10] H. Wang and C. Wang, "Open source software adoption: A status report," IEEE Softw., vol. 18, pp. 90-95, March 2001.
[11] A. Sureka and P. Jalote, "Detecting duplicate bug report using character n-gram-based features," in Proceedings of the 2010 Asia Pacific Software Engineering Conference, ser. APSEC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 366-374.
[12] D. M. Nichols and M. B. Twidale, "Usability processes in open source projects," Software Process: Improvement and Practice, vol. 11, no. 2, pp. 149-162, 2006.
[13] J. Anvik, L. Hiew, and G. C. Murphy, "Who should fix this bug?" in Proceedings of the 28th International Conference on Software Engineering, ser. ICSE '06. New York, NY, USA: ACM, 2006, pp. 361-370.
[14] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, "An approach to detecting duplicate bug reports using natural language and execution information," in Proceedings of the 30th International Conference on Software Engineering, ser. ICSE '08. New York, NY, USA: ACM, 2008, pp. 461-470.
[15] N. Jalbert and W. Weimer, "Automated duplicate detection for bug tracking systems," in The 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2008), Anchorage, Alaska, USA, June 2008. IEEE Computer Society, pp. 52-61.
[16] D. Bertram, A. Voida, S. Greenberg, and R. Walker, "Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams," in Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, ser. CSCW '10. New York, NY, USA: ACM, 2010, pp. 291-300.
[17] L. D. Panjer, "Predicting Eclipse bug lifetimes," in Proceedings of the Fourth International Workshop on Mining Software Repositories, ser. MSR '07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 29-.
[18] O. Baysal, I. Davis, and M. W. Godfrey, "A tale of two browsers," in Proceedings of the 8th Working Conference on Mining Software Repositories, ser. MSR '11. New York, NY, USA: ACM, 2011, pp. 238-241.
