10.11.2014 Views

Software Reliability Analysis and Evaluation - LAAS CNRS

Software Reliability Analysis and Evaluation - LAAS CNRS

Software Reliability Analysis and Evaluation - LAAS CNRS

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Folder 3<br />

<strong>Software</strong> <strong>Reliability</strong><br />

<strong>Analysis</strong> <strong>and</strong> <strong>Evaluation</strong><br />

Karama KANOUN<br />

<strong>LAAS</strong>-<strong>CNRS</strong><br />

(karama.kanoun@laas.fr)<br />

3 - 1<br />

OUTLINE<br />

☞ Motivations<br />

☞ Methods for software reliability analysis <strong>and</strong> evaluation (Folder 3)<br />

• Data collection <strong>and</strong> analysis<br />

☞ Data collection <strong>and</strong> validation<br />

☞ Descriptive statistics<br />

☞ Trend analysis<br />

• <strong>Software</strong> reliability evaluation (Folder 4)<br />

☞ Case studies (Folder 5)<br />

☞ Dependability benchmarking of operating systems (Folder 6)<br />

3 - 2<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Why <strong>Software</strong> <strong>Reliability</strong>?<br />

☞ Increasing role of software in real life systems<br />

☞ System dependability is more <strong>and</strong> more synonymous of software<br />

reliability<br />

☞ Difficulties in mastering the software development process <strong>and</strong> in<br />

reducing design faults for complex systems<br />

☞ Increasing cost of system non-dependability<br />

☞ Real needs for improving software reliability to improve system<br />

dependability <strong>and</strong> reduce maintenance cost<br />

☞ Dependability requirements are part of system requirements (as<br />

important as functional requirements)<br />

☞ Quantification is essential<br />

3 - 3<br />

Objectives of software reliability analysis <strong>and</strong> evaluation<br />

☞ Manage <strong>and</strong> improve the reliability of software products<br />

☞ Check the efficiency of development activities<br />

☞ Estimate the “effort” needed to reach a high dependability level<br />

☞ Evaluate the software reliability at the end of validation activities <strong>and</strong> in<br />

operation<br />

☞ Estimate the maintenance effort to “correct” faults activated during<br />

development <strong>and</strong> residual faults in operation<br />

☞ Needs for experimental & analytical methods <strong>and</strong><br />

techniques to reach these objectives<br />

3 - 4<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


<strong>Software</strong> vs hardware reliability<br />

Hardware<br />

• Physical faults<br />

• Operational life<br />

• Stable reliability (constant failure rate)<br />

• White-box approach<br />

• Markov models<br />

• Database for components failures<br />

<strong>Software</strong><br />

• Only design faults<br />

• Development <strong>and</strong> operation<br />

• <strong>Reliability</strong> growth (↓ failure rate)<br />

• Usually black-box approach<br />

• Specific models<br />

• Based on data collection<br />

Failure Rate<br />

Hardware in operation<br />

<strong>Software</strong> in operation<br />

Time<br />

3 - 5<br />

Measures<br />

Static Measures<br />

of the product <strong>and</strong><br />

process<br />

(quality oriented)<br />

Usage profile<br />

(environment)<br />

Dynamic Measures<br />

characterizing occurrence of<br />

failures<br />

(reliability oriented)<br />

Number of faults<br />

Fault density<br />

Complexity measures<br />

…<br />

Failure intensity<br />

Failure rate<br />

MTTF<br />

<strong>Reliability</strong><br />

…<br />

3 - 6<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


☞ Supplier point of view<br />

• During development:<br />

☞development follow up<br />

(failure intensity, fault density)<br />

☞evaluation of software reliability before operation<br />

(MTTF, pre-operational failure rate)<br />

• During operation<br />

☞product reliability follow up<br />

(residual failure rate, MTTF)<br />

☞maintenance planning<br />

(cumulative number of failures)<br />

☞ Users / customers, operational life<br />

☞be confident in the reliability level of the product<br />

(residual failure rate, MTTF)<br />

3 - 7<br />

Percentage of faults <strong>and</strong> corresponding MTTF (published by IBM)<br />

Product<br />

MTTF (years)<br />

5000 1580 500 158 50 15.8 5 1.58<br />

⇐<br />

1<br />

2<br />

3<br />

4<br />

5<br />

6<br />

7<br />

8<br />

9<br />

34,2<br />

34,3<br />

33,7<br />

34,2<br />

34,2<br />

32,0<br />

34,0<br />

31,9<br />

31,2<br />

28,8<br />

28,0<br />

28,5<br />

28,5<br />

28,5<br />

28,2<br />

28,5<br />

27,1<br />

27,6<br />

17,8<br />

18,2<br />

18,0<br />

18,7<br />

18,4<br />

20,1<br />

18,5<br />

18,4<br />

20,4<br />

10,3<br />

9,7<br />

8,7<br />

11,9<br />

9,4<br />

11,5<br />

9,9<br />

11,1<br />

12,8<br />

5,0<br />

4,5<br />

6,5<br />

4,4<br />

4,4<br />

5,0<br />

4,5<br />

6,5<br />

5,6<br />

2,1<br />

3,2<br />

2,8<br />

2,0<br />

2,9<br />

2,1<br />

2,7<br />

2,7<br />

1,9<br />

1,2<br />

1,5<br />

1,4<br />

0,3<br />

1,4<br />

0,8<br />

1,4<br />

1,4<br />

0,5<br />

0,7<br />

0,7<br />

0,4<br />

0,1<br />

0,7<br />

0,3<br />

0,6<br />

1,1<br />

0,0<br />

⇐<br />

3 - 8<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Overview of a global reliability analysis method<br />

Development<br />

Validation &<br />

Operation<br />

1<br />

Data related to<br />

previous projects,<br />

similar products<br />

data collection<br />

Objectives<br />

of the study<br />

Types<br />

of faults<br />

3<br />

Collected data<br />

2<br />

Data Validation<br />

Validated<br />

data<br />

Data set partition<br />

Consequences Phase Components<br />

of failures<br />

• • •<br />

4 Descriptive Analyses 5 Trend Analyses<br />

6 Model Application<br />

Descriptive Statistics<br />

<strong>Reliability</strong> Evolution<br />

<strong>Reliability</strong> Measures<br />

Feedback to the development process<br />

3 - 9<br />

☞ Some rules<br />

Setting up of a data collection process<br />

• Define clearly the objectives <strong>and</strong> the data to be collected<br />

• Motivate <strong>and</strong> imply people that will be involved<br />

• Simplify the collection process <strong>and</strong> reduce the number of data items to<br />

be collected<br />

☞ Support tools<br />

☞ Practical organization of people involved<br />

• Record <strong>and</strong> analyze data in real-time<br />

• Feedback<br />

☞ Origin of collected data<br />

• Internal: recorded during development <strong>and</strong> validation<br />

• External: by the customers<br />

3 - 10<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Data to be collected<br />

☞ Background information<br />

• Product itself: software size, language, functions, current version, workload<br />

• Usage environment: verification <strong>and</strong> validation methods, tools, etc.<br />

☞ Data relative to failures <strong>and</strong> corrections<br />

• Date of occurrence, nature of failures, consequences<br />

• Type of faults, fault location<br />

☞ Usually, recorded through<br />

• Failure Reports (FR)<br />

• Correction Reports (CR)<br />

☞ Well defined headings, well structured, easy to fill in<br />

☞ Short tick-off questions<br />

☞ Manually or automatically<br />

3 - 11<br />

☞ Failure Report (FR)<br />

Required Information<br />

• Serial number (for identification)<br />

• Report editor<br />

• Product reference, version affected (or prototype)<br />

• Date <strong>and</strong> time of failure occurrence<br />

Desirable Information<br />

• Failure occurrence condition<br />

• Failure criticality or consequences<br />

• Affected function or task<br />

• Action proposed (if any)<br />

3 - 12<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


☞ Correction Report (CR)<br />

Required information<br />

• Serial number (for identification)<br />

• Report editor<br />

• Date of correction<br />

• Correction nature<br />

• Product reference<br />

Desirable Information<br />

• Identification of the modified components<br />

☞ Integration with already existing data collection programs<br />

☞ Importance of training<br />

3 - 13<br />

Data Validation<br />

☞ Objectives<br />

• check the validity <strong>and</strong> usability of the information recorded<br />

• Keep only genuine software faults in the database<br />

☞ Elimination of:<br />

• Duplicated data (FR reporting of the same failure)<br />

• FR proposing a correction related to an already existing FR (COR)<br />

• False FR (signaling a false or non identified problem)<br />

• FR proposing an improvement (IMPROVE)<br />

• incomplete FRs or FRs containing inconsistent data (Unusable)<br />

• FR related to a hardware failure<br />

• …<br />

3 - 14<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Example 1: a telecommunications equipment<br />

(analyzed at <strong>LAAS</strong>)<br />

☞ 2 146 Failure Reports<br />

☞ Validation ⇒ 1 172 kept in the database<br />

Duplicated FRs 816 38.0%<br />

COR 53 2.5%<br />

False FR 29 1.4%<br />

IMPROVE 21 1.0%<br />

Unusable 20 0.9%<br />

Hardware 35 1.6%<br />

Total 974 45.4%<br />

.<br />

3 - 15<br />

☞ 3063 FRs<br />

Example 2: a telephone switching system<br />

(analyzed at <strong>LAAS</strong>)<br />

☞ Validation ⇒ 1853 <strong>Software</strong> FRs:<br />

<strong>Software</strong> 1853<br />

Hardware 195<br />

Documentation 165<br />

Unusable, duplicated, … 716<br />

Others 134<br />

Unusable<br />

24%<br />

Others<br />

4%<br />

Documentation<br />

5%<br />

Hardware<br />

6%<br />

<strong>Software</strong><br />

61%<br />

3 - 16<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Life cycle of an FR<br />

Internal<br />

site<br />

Identification<br />

of an<br />

abnormal<br />

behavior<br />

External<br />

sites<br />

Interface<br />

with users<br />

Database<br />

yes<br />

FR<br />

exists ?<br />

No<br />

creation<br />

of an FR<br />

<strong>Analysis</strong> &<br />

validation<br />

specialized<br />

team<br />

Already solved<br />

or being solved<br />

Proposition<br />

correction ?<br />

No<br />

Report update<br />

Implementation<br />

of the corrections<br />

FR resolved<br />

Creation of a CR<br />

Database<br />

3 - 17<br />

DESCRIPTIVE STATISTICS<br />

☞ Aim: make syntheses of the observed phenomena<br />

☞ Simple analyses<br />

• Fault typology<br />

• Fault density of components<br />

• Failure / fault distribution among software components (new, modified, reused)<br />

☞ Investigation of relationships<br />

• Fault density / size<br />

• Fault density / complexity<br />

• Fault density / life cycle phase<br />

• Nature of faults / life cycle phases<br />

• Nature of faults / components<br />

• Number of components affected by changes made to resolve an FR<br />

☞ Analyses related to the development / debugging process<br />

3 - 18<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Analyses related to the development process<br />

☞ Factors affecting time to locate <strong>and</strong> solve problems<br />

• The more FRs circulating, the more time it takes to h<strong>and</strong>le each one<br />

• Tendency to resolve the easier FRs first, the remaining ones take more time<br />

• Loss of maintainability with continued changes to resolve faults<br />

• Introduction of new faults while resolving the old<br />

☞ Average time to resolve an FR<br />

Modification request time =<br />

Time when the FR is resolved - time when it is created<br />

Measures<br />

• Responsiveness of the field support system<br />

• Complexity of maintenance<br />

3 - 19<br />

Case of the switching system of Example 2<br />

# FRs recorded:<br />

3063<br />

# FRs with record<br />

date: 3049<br />

# FRs with record<br />

date <strong>and</strong> resolve<br />

date : 2446<br />

250<br />

200<br />

150<br />

100<br />

50<br />

# FRs recorded / month<br />

# FRs resolved / month<br />

0<br />

1 10 20 30 40 50 60 68<br />

7 / 12 months<br />

22% (524)<br />

>12 months<br />

12% (307)<br />

cumulative # of unresolved FRs<br />

800<br />

700<br />

600<br />

500<br />

400<br />

300<br />

200<br />

100<br />

0<br />

1 10 20 30 40 50 60 68<br />

0 / 6 months<br />

66% (1615)<br />

Time to resolve an FR<br />

3 - 20<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Data pre-processing for reliability analysis<br />

☞ Two kinds of data sets can be extracted from FRs <strong>and</strong> CRs<br />

• Time to failures (or between failures)<br />

0<br />

t 1<br />

t 2<br />

t k<br />

failure<br />

t k<br />

= time between failure k-1 <strong>and</strong> k<br />

• Grouped data<br />

☞ Number of failures per unit of time, n(k)<br />

☞ Cumulative number of failures N(k)<br />

0<br />

1 2<br />

k<br />

n(1)<br />

n(k)<br />

3 - 21<br />

Time ?<br />

☞ Time between failures<br />

• Execution time<br />

• Wall clock or Calendar time<br />

• Number of executions<br />

☞ Number of failures per unit of time<br />

• The length of the unit time depends on:<br />

☞ accuracy expected for the dependability measures<br />

☞ number of observed failures<br />

☞ objectives of the study<br />

3 - 22<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Example A<br />

Times between failures<br />

☞ Real-time control system (Musa 1)<br />

• 136 failures observed during system test (96 days)<br />

# T i Dy # T i Dy<br />

# T i Dy<br />

# T i Dy<br />

# T i Dy<br />

# T i<br />

Dy<br />

# : number of<br />

failures<br />

T i : times<br />

between<br />

failures<br />

(in seconds)<br />

Dy: day of<br />

observation<br />

1<br />

2<br />

3<br />

4<br />

5<br />

6<br />

7<br />

8<br />

9<br />

10<br />

11<br />

12<br />

13<br />

14<br />

15<br />

16<br />

17<br />

18<br />

19<br />

20<br />

21<br />

22<br />

23<br />

24<br />

3<br />

30<br />

113<br />

81<br />

115<br />

9<br />

2<br />

91<br />

112<br />

15<br />

138<br />

50<br />

77<br />

24<br />

108<br />

88<br />

670<br />

120<br />

26<br />

114<br />

325<br />

55<br />

242<br />

68<br />

1<br />

2<br />

9<br />

10<br />

11<br />

11<br />

17<br />

20<br />

20<br />

20<br />

20<br />

20<br />

20<br />

20<br />

20<br />

20<br />

30<br />

30<br />

30<br />

30<br />

30<br />

30<br />

31<br />

31<br />

25<br />

26<br />

27<br />

28<br />

29<br />

30<br />

31<br />

32<br />

33<br />

34<br />

35<br />

36<br />

37<br />

38<br />

39<br />

40<br />

41<br />

42<br />

43<br />

44<br />

45<br />

46<br />

47<br />

48<br />

422<br />

180<br />

10<br />

1146<br />

600<br />

15<br />

36<br />

4<br />

0<br />

8<br />

227<br />

65<br />

476<br />

58<br />

457<br />

300<br />

97<br />

263<br />

452<br />

255<br />

197<br />

193<br />

6<br />

79<br />

31<br />

32<br />

32<br />

33<br />

34<br />

42<br />

42<br />

46<br />

46<br />

46<br />

46<br />

46<br />

46<br />

46<br />

47<br />

47<br />

47<br />

47<br />

53<br />

53<br />

54<br />

54<br />

54<br />

54<br />

49<br />

50<br />

51<br />

52<br />

53<br />

54<br />

55<br />

56<br />

57<br />

58<br />

59<br />

60<br />

61<br />

62<br />

63<br />

64<br />

65<br />

66<br />

67<br />

68<br />

69<br />

70<br />

71<br />

72<br />

816<br />

1351<br />

148<br />

21<br />

233<br />

134<br />

357<br />

193<br />

236<br />

31<br />

369<br />

748<br />

0<br />

232<br />

330<br />

365<br />

1222<br />

543<br />

10<br />

16<br />

529<br />

379<br />

44<br />

129<br />

56<br />

56<br />

56<br />

57<br />

57<br />

57<br />

57<br />

59<br />

59<br />

59<br />

59<br />

59<br />

59<br />

59<br />

59<br />

61<br />

62<br />

63<br />

63<br />

63<br />

64<br />

64<br />

64<br />

64<br />

73<br />

74<br />

75<br />

76<br />

77<br />

78<br />

79<br />

80<br />

81<br />

82<br />

83<br />

84<br />

85<br />

86<br />

87<br />

88<br />

89<br />

90<br />

91<br />

92<br />

93<br />

94<br />

95<br />

96<br />

810<br />

290<br />

300<br />

529<br />

281<br />

160<br />

828<br />

1011<br />

445<br />

296<br />

1755<br />

1064<br />

1783<br />

860<br />

983<br />

707<br />

33<br />

868<br />

724<br />

2323<br />

2930<br />

1461<br />

843<br />

12<br />

64<br />

64<br />

64<br />

65<br />

65<br />

65<br />

66<br />

66<br />

66<br />

66<br />

67<br />

67<br />

68<br />

68<br />

68<br />

69<br />

69<br />

69<br />

69<br />

70<br />

71<br />

72<br />

72<br />

72<br />

97<br />

98<br />

99<br />

100<br />

101<br />

102<br />

103<br />

104<br />

105<br />

106<br />

107<br />

108<br />

109<br />

110<br />

111<br />

112<br />

113<br />

114<br />

115<br />

116<br />

117<br />

118<br />

119<br />

120<br />

261<br />

1800<br />

865<br />

1435<br />

30<br />

143<br />

108<br />

0<br />

3110<br />

1247<br />

943<br />

700<br />

875<br />

245<br />

729<br />

4897<br />

447<br />

386<br />

446<br />

122<br />

990<br />

948<br />

1082<br />

22<br />

72<br />

73<br />

73<br />

74<br />

74<br />

74<br />

74<br />

74<br />

75<br />

76<br />

76<br />

76<br />

77<br />

77<br />

77<br />

78<br />

79<br />

79<br />

79<br />

79<br />

79<br />

80<br />

80<br />

80<br />

121<br />

122<br />

123<br />

124<br />

125<br />

126<br />

127<br />

128<br />

129<br />

130<br />

131<br />

132<br />

133<br />

134<br />

135<br />

136<br />

75<br />

482<br />

5509<br />

100<br />

10<br />

1071<br />

371<br />

790<br />

6150<br />

3321<br />

1045<br />

648<br />

5485<br />

1160<br />

1864<br />

4116<br />

80<br />

80<br />

81<br />

81<br />

81<br />

83<br />

83<br />

83<br />

83<br />

83<br />

84<br />

84<br />

87<br />

87<br />

88<br />

92<br />

3 - 23<br />

☞ Switching system<br />

Example B<br />

Number of failures per unit of time or Cumulative<br />

• 52 failures in operation (15 months)<br />

i n(i) NC(i) NS(i) i n(i) NC(i) NS(i) i n(i)<br />

NC(i)<br />

NS(i)<br />

i: unit of time (week)<br />

n(i): number of failures<br />

per unit of time<br />

NC(i): cumulative<br />

number of failures<br />

NS(i): number of systems<br />

in operation at i<br />

1<br />

2<br />

3<br />

4<br />

5<br />

6<br />

7<br />

8<br />

9<br />

10<br />

11<br />

12<br />

13<br />

14<br />

15<br />

16<br />

17<br />

18<br />

19<br />

20<br />

21<br />

22<br />

23<br />

2<br />

0<br />

2<br />

1<br />

1<br />

0<br />

2<br />

1<br />

2<br />

5<br />

2<br />

1<br />

2<br />

0<br />

0<br />

0<br />

0<br />

1<br />

1<br />

2<br />

1<br />

0<br />

4<br />

2<br />

2<br />

4<br />

5<br />

6<br />

6<br />

8<br />

9<br />

11<br />

16<br />

18<br />

19<br />

21<br />

21<br />

21<br />

21<br />

21<br />

22<br />

22<br />

24<br />

25<br />

25<br />

29<br />

4<br />

10<br />

10<br />

10<br />

10<br />

12<br />

12<br />

12<br />

12<br />

12<br />

13<br />

13<br />

13<br />

13<br />

21<br />

21<br />

21<br />

21<br />

21<br />

28<br />

28<br />

28<br />

28<br />

24<br />

25<br />

26<br />

27<br />

28<br />

29<br />

30<br />

31<br />

32<br />

33<br />

34<br />

35<br />

36<br />

37<br />

38<br />

39<br />

40<br />

41<br />

42<br />

43<br />

44<br />

45<br />

46<br />

1<br />

1<br />

0<br />

0<br />

1<br />

1<br />

0<br />

0<br />

0<br />

0<br />

0<br />

0<br />

1<br />

0<br />

0<br />

0<br />

0<br />

0<br />

0<br />

1<br />

2<br />

0<br />

1<br />

30<br />

31<br />

31<br />

31<br />

32<br />

32<br />

32<br />

32<br />

32<br />

32<br />

32<br />

32<br />

33<br />

33<br />

33<br />

33<br />

33<br />

33<br />

33<br />

34<br />

36<br />

36<br />

37<br />

36<br />

36<br />

36<br />

36<br />

38<br />

40<br />

40<br />

40<br />

40<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

47<br />

48<br />

49<br />

50<br />

51<br />

52<br />

53<br />

54<br />

55<br />

56<br />

57<br />

58<br />

59<br />

60<br />

61<br />

62<br />

63<br />

64<br />

65<br />

66<br />

67<br />

0<br />

0<br />

0<br />

1<br />

0<br />

1<br />

1<br />

0<br />

0<br />

1<br />

1<br />

6<br />

0<br />

0<br />

0<br />

1<br />

0<br />

0<br />

0<br />

1<br />

0<br />

37<br />

37<br />

37<br />

38<br />

38<br />

39<br />

40<br />

40<br />

40<br />

41<br />

42<br />

48<br />

48<br />

48<br />

48<br />

49<br />

49<br />

49<br />

49<br />

50<br />

50<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

42<br />

3 - 24<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Trend analysis<br />

☞ Objectives:<br />

• Analyze software reliability evolution<br />

• Identify periods of reliability growth <strong>and</strong> decrease<br />

corrections<br />

Failure intensity<br />

corrections<br />

. . .<br />

V i,1<br />

corrections<br />

+ spec./environment<br />

changes<br />

time<br />

3 - 25<br />

V i+1,4<br />

V i+1,3<br />

V i+1,2<br />

V i,k<br />

V i+1,1<br />

V i,2<br />

<strong>Reliability</strong> growth characterization<br />

☞ Variable: time between failures<br />

• T 1<br />

, T 2<br />

, … ,T n<br />

: time between failure i <strong>and</strong> i-1<br />

☞ <strong>Reliability</strong> growth: T i<br />

≤ T k<br />

∀ i < k<br />

st<br />

☞ Prob. {T i<br />

< x } ≥ Prob. {T k<br />

≤ x} i.e. F<br />

Ti (x) ≥ F Tk (x) ∀ i < k ∀ x<br />

☞ Variable: number of failures<br />

• N(t 1<br />

), N(t 2<br />

), … , N(t n<br />

) : cumulative number of failures between 0 <strong>and</strong> t i<br />

• H(t i<br />

) = E[N(t i<br />

)] = expectation of N(t i<br />

)<br />

• If N(t i<br />

) is a Non Homogeneous Poisson Process (NHPP):<br />

☞ reliability growth if H(t 1<br />

) + H(t 2<br />

) ≥ H(t 1<br />

+ t 2<br />

) ∀ t 1 , t 2 ≥0 <strong>and</strong> 0≤ t 1 + t 2 ≤T<br />

<strong>and</strong> inequality is strict for at least a pair t 1<br />

, t 2<br />

N(t) is a subadditive function<br />

☞ reliability decrease if H(t 1<br />

) + H(t 2<br />

) ≤ H(t 1<br />

+ t 2<br />

) ∀ t 1 , t 2 ≥ 0 <strong>and</strong> 0≤ t 1 + t 2 ≤T<br />

<strong>and</strong> inequality is strict for at least a pair t 1<br />

, t 2<br />

N(t) is a superadditive function<br />

3 - 26<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Interpretation of Subadditivity & Superadditivity<br />

☞ Subadditivity<br />

H(t 1<br />

) + H(t 2<br />

) ≥ H(t 1<br />

+ t 2<br />

) ∀ t 1<br />

, t 2<br />

≥0 <strong>and</strong> 0≤ t 1<br />

+ t 2<br />

≤T<br />

(The number of events in an interval of the form [0, t 2<br />

] is larger than the<br />

number of events taking place in an interval of the same length<br />

beginning later (i.e. in the form of [T, T+t 2<br />

])<br />

☞ The number of failures is decreasing<br />

☞ Superadditivity<br />

H(t 1<br />

) + H(t 2<br />

) ≤ H(t 1<br />

+ t 2<br />

) ∀ t 1<br />

, t 2<br />

≥0 et 0≤ t 1<br />

+ t 2<br />

≤T<br />

(The number of events in an interval of the form [0, t 2<br />

] is smaller than<br />

the number of events taking place in an interval of the same length<br />

beginning later (i.e. in the form of [T, T+t 2<br />

])<br />

☞ The number of failures is increasing<br />

3 - 27<br />

Trend tests<br />

☞ Means<br />

• Raw data ⇒ graphical tests<br />

• Analytical tests ⇒ quantitative indicators<br />

☞ Raw data<br />

• Times to successive failures<br />

• Number of failures per unit of time<br />

• Cumulative number of failures<br />

☞ Trend indicators<br />

• Empirical (arithmetical) means<br />

• Subadditivity factor<br />

• Laplace factor<br />

3 - 28<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Graphical tests: times to failures (Example A)<br />

Times to failures<br />

7000<br />

6000<br />

5000<br />

4000<br />

3000<br />

2000<br />

1000<br />

0<br />

Failure #<br />

Cumulative times<br />

to failures<br />

t 1 + …t k<br />

100000<br />

90000<br />

80000<br />

70000<br />

60000<br />

50000<br />

40000<br />

30000<br />

20000<br />

10000<br />

0<br />

Failure #<br />

3 - 29<br />

Graphical test: grouped data (Example B)<br />

Failure<br />

intensity<br />

6<br />

5<br />

4<br />

3<br />

2<br />

1<br />

0<br />

10<br />

9<br />

8<br />

7<br />

6<br />

5<br />

4<br />

3<br />

2<br />

1<br />

0<br />

unit of time = one week unit of time = 4 weeks<br />

Cumulative<br />

number of<br />

failures<br />

60<br />

50<br />

40<br />

30<br />

20<br />

10<br />

0<br />

t i<br />

60<br />

50<br />

40<br />

30<br />

20<br />

10<br />

0<br />

unit of time = one week unit of time = 4 weeks<br />

3 - 30<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Empirical mean<br />

Global trend<br />

ξ k<br />

: arithmetical mean of the times to failures (from failure 1 to k)<br />

t<br />

ξ k<br />

= 1 + t 2 + … t k<br />

k<br />

ξ k<br />

constitute a globally increasing series € reliability growth<br />

ξ k constitute a globally decreasing series € reliability decrease<br />

The trend is directly observed on<br />

700<br />

600<br />

500<br />

400<br />

ξ k<br />

Example A<br />

300<br />

200<br />

100<br />

0<br />

k<br />

3 - 31<br />

Empirical mean<br />

☞ Local trend<br />

• The data items are grouped into subsets containing m successive data<br />

• The average is evaluated for each subset<br />

• The impact of old data items is eliminated<br />

the evolution of ξ k<br />

☞ Example A: m = 8 ⇒ 17 groups (136 failures)<br />

3000<br />

2500<br />

2000<br />

1500<br />

1000<br />

500<br />

0<br />

3 - 32<br />

1<br />

2<br />

3<br />

4<br />

5<br />

6<br />

7<br />

8<br />

9<br />

10<br />

11<br />

12<br />

13<br />

14<br />

15<br />

16<br />

17<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Subadditivity factor<br />

☞ Graphical interpretation of subadditivity<br />

• H(t) = E[N(t)] is subadditive over [0,T] if :<br />

t<br />

a H<br />

(t) = ∫ H(x) dx<br />

t<br />

- H(t) ≥ 0 ∀ t ≥0 <strong>and</strong> 0≤ t ≤T<br />

0 2<br />

a H<br />

(t) =subadditivity factor<br />

H(x)<br />

a H<br />

(x)<br />

Chord<br />

x<br />

0<br />

T<br />

3 - 33<br />

Subadditivity & monotonous growth / decrease<br />

Monotonous growth:<br />

a H<br />

(x) > 0 increasing<br />

H(x)<br />

Monotonous decrease:<br />

H(x)<br />

a H<br />

(x) < 0 decreasing<br />

Cumulative number<br />

of failures<br />

x<br />

x<br />

t<br />

T<br />

t<br />

T<br />

Failure intensity<br />

h(x)<br />

h(x)<br />

x<br />

x<br />

3 - 34<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Subadditivity & local trend fluctuations<br />

Example: reliability growth with local fluctuations<br />

H(x)<br />

a H (x) ≥ 0 non decreasing<br />

0<br />

x<br />

h(x)<br />

0<br />

x<br />

3 - 35<br />

Subadditivity & trend change<br />

Decrease - Growth<br />

H(x)<br />

tangent<br />

Growth - Decrease<br />

H(x)<br />

0<br />

T L<br />

x<br />

T T<br />

0<br />

G<br />

T L<br />

T G<br />

T<br />

x<br />

a H<br />

(x)<br />

a H<br />

(x)<br />

0<br />

T L<br />

T G<br />

T<br />

x<br />

0<br />

T L<br />

T G<br />

T<br />

x<br />

/ /<br />

a H<br />

(x) < 0 a H<br />

(x) > 0 a H<br />

(x) > 0 a H<br />

(x) < 0<br />

3 - 36<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Laplace factor<br />

☞ Statistical Test of hypothesis ⇒ Laplace factor u<br />

R<strong>and</strong>om variable: times to failures T i<br />

(realization of T i<br />

= t i<br />

)<br />

u(T) =<br />

N(T) i<br />

1<br />

Σ<br />

N(T) i=1<br />

Σ tj - T<br />

j=1 2<br />

T 1<br />

√ 12 N(T)<br />

N(T) = # failures in [0,T]<br />

R<strong>and</strong>om variable: # failures per unit of time<br />

u(T) =<br />

☞ In practice<br />

k<br />

Σ (i-1) n(i) -<br />

i=1<br />

√<br />

k -1<br />

k<br />

Σ n(i)<br />

i=1<br />

u > 0 ⇒ global reliability decrease<br />

u < 0 ⇒ global reliability growth<br />

2<br />

k 2 k<br />

-1<br />

Σ n(i)<br />

12 i=1<br />

n(i) = # failure during time unit i<br />

3 - 37<br />

Interpretation of the Laplace test<br />

☞ R<strong>and</strong>om Variable: time to failure<br />

T<br />

• = mid of the observation interval<br />

2<br />

N(T) i<br />

1<br />

• c =<br />

Σ Σ tj<br />

N(T) i=1 j=1<br />

: statistical centre<br />

T<br />

2<br />

> c ⇒ reliability growth<br />

☞ R<strong>and</strong>om Variable: # failures per unit of time<br />

• The Laplace factor can be put in the form:<br />

u(T) = -<br />

T<br />

√<br />

a H<br />

(T)<br />

1<br />

12 N(T)<br />

Relationship between Laplace <strong>and</strong> subadditivity factors<br />

3 - 38<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Link : graphical tests - Laplace - Subadditivity<br />

10<br />

8<br />

6<br />

4<br />

2<br />

Failure intensity<br />

250<br />

200<br />

150<br />

100<br />

50<br />

Subadditivity factor<br />

0<br />

0<br />

50<br />

Cumulative number of failures<br />

50<br />

40<br />

30<br />

20<br />

10<br />

0<br />

0<br />

2<br />

4<br />

6<br />

8<br />

10<br />

12<br />

14<br />

16<br />

18<br />

20<br />

3<br />

2<br />

1<br />

0<br />

-1<br />

-2<br />

-3<br />

-4<br />

-5<br />

-6<br />

Laplace factor<br />

3 - 39<br />

Laplace factor: local <strong>and</strong> global trend<br />

u(k)<br />

G<br />

L<br />

O<br />

B<br />

A<br />

L<br />

<strong>Reliability</strong><br />

decrease<br />

Local trend changes<br />

k<br />

T<br />

R<br />

E<br />

N<br />

D<br />

<strong>Reliability</strong><br />

growth<br />

<strong>Reliability</strong><br />

decrease<br />

T L1<br />

T G<br />

A B C D<br />

<strong>Reliability</strong> growth<br />

T L2<br />

<strong>Reliability</strong><br />

decrease<br />

LOCAL TREND<br />

3 - 40<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Change of time origin<br />

u(k)<br />

k<br />

T L1 T G<br />

T L2<br />

A B C D<br />

u(k)<br />

k<br />

T L1<br />

T L2<br />

B - C<br />

D<br />

3 - 41<br />

Link between trend indicators<br />

Cumulative number<br />

of failures<br />

Failure intensity<br />

Laplace factor<br />

Monotonous<br />

Growth<br />

N(k)<br />

k<br />

n(k)<br />

k<br />

u(k)<br />

k<br />

N(k)<br />

n(k)<br />

u(k)<br />

Monotonous<br />

decrease<br />

k<br />

k<br />

k<br />

Decrease<br />

followed<br />

by growth<br />

N(k)<br />

n(k)<br />

u(k)<br />

k<br />

T L<br />

k<br />

T L<br />

k<br />

T L<br />

T G<br />

Stability<br />

N(k)<br />

n(k)<br />

u(k)<br />

k<br />

k<br />

k<br />

3 - 42<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


How to use trend test results<br />

☞ Control of the efficiency of test activities<br />

• <strong>Reliability</strong> decrease at the beginning of a new activity: OK<br />

• <strong>Reliability</strong> decrease during a relatively long period of time: Pb ?<br />

• <strong>Reliability</strong> growth after reliability decrease: OK<br />

• Sudden reliability growth: caution!<br />

• Stable reliability: saturation<br />

☞ New tests<br />

☞ Following phase<br />

☞ End of test<br />

☞ Application of reliability models<br />

• Trend in accordance with model assumptions<br />

3 - 43<br />

Application to RADC data sets<br />

Rome Air Development Center (USA)<br />

System<br />

Id.<br />

#<br />

instructions<br />

#<br />

programmers<br />

#<br />

failures<br />

Type of<br />

system<br />

1 21 700<br />

2 27 700<br />

3 23 400<br />

4 33 500<br />

5 2 445 000<br />

6 5 700<br />

7 (14C) *****<br />

8(17) 61 900<br />

9(27) 126 100<br />

10(40) 180 000<br />

11A(SS1A) *****<br />

11B (SS1B) *****<br />

11C (SS1C) *****<br />

12 (SS2) *****<br />

13 (SS3) *****<br />

14 (SS4) *****<br />

9<br />

5<br />

3<br />

6<br />

7<br />

275<br />

8<br />

110<br />

8<br />

8<br />

8<br />

unknown<br />

unknown<br />

unknown<br />

unknown<br />

unknown<br />

136<br />

54<br />

38<br />

54<br />

831<br />

73<br />

36<br />

38<br />

41<br />

101<br />

112<br />

375<br />

277<br />

192<br />

278<br />

196<br />

RT.C<br />

RT.C<br />

RT.C<br />

RT.C<br />

RT.Com.<br />

RT.C<br />

military<br />

military<br />

military<br />

OS<br />

OS<br />

OS<br />

TS<br />

WP<br />

OS<br />

RT: Real-time<br />

C: control<br />

Com. : commercial<br />

WP: word Processing<br />

TS : Time sharing<br />

OS: Operating system<br />

***** : not given<br />

[John Musa], “<strong>Software</strong> <strong>Reliability</strong> Data”, Rome Air Development Center, NY, USA, 1979<br />

3 - 44<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


Laplace factors<br />

* : stable reliability<br />

System<br />

Laplace<br />

factor<br />

1 - 9,10<br />

2 - 5,73<br />

3 - 6,13<br />

4 - 8,59<br />

6 - 3,64<br />

7 - 2,14*<br />

8 - 4,65<br />

9 - 5,16<br />

10 - 9,60<br />

11A - 1,36*<br />

11B - 0,73*<br />

11C - 5,15<br />

12 + 0,74*<br />

13 - 5,64<br />

14 - 1,78*<br />

3 - 45<br />

System 2: times to failures<br />

Time to failures, ti<br />

10000<br />

9000<br />

8000<br />

7000<br />

6000<br />

5000<br />

4000<br />

3000<br />

2000<br />

1000<br />

# failures<br />

0<br />

1 6 11 16 21 26 31 36 41 46 51<br />

3 - 46<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


☞ Variable: time to failure<br />

System 2: Laplace factor<br />

u(i)<br />

# failures<br />

0<br />

-1 1 6 11 16 21 26 31 36 41 46 51<br />

-2<br />

-3<br />

-4<br />

-5<br />

-6<br />

u(i)<br />

4<br />

3<br />

2<br />

1<br />

# failures<br />

0<br />

31 34 37 40 43 46 49 52<br />

-1<br />

-2<br />

3 - 47<br />

System 2: failure intensity<br />

n(k)<br />

12<br />

10<br />

8<br />

6<br />

4<br />

2<br />

k<br />

0<br />

1 2 3 4 5 6 7 8 9 10 11 12 1314 15 16 17 18 19 20 21 22<br />

k: unit of tims = 5000 seconds of execution time<br />

3 - 48<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


System 2: Laplace factor<br />

☞ Variable: # failures<br />

u(k)<br />

failure # 31<br />

failure # 41<br />

k = unit of time<br />

0 1 2 3 4 5 6 7<br />

-1<br />

8 9 10 111213 14 1516 17 1819 20 2122<br />

-2<br />

-3<br />

-4<br />

-5<br />

-6<br />

2 u(k)<br />

1<br />

k = unit of time<br />

0<br />

7 8 9 1011 1213 14 1516 17 1819 20 2122<br />

-1<br />

-2<br />

Unit of time = 5000 seconds of execution time<br />

3 - 49<br />

System 4: arithmetical mean<br />

1000<br />

ξ i<br />

800<br />

600<br />

400<br />

200<br />

0<br />

10 15 20 25 30 35 40 45 50<br />

# failures<br />

3 - 50<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


System 9<br />

120000<br />

100000<br />

Arithmetical<br />

mean<br />

80000<br />

60000<br />

40000<br />

20000<br />

0<br />

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41<br />

# failures<br />

Laplace<br />

Factor<br />

u(i)<br />

2<br />

1<br />

0<br />

1<br />

-1<br />

3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41<br />

-2<br />

-3<br />

-4<br />

-5<br />

-6<br />

# failures<br />

3 - 51<br />

System 11A<br />

200000<br />

180000<br />

Arithmetical<br />

mean<br />

160000<br />

140000<br />

120000<br />

100000<br />

11 21 31 41 51 61 71 81 91 101 111<br />

# failures<br />

3<br />

Laplace<br />

Factor<br />

2<br />

1<br />

0<br />

-1<br />

111<br />

# failures<br />

11 21 31 41 51 61 71 81 91 101<br />

3 - 52<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>


System 14<br />

Arithmetical<br />

mean<br />

300000<br />

250000<br />

200000<br />

150000<br />

100000<br />

1 21 41 61 81 101 121 141 161 181<br />

# failures<br />

Laplace<br />

factor<br />

u(i)<br />

2<br />

1,5<br />

1<br />

0,5<br />

0<br />

-0,5<br />

1 21 41 61 81 101 121 141 161 181<br />

-1<br />

-1,5<br />

-2<br />

# failures<br />

3 - 53<br />

Conclusion<br />

☞ Some systems can be modeled by an exponential distribution<br />

• System for which -2 < u < 2<br />

☞ Influence of the operational profile<br />

• Systems 11 A, B, C are 3 copies of the same program<br />

used in different environments<br />

☞ Benefits from trend analysis<br />

• Better underst<strong>and</strong>ing of the underlying processes<br />

• Follow up of the development process in real-time, fast feedback<br />

• Helpful for reliability model application<br />

3 - 54<br />

<strong>Software</strong> <strong>Reliability</strong> <strong>Evaluation</strong> - Karama KANOUN - <strong>LAAS</strong>-<strong>CNRS</strong>

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!