Draft Human Error and Safety Risk Analysis (HESRA) Methodology for Federal Aviation Administration Air Traffic Control Maintenance and Operations (Revision 7)

January 2009

Submitted to:
Dino Piccione
FAA
800 Independence Avenue
ATO-P R&D (Room 907)
Washington, DC 20591

Submitted by:
Michael Maddox
Corinna Proctor
HumanCentric Research, LLC.
111 James Jackson Ave.
Suite 221
Cary, NC 27513-3164
(919) 481-0565
(919) 481-0310 Fax
www.humancentricresearch.com


TABLE OF CONTENTS

1.0 REVISION HISTORY
2.0 EXECUTIVE SUMMARY
  2.1 Caveats
3.0 INTRODUCTION
  3.1 Background
  3.2 HESRA Overview
    3.2.1 Applicability to Life Cycle Stage
    3.2.2 Proactive Assessment of Human Error Risk/Potential
    3.2.3 Formalizes Risk Assessment Process
    3.2.4 Based on Engineering Risk and Reliability Assessment Techniques
    3.2.5 Alternative to Quantitative Risk Assessment Techniques
    3.2.6 Use of Data from Incident/Error History
    3.2.7 Use of Data from Formal or Informal Usability Tests
    3.2.8 Establishes Working Group
  3.3 Objectives
4.0 SCOPE AND LIMITATIONS
  4.1 ATC Facility Maintenance Procedures and Systems
  4.2 Extensibility to Other Domains
  4.3 Applicability to Developmental and Existing ATC Facilities and Systems
  4.4 Handles Discrete Errors
  4.5 Handling Multiple Discrete Errors
5.0 ANALYSIS PROCESS
  5.1 Step 1 - Establish Analysis Team
    5.1.1 Composition
    5.1.2 Roles and Responsibilities
    5.1.3 Team Tasks
  5.2 Step 2: Familiarize Team with System to Analyze
  5.3 Step 3: Prioritize Procedures to Analyze
    5.3.1 Select Procedure Subset
    5.3.2 Walk Through Selected Procedure(s)
    5.3.3 Pre-Analysis Risk Reduction
  5.4 Step 4 - Set Analysis Perspective
    5.4.1 User Population
    5.4.2 Usage Environment
    5.4.3 Performance Shaping Factors
    5.4.4 Overall Complexity of System, User Interface and/or Procedure
  5.5 Step 5 – Define Tasks
    5.5.1 Evaluate Level of Task Detail
    5.5.2 The General Task
    5.5.3 Define/Identify More Detailed Tasks Where Required
  5.6 Step 6 – Define Steps
    5.6.1 Identify Steps to Complete Tasks
    5.6.2 Enter Steps into Analysis Tool/Document
  5.7 Step 7 – Define Errors, Causes, and Effects
    5.7.1 Pre-Fill Errors and Causes
    5.7.2 Review Each Step
    5.7.3 Develop Exhaustive Error List
    5.7.4 Relate Errors to Other Steps, Components, Etc., if Appropriate
  5.8 Step 8 – Assign Rating for Error Likelihood
    5.8.1 Review Each Error
    5.8.2 Use Existing Error, Usability Test, and Other Data as Appropriate
    5.8.3 Look Especially for Tasks That Require Skills or Capabilities That Human Users Are Unlikely to Possess
    5.8.4 Team Discussion and Consensus
    5.8.5 Same Error for Different Task/Step
    5.8.6 Internal Consistency
    5.8.7 Influenced by Elements in 5.4
  5.9 Step 9 – Assign Severity Rating
    5.9.1 Worst-Case Scenario
    5.9.2 Team Discussion and Consensus
    5.9.3 Account for Conditional Errors, Sequence of Errors, Etc.
    5.9.4 Not Greatly Influenced by Elements in 5.4
    5.9.5 Internal Consistency
  5.10 Step 10 – Assign Rating for Detection and Recovery
    5.10.1 Automatic Recovery
    5.10.2 Composite of Detection and Recovery
    5.10.3 Influenced by Severity
    5.10.4 Influenced by Elements in 5.4
  5.11 Step 11 – Calculate Hazard Index and RPN
  5.12 Step 12 – Analyze Criticality
    5.12.1 HI Criticality
    5.12.2 RPN Criticality
    5.12.3 Sort by Severity, HI, RPN
    5.12.4 Compare Levels of HI and RPN to Criticality Breakpoints
    5.12.5 Determine Action Requirements for Each Error
  5.13 Step 13 – Reduce Risk
    5.13.1 Develop Initial Risk Reduction Suggestions
    5.13.2 Re-convene HESRA Team
    5.13.3 Assign Ratings Assuming Remediation
    5.13.4 Assess Impact on HI and RPN
    5.13.5 Iterate Remediation if Risk Is Not Sufficiently Reduced
  5.14 Step 14 – Produce Risk Analysis Report
    5.14.1 Overview of System and Procedures Analyzed
    5.14.2 General Statement of Findings
    5.14.3 Overall Recommendations Related to System and Procedures
    5.14.4 Explicit Listing of "High Priority" Errors
    5.14.5 Explicit Listing and Description of Proposed Remedy
    5.14.6 Link or Provide Access to HESRA Analysis Spreadsheet (or Whatever Software Tool Is Used to Support the Analysis)
    5.14.7 Statement of Concurrence of Analysis Team
    5.14.8 Statement(s) of Exceptions from Analysis Team Members
  5.15 Step 15 – Assign Remediation Actions
  5.16 Step 16 - Monitor Remediation to Ensure Actions Are Completed


1.0 REVISION HISTORY

• Version 1.0 (August 2005): Initial draft. Applicable to existing maintenance/operations procedures.
• Version 2.0 (December 27, 2006): Updated with revised prioritization scheme.
• Version 3.0 (April 2007): Updated to be in accord with the scalar directions and nomenclature conventions reflected in FAA's existing SMS documentation.
• Version 4.0 (June 2007): Fixed section references, flipped listing order in rating scale tables, reviewed definitions of scale anchors.
• Version 5.0 (July 2007): Replaced "maintenance" with references to maintenance or operational procedures. Added executive summary.
• Version 6.0 (August 2007): Added schematic of HESRA matrix. Removed spreadsheet illustration figures.
• Version 7.0 (January 2009): Changed criticality breakpoints and definitions of breakpoints for the Hazard Index (Table 9) and Risk Priority Number (Tables 11 and 12) to accurately reflect reversed rating scales.

2.0 EXECUTIVE SUMMARY

Human error is the predominant component of serious incidents in the aviation (and every other) domain. Estimates of human error as an initiating or contributing factor in serious incidents typically fall in the 70-90% range. This will come as no surprise to those who work in the aviation profession. Just as in other segments of the profession, the operations and maintenance components of the U.S. air traffic control (ATC) system exhibit high levels of human error involvement in incidents and accidents. Until recently, however, the FAA has not had at its disposal an objective and straightforward human error risk estimation method.

This is not to imply that viable human error risk analysis methods do not exist. There are quite a few. However, because of the variability in human behavior and the unique organizational requirements of different topical domains, a method that works well in one domain might be a very difficult fit for another.

Most existing proactive human error analysis tools, regardless of their specific application domain, are based in one way or another on the engineering component failure risk estimation technique known as Failure Modes and Effects Analysis, or FMEA. Even with this common ancestry, however, different forms of human error risk analysis have been developed for, for example, the critical healthcare and space exploration fields.

Some of these differences are purely the result of different domain terminology or specific error outcomes that must be reflected in the risk analysis method.


For example, the result of an error in an intensive care unit might be the death of a patient, whereas the most serious outcome of an error in computer network allocation might be a server outage. Also, the granularity of various techniques might make certain techniques more appropriate than others for a particular application.

This document provides the theoretical basis and procedural information required for a human error risk analysis method named HESRA (Human Error and Safety Risk Analysis). HESRA is specifically tailored to the FAA ATO environment. It can be applied either to mature systems or to systems under development.

The FAA sponsored this work to provide system engineers, ATC, Tech Ops, and other groups within the FAA with a tool that can aid in identifying and understanding human error risks. More importantly, HESRA also helps identify ways in which the likelihood of human errors can be reduced and the likelihood of early recovery from errors can be increased.

This document has been updated continually during its development. HESRA's nomenclature and scaling methods currently comply with FAA's SMS 2.0 requirements. In addition, HESRA considers the ability of humans to recover from errors before the errors cause bad consequences; the FAA SMS framework does not consider recovery.

2.1 Caveats

HESRA provides a team of system experts with a way to communicate about human error and risk. This discussion is best led by a practitioner trained in human factors and the elements of human behavior that can lead to human failures. It is unlikely that HESRA will be effective without the active participation of a human factors practitioner.

HESRA has not yet been validated in the sense that it has been applied to multiple systems in many settings. As of the date of this report, HESRA has been applied twice to a maintenance procedure required for the VSCS. HESRA is currently being applied to a system being developed to provide wake turbulence spacing information to air traffic controllers.

HESRA is part of the ATO Human Factors toolbox. It should be considered one of many diagnostic tools that can help identify human error risks in ATC systems.


3.0 INTRODUCTION

In the domain of human performance, error is a constant, diffuse presence. Human factors practitioners have been fascinated by human error for many decades. Errors are one of the most well-studied aspects of human behavior, noted for both their ubiquity and persistence. There is, in fact, no such thing as error-free human performance, at least over any meaningful time period. Even when error-producing conditions are identified, it is very difficult (but certainly not impossible) to significantly reduce errors.

The Federal Aviation Administration (FAA) has recognized this but is committed to finding ways to identify and reduce the potential for human error. Given the potentially catastrophic results of an error in the National Air Space (NAS), FAA wanted to implement a proactive method for analyzing the human error and safety risk associated with systems and procedures used in air traffic control (ATC), specifically ATC facility maintenance/operations.

This document describes the methodology and procedures that were developed for applying human error risk analysis to ATC facility maintenance/operations. The name of the method is Human Error and Safety Risk Analysis (HESRA), and it introduces proactive human error analysis into the FAA ATC facilities maintenance/operations environment.

HESRA is neither the only nor the most comprehensive technique that has been developed and applied in other domains. However, it is a method that has been shown to be workable, applicable, and effective in identifying and mitigating the conditions that are likely to increase human errors.

3.1 Background

There are very few "ground truths" in the realm of human behavior. Over many decades of applied research, it has become clear that human performance in any particular endeavor varies over a very wide range. The reason for this variation is as simple as it is perplexing: a very broad range of variables affects human performance, and the individual effects of each variable and their interactions are simply too complex to allow us to reasonably predict the outcome.

This leads to one of the few ground truths of the human factors domain: humans commit errors. In fact, humans commit errors with such frequency that various researchers have developed classification schemes in an attempt to help us name the errors more consistently. Thus, there are "skill-based, rule-based, and knowledge-based" errors; there are "slips, mistakes, and violations"; and there are errors of commission and errors of omission.

A corollary to the ground truth that humans commit errors is that it is virtually impossible to completely eliminate human errors. The best that can be done is to recognize the conditions that prompt human errors and try to arrange them to minimize their error-causing effects.

Not all is bad news, however. While it is true that human errors are pervasive events, it is also true that most errors have little or no consequence. In fact, the really bad events that occur as the result of human errors are quite rare. That rarity is the product of what is typically known as the "chain of causation". A single, isolated human error is unlikely to have severe effects. When combined with other errors, certain process states, environmental conditions, etc., however, human errors can form a link in a causative chain that can have dramatic and very bad consequences.


Because of the "chain of causation" characteristic of major accidents, we can deal with accident prevention in two ways. First, we can identify and eliminate conditions that elevate the risk of errors. Second, we can provide "cutouts" that short-circuit the chain of causation so isolated errors are not allowed to propagate to an ultimate (bad) event.

There are essentially two methods for preventing errors. The first, and most common, is to wait until something bad happens and then go back and figure out why it happened. A Root Cause Analysis is an example of this type of method. In theory, processes can then be put in place to prevent the same bad thing from happening again. The second method is to examine the process or system and try to figure out which elements place users at the most risk of committing errors. Once the high-risk elements are identified, they can be changed to present much lower risk. In effect, an error is prevented before it occurs.

This methodology concentrates on the second, proactive approach to reducing human errors. Its focus is on ATC facilities maintenance/operations. However, HESRA can be extended to the ATC operations environment.

3.2 HESRA Overview

HESRA is one of a number of human error risk analysis processes developed for specific domains. While its basis is generic, HESRA has been tailored specifically to be applicable in the FAA ATC maintenance/operations environment.

3.2.1 Applicability to Life Cycle Stage

Since it is an a priori method, HESRA can be applied at virtually any stage of the system design, procurement, and implementation cycle. The only absolute prerequisite for conducting a HESRA analysis is that the interaction process between human users and the system must be defined in enough detail to permit its decomposition into tasks and steps.

This is not to say that the output of HESRA will be equally valuable at all stages of system design and implementation. Since the likelihood of human errors is highly dependent on the complexity of procedures and user interfaces, HESRA is more likely to produce detailed and valid output when the user interface(s) and procedures exist in at least prototype form.

However, there can still be great utility in conducting a preliminary risk analysis at very early stages of the system definition process. For example, if the design team is considering particular ways for users to interact with the proposed system, it is likely that HESRA can identify modes of interaction that are more or less likely to produce errors than other modes.

3.2.2 Proactive Assessment of Human Error Risk/Potential

HESRA is a proactive risk analysis method. That is, its goal is to identify elements of process and system design that are most likely to produce human errors before those errors are actually committed, or at least before they result in significantly bad consequences. This a priori risk identification aspect of HESRA is its most attractive feature. One need not wait for bad things to happen in order to identify and fix the causes.

3.2.3 Formalizes Risk Assessment Process

HESRA introduces a formal, objective structure to assessing the risk of human errors in maintenance/operations procedures.


It is the explicit goal of the FAA Air Traffic Organization (ATO) to move toward a safety culture in both operations and maintenance. One aspect of an integrated, diffuse safety culture is an emphasis on identifying and correcting high-risk conditions before they result in harm to people or equipment. However, an effective effort in this regard requires moving beyond individual opinions regarding risk and putting in place a consistent, objective, and practical method of assessing risks.

3.2.4 Based on Engineering Risk and Reliability Assessment Techniques

HESRA is based on a well-developed and widely practiced engineering risk assessment technique known as Failure Modes and Effects Analysis, or FMEA. Because of this engineering heritage, HESRA can draw on a large pool of experienced risk analysis practitioners who can adapt their skills to consider task-related errors instead of component failures. Also, a number of existing commercial software applications support FMEA activities and data. Several of these tools can be adapted to support HESRA.

3.2.5 Alternative to Quantitative Risk Assessment Techniques

The natural tendency in an engineering organization is to frame all risk analysis in terms of precise quantitative estimates. This is a reasonable perspective given the ample availability of well-documented failure data for mechanical and electronic components. In fact, much of the early work on human error analysis attempted to take this same route. However, the lack of quantitative human error data for many (actually, most) practical tasks typically dooms a purely quantitative approach to human error analysis.

The lack of reasonable quantitative methods does not imply that there are no useful alternatives. In fact, human error analysis techniques that apply ordinal scale ratings have been and are being used in a number of sophisticated, complex domains. Examples include NASA and the healthcare and medical products fields.

3.2.6 Use of Data from Incident/Error History

One of the very nice features of HESRA is that it does not depend on the availability of historical error data. That is, it is perfectly acceptable to conduct a human error risk analysis without referring to any particular past incidents or errors. However, just because it is possible to do so does not mean that such information cannot be used if it is available.

If previous incident investigations have been done for a particular ATC system or facility, then the results of those investigations can be directly applied in a HESRA analysis. The most likely effect of having access to error data is to inform the analysis team's consensus on likelihood and detection/mitigation. However, such error data can inform any or all of the three rating scales (likelihood, severity, and detection/mitigation) used in HESRA.

3.2.7 Use of Data from Formal or Informal Usability Tests

Just as HESRA can use existing incident and error data, it can also use data from usability tests. Often, usability tests are designed to elicit more errors than one would normally see during the actual service life of a system. Therefore, the analysis team can use these test data to inform their ratings for likelihood and detection/mitigation.
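The three rating scales just described (likelihood, severity, and detection/mitigation) are later combined into a Hazard Index (HI) and a Risk Priority Number (RPN) in Step 11 (Section 5.11). The following Python sketch illustrates the idea under the standard FMEA convention, in which HI is the product of the likelihood and severity ratings and RPN additionally multiplies in the detection/mitigation rating. The rating ranges, formulas, and the example error mode are assumptions for illustration; the report's own scale tables and criticality breakpoints (Tables 9, 11, and 12) govern the actual method.

```python
from dataclasses import dataclass

@dataclass
class ErrorMode:
    """One error/cause combination for a single task step."""
    step: str
    error: str
    cause: str
    likelihood: int   # ordinal team rating (range assumed here, e.g., 1-5)
    severity: int     # ordinal team rating (range assumed here, e.g., 1-5)
    detection: int    # composite detection/mitigation rating (assumed 1-5)

    @property
    def hazard_index(self) -> int:
        # Assumed convention: HI combines likelihood and severity only.
        return self.likelihood * self.severity

    @property
    def rpn(self) -> int:
        # Assumed convention: RPN also weights in detection/mitigation.
        return self.hazard_index * self.detection

# A single hypothetical error mode as it might be rated by the analysis team.
mode = ErrorMode(step="Remove paper from the printer",
                 error="Paper removed from the wrong printer",
                 cause="Adjacent printers are unlabeled",
                 likelihood=3, severity=2, detection=2)
print(mode.hazard_index, mode.rpn)  # -> 6 12
```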


3.2.8 Establishes Working Group

Introducing HESRA into the ATC maintenance/operations domain requires establishing at least one team of people to perform the analysis. The next section of this document describes the composition of the HESRA analysis team. While this document describes a single team, the same requirements can be applied to more than one HESRA team.

The primary direct effect of establishing a HESRA team is to enable the ATO to identify and reduce risks associated with human errors. This is, after all, the reason for introducing HESRA. However, the indirect effects of establishing HESRA teams can be almost as beneficial as those from the specific analyses. Slowly, a pool of people will be formed with the perspective and breadth of view established by their participation in HESRA activities.

Common effects of participating on a risk analysis team include an appreciation of the perspectives of other people on the team, the adoption of an evaluative view of systems and processes, and the realization that identifying and reducing human error risks is not as complex as one might believe prior to working on an actual analysis effort. These are all positive effects for the organization.

3.3 Objectives

The ultimate goal of adapting HESRA to the FAA ATC facilities maintenance/operations domain is to allow FAA maintainers to provide the highest level of facility safety with the lowest risk of compromising safety through human error. A number of enabling objectives support this goal, including at least the following:

• Provide a proactive method with which high-risk elements of maintenance/operations procedures can be identified before they lead to errors with possible safety consequences.
• Provide an objective method of proactively assessing the risk of various design and operational features of ATC facilities.
• Introduce the formalism of a priori risk analysis into the FAA ATC facilities maintenance/operations domain. This formalism is quite different from the typical post hoc accident investigation methods that are currently in place.
• Provide the perspective, methods, and tools to help the FAA ATO move toward a more diffuse safety culture.

4.0 SCOPE AND LIMITATIONS

While proactive human error risk analysis techniques such as HESRA have been applied in a broad range of domains, this application of the HESRA method has very limited goals and scope.

4.1 ATC Facility Maintenance Procedures and Systems

The current methodology is focused on analyzing human errors associated with the design of procedures and systems included in the ATC facility maintenance/operations domain. Therefore, the rating scales included in this procedure may or may not be applicable to a broader set of domains.

4.2 Extensibility to Other Domains

Since HESRA is based on analyzing potential task-related errors, it should be extensible to any domain or system in which procedures, either formal or informal, can be suitably decomposed into individual tasks and steps.


For existing systems in the FAA maintenance/operations domain, there are usually existing, detailed procedures that lend themselves to task decomposition. For systems under development, it should certainly be feasible, using task analysis, to develop prototype maintenance/operations procedures of sufficient detail to satisfy HESRA requirements.

Whether this is also the case in other FAA domains, such as ATC operations, is not clear. However, even for amorphous or poorly documented procedures or systems, task analysis is likely to yield information that can be subjected to HESRA analysis. The issue, of course, is how difficult it might be to perform such task analyses.

4.3 Applicability to Developmental and Existing ATC Facilities and Systems

One of the very desirable features of HESRA is that it can be applied to existing facilities (systems) and to those in various planning, development, or procurement stages. Since it is a proactive method, HESRA does not depend on the existence of prior operating experience. Rather, the only requirement is a list of tasks or steps required to perform specific operations. Such a list can be developed using detailed specifications, simulations, developmental models, or actual equipment and software. As such, HESRA can be used during most phases of the FAA development lifecycle, including mission analysis, safety analysis, investment analysis, solution implementation, and in-service management.

4.4 Handles Discrete Errors

HESRA is based on a widely used engineering risk analysis method known as Failure Modes and Effects Analysis (FMEA). Risk analysis methods based on FMEA are well adapted to identifying and assessing discrete, i.e., individual, errors. The reason for this ability is quite simple: errors and causes are identified for each task (or step) in a procedure, and then each error/cause combination is evaluated independently. Thus, individual errors are likely to be identified and assigned risk ratings.

4.5 Handling Multiple Discrete Errors

The feature of FMEA-based risk analysis techniques that makes them very good at identifying individual errors also makes them less than adept at identifying multiple, dependent, or conditional errors. Since errors are considered only in the context of individual tasks or steps, the likelihood of identifying a meaningful complex combination of errors depends on the imagination and experience of the risk analysis team.

Conditional and/or multiple errors are considered in HESRA during the evaluation of the severity of particular errors. At this point in the analysis process, the analysis team is essentially answering the question, "What is the worst that can happen if this error occurs?" It is quite common for the answer to be, "Well, it depends. If this error occurs in combination with this other error or equipment failure, then the severity would be quite high. If it occurs in the absence of that other error or failure, then it wouldn't be so bad."

In this way, conditional errors can be noted during the analysis and listed as one of the factors that explain a particular severity rating. However, and this should be clearly noted, there is no explicit activity in HESRA (or, to our knowledge, in any other FMEA-based risk analysis technique) in which the analysis team is required to consider multiple or conditional errors.
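Because HESRA has no explicit multiple-error activity, a conditional dependency survives the analysis only as part of the worst-case severity rationale. The following minimal sketch shows how such a "Well, it depends" answer might be recorded; the field names and the example are purely illustrative, not part of the HESRA form.

```python
# Hypothetical severity record for a conditional error. HESRA prescribes no
# explicit structure for this; the condition simply becomes part of the
# rationale behind the worst-case severity rating.
severity_record = {
    "error": "Reboot initiated without verifying the standby server",
    "severity": 5,  # worst case assumes the conditional failure is present
    "rationale": ("Severe only in combination with a failed standby server; "
                  "in isolation, the active server continues to carry the load."),
}
```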


It might seem obvious, but it is an absolute and non-negotiable requirement that all team members must be neutral in terms of the outcome of the analysis. That is, team members cannot have an axe to grind, other than the desire to increase the safety of the system being analyzed. They can have no material interest (either monetary or political) in the outcome of the analysis. An extreme example in this regard is someone who might lose (or THINKS they might lose) his or her job if the analysis shows a system to be extremely risky (or not).

Having an opinion regarding the risks associated with a particular procedure or system is not the same as having a stake in the outcome of the analysis. Everyone on the team is likely to have opinions prior to the analysis. The key to selecting team members is that every member of the team should be willing to change their opinions if evidence and logic so dictate.

5.1.2 Roles and Responsibilities

The roles and responsibilities of team members are described in Table 1, below. There are three important points to note regarding these roles and responsibilities. First, the Human Factors (HF) Specialist is the leader of the analysis team. This is appropriate for a number of reasons, but primarily because the Human Factors Specialist has the education, background, experience, and perspective to guide the analysis team in their deliberations regarding human errors.

The second point to note in Table 1 is that the role of ATO Manager is optional. An ATO Manager can bring significant benefits to the analysis team, but it is also perfectly acceptable to perform the risk analysis without one.


Table 1. HESRA team roles and responsibilities

Human Factors Specialist / Team Leader

Background:
• Has an understanding of human perception, performance, and cognition
• Knows what sorts of tasks are compatible with common human capabilities and which are not
• Knows how to interpret data from research and usability tests
• Has very strong group facilitation skills

Responsibilities:
The HF Specialist is the leader of the HESRA team. The purpose of conducting the risk analysis is to identify system and procedural elements, or combinations of elements, that pose high risks for human errors. Since the HF Specialist has in-depth knowledge of human performance, cognition, perception, behavior, and errors, it is logical and appropriate that this individual take the lead in assessing risks.

The HF Specialist must become familiar with the system and procedure(s) to be analyzed. The HF Specialist also performs the initial task breakdown and error mode definition.

Maintenance Subject Matter Expert (SME)

Background:
• Understands the procedures associated with maintaining the system to be analyzed
• Ideally will have actual experience performing the tasks to be evaluated

Responsibilities:
The maintenance/operations SME is the users' representative on the analysis team. The focus of this HESRA procedure is ATC facility maintenance/operations procedures. Maintainers are the people who conduct those procedures in the field.

The maintenance/operations SME must help the analysis team understand likely field behavior, actual procedural steps, accepted practices, tools, interactions among work teams, and the influence of environmental, social, and political factors. Also, the maintenance/operations SME is likely to be able to share knowledge of past critical incidents and errors related to a given procedure, whether those incidents were reported or not.


Trainer

Background:
• Should have a great deal of experience conducting the types of tasks that will be evaluated during the risk analysis
• Has taught those tasks to many maintainers over a period of time
• For new systems, should generally have a great deal of detailed knowledge regarding the type of system being evaluated

Responsibilities:
Trainers teach declarative and procedural knowledge about the system to be analyzed. The trainer should inform the discussions of the analysis team by indicating which procedural tasks are difficult for trainees to master.

The trainer can also help team deliberations by describing common mistakes made by maintainers during training, feedback received from the field, and personal experience across many systems and locations.

System Technical Specialist

Background:
• Has in-depth knowledge of system functions, operations, and interactions
• Can fill the roles of both maintainer or trainer and system technical specialist
• Can be a representative of the system developer

Responsibilities:
For any reasonably complex system, it is essential to have a technical specialist on the analysis team, since the range of potential effects of errors may not always be obvious to maintainers. The system technical specialist can best aid team deliberations by providing in-depth knowledge of system functions, the effects of errors, and the interactions of errors and components.

The input of the system technical specialist is most helpful when debating the potential severity of specific errors and the elements of a system that might detect and mitigate the effects of an error.

ATO Scientist

Background:
• Has a long association with particular operational systems, e.g., understands the history of a system or class of systems
• Can fill this role and that of the HF Specialist

Responsibilities:
The ATO Scientist brings a broad technical view to the analysis team. The most useful input to the analysis team from the ATO Scientist is to describe the error history of the procedure or facility under review and the implications of errors across systems. The ATO Scientist can also bring knowledge of past and ongoing research applicable to the specific analysis.

ATO Manager (Optional)

Background:
• Has broad experience with various workers on the system being analyzed, as well as knowledge of the severity of error effects on the ATC system
• Has the ability to explain the effects of errors on management functions

Responsibilities:
Although not absolutely necessary, including an ATO Manager on the risk analysis team can be valuable in a number of ways. Most notably, participating on the risk analysis team provides the manager with detailed familiarity with the risk analysis process and the deliberations of the team.

In addition to enriching the discussions associated with the risk analysis, the manager's participation will lead to fewer back-end issues and faster buy-in of the analysis results. The ATO Manager can help coordinate access to facilities, equipment, and personnel, if such access is necessary.


Finally, it should be noted that Table 1 defines roles and responsibilities, not specific individual identities. It is feasible for a single person to fulfill more than one role on the analysis team. For example, the person acting in the ATO Scientist role might also be able to act as the Human Factors Specialist. Likewise, the System Technical Specialist might also be able to act as the Trainer. It is also possible for a single role to be filled by more than one individual. For example, in an application of HESRA to the Voice Switching and Control System (VSCS), three individuals filled the Trainer role, and one individual filled both the Maintainer SME and System Technical Specialist roles.

5.1.3 Team Tasks

When the team is established, it is necessary also to establish an operating framework with all team members. The actual analysis tasks will be described in subsequent sections of this document. However, certain management and housekeeping tasks need to be completed before the detailed analysis begins.

5.1.3.1 Hold Initial Meeting

The team, once its membership is defined, should arrange an initial meeting to introduce team members to one another and determine how they will operate for the duration of the analysis effort. It is possible for this initial meeting to be held in a virtual environment, e.g., a video or audio conference. During this meeting, the nominal team leader, who is always the HF Specialist, will be identified.

In their invitation to join the analysis team, each member will be informed of the system to be analyzed and the overall timeframe for the analysis. It is likely that not all prospective team members will have undergone HESRA training prior to their participation on the team. The initial meeting will provide an opportunity to introduce the HESRA method and to discuss, in general terms, the system and procedure to be analyzed. In addition, locations for future meetings can be discussed, as well as logistical needs such as LCD projectors and other facilities.

5.1.3.2 Establish Ground Rules

The team should establish simple ground rules for their interaction. Some of this interaction is predicated on the flow of activities in the HESRA method. However, the analysis team has great flexibility regarding how they accomplish each of the tasks in the analysis process.

For example, the team might decide to discuss each error mode in a serial fashion, assigning all three ratings before moving on to the next error mode. Alternatively, a team might decide to assign all likelihood ratings before going back and assigning severity and detection/mitigation ratings.

5.1.3.3 Agree on Level of Effort

It is critically important that analysis team members be able to devote enough time to the analysis process to be effective. It is not acceptable to simply have a majority of the analysis team present for their deliberations. Team members must discuss the required level of effort for the analysis process and then agree, as a team, to provide that level of effort.


5.1.3.4 Record Keeping

While it is important to keep records of the analysis process and outcome, it is also necessary to keep them in such a way that they do not interfere with the primary work of the analysis team. The primary elements and requirements of record keeping are described in this section.

5.1.3.4.1 Select Tool to Support/Document Analysis

Currently, the primary documentation and reporting tool is Microsoft Excel. However, several off-the-shelf tools that can be customized to serve as a support tool for HESRA are being examined as alternatives to Excel.

5.1.3.4.2 Configure Tool for Current Analysis

Configuring an Excel spreadsheet for a particular HESRA analysis consists of opening a new file using the HESRA template and then entering the name of the system or procedure to be analyzed. Appendix B provides an example analysis form. Using the HESRA template is very straightforward: clicking the New Row button creates a new row just below the cursor location, and the new row replicates the Task and Step entries.

5.1.3.4.3 Supplement with Meeting Notes

Regardless of the tool being used to support HESRA, it should be supplemented with copious meeting notes. Inevitably, there will be discussions among the analysis team related to particular tasks, errors, or the reasoning behind assigning ratings. Any word processing application can be used to collect meeting notes, but the notes should definitely be kept in electronic form to allow easy editing and distribution.
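The Excel template itself is not reproduced here, but the record each row captures can be sketched from Steps 5 through 11 of the method. In the Python sketch below, the column names and the New Row behavior are inferred from the description above rather than copied from the actual template (Appendix B shows the real form).

```python
# Illustrative row structure for the HESRA analysis sheet. Column names are
# inferred from Steps 5-11 of the method, not taken from the Excel template.
BLANK_ROW = {"task": "", "step": "", "error": "", "cause": "", "effect": "",
             "likelihood": None, "severity": None, "detection": None}

def new_row(rows: list, at: int) -> dict:
    """Mimic the template's New Row button: insert a blank row just below
    row `at`, replicating that row's Task and Step entries."""
    row = dict(BLANK_ROW)
    row["task"], row["step"] = rows[at]["task"], rows[at]["step"]
    rows.insert(at + 1, row)
    return row

rows = [dict(BLANK_ROW, task="Login", step="Enter user ID")]
new_row(rows, 0)  # a second error mode recorded against the same task/step
```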


5.2 Step 2: Familiarize Team with System to Analyze

It is imperative that all team members have some level of understanding of the system for which procedures are being analyzed. It is somewhat of a paradox that the Human Factors Specialist, who is the nominal head of the analysis team, is likely to have the least familiarity with the ATC system and the maintenance/operations procedure(s) to be analyzed. In the normal course of events, a typical human factors analysis requires the Human Factors Specialist to become very familiar with the system, product, or procedure being analyzed. HESRA is really no different in this respect, except that this familiarization must occur prior to the analysis effort.

There are a number of ways to facilitate such familiarization. These include, in no particular order: visits to ATC sites at which the system is located, walking through representative procedures, reading operation and maintenance/operations manuals, reading vendor information related to the system, interviewing maintainers and operators of the system, and spending time with the training professionals who teach technicians how to maintain the system. We have found that all of these approaches are likely to be used when trying to become familiar with a system.

Until the Human Factors Specialist is familiar with the system to be analyzed, it is unlikely that this individual will be able to reasonably understand the maintenance/operations procedure(s) to be analyzed. This lack of understanding will hamper the HF Specialist's ability to perform the initial task and step identification required to "pre-fill" the analysis spreadsheet. Even if it is possible to fill out the spreadsheet, the lack of a basic understanding of system layout and functions will seriously inhibit the ability to determine the effects of identified errors.

5.3 Step 3: Prioritize Procedures to Analyze

The third step in performing a HESRA is to choose, from all candidate procedures, those that take priority and should be analyzed. This step is important, and we recommend that the approach described here be used.

The analysis team should review each candidate procedure. The review should concentrate on the worst-case scenario that might pertain if serious human errors are committed during the conduct of the procedure. The intent here is not to concoct some wildly unlikely series of events that might lead to the loss of separation or some other very bad event. Rather, it is the analysis team's job to think in practical terms about the consequences of improperly performing each procedure.

The outcome of this process is that a ranking is assigned to each procedure related to the problems it could cause to the local ATC system. The following scale can be used to help the team prioritize certain procedures over others:

1. Immediately brings down the facility or subsystem and adversely affects other facilities or subsystems.
2. Immediately brings down the facility or subsystem, but does not affect other facilities or subsystems, or leaves the facility or subsystem in a nonfunctional mode that is not obvious to observers.
3. Immediate reduction in the function of the facility or subsystem, but partial functionality is retained. No latent effects.
4. Possible delayed minor functional effects on the facility or subsystem. No immediate effects.
5. No serious immediate or latent functional effects on the facility. Effects result in inconvenience and can be easily addressed.

In addition to the above taxonomy, there are other viable methods for generating a pool of candidate procedures that should be submitted for HESRA analysis. The table below presents additional criteria the team can use to generate candidate procedures. To help organize the procedures that could be studied, the team should rank-order them so that procedures known to have severe consequences or known to be difficult during training are considered first, before others that are less severe and might not require HESRA at all. A sketch of this rank-ordering follows.
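As a concrete illustration of the rank-ordering just described, the sketch below sorts a handful of candidate procedures by the 1-5 worst-case impact scale above, breaking ties in favor of procedures known to be difficult during training. The procedure names are taken from the VSCS examples used later in this document; the ranks and flags assigned to them here are hypothetical.

```python
# Hypothetical candidate procedures, each tagged with the 1-5 worst-case
# impact rank defined above (1 = most severe) and a training-difficulty flag.
candidates = [
    {"procedure": "Reboot VCSU Servers",          "impact_rank": 2, "hard_to_train": True},
    {"procedure": "Clean and check WS printers",  "impact_rank": 5, "hard_to_train": False},
    {"procedure": "Perform VCSU server modeover", "impact_rank": 1, "hard_to_train": True},
]

# Severe consequences and known training difficulty float to the top; the
# low-impact items at the bottom may not require a HESRA analysis at all.
candidates.sort(key=lambda p: (p["impact_rank"], not p["hard_to_train"]))
for c in candidates:
    print(c["impact_rank"], c["procedure"])
```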


We recommend that eliminating procedures prior to analysis be done as a two-step process. First, the analysis team should discuss the procedure and agree that it can be eliminated. Second, other individuals should be consulted who can determine that eliminating the procedure will not compromise technical, regulatory, or management requirements. Eliminating a routine maintenance/operations procedure also requires removing it from the Maintenance Management System (MMS).

5.4 Step 4 - Set Analysis Perspective

Risk analysis is not performed in a vacuum, nor is it typically done on a hypothetical system. Usually, but not always, HESRA will be directed at a real maintenance/operations procedure that relates to an actual system operating in the ATC environment. Another alternative is to embed human error risk analysis into the development process for new or replacement systems. That is, the maintenance/operations procedures to be analyzed are for a facility or system that has not yet been deployed.

The elements of the analysis perspective described in this document are maintenance/operations-oriented. However, to be clear, these same considerations apply to any FAA domain, including ATC operations.

In performing a risk analysis (HESRA) for any system or procedure, we have to make certain assumptions concerning the type(s) of users, the usage environments, the task environment, the overall complexity of the user interface, etc. Brief descriptions of these considerations, as applied to the particular product or system and the specific analysis, are provided below.

5.4.1 User Population

The user population for the particular maintenance/operations procedure should be described in as much detail as necessary to allow the analysts to adopt the users' perspective while conducting the procedure.

The level of training and experience that users are expected to have with the particular procedure should be described. For example, will users have undergone training specific to the procedure? Will their training bring every user up to a minimum level of competence? Even if users have been trained on the procedure, will they have any actual job experience with it? That is, for purposes of the analysis, users can be trained but still be considered new users.

5.4.2 Usage Environment

Although there is a wide range of potential usage environments, we assume for purposes of the HESRA that the product or system will be used in the ATC domain. However, even within the ATC domain, there are a number of different types of facilities, e.g., ARTCC, SSC, TRACON, etc. The description of the usage environment should include both its physical aspects, e.g., indoors vs. outdoors, hot vs. cold, etc., and the operational environment. For example, will users be under time stress? Will they be making life-and-death decisions? Are users subject to punitive actions by management? Stressful physical and operational environments elevate the likelihood of human errors.


5.4.3 Performance Shaping Factors

Human performance, especially the likelihood of committing errors, is strongly influenced by a number of factors. For purposes of the HESRA, we should explicitly list those factors we consider to positively and negatively affect users' performance. For example, some common negative factors are the following:

• Time pressure
• Fatigue
• Multi-tasking
• Noise
• Physical exertion
• Poor communication
• Confusing terminology

Positive factors include, but are certainly not limited to, the following:

• Well-designed user interface
• Good communication links
• Good training
• Well-written procedures
• Lack of time pressure
• Quiet workspace

5.4.4 Overall Complexity of System, User Interface and/or Procedure

While feature-rich UIs are often viewed as a good thing, complex user interfaces increase the likelihood of user confusion and errors. In general, the more choices a user has regarding which actions to take, the more likely it is that they will make an incorrect choice. In this regard, the complexity of the user interface for the procedure or system being evaluated should be rated as low, moderate, or high. An explanation of this rating should also be included in the documentation for the analysis.
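One way to keep these perspective elements together during the analysis is a simple record; this is a minimal sketch, and the field names are illustrative assumptions rather than part of the official HESRA template:

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisPerspective:
    """Working notes for Step 4: the assumptions that underlie the ratings."""
    user_population: str                 # training/experience assumed for users (5.4.1)
    usage_environment: str               # facility type plus physical/operational stressors (5.4.2)
    negative_factors: list = field(default_factory=list)   # e.g., time pressure, noise (5.4.3)
    positive_factors: list = field(default_factory=list)   # e.g., good training (5.4.3)
    ui_complexity: str = "moderate"      # rated low, moderate, or high (5.4.4)
    complexity_rationale: str = ""       # explanation required by 5.4.4

perspective = AnalysisPerspective(
    user_population="Trained technicians with little field experience on this procedure",
    usage_environment="TRACON equipment room; indoors, moderate time stress",
    negative_factors=["Time pressure", "Multi-tasking"],
    positive_factors=["Well-written procedures"],
    ui_complexity="high",
    complexity_rationale="Many menu paths lead to similar-looking dialogs",
)
```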


5.5 Step 5 – Define Tasks

The most fundamental activity in the HESRA process is defining or identifying the tasks that will be analyzed for risk. Fortunately, most ATC facility maintenance/operations work is very much procedure-oriented, so for any activity, a detailed, task- or step-oriented procedure is likely to exist. The major exception to this statement is for systems still under development. Detailed maintenance/operations procedures for such developmental systems might not exist when the HESRA process is conducted. If a procedure already exists, then defining tasks requires much less effort than in situations where detailed procedures do not exist. If a procedure does not yet exist, then some form of task analysis is warranted (see 5.5.3, below).

The activities undertaken by technicians to complete a maintenance/operations procedure (or any procedure) can be divided into "tasks" and "steps". This is a somewhat arbitrary distinction, but a helpful one when it comes to organizing maintenance/operations actions for analysis. An easy way to distinguish between tasks and steps is that tasks define what has to be done, but not necessarily how to do it. Steps, which are usually embedded within tasks, tell how to do what the task requires.

The sequence of activities in the HESRA process is to first define tasks and give each a one-word name that can be used to define all the steps and errors associated with it. The actual name given to the task is not particularly important, but it should reflect the nature of the task. For example, if I identify a series of steps that allow me to log onto a computer, then a reasonable name might be "Login" or "Logon".

5.5.1 Evaluate Level of Task Detail

Tasks and steps listed in various FAA maintenance/operations procedures contain a wide range of detail regarding the specific actions associated with them. In most cases, very specific procedures exist for each periodic maintenance/operations task. However, even for these periodic procedures, the level of detail can vary dramatically. In the VSCS procedure "Clean and check WS printers", for example, the second numbered step is "Remove paper from the printer." Contrast this with the second numbered step in the "Reboot VCSU Servers" procedure, i.e., "At the VCSU Ops Console, select the key twice."

There is some leeway in determining the appropriate level of task detail needed for risk analysis. The key to knowing whether there is enough detail in the existing procedure is that it should describe what must be done, but not (necessarily) how to do it. In many cases, ATC maintenance/operations procedures mix both tasks and task steps. Sometimes, the steps are explicitly associated with a task. These instances are typically listed in this syntax: "Do 'X' by completing the following steps –". Then, the steps required to complete the higher-level task are listed.

More often, the HF Specialist must determine the higher-level task associated with a series of detailed steps. For example, in the "Reboot VCSU servers" procedure, the first task is to determine which of the two servers is currently "active" and which is the "standby" server. There are a number of steps associated with this aggregate task, but it is up to the analyst to lump these steps together and give the task group a meaningful definition and label. In this example, we could define this group of steps as "Identify active server" and its task label as "Identify."

The Maintenance Management System (MMS) is a good source of non-specific procedures. The MMS order to "Perform VCSU server modeover" is a good example of describing what must be done without providing any detail about how it is accomplished. This level of detail is not sufficient to perform a HESRA analysis. However, the actual maintenance/operations procedure for this general task, which is referenced in the MMS, contains more detailed task descriptions.

5.5.2 The General Task

In defining tasks, there is one task category that is not necessarily associated with any particular set of steps in the procedure or system function being analyzed. Rather, it is related to the procedure as a whole. This task category is defined as "general" and relates to the common errors of not beginning a procedure or not completing it once it is started. It can also address not performing the procedure in the proper order, if the procedure is typically done in a particular order with other procedures. The "General" task category should always be the first one listed in the HESRA spreadsheet.


5.5.3 Define/Identify More Detailed Tasks Where Required

The lack of task detail will probably not be an issue for most existing ATC facility maintenance/operations procedures. These procedures have been developed and refined over a period of time, as have the training materials for them. It is likely that highly detailed task descriptions exist for these procedures. However, for new systems or those under development, such detailed procedures might not yet exist.

The best course of action for analyzing procedures that do not yet contain detailed task descriptions is to walk through the procedures on representative systems or equipment. Some procedures contain statements related to overall functional activities, for example, "Swap active disk drives". This is a very general statement of what must be done, but it does not include either lower-level tasks or the detailed steps associated with them.

It is possible to fill in the lower-level "what to do" and "how to" information if there is enough expertise and experience on the analysis team. However, if little or no task detail exists and the requisite technical expertise and operational experience is not present on the analysis team, it might be a better use of the analysis team's time to postpone the risk analysis until a separate task analysis effort is completed.

If there are only a few gaps in the level of task detail for a procedure, then the analysis team can probably provide this detail as part of the risk analysis process. The HF Specialist should make the decision regarding whether task analysis is appropriate as part of the risk analysis process.

5.6 Step 6 – Define Steps

The next activity in the HESRA process is to define the steps required to complete the tasks identified as described above. This activity would be considerably simplified if we could assume that the steps listed in the selected maintenance/operations procedure could be entered directly into the HESRA spreadsheet. However, this is typically not a practical approach, for at least two reasons.

First, as we've noted previously, the individual, numbered steps in existing procedures span a wide range of detail. If we proceed by entering those procedural steps with the highest level of detail, then subsequent analytical activities become tremendously detailed and complex. More to the point, addressing technician actions at a very detailed level is unlikely to yield error risk estimates of much practical value.

Second, some procedures are written at a very high level of abstraction and contain few detailed, "how to" steps. A good example of this is the VSCS "Power Fail Recovery" procedure, which contains almost no detail for the individual task categories. This situation is likely to be more prevalent for systems under development. However, as the VSCS example demonstrates, it can also occur for very mature systems and facilities.

5.6.1 Identify Steps to Complete Tasks

An assumption regarding task analysis in the HESRA context is that the "what to do" tasks have already been defined. That is, we assume the analysis can start with a list of tasks that define what must be done, but not necessarily how to do it. If this level of task definition does not exist, then a task analysis effort apart from the risk analysis is warranted.


For each "what to do" task, the analysis team, or a subset of the team consisting of the HF Specialist, Maintenance SME, and Trainer, should identify the steps necessary to actually complete the task. The real trick here is to define individual steps at the appropriate level of detail. There is no commonly accepted definition of the term "step". However, in the context of human error risk analysis, a step can be defined as a specific human action, or series of actions, that can be completed either correctly or incorrectly.

There can be many steps required to complete any given task. For example, in the example used in 5.5.1, "Perform VCSU server modeover", the (existing) detailed procedure lists 15 steps. Some of these steps encompass very small increments of activity, e.g., "…select the key twice." Such highly detailed steps will usually not add much to the error analysis. In fact, they might make it less likely that important errors will be identified, because they simply add noise to the analysis.

One helpful way to define steps is to think of how you might describe the steps to someone who isn't familiar with the system or procedure you are analyzing. First, you would try to list the big, incremental tasks that have to be completed. For example, you might say "First, you have to make sure you know which server to shut down, then you have to shut it down, verify it is shut down, and then re-start it. Once it's restarted, you have to make sure it's running OK, and then make it the active server. Then you repeat the process with the other server."

Then, for each high-level task, you would describe the steps required to accomplish it. For the first task above, you might say "First, you make sure you're logged onto an Ops Control Console, then you identify the standby server, then you make sure all the resources are assigned to the active server, then you change the mode of the standby server to 'offline maintenance/operations'."

This level of step detail often aggregates a number of steps in the actual procedure, which is a good thing from an analysis perspective – so long as the aggregation doesn't hide a potential error that should be evaluated.

5.6.2 Enter Steps into Analysis Tool/Document

As steps are identified, they should be entered into the HESRA spreadsheet. If the step description can be condensed, it is acceptable to do so, as long as a complete step description is kept in supplemental documentation. In addition, each step should be identified with a number. This number serves only to identify the step/error/cause combination within the task grouping. For example, I might want to refer to Step 2 in the "General" task grouping. There is no other implication of the step number.
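For teams that maintain the worksheet programmatically rather than by hand, the task/step structure described above can be represented as simple records; this is a minimal sketch, with illustrative (not official) field names:

```python
from dataclasses import dataclass

@dataclass
class StepEntry:
    """One task/step row before errors and causes are attached."""
    task: str         # short task label, e.g., "Identify"
    step_no: int      # identifier within the task grouping only
    description: str  # condensed step description (full text kept elsewhere)

steps = [
    StepEntry("General",  1, "Begin and complete the procedure in order"),
    StepEntry("Identify", 1, "Log onto an Ops Control Console"),
    StepEntry("Identify", 2, "Identify the standby server"),
]
```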
5.7 Step 7 – Define Errors, Causes, and Effects

The real power of HESRA lies in properly defining the errors that can occur at each step in a maintenance/operations procedure. After all, errors are what we are trying to prevent. To the extent the risk analysis team is thorough and conscientious in defining errors, the remainder of the HESRA process allows them to do a reasonable job of ranking those errors. However, neither HESRA nor any other human error risk analysis method is likely to prevent an error that has not been defined.


This part of the HESRA process tends to become tedious and repetitive. It is likely that the same errors will be defined for many procedural steps. Also, the sheer number of errors that can be associated with a specific procedure step is often surprising.

5.7.1 Pre-Fill Errors and Causes

This is a good point in the process for the HF Specialist to pre-fill the HESRA spreadsheet with potential errors and causes for each step. There are at least two good reasons for the HF Specialist to complete this activity without necessarily soliciting input from the rest of the analysis team. First, the errors and causes in which we are interested are the result of human actions, so the HF Specialist is likely to have the most expertise of any team member. Second, this is a very tedious process, and it is best not to take up the time of the entire team.

It is also appropriate during this activity for the HF Specialist to note the effects of specific errors and to identify mitigating factors. However, these are not a primary purpose of the pre-fill activity. In fact, the value of HESRA is enhanced by team discussions of effects and mitigating factors, so those can certainly be left for the entire team to identify.

During this process, each step will likely spawn a number of entries in the HESRA spreadsheet. Each row within a task category represents one particular type of error related to a single step. Each type of error, in turn, can be duplicated to show how a particular error can be caused by different factors.

Certain types of human errors tend to be committed over and over – and for the same reasons – regardless of the domain in which the errors occur. To make sure that these common errors are considered in the HESRA process, we have adapted a list of these errors and causes from other sources. This type of list is often called a taxonomy, but that term is not important for the work of a HESRA team. We will simply call the lists presented below a framework for identifying human errors.

As an example, let's consider the step "Identify the active server." A proper outcome for this step would be "The active server is properly identified." A fundamental error for this step might be to simply skip it. There can be multiple possible causes for such an error, such as an external distraction, lack of knowledge regarding how to identify the server, time pressure, etc. Each of these causes would merit a separate row in the HESRA spreadsheet.

Table 3. Framework of human errors and causes.

Common Errors
• Skip a step
• Perform a step out of order
• Fail to start a step
• Fail to complete a step
• Use the wrong equipment

External Causes of Errors
• Problems with the procedure
• Equipment design
• Environment (noise, heat, vibration, etc.)
• Time pressure
• Organizational factors
• Improper/poor scheduling
• Distractions

Internal Causes of Errors
• Improper training
• Fatigue
• Stress
• Excessive memory load
• Lack of familiarity (with procedure or system)
• Cognitive capability exceeded
• Physical capability exceeded
• Attention not maintained
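To show how such a pre-fill can be mechanized, here is a minimal sketch that crosses each step with a subset of the framework's common errors and causes, producing one spreadsheet row per step/error/cause combination; the function name and column names are illustrative assumptions, not the official HESRA template:

```python
from itertools import product

COMMON_ERRORS = ["Skip a step", "Perform a step out of order", "Fail to start a step",
                 "Fail to complete a step", "Use the wrong equipment"]
CAUSES = ["Problems with the procedure", "Time pressure", "Distractions",
          "Lack of familiarity", "Excessive memory load"]

def prefill_rows(task, steps):
    """One row per step/error/cause triple; effects and mitigations left blank for the team."""
    rows = []
    for step_no, step in enumerate(steps, start=1):
        for error, cause in product(COMMON_ERRORS, CAUSES):
            rows.append({"Task": task, "Step": step_no, "Description": step,
                         "Error": error, "Cause": cause,
                         "Effects": "", "Mitigating factors": ""})
    return rows

rows = prefill_rows("Identify", ["Log onto an Ops Control Console",
                                 "Identify the standby server"])
```

In practice, the HF Specialist would prune combinations that make no sense for a given step rather than keep the full cross product.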


Remember, this is simply a pre-fill of the HESRA spreadsheet by the HF Specialist. There is no guarantee that this pass will be either complete or accurate. The purpose of doing the pre-fill is to make the analysis process more efficient. When the entire team evaluates the steps, errors, and causes, some might be added, some deleted, and others modified.

5.7.2 Review Each Step

Once the HESRA spreadsheet has been pre-filled, the analysis team should re-convene to perform the detailed analysis. The team should discuss each procedure step in serial fashion. Finish identifying the potential errors associated with one step before proceeding to the next. Each member of the analysis team should ensure that they understand the human and machine actions described in the task step, the conditions under which the step is performed, and the relationship of the step to previous and subsequent steps.

Causal factors for each error should be reviewed by the team and modified as necessary. Some of the most interesting and illuminating discussions will occur when the team identifies the effects of each error on human and system performance. In previous analyses, we have found that some of the errors had never been considered by anyone on the analysis team. Sometimes, the effects of these errors are so subtle or complex that outside experts must be consulted to determine the most likely effects.

For a good example of the type of "effects" discussion that can occur during a HESRA analysis, consider the actual case of a procedural requirement to verify that all processes and resources are assigned to the active server (and not the standby server) before proceeding. What happens if the technician proceeds without verifying resource assignments and they turn out to be incorrectly assigned? The effects of this error took a HESRA analysis team a while to determine and resulted in the team calling in other experts.

When identifying the effects of specific errors, remember that an error can have both system and human effects. System effects usually relate to the temporary or permanent loss or degradation of one or more system functions. Human effects can include increased workload, confusion, injury, and, in rare cases, death.

Finally, any mitigating factors that would impact the error should be described. A common mitigating factor is that it might be immediately obvious that a step in a procedure has been omitted. In the procedural step discussed above, for example, the following step cannot be completed if the key is not pressed twice, because the dialog window required in Step 3 will not be displayed. This is a very strong mitigating factor.

A much less salient mitigating factor is the disk slot numbering scheme on which the VSCS maintainer depends to select the proper disk to remove in the "Perform update of server gold mirrored drives" procedure. If an incorrect disk is removed, there is little to notify the maintainer that this error has occurred. In addition, the slot-numbering scheme is inconsistent with the disk numbers, i.e., slot "0" contains disk "1". This design discrepancy might actually induce the maintainer to remove the wrong disk.


5.7.3 Develop Exhaustive Error List

The goal of the analysis team is to develop an exhaustive list of errors for each task step. The team should not concern itself with the likelihood, severity, or mitigation that might pertain for each error – at least, not at this point in the analysis.

This perspective tends to be problematic for some analysis team members. The tendency is to do a quick mental evaluation of potential errors and then discount (and not mention) those that are thought to have a low probability of occurrence or trivial consequences. Do not do this! It will defeat the purpose of the risk analysis. We are very much interested in low-probability events. There are many opportunities later in the HESRA process to assign low ratings for probability and severity. This is not the place to do it.

The appropriate perspective for this part of the analysis is to be skeptical and evaluative. Do not accept the premise that any step is so easy that there cannot be errors. For example, one error that should always be listed for a procedure step is simply failing to do that step. There can be a number of reasons for skipping a step in a procedure, but it is one of the most common human errors.

5.7.4 Relate Errors to Other Steps, Components, Etc., if Appropriate

One of the weaknesses of FMEA-based risk analysis methods is their inherent lack of a mechanism to easily tie various errors together. We might find during the analysis that certain errors might cause (or prevent) other errors, or at least make other errors more likely.

For example, suppose it is apparent to the team that a certain error is much more likely to occur if an error has already occurred in a previous step. This should definitely be noted in the HESRA documentation.

In HESRA, we have provided a way to at least note these associations for later use. In the "comments" section of the analysis spreadsheet, the analysis team should explicitly note any such connections, even if they are only theoretical or deemed unlikely to actually occur.

5.8 Step 8 – Assign Rating for Error Likelihood

Likelihood refers to the overall probability, in nominal terms, of a particular error occurring due to a specific cause. Each row in the HESRA spreadsheet applies to a single error with a particular cause. The exact same error with a different cause might have a very different likelihood of occurring.

Remember that the training and experience of users, as well as the operational environment and performance shaping factors, will influence which likelihood rating is assigned to a particular error. The team should rate each error using the 5-point scale shown in Table 4, below.


For reference, Table 4 also shows (in parentheses) the 5-point Likelihood scale from FAA's Safety Management System (SMS). SMS and HESRA differ in likelihood definitions due mainly to the difference between human error rates and component failure rates.

Table 4. HESRA Error Likelihood Ratings (FAA SMS Ratings/Terminology)

| Rating | Category | Definition |
|--------|----------|------------|
| 1 (A) | Extremely Likely (Frequent) | Likely to occur on the order of once every 3-4 times the task is performed. |
| 2 (B) | Likely (Probable) | Likely to occur on a regular basis, on the order of once every 10 times the task is performed. |
| 3 (C) | Occasional (Remote) | Likely to occur sporadically over the life of the system, on the order of once every 25 times the task is performed. |
| 4 (D) | Unlikely (Extremely Remote) | Not likely to occur more than 5-10 times over the life of the system. |
| 5 (E) | Extremely Unlikely (Extremely Improbable) | Not likely to occur more than once or twice during the operational life of the system. |

Typically, it takes a risk analysis team some period of time before all members are comfortable with the meaning of specific ratings. For some team members, this will be the first time they have participated in a risk analysis of any kind, much less one in which the risk of human error is considered. The HF Specialist should take the lead in the early discussions of assigning likelihood ratings.

5.8.1 Review Each Error

When assigning likelihood ratings, the analysis team should consider each error in serial fashion. However, it is certainly acceptable during group discussions to compare and contrast one error with another. Often, this helps the team put particular errors in perspective. Also, depending on the specific equipment and task conditions, the exact same error might be rated differently for different maintenance/operations task steps.

5.8.2 Use Existing Error, Usability Test, and Other Data as Appropriate

There is often a lot of discussion among the team regarding the basis upon which error likelihood should be assigned. The HF Specialist and the ATO Scientist can provide information related to the specific error in terms of typical human capabilities and system history, respectively. Also, the Maintenance SME and Trainer might have good insights into the typical field experience with a particular error.

Beyond these sources of information, it is both possible and advisable to use any data that might pertain. For example, maybe a number of known, similar errors have been committed and reported. In some cases, a formal usability test might have been conducted on the system or procedure being analyzed. This is not likely for older systems, but usability test results might exist for newer systems.

These data usually cannot be applied directly to an error mode under analysis. It is rare, but not unheard of, to find data for the exact error mode being evaluated.


More often, existing error and usability data are best applied by informing the discussion among team members. Also, it is very useful for team members to know that a particular error has occurred and that it caused consequences of a specific severity.

5.8.3 Look Especially for Tasks That Require Skills or Capabilities That Human Users Are Unlikely to Possess

Requiring maintainers to complete a task in a way that challenges basic human capabilities often precipitates human errors. For example, suppose a procedural step requires the maintainer to remember readings from a measurement taken in a preceding step. Such a task requires the use of short-term memory, which is notoriously prone to error. Another example is a step that requires numbers or letters to be written down and used in a subsequent step. Transcription is also a task that is subject to fairly high error rates.

When the analysis team discusses potential errors that are due to such fundamental violations, the HF Specialist should make certain the team understands that the likelihood of these types of errors should be elevated.

5.8.4 Team Discussion and Consensus

It is critically important that analysis team members come to a common understanding of what the likelihood ratings mean and how to assign ratings to errors. As noted above, it will usually take a new team some time, typically on the order of a couple of hours, to become comfortable assigning likelihood ratings. Things progress much more smoothly after that point.

Team members who are not HF professionals often have a very difficult time understanding that assigning relatively high likelihood ratings does not reflect badly on the people who will be performing the work being analyzed. In this case, of course, those people are FAA ATC maintainers, so there will be some sensitivity in this regard. The HF Specialist and the ATO Scientist can greatly facilitate this process by using examples and showing how system characteristics are usually the cause of high likelihood ratings – not lack of skill or motivation on the part of maintainers.

5.8.5 Same Error for Different Task/Step

As was discussed above, the same error applied to a different maintenance/operations task could have drastically different likelihood ratings. This is an important point for the analysis team to understand, because the tendency will be to simply assign the same likelihood rating to identical errors.

The use of procedure walkthroughs can illustrate this point nicely. Using the example of measuring voltage at two test points, one of the errors will undoubtedly be placing the test probes on the wrong test points. It will be easy to see that the likelihood of this error might be quite high for test points buried in the innards of a circuit card cage and quite low for test points brought out to the front panel of the card cage.

5.8.6 Internal Consistency

The point just made was that the same errors could have vastly different likelihoods of occurrence if they are subject to vastly different equipment configurations, task environments, etc. However, the converse is also true.
The same error should have roughly the same likelihood rating if the task circumstances are equivalent.

Using the same example as in 5.8.5, suppose the analysis team is assessing the step of measuring voltage at a pair of front panel test points.


If there is another task that duplicates this step at an adjacent panel, then both the types of errors and the likelihood of occurrence of those errors should be roughly the same.

One must be very careful, however, in determining where such internal consistency is warranted and where it is not. Even a small change in the task environment can have a significant effect on likelihood ratings. Suppose, for example, the second instance of this step occurs at a height that is 4 inches above the floor, whereas the first instance occurs at chest level. Or, suppose the second instance of the task has to be done under extreme time pressure, whereas the first instance can be done in a deliberate, unhurried manner.

5.8.7 Influenced by Elements in 5.4

In section 5.4, we described how to set the perspective for a risk analysis. In terms of assigning likelihood ratings, this perspective is very important. All of the elements described in 5.4 can and should influence the analysis team's discussion and ultimate decision regarding the likelihood rating for each error mode.

5.9 Step 9 – Assign Severity Rating

The severity scale is presented in Table 6. It reflects what could happen if the particular error under consideration is actually committed. A severity rating should be assigned if one or more of the definition statements apply.

Of the three ratings that are assigned during a human error risk analysis, the severity rating is the least subject to the wide variances in human behavior. It is also the least amenable to remediation. In other words, the severity of an error outcome is not particularly dependent on anything other than the design of the system.

This is not to say that the severity cannot be reduced, but, for a given system design, there is not likely to be much argument about its potential outcome. There are a number of considerations related to the severity rating. These are described below.


Table 6. Severity Rating Scales (FAA SMS Category Names)

| Rating | Category | Definition |
|--------|----------|------------|
| 1 | Catastrophic (Catastrophic) | Serious injury, death, or permanent loss of one or more equipment functions; extended loss of function/service; major increase in maintainer or ATC workload; increased safety risk for FAA personnel; loss of positive A/T control; extended reduction of safety margin. |
| 2 | Critical (Hazardous) | Serious injury or moderate temporary loss of equipment function; moderate increase in maintainer or ATC workload; no safety margin for FAA personnel; potential loss of A/C separation; brief reduction in local safety margin. |
| 3 | Significant (Major) | Moderate injury or moderate equipment damage; loss of redundancy for a critical component; slight increase in maintainer or ATC workload; decreased safety margin for FAA personnel; increased risk should additional errors or equipment failures occur; potential increased stress on remaining functional equipment. |
| 4 | Marginal (Minor) | Minor injury or slight equipment damage; work-around required; loss of redundancy for a non-critical component; increased risk of more serious effects; minimal decrease of safety margin. |
| 5 | Negligible (No Safety Effect) | No injury or equipment damage; no significant effect on safety, function/service, or schedule. |

5.9.1 Worst-Case Scenario

Errors can have different effects in different circumstances. Our perspective during HESRA, however, should be to identify the worst-case scenario when assigning severity ratings. This is not to say that the analysis team has to imagine the most bizarre combination of circumstances imaginable to arrive at the severity rating. However, when considering a number of possible outcomes of an error, the team should choose the outcome with the most severe consequences.
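Because a numerically lower rating denotes a more severe outcome on this scale, the worst-case rule reduces to taking the minimum of the candidate ratings; a minimal sketch, with hypothetical outcome ratings:

```python
def worst_case_severity(candidate_ratings):
    """Severity 1 is most severe on the Table 6 scale, so worst case = minimum."""
    return min(candidate_ratings)

# Hypothetical outcomes for one error: Marginal (4), Significant (3), Critical (2).
assert worst_case_severity([4, 3, 2]) == 2
```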


5.9.2 Team Discussion and Consensus

There is not likely to be a wide range of opinions on the analysis team regarding the severity of outcomes for a particular error. Remember, the assumption for this part of the analysis is that the error will occur. There is no need to be concerned about the likelihood that it will occur or how its effects might be mitigated. For purposes of severity ratings, assume the error will occur and that it will not be detected or mitigated.

5.9.3 Account for Conditional Errors, Sequence of Errors, Etc.

This is the place in HESRA where the analysis team can take into account error sequences and conditional consequences, i.e., errors with consequences that can vary greatly based on other errors. For example, consider an error like performing a procedural step out of sequence. The consequences of that type of error might vary greatly depending on where, in the overall procedural sequence, the step is actually performed.

Also, the severity of an error might change drastically if another error has been committed earlier in the procedure. These types of error combinations and sequence dependencies should be noted in the HESRA spreadsheet. Even if the team decides to assign a severity rating that is not related to a particular combination or sequence, it is a good idea to document the fact that the team considered them and recognizes their magnifying effect on the severity of the error outcome.

5.9.4 Not Greatly Influenced by Elements in 5.4

Since the primary assumption in this part of the analysis is that the error under discussion will occur, the performance shaping factors listed in 5.4 don't play a role in the severity rating. We're assuming that the error will occur and will not be mitigated. Performance shaping factors don't matter for severity ratings.

5.9.5 Internal Consistency

Once the team has assigned a severity rating for a particular effect, that rating should also be applied to any other identical effects listed in the HESRA spreadsheet. For example, if an error will cause the loss of a particular system component, then losing that component will typically have the exact same severity rating regardless of the particular error that causes its loss.

5.10 Step 10 – Assign Rating for Detection and Recovery

The final rating to be assigned by the analysis team is related to the likelihood and timeliness of detecting and recovering from an error. Detection means that someone or something realizes that an error has been committed. It is important to understand that an error can be detected by an automated piece of equipment or by a person. Each error should be rated for detection and recovery using the rating scales in Table 7.

The FAA SMS framework has no equivalent for the HESRA recovery scale, and recovery is not considered in the SMS determination of risk.


Table 7. Detection/Recovery Rating Scales

| Rating | Category | Definition |
|--------|----------|------------|
| 1 | Very Low | Detection and/or recovery are not likely to occur until the error propagates through the operational system(s). |
| 2 | Low | Detection and/or recovery are delayed until the error causes at least some serious effects on the operational system(s). |
| 3 | Moderate | Detection and/or recovery occur after a moderate delay, but in time to prevent all but minor effects on the operational system(s). |
| 4 | High | Immediate or very quick detection. Recovery requires manual intervention, but is likely to be done before the error causes any operational effects. |
| 5 | Very High | Immediate, automatic detection and/or recovery. |

For maintenance/operations procedures, the person who typically detects an error is the maintainer who committed it. It can also be a person in the vicinity of the maintainer who commits the error, or someone monitoring the system on which the maintainer is working.

Recovery is the process of finding and "fixing" the error so it does not cause a harmful effect. An underlying assumption in all human error risk analysis processes is that, once an error is detected, the person who detects it will recover from it if it is possible to do so. The catch here is that it might not be possible for people to recover from an error in a timely manner.

5.10.1 Automatic Recovery

Sometimes, error recovery requires no human intervention, such as automatically switching over to a redundant backup system when an error causes the primary system to fail. It is entirely possible for automatic recovery to take place without a human being notified that an error has caused a system failure. This can leave the system in a vulnerable state, because subsequent errors or failures cannot be automatically recovered. Fortunately, these instances tend to be rare.

The analysis team should consider this type of recovery when performing the analysis. The tendency is to assume that it will occur, with no thought about what happens if it fails. The mechanism for automatic recovery should be identified in the spreadsheet template comment field, and discussions should be held with others to understand the implications of a failure of the automatic recovery.


In all instances in which automatic recovery is assumed to reduce the overall risk of a particular error-cause pair, an additional HESRA spreadsheet entry should be included in which the automatic recovery is assumed to be inoperative. These entries should be visually coded so they are obvious to those examining the spreadsheet or subsequent report(s).

5.10.2 Composite of Detection and Recovery

The detection and recovery scale, shown in Table 7, is a composite scale. That is, the anchor points on the scale are defined in terms of both detection and recovery. This is intentional, but it might cause some confusion among analysis team members. The basic idea is this: we want to assign this rating based on the timeliness and likelihood that the effects of the error are blocked from propagating through the system. That is, we don't want the effects of the error to spread.

When one considers the risk of human errors, it is really the consequences of those errors that we most want to avoid. The old expression "no harm, no foul" sums up the role of the detection and recovery processes in the overall risk analysis domain. Blocking harmful effects requires both detection and recovery, which is the reason these actions are combined in a single scale.

5.10.3 Influenced by Severity

Both detection and recovery can be influenced by the severity of the effects of the error. For example, suppose a particular error causes a software application to crash in an obvious way. This is likely to be pretty easy to detect. Also, while a lot depends on the exact configuration of the computer, re-initiating a single application isn't likely to be terribly difficult or time-consuming.

However, suppose instead that an error crashes a server. That event might or might not be so obvious, since some equipment can operate without talking to the server for some period of time. Also, bringing a server back up from a crash can be difficult and can take quite a bit of time.

5.10.4 Influenced by Elements in 5.4

Detection and recovery are very much dependent on human capabilities and limitations. As such, they can be heavily influenced by the performance shaping factors described in 5.4. Consider these two fictional situations. In the first, the maintainer is operating in a quiet environment and is performing a familiar procedure. The system on which the procedure is being performed is critical to ATC operations, but a hot backup system is operating perfectly.

In the second scenario, the maintainer is working on exactly the same system, but in a rather noisy and crowded location. The hot backup system is down for a period of time; therefore, the system on which the maintainer is working is absolutely essential to ATC operations.

The maintainer commits an error that brings down the system on which he or she is working. The factors that influence the maintainer's ability to detect and recover from this error will be quite different for these two scenarios. In particular, environmental factors (noise, cramped workspace), time pressure (in the second scenario, it is critical to get the system back up as soon as possible), and external demands for action will likely cause the maintainer in the second scenario to take longer to mitigate the error.


5.11 Step 11 – Calculate Hazard Index and RPN

Once the appropriate ratings are entered into the HESRA spreadsheet, the Hazard Index (HI) and Risk Priority Number (RPN) should be calculated. HI is calculated by multiplying Likelihood and Severity. RPN is calculated by multiplying the HI by the Detection/Recovery rating. When using the HESRA spreadsheet, these values are calculated automatically.
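The arithmetic the spreadsheet performs is simply the product of the ratings; a minimal sketch (the function names are illustrative, not part of the HESRA template):

```python
def hazard_index(likelihood, severity):
    """HI = Likelihood x Severity; each rating is 1-5, so HI ranges 1-25."""
    return likelihood * severity

def risk_priority_number(likelihood, severity, recovery):
    """RPN = HI x Detection/Recovery rating, so RPN ranges 1-125."""
    return hazard_index(likelihood, severity) * recovery

# e.g., an error rated Likely (2), Critical (2), with Low recovery (2):
assert hazard_index(2, 2) == 4
assert risk_priority_number(2, 2, 2) == 8
```

Because numerically low ratings denote the most likely, most severe, and least recoverable errors, a numerically low HI or RPN indicates a high-risk error.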


5.12 Step 12 – Analyze Criticality

Criticality analysis is a general term used in the risk analysis domain. It sounds complex, but it is actually very simple. The idea is that each of the errors identified and rated by the analysis team must be categorized according to its overall potential to cause bad things to happen. That potential is, by definition, related to the Hazard Index and Risk Priority Number associated with each error.

The categorization is driven by comparing the value of the HI and RPN for a specific error-cause combination to pre-defined "breakpoints", where a breakpoint is defined as being equal to or greater than some value of the HI or RPN. While the HESRA spreadsheet calculates these breakpoints and assigns a criticality value as illustrated in Figure 10, the process for arriving at those values is discussed below.

5.12.1 HI Criticality

There are a number of ways of considering the various ratings assigned to specific errors. The Risk Priority Number (RPN) is a metric that takes into account all three error ratings, i.e., likelihood of occurrence, potential severity of effects, and recovery. However, it is also a good practice to consider each failure mode without regard for the likelihood of detection and recovery. Again, this is the Hazard Index, which is found by multiplying Likelihood of occurrence and potential Severity.

The Hazard Index is equivalent, at least in conceptual terms, to the "risk" assigned in the FAA SMS framework. The resulting categories are used to help the analysis team identify errors that should be dealt with immediately, subjected to further analysis, or addressed through other actions.

It is quite possible for an error to have a hazard index that signals serious hazard (a numerically low HI) but an RPN that signals relatively low risk (a numerically high RPN), because favorable detection and recovery ratings raise the RPN. If this is the case, we might want to examine the recovery opportunities very carefully, since failing to recover from such an error will have very serious consequences.

The potential range of the hazard index for any particular error is illustrated in Table 9.

Table 9. Criteria for Rank Ordering Hazard Index Rating

| HI Value | Category | Definition/Action | Notes |
|----------|----------|-------------------|-------|
| 20-25 | Extremely Low | No system or safety implications, even with a high recovery rating. | One rating must be "5"; the other rating can be "5" or "4". |
| 12-16 | Low | Unlikely to have system or safety implications, even with a high recovery rating. | Allows one rating of "3". Does not allow any rating to be either "2" or "1". |
| 8-10 | Moderate | Potentially significant system or safety implications. A moderate recovery rating is required. | Allows one rating of "2". Does not allow any rating to be "1". |
| 3-6 | High | Significant system or safety implications if not recovered. Outcome is highly dependent on recovery. | Allows one rating of "1". Max rating is either both ratings of "2", or one of "1" and the other "3", dependent on recovery. |
| 1-2 | Extremely High | Critical system or safety implications if not recovered. Outcome is highly dependent on recovery. | Each rating is either "1" or "2". |

In the FAA SMS framework, Table 9 is supplanted by a "risk acceptability matrix", which is reproduced below as Table 10. In this matrix, various combinations of likelihood and severity are assigned a color code denoting one of three levels of acceptability, denoted as "high", "medium", and "low." The full definitions for these risk acceptability levels are contained in the FAA SMS Manual; however, they are briefly described as follows:

• High – Unacceptable risk that must be reduced to "medium" or "low" before the change being contemplated is implemented.
• Medium – The minimum acceptable level of risk associated with a change. The change can be implemented, but must be tracked.
• Low – An acceptable level of risk that allows contemplated changes to be made without further monitoring.


Table 10. SMS Risk Acceptability Matrix. (The matrix crosses the five SMS severity columns – No Safety Effect 5, Minor 4, Major 3, Hazardous 2, Catastrophic 1 – with the five SMS likelihood rows – Frequent A, Probable B, Remote C, Extremely Remote D, Extremely Improbable E; each cell is color-coded as High Risk, Medium Risk, or Low Risk.)

5.12.2 RPN Criticality

Risk priority, which is closely associated with the concept of "criticality", is a somewhat arbitrary construct, in that artificial dividing lines have been established among the different risk indices to form RPN categories, as illustrated in Table 11. Defining these breakpoints is not a science. However, such categorization is a useful exercise in that it allows one to prioritize resources so they are directed at the most "serious" errors. The category breakpoints have been established on a purely arithmetic basis, as shown in Table 12.

The FAA SMS framework does not recognize the construct of RPN and assigns no risk criticality based on its value for any potential human error.

Table 11. RPN Risk Categories

| RPN | Category | Definition/Action |
|-----|----------|-------------------|
| N/A | Single Failure Condition | Criticality will become "High Risk" if a component fails or a software error occurs. |
| 90-125 | Extremely Low Risk | No system or safety implications. No further design or evaluation efforts required. |
| 60-89 | Low Risk | No significant system or safety implications. Unlikely that significant design, training, or procedural changes will be required. |
| 28-59 | Moderate Risk | Potentially significant system or safety implications. Possible that significant design, training, or procedural changes will be required. If the system is not yet deployed, the error mode should be further evaluated and then monitored during usability testing. |
| 9-27 | High Risk | Significant system or safety implications. Likely that significant design, training, or procedural changes will be required. The error mode should be further evaluated and specifically addressed with usability testing. |
| 1-8 | Extremely High Risk | Critical system or safety implications. If an existing system, then immediate remediation should take place. If the system is not yet deployed, significant design, training, or procedural changes are required before the system is deployed. Errors should be specifically addressed with usability testing (after a "fix" is made). |

Table 12. RPN Risk Category Breakpoints

| RPN | Category | Definition |
|-----|----------|------------|
| N/A | Single Failure Condition | Automatic detection and mitigation. No human intervention required. |
| 90-125 | Extremely Low Risk | All ratings are "5" or "4". |
| 60-89 | Low Risk | One rating can be a "3". Other ratings can be "5" or "4". |
| 28-59 | Moderate Risk | No rating of "1" is allowed. One rating of "2" is allowed. |
| 9-27 | High Risk | One rating of "1" allowed. Other ratings can be valid combinations of "2"-"5". |
| 1-8 | Extremely High Risk | All three ratings can be "2", "1", or any combination of "2" and "1". It is mathematically feasible for one rating to be a "3", in which case the other two ratings are ones, or one "1" and one "2". |

Risk Priority Number (RPN) criticality breakpoints are based on the maximum values of the ratings that make up the RPN.
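Since 5.12.4 notes that these categories are easiest to handle programmatically, here is a minimal sketch that maps an HI to the Table 9 categories and an RPN to the Table 11 categories; the function names are illustrative, and the "Single Failure Condition" row is treated as a separate flag:

```python
def hi_category(hi):
    """Map a Hazard Index (1-25) to the Table 9 categories."""
    if hi >= 20: return "Extremely Low"
    if hi >= 12: return "Low"
    if hi >= 8:  return "Moderate"
    if hi >= 3:  return "High"
    return "Extremely High"

def rpn_category(rpn, single_failure=False):
    """Map an RPN (1-125) to the Table 11 categories."""
    if single_failure:  # automatic detection/mitigation in place
        return "Single Failure Condition"
    if rpn >= 90: return "Extremely Low Risk"
    if rpn >= 60: return "Low Risk"
    if rpn >= 28: return "Moderate Risk"
    if rpn >= 9:  return "High Risk"
    return "Extremely High Risk"

assert hi_category(4) == "High" and rpn_category(8) == "Extremely High Risk"
```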


FAA <strong>Human</strong> <strong>Error</strong> <strong>Risk</strong> <strong>Analysis</strong> (<strong>HESRA</strong>) MethodAfter each sort, the analysis team should discuss those errors that rise to the highestcriticality levels. These are c<strong>and</strong>idates for immediate action to reduce their risk.Why do we evaluate errors according HI <strong>and</strong> RPN? By examining the highest criticalityerrors from each sort, instances in which detection <strong>and</strong> recovery play major roles inreducing overall risk should be fairly obvious. Team members should be inherentlyskeptical of the RPN for those errors that have the lowest hazard indexes.For errors that have high HI’s <strong>and</strong> low RPN’s, we are essentially saying “This error islikely to occur <strong>and</strong> its consequences are severe, but we’ll detect it <strong>and</strong> make sure thoseconsequences never occur.” This might be an accurate reflection of reality. However, itcan also be an instance of explaining away high-risk errors that are politically sensitive,don’t conform to current management or agency directives, or won’t be acknowledged forother reasons. Be careful about unsupported detection <strong>and</strong> recovery claims. In addition,even when there are low Detection <strong>and</strong> Recovery ratings, an error with a low HI shouldbe considered for remediation due to its potential consequences.5.12.4 Compare Levels of HI <strong>and</strong> RPN to Criticality BreakpointsAs noted in 3.12.1, the criticality categories are essentially defined by numericalbreakpoints. The Hazard Index can take on values from 1 to 25 (both the likelihood <strong>and</strong>severity scales are from 1 to 5). The <strong>Risk</strong> Priority Number can take on values from 1 to125. Therefore, each of the criticality categories must be defined in terms of the lowest<strong>and</strong> highest values of HI <strong>and</strong> RPN that would place the error risk in that category.In reality, it is easiest to program these categories into the <strong>HESRA</strong> spreadsheet <strong>and</strong> thenautomatically assign each error to the appropriate HI <strong>and</strong> RPN criticality category. If thisis done, then the sorting described above can be done using these criticality categories.5.12.5 Determine Action Requirements for Each <strong>Error</strong>Determining action requirements for each error is really a pre-determined step. Note inthe introduction to 3.12, that each criticality category has, as part of its definition, ageneral action assignment. Keep in mind we are not determining exactly what has to bedone. That is the next step in the process (3.13). At this stage, we are simply decidingwhether anything has to be done <strong>and</strong>, if so, how quickly must it be completed.For example, suppose the risk for a particular error falls into the “Extremely High <strong>Risk</strong>”criticality category. Based on the definition for that category, we know that somethinghas to be done <strong>and</strong> that it has to be done quickly. On the other h<strong>and</strong>, if the risk falls intothe “Extremely Low <strong>Risk</strong>” category, we know that no action may be necessary.5.13 Step 13 – Reduce <strong>Risk</strong>The output of the <strong>HESRA</strong> analysis is a list of procedural steps <strong>and</strong> human errors orderedaccording to their risk - higher risks earlier in the list. 
5.13 Step 13 – Reduce Risk
The output of the HESRA analysis is a list of procedural steps and human errors ordered according to their risk, with higher risks earlier in the list. We developed the following strategy for generating risk reduction ideas: the HESRA team must develop at least one remediation strategy for each high-risk error. In effect, the analysis process up to this point has identified the elements of high risk in the procedures that have been analyzed. The next logical question is, "What do we do about it?" These deliberations should be guided by a few ground rules:


• Do not attempt to reduce risk by doing things automatically. In the analytical phase, we considered all automatic mitigations. In this phase, let's fix things the old-fashioned way – by actually fixing them.

• The risk priority number has to go UP as a result of the risk reduction ideas. It is not sufficient to increase the likelihood rating while decreasing the recovery rating, thus having no net effect on the risk priority number. For each idea we come up with, we have to go back and re-rate the likelihood and recovery. If we can't elevate the product of these numbers, then the idea isn't really reducing the risk.

• When attempting to increase the likelihood or recovery rating, be sure to address the lower rating first. For example, if the likelihood rating is a "4" and the recovery rating is a "1", look at ways of increasing the recovery score before addressing the likelihood score.

• When developing risk reduction ideas, use the principle of "as high as is reasonably achievable" – an inversion of the familiar ALARA rule, appropriate here because higher ratings mean lower risk. Don't presuppose that raising the recovery score from a "1" to a "2" will be good enough. Try to increase the score to a "5" and see what happens.

• Every idea has to be reasonable from a technological and policy perspective. It does us no good to come up with ideas that everyone acknowledges will simply not get done within the existing technology and policy framework of the FAA. Money is less of a concern, since the team probably shouldn't be worrying about paying for the risk reduction.

• Even if most (or all) of the identified error modes have low risk ratings, the team should attempt to determine whether the cumulative effect of many small problems reduces the safety of the system.

The HESRA team members should bring to the risk reduction discussion their individual perspectives and expertise, along with what they know to be feasible and possible. It is at this point in the process that technical, management, and policy "fixes" can be reasonably considered. Fixes that are not possible, for whatever reason, should be removed from further consideration. It does no good to propose a risk reduction strategy that stands almost no chance of being implemented because of cost, policy, technical, or other embedded issues.

The analysis team does not have to completely develop the remediation strategies, only suggest alternatives that will raise the scale ratings that caused the low HI or RPN. The details of each remediation strategy should be worked out apart from the risk analysis.

5.13.3 Assign Ratings Assuming Remediation
The purpose of suggesting remediation strategies is to raise the risk indexes that placed the error in a high criticality category. A reasonable question, then, is "How much will this solution reduce the risk?" In order to assess the magnitude of the risk reduction, the analysis team should assume that the suggested remediation is developed and implemented correctly. They should then assign provisional ratings on each of the three risk scales for that remediation using the spreadsheet.
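The "RPN must go up" ground rule above, together with the provisional re-rating described in 5.13.3, amounts to a simple before/after comparison. Continuing the earlier illustrative sketch (the helper names and sample ratings are assumptions, not part of the method):

    def rpn(r: dict) -> int:
        """RPN = likelihood x severity x recovery (each 1-5; higher = safer)."""
        return r["likelihood"] * r["severity"] * r["recovery"]

    def remediation_helps(before: dict, after: dict) -> bool:
        """True only if the provisional (post-fix) ratings raise the RPN.

        A 'fix' that raises one rating while lowering another, leaving the
        product unchanged, is not a real risk reduction.
        """
        return rpn(after) > rpn(before)

    original    = {"likelihood": 2, "severity": 1, "recovery": 4}  # RPN = 8
    provisional = {"likelihood": 4, "severity": 1, "recovery": 4}  # RPN = 16
    assert remediation_helps(original, provisional)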


5.13.4 Assess Impact on HI and RPN
Once the provisional risk scale ratings are assigned for a suggested remediation, it will then be feasible to calculate new HI and RPN values. To state the obvious, the HI and RPN should be higher for the remediation than for the original error. It is up to the analysis team to determine whether the magnitude of risk reduction is sufficient or whether additional remediation is necessary.

At this point, assigning remediation risk ratings is a conceptual exercise. That is, the analysis team is assigning risk ratings as if the remediation is implemented, implemented correctly, and so on. However conceptual the exercise might be, it forces the analysis team to consider and discuss remediation options with an eye toward actually reducing risk.

5.13.5 Iterate Remediation if Risk Is Not Sufficiently Reduced
Remediation typically has the goal, perhaps unstated, of moving the overall risk of an error down to the lowest two criticality categories, i.e., "Low" or "Extremely Low". At the very least, remediation should move the error down one level of criticality. For example, if the original risk analysis places the error mode into the "Extremely High Risk" category, the remediation should aim to at least move it out of that category. When risk scale values are assigned to the remediation and the risk indexes are re-calculated, it will be apparent whether this goal has been met.

The overall risk can be reduced even though the error still falls into the same criticality category as before the remediation. Since criticality is assessed only categorically, such a reduction is not likely to be acceptable. Therefore, another form of remediation should be identified and the process repeated.

It is entirely possible that the risk analysis team will be unable to identify a remediation that sufficiently lowers the criticality of an error. How could this happen? It might be that the severity of an error's effects is so pronounced that modest improvements in the likelihood of occurrence or in detection and recovery don't lower the overall criticality enough. Severity of effects is not typically amenable to easy remediation.

The effect of losing approach radar, for example, is whatever that effect might be. The analysis team isn't going to be able to change that effect without, perhaps, adding a redundant radar system, which is not something they're likely to be able to do.

If sufficient remediation cannot be identified, then the error should be flagged for further analysis. The fact that the analysis team, with all its expertise, cannot think of effective remediation is an indication that the procedure and/or the system itself might need some serious re-design.
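The "move down at least one criticality level" test is also easy to automate. A sketch under the same assumptions as the earlier fragments, reusing the hypothetical rpn_category() helper defined above:

    # Categories ordered from worst to best.
    LEVELS = ["Extremely High Risk", "High Risk", "Moderate Risk",
              "Low Risk", "Extremely Low Risk"]

    def remediation_sufficient(old_rpn: int, new_rpn: int) -> bool:
        """True if the remediated error lands at least one criticality
        level lower (i.e., safer) than the original error."""
        return (LEVELS.index(rpn_category(new_rpn))
                > LEVELS.index(rpn_category(old_rpn)))

    # Example: an RPN of 8 ("Extremely High Risk") improved to 16 ("High
    # Risk") clears the bar. An improvement from 10 to 20 (both "High
    # Risk") does not, and another remediation should be sought.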


5.14 Step 14 – Produce Risk Analysis Report
The product of the HESRA analysis is a report that encapsulates the risk analysis process as applied to the particular procedure or system. The components of the report are described below.

5.14.1 Overview of System and Procedures Analyzed
The report should briefly describe the target of the analysis effort. This might be a particular facility, a system that is being developed, a series of maintenance or operational procedures, etc. It should also indicate the reason(s) for conducting the analysis. For example, the analysis might have been prompted by a particular event, by a requirement in the development or procurement process, or by a pending change to a system or facility.

5.14.2 General Statement of Findings
In general, what did the risk analysis reveal? This part of the report should not provide the details of the analysis. Rather, it should offer a prose statement describing, in general terms, the outcome of the analysis. For example, the team might report that they found the procedure to be generally free of high-risk elements, with the exception of a few tasks for which certain high-probability errors could have severe consequences. The report should also state whether these high-risk elements can be easily "fixed."

5.14.3 Overall Recommendations Related to System and Procedures
It is a fact of life that very few people are going to read a HESRA report from cover to cover. Therefore, it is important for the analysis team to provide a concise description of its recommendations regarding the procedure(s), facility, or system it has analyzed. This can be done with a bullet-point list of recommendations. The idea is to convey to the reader the steps the team feels need to be taken to reduce whatever risks were found to be too high.

5.14.4 Explicit Listing of "High Priority" Errors
Inevitably, some errors will float to the top of the criticality hierarchy. These are the high-risk elements that HESRA is designed to find and, hopefully, eliminate. The HESRA report should explicitly list and describe the errors that the analysis team considers high priority. If there are characteristics of these errors that are counterintuitive, then the report should explain why and how the ratings were assigned. The readers of the report will not be privy to the deliberations of the analysis team, so it is perfectly reasonable to spend some ink explaining how the risk associated with these errors came to be rated so critically.

5.14.5 Explicit Listing and Description of Proposed Remedy
This section of the report should be interwoven with the information contained in the previous section. That is, for each high-priority error mode, the proposed remedy should be listed and described. As noted in the body of this document, the analysis team will not necessarily know the details of each proposed fix. For example, a perfectly valid recommendation is to add a coding dimension for a procedural step that currently uses only color coding.

The level of detail for this information should be sufficient for ATO management to assign responsibility for developing and implementing the remedy.
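The listings called for in 5.14.4 and 5.14.5 can be pulled straight from the analysis records. A final illustrative sketch, again assuming the toy data layout used throughout (the "remedy" field is hypothetical):

    def high_priority(errors):
        """Rows whose RPN category is High or Extremely High Risk."""
        return [e for e in errors
                if rpn_category(e["RPN"]) in ("High Risk",
                                              "Extremely High Risk")]

    # Emit one bullet line per high-priority error, paired with its
    # proposed remedy (or a placeholder if none has been suggested yet).
    for e in high_priority(errors):
        print(f"- Step {e['step']}: {e['error']} "
              f"(HI={e['HI']}, RPN={e['RPN']}) -> {e.get('remedy', 'remedy TBD')}")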
5.14.6 Link or Provide Access to HESRA Analysis Spreadsheet (or Whatever Software Tool Is Used to Support the Analysis)
The risk analysis is embodied in the HESRA spreadsheet. All the work of the risk analysis team, including the task breakdown, errors, ratings, criticality calculations, and notes, is contained in the spreadsheet. HESRA spreadsheets tend to be fairly large.


Therefore, they are not amenable to simply being printed out and included in a paper report.

The best way to convey the HESRA spreadsheet information is to save a read-only version of the file in an accessible location and include a link to it in the HESRA report. Since the report will likely be distributed in soft form (as well as in hard copy), the link will allow any interested person to see the actual, detailed information the spreadsheet contains.

5.14.7 Statement of Concurrence of Analysis Team
The HESRA report should contain a section in which team members state their reasons for concurrence with the general report findings. This might seem a bit odd, as in, "What part of yes don't you understand?" However, various team members might want to clarify why they agree with the report findings. Their reasons might not be obvious or intuitive.

5.14.8 Statement(s) of Exceptions from Analysis Team Members
The findings of the HESRA team are adopted by consensus, not unanimity. Therefore, some team members might want to document their views on, and objections to, certain findings. It is valuable to document these exceptions, since dissenting team members can raise quite valid issues that shed light on the overall report findings. This is the appropriate place for team members to state their concerns and objections to individual risk ratings, to the selection or elimination from consideration of particular risk reduction strategies, or to any other aspect of the HESRA process.

5.15 Step 15 – Assign Remediation Actions
This step is not within the purview of the risk analysis team; it should be done by ATO management with the advice of the appropriate risk analysis team members. It is included in this method description for the sake of completeness. Remediation should be assigned to specific people or organizations. Without such assignment, it is unlikely that the remediation will be undertaken and completed in a timely manner.

5.16 Step 16 – Monitor Remediation to Ensure Actions Are Completed
As with the previous step, monitoring remediation is not within the scope of the analysis team's work. However, it is critically important that the work of implementing the team's recommendations be monitored until it is complete. This step is the responsibility of appropriate ATO management.


Appendix A – Definition of Terms

Brief – A period of time on the order of minutes, up to one hour.

Delays – Any incremental time added to scheduled departure or arrival times due to degraded ATC facilities operation.

Extended – A period of time on the order of hours, or longer.

Safety Margin – The buffer between minimal local ATC safety, i.e., the ability to maintain positive A/C control, and the current level of safety.

Maintainer Safety – The ability of System Specialists to perform their job tasks without a significant risk of injury.

Maintainer Workload – The current requirement for physical, perceptual, and cognitive capacity to perform job tasks related to maintaining and/or restoring ATC functions and services.

Function – The ability of hardware, software, communication channels, etc., to support ATC tasks.

Service – The ability of the ATC facilities to provide a specific functional capability to A/C and ATC. Examples include radar, ILS, A/G comm., etc.


Appendix B – Schematic of the HESRA Matrix

[Schematic of the HESRA matrix: graphic not reproduced in this version.]


