Because they interact with the policy process, initiatives focused on context share many things in common with advocacy and policy change efforts in general. Both may include similar strategies such as coalition building, communication campaigns, grassroots organizing, and media advocacy. And both are "hard to measure" because they evolve over time and their activities and outcomes can shift quickly.29

These similarities in strategy and purpose allow systems initiatives focused on context to draw from the now growing body of work on advocacy and policy change evaluation to identify useful evaluation approaches.30 For example, evaluation methods that are applicable to both advocacy efforts and systems initiatives may include public polling, media tracking, policy tracking, policymaker or bellwether interviews, or intense-period debriefs.31

Examples

The Build Initiative evaluation uses a theory of change approach. The initiative has three theories of change that focus on 1) what an early childhood system must include to produce results; 2) the actions or strategies needed to build an early childhood system; and 3) what special role outside Build supports can provide to catalyze change.32 Since Build began in 2002, the evaluation has focused on the second theory of change. In recent years, the evaluation has attempted to plausibly trace state-level policy and funding changes back to Build Initiative activities.

The Child and Family Policy Center leads the evaluation and partners with local evaluators to collect data in the Build states. Local evaluation partners document theory of change components using a case study approach that employs the methods they determine are most relevant in their state, such as interviews, surveys, document review, and participant observation.
Evaluators use data from these methods to critically examine the state's progress around the Build theory of change and to determine whether system-related results can be linked back to Build Initiative activities. The Child and Family Policy Center then produces an annual report on overall initiative progress that includes a cross-state analysis of results.

Another example comes from the Urban Health Initiative evaluation conducted by New York University's Center for Health and Public Service Research. In a unique design choice, evaluators integrated a theory of change approach with a quasi-experimental comparison group design. Evaluators identified 10 non-initiative cities to compare with initiative cities on outcome and impact measures, including leadership, collaboration, and the use of data. Like other theory of change evaluations, evaluators compared program theory and experience; but they believed they could strengthen their approach by integrating a comparison group into the design to rule out alternative explanations for evaluation findings.33

29 Harvard Family Research Project (2007). Advocacy and Policy Change. The Evaluation Exchange, 13(1). Cambridge, MA: Author.
30 Ibid.
31 For a description of the bellwether methodology see Blair, E. (2007). Evaluating an issue's position on the policy agenda: The bellwether methodology. The Evaluation Exchange, 13(1), 29. For a description of the intense-period debrief see Bagnell Stuart, J. (2007). Necessity leads to innovative evaluation approach and practice. The Evaluation Exchange, 13(1), 10-11.
32 Bruner, C. (2004). Toward a theory of change for the Build Initiative: A discussion paper. Retrieved on June 27, 2007 from http://www.buildinitiative.org/docs/TowardaTheoryofChange.doc.
33 Weitzman, B.C., Silver, D., & Dillman, K. (2002). Integrating a comparison group design into a theory of change evaluation: The case of the Urban Health Initiative. American Journal of Evaluation, 23(4), 371-385.

A Framework for Evaluating Systems Initiatives
2 Evaluating Components

Initiatives that concentrate on a system's components attempt to improve the system by shoring up its individual subsystems, programs, or interventions. For example, these initiatives may pilot new programs, expand access to particular programs or services, or introduce quality improvement initiatives.

Evaluation Questions

Evaluations of initiatives focused on components share much in common with traditional program evaluations, as both assess individual programs or interventions. Also like program evaluations, initiative evaluations in this area address questions in two main areas: program implementation and program impacts. Key questions include:

1) Did the initiative design and implement system components as intended?
2) Did the components produce their intended impacts for beneficiaries?

Evaluation Methodologies

Again, because the focus is on individual programs or interventions, evaluations here can borrow from traditional program evaluation approaches, which feature the systematic application of social science research designs and methods to assess the implementation and effectiveness of programs or interventions.

Evaluations that examine implementation use some form of program monitoring or process evaluation. Program monitoring addresses questions about 1) the extent to which the program is reaching its target population, 2) whether program delivery matches design expectations, and 3) what resources have been used to deliver the program.34 Program monitoring often goes hand in hand with impact assessments (see below), as monitoring addresses questions about why a program was or was not effective.
A wide array of both quantitative and qualitative methods can be used for program monitoring, such as observations, participant surveys or focus groups, staff member interviews, and document or record reviews.

Evaluations that examine questions about program impacts may use experimental or quasi-experimental designs that employ a range of possible quantitative or qualitative methods, although quantitative data generally prevail in impact assessments. These designs assign (randomly or non-randomly) individuals to participant and non-participant groups and then compare those groups using, for example, repeated measurements. Experimental or quasi-experimental designs generally provide the most definitive attributions of causality and remain the program evaluation "gold standard." They are expensive to construct and implement, however, and random assignment may not be appropriate for programs that feature enrollment inclusivity and openness, because restricting who participates would violate the program's design.

34 Rossi, P., & Freeman, H. (1993). Evaluation: A Systematic Approach. Newbury Park, CA: Sage Publications.