BLOK ET AL. / EFFECTIVENESS OF EARLY CHILDHOOD EDUCATION

mental conditions and one control condition. Two experimental comparisons were within the domain of our research question. One was that between the CC condition (control condition, i.e., no preschool, no follow-up programme) and the EE condition, in which students received preschool education for the first 5 years of their life, followed by a supplementary Educational Support Program from kindergarten through 2nd grade. The second comparison we extracted from the available data was that between the CC condition and the EC condition, in which students received only the preschool programme and not the follow-up programme. The data we used came from Ramey, Campbell, Burchinal, Skinner, Gardner, and Ramey (2000), who presented IQ data from 5 to 15 years old, and school performance data from 8 to 15 years old.

The results of the Chicago Child-Parent Center and Expansion Program (CPC&EP) have also been widely represented in the research literature. We selected the outcomes reported in Reynolds (1994) as our reference data, because the intervention groups presented there were the most relevant to our research question. We extracted one experimental comparison from these data, namely the full intervention group with follow-on (denoted as PS + KG + PG-3 by Reynolds) contrasted with the non-CPC comparison group.

We found two studies on the effectiveness of a supplementary emergent literacy curriculum compared to a standard Head Start programme (Whitehurst, Epstein, Angell, Payne, Crone, & Fishel, 1994; Whitehurst, Zevenbergen, Crone, Schultz, & Velting, 1999).
The second study is a replication of the first, and includes a follow-up of both the original cohort and the replication cohort. Unfortunately, the outcomes of the two studies were not reported independently, as the second article (Whitehurst et al., 1999) combined the results of both cohorts. This left us no choice but to use the outcomes of the second article only, as it concerned the biggest sample and provided follow-up results. The two Whitehurst et al. studies therefore resulted in one experimental comparison, namely Head Start with an emergent literacy add-on contrasted with a Head Start-only condition.

After all decisions had been made, 34 experimental comparisons remained.

Coding of variables

The experimental comparisons in the database were coded for several characteristics (see Table 1). Variables 1–3 concern design characteristics, variables 4–11 concern sample characteristics, and variables 12–17 concern characteristics of the experimental intervention. Because most experimental comparisons resulted in multiple outcomes, the other variables (variables 18–25) were coded at the level of effect sizes.

Table 1
Coding scheme for the experimental comparisons, and reliability of coding

Variable | Scale | Inter-coder reliability (a)
1. Subject assignment | 0 = strictly controlled (randomisation or matching at subject level); 1 = no strict control (randomisation or matching at group level, post hoc comparison, or no control at all) | 87
2. Treatment fidelity | 0 = high in most respects; 1 = unknown | 100
3. Intervention in control group | 0 = standard programme, not under control of experimenter; 1 = unknown programme or no programme at all | 92
4. Nation | 0 = USA; 1 = other than USA | 100
5. Recency of programme (year implementation started) | Numerical (minus 1900) | .93 (b)
6. Size of experimental group | Number of students | 1.00 (b)
7. Size of control group | Number of students | .99 (b)
8. Mean age of students at onset of study | Number of months (before birth coded as 0) | .96 (b)
9. Percentage of students from ethnic minorities | Percentage | .96 (b)
10. Level of education of parents | 1 = low; 2 = mixed; 9 = unknown | 87
11. Level of income of parents | 1 = low; 2 = mixed; 9 = unknown | 93
12. Delivery mode | 1 = home-based; 2 = centre-based; 3 = combination of home- and centre-based | 96
13. Length of programme | Number of months (a year equals 10 months, unless otherwise indicated by experimenter) | .99 (b)
14. Intensity of programme | Number of hours per week | .91 (b)
15. Continuation after K | 0 = no; 1 = yes | 100
16. Inclusion of social or economic support | 0 = no; 1 = yes | 85
17. Inclusion of coaching of parenting skills | 0 = no; 1 = yes | 85
18. Effect size at pretest | Numerical | 1.00 (b)
19. Standard error of pretest effect size | Numerical | .94 (b)
20. Domain of the posttest | 0 = cognition; 1 = socioemotional development | 93
21. Time of measurement of posttest | Number of months after intervention ended, coded on a time scale of years | 1.00 (b)
22. Type of posttest score | 0 = observed score; 1 = gain score or score adjusted for covariates | 94
23. Type of posttest effect size | 0 = derived by reviewers; 1 = reported by experimenters | 100
24. Effect size at posttest | Numerical | 1.00 (b)
25. Standard error of posttest effect size | Numerical | .94 (b)

(a) Percentage of classifications agreed upon by the two coders, unless otherwise indicated.
(b) Product–moment correlation between the codes of the two coders.
INTERNATIONAL JOURNAL OF BEHAVIORAL DEVELOPMENT, 2005, 29 (1), 35–47

Hedges' unbiased estimate d was used as the effect size estimate (variables 18 and 24). This statistic uses the within-group standard deviation as the method of standardisation, and includes a correction factor to obviate the bias resulting from small samples. The standard error of the effect size (variables 19 and 25) was estimated following Hedges and Olkin (1985, p. 86, Eq. 15). Whenever possible, we used observed scores to calculate effect sizes. Several experimenters, however, reported only gain scores or scores adjusted for covariates, as indicated by variable 22 (type of posttest score).

Some reported outcomes were inherently negative, for instance when behaviour ratings referred to negative behaviour. In these cases, outcomes were recoded simply by changing the sign. This correction procedure was applied to the studies by Goodson et al. (2000), Johnson and Walker (1987), Scarr and McCartney (1988), and Seitz, Rosenbaum, and Apfel (1985).

Two independent coders coded all the studies. Inter-coder reliability was estimated as the rate of agreement in the case of a nominal scale, or as the product–moment correlation in the case of an interval scale. The results are reported in the last column of Table 1. Reliability proved satisfactory, ranging between 85 and 100% for the nominal variables, and between .91 and 1.00 for the interval variables.
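The effect-size computation described earlier in this section can be sketched as follows. This is our own illustration, not the authors' code: the bias-correction factor 1 − 3/(4·df − 1) is the standard approximation to Hedges' correction, and the standard-error formula is the usual large-sample approximation consistent with Hedges and Olkin (1985):

```python
import math

def hedges_d(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Standardised mean difference using the pooled within-group SD,
    with the small-sample bias correction (Hedges' unbiased estimate)."""
    df = n_e + n_c - 2
    s_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / df)
    g = (mean_e - mean_c) / s_pooled      # uncorrected standardised difference
    j = 1 - 3 / (4 * df - 1)              # small-sample correction factor
    return j * g

def se_d(d, n_e, n_c):
    """Approximate standard error of the effect size d."""
    n = n_e + n_c
    return math.sqrt(n / (n_e * n_c) + d**2 / (2 * n))
```

With 20 students per condition, a 2-point difference, and a within-group SD of 4, this yields d of about 0.49 with a standard error of about 0.32, illustrating the limited precision of the small samples discussed below.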
In the case of divergent codes, final codes were established by mutual agreement and used in the subsequent analyses.

Many study designs either did not incorporate a pretest or did not report sufficient statistics to estimate an effect size capturing the initial differences between conditions at the pretest. We were able to determine pretest effect sizes for only 40% of our cases. To prevent an excessive loss of data, we decided to impute a value of zero for missing pretest effect sizes. This value is close to the mean found for the cases in which a pretest effect size could be estimated (mean 0.06, with a corresponding standard error of 0.04).

Integration of effects

The coding phase resulted in a file containing 207 different outcomes (171 in the cognitive domain, 36 in the socioemotional domain) from the 34 experimental comparisons. We analysed the data in two steps. First, we aggregated effect sizes to the level of the experimental comparisons. This aggregation was performed separately for each domain and each time of measurement, varying from 0 to 180 months after the intervention ended. It was conducted by weighted integration, in which the results were weighted in inverse proportion to their standard error (i.e., the greater the standard error, the smaller the weight). The aggregated effect sizes and the corresponding standard errors were estimated following Hedges and Olkin (1985, p. 112, Eqs. 8 and 9). This aggregation model assumes the results within one study to be homogeneous, differing from each other only on the basis of random differences between the outcome variables.
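The weighted integration described above can be sketched as follows. We use inverse-variance weights, the usual reading of the Hedges and Olkin fixed-effect aggregation, so that outcomes with larger standard errors receive smaller weights:

```python
import math

def aggregate(effect_sizes, standard_errors):
    """Inverse-variance weighted aggregation of several outcomes
    into one pooled effect size and its standard error."""
    weights = [1.0 / se**2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effect_sizes)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se
```

For two equally precise outcomes of 0.5 and 0.3 (each with SE 0.1), this returns a pooled effect size of 0.4 with an SE below 0.1, smaller than either constituent SE.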
The standard errors of aggregated effect sizes are generally smaller than the standard errors of the constituent effect sizes, which seems a fair reward for using more than one outcome measure. All aggregation calculations were done with the Meta program (Schwarzer, 1989). This first step resulted in 85 different outcomes (71 in the cognitive domain, 14 in the socioemotional domain).

As a second step, the outcomes were integrated into an overall effect size, separately for each domain. The integration was performed according to the random effects model (Hedges & Olkin, 1985). The model we specified acknowledges the hierarchical and longitudinal nature of our data. It splits the effect size d_ijt for experimental comparison j from study i at moment t into two components, namely a true effect size δ_ijt and an error component e_ijt. The true effect sizes are assumed to vary across measurement moments t, comparisons j, and studies i. The variance of δ_ijt is explained by the regression model:

δ_ijt = γ_0 + Σ_n γ_n Z_nijt + u_ijt + v_ij + w_i    (1)

where γ_0 is the grand mean, the Z_nijt are characteristics of the studies, comparisons, and measurement moments (n being the index referring to the characteristics), and u_ijt, v_ij, and w_i are residual error terms at the three levels distinguished. The model makes it possible to distinguish between three variance components, viz.
σ²_u (the variance between measurement moments t), σ²_v (the variance between experimental comparisons j), and σ²_w (the variance between studies i). The model also makes it possible to test whether any of these variance components differs significantly from zero, using the test statistic Q. If study outcomes are heterogeneous, it is worthwhile trying to relate the heterogeneity to the various characteristics Z_nijt. If not, the study outcomes are homogeneous and no explanatory variables need to be introduced in equation (1). The specification and testing of models was carried out with MLwiN, using restricted maximum likelihood estimation (Goldstein et al., 1998; Hox, 2002). Analyses were performed separately for the two domains (cognition and socioemotional development).

Results

Description of the studies in the database

This subsection briefly describes the studies in our database, which yielded 34 different comparisons (Table 2). Assignment of the subjects to the different conditions of a comparison proceeded according to strict guidelines (at random, by matching, or by blocking) in only 16 cases. In the other cases, less strict procedures were followed (e.g., random assignment or matching of intact groups), or assignment was not under the control of the investigator.

Treatment fidelity was reported to be high in all or most respects for 11 of the 34 comparisons. For the other comparisons, no information could be found. However, this does not necessarily mean that the treatment was jeopardised. We found the same lack of information with respect to the control condition.
Students in the control condition mostly followed a "standard programme".

The sample size was generally small, averaging 77 for both the experimental and the control conditions. This average excludes the outlying large sample of the study by Goodson et al. (2000), which featured about 1600 children in both conditions. The experimental group contained more than 100 students in only 8 of the 34 comparisons. Evidently, such small sample sizes imply generally low power to detect a difference in outcomes. Most students belonged to an ethnic minority group (average: 81%, taking experimental and control groups together). Student age at the start of the intervention programme showed considerable variation, ranging from pre-birth to 64 months (average 37 months). Both the socioeconomic status and the income of the parents were