Is headspace making a difference to young people’s lives?
Evaluation-of-headspace-program
Appendix C<br />
Data Cleaning and Analysis<br />
This section describes the challenges that were encountered in the process of survey data cleaning.<br />
While quality problems were present within single surveys, the need to integrate multiple surveys with<br />
different formats added to the complexity of the process of data cleaning. The task of data cleaning<br />
involved detecting and removing errors and inconsistencies from the data in order to improve its quality<br />
and minimise the impact of such errors on the analyses. Given the large scale and the complexities of the data<br />
cleaning exercise, a number of quality assurance strategies were put in place to minimise the scope<br />
for error. These included code walk-throughs to detect errors, continuous<br />
checking of data outputs and spot checks of individual records.<br />
The following main problems were encountered as part of the data cleaning process:<br />
Definition of key variables<br />
Defining variables central to the evaluation involved a number of challenges associated with design<br />
problems in the surveys. First, information on some of the key variables was incomplete (e.g. only<br />
the study status of those at school could be captured in the intervention group and 18-25 year old<br />
comparison group surveys). As a result, such variables could not be meaningfully utilised in the<br />
analysis. Second, some survey questions were included in one of the two waves only. For arguably<br />
time-invariant variables, such as postcodes, information from one wave was, where possible, carried<br />
over to the next one. However, time-varying variables could not be utilised in such instances (e.g.<br />
questions on self-harm and suicidal intentions/attempts in YMM were asked in wave 1 only). Third,<br />
information on some variables important for the analysis was not included in the surveys and had to<br />
be merged from external sources. These included the remoteness and socio-economic<br />
status of respondents’ residential areas, which were assigned from external sources based on the<br />
reported postcodes.<br />
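The carry-over and external-merge steps described above can be sketched as follows. This is a minimal illustration only, not the evaluation team's code: the record layout, the field names (`postcode`, `remoteness`, `ses_decile`), the helper functions and all values are hypothetical.

```python
# Hypothetical wave records keyed by person ID; all values are made up.
wave1 = {"p01": {"postcode": "1111"}, "p02": {"postcode": "2222"}}
wave2 = {
    "p01": {"postcode": None},    # postcode not asked or missing in wave 2
    "p02": {"postcode": "2222"},
    "p03": {"postcode": None},    # no wave 1 record to carry over from
}

# Hypothetical external lookup: postcode -> (remoteness category, SES decile).
area_lookup = {"1111": ("Major city", 9), "2222": ("Remote", 4)}


def carry_over(earlier, later, field):
    """Fill a missing time-invariant field in `later` from `earlier`."""
    filled = {}
    for pid, rec in later.items():
        new_rec = dict(rec)
        if new_rec.get(field) is None and pid in earlier:
            new_rec[field] = earlier[pid].get(field)
        filled[pid] = new_rec
    return filled


def attach_area_info(records, lookup):
    """Attach area-level variables based on the reported postcode."""
    out = {}
    for pid, rec in records.items():
        remoteness, ses = lookup.get(rec.get("postcode"), (None, None))
        out[pid] = {**rec, "remoteness": remoteness, "ses_decile": ses}
    return out


wave2_clean = attach_area_info(carry_over(wave1, wave2, "postcode"), area_lookup)
```

Records with no postcode in either wave (here `p03`) keep missing area-level values rather than being imputed.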
Representativeness of surveys<br />
The representativeness of surveys is essential for generalising the results of the analysis. Survey<br />
weights, if included in a survey, are a commonly used means of achieving representativeness. While<br />
the intervention group survey did not include weights, our comparisons across a range of observable<br />
characteristics between the survey respondents and all headspace clients (as captured by the<br />
hCSA dataset) confirmed that it can be used to make inferences about the headspace population as<br />
a whole. Survey weights were provided with both comparison group surveys. YMM survey weights,<br />
when applied, led to results supporting the representativeness of its participants for the general<br />
population of 12-17 year olds as captured in the 2011 Census data. No such outcome was<br />
achieved for the comparison survey of 18-25 year olds (one potential problem is the limited set of<br />
variables (age, state and gender) used as benchmarks from which to construct the weights). The results<br />
therefore need to be interpreted with this issue taken into account.<br />
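A check of this kind can be sketched as a comparison between weighted sample shares and benchmark shares for a given characteristic. The example below is a hedged illustration: the function name, the sample records, the weights and the benchmark proportions are all hypothetical and are not taken from the surveys or the Census.

```python
from collections import defaultdict


def weighted_shares(records, key, weight="weight"):
    """Return the weighted share of each category of `key`."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec[weight]
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


# Hypothetical sample: females are over-represented (3 of 5 respondents),
# so they carry smaller weights than males.
sample = [
    {"gender": "F", "weight": 0.75},
    {"gender": "F", "weight": 0.75},
    {"gender": "F", "weight": 1.0},
    {"gender": "M", "weight": 1.25},
    {"gender": "M", "weight": 1.25},
]

# Hypothetical population benchmark shares (not actual Census figures).
benchmark = {"F": 0.5, "M": 0.5}

shares = weighted_shares(sample, "gender")
gaps = {category: shares.get(category, 0.0) - benchmark[category]
        for category in benchmark}
```

Gaps close to zero on the variables used to build the weights are expected. The more informative check repeats the comparison on characteristics that were not used as benchmarks, since weights built from only a few variables need not align the sample on anything else.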
Alignment of surveys<br />
The process of arriving at a single dataset based on multiple surveys involved a number of<br />
complexities. The merging of eight surveys required attempts to resolve inconsistencies in<br />
data representations, units, measurement periods, etc. Additionally, correctly identifying individuals<br />
across the two waves was not a simple task, because some inconsistencies in identifiers had to<br />
be resolved through alternative approaches, such as matching based on a number of observable<br />
characteristics of individuals. The number of observations and variables included in the final merged<br />
dataset had to be compromised in some cases because the evaluation team was unable to satisfactorily<br />
deal with some of these issues.<br />
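One way to picture the fallback matching is a two-pass linkage: first on identifiers, then on a combination of observable characteristics, accepting a fallback match only when it is unique. The sketch below is an assumption about how such matching could work, not a description of the procedure actually used; the function `link_waves`, the matching fields and the records are hypothetical.

```python
def link_waves(wave1, wave2, keys=("birth_date", "gender", "postcode")):
    """Map wave 2 IDs to wave 1 IDs: exact ID first, then unique fallback."""
    wave1_ids = {rec["id"] for rec in wave1}

    # Index wave 1 records by their combination of observable characteristics.
    by_traits = {}
    for rec in wave1:
        by_traits.setdefault(tuple(rec[k] for k in keys), []).append(rec["id"])

    links, used = {}, set()

    # Pass 1: identifiers that agree across waves.
    for rec in wave2:
        if rec["id"] in wave1_ids:
            links[rec["id"]] = rec["id"]
            used.add(rec["id"])

    # Pass 2: fall back to characteristics; ambiguous candidates stay unlinked.
    for rec in wave2:
        if rec["id"] in links:
            continue
        traits = tuple(rec[k] for k in keys)
        candidates = [c for c in by_traits.get(traits, []) if c not in used]
        if len(candidates) == 1:
            links[rec["id"]] = candidates[0]
            used.add(candidates[0])
    return links


# Hypothetical records; all values are made up.
w1 = [
    {"id": "A1", "birth_date": "1998-03-02", "gender": "F", "postcode": "1111"},
    {"id": "A2", "birth_date": "1996-07-15", "gender": "M", "postcode": "2222"},
    {"id": "A3", "birth_date": "1997-01-01", "gender": "F", "postcode": "3333"},
    {"id": "A4", "birth_date": "1997-01-01", "gender": "F", "postcode": "3333"},
]
w2 = [
    {"id": "A1", "birth_date": "1998-03-02", "gender": "F", "postcode": "1111"},
    {"id": "a-2", "birth_date": "1996-07-15", "gender": "M", "postcode": "2222"},
    {"id": "X9", "birth_date": "1997-01-01", "gender": "F", "postcode": "3333"},
]

links = link_waves(w1, w2)
```

In this toy data, `a-2` is linked to `A2` through the fallback, while `X9` matches two wave 1 candidates and is left unlinked. Dropping such ambiguous cases is one reason a merged dataset can end up with fewer observations.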
Treatment groups for survey data analysis (DID method)<br />
Two treatment groups were selected to match to and draw comparisons with the ‘headspace treatment’<br />
group: young people who received no treatment and young people who received another mental<br />
health treatment. The ‘headspace treatment’ group comprises all persons within the headspace<br />
intervention group survey who had not completed their treatment by the first wave of data collection.<br />
This group was recruited from headspace centres over a 6-month period from 6 December 2013 to<br />
Social Policy Research Centre 2015<br />
<strong>headspace</strong> Evaluation Final Report<br />