SOUTH AFRICA’S
METHODOLOGICAL APPENDIX<br />
SELECTION OF PEERS<br />
Global peer cities were selected based on economic<br />
characteristics and competitiveness factors.<br />
Classifying and identifying peers allows policymakers<br />
and stakeholders to better understand the position of<br />
their economies in a globalized context as well as to<br />
conduct constructive benchmarking.<br />
To select peers we utilized a combination of principal<br />
components analysis (PCA), k-means clustering, and<br />
agglomerative hierarchical clustering. 1 These commonly<br />
used data science techniques allowed us to<br />
group metro areas with their closest peers given a set<br />
of economic and competitiveness indicators. For this<br />
report we selected 14 economic variables: population,<br />
nominal GDP, real GDP per capita, productivity<br />
(defined as output per worker), total employment,<br />
share of the population in the labor force, and<br />
industry share of total GDP (8 sectors). 2 We included<br />
seven additional variables, each measuring one of the<br />
four quantitative dimensions of the competitiveness<br />
analysis framework used in this report. The variables<br />
included are: share of the population with tertiary<br />
education (talent), stock of greenfield foreign direct<br />
investment (FDI) (trade), number of international<br />
passengers in 2014 (infrastructure), number of highly<br />
cited papers between 2010 and 2013 (innovation),<br />
mean citation score between 2010 and 2013 (innovation),<br />
and average internet download speed in 2014<br />
(infrastructure).<br />
Our analysis proceeded in three steps. First, we<br />
applied PCA to reduce the number of dimensions<br />
of our data by condensing highly correlated<br />
variables while retaining as much variance as<br />
possible. PCA generates “components” by applying<br />
a linear transformation to all the variables. 3 Before<br />
clustering, we retained the number of components<br />
needed to explain 80 to 90 percent of the variance<br />
in the dataset. For this report we selected the first<br />
seven components, which accounted for 84 percent<br />
of the total variation of the data.<br />
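The component-selection rule described above can be sketched in a few lines.<br />
This is only an illustration: the function name `components_for_variance` and<br />
the explained-variance ratios are made up for the example and are not the<br />
report's data or code.<br />

```python
def components_for_variance(explained_ratios, threshold=0.80):
    """Return (k, cumulative share) for the smallest k whose leading
    components together explain at least `threshold` of the variance.

    `explained_ratios` is assumed to be sorted from largest to smallest,
    as PCA output normally is.
    """
    cumulative = 0.0
    for k, ratio in enumerate(explained_ratios, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k, cumulative
    raise ValueError("threshold not reached; check that the ratios sum to 1")


# Hypothetical explained-variance ratios (illustrative only).
example_ratios = [0.40, 0.25, 0.20, 0.10, 0.05]
k, share = components_for_variance(example_ratios, threshold=0.80)
print(k, round(share, 2))
```

With these illustrative ratios the first three components are retained, since<br />
two components explain only 65 percent of the variance.<br />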
The second stage applied a k-means algorithm to the<br />
seven components. K-means calculates the distance<br />
from every observation in our dataset to each cluster<br />
centroid, assigns each observation to the closest<br />
cluster, and then recalculates the centroids. 4 It<br />
repeats this procedure until a local solution is<br />
found. This algorithm provides a good segmentation<br />
of our data, and under most circumstances it is a<br />
sufficient method for partitioning data. 5 However,<br />
k-means sometimes generates clusters with many<br />
observations, thus obscuring some of the closest<br />
economic relationships between metro areas. To<br />
improve the results of k-means we implemented<br />
a third step, hierarchical clustering, which follows<br />
a similar distance-based approach to k-means. Hierarchical<br />
clustering calculates Euclidean distances between<br />
all pairs of observations, but generates a more granular<br />
clustering that permits clearer peer-to-peer<br />
comparison.<br />
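The two clustering stages can be sketched as follows, using only the Python<br />
standard library. Several choices here are assumptions rather than details<br />
from this appendix: the toy two-dimensional points stand in for component<br />
scores, the starting centroids are picked by hand, average linkage is used for<br />
the hierarchical step, and the hierarchical step is run inside a single<br />
k-means cluster. The function names are hypothetical.<br />

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, start_centroids, max_iter=100):
    """Lloyd's k-means from given starting centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in start_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: a local solution
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (linkage choice is an assumption).

    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(euclidean(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data (illustrative only): seven "metro areas" in two dimensions.
points = [
    (0.0, 0.0), (0.2, 0.0), (1.0, 1.1), (1.1, 0.9),
    (8.0, 8.0), (8.1, 8.2), (9.0, 9.1),
]
labels, centroids = kmeans(points, start_centroids=[points[0], points[4]])

# Refine the first k-means cluster into two tighter peer groups.
members = [i for i, lab in enumerate(labels) if lab == 0]
local = agglomerative([points[i] for i in members], n_clusters=2)
sub_clusters = [[members[i] for i in group] for group in local]
print(labels, sub_clusters)
```

In this toy case k-means places the first four points in one cluster, and the<br />
hierarchical step then separates them into two pairs of closest neighbors,<br />
which is the kind of finer peer grouping the text describes.<br />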
BROOKINGS<br />
METROPOLITAN<br />
POLICY<br />
PROGRAM<br />