01.12.2015 Views

SOUTH AFRICA’S


METHODOLOGICAL APPENDIX

SELECTION OF PEERS

Global peer cities were selected based on economic characteristics and competitiveness factors. Classifying and identifying peers allows policymakers and stakeholders to better understand the position of their economies in a globalized context as well as to conduct constructive benchmarking.

To select peers, we used a combination of principal components analysis (PCA), k-means clustering, and agglomerative hierarchical clustering.¹ These commonly used data science techniques allowed us to group metro areas with their closest peers given a set of economic and competitiveness indicators. For this report we selected 14 economic variables: population, nominal GDP, real GDP per capita, productivity (defined as output per worker), total employment, share of the population in the labor force, and industry share of total GDP (8 sectors).² We included seven additional variables, each of which measures one of the four quantitative dimensions of the competitiveness analysis framework used in this report. The variables included are: share of the population with tertiary education (talent), stock of greenfield foreign direct investment (FDI) (trade), number of international passengers in 2014 (infrastructure), number of highly cited papers between 2010 and 2013 (innovation), mean citation score between 2010 and 2013 (innovation), and average internet download speed in 2014 (infrastructure).
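
As an illustration of how the ratio-based indicators listed above are constructed, the following Python sketch computes productivity (output per worker), the labor force share, and industry shares of GDP for a single metro area. The function name `derived_indicators`, its argument names, and every figure in the example are hypothetical; they are not drawn from the report's data, and only two sectors are shown for brevity where the report uses eight.

```python
def derived_indicators(gdp, employment, population, labor_force, sector_output):
    """Compute ratio indicators for one metro area.

    All names here are illustrative assumptions, not the report's own code.
    """
    return {
        # Productivity is defined in the text as output per worker.
        "productivity": gdp / employment,
        # Share of the population in the labor force.
        "labor_force_share": labor_force / population,
        # Each sector's output as a share of total GDP.
        "industry_shares": {
            sector: value / gdp for sector, value in sector_output.items()
        },
    }


# Made-up figures for a hypothetical metro area (not report data).
example = derived_indicators(
    gdp=100_000_000_000,
    employment=2_000_000,
    population=5_000_000,
    labor_force=2_500_000,
    sector_output={"manufacturing": 25_000_000_000, "services": 75_000_000_000},
)
print(example["productivity"])  # 50000.0
```

Expressing these indicators as ratios keeps metro areas of very different sizes comparable on the same scale.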

Our analysis proceeded in three steps. First, we applied PCA to reduce the number of dimensions of our data by collapsing highly interrelated variables into fewer dimensions while retaining as much variance as possible. PCA generates “components” by applying a linear transformation to all the variables.³ To prepare the data for the clustering algorithms, we retained enough components to explain 80 to 90 percent of the variance in the dataset. For this report we selected the first seven components, which accounted for 84 percent of the total variation of the data.
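
The component-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the variance explained by each component (its eigenvalue) has already been computed by a PCA routine. The function name `components_needed`, the default threshold, and the eigenvalues in the example are invented for illustration and are not the report's values.

```python
def components_needed(explained_variance, target=0.80):
    """Return (k, share): the smallest number of leading components whose
    cumulative share of total variance reaches `target`, and that share."""
    total = sum(explained_variance)
    cumulative = 0.0
    # Components are taken in order of decreasing explained variance.
    ordered = sorted(explained_variance, reverse=True)
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k, cumulative / total
    return len(ordered), 1.0


# Hypothetical eigenvalues for a six-variable dataset (not report data).
eigenvalues = [4.0, 2.0, 1.5, 1.0, 0.9, 0.6]
k, share = components_needed(eigenvalues, target=0.80)
print(k, round(share, 2))  # 4 0.85
```

In this toy case the first three components explain 75 percent of the variance, so a fourth is needed to cross the 80 percent threshold.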

The second step applied a k-means algorithm to the seven components. K-means places a set of cluster centroids, calculates the distance from every observation in our dataset to each centroid, and assigns each observation to the closest cluster.⁴ It then recomputes the centroids and repeats this procedure until a local solution is found. This algorithm provides a good segmentation of our data and under most circumstances it is a sufficient method for partitioning data.⁵ However, k-means sometimes generates clusters with many observations, thus obscuring some of the closest economic relationships between metro areas. To improve the results of k-means we implemented a third step, hierarchical clustering, which follows a similar approach to k-means. Hierarchical clustering calculates Euclidean distances between all observations, but generates a more granular clustering that permits clearer peer-to-peer comparison.
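
The second and third steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the procedure used for the report: the k-means function uses random initial centroids, the hierarchical function uses average linkage (the text does not name a linkage criterion), and the two-dimensional scores stand in for the seven component scores. The function names and data points are hypothetical.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Assign points to the nearest centroid, update centroids, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:  # no movement: a local solution has been reached
            break
        centroids = updated
    return labels


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters (average linkage, an assumption)."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of points.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy component scores for six hypothetical metro areas (not report data).
scores = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = k_means(scores, k=2)                        # two broad groups
fine_grained = agglomerative(scores, n_clusters=4)   # smaller, tighter groups
```

On this toy data k-means separates the two broad groups, while stopping the hierarchical merges at a larger number of clusters leaves only the very closest observations paired, which is the kind of granularity the third step is meant to provide.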

BROOKINGS METROPOLITAN POLICY PROGRAM
