This appendix describes the data and techniques used to construct the single metric of regional adaptive capacity. The approach is based on a statistical technique called principal component analysis (PCA), which has been widely used to create regional indexes of socioeconomic disadvantage, vulnerability, resilience and adaptive capacity (chapter 2).
Although the index of adaptive capacity has been used to rank regions according to their risk of failing to adjust to transitional pressures, it should not be used as a predictor of actual outcomes. The actual outcomes are the result of the strengths and weaknesses (as well as opportunities) within a region, the decisions made by many individual workers and businesses, the type and magnitude of disruptions that occur, and the ways external events continue to change over time. Many of these considerations have not been captured in the index (chapter 2). Consequently, the index has limited value for policy application but can be used as a litmus test to identify regions that might be at risk of failing to adapt to changing economic circumstances.
This appendix presents results for functional economic regions (FERs) (aggregations of Statistical Area Level 2 (SA2) areas, the construction of which is set out in appendix D), and SA2s. Reflecting stakeholder views, FER results are presented as the preferred results (chapter 4).
Section E.1 explains the method and the decisions made in using PCA to construct the index of regional adaptive capacity, including the differences between two alternative approaches to constructing indexes using PCA (the single PCA and nested PCA approaches). Single PCA was chosen as the preferred approach for the adaptive capacity index.
Data sources and data transformations are described in section E.2. In part, the indicators included in the index are based on the five capitals framework outlined in chapter 2, and availability of data. The data for the indicators are drawn largely from the ABS 2016 Census of Population and Housing, the Social Health Atlases of Australia, CoreLogic property sale data, as well as a number of other ABS catalogues.
Section E.3 contains the PCA results for FERs for 2016. The sensitivity of each region’s index value was tested using bootstrapping and by examining the effect of excluding variables in the index (section E.4). Differences in the single PCA and nested PCA indexes are also compared. An overview of the results is provided in section E.5, with a detailed discussion in chapter 4.
For completeness, indexes were also created at the SA2 level for 2016 and at the FER level for 2011. These results are presented in sections E.6 and E.7 respectively.
An attachment and some supporting material accompany this appendix. Attachment A contains a referee report produced by Professor Robert Tanton that reviews the methods and results contained in a draft version of this appendix. Some variations of the analysis are included, which incorporate suggestions by the referee. In particular, the results of the nested PCA approach are included. Although the Commission prefers the single PCA approach for the analysis in chapter 4 of the report (reflecting stakeholder comments between the initial and final reports), results from the nested PCA can be considered an alternative. The R programming scripts and data, which can be used to replicate the results, are provided as supporting material on the Commission’s website, enabling anyone to verify the results or undertake their own analysis.
Further supporting material includes an Excel workbook that contains index scores for each FER in 2016 under the Commission’s preferred approach for constructing the index (single PCA). It includes a breakdown of the factors that contribute to each region’s index score, and the 90 per cent confidence intervals of each region’s score. Another Excel workbook contains similar spreadsheets for other sets of results described in this appendix, including FER 2016 results under an alternative approach (nested PCA), SA2 2016 results and FER 2011 results.
PCA is a method of summarising data by transforming a dataset into a new dataset with fewer variables (O’Rourke and Hatcher 2013, p. 2). The smaller set of variables can be used to construct indexes. This section begins with a brief introduction to PCA, illustrated using a simple hypothetical example. It then describes how PCA was applied in creating the index of regional adaptive capacity.
Principal component analysis
PCA summarises data by creating a new set of variables called ‘principal components’. These are linear combinations of the original variables that are uncorrelated with each other and capture the total variation in the original dataset. The total number of principal components created is the same as the original number of variables. However, the first principal component accounts for the largest amount of variation in the original dataset, the second principal component accounts for the next largest amount, and so on.
Although principal components are uncorrelated with each other, they are correlated with the original variables. An interpretation of a principal component can be informed by assessing its correlation with the original variables. Insight into which variables are most relevant to explaining the variation in the data can be gained by examining the proportion of variance explained by a principal component, along with its interpretation.
Provided that the first few principal components capture a sufficiently large amount of the variation in the original data and can be interpreted in a meaningful way, an analyst can choose to retain just these principal components for further analysis (rather than the full set of principal components, or the full set of original data) (O’Rourke and Hatcher 2013, p. 3). The decision of how many principal components to retain is discussed further below.
PCA produces a score (a value for the new variable) for each observation in the dataset for each principal component created. For a PCA using an input dataset of $p$ observed variables, the formula for calculating observation $i$’s score for the $j$’th principal component is:

$$PC_{ij} = \sum_{k=1}^{p} w_{jk} \, z_{ik}$$

$PC_{ij}$ is the $j$’th principal component estimated for observation $i$
$z_{ik}$ is the standardised value of the $k$’th variable for observation $i$
$w_{jk}$ is the weight attached to the $k$’th variable of the $j$’th principal component, estimated by the PCA.
A simple illustration
An example of how PCA transforms data is illustrated using a hypothetical dataset on employment and year 12 attainment rates for six regions (table E.1, step 1). The first step illustrates the standardisation of the original variables (by subtracting the mean of the variable from each observation, and then dividing the resultant value by its standard deviation). Standardised variables have means of zero and standard deviations of one. Standardisation ensures that variables with different units of measurement are treated on a comparable basis in the PCA.
PCA is then applied to the standardised dataset to generate the weights on each variable for each principal component (step 2a), as well as the principal components themselves (step 2b). In this example, most of the variation in the data can be represented by the first principal component, which accounts for 97 per cent of the total variation in the data. Therefore, this component summarises most of the variation in year 12 attainment and employment rates. It is highly correlated with both variables, and could be interpreted as a simple human capital index. An analyst could choose to retain just this principal component and capture most of the variation in the original data.
The transformation of data points from the original variables to the principal components is illustrated diagrammatically in figure E.1.
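The two steps of the illustration can be sketched in code. The following is a minimal sketch in Python, not the Commission's R scripts. It uses made-up rates rather than the values in table E.1, and it assumes the sample standard deviation (n - 1 denominator) is used for standardisation, which the appendix does not specify. It relies on a property of the two-variable case: for two standardised variables with correlation r, the principal components have variances 1 + |r| and 1 - |r|, and the weights all have magnitude 1/√2. All function and variable names are illustrative.

```python
import math
import statistics


def standardise(values):
    """Step 1: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


def pca_two_variables(x, y):
    """PCA on two standardised variables, using the closed-form 2x2 solution."""
    zx, zy = standardise(x), standardise(y)
    n = len(zx)
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)  # sample correlation
    sign = 1.0 if r >= 0 else -1.0
    w = 1.0 / math.sqrt(2.0)
    # Step 2a: weights on (x, y) for the first and second principal component.
    weights = [(w, sign * w), (w, -sign * w)]
    # Variance (eigenvalue) of each component, largest first.
    eigenvalues = [1.0 + abs(r), 1.0 - abs(r)]
    # Step 2b: a region's score is the weighted sum of its standardised values.
    scores = [[wx * a + wy * b for a, b in zip(zx, zy)] for wx, wy in weights]
    # Two standardised variables contribute two units of variance in total.
    shares = [ev / 2.0 for ev in eigenvalues]
    return weights, eigenvalues, scores, shares


# Hypothetical rates (per cent) for six regions; NOT the values in table E.1.
employment = [55.0, 60.0, 62.0, 68.0, 70.0, 75.0]
year12 = [40.0, 45.0, 50.0, 58.0, 60.0, 67.0]

weights, eigenvalues, scores, shares = pca_two_variables(employment, year12)
print("share of variance explained:", [round(s, 3) for s in shares])
```

Because the two made-up variables are highly correlated, the first component carries most of the variance, which mirrors the pattern described for table E.1.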
There are a number of options for creating an index from PCA results: using only the first principal component, equally weighting some or all components, or some system of varying weights. The approach adopted for this study weights each principal component by the extent to which that component explains the original dataset’s variance. That is, each retained principal component is weighted according to the proportion of variance in the original dataset that it explains. In the example in table E.1, if both principal components were retained, that would mean a much smaller contribution of the second principal component to the total index score. This demonstrates an advantage of the technique in summarising data in circumstances where a few principal components capture most of the variance in the original data.
Table E.1 Principal component analysis — illustrative transformation
Figure E.1 Principal component analysis — illustrative visualisation
Hypothetical dataset on year 12 attainment and employment rates
Determining the number of principal components to retain
Choosing the number of principal components to retain from a PCA requires judgment. Although there are guidelines, there are no strict rules on how to make this decision. Four criteria are commonly used (O’Rourke and Hatcher 2013, pp. 22–27).
The first criterion is the scree test, which involves plotting the eigenvalues (the amount of variance explained by each principal component) in order from largest to smallest. This plot is known as a scree plot, and a hypothetical example is provided in figure E.2. If there is an elbow-like bend in the plot, with the first set of components before the bend having large eigenvalues (explaining a large amount of the total variation) and the other set of components from the bend onwards having relatively small eigenvalues, then the components in the first set are retained. In figure E.2, the first principal component (the only component before the bend) would be retained. In practice, the plot might not have a clear bend, so other criteria are also considered.
Figure E.2 Example of a scree plot
The second criterion is to retain components with eigenvalues greater than one. Each standardised observed variable contributes one unit of variance to the total variance in the dataset, so any principal component that has an eigenvalue greater than one contributes more than that contributed by any one variable in the original dataset. Applying this criterion to the example used for figure E.2, the first two principal components would be retained because they have eigenvalues greater than one (as shown by the dashed line).
Cumulative proportion of variance explained
The third criterion involves retaining components until the cumulative proportion of variance explained is greater than a given threshold, usually 70 or 80 per cent (O’Rourke and Hatcher 2013, p. 19). Applying this to the example in table E.1, the first principal component would be retained because it alone captures 97 per cent of the total variation.
Finally, the interpretability of each component should be considered. Principal components are retained if the main factors contributing to those components (the variables with the largest weights or correlations, and their signs) can be interpreted in a meaningful way. In the example in table E.1, the first principal component is highly correlated with both year 12 attainment and employment, and could be interpreted as a measure of human capital.
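Two of the four criteria (eigenvalues greater than one, and the cumulative proportion of variance explained) are mechanical enough to sketch in code. The scree test and interpretability rely on visual inspection and judgment and are not automated here. The eigenvalues below are hypothetical and are not taken from figure E.2 or from the Commission's results. The function names are illustrative.

```python
def retain_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Return the (1-based) positions of components with an eigenvalue above the cutoff."""
    return [i for i, ev in enumerate(eigenvalues, start=1) if ev > cutoff]


def retain_by_cumulative_share(eigenvalues, threshold=0.7):
    """Return how many components are needed for the cumulative share to exceed the threshold."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for count, ev in enumerate(ordered, start=1):
        cumulative += ev / total
        if cumulative > threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues for a PCA on six standardised variables (they sum to 6).
eigenvalues = [3.2, 1.4, 0.6, 0.4, 0.3, 0.1]

print(retain_by_eigenvalue(eigenvalues))             # components 1 and 2
print(retain_by_cumulative_share(eigenvalues, 0.7))  # 2 components
print(retain_by_cumulative_share(eigenvalues, 0.8))  # 3 components
```

In this made-up case the 70 and 80 per cent thresholds give different answers, which illustrates why the choice requires judgment.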
There are two general approaches to constructing indexes using PCA. They each give different weights to indicators included in the index.
The first approach involves running a single PCA that includes all variables in a dataset (a more ‘data-driven’ approach). An example that uses this approach is the ABS Socio-Economic Indexes for Areas (SEIFA) (ABS 2013b).
The second is a nested PCA approach. This involves applying PCA independently to subgroups of variables in a dataset, then aggregating the results for each sub-PCA to obtain a single score for each observation (a more ‘conceptually-driven’ approach). This approach has been used in Australia to construct indexes of community vulnerability across the Murray-Darling Basin (ABARE–BRS 2010), potential community economic resilience (Dinh et al. 2016), and adaptive capacity for farms (Nelson et al. 2009a). For the purposes of this report, the nested approach uses a ‘five capitals’ framework (chapter 2), as seen in the aforementioned studies.
The single and nested PCA approaches offer different advantages and disadvantages. The single PCA approach allows the data to have a greater role in driving the index and rankings. This requires fewer judgments by the analyst in specifying the details of the technique. However, it can complicate interpretation of the principal components. The nested approach imposes a conceptual structure, which can facilitate a more straightforward interpretation of the principal components. However, this can elevate the importance of variables that might not otherwise explain the largest share of the variance in the initial data, and requires a number of prior judgments by the analysts performing the PCA (for example, how variables are classified into subgroups, what the nature of the sub-indexes are, and which variables are excluded altogether).
This study examined both single and nested PCA approaches. Under each approach, PCA was conducted on indicators that were considered important to adaptive capacity. The index of adaptive capacity was created as a weighted sum of the retained principal components. Further details of both methods are described below. The Commission’s preferred method for the index is the single PCA approach, based on stakeholder consultation and the more intuitive results it produces, and these results are described in chapter 4. A comparison of results under each approach is provided in section E.4.
A variant of PCA involves ‘varimax rotation’, which changes the weights on each variable in each retained component and can aid in their interpretation. Varimax rotations were investigated for the current analysis but did not meaningfully improve interpretations of retained components. Therefore, unrotated principal components were used for the index.
Single PCA index construction
Under the single PCA approach for this study, all variables in the dataset are included in a single PCA, regardless of the capital domain the analysts considered they belonged to. Principal components from this PCA were retained based on the criteria described above.
The signs on principal component scores are arbitrary: a principal component explains the same amount of variation even if the signs on each region’s principal component score, and each indicator’s weight on the principal component, are reversed. As an example, if a principal component came out with a negative sign on the proportion of people who have completed at least year 12 education and a positive sign on the proportion of people with disability, reversing all the signs would not change the interpretation of the component. It would simply change it from a ‘negative’ direction to a ‘positive’ direction (that is, high scores, as opposed to low scores, will be considered good for adaptive capacity). However, signs can be important for aggregating multiple principal components together to create an aggregate index because they should all be contributing to the index in the appropriate direction. In this study, signs on retained principal components were reversed where appropriate so that a higher value of the principal component indicated greater adaptive capacity.
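The sign reversal can be sketched as follows. This is a minimal sketch, assuming the direction of a component is judged from a single reference indicator that is expected to be good for adaptive capacity. In the study the decision was a judgment made by the analysts, so this rule is an illustrative simplification. The weights and scores are made up.

```python
import statistics


def orient_component(weights, scores, reference_indicator):
    """Flip a component so the reference indicator has a positive weight.

    Assumption: one 'good' reference indicator determines the direction.
    """
    if weights[reference_indicator] < 0:
        flipped_weights = {name: -w for name, w in weights.items()}
        flipped_scores = [-s for s in scores]
        return flipped_weights, flipped_scores
    return dict(weights), list(scores)


# Hypothetical weights and region scores for one principal component.
weights = {"year12_or_higher": -0.62, "disability": 0.55}
scores = [1.4, 0.3, -0.2, -1.5]

new_weights, new_scores = orient_component(weights, scores, "year12_or_higher")

# Reversing every sign leaves the variance explained unchanged.
print(statistics.variance(scores), statistics.variance(new_scores))
```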
A further judgment was made on how to combine multiple retained principal components to form a single index of adaptive capacity. The retained principal components were standardised and then weighted according to the relative shares of variance explained by the components in the PCA. For example, if the first two principal components were retained for the index, and these accounted for 60 and 20 per cent of the total variance in the set of indicators respectively, then the first component was given a weight of 60/(60 + 20) = 0.75 after standardisation, and the second component was given a weight of 20/(60 + 20) = 0.25 (the actual weights are presented in section E.3). Similar approaches have been used in constructing indexes in past research where multiple principal components were retained (for example, Krishnan 2010; Nicoletti, Scarpetta and Boylaud 2000). This weighting approach ensures that the principal components that explain a greater share of variance in the initial dataset make a greater contribution to the index. Aggregate index scores were then standardised to have a mean of zero and standard deviation of one. The formulas used to create the index from the retained principal components are presented in box E.1.
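The weighting step can be sketched in code. This is a minimal sketch, assuming that 'relative shares' means each retained component's share of variance divided by the sum of the shares of all retained components, so that shares of 60 and 20 per cent become weights of 0.75 and 0.25. The component scores are made up, and the function names are illustrative rather than taken from the Commission's R scripts.

```python
import statistics


def standardise(values):
    """Rescale to a mean of zero and a (sample) standard deviation of one."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def relative_weights(variance_shares):
    """Assumption: weights are shares of variance, normalised over retained components."""
    total = sum(variance_shares)
    return [share / total for share in variance_shares]


def build_index(components, variance_shares):
    """Weighted sum of standardised components, then standardised again."""
    weights = relative_weights(variance_shares)
    standardised = [standardise(c) for c in components]
    raw = [sum(w * z for w, z in zip(weights, region)) for region in zip(*standardised)]
    return standardise(raw)


# Hypothetical scores for five regions on two retained components.
pc1 = [2.1, 0.4, -0.3, -0.9, -1.3]
pc2 = [-0.5, 0.8, 0.2, -0.7, 0.2]

weights = relative_weights([0.60, 0.20])
index = build_index([pc1, pc2], [0.60, 0.20])
print([round(w, 2) for w in weights])
print([round(v, 2) for v in index])
```

Because the final index is standardised, multiplying all weights by a common factor would not change the resulting scores. Only the ratio between the weights matters.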
Box E.1 Formulas for constructing the adaptive capacity index
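A simple index can be constructed by running a single principal component analysis (PCA) on all of the indicators. Under this approach, retained principal components were standardised by subtracting the mean and dividing by the standard deviation, and then weighted according to the relative shares of variance explained by each component. As the mean of a principal component is zero by construction and the variance is simply the square of the standard deviation, the formula can be simplified as follows.

$$\text{Index}_i = \sum_{j=1}^{m} \left( \frac{PC_{ij} - \mu_j}{\sigma_j} \right) \frac{\sigma_j^2}{\sum_{l=1}^{m} \sigma_l^2} = \frac{\sum_{j=1}^{m} \sigma_j \, PC_{ij}}{\sum_{l=1}^{m} \sigma_l^2}$$

where $PC_{ij}$ is the score of region $i$ on the $j$’th of the $m$ retained principal components, and $\mu_j$, $\sigma_j$ and $\sigma_j^2$ are the mean, standard deviation and variance respectively of the $j$’th principal component.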
Nested PCA index construction
A simple index can be constructed by aggregating sub-indexes from several independently constructed PCAs. A sub-index was formed for each capital domain by weighting retained principal components from the PCA for that capital domain, in a way similar to the single PCA approach. The adaptive capacity index was formed by taking the equally weighted sum of the standardised sub-indexes for each domain.

$$\text{Index}_i = \sum_{d=1}^{D} S_{id}$$

where $S_{id}$ is the standardised sub-index of region $i$ for domain $d$, $d$ refers to a particular capital domain, and $D$ is the total number of domains.
Raw index scores from both single PCA and nested PCA approaches were further standardised to have a mean of zero and standard deviation of one to aid comparability and interpretability.
An advantage of the single PCA approach is that it requires fewer prior judgments by the analyst about how variables in the original dataset should be assigned to subgroups prior to applying the PCA. On the downside, interpretation of principal components, and decisions about whether the signs on principal components should be reversed, can be difficult. This is because a principal component could be relatively strongly correlated with multiple indicators under different domains that might be expected to have opposite relationships with adaptive capacity.
Nested PCA index construction
Under the nested PCA approach, PCA is conducted separately on different subgroups of variables. The exact approach to nested PCA was revised following feedback from the referee (attachment A).
For this study, variables are categorised based on the five capitals framework described in chapter 2. Separate and independent PCAs were performed on variables in each subgroup: human, financial, physical, natural and social capital. As per the referee report, the aim then was to reduce the variables under each capital domain to only include those that had a relatively large correlation with the first principal component, and then retain only the first principal component from each domain to create the index. Specifically, after examining the results of each sub-PCA, the indicators that had a correlation of less than 50 per cent with the first principal component were dropped from the analysis. The PCAs for each subgroup were performed again with the reduced set of indicators, and the first principal component from each was retained.
In addition to these retained principal components, three other indicators were included in the index (which Commission staff considered were not relevant to any pre-existing capital domain). These are measures of industry diversity, working-age population growth, and inter-regional mobility. These indicators together formed an ‘other’ domain. PCA was not conducted on the ‘other’ domain, but each indicator was given equal weighting within the domain.
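The drop-and-rerun step for one capital domain can be sketched as follows. This is a minimal sketch with several assumptions that are not in the source: the 50 per cent cutoff is applied to the absolute value of the correlation, the first principal component is found by power iteration on the correlation matrix, and the three indicators are made-up series (one is constructed to be uncorrelated with the other two). It uses the standard result that, for a PCA on standardised variables, the correlation between an indicator and a component equals the indicator's weight multiplied by the square root of the component's eigenvalue.

```python
import math
import statistics


def standardise(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def correlation_matrix(columns):
    z = [standardise(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(corr, iterations=500):
    """Largest eigenvalue and its unit-length weights, found by power iteration."""
    k = len(corr)
    v = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    return eigenvalue, v


def reduce_domain(indicators, cutoff=0.5):
    """Drop weakly correlated indicators, then rerun the PCA on those kept."""
    names = list(indicators)
    eigenvalue, weights = first_component(
        correlation_matrix([indicators[name] for name in names]))
    # Correlation of each indicator with the first principal component.
    correlations = {name: w * math.sqrt(eigenvalue) for name, w in zip(names, weights)}
    # Assumption: the cutoff applies to the absolute correlation.
    kept = [name for name in names if abs(correlations[name]) >= cutoff]
    eigenvalue, weights = first_component(
        correlation_matrix([indicators[name] for name in kept]))
    return kept, eigenvalue, dict(zip(kept, weights))


# Hypothetical indicators for six regions within one capital domain.
domain = {
    "indicator_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "indicator_b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "indicator_c": [5.0, 5.0, 2.0, 2.0, 5.0, 5.0],  # uncorrelated with a and b
}

kept, eigenvalue, weights = reduce_domain(domain)
print(kept, round(eigenvalue, 3))
```

The sketch does not handle the case where no indicator passes the cutoff.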
As in the single PCA approach, signs on these other indicators and retained principal components were reversed where necessary so that a higher value indicated greater adaptive capacity. The index of adaptive capacity was formed through a weighted sum of these indicators and principal components. These raw index scores were then standardised to have a mean of zero and standard deviation of one (box E.1).
Another decision involves weighting each of the five capital domains and the ‘other’ domain in the index. Noble et al. (2003, pp. 35–36) describe various possible approaches to weighting scores across different domains to form an aggregate index. These include approaches driven by theory, empirics, policy relevance and consensus of opinion. In terms of adaptive capacity, the relative importance of a type of capital is likely to differ depending on the type of shock that a region is adjusting to. Balance between the five capitals is also an important consideration because minimum levels of one capital type might be needed to effectively use another type (Nelson et al. 2009a, p. 20). For these reasons, each domain was equally weighted. In effect, this means that each capital domain index was summed. Equal weighting approaches have been used in many other studies that construct indexes of similar concepts (for example, ABARE–BRS 2010; Dinh et al. 2016).
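The equal weighting across domains can be sketched as follows. This is a minimal sketch, assuming each domain score is standardised before summing and that the 'other' domain is the simple average of its three standardised indicators (the source states only that they were equally weighted). All figures are made up for four regions, and the names are illustrative.

```python
import statistics


def standardise(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def average_columns(columns):
    """Equally weighted average of several indicators, region by region."""
    return [sum(vals) / len(vals) for vals in zip(*columns)]


def nested_index(domain_scores):
    """Sum the standardised domain scores, then standardise the total."""
    standardised = [standardise(scores) for scores in domain_scores.values()]
    raw = [sum(vals) for vals in zip(*standardised)]
    return standardise(raw)


# Hypothetical indicators in the 'other' domain (no PCA is applied to these).
industry_diversity = [0.7, 0.5, 0.4, 0.3]
working_age_growth = [1.5, 0.2, -0.3, -1.1]
mobility = [12.0, 9.0, 10.0, 8.0]
other = average_columns(
    [standardise(x) for x in (industry_diversity, working_age_growth, mobility)])

# Hypothetical first principal component scores for each capital domain.
domains = {
    "human": [1.2, 0.3, -0.5, -1.0],
    "financial": [0.8, 0.1, -0.2, -0.7],
    "physical": [0.5, -0.4, 0.6, -0.7],
    "natural": [-0.3, 0.9, -0.8, 0.2],
    "social": [0.4, 0.6, -0.1, -0.9],
    "other": other,
}

index = nested_index(domains)
print([round(v, 2) for v in index])
```

Under equal weighting, a domain with a single indicator contributes as much to the total as a domain summarising many indicators.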
An advantage of the nested PCA approach is that it makes interpretation simpler, by aligning the subgroups to the analyst’s conceptual framework. The contributions of each factor to a capital domain can be examined, as well as the contribution of each capital domain to the overall index. However, indicators within capital domains that have few or no other indicators might also get disproportionately large weights in the overall index.