Saturday 9 March 2019

The Spirit Level ten years on

Kate Pickett and some science

The Spirit Level turned ten this month. A minor publishing sensation when it appeared in March 2009, it used a series of scatter plots to make the case that income inequality is a major driver of a range of health and social problems. The authors, Richard Wilkinson and Kate Pickett, argue that these problems are directly linked to the rate of inequality and will rise or fall as inequality rises and falls.

I argued in The Spirit Level Delusion (2010) that most of Wilkinson and Pickett's statistical correlations were the result of selection bias in the countries, criteria and datasets used by the authors. They looked at the 50 richest countries in the world (on the basis that these societies were wealthy enough not to benefit from further growth, whereas outcomes in poorer countries would be confounded by the effect of GDP). However, they used only 23 of these countries - often fewer - in their analysis. The poorest of them was Portugal, but several countries with a higher GDP than Portugal were excluded for no good reason. When I added these countries, many of the associations with inequality disappeared (see footnote 1).

I won't go into the other flaws in the book here, suffice it to say that if The Spirit Level hypothesis is correct, it should apply across space and time. Most of the data used by Wilkinson and Pickett (henceforth W & P) was published between 2000 and 2004. If their correlations are proof of a golden rule about inequality - 'a theory of everything', as the BBC put it - similar associations with inequality should emerge if we use data from 2010 to 2014 or any other period. In principle, it should be possible for the authors to publish a new edition of their book every few years showing consistent trends.

They did publish a sequel to The Spirit Level last year (I reviewed it here), but there was no attempt to recreate their graphs with new data. And so, on the tenth anniversary of the book's publication, I thought it would be interesting to put some of their most striking claims to the test, applying W & P's own methodology to up-to-date statistics.

I gathered recent income inequality statistics from the UN's Human Development Report (2018 edition). This is the same source used by W & P in The Spirit Level. For reasons that are never made clear, W & P preferred to use the 80/20 measure rather than the more usual Gini coefficient when comparing countries. Both measures give broadly the same results, but the Human Development Report no longer uses the 80/20 measure so I have used the Gini instead.
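The two measures summarise the same income distribution in different ways. As a minimal, illustrative sketch, assuming a plain list of individual incomes (published figures such as the HDR's are derived from household survey data, so this is not how those numbers were produced), the Gini coefficient and a top-fifth to bottom-fifth ratio can be computed as below. The function names are hypothetical, and the ratio is assumed to mean the total income of the richest 20 per cent divided by that of the poorest 20 per cent.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes: 0 = perfect equality."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form with ranks i = 1..n on the sorted incomes:
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_bottom_quintile_ratio(incomes):
    """Total income of the richest fifth divided by that of the poorest fifth."""
    xs = sorted(incomes)
    k = len(xs) // 5  # size of each fifth; any remainder sits in the middle
    if k == 0:
        raise ValueError("need at least five incomes")
    bottom = sum(xs[:k])
    if bottom <= 0:
        raise ValueError("poorest fifth has no income; ratio undefined")
    return sum(xs[-k:]) / bottom


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))          # everyone earns the same
    print(gini([0, 0, 0, 0, 10]))      # one person has everything
    print(top_bottom_quintile_ratio(list(range(1, 11))))
```

This returns the Gini on a 0 to 1 scale; figures reported on a 0 to 100 scale are the same value multiplied by 100. In a toy five-person economy where one person holds all the income, the Gini is 0.8, the maximum possible for five people.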

As you might expect, there have been changes in the rates of inequality in the last 10-15 years. The Scandinavian countries continue to have some of the lowest rates although inequality has risen in all of them except Finland. Inequality in Japan is significantly higher than was reported in The Spirit Level. Rates have also increased in Israel, Ireland, Spain and the USA, but fallen in the UK, Singapore and Belgium (see footnote 2).

Life expectancy

The Human Development Report (HDR) was also the source of life expectancy figures used by W & P. These formed the basis of perhaps their most famous claim, which Richard Wilkinson had been making since 1992, that life expectancy is directly linked to income inequality and that, therefore, inequality is bad for health.

As I noted in The Spirit Level Delusion, W & P opted to use the 2004 edition of the HDR for their life expectancy figures despite using the 2006 edition elsewhere. I believe this is because they would have been unable to find a statistically significant association with inequality had they used the figures from the more recent edition (p-value = 0.116996). Using the 2004 data, they were able to achieve statistical significance in the graph shown below (p-value = 0.031, r2 = 0.20), although the association disappears when countries such as South Korea are added to the analysis.

In The Spirit Level Delusion, I showed that no such correlation existed if one used later editions of the HDR. Using the most up-to-date figures for inequality and life expectancy, we can see that there is still no correlation, even if we confine the analysis to the 23 countries selected by W & P (R=0.02, r2=0.0004, p-value=0.93).

If we add the four countries that are unequivocally richer than Portugal and were excluded from The Spirit Level for no good reason (South Korea, Hong Kong, Slovenia and the Czech Republic, now known as Czechia), there is a statistically significant association with inequality, but it is in the opposite direction to that predicted by The Spirit Level hypothesis: greater inequality correlates with longer life expectancy (r2=0.145, R=0.385, p-value=0.0495).

Obesity

After discussing life expectancy, W & P dedicate a chapter to the purported link between inequality and obesity. Again, they produce a graph showing a statistical correlation (see below). The correlation is highly dependent on the low rate in Japan and the high rate in the USA. Strangely, the graph excludes Singapore, which has a low rate of obesity and a high rate of inequality. If Singapore and the other two rich Asian countries are included, the positive association disappears (see The Spirit Level Delusion).

There are also question marks over some of the obesity estimates used by W & P. Internationally comparable obesity statistics were hard to come by when W & P wrote their book, and they had to resort to a range of largely self-reported figures, some of which dated back to the mid-1990s. Methods have since improved and it is possible to get a more accurate picture. In the graph below, I use figures from the World Health Organisation, except for Hong Kong, where the figure comes from Hong Kong's Centre for Health Protection, and South Korea, where it comes from the OECD.

Limiting ourselves to W & P's 23 countries, there is no statistically significant relationship (r2=0.0461, R=0.2147, p-value=0.33).

With all the countries included, the correlation is weaker still (r2=0.008, R=0.09, p-value=0.65).

Mental health disorders

Internationally comparable figures for mental illness prevalence were also patchy when W & P wrote their book. The graph they use to show that inequality drives mental health disorders (see below) suffers from multiple flaws. It shows only twelve countries; it excludes Singapore (again), even though W & P had included it in an earlier version of the graph published in Oliver James' book Affluenza; and it cherry-picks figures from several different surveys that produce notably different results and are not comparable (see pp. 38-40 of The Spirit Level Delusion).

In the graphs below, I use figures from Our World In Data based on statistics from the Global Health Data Exchange (no figures are available for Hong Kong). Gathering reliable data on the prevalence of mental health disorders continues to pose problems (which Our World In Data's authors discuss here), but this dataset is much better than the pick-and-mix selection presented in The Spirit Level (and reproduced in The Inner Level - such is the importance of this purported finding to their hypothesis).

Using the latest inequality figures and the best prevalence data on mental disorders, there is absolutely no association between the two variables. This is true regardless of whether we use W & P's 23 countries (r2=0.069, R=-0.26, p-value=0.75)...

... or use the slightly expanded cohort (r2=0, R=0, p-value=1).


Homicide

A further claim in The Spirit Level is that inequality drives violence - murder, in particular.

As with W & P's claim about obesity, the statistical evidence for this claim relies heavily on the USA being an outlier. There is no correlation among the other 22 countries and, as I showed in The Spirit Level Delusion, there is no correlation when the full complement of rich countries is included.
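Whether a correlation rests on a single outlier can be checked by recomputing it with each country left out in turn (a leave-one-out, or jackknife-style, check). The sketch below assumes the data are held as a mapping from a label to an (inequality, outcome) pair. The helper names and the five labelled points are made up for illustration and are not the real country figures.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(data):
    """data maps a label to an (inequality, outcome) pair.

    Returns the correlation for the full sample and a dict giving the
    correlation recomputed with each label dropped in turn.
    """
    labels = list(data)
    full = pearson_r([data[k][0] for k in labels], [data[k][1] for k in labels])
    dropped = {}
    for skip in labels:
        rest = [k for k in labels if k != skip]
        dropped[skip] = pearson_r([data[k][0] for k in rest],
                                  [data[k][1] for k in rest])
    return full, dropped


if __name__ == "__main__":
    # Made-up toy data: A-D show no relationship; E is an extreme outlier.
    toy = {"A": (1, 1), "B": (2, 2), "C": (3, 2), "D": (4, 1), "E": (10, 10)}
    full_r, without = leave_one_out(toy)
    print(f"all five points: r = {full_r:.2f}")
    for label, r in without.items():
        print(f"without {label}: r = {r:.2f}")
```

In this toy example, the full sample gives r of about 0.94, but dropping the single outlier E brings r down to zero. That is the kind of pattern described above for homicide, where the association depends on the USA.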

Using recent homicide statistics from the UN Office on Drugs and Crime, we can see that there is no statistically significant association between inequality and homicide, even if we confine our analysis to W & P's 23 countries (r2=0.14, R=0.37, p-value=0.08). The inclusion of the other countries makes the correlation even weaker (r2=0.06, R=0.25, p-value=0.21).

Incidentally, although the USA continues to be a huge outlier among rich societies for homicide, its murder rate is lower than it was when W & P wrote The Spirit Level, contrary to their implicit prediction. Faced with growing inequality and a falling murder rate, W & P clutched at the straw of a slight upturn in the homicide rate in 2005-06. The murder rate had risen from 5.5 per 100,000 to 5.7 per 100,000 and W & P cited this as evidence that the effect of inequality was finally manifesting itself. It was a false dawn, however, and by 2014 the rate had dropped to 4.7 per 100,000. Although it has since risen to 5.3 per 100,000, it remains lower than it was when W & P claimed that there is a 'reasonable match' between the rate of homicide and the rate of inequality. As the graph below shows - with homicides in red - there isn't (see footnote 3).
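Putting the quoted US homicide rates (per 100,000) side by side as relative changes shows how small the 2005-06 upturn was compared with the fall that followed. The arithmetic uses only the figures given in the paragraph above: the upturn, the fall to 2014, and the latest rate compared with the 5.7 figure W & P cited.

```latex
\frac{5.7 - 5.5}{5.5} \approx +3.6\%, \qquad
\frac{4.7 - 5.7}{5.7} \approx -17.5\%, \qquad
\frac{5.3 - 5.7}{5.7} \approx -7.0\%
```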

Teen births

W & P made the same rash mistake when discussing teen births in the USA. Their hypothesis suggests that teen pregnancies and teen births should become more common as income inequality grows. Alas for them, teen births had fallen to an all-time low when they started writing their book, but they took solace from another small upward blip and announced that, in addition to the murder rate rising, 'in 2006, the teenage birth rate also started to rise again'. The birth rate for teenagers aged 15-19 rose from 40.5 to 41.9 births per 1,000 females. It was the first rise in fifteen years, but it was not the herald of an inequality-induced epidemic of teen pregnancies. By 2017, the rate had fallen to just 18.8 per 1,000.
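Expressed as relative changes, using only the rates quoted above (births per 1,000 females aged 15-19), the 2006 blip was small next to the subsequent decline:

```latex
\frac{41.9 - 40.5}{40.5} \approx +3.5\%, \qquad
\frac{18.8 - 41.9}{41.9} \approx -55\%
```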

The Spirit Level found a correlation between income inequality and teen births. Using recent teen birth figures from the World Bank and looking at W & P's 23 countries, there is still a statistically significant relationship (r2=0.247, R=0.497, p-value=0.016). 

However, this seems to be due to the selection of countries. When the four wealthy countries that were excluded from The Spirit Level are added, the correlation disappears (r2=0.06, R=0.24, p-value=0.219).

Infant mortality

Finally, I looked at another correlation in The Spirit Level that seemed reasonably robust at first glance.

As I showed in Chapter 5 of The Spirit Level Delusion, this is another example of a correlation that disappears when the full complement of countries is analysed. In that chapter, I was less interested in the statistical claim than the causal mechanism. There seems to be no practical way in which the 'psychosocial' impact of modest differences in income inequality could cause the birth defects and congenital abnormalities that are at the root of most infant deaths in rich countries.

Looking at the evidence anew, I use infant mortality figures from the most recent HDR (except for Hong Kong: the HDR gives no figure, so it comes from Hong Kong's Department of Health). There is no statistically significant relationship with inequality regardless of whether we study W & P's selection of countries (r2=0.15, R=0.39, p-value=0.07) or the expanded cohort (r2=0.045, R=0.21, p-value=0.29).

In summary, most of the biggest claims made by Wilkinson and Pickett in The Spirit Level look even weaker today than they did when the book was published. Only one of the six associations stands up under W & P's own methodology, and none of them stands up when the full range of countries is analysed. In the case of life expectancy - the very flagship of The Spirit Level - the statistical association is the opposite of what the hypothesis predicts.

If The Spirit Level hypothesis were correct, it would produce robust and consistent results over time as the underlying data changes. Instead, it seems to be extremely fragile, only working when a very specific set of statistics is applied to a carefully selected list of countries.

Footnote 1
W & P's justification for leaving so many countries out of the analysis is twofold: some countries don't have inequality data, and some countries are tax havens (and therefore have distorted inequality data). The first justification is bullet-proof and the second is at least arguable, but in their efforts to avoid tax havens, they simply assume that any country with a population of under three million is a tax haven. This makes no sense. Not only does it allow two countries that are arguably tax havens to be included (Ireland and Switzerland), but it excludes countries like Slovenia that are clearly not tax havens. Slovenia is one of the most equal countries in the world and therefore should be a star performer. It should at least be in the analysis. There is no justification at all for excluding Czechia, Hong Kong and the Republic of Korea.

Footnote 2
Inequality figures for New Zealand, Hong Kong and Singapore are not included in the UN report. Gini coefficients for New Zealand, Singapore and Hong Kong come from the New Zealand government, Singapore's Ministry of Finance and Oxfam respectively. These estimates are similar to estimates from other sources. All Gini coefficients are post-tax and benefits.

The change in Japan's rate of inequality since 2006 is the most striking difference between the two sets of figures. However, it has always been debatable whether its rate of inequality was as low as The Spirit Level showed it to be. Some datasets showed Japan to have a fairly average rate of inequality even in 2006.

Footnote 3
The period between 1910 and 1960 is a nice example of correlation not equalling causation. The two variables seem to be moving in broadly the same direction, initially rising and then falling sharply, with homicides slightly lagging behind inequality. In fact, inequality and homicide rise and fall for very different reasons. The murder rate rose during Prohibition, peaking in the early 1930s just before alcohol was re-legalised. Inequality rose until 1929 when the Wall Street Crash put it into reverse. The fact that these two events happened at around the same time is simply a coincidence (although the economic depression gave the government a reason to legalise - and thus tax - alcohol again). From the 1960s, the two trends go in completely different directions.

All R-values are Pearson Correlation Coefficients. All thresholds for statistical significance tests are p < .05.
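The p-values above are reported without the test that produced them. A common choice, assumed here rather than confirmed by the source, is the t-test for a Pearson correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom, where n is the number of countries (for example 23 for W & P's selection, or 27 with the four added countries). The standard-library-only sketch below (function names hypothetical) integrates the Student's t density numerically instead of calling a statistics package, so reported R values can be checked against reported p-values.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for statistic t, integrating the density with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even, as Simpson's rule requires
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def significance_from_r(r, n):
    """Return (r squared, t statistic, two-tailed p) for Pearson r from n pairs."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return r * r, t, two_tailed_p(t, df)


if __name__ == "__main__":
    for label, r, n in [("homicide, 23 countries", 0.37, 23),
                        ("teen births, 23 countries", 0.497, 23)]:
        r2, t, p = significance_from_r(r, n)
        print(f"{label}: r2={r2:.3f}, t={t:.3f}, p={p:.3f}")
```

For instance, the homicide result for W & P's 23 countries (R=0.37) gives a p-value of roughly 0.08 by this route, in line with the figure reported in the homicide section, and therefore above the p < .05 threshold.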

Read the extra tenth chapter of The Spirit Level Delusion for free here.


Divalent said...

Wouldn't it make sense to weight each country by their population in the analysis? Or has that been done?

Christopher Snowdon said...

No, it wouldn't (and it hasn't). I've been asked this a couple of times before. I don't understand where the idea comes from or why anyone would think it would be a sensible thing to do.

Anonymous said...

Slightly off-topic and I don't want to derail, but if you look at wealth inequality as distinct from income inequality, the Scandis end up having some of the highest scores. H/T to Rory Sutherland.
Do you have any thoughts on this? Is it a "true" result do you think or an artefact?
I'm not sure if imputed rent is included in income numbers. Imputed rent, derived from home ownership, is a national statistic, but not often used, and may explain the difference perhaps?
An older population of home owners, combined with a younger population of renters (which sounds eerily familiar), might have a wealth disparity, that is larger than the income difference perhaps.
Anyway, pure speculation but intrigued if you had any ideas.

Procrustes said...

Yes, I agree that you should not weight by population. You are not scoring the (rich) world’s performance, you are scoring the performance of each country as an individual system. So they should have equal weights, one per system (economy).

A great idea updating the data. It would be great if in a few months' time you could do a post on how much traction this update has received in the media. I loved “The Spirit Level Delusion” when it first came out but I found that a lot of the lefties I knew had either never heard of it or dismissed it out of hand (without having read it).

Unknown said...

there is a difference between correlation and causality - science 1.01