Coronavirus and Life Expectancy: Statistik Austria Causes Confusion

Der Standard reports:

The coronavirus pandemic has caused the sharpest drop in life expectancy since records began in 1951. While in recent years we had grown used to living longer and longer, the life expectancy statistics now show a massive dip for the first time.

The reason is very high excess mortality in the last weeks of 2020. New data from Statistik Austria show that over the whole of last year about ten percent more people died than the average of the previous five years. In total, there were around 90,000 deaths. […]

Concretely, this means that a boy born in Austria today will live 78.9 years on average. For a girl, the average is 83.7 years.

https://www.derstandard.at/story/2000123287763/statistik-austria-ueber-90-000-todesfaelle-2020-lebenserwartung-sinkt

This means that, compared with the previous year, life expectancy fell by half a year for women and by somewhat more for men. Statistical offices in other countries reported similarly grim numbers: the US health agency CDC even reported a drop in life expectancy of a full year.

These numbers cannot be right.

Experts estimate that a person who died of COVID-19 lost 12 years of life on average. Multiplying that by the 8,500 COVID-19 deaths in Austria gives 102,000 lost life years. Spread over the total population of 8.86 million, that amounts to an average reduction in life expectancy of about 0.01 years. That is a drop of about four days, not six months as Der Standard reports. The figure published by Statistik Austria is roughly 40 times too high. How can that be?
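The arithmetic in this paragraph can be checked in a few lines of Python. This is a sketch that only reuses the figures quoted in the text and, like the text, spreads the lost life years evenly over the whole population:

```python
# Back-of-the-envelope check of the lost-life-years argument.
# All inputs are the figures quoted in the text.
YEARS_LOST_PER_DEATH = 12    # expert estimate: years of life lost per COVID-19 death
COVID_DEATHS = 8_500         # COVID-19 deaths in Austria
POPULATION = 8_860_000       # total population of Austria
REPORTED_DROP_YEARS = 0.5    # drop in life expectancy reported by Der Standard


def average_years_lost(years_per_death: float, deaths: int, population: int) -> float:
    """Total life years lost, spread evenly over the whole population."""
    return years_per_death * deaths / population


years = average_years_lost(YEARS_LOST_PER_DEATH, COVID_DEATHS, POPULATION)
days = years * 365
ratio = REPORTED_DROP_YEARS / years

print(f"life years lost in total: {YEARS_LOST_PER_DEATH * COVID_DEATHS:,}")
print(f"average reduction: {years:.4f} years, or {days:.1f} days")
print(f"reported drop / estimated drop: {ratio:.0f}")
```

Unrounded, the ratio comes out at about 43, which the text rounds to 40.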

As explained in this short article, statistical offices base their estimates of life expectancy on one crucial assumption: that the mortality risk in every age group will stay the same in the future as it was in 2020. That is, the probability that a person born in 2020 will die at age X is taken to be equal to the probability of dying that an X-year-old faced in 2020. That is a sensible assumption in an ordinary year. But 2020 was not an ordinary year.

In other words, Statistik Austria implicitly assumes that the 2020 coronavirus pandemic will repeat itself in exactly the same way every year. The staff at Statistik Austria are of course aware of this assumption, which is why their website carries the following note:

The life expectancy at birth calculated for a calendar year indicates how many years a newborn child would live on average if the age-specific mortality rates observed in that calendar year did not change in the future.

https://www.statistik.at/web_de/presse/125167.html
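A minimal life-table sketch shows how much the period assumption matters. Everything in it is hypothetical: a made-up Gompertz mortality curve instead of Austrian data, a pandemic modeled as a uniform 10% rise in the one-year death probability at every age (echoing the roughly ten percent excess deaths reported above), and life-table conventions far simpler than Statistik Austria’s. It compares the period calculation, which applies the pandemic rates at every age of a newborn’s life, with a shock that hits the newborn only in its first year, i.e. only in 2020:

```python
import math

MAX_AGE = 120


def gompertz_q(age: int, a: float = 2e-5, b: float = 0.095) -> float:
    """Hypothetical one-year death probability at a given age (Gompertz curve)."""
    return min(1.0, a * math.exp(b * age))


def life_expectancy(q: list[float]) -> float:
    """Life expectancy at birth from one-year death probabilities q[0], q[1], ...

    Simplification: people who die during a year of age live half of that year.
    """
    alive = 1.0     # share of the birth cohort alive at the start of age x
    years = 0.0     # person-years lived so far
    for qx in q:
        dying = alive * qx
        years += alive - 0.5 * dying
        alive -= dying
    return years


baseline = [gompertz_q(x) for x in range(MAX_AGE)]
pandemic = [min(1.0, 1.1 * qx) for qx in baseline]   # 10% higher risk at every age

e0_base = life_expectancy(baseline)
# Period assumption: the pandemic rates apply at every age of the newborn's life.
e0_period = life_expectancy(pandemic)
# One-off shock: the newborn faces pandemic rates only in its first year of life.
e0_one_off = life_expectancy(pandemic[:1] + baseline[1:])

print(f"baseline life expectancy:    {e0_base:.2f}")
print(f"drop, pandemic every year:   {e0_base - e0_period:.2f} years")
print(f"drop, pandemic in 2020 only: {e0_base - e0_one_off:.5f} years")
```

With these made-up parameters the period measure falls by roughly a year, while a pandemic confined to 2020 barely changes the newborn’s expected lifetime. The uniform 10% rise is a simplification that only serves to illustrate the mechanism.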

The problem is that life expectancy estimated this way simply makes no sense for 2020. And, predictably, the caveat gets completely lost in the media coverage. Even Der Standard, which claims to be a quality newspaper, (mis)informs its readers that their life expectancy has fallen by six months, without pointing out the false assumption behind that figure.

Just as bizarre is the title the Austrian Academy of Sciences (ÖAW) chose for its piece on the topic:

COVID REDUCES LIFE EXPECTANCY BUT DOES NOT STEAL ANY YEARS OF LIFE

According to preliminary figures from Statistik Austria, average life expectancy in Austria fell by six months in the coronavirus year 2020. But that does not mean that Austrians will now live shorter lives, explains demographer Marc Luy of the ÖAW.

https://www.oeaw.ac.at/detail/news/covid-verringert-lebenserwartung-stiehlt-aber-keine-lebensjahre

Huh? If COVID doesn’t steal any years of life, how can it reduce life expectancy by half a year? If life expectancy doesn’t measure the years of life we can expect, what does it measure? And why do we need this “life expectancy” at all?

In a normal year, life expectancy measures what everyone thinks it measures. But not in a pandemic. Statistik Austria would do well to adjust how it calculates life expectancy for 2020.

(Hat tip to David Friedman, who made me aware of the problem.)

Some descriptive COVID-19 regressions

Having raised the bar so incredibly high with my last post, I now want to bring it down again and show you some unsophisticated data analysis.

Every day now you can see people all over the place comparing how countries are performing in this pandemic. How well is Germany doing? How does the UK compare to France? And what about Sweden: should we have followed their hands-off approach?

All of these comparisons lack one fundamental ingredient: meaningful data. The data everybody is using (and which I will be using in a minute) is riddled with measurement issues. Most important among them is the issue of testing: who gets tested, how fast, how many get tested; all of that varies from country to country and across time within a given country. Not even the death statistics are reliable, as we learned only this week when the UK drastically revised its numbers upward.

But I thought to myself, what the heck. If everyone’s doing it, I might be forgiven for having some fun as well. And so, in between grading final exams, I pulled together some country-level data and ran some regressions.

It goes without saying that this analysis has some, shall we say, shortcomings. All I’m doing is using regressions to describe some patterns in the data. Although I did have some mental model when deciding which variables to include in my regressions, it was of the sort “I imagine X could have an effect on COVID deaths” rather than any deep causal understanding of how the epidemic works (but, frankly, does anyone have that?).

So without further ado, here’s what I did. I took the data from the European Centre for Disease Prevention and Control (ECDC), which gives daily new cases and new deaths for each reporting country, and summed them up until April 30th to get cumulative cases and deaths. I then divided by population to get cases and deaths per capita. These are my dependent variables.
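A minimal sketch of this aggregation step, using only the Python standard library. The row layout `(date, country, new_cases, new_deaths)`, the country names and all numbers are hypothetical placeholders, not the actual ECDC column names or data. The log versions are included because the models below work with logs; a country with zero deaths has no defined log and is flagged with `None`:

```python
import math
from collections import defaultdict
from datetime import date
from typing import Optional

# Hypothetical row layout: (date, country, new_cases, new_deaths).
Row = tuple[date, str, int, int]


def safe_log(x: float) -> Optional[float]:
    """Natural log, or None where the count is zero and the log is undefined."""
    return math.log(x) if x > 0 else None


def cumulative_per_capita(rows: list[Row], population: dict[str, int],
                          cutoff: date) -> dict[str, dict[str, Optional[float]]]:
    """Sum daily new cases and deaths up to and including `cutoff`, per capita."""
    cases: dict[str, int] = defaultdict(int)
    deaths: dict[str, int] = defaultdict(int)
    for day, country, new_cases, new_deaths in rows:
        if day <= cutoff:
            cases[country] += new_cases
            deaths[country] += new_deaths
    result = {}
    for country in cases:
        cases_pc = cases[country] / population[country]
        deaths_pc = deaths[country] / population[country]
        result[country] = {
            "cases_pc": cases_pc,
            "deaths_pc": deaths_pc,
            "lcases_pc": safe_log(cases_pc),
            "ldeaths_pc": safe_log(deaths_pc),
        }
    return result


# Made-up example data.
rows = [
    (date(2020, 4, 29), "Country A", 100, 5),
    (date(2020, 4, 30), "Country A", 200, 5),
    (date(2020, 5, 1), "Country A", 999, 99),   # after the cutoff: ignored
    (date(2020, 4, 30), "Country B", 50, 0),
]
population = {"Country A": 1_000_000, "Country B": 500_000}
data = cumulative_per_capita(rows, population, cutoff=date(2020, 4, 30))
print(data["Country A"])
print(data["Country B"])
```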

For my regressors I went on a wild hunt through the World Bank and OECD databases and downloaded everything that I thought would be interesting to regress COVID-19 on. After some fooling around, I settled on the following two models:

Model 1: cumulative COVID-19 cases per capita (in logs)

The first variable (lrgdp_pc) here is PPP-adjusted GDP per capita (in logs). This is the single most important variable in “explaining” the number of cases: richer countries have more official cases. The relationship is 1:1, i.e. one percent more income is associated with one percent more cases. It is almost useless to speculate about the “causal channels” for this effect. If I were to guess, I’d say that rich countries got the virus earlier and perform more tests per capita and therefore detect more cases.

The second variable (pop65) is the share of population above the age of 65. We know that seniors are more susceptible to this disease, so any sensible model must take the age structure into account. It’s reassuring that the coefficient is positive and significant. I take this as a sanity check for my model.

The next two variables are population density (pop_dens) and the share of urban population (urban). My “theory” here is that denser, more urban countries provide a more fertile environment for the virus to spread. Somewhat disappointingly, population density seems to have no effect, and urbanization has only a small one (a 1 percentage point higher urban share is associated with 1.4% more cases per capita). And no, density and urbanization are not highly correlated (corr = 0.17), glad you asked.

Lastly, I wanted to check if more open countries are more exposed. I tried to capture that with the trade share (exports plus imports divided by GDP). The answer seems to be a clear no. Being more open to international trade is not associated with more infections. In an alternative specification I checked if imports from China had a positive effect and was disappointed.

I direct your attention to the fact that the R-squared of this regression is 68.5%. I have seen papers published in decent journals with much worse goodness of fit given the sample size and number of regressors. Just saying.
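For readers who want to see what a regression such as Model 1 computes, here is a dependency-free sketch of OLS with an intercept that returns the coefficients and the R-squared. It is not the author’s Stata code (Stata’s `regress` also reports standard errors, which this sketch skips), and the data are made up; the variable names only echo `lrgdp_pc` and `pop65`. Because both the outcome and `lgdp` are in logs, the slope on `lgdp` reads as an elasticity, the way the 1:1 income effect is read above:

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y: list[float], regressors: list[list[float]]) -> tuple[list[float], float]:
    """OLS with an intercept via the normal equations. Returns (coefficients, R-squared)."""
    X = [[1.0] + list(row) for row in zip(*regressors)]   # one row per observation
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Made-up data: log GDP per capita and the share of the population over 65.
lgdp = [9.0, 9.5, 10.0, 10.5, 11.0, 10.2]
pop65 = [8.0, 12.0, 15.0, 19.0, 21.0, 10.0]
lcases_exact = [-9.0 + 1.0 * g + 0.05 * p for g, p in zip(lgdp, pop65)]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
lcases = [y + e for y, e in zip(lcases_exact, noise)]

beta, r2 = ols(lcases, [lgdp, pop65])
print("intercept, lgdp, pop65:", [round(b, 3) for b in beta])
print(f"R-squared: {r2:.3f}")
```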

Model 2: cumulative COVID-19 deaths per capita

Turning to coronavirus deaths, the first important “explanatory” variable is the number of cases (lcases_pc). Again, this is nothing more than a sanity check.

I then add all the variables from the previous model to see if they have an effect on deaths over and above the effect they have through the number of cases. Unsurprisingly, an older population has the expected positive effect on deaths: raising the share of old people by 1 percentage point raises deaths per capita by 11% (in addition to the effect through cases).

More surprising are the effects of population density and urbanization. It looks like, after controlling for the number of cases, being a denser, more urban country reduces the number of deaths. I suppose this can make sense: given the number of infections, living closer together and in cities means living closer to hospitals, which might improve the chances of getting timely and effective treatment. But this is getting dangerously close to over-interpretation of weak effect estimates (small, barely significant coefficients).

The last variable is the number of hospital beds per 1,000 people. The estimated coefficient suggests that each additional bed per 1,000 inhabitants lowers the number of deaths by about 15%. Austria has 7.37 beds per 1,000 people; the European average is 5. So bringing all the countries of Europe up to the level of Austria (2.37 more beds per 1,000 people, at about 15% per bed) would cut the death rate by about 36%. That’s a big effect.
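The 36% figure adds up the 15% effect for each of the 2.37 extra beds. The sketch below reproduces that and, for comparison, two readings that compound the effect instead. Which one matches the model depends on how the coefficient was estimated, which the post does not report, so treat all three as approximations:

```python
import math

BEDS_AUSTRIA = 7.37   # hospital beds per 1,000 people (from the text)
BEDS_EUROPE = 5.0     # European average (from the text)
CUT_PER_BED = 0.15    # about 15% fewer deaths per extra bed per 1,000 (from the text)

gap = BEDS_AUSTRIA - BEDS_EUROPE   # 2.37 extra beds per 1,000 people

# 1) Linear approximation: add up 15% for each extra bed.
linear = gap * CUT_PER_BED

# 2) Compounding: each extra bed cuts the remaining deaths by 15%.
compounded = 1 - (1 - CUT_PER_BED) ** gap

# 3) Reading -0.15 as the raw coefficient in a regression of log deaths.
from_log_coefficient = 1 - math.exp(-CUT_PER_BED * gap)

for label, cut in [("linear", linear), ("compounded", compounded),
                   ("log coefficient", from_log_coefficient)]:
    print(f"{label:>16}: {cut:.1%} fewer deaths")
```

The linear version reproduces the roughly 36% in the text; the compounded readings come out somewhat lower, at roughly 30 to 32%, which is still a big effect.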

I also toyed around with various measures of health care spending (per capita or as a share of GDP). In all the regressions I checked, health spending had a positive effect, which I couldn’t make sense of. My best guess is that, conditional on hospital beds per capita, spending more on health is a sign that your health system is too expensive and inefficient which is associated both with more cases and more deaths. But it’s still kind of a head scratcher.

Excess Cases and Deaths

OK. Having run these regressions and found some interesting patterns, what else can we learn from them?

One thing is that the regression model provides a benchmark to evaluate how individual countries are doing. Admittedly, this is risky business, given the poor data quality. But I’m putting it out there nevertheless.

Below, I’m plotting the excess cases and excess deaths per capita for a number of countries. Excess cases is the difference between the actual cases and the number of cases predicted by the model. Excess deaths are calculated analogously. (Attentive readers will realize that these are just the regression residuals.) The vertical axis shows cases and deaths per 100,000 people.
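The post does not spell out how the residuals of the log models were converted to the per-100,000 scale of the chart. One plausible reading, assumed in this sketch, is to turn the fitted log value back into a per-capita level with `exp()` and subtract it from the actual value; the raw log residual is shown alongside for comparison. The country and its numbers are hypothetical:

```python
import math

PER = 100_000


def excess_per_100k(actual_per_capita: float, fitted_log: float) -> float:
    """Actual minus predicted outcome, expressed per 100,000 people.

    Assumes the model was fitted to log(outcome per capita), so the fitted
    value is converted back to a per-capita level with exp().
    """
    predicted_per_capita = math.exp(fitted_log)
    return (actual_per_capita - predicted_per_capita) * PER


def log_residual(actual_per_capita: float, fitted_log: float) -> float:
    """The raw regression residual in a log model: a log-point difference."""
    return math.log(actual_per_capita) - fitted_log


# Hypothetical country: 300 cases per 100,000 observed, 200 predicted.
actual = 300 / PER
fitted = math.log(200 / PER)
print(f"excess cases per 100,000: {excess_per_100k(actual, fitted):.1f}")
print(f"log residual: {log_residual(actual, fitted):.3f}")
```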

Three countries stand out in terms of excess cases: Italy, the UK and the US. Their case numbers are far higher than one would expect on the basis of their country characteristics.

The “worst performers” among the selected countries in terms of excess deaths are France, Britain and Italy.

China and Korea have negative excess cases and no excess deaths. That is, these countries have fewer cases (and neither fewer nor more deaths) than the model predicts.

Notice that Sweden has excess cases similar to those of Germany and Austria, but far higher excess deaths. Make of that what you will.

(Data file and Stata code are available on request.)

The case for rational expectations in COVID-19 modeling

American biologist Carl Bergstrom recently gave an interview to the Guardian on the topic of “bullshit”. In it, the interviewer asked Bergstrom about the shortcomings of existing epidemiological models as well as their use (and misuse) by political decision makers.

[Guardian] If you had the ability to arm every person with one tool – a statistical tool or scientific concept – to help them understand and contextualize scientific information as we look to the future of this pandemic, what would it be?

[Bergstrom] I would like people to understand that there are interactions between the models we make, the science we do and the way that we behave. The models that we make influence the decisions that we take individually and as a society, which then feed back into the models and the models often don’t treat that part explicitly. Once you put a model out there that then creates changes in behavior that pull you out of the domain that the model was trying to model in the first place. We have to be very attuned to that as we try to use the models for guiding policy.

In the context of the coronavirus, the problem was this: early models, such as the one by Imperial College London, predicted that between 1.1 and 2.2 million Americans could die from COVID-19, depending on the severity of mitigation efforts. This eye-popping number jolted political decision makers (Trump, Congress, the governors, etc.) into action: they closed schools and businesses and issued stay-at-home orders. The media publicity around the study probably scared many people, which made them take social distancing much more seriously. All of this probably helped slow the spread of the disease, so much so that the same researchers had to revise their predictions downward only weeks later.

That is, the publication of the initial predictions changed the behavior of people which rendered those predictions obsolete.

Bergstrom seems to say that the problem here is with the general public. They don’t understand that the models rely on behavioral assumptions which no longer hold once people learn about the models’ predictions and adjust their actions accordingly.

But, with apologies to Shakespeare: The fault, dear Bergstrom, is not in the general public, but in your models!

The problem with those epidemiological models (at least with SIR-type models) is that some of their key parameters (such as the reproduction number R0) depend, in various ways, on people’s expectations about the future path of the disease. If you don’t take that into account, your predictions will be way off.
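To make the point concrete, here is a toy discrete-time SIR model. It is not the Imperial College model, and every parameter, including the rule by which people cut their contacts as prevalence rises, is a hypothetical choice for illustration. With `response=0` the transmission rate is fixed, as in a textbook SIR model; with a positive `response`, behavior feeds back into transmission:

```python
def sir_deaths(beta: float, response: float = 0.0, gamma: float = 0.1,
               ifr: float = 0.005, population: float = 1_000_000,
               initial_infected: float = 100, days: int = 365) -> float:
    """Deaths in a discrete-time SIR model with one step per day.

    beta     -- transmission rate without any behavior change (R0 = beta / gamma)
    response -- hypothetical strength of the behavioral reaction: people cut their
                contacts as the current share of infected people rises;
                0 means behavior never changes
    """
    s, i = population - initial_infected, initial_infected
    for _ in range(days):
        prevalence = i / population
        effective_beta = beta / (1 + response * prevalence)
        new_infections = effective_beta * s * i / population
        s -= new_infections
        i += new_infections - gamma * i
    # Everyone who was ever infected, times the infection fatality rate.
    return ifr * (population - s)


fixed = sir_deaths(beta=0.25)
adaptive = sir_deaths(beta=0.25, response=200)
print(f"deaths if behavior never changes: {fixed:,.0f}")
print(f"deaths if people react:           {adaptive:,.0f}")
```

With these made-up numbers, letting people react lowers the predicted death toll substantially, even though nothing about the virus has changed.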

And way off they were! Here’s the summary of a statistical evaluation of a model similar to the one used in the Imperial study:

In excess of 70% of US states had actual death rates falling outside the 95% prediction interval for that state (Figure 1)

The ability of the model to make accurate predictions decreases with increasing amount of data. (figure 2)
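To see how far that is from what a 95% interval promises: if the intervals were well calibrated, only about 5% of states should fall outside them. The sketch below computes the miss rate for a set of intervals and, under the strong simplifying assumption that misses are independent across the 50 states (real state outcomes are correlated), the probability of seeing 70% or more misses by chance. The small example data are made up:

```python
from math import comb


def share_outside(actual: list[float], lower: list[float], upper: list[float]) -> float:
    """Fraction of outcomes that fall outside their prediction intervals."""
    misses = sum(1 for a, lo, hi in zip(actual, lower, upper) if not lo <= a <= hi)
    return misses / len(actual)


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Made-up outcomes and intervals for four hypothetical states.
actual = [10.0, 25.0, 40.0, 8.0]
lower = [5.0, 10.0, 20.0, 10.0]
upper = [15.0, 20.0, 60.0, 30.0]
print(f"share outside: {share_outside(actual, lower, upper):.0%}")

# 35 of 50 states is 70%; a calibrated 95% interval misses 5% of the time.
p_tail = prob_at_least(35, 50, 0.05)
print(f"P(at least 35 of 50 outside) = {p_tail:.1e}")
```

Correlation across states means the exact tail probability should not be taken literally, but a miss rate of 70% against a nominal 5% is hard to reconcile with well-calibrated intervals.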

You might say that prediction is not the point with those models. Maybe their only purpose is to produce scary headlines to make people listen to the experts. But that is a weird proposition. If experts want the general public to take them more seriously, making wildly erroneous predictions seems like a bad strategy.

So how are we going to take people’s expectations into account in epidemiological models? Let’s see.

March: Imperial predicts 2 million deaths. Government imposes lockdown. People are scared and stay at home.

April: Imperial revises its model, now predicts 50,000 deaths. Government partially re-opens the economy. People cautiously start going out again.

May: Imperial revises its model, now predicts 200,000 deaths. Government re-imposes some lockdown measures. People are scared again.

June: Imperial revises its model, now predicts 75,000 deaths. Government opens up again. People relax again.

And so on until we have converged to a situation in which the number of deaths Imperial predicts is consistent with the government’s (and the people’s) expectations and actions.

Such a situation is what economists call a rational expectations equilibrium. I think that trying to model people’s expectations in a consistent way would improve the usefulness of epidemiological models. This is, of course, a tall order. But perhaps if economists, statisticians, and epidemiologists put their heads together, we could move in this direction.
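A toy version of the March-to-June story above, and of the equilibrium it is groping towards, fits in a few lines. Everything here is hypothetical: the SIR parameters, and especially the `behavior` rule by which people cut their contacts in response to the forecast. Each naive forecast takes behavior as given, people react, and the next forecast overshoots in the other direction. The rational expectations forecast is instead a fixed point `D` with `outcome(D) == D`: a forecast that remains correct once people react to it. Because `outcome` falls as the forecast rises (in this toy model), that fixed point is unique and simple bisection finds it:

```python
GAMMA = 0.1          # recovery rate per day
IFR = 0.005          # infection fatality rate
POPULATION = 1_000_000
BETA_MAX = 0.3       # transmission rate if nobody changes behavior (R0 = 3)
SCARE = 2_000        # hypothetical: forecast deaths at which people halve contacts


def sir_deaths(beta: float, days: int = 365, initial_infected: float = 100) -> float:
    """Deaths in a simple discrete-time SIR model with one step per day."""
    s, i = POPULATION - initial_infected, initial_infected
    for _ in range(days):
        new_infections = beta * s * i / POPULATION
        s -= new_infections
        i += new_infections - GAMMA * i
    return IFR * (POPULATION - s)


def behavior(forecast_deaths: float) -> float:
    """Hypothetical rule: the scarier the forecast, the fewer contacts people make."""
    return BETA_MAX / (1 + forecast_deaths / SCARE)


def outcome(forecast_deaths: float) -> float:
    """Deaths that actually occur when people react to a given forecast."""
    return sir_deaths(behavior(forecast_deaths))


# Naive forecasting: the first forecast ignores any reaction; each later forecast
# assumes the behavior people adopted in response to the previous one.
forecasts = [outcome(0.0)]
for _ in range(5):
    forecasts.append(outcome(forecasts[-1]))


def rational_expectations_forecast(tol: float = 1e-6) -> float:
    """Forecast D with outcome(D) == D, found by bisection.

    outcome() falls as D rises, so outcome(D) - D is positive at D = 0 and
    non-positive at D = outcome(0): the fixed point lies in between.
    """
    lo, hi = 0.0, outcome(0.0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if outcome(mid) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print("naive forecasts:", [round(d) for d in forecasts])
equilibrium = rational_expectations_forecast()
print(f"self-consistent forecast: {equilibrium:.0f}, outcome: {outcome(equilibrium):.0f}")
```

In this toy setup the naive forecasts swing between a very high and a very low death toll, while the bisection settles on a single self-consistent number in between. Bisection is just one convenient way to find the fixed point here, not a claim about how such a model would best be solved in practice.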

6 1/2 Economic Principles for the Pandemic

The coronavirus pandemic has fundamentally changed our world. But it hasn’t changed the validity of fundamental economic principles.

I suggest six and a half economic principles which I think are important to bear in mind during these times. Most of them were touched on by Christoph Kuzmics in his excellent series of posts. But I thought it would be worthwhile to state them in a pointed, if slightly oversimplified, way:

1. People still respond to incentives.

So, for instance, allowing small businesses to re-open earlier than large ones means there will be more small and fewer large businesses. Paying higher unemployment benefits means there will be more unemployed people. Requiring people to wear face masks when doing X, but not when doing Y, means people will do more X and less Y. 

2. World output still equals world income still equals world expenditure.

If you shut down X% of the world economy, the world will produce X% fewer goods and services, will have X% less income, and will spend X% less. The idea that we can somehow preserve everyone’s income and spending while shutting down the production of (most) goods runs into this basic adding-up constraint. The recession is the price we pay for the lockdown which at the moment is the only weapon we have to fight the pandemic (until we have a vaccine or medical treatments). Government transfers can change who gets to consume the goods, but they don’t change the amount of goods there are. (But also see principles 4 and 6 1/2!)
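The adding-up constraint can be illustrated with a deliberately crude sketch: three hypothetical households, one of which loses most of its income because its sector is shut down, and a set of transfers that must net to zero because they only move existing income around. It ignores any effect of transfers on total output (such as the multiplier of principle 6 1/2):

```python
def after_transfers(income: dict[str, float], transfers: dict[str, float]) -> dict[str, float]:
    """Each household's income after government transfers.

    Transfers only move existing income around, so they must net to zero.
    """
    if abs(sum(transfers.values())) > 1e-9:
        raise ValueError("transfers must net to zero")
    return {h: y + transfers.get(h, 0.0) for h, y in income.items()}


before = {"household A": 50.0, "household B": 30.0, "household C": 20.0}
# Hypothetical lockdown: household B's sector is shut and it loses 20 of its 30.
during = {"household A": 50.0, "household B": 10.0, "household C": 20.0}
transfers = {"household A": -7.0, "household B": 14.0, "household C": -7.0}

after = after_transfers(during, transfers)
print(f"total income before lockdown: {sum(before.values()):.0f}")
print(f"total during lockdown:        {sum(during.values()):.0f}")
print(f"total after transfers:        {sum(after.values()):.0f}")
print(after)
```

The transfers change who gets the 80 units of income, but not the fact that there are only 80 left.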

3. The price mechanism is still the best way of allocating scarce resources.

If the demand for toilet paper exceeds the supply at the current price, there are two options: either you let the price of toilet paper rise or you create a shortage. Allowing a higher price is by far the better option. A higher price gives producers of toilet paper an incentive to produce more of it and gives consumers an incentive to use it more carefully and economically. The same applies to face masks, ventilators, and yes, even to hospital beds.
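A minimal sketch of the two options, using a made-up linear market for toilet paper (the parameters `a`, `b`, `c`, `d` are hypothetical): at the market-clearing price the quantity supplied equals the quantity demanded, while holding the price below that level leaves a shortage:

```python
from dataclasses import dataclass


@dataclass
class LinearMarket:
    """Hypothetical linear market: demand = a - b*p, supply = c + d*p."""
    a: float
    b: float
    c: float
    d: float

    def demand(self, price: float) -> float:
        return max(0.0, self.a - self.b * price)

    def supply(self, price: float) -> float:
        return max(0.0, self.c + self.d * price)

    def clearing_price(self) -> float:
        """Price at which demand equals supply."""
        return (self.a - self.c) / (self.b + self.d)

    def shortage(self, price: float) -> float:
        """Unmet demand when the price is held at `price`."""
        return max(0.0, self.demand(price) - self.supply(price))


toilet_paper = LinearMarket(a=100, b=10, c=20, d=10)   # made-up numbers
p_star = toilet_paper.clearing_price()
print(f"market-clearing price: {p_star:.2f}, quantity: {toilet_paper.supply(p_star):.0f}")
print(f"shortage with the price held at 2: {toilet_paper.shortage(2):.0f}")
```

With these numbers the market clears at a price of 4; holding the price at 2 raises the quantity demanded to 80 but cuts the quantity supplied to 40, leaving 40 units of unmet demand.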

4. Economic inequality is still best addressed by lump-sum transfers.

The pandemic will lead to more economic inequality, because the poor are hit much harder both by the disease itself (low income correlates with worse health conditions) and by the lockdown (most low-wage jobs can’t be done from home). The best way to address this is to give an unconditional transfer to all households (a.k.a. “basic income”) financed by a tax on something that is in fixed supply (at least in the short run): a once-off wealth tax for example. The second fundamental theorem of welfare economics still applies: we can achieve any desired allocation of scarce goods (including toilet paper, face masks and hospital beds) by lump-sum taxes and transfers while letting the market do its job.

5. The government budget constraint still exists.

Every euro the government spends has to come from one of three sources: taxes, borrowing, or printing money. But in the end, these are all just different forms of taxation. Government borrowing is delayed taxation: the government will need to pay back the debt with future taxes. Printing money is a tax on nominal wealth.
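A small sketch of the two equivalences, under simplifying assumptions: the debt is eventually repaid with taxes, and future taxes are discounted at the same interest rate at which the government borrows. All numbers are hypothetical:

```python
def repayment(borrowed: float, rate: float, years: int) -> float:
    """Future taxes needed to repay debt that accrues interest at `rate`."""
    return borrowed * (1 + rate) ** years


def present_value(amount: float, rate: float, years: int) -> float:
    """Value today of a payment due in `years`, discounted at `rate`."""
    return amount / (1 + rate) ** years


def inflation_tax(nominal_wealth: float, inflation: float) -> float:
    """Purchasing power lost on nominal wealth after one period of inflation,
    measured in start-of-period prices."""
    return nominal_wealth - nominal_wealth / (1 + inflation)


future_taxes = repayment(100.0, rate=0.02, years=10)
print(f"taxes due in 10 years: {future_taxes:.2f}")
print(f"their value today:     {present_value(future_taxes, 0.02, 10):.2f}")
print(f"inflation tax on 1,000 at 5% inflation: {inflation_tax(1000.0, 0.05):.2f}")
```

Discounted back to today, the future taxes are worth exactly the amount borrowed, which is the sense in which borrowing is delayed taxation.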

6. Public goods problems still exist.

Enforcing the lockdown requires the threat (and sometimes use) of force. (That’s why it’s called enforcing.) Staying at home is a prisoner’s dilemma. If nobody is policing the lockdown, going out of the house is a dominant strategy (i.e. it is best irrespective of whether other people stay at home or go out). Social stigmatization of defectors (publicly shaming corona-party-goers, for instance) can go some way, but it is also just another kind of force. Some civil liberties won’t be upheld during the lockdown.
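The dominant-strategy claim can be checked mechanically for a two-person version of the game. The payoff numbers are hypothetical; they only need the prisoner’s dilemma ordering (going out is privately better whatever the other person does, yet both staying home beats both going out). `with_fine` models a policed lockdown as a penalty for going out:

```python
from typing import Optional

ACTIONS = ("stay", "go out")

# Hypothetical payoffs to me, indexed by (my action, the other person's action).
PAYOFF = {
    ("stay", "stay"): 3,
    ("stay", "go out"): 0,
    ("go out", "stay"): 4,
    ("go out", "go out"): 1,
}


def with_fine(payoff: dict, fine: float) -> dict:
    """Payoffs when anyone who goes out pays `fine` (the lockdown is policed)."""
    return {(mine, other): value - (fine if mine == "go out" else 0)
            for (mine, other), value in payoff.items()}


def dominant_strategy(payoff: dict) -> Optional[str]:
    """Return the action that is strictly best whatever the other person does, if any."""
    for mine in ACTIONS:
        if all(payoff[(mine, other)] > payoff[(alt, other)]
               for other in ACTIONS for alt in ACTIONS if alt != mine):
            return mine
    return None


print("no policing:", dominant_strategy(PAYOFF))
print("fine of 0.5:", dominant_strategy(with_fine(PAYOFF, 0.5)))
print("fine of 2:  ", dominant_strategy(with_fine(PAYOFF, 2)))
```

Without policing, going out is the dominant strategy even though both players would be better off staying home; a large enough fine (here 2) makes staying home dominant, while a small one (0.5) does not.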

6 1/2. Government spending still has a multiplier effect (but it is probably small).

If the government buys more goods, some otherwise unemployed workers will be employed making those goods. Those workers will themselves be able to buy more goods, creating further jobs for otherwise unemployed workers, and so on. However, the multiplier logic doesn’t work quite as well during the lockdown, because some workers simply cannot go to work. Government spending can help prop up demand in those sectors that aren’t shut down, but as long as many labor-intensive industries such as construction are closed, the multiplier will be only slightly higher than 1.
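The multiplier logic is a geometric series, and the lockdown argument can be added to it with one extra parameter. In this sketch, which is the textbook Keynesian multiplier with a hypothetical twist, `mpc` is the share of extra income people spend again and `open_share` is the (hypothetical) share of that spending that reaches sectors still able to produce and hire. Both parameter values are illustrative, not estimates:

```python
def multiplier_by_rounds(mpc: float, open_share: float, rounds: int = 200) -> float:
    """Total extra output per euro of government spending, summed round by round.

    mpc        -- share of extra income that households spend again
    open_share -- share of that spending that reaches sectors able to produce
                  (and hire); the rest hits shut-down sectors and fizzles out
    """
    total, injection = 0.0, 1.0
    for _ in range(rounds):
        total += injection
        injection *= mpc * open_share
    return total


def multiplier_closed_form(mpc: float, open_share: float) -> float:
    """Limit of the geometric series: 1 / (1 - mpc * open_share)."""
    return 1 / (1 - mpc * open_share)


for open_share in (1.0, 0.2):
    print(f"open_share={open_share}: multiplier "
          f"{multiplier_closed_form(0.8, open_share):.2f} (closed form), "
          f"{multiplier_by_rounds(0.8, open_share):.2f} (by rounds)")
```

With `mpc=0.8`, the multiplier is 5 when every sector is open but only about 1.19 when just a fifth of the spending reaches open sectors, in line with the claim that it will be only slightly higher than 1.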

Unreported Coronavirus Infections: An Alternative Estimate

As I pointed out in my comment on Christoph’s latest blog post, there is another way to estimate the number of unreported coronavirus infections.

This alternative estimate requires two inputs:

  1. The infection fatality rate (IFR): the ratio of deaths to the number of people actually infected
  2. The average duration of illnesses that end in death

There is by now some scientific evidence on both inputs.

As far as I know, the best current estimate of the IFR comes from a random sample in the German town of Gangelt. Antibodies, which indicate a past infection, were found in 14% of the residents tested there. Only 2% of them had been recorded in the official statistics. The fatality rate was 0.37%. Other estimates seem to arrive at a similar result. The Centre for Evidence-Based Medicine (CEBM) at the University of Oxford therefore concludes:

“Taking account of historical experience, trends in the data, increased number of infections in the population at largest, and potential impact of misclassification of deaths gives a presumed estimate for the COVID-19 IFR somewhere between 0.1% and 0.36%”

For the second input, the average duration of fatal infections, I found the following study in “The Lancet”, a leading medical journal:

“Using data on 24 deaths that occurred in mainland China and 165 recoveries outside of China, we estimated the mean duration from onset of symptoms to death to be 17.8 days”

Based on these two numbers, the actual number of infections can be estimated as follows:

Cumulative actual infections on day t = cumulative reported deaths on day t+18 / IFR

For example, by April 14 Austria had recorded a total of 368 deaths. Assuming an IFR of 0.36%, this implies that on March 27, i.e. 18 days earlier, we had roughly 100,000 infections (368 / 0.0036 ≈ 102,000). Many of them probably had no symptoms or only mild ones; a study from Iceland suggests that about half of all infections are symptomatic. On March 27, the official statistics reported a total of 7,029 infections.

This means that the hidden-case ratio, i.e. the ratio of actual to reported infections, was about 14 at the end of March.

If this number seems too high, bear in mind that an IFR of 0.36% lies at the upper end of the interval given by the CEBM. Since a higher IFR implies fewer actual infections, my estimate is a conservative one.
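The estimate can be written as a short function. It follows the text in using cumulative counts and rounding the 17.8-day lag to 18 days; the dictionaries hold only the two Austrian data points quoted above, and the function names are illustrative. Looping over both ends of the CEBM range shows why using the upper end is conservative:

```python
from datetime import date, timedelta

LAG_DAYS = 18   # 17.8-day mean from symptom onset to death, rounded as in the text


def implied_infections(cum_deaths: dict[date, int], day: date, ifr: float,
                       lag: int = LAG_DAYS) -> float:
    """Cumulative actual infections on `day`, from cumulative deaths `lag` days later."""
    return cum_deaths[day + timedelta(days=lag)] / ifr


def hidden_case_ratio(implied: float, reported: int) -> float:
    """Actual infections per officially reported infection."""
    return implied / reported


cum_deaths = {date(2020, 4, 14): 368}     # Austria, from the text
reported = {date(2020, 3, 27): 7_029}     # Austria, from the text
day = date(2020, 3, 27)

for ifr in (0.0036, 0.001):               # upper and lower end of the CEBM range
    infected = implied_infections(cum_deaths, day, ifr)
    ratio = hidden_case_ratio(infected, reported[day])
    print(f"IFR {ifr:.2%}: {infected:,.0f} implied infections, ratio {ratio:.1f}")
```

At an IFR of 0.36% this gives about 102,000 infections and a ratio of about 14.5 (the text rounds to 100,000 and 14); at 0.1% the ratio rises to about 52.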

The charts below show the reported and implied case numbers as well as the hidden-case ratio over the course of March. They suggest that in early March the actual number of infections was probably more than 100 times higher than the number of officially confirmed cases. That is not surprising, because testing capacity was still very limited at the time. Since then, the hidden-case ratio has fallen steadily, probably thanks to faster and broader testing.

Caveat: No one should interpret the numbers presented here as the “true” case numbers. This is merely a thought experiment and a numbers game based on very imprecise data.

The Economic Consequences of the Coronavirus – Introduction

This is the first in a series of blog posts in which, with the help of my colleagues at the Economics Department of the University of Graz, I want to think through the economic consequences of the coronavirus and of the measures taken against it, especially for Austria. Today, though, I first briefly ask what the situation in Austria (the number of coronavirus infections and deaths) would look like today (Sunday, April 5) if no measures at all had been taken. I will also explain how I arrive at my results.
