The above title is my response to a discussion that began with this email sent to me by Steve Roth:
Noah Smith had a great tweet recently, a real keeper for me [Roth].
Causation is correlated with correlation.
I would reword it:
Correlation correlates with causation. (Just not very much.)
And I wonder if the following corollaries are safe:
Non-correlation correlates (more strongly) with non-causation.
And/or:
Negative correlation correlates (much more strongly) with non-causation.
This in response to the old nostrum/saw that correlation does not imply causation.
Which has always seemed wrong to me. Of course it does! (Weakly.)
The problem is that "imply" is a very slippery word, so it's a pretty useless nostrum.
Would be delighted to see a post poking at this.
I replied:
I will post something on this (at some point; we're on a 1-2 month delay so most things don't appear right away) but my quick response is: Selection bias. If people start sending you random pairs of variables that happen to be highly correlated, sure, there might well be a connection between them, for example kids' scores on math tests and language tests are correlated, and this tells us something. But if someone is looking for a particular pattern, and then selects two variables that are correlated, that's another story. The great thing about causal identification is that it's valid even if you're looking to find a pattern. (Not completely, there's p-hacking and also you can run 100 experiments and only report the best one, etc., but that's still less of an issue than the fact that pure correlation does not logically tell you anything about causation.) To put it another way, returning to Noah's tweet: Correlation is surely correlated with causation in an aggregate sense, but if you take the subset of correlations that a particular motivated researcher is looking for, then maybe not.
You could also see the above paragraph as a bit of common-sense reasoning. The expression "correlation does not imply causation" is popular, and I think it's popular for a reason, that it does capture a truth about the world.
I cc-ed Smith on this exchange and also Dan Kahan, who wrote:
For what it's worth, my two variants would be:
1. Nothing other than correlation implies causation.
2. Correlation implies causation, except when it doesn't.
Credit to D. Hume for #1 (at least for noticing that there's no other visible indicator of causation).
#2 is just what Andrew said: causation = correlation plus valid causal inference.
Again, the elephant in the room here is selection. People see enough random correlations that they can pick them out and interpret them how they like.
So if I had to put something on a bumper sticker (or a tweet), it would be:
Correlation does not even imply correlation
That is, correlation in the data you happen to have (even if it happens to be "statistically significant") does not necessarily imply correlation in the population of interest.
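To make the selection point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not taken from the discussion above: the function names `pearson` and `selected_vs_fresh`, the 200 candidate pairs, the 20 observations per pair, and the use of independent standard normal draws (so the population correlation is zero by construction). It picks the pair with the largest sample correlation, then draws a large fresh sample from the same population to see whether the correlation is still there.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def selected_vs_fresh(n_pairs=200, n_obs=20, n_fresh=2000, seed=0):
    """Select the most correlated of many unrelated pairs, then re-check it.

    Every pair is two independent standard normal samples, so the true
    (population) correlation is exactly zero for all of them.
    Returns (best in-sample correlation, correlation in fresh data).
    """
    rng = random.Random(seed)

    # The "motivated researcher" step: keep only the strongest correlation.
    best_r = 0.0
    for _ in range(n_pairs):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(x, y)
        if abs(r) > abs(best_r):
            best_r = r

    # The "population of interest" step: a large new sample from the
    # same data-generating process as the selected pair.
    x_new = [rng.gauss(0, 1) for _ in range(n_fresh)]
    y_new = [rng.gauss(0, 1) for _ in range(n_fresh)]
    fresh_r = pearson(x_new, y_new)

    return best_r, fresh_r


if __name__ == "__main__":
    best_r, fresh_r = selected_vs_fresh()
    print(f"selected in-sample correlation: {best_r:+.2f}")
    print(f"correlation in fresh data:      {fresh_r:+.2f}")
```

Under these assumptions the selected correlation will typically look large while the fresh-data correlation sits near zero. The sketch only covers the pure-noise case; with real variables some selected correlations would of course hold up, which is the aggregate sense in which correlation is correlated with causation.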