Lies, Damned Lies, and Data Mining
by Clifford Asness, Ph.D., AQR Capital Management, Inc.
We are the whipping boy for a recent article on the dangers of data mining in our field. And the whipping is delivered largely based on an unsupported shot taken by my frequent foil and sparring partner, Rob Arnott. Before I take on this attack1 we need to back up a bit.
Data mining, that is, searching the data to find in-sample patterns in returns that are not real but random, and then believing you’ve found truth, is a real problem in our field. Random doesn’t tend to repeat, so data mining often fails to produce attractive real-life returns going forward. And given the rewards to gathering assets, often made easier with a good “backtest,” the incentive to data mine is great. We’ve talked about it endlessly for years and written on it many times. But we’re not nihilists who believe everything is data mining.2,3 We are more likely to believe in-sample evidence when it’s also accompanied by strong out-of-sample evidence (across time, geography, and asset class4) and an economic story that makes sense.5 In that case, and barring exceptionally convincing evidence that something has changed, we not only believe in it but will stick to it like grim death through its inevitable ups and downs. After many years of research and managing portfolios, we believe there are at least four widely known types of factors that are real (that is, they don’t just look good because of data mining).6,7,8 People are often shocked that we believe in only a few core investment concepts – somehow they think there are many more. Nope. For instance: No small-firm effect. No January effect. No Super Bowl effect – though if you do believe that indicator, you should be shorting stocks this year because of Tom Brady; sorry if that’s deflating.
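To make the "random doesn’t tend to repeat" point concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not AQR’s research methodology: it simulates many strategies whose true expected return is zero, keeps the ones with the best in-sample Sharpe ratios (the “backtest” step), and then checks how those same strategies do out-of-sample. The function names `annualized_sharpe` and `data_mine`, the 1,000 strategies, the top 50 kept, the 10-year (120-month) in- and out-of-sample windows, and the 4% monthly volatility are all hypothetical choices.

```python
import random
import statistics


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio of monthly excess returns (sqrt(12) scaling)."""
    return statistics.mean(monthly_returns) / statistics.stdev(monthly_returns) * 12 ** 0.5


def data_mine(n_strategies=1000, top_k=50, n_in=120, n_out=120, seed=42):
    """Simulate pure-noise strategies, keep the top_k by in-sample Sharpe,
    and return their average in-sample and out-of-sample Sharpe ratios.

    All parameter values are hypothetical, chosen only for illustration."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_strategies):
        # True expected return is zero, so any apparent edge is pure luck.
        rets = [rng.gauss(0.0, 0.04) for _ in range(n_in + n_out)]
        results.append((annualized_sharpe(rets[:n_in]),
                        annualized_sharpe(rets[n_in:])))
    # The "backtest" step: rank strategies by in-sample performance only.
    results.sort(key=lambda r: r[0], reverse=True)
    winners = results[:top_k]
    avg_in = statistics.mean([r[0] for r in winners])
    avg_out = statistics.mean([r[1] for r in winners])
    return avg_in, avg_out


if __name__ == "__main__":
    avg_in, avg_out = data_mine()
    print(f"Top strategies, in-sample Sharpe:      {avg_in:.2f}")
    print(f"Same strategies, out-of-sample Sharpe: {avg_out:.2f}")
```

Because the in-sample and out-of-sample returns are independent draws with no true edge, the selected winners look attractive in the backtest purely by luck, and their average out-of-sample Sharpe ratio should land near zero. Strong out-of-sample evidence, as described above, is exactly the check this kind of noise fails.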