On Backtesting: An All-New Chapter from our Adaptive Asset Allocation Book
by Adam Butler, ReSolve Asset Management, via GestaltU
If you’ve been a regular reader of our blog, you already know that we recently published our first book Adaptive Asset Allocation: Dynamic Portfolios to Profit in Good Times – and Bad. As of this writing, it still stands as the #1 new release in Amazon’s Business Finance category. We’re pretty psyched about that.
In our book, we spent a great deal of time summarizing the research posted to GestaltU over the years. We did this in order to distill the most salient points, and also to tie seemingly disparate topics together into a cohesive narrative. Our book covers topics ranging from psychology to cognitive biases to asset valuations to retirement income planning to (of course) investment strategies. The book was meant to stand as a single source for what ought to matter to modern investors. As one ad for our book reads:
We hope we succeeded in doing that, but like any greatest hits album, we also included some fresh, new “tracks” in our book! And we thought we’d share one of them with you today. So without further ado, here’s Chapter 37, on The Usefulness and Uselessness of Backtests.
Chapter 37: The Usefulness and Uselessness of Backtests
There is a Grand Canyon-sized gap between the best and worst that backtesting has to offer. And since this book’s findings on the value of Adaptive Asset Allocation are largely based on modeled investment results, it’s only proper to include an essay on the various sources of performance decay.
The greatest fear in empirical finance is that the out-of-sample results for a strategy under investigation will be materially weaker than the results derived from testing. We know this from experience. When we first discovered systematic investing, our instinct was to find as many ways to measure and filter time series as could fit on an Excel worksheet. Imagine a boy who had tasted an inspired bouillabaisse for the first time, and just had to try to replicate it personally. But rather than explore the endless nuance of French cuisine, the boy just threw every conceivable French herb into the pot at once.
To wit, one of our early designs had no fewer than 37 inputs, including filters related to regressions, moving averages, raw momentum, technical indicators like RSI and stochastics, as well as fancier trend and mean reversion filters like TSI, DVI, DVO, and a host of other three- and four-letter acronyms. Each indicator was finely tuned to optimal values in order to maximize historical returns, and these values changed as we optimized against different securities. At one point we designed a system to trade the iShares Russell 2000 ETF (IWM) with a historical return above 50% and a Sharpe ratio over 4.
These are the kinds of systems that perform incredibly well in hindsight and then blow up in production, and that’s exactly what happened. We applied the IWM system to time US stocks for a few weeks with a small pool of personal money, and lost 25%.
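For readers who like to see the effect in numbers, here is a minimal sketch of why heavily optimized systems look great in hindsight. It is not our IWM system. Everything in it is an assumption made for illustration: the market is simulated noise, each "rule" is a random long/flat signal, and the names `sharpe` and `best_of_many_rules` are hypothetical. Because no rule has any real edge, whatever the winner shows in-sample is purely the result of picking the best of many tries.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean / standard deviation), not annualized."""
    sd = statistics.pstdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def best_of_many_rules(n_rules=200, n_days=120, seed=0):
    """Pick the best of many random timing rules in-sample, then re-test it.

    Assumption: market returns are pure noise and every rule is a random
    long/flat signal, so no rule has a genuine edge by construction.
    Returns (in_sample_sharpe, out_of_sample_sharpe) for the winning rule.
    """
    rng = random.Random(seed)
    # Hypothetical market: zero-mean daily returns with 1% volatility.
    market = [rng.gauss(0.0, 0.01) for _ in range(2 * n_days)]
    in_sample, out_sample = market[:n_days], market[n_days:]

    best_rule, best_is = None, float("-inf")
    for _ in range(n_rules):
        # One random rule: long (1) or flat (0) on each day of both periods.
        rule = [rng.randint(0, 1) for _ in range(2 * n_days)]
        score = sharpe([p * r for p, r in zip(rule[:n_days], in_sample)])
        if score > best_is:
            best_rule, best_is = rule, score

    # Apply the in-sample winner, unchanged, to data it has never seen.
    oos = sharpe([p * r for p, r in zip(best_rule[n_days:], out_sample)])
    return best_is, oos


if __name__ == "__main__":
    in_s, out_s = best_of_many_rules()
    print(f"in-sample Sharpe: {in_s:.3f}, out-of-sample Sharpe: {out_s:.3f}")
```

The exact figures depend on the seed, but averaged over several seeds the in-sample winner should look clearly better than its own out-of-sample record, which hovers around zero. Raising `n_rules` in this sketch plays the same role as adding more tunable inputs to a real system: more tries, a prettier backtest, and no more real edge.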
Degrees of Freedom
The problem with complicated systems is that they require you to find the exact perfect point of optimization in many different dimensions – in our case, 37. To understand what we mean by that, imagine trying to create a tasty dish with 37 different ingredients. How could you ever find the perfect combination? A little more salt may bring out the flavor of the rosemary, but might overpower the truffle oil. What to do? Add more salt and more truffle oil? But more truffle oil may not complement the earthiness of the chanterelles.