by Adam Butler, GestaltU

At GestaltU we see ourselves as incrementalists. We are not especially prone to true quantum leaps in thinking, but we excel at finding novel ways to apply others' brilliant concepts. In other words, we appreciate that, for the most part, we "stand on the shoulders of giants".

There are of course some true giants in the field of portfolio theory. Aside from timeless luminaries like Markowitz, Black, Sharpe, Thorp and Litterman, we perceive thinkers like Thierry Roncalli, Attilio Meucci, and Yves Choueifaty to be modern giants. We also admire the work of David Varadi for his contributions in the field of heuristic optimization, and for his propensity to introduce concepts from fields outside of pure finance and mathematics. Michael Kapler, meanwhile, has created a truly emergent phenomenon in finance with his Systematic Investor Toolbox, which has opened the previously esoteric field of quantitative finance to a much wider set of practitioners. I (Adam) know I've missed many others, for which I deeply apologize and take full responsibility. I never was very good with names.

In this article, we would like to integrate the cluster concepts we introduced in our article on Robust Risk Parity with some ideas proposed and explored by Varadi and Kapler in the last few months (see here and here). Candidly, as so often happens with the creative process, we stumbled on these ideas while designing a Monte Carlo-based robustness test for our production algorithms, which we intend to explore in greater detail in a future post.

**The Curse of Dimensionality**

In a recent article series, Varadi and Kapler proposed and validated some novel approaches to the "curse of dimensionality" in correlation/covariance matrices for high dimensional problems with limited data histories. Varadi used the following slide from R. Gutierrez-Osuna to illustrate this concept.

Figure 1. Curse of Dimensionality

Source: R. Gutierrez-Osuna

The "curse of dimensionality" sounds complicated but is actually quite simple. Imagine you seek to derive volatility estimates for a universe of 10 assets based on 60 days of historical data. The volatility of each asset is held in a 1 x 10 vector, where each of the 10 elements of the vector holds the volatility for one asset class. From a data density standpoint, we have 600 observations (60 days x 10 assets) contributing to 10 estimates, so our data density is 600/10 = 60 pieces of data per estimate. From a statistical standpoint, this is a meaningful sample size.

Now let's instead consider estimating the variance-covariance matrix (VCV) for this universe of 10 assets, which we require in order to estimate the volatility of a portfolio constituted from this universe. The covariance matrix is symmetric along the diagonal, so that values in the bottom-left half of the matrix are repeated in the upper-right half. So how might we calculate the number of independent elements in a covariance matrix with 10 assets?

For those who are interested in such things, the generalized formula for calculating the number of independent elements of a symmetric tensor of rank M with N elements per dimension is:

N(N+1)(N+2)...(N+M-1) / M!

For a rank 2 tensor (such as a covariance matrix) the number of independent elements is therefore:

N(N+1) / 2

Therefore, accounting for the diagonal, the covariance matrix generates (10 * 11) / 2 = 55 independent pairwise variance and covariance estimates from the same 600 data points. In this case, each estimate is derived from an average of 600/55 = 10.9 data points per estimate.

Now imagine projecting the same 60 days into a rank 3 tensor (like the 3 dimensional cube in the figure above), like that used to derive the third moment (skewness) of a portfolio of assets. Now we have 10 x 10 x 10 = 1000 elements. The tensor is also symmetric under any permutation of its indices (each corner of the cube mirrors the others), so we can calculate the number of independent elements using the generalized equation above, which reduces to the following expression for rank 3:

N(N+1)(N+2) / 6

Plugging in N=10, we easily calculate that there are (10 * 11 * 12)/6 = 220 *independent* estimates in this co-skewness tensor. Given that we have generated these estimates from the same 600 data points, we now have a data density of 600/220 = 2.7 pieces of data per estimate.

You can see how, even with just 10 assets to work with, the amount of historical data required to generate meaningful estimates for covariance, and especially for higher order estimates like co-skewness and co-kurtosis (rank 4: (10 * 11 * 12 * 13)/24 = 715 independent estimates, for a data density of 600/715 = 0.8 observations per estimate), grows too large to be practical. For example, to achieve the same 60 data points per estimate for our covariance matrix as we have for our volatility vector would require 60 * 55 / 10 = 330 days of data per asset.
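The counting above generalizes neatly. Our own tooling is in R, but as a quick illustrative sketch (a hypothetical helper, not production code), the independent-element count and the resulting data density can be computed as:

```python
from math import comb

def independent_elements(n_assets: int, rank: int) -> int:
    """Independent elements of a symmetric rank-M tensor with N elements per
    dimension: N(N+1)...(N+M-1) / M!, i.e. the binomial C(N+M-1, M)."""
    return comb(n_assets + rank - 1, rank)

def data_density(n_assets: int, n_days: int, rank: int) -> float:
    """Observations available per independent estimate."""
    return n_assets * n_days / independent_elements(n_assets, rank)

# 10 assets, 60 days of history, as in the example above
print(independent_elements(10, 2))        # 55 covariance estimates
print(round(data_density(10, 60, 2), 1))  # 10.9 observations per estimate
print(independent_elements(10, 3))        # 220 co-skewness estimates
```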

### Decay vs. Significance

In finance, we are often faced with a tradeoff between informational decay (or availability for testing purposes) and estimation error. On the one hand, we need a large enough data sample to derive statistically meaningful estimates. But on the other hand, price signals from long ago may carry less meaningful information than near term price signals.

For example, a rule of thumb in statistics is that you need at least 30 data points in a sample to test for statistical significance. For this reason, when simulating methodologies with monthly data, many researchers will use the past 30 months of data to derive their estimates for covariance, volatility, etc. While the sample may be meaningful from a density standpoint (enough data points to be meaningful), it may not be quite as meaningful from an "economic" standpoint, because price movements 2.5 years ago may not materially reflect current relationships.

To overcome this common challenge, researchers have proposed several ways to reduce the dimensionality of higher order estimates. For example, the concept of "shrinkage" is often applied to covariance estimates for large dimensional universes in order to "shrink" the individual estimates in a covariance matrix toward the average of all estimates in the matrix. Ledoit and Wolf pioneered this domain with their paper, "Honey, I Shrunk the Sample Covariance Matrix". Varadi and Kapler explore a variety of these methods, and propose some exciting new methods, in their recent article series. Overall, our humble observation from these analyses and a quick survey of the literature is that while shrinkage methods help overcome some theoretical hurdles involved with time series parameter estimation, empirical results demonstrate mixed practical improvement.
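To make the mechanics concrete, shrinking toward the average of all estimates can be sketched in a few lines. This is a deliberately naive illustration with a fixed 50% intensity; Ledoit and Wolf's contribution is precisely to derive an optimal shrinkage intensity rather than assume one:

```python
import numpy as np

def shrink_to_average(corr: np.ndarray, intensity: float = 0.5) -> np.ndarray:
    """Pull every off-diagonal correlation toward the average off-diagonal
    correlation; the diagonal stays at 1."""
    off = ~np.eye(corr.shape[0], dtype=bool)  # mask of off-diagonal entries
    avg = corr[off].mean()
    shrunk = corr.copy()
    shrunk[off] = (1 - intensity) * corr[off] + intensity * avg
    return shrunk

corr = np.array([[1.0, 0.2, 0.6],
                 [0.2, 1.0, 0.4],
                 [0.6, 0.4, 1.0]])
print(shrink_to_average(corr))  # off-diagonals pulled toward their mean of 0.4
```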

### Cluster Shrinkage

Despite the mixed results of shrinkage methods in general, we felt there might be some value in proposing a slightly different type of shrinkage method which represents a sort of "compromise" between traditional shrinkage methods and estimates derived from the sample matrix with no adjustments. The compromise arises from the fact that our method introduces a layer of shrinkage that is more granular than the average of all estimates, but less granular than the sample matrix, by shrinking toward clusters.

Clustering is a method of dimensionality reduction because it segregates assets into groups with similar qualities based on information in the correlation matrix. As such, an asset universe of several dozens or even hundreds of securities can be reduced to a handful of significant moving parts. I would again direct readers to a thorough exploration of clustering methods by Varadi and Kapler here, and how clustering might be applied to robust risk parity in our previous article, here.

Figure 2 shows the major market clusters for calendar year 2013 and year-to-date 2014 derived using k-means, where the number of relevant clusters is determined using the percentage-of-variance method (p > 0.90) (find code here from Kapler).

Figure 2. Major market clusters in 2013-2014

In this universe there appear to have been 4 significant clusters over this period, which we might broadly categorize as follows:

- Bond cluster (IEF, TLT)
- Commodity cluster (GLD, DBC)
- Global equity cluster (EEM, EWJ, VGK, RWX, VTI)
- U.S. real estate cluster (ICF)
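For readers who want to peek under the hood, the cluster-count selection described above can be sketched from scratch. This is an illustrative stand-in, not Kapler's R code: a plain Lloyd's k-means with deterministic farthest-first seeding, choosing the smallest k whose clusters explain more than 90% of the variance:

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])  # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def choose_k(X, k_max, threshold=0.90):
    """Smallest k whose clustering explains > `threshold` of total variance."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    for k in range(1, k_max + 1):
        labels, centers = kmeans(X, k)
        within = ((X - centers[labels]) ** 2).sum()
        if 1 - within / total > threshold:
            return k, labels
    return k_max, labels

# toy correlation matrix with two obvious blocks (bond-like vs equity-like)
corr = np.full((6, 6), 0.1)
corr[:3, :3] = 0.9
corr[3:, 3:] = 0.9
np.fill_diagonal(corr, 1.0)
k, labels = choose_k(corr, k_max=5)
print(k)  # 2
```

Here each asset is represented by its row of the correlation matrix, so assets with similar correlation profiles fall into the same cluster.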

Now that we have the clusters, we can treat each cluster as a new asset which captures a meaningful portion of the information from each of its constituents. Once we choose a scheme for how the assets are weighted inside each cluster, we can form a correlation matrix from the 4 cluster "assets", and this matrix will contain a meaningful portion of the information in the sample correlation matrix.

Figure 3. Example cluster correlation matrix

Once we have the cluster correlation matrix, the next step is to map each of the original assets to its respective cluster. Then we will "shrink" each pairwise estimate in the sample correlation matrix toward the correlation estimate derived from the assets' respective clusters. Where two assets are from the same cluster, we will shrink the sample pairwise correlation toward the *average* of all the pairwise correlations between assets of that cluster.

An example should help to cement the logic. Let's assume the sample pairwise correlation between IEF and VTI is -0.1. Then we would shrink this pairwise correlation toward the correlation between the clusters to which IEF (bond cluster) and VTI (global equity cluster) respectively belong. From the table, we can see that the correlation between the bond and global equity clusters is 0.05, so the "shrunk" pairwise correlation estimate for IEF and VTI becomes mean(-0.1, 0.05) = -0.025.

Next let's use an example of two assets from the same cluster, say EWJ and VTI, which both belong to the global equity cluster. Let's assume the sample pairwise correlation between these assets is 0.6, and that the average of all pairwise correlations between all of the assets in the global equity cluster is 0.75. Then the "shrunk" pairwise correlation estimate between EWJ and VTI becomes mean(0.6, 0.75) = 0.675.
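Our implementation is in R for the Systematic Investor Toolbox, but the shrinkage mapping just described can be sketched in Python. Note one simplification in this sketch: cluster-level correlations are approximated by the average cross-cluster pairwise correlations, rather than by correlations between weighted cluster return series:

```python
import numpy as np

def cluster_shrink(corr: np.ndarray, labels) -> np.ndarray:
    """Shrink each pairwise correlation halfway toward a cluster-level estimate:
    the average within-cluster correlation for same-cluster pairs, and the
    average cross-cluster correlation for pairs from different clusters."""
    labels = np.asarray(labels)
    clusters = list(np.unique(labels))
    k = len(clusters)
    cc = np.ones((k, k))  # cluster-level correlation estimates
    for a in range(k):
        for b in range(k):
            ia = np.where(labels == clusters[a])[0]
            ib = np.where(labels == clusters[b])[0]
            block = corr[np.ix_(ia, ib)]
            if a == b:
                m = len(ia)
                if m > 1:  # average off-diagonal correlation within the cluster
                    cc[a, b] = (block.sum() - m) / (m * m - m)
            else:
                cc[a, b] = block.mean()
    shrunk = corr.copy()
    idx = {c: i for i, c in enumerate(clusters)}
    n = corr.shape[0]
    for i in range(n):
        for j in range(n):
            if i != j:
                shrunk[i, j] = 0.5 * (corr[i, j] + cc[idx[labels[i]], idx[labels[j]]])
    return shrunk

# three assets, the first two in one cluster
corr = np.array([[1.0, 0.8, 0.1],
                 [0.8, 1.0, 0.3],
                 [0.1, 0.3, 1.0]])
print(cluster_shrink(corr, [0, 0, 1]))  # e.g. pair (0, 2): mean(0.1, 0.2) = 0.15
```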

### Empirical Results

We have coded up the logic for this method in R for use in Kapler's Systematic Investor Toolbox backtesting environment. The following tables compare results on two universes. We ran each of the weighting methods listed below with and without the application of our cluster shrinkage method, using a 250 day lookback window. All portfolios were rebalanced quarterly.

EW = Equal Weight (1/N)

MV = Minimum Variance

MD = Maximum Diversification

ERC = Equal Risk Contribution

MVA = David Varadi's Heuristic Minimum Variance Algorithm

Results with cluster shrinkage are denoted by a .CS suffix on the weighting algorithm at the top of each performance table.

Table 1. 10 Global Asset Classes (DBC, EEM, EWJ, GLD, ICF, IEF, RWX, TLT, VGK, VTI)

Data from Bloomberg (extended with index or mutual fund data from 1995-)

Table 2. 9 U.S. sector SPDR ETFs (XLY, XLP, XLE, XLF, XLV, XLI, XLB, XLK, XLU)

Data from Bloomberg

We can draw some broad conclusions from these performance tables. At the very least we have achieved golden rule number 1: first, do no harm. Most of the CS methods at least match the raw sample versions in terms of Sharpe ratio and MAR, with comparable returns.

In fact, we might suggest that cluster shrinkage delivers meaningful improvement relative to the unadjusted versions, producing a noticeably higher Sharpe ratio for minimum variance, maximum diversification, and heuristic MVA algorithms for both universes, and for ERC as well with the sector universe. Further, we observe a material reduction in turnover as a result of the added stability of the shrinkage overlay, especially for the maximum diversification based simulations, where turnover was lower by 30-35% for both universes.

Cluster shrinkage appears to deliver a more consistent improvement for the sector universe than the asset class universe. This may be due to the fact that sector correlations are less stable than asset class correlations, and thus benefit from the added stability. If so, we should see even greater improvement on larger and noisier datasets such as individual stocks. We look forward to investigating this in the near future.

*******************

**Disclaimer**

Butler Philbrick Gordillo and Associates is part of Dundee Goodman Private Wealth, a division of Dundee Securities Ltd.

The opinions expressed are solely the work of Butler|Philbrick|Gordillo and Associates and, although the authors are registered Portfolio Managers with Dundee Goodman Private Wealth, a division of Dundee Securities Ltd., this is not an official publication of Dundee Securities Ltd. and the authors are not Dundee Securities Ltd. research analysts. The views (including any recommendations) expressed in this material are those of the authors alone, and they have not been approved by, and are not necessarily those of, Dundee Securities Ltd. Assumptions, opinions and estimates constitute the authors' judgment as of the date of this material and are subject to change without notice. The information contained in this presentation has been compiled from sources believed to be reliable; however, we make no guarantee, representation or warranty, expressed or implied, as to such information's accuracy or completeness.

Before acting on any recommendation, you should review the detailed disclosure in this presentation and consider whether it is suitable for your particular circumstances. As no regard has been made as to the specific investment objectives, financial situation, and other particular circumstances of any person who may receive this presentation, clients should seek the advice of a registered investment advisor and other professional advisors, as applicable, regarding the appropriateness of investing in any securities or any investment strategies discussed in this presentation. Past performance is not indicative of future results and no returns are guaranteed. Investment return and principal value may fluctuate so that an investor's shares may be worth more or less than their original cost when sold.

This presentation is for information purposes only and is neither a solicitation for the purchase of securities nor an offer of securities. This presentation is intended only for persons resident and located in the provinces and territories of Canada, where Dundee Goodman Private Wealth's services and products may lawfully be offered for sale, and therein only to clients of Dundee Goodman Private Wealth. This presentation is not intended for distribution to, or use by, any person or entity in any jurisdiction or country, including the United States, where such distribution or use would be contrary to law or regulation or which would subject Dundee Goodman Private Wealth to any registration requirement within such jurisdiction or country. Please note that the individuals responsible for this presentation or their associates may hold securities, directly or through derivatives, in issuers that may now or in the future be selected through the Darwin Core Diversified Strategy.

No part of this publication may be reproduced without the express written consent of Dundee Goodman Private Wealth. Dundee Goodman Private Wealth, a division of Dundee Securities Ltd, is a Member-Canadian Investor Protection Fund. Dundee Goodman Private Wealth respects your time and your privacy. If you no longer wish us to retain and use your personal information for the purposes of distributing reports, please let your Dundee Goodman Private Wealth Advisor know. For more information on our Privacy Policy please visit our website at www.dundeegoodman.com

Forward-looking statements are based on current expectations, estimates, forecasts and projections based on beliefs and assumptions made by the authors. These statements involve risks and uncertainties and are not guarantees of future performance or results, and no assurance can be given that these estimates and expectations will prove to have been correct; actual outcomes and results may differ materially from what is expressed, implied or projected in such forward-looking statements.