

When Smart Beta Meets Machine Learning and Portfolio Optimization

Jason Hsu, PhD



Smart beta products using common factors like value, low volatility, quality, and small cap delivered underwhelming performance from 2005–2022. On average, long-only factor portfolios built from a wider set of 87 global factors identified in the finance literature generated significantly positive excess returns across countries, suggesting that diversifying across many factors is more prudent than selecting the handful that have performed best. Moreover, long-only portfolios built from expected returns fit to these 87 factors using linear ridge and nonlinear machine learning models like gradient boosting generated larger and more statistically significant excess returns in nearly all countries. A long-only portfolio optimized to maximize return given an aversion to tracking error delivered still higher excess returns and information ratios across countries. Taken together, these results provide strong evidence against the claim that most of the documented factors are datamined and without investment merit.




Smart beta products using common factors (value, low vol, quality and small cap) show underwhelming performance from 2005 to 2021. However, portfolios built from a wider set of fundamental characteristics identified in the investment factor literature generate significantly positive excess returns. The outperformance of broadly diversified factor portfolios over more concentrated factor portfolios is robust across countries and over multiple time horizons.


In this research extract, we do not address the source of the underperformance of popular factors like value and low vol. Instead, we focus on the merit of broad diversification in factor allocation: embracing 80+ factor characteristics instead of concentrating in four. To be sure, investors will regret allocating to many of the 80+ factors instead of betting on the one or two best-performing ones. However, the inability to predict the “best” factor is precisely why diversification is the better idea. We demonstrate that using a diversified pool of fundamental characteristics leads to a better portfolio outcome than concentrating in traditional MBA textbook factors. As it turns out, diversification isn’t only a good idea for stocks; it is also a good idea for factors.
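A standard textbook identity gives a rough sense of why spreading across factors helps. It is an illustration under simplifying assumptions, not a result from the paper: suppose N factors each have the same expected excess return m and volatility s, and a common pairwise correlation ρ (all three symbols are hypothetical). The information ratio of the equally weighted combination is then:

```latex
\mathrm{IR}_N
  = \frac{m}{s\,\sqrt{\dfrac{1 + (N-1)\rho}{N}}}
  = \mathrm{IR}_1 \sqrt{\frac{N}{1 + (N-1)\rho}},
\qquad \mathrm{IR}_1 = \frac{m}{s}.
```

With an assumed ρ = 0.1, four factors raise the single-factor ratio by a multiple of about 1.75, while eighty factors raise it by about 3.00. The gain flattens as N grows and is capped at 1/√ρ, so the benefit depends heavily on how correlated the factors actually are.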

The Flaws of First Generation Smart Beta

When it comes to smart beta investing, the industry’s practice mirrors concentrated stock picking rather than sensible diversification. As with any concentrated portfolio, bad luck can lead to significant and extended underperformance.


Modern portfolio theory does not naively advocate 1/N diversification. Not all stocks are equally attractive, and the same is certainly true of factors. Instead, modern portfolio theory advocates estimating expected returns and covariances to build optimally diversified portfolios. This argues for a portfolio construction approach different from that of gen 1 Smart Beta portfolios. The view implicit in traditional factor investing is that “simple tilting of the cap-weighted benchmark” toward “a handful of firm characteristics like B/P and small capitalization” is more than adequate for harvesting factor premiums. This preference for a few curated factors and a simple portfolio construction heuristic is driven largely by the experience that complex risk-return models involving many factors and optimization have generally produced poor portfolio outcomes: the datamining problem created by complex models with many factors far outweighed the information gain from those models. Given the advances in statistical approaches and computational power, this concern is no longer valid today. These advances allow researchers to “anti-datamine” when handling complex factor interactions for a large universe of factors.

Using Machine Learning as a Key to Unlock the Next Generation of Smart Beta

There are 80+ documented factors in the academic literature. The investment industry has popularized four thus far in its Smart Beta push; much more work needs to be done. Which factors matter for long-term stock returns, and which for short-term stock returns? Which ones should you gain exposure to? How should you weight them as you build portfolios?


To answer these questions, we build simple factor premium-tilted portfolios. In Smart Beta construction, stocks are included in a portfolio based on their exposure to factors. The better you can estimate the factor premiums, the more reliable the outperformance of the resulting portfolio will be. We contrast a variety of factor premium models. Linear ridge, a standard linear ML model for estimating returns from factor exposures, produces a portfolio excess return of 1.77% per annum and an alpha (against a 6-factor model) of 2.30% from 2005 to 2022 in the U.S. Gradient boosting, a non-linear ML model, generates a portfolio excess return of 2.11% and an alpha of 2.55% over the same period. Both ML models outperform the traditional regression approach. The simplistic equally weighted approach, which ignores return information completely, has the worst performance on average. The ML approach is simply the state of the art in extracting useful information on stock returns for factor portfolio construction while ameliorating datamining risk. This advantage is particularly large when returns are driven by a large multitude of factors.

The key advantage of advanced ML models, linear or non-linear, in estimating returns lies in their regularization procedure, or coefficient shrinkage. This procedure significantly reduces model over-fitting and thus reduces ill-behaved out-of-sample performance. Using non-linear ML further captures non-linear relationships among the factor characteristics in estimating future returns. One should not be surprised by the added benefit from modeling non-linearity. Many of the documented factor characteristics capture subtle fundamental information about companies. It would be more surprising to find that these fundamental characteristics do not interact.
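The shrinkage idea can be sketched on synthetic data. The example below is a minimal illustration, not the paper's model or data: the simulated panel, the assumed premiums, the penalty value of 200, and the top-quintile equal-weight rule are all hypothetical choices made for the sketch. It fits ordinary least squares and closed-form ridge to 87 simulated characteristics, then turns the ridge forecasts into a long-only portfolio.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 500 observations of 87 standardized characteristics.
n_obs, n_factors = 500, 87
X = rng.standard_normal((n_obs, n_factors))

# Assumed data-generating process: only five characteristics carry a premium.
true_beta = np.zeros(n_factors)
true_beta[:5] = 0.02
y = X @ true_beta + 0.10 * rng.standard_normal(n_obs)


def fit_linear(X, y, penalty):
    """Closed-form ridge estimate; penalty=0 reduces to ordinary least squares."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(k), X.T @ y)


beta_ols = fit_linear(X, y, penalty=0.0)
beta_ridge = fit_linear(X, y, penalty=200.0)  # arbitrary illustrative penalty

# Fresh sample from the same assumed process for an out-of-sample check.
X_new = rng.standard_normal((n_obs, n_factors))
y_new = X_new @ true_beta + 0.10 * rng.standard_normal(n_obs)


def out_of_sample_mse(beta):
    return float(np.mean((y_new - X_new @ beta) ** 2))


print("coefficient norm  OLS:", np.linalg.norm(beta_ols),
      " ridge:", np.linalg.norm(beta_ridge))
print("out-of-sample MSE OLS:", out_of_sample_mse(beta_ols),
      " ridge:", out_of_sample_mse(beta_ridge))

# Long-only portfolio from forecasts: hold the top quintile of 200
# hypothetical stocks, equally weighted (an assumed construction rule).
X_today = rng.standard_normal((200, n_factors))
expected_ret = X_today @ beta_ridge
selected = expected_ret >= np.quantile(expected_ret, 0.80)
weights = selected / selected.sum()
```

The ridge coefficients are always smaller in norm than the least-squares ones for a positive penalty; whether that improves the out-of-sample error depends on the penalty and the data, which is why the penalty is normally tuned by cross-validation rather than fixed in advance.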


The underperformance of popular factors such as value and low vol over the past 15 years raises questions about premium decay in the most crowded factors. Have these factors stopped working because of their popularity, or are factor excess returns simply highly volatile and prone to long periods (5 to even 10 years) of underperformance? Given these concerns, concentrating in a few widely adopted factors goes against investing best practices, namely diversification and the avoidance of over-confidence.


With better computational power, advanced statistical methods can now unlock the full potential of Markowitz’s original insight regarding optimal diversification: how do we use our knowledge of risk and return in a scientific way that optimally balances information while avoiding model overfitting and the resulting unwarranted concentration in factors and styles? Using non-linear ML, we can build expected return models using the full universe of well-vetted academic factors and their complex interactions. Additionally, the regularization procedure in ML solves the over-fitting problem that has long plagued traditional regression approaches and led to disappointing out-of-sample results. The resulting portfolio is one that effectively accesses equity premiums from a much richer universe of equity factors, leading to more stable systematic return harvesting.
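One common way to formalize "maximize return given an aversion to tracking error" is to maximize μᵀw − (λ/2)(w − b)ᵀΣ(w − b) over long-only, fully invested weights w, where b is the benchmark. The sketch below assumes that objective and solves it with projected gradient ascent in NumPy. The objective form, the solver, the aversion value, and all inputs are illustrative assumptions, not the paper's optimizer or data.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                  # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = idx[u - css / idx > 0][-1]      # largest index with a positive gap
    theta = css[rho - 1] / rho
    return np.maximum(v - theta, 0.0)


def utility(w, mu, cov, bench, aversion):
    """Expected return minus a penalty on squared tracking error."""
    active = w - bench
    return float(mu @ w - 0.5 * aversion * active @ cov @ active)


def optimize_long_only(mu, cov, bench, aversion, n_iter=5000):
    """Projected gradient ascent, starting from the benchmark weights."""
    # Step 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (aversion * np.linalg.eigvalsh(cov)[-1])
    w = bench.copy()
    for _ in range(n_iter):
        grad = mu - aversion * cov @ (w - bench)
        w = project_to_simplex(w + step * grad)
    return w


# Hypothetical inputs for 20 stocks.
rng = np.random.default_rng(1)
n = 20
bench = rng.random(n)
bench /= bench.sum()                        # benchmark weights
mu = 0.02 * rng.standard_normal(n)          # stand-in for model forecasts
loadings = 0.1 * rng.standard_normal((n, 3))
cov = loadings @ loadings.T + np.diag(0.02 + 0.02 * rng.random(n))
aversion = 10.0                             # assumed tracking-error aversion

w = optimize_long_only(mu, cov, bench, aversion)
active = w - bench
print("expected active return:", float(mu @ active))
print("tracking error:", float(np.sqrt(active @ cov @ active)))
```

Raising the aversion parameter pulls the solution toward the benchmark, and lowering it allows larger active bets. In practice the expected returns would come from a fitted return model and the covariance matrix from a risk model, and a dedicated quadratic programming solver would usually replace the simple iteration used here.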


To gain access to the full research paper, please visit jbis.2022.1.015


Important Information

Issued by Rayliant Investment Research d/b/a Rayliant Asset Management (“Rayliant”). Unless stated otherwise, all names, trademarks and logos used in this material are the intellectual property of Rayliant.

This document is for information purposes only. It is not a recommendation to buy or sell any financial instrument and should not be construed as investment advice. Any securities, sectors or countries mentioned herein are for illustration purposes only. Investment involves risk. The value of your investments may fall as well as rise and you may not get back your initial investment. Performance data quoted represents past performance and is not indicative of future results. While reasonable care has been taken to ensure the accuracy of the information, Rayliant does not give any warranty or representation, expressed or implied, and expressly disclaims liability for any errors and omissions. Information and opinions may be subject to change without notice. Rayliant accepts no liability for any loss, indirect or consequential damages, arising from the use of or reliance on this document.

Hypothetical, back-tested performance results have many inherent limitations. Unlike the results shown in an actual performance record, hypothetical results do not represent actual trading. Also, because these trades have not actually been executed, these results may have under- or over-compensated for the impact, if any, of certain market factors, such as lack of liquidity. Simulated or hypothetical results in general are also subject to the fact that they are designed with the benefit of hindsight. No representation is being made that any account will or is likely to achieve profits or losses similar to those shown. In fact, there are frequently sharp differences between hypothetical performance results and the actual results subsequently achieved by any investment manager.