Mean-Variance Optimization

An introduction to quantitative portfolio management

Introduction & Theory

Mean-variance Optimization (MVO) is a portfolio selection method that Harry Markowitz introduced in his 1952 paper “Portfolio Selection.” MVO is predicated on the idea that investors are risk-averse and seek to maximize the expected return of their investment portfolio while minimizing its risk.Where risk is defined as the volatility of asset returns, and volatility is used interchangeably with variance. While not all definitions of risk include volatility, it is a helpful starting point for many investors and analysts. For example, suppose there is a universe of assets, and we have some capital to invest. We want to find how much money to invest in each asset to maximize our total returns per unit of volatility.

Single Asset Utility

To simplify the problem, let us first derive a function that defines the utility of a single investment:

\[U = E[R] - \dfrac{\gamma}{2}\varepsilon\]

where \(E[R]\) and \(\varepsilon\) are the expected returns and estimation error, respectively, and \(\gamma\) is a tuning parameter representing an investor’s risk aversion.The greater the risk aversion, the greater the error penalty. Behavioral economists attempt to estimate this parameter depending on their clientele needs, but I will use arbitrary values for this blog post.

Estimating the expected returns of an asset is notoriously difficult as it is impossible to see into the future. Moreover, if perfect predictions of future returns were possible, the estimation error would be zero, making investor utility equal to future returns. That is, the risk of any given investment is defined by how inaccurate future predictions are. A simple method to estimate future asset returns is by taking the average of past returns (e.g., the mean of the past 30 months will be the forecast of the following month’s returns). Likewise, a simple method to estimate the prediction error is by taking the variances of past returns. In other words, we reformulate the equation above to derive the mean-variance utility function as

\[U = \mu - \dfrac{\gamma}{2}\sigma^2\]

where \(\mu\) and \(\sigma^2\) are the mean and variance, respectively.

The mean-variance function incorporates the assumption that two assets with the same mean and variance of returns should be equally desirable to an investor. Therefore, mean-variance utility is arguably short-sighted in scope. By failing to recognize that investors are likely interested in many other performance characteristics, mean-variance utility cannot accurately estimate an investor’s utility. For example, higher-order statistical properties of past returns (e.g., skewness and kurtosis) may be useful metrics to evaluate the error of forecasted returns.

It is important to clarify that the mean-variance utility function does not assume the normal distribution of asset returns, as is commonly mistaken. Mean-variance utility assumes that investors only care about the mean and variance of asset returns, even if they concede higher moments will better account for forecasting errors. That is, \(\sigma^2 \neq \varepsilon\), and other metrics may impact the desirability of an asset (e.g., ESG scores). Moreover, many alternative utility functions are preferred conceptual frameworks for what investors truly value. Nonetheless, MVO continues to be the most influential optimization method in academia, likely because of its simplicity and strict quantitative definition.Feel free to dive into the rabbit hole of utility theory and behavioral finance if you're curious to learn more.

Portfolio Utility

Now that we understand how the mean-variance function defines investor utility for a single asset, how can we modify it to determine the utility of a portfolio of multiple assets? The concept is functionally the same; we need to derive a function for the mean and variance of portfolio returns. Let us first derive the mean function as the sum of asset mean returns multiplied by their weights:

\[\begin{aligned} \text{Portfolio Mean Return} &= w_{1}\mu_{1} + w_{2}\mu_{2} + w_{3}\mu_{3} + \dots + w_{N}\mu_{N}\\ &= \sum_{i=1}^{N}w_{i}\mu_{i} \end{aligned}\]

where \(w_{i}\) and \(\mu_{i}\) are the weight and mean of asset \(i\), respecitvely, and \(N\) is the total number of assets to be invested.All weights usually sum to one, where a sum of weights less than one implies we have cash reserves not invested, and a sum of weights greater than one implies we are investing more money than we have (e.g., leverage).

Technically speaking, this is all we need to continue. However, let us convert the mean portfolio returns equation into matrix form to make the math easier. Additionally, GPUs are particularly efficient at matrix operations, so most practitioners try to take advantage of GPU performance by converting large datasets into matrices.This post won't dive into the details of hardware acceleration, nor do use enough data to notice a difference. But keep this in mind. The matrix form of portfolio mean returns is given as

\[\begin{aligned} \text{Portfolio Mean Return} &= \begin{pmatrix} w_{1} & w_{2} & w_{3} & \dots & w_{N} \end{pmatrix} \begin{pmatrix} \mu_{1}\\ \mu_{2}\\ \mu_{3}\\ \vdots\\ \mu_{N} \end{pmatrix}\\ &= x^{\top}\mu \end{aligned}\]

where \(x^{\top}\) is the transpose vector of asset weights, and \(\mu\) is the vector of asset mean returns. Finally, it must be noted that these values are time-dependent:

\[\text{Portfolio Mean Return} = x_{t}^{\top}\mu_{t}\]

where the values at time \(t\) are estimated using data from preceding periods.

Now that we’ve got a matrix definition for the portfolio mean returns, the same logic can be applied to derive the portfolio variance:

\[\begin{aligned} \text{Portfolio Variance of Returns} &= var\left(\sum_{i=1}^{N}w_{i}\mu_{i}\right)\\ &= \sum_{i=1}^{N}\sum_{j=1}^{N}w_{i}w_{j}cov(\mu_{i},\mu_{j})\\ \end{aligned}\]

where variance expansion rules are applied. To convert into matrix for we drive:

\[\begin{aligned} & \begin{pmatrix} w_{1} & w_{2} & \dots & w_{N} \end{pmatrix} \begin{pmatrix} var(\mu_{1}) & cov(\mu_{1}, \mu_{2}) & \dots & cov(\mu_{1}, \mu_{N})\\ cov(\mu_{2}, \mu_{1}) & var(\mu_{2}) & \dots & cov(\mu_{2}, \mu_{N})\\ \vdots & \vdots & \ddots & \vdots\\ cov(\mu_{N}, \mu_{1}) & cov(\mu_{N}, \mu_{2}) & \dots & var(\mu_{N})\\ \end{pmatrix} \begin{pmatrix} w_{1}\\ w_{2}\\ \vdots\\ w_{N}\\ \end{pmatrix}\\ &= x^{\top}{\Sigma}x\\ \end{aligned}\]

where \({\Sigma}\) is the variance-covariance matrix of the universe of assets.If this is brand new content to you, I would strongly recommend that you research variance and covariance expansion rules and linear algebra methods. Maybe practice writing out these equations by hand with a universe of two or three assets to convince yourself the math checks out. Finally, lets rewite the equation to include time steps:

\[\text{Portfolio Variance of Returns} = x_{t}^{\top}{\Sigma}_{t}x_{t}\\\]

Optimization

Now that we have derived the portfolio returns mean and variance, they can be plugged into our mean-variance utility function as:

\[U = x_{t}^{\top}\mu_{t} - \dfrac{\gamma}{2}x_{t}^{\top}{\Sigma}_{t}x_{t}\]

where we obtain the utility of a portfolio of assets in linear algebra format.

The ultimate goal of MVO is to find a vector of asset weights, \(x_{t}\), such that this function returns the highest possible value. To visualize the optimization procedure, let’s take a generic palabora and draw a tangency line where the y-axis is at its maximum. Note that the values are entirely arbitrary, and the shape of the curve may be completely different in a real-world scenario (concavity will still hold, though). Also, the x-axis represents various combinations of asset weights, which doesn’t make sense in a multivariate framework. Nonetheless, we’re just trying to visualize the optimization process:

where we can see \(\dfrac{df}{dx} = \dfrac{dy}{dx} = \dfrac{0}{\text{change in }x} = 0\). That is, if we construct a portfolio such that the tangency line of mean-variance utility is equal to zero, our portfolio is optimized.

Because the mean-variance utility function is concave, we can formalize this optimization as a closed-form maximization problem by solving for \(x_{t}\) (the minimum of the inverse utility would be functionally the same):

\[\begin{aligned} \max_{x_{t}} &= x_{t}^{\top}\mu_{t} - \dfrac{\gamma}{2}x_{t}^{\top}{\Sigma}_{t}x_{t}\\ 0 &= \dfrac{df}{dx} \left(x_{t}^{\top}\mu_{t} - \dfrac{\gamma}{2}x_{t}^{\top}{\Sigma}_{t}x_{t}\right)\\ 0 &= \mu_{t} - \gamma{\Sigma}_{t}x_{t}\\ x_{t} &= \dfrac{1}{\gamma}{\Sigma}_{t}^{-1}\mu_{t} \end{aligned}\]

where \(x_{t}\) is the optimized vector of asset weights.

We’re done! Of course, there are many ways to modify this process (e.g., constrain weights to non-negative values, etc.), but this is the core intuition and math behind possibly the most important financial concept for portfolio management. Now let’s dive into a real-world implementation using python!

Python Code

Mean-variance optimization is a widely used technique in portfolio management, but a lot of the inspiration for this post comes from DeMiguel et al. (2009). Definitely check out that paper, they do a very comprehensive review of quantitative portfolio management strategies across multiple datasets. This post aims to replicate a small portion of that paper.

Load Data

First let’s import the packages we’re going to be using.

import pandas as pd
import numpy as np
import pandas_datareader.data as web

Then, we’ll use the Pandas DataReader to load our industry portfolios. These are pre-constructed portfolios that we’ll treat as individual assets. We’ll also download the Fama-French factors to obtain the risk-free rate and market factor. Finally, we subtract the risk-free rate from our industry portfolios, so all returns are in excess of the risk-free rate.

# FamaFrench 10 Industry Portfolios

industries = web.DataReader("10_Industry_Portfolios", "famafrench", start="1963-07", end="2023-01")[0].div(100)

# FamaFrench Factors

factors = web.DataReader("F-F_Research_Data_Factors", "famafrench", start="1963-07", end="2023-01")[0].div(100)

# Augment Industries to get Excess Returns

industries_ex = industries.sub(factors["RF"], axis=0)

# Add market factor

industries_ex["Mkt-RF"] = factors["Mkt-RF"]

Optimization

First, create Python code for the equation we derived in the introduction, \(x_{t} = \dfrac{1}{\gamma}{\Sigma}_{t}^{-1}\mu_{t}\):

def get_mv_weights(df_window, gamma=1, output_relative_weights=True):
"""
Get the Minimum Variance weights for a given window
:param df_window: Window of data
:param output_relative_weights: Whether to output relative weights or absolute weights
""" # Calculate variables
mean = df_window.mean().values
cov = df_window.cov().values
inv_cov = np.linalg.pinv(cov)

    # Calculate weights
    absolute_weights = (1 / gamma) * inv_cov @ mean
    relative_weights = absolute_weights / np.abs(np.sum(absolute_weights))

    return relative_weights if output_relative_weights else absolute_weights

This function takes a window of data and returns the weights solved via our mean-variance derivative. For our implementation, let’s just set \(\gamma = 1\), but feel free to research what a good risk-aversion parameter might be. Also, the equation derived in the introduction only solves for absolute weights. So, we add a function parameter to normalize these weights such that they sum to one. The absolute sum of weights is used as the denominator so that, should weights sum to a negative value, the sign of an asset’s allocation is not lost.

Now that we have our mean-variance function, let’s create a rolling window to calculate the weights of our portfolio at each time step:

# DataFrames for weights and returns to be appended.

# There are more efficient ways of doing this but I'm trying to make the code as readable and consistent as possible.

mv_weights_in_sample = pd.DataFrame(columns=industries_ex.columns)
mv_weights_out_sample = pd.DataFrame(columns=industries_ex.columns)
mv_weights_full_sample = pd.DataFrame(columns=industries_ex.columns)
ew_weights = pd.DataFrame(columns=industries_ex.columns)
df_returns = pd.DataFrame()

# How many periods to use for the window

window_size = 120

# Loop through the data to obtain mean-variance weights

for i, date in enumerate(industries_ex.index):
if i < (window_size - 1): continue

    # Get the window of data ending at 'date' with a window size of 'window_size'
    # Note that iloc[A,B] fetches values from A to B-1
    start_i = i - (window_size - 1)
    end_i = i + 1
    df_window = industries_ex.iloc[start_i:end_i]

    # Get the mean-variance weights
    weights = get_mv_weights(df_window)

    # Append the weights to the DataFrame
    mv_weights_in_sample.loc[date] = weights

# Update necessary weight DataFrames

mv_weights_out_sample = mv_weights_in_sample.shift(1)

mv_weights_full_sample = get_mv_weights(industries_ex).reshape(1, -1)
mv_weights_full_sample = np.repeat(mv_weights_full_sample, industries_ex.shape[0], axis=0)
mv_weights_full_sample = pd.DataFrame(mv_weights_full_sample, columns=industries_ex.columns, index=industries_ex.index)

ew_weights = np.ones((industries_ex.shape[0], industries_ex.shape[1])) / industries_ex.shape[1]
ew_weights = pd.DataFrame(ew_weights, columns=industries_ex.columns, index=industries_ex.index)

# Get the 1/N portfolio returns (useful as a benchmark)

df_returns["1/N"] = industries_ex.mul(ew_weights, axis=1).sum(axis=1)

# Get the mean-variance portfolio returns (in-sample)

df_returns["MV (In-Sample)"] = industries_ex.iloc[window_size-1:].mul(mv_weights_in_sample, axis=1).sum(axis=1)

# Get the mean-variance portfolio returns (out-of-sample)

df_returns["MV (Out-Sample)"] = industries_ex.iloc[window_size-1:].mul(mv_weights_out_sample, axis=1).sum(axis=1)

# Get the mean-variance portfolio returns (full-sample)

df_returns["MV (Full-Sample)"] = industries_ex.mul(mv_weights_full_sample, axis=1).sum(axis=1)

# Drop invalid rows such that all portfolios start at the same time.

# Note that we drop window_size + 1 number of rows because the row index starts at 0.

# We need to drop an extra row because out-of-sample mean-variance weights are shifted by 1 period.

df_returns = df_returns.iloc[window_size:]

After calculating the weights at each time step, we multiply them by the corresponding asset returns to obtain the final portfolio returns. We also shift the weights by one time step to obtain the out-of-sample returns. That is, the mean-variance optimal portfolio at time \(t\) is applied to returns at time \(t+1\).This is what you would do if it was a real trading strategy because you don't have access to future information. This also highlights why accurate forecasts are super important! We also calculate the mean-variance optimal weights for our full sample and a simple 1/N strategy for comparison. Finally, we drop the top rows so that each strategy’s returns are valid and aligned.

Your df_returns should look something like this:

Results

There is no unified method of interpreting the results. However, academic papers usually produce a range of tables and graphs to highlight common metrics and compare them to some relevant benchmarks. I won’t cover the details of every metric we use, but hopefully, the code will speak for itself.

	1/N	MV (In-Sample)	MV (Out-Sample)	MV (Full-Sample)
Sharpe Ratio	0.172825	0.228963	0.0385552	0.238449
Sortino Ratio	0.229944	0.307529	0.0421003	0.38138
Annualized Returns	0.0809591	0.216877	-0.0528296	0.127804
Annualized Volatility	0.149322	0.308717	0.377176	0.162132
Maximum Drawdown	-0.496305	-0.830373	-0.996717	-0.313587
Skewness	-0.621345	0.980511	-1.12029	-0.0537011
Kurtosis	2.42816	14.248	22.2821	1.1225
VaR (0.05)	-0.0678418	-0.095657	-0.125124	-0.0612105
CVaR (0.05)	-0.0990815	-0.175262	-0.259063	-0.0935619
CEQ	0.00652066	0.0164338	-0.00172963	0.010065
Annualized Turnover	0.483863	40.5315	45.7085	3.98412

We can see that mean-variance investing absolutely dominates the 1/N portfolio for in-sample testing, but as expected, the estimation error inherent in our out-of-sample testing completely destroys returns. Feel free to click on the legend items to add/remove them; the graph will automatically rescale.

Thanks for reading. I hope this was informative!

Jupyter Notebook Download

Feel free to download the Jupyter notebook I used to create this post.

Download Here