I'm trying to create portfolios using real life stock market data from the past, to test real life performance of Markowitz portfolio optimization model. Solving through CVX, however, gives me the most weird portfolios where the it invests in everything and the resulting efficient frontier is not a curve, but a zigzag line.
edit: Here's the input data:
0.770385819 1.041085387 0.459646675 1.407991775 1.032964003 1.015192832 1.761284058 0.432529846 1.196856508 1.130865217 0.948489011 1.130973958 1.021932454 1.119256276 1.090071648 1.070024263 0.898895476 0.94323697 1.07892532 0.969909847 1.427561837 0.952085375 0.911283811 1.218244825 0.732480413 1.25540862 1.367107712 0.931140812 1.185658681 0.868207764 1.014347682 1.478019669 1.86267166 1.007992203 1.132348209 0.951148514 1.008848925 0.835203173 1.568876745 0.814207457 1.008956842 0.924991843 1.091709856 0.459646675 1.251518105 1.241371696 0.902728017 0.851700526 1.23548996 1.340598122;
1.563231801 1.160694749 1.295086356 1.102851533 1.658395418 1.441289332 1.131956592 0.441946506 1.429396335 1.168607239 1.112126107 1.079003599 1.097335615 1.34966378 1.186228482 1.619232426 1.043895898 1.258081655 1.442456662 1.071825205 0.977722772 1.149757902 1.160316757 1.245202657 1.483468401 1.207680015 1.428717458 1.468165592 1.115104134 1.504980529 1.179700146 1.137373204 2.054959786 1.325712628 0.885426498 1.300327201 1.358178726 1.392186705 2.588573659 2.686215908 1.016398298 1.331078306 1.187699367 1.295086356 1.234122963 1.300353826 1.418989068 1.504121584 1.361842357 2.410075523;
2.833927266 1.074222841 1.149776806 2.5667894 1.188622645 1.27342918 1.120995734 2.557863666 1.56851897 1.299325173 1.20084143 1.09924957 1.212777332 1.190550724 1.715039578 1.689773913 1.227945996 0.810509236 1.202735818 1.102883208 1.491139241 1.081009384 0.808332936 1.069644246 1.266890108 1.239471924 1.172905087 1.102346128 1.137121269 1.139571095 1.179716604 1.191004623 1.213307241 1.209411945 1.21512834 1.065562602 1.060866403 0.983125018 0.464863672 1.66855076 1.111080906 1.196453819 1.161552216 1.149776806 1.27028591 1.618204209 1.252353924 1.262311919 1.390691261 0.737679682;
1.503269518 1.64103917 1.32772227 1.01086575 2.476334985 1.435918879 0.826643773 0.767703937 1.206691156 0.98326345 0.965173507 1.600943419 1.14800529 1.137910023 1.035384615 1.052234414 0.916417592 0.910925845 0.963225503 0.888894562 1.405772496 1.155 0.875774931 1.455537213 1.270564643 1.245505029 1.205900058 1.162396583 0.879163835 1.954987817 1.080998944 0.875943724 0.849193548 2.86717459 0.821654933 1.14574944 1.167947175 1.208396908 1.480125546 1.344855794 1.198409593 1.442598391 1.09413377 1.32772227 1.286854078 1.282799117 1.055728237 1.046595974 3.609028312 0.972209445;
2.027605491 0.975980961 1.523751475 0.60734791 0.954704656 0.948747816 1.152948392 0.555557551 1.327389277 2.006589554 1.130072166 0.988036542 0.974592813 1.085481069 1.51589896 0.770266552 1.075550379 1.17000843 2.37991297 1.374691252 0.963768116 1.033065094 1.207308029 1.03158237 1.27473305 1.323322811 0.995666963 1.701634761 1.197580575 1.796410512 1.34454464 1.414615372 0.908515353 1.074479273 1.892912954 1.215585858 1.260530805 1.946122565 1.054051499 0.464658358 1.258692039 1.415613126 1.363021507 1.523751475 1.224722064 1.193179055 1.770201997 1.155290681 0.999299308 2.09035457;
This is the CVX model:
RR = [1.15:0.01:1.21];
for i=1:length(RR)
R = RR(i);
cvx_begin quiet;
variable x(length(mu))
minimize (x'*S*x)
subject to
mu'*x >= R;
e'*x == 1.0000;
x>= 0;
x<=0.33
cvx_end
xx = [xx x];
rr = [rr mu'*x];
stdv = [stdv sqrt(x'*S*x)];
end
And here's the output:
Calling SDPT3 4.0: 57 variables, 7 equality constraints
------------------------------------------------------------
num. of constraints = 7
dim. of socp var = 6, num. of socp blk = 1
dim. of linear var = 51
*******************************************************************
SDPT3: Infeasible path-following algorithms
*******************************************************************
version predcorr gam expon scale_data
NT 1 0.000 1 0
it pstep dstep pinfeas dinfeas gap prim-obj dual-obj cputime
-------------------------------------------------------------------
0|0.000|0.000|3.9e+02|1.7e+01|7.3e+03| 1.228178e+01 0.000000e+00| 0:0:00| chol 1 1
1|0.947|1.000|2.1e+01|1.0e-01|4.2e+02| 4.638828e+00 -1.233840e+01| 0:0:00| chol 1 1
2|0.977|1.000|4.6e-01|1.0e-02|1.9e+01| 7.587821e-02 -9.896742e+00| 0:0:00| chol 1 1
3|1.000|1.000|1.8e-07|1.0e-03|2.5e+00| 2.008947e-02 -2.496417e+00| 0:0:00| chol 1 1
4|0.961|0.948|1.0e-07|1.5e-04|1.3e-01| 1.434256e-02 -1.174981e-01| 0:0:00| chol 1 1
5|0.972|0.422|1.7e-08|8.9e-05|1.1e-01| 5.465703e-03 -1.043929e-01| 0:0:00| chol 1 1
6|1.000|0.443|1.0e-08|5.0e-05|8.7e-02| 6.486070e-03 -8.040863e-02| 0:0:00| chol 1 1
7|0.684|0.739|3.9e-09|1.3e-05|4.3e-02| 4.102134e-03 -3.888568e-02| 0:0:00| chol 1 1
8|0.949|1.000|2.6e-10|1.1e-08|1.0e-02| 3.037969e-04 -1.007127e-02| 0:0:00| chol 1 1
9|1.000|1.000|8.5e-11|1.1e-09|2.5e-03| 9.298042e-05 -2.408280e-03| 0:0:00| chol 1 1
10|0.979|0.985|1.8e-12|1.3e-10|1.6e-04| 2.714452e-06 -1.545940e-04| 0:0:00| chol 1 1
11|0.985|0.985|2.5e-14|1.3e-11|2.4e-06| 4.039800e-08 -2.340381e-06| 0:0:00| chol 1 1
12|0.995|0.995|5.9e-15|1.1e-12|3.9e-08| 4.608065e-10 -3.805504e-08| 0:0:00| chol 1 1
13|1.000|0.997|4.6e-15|1.0e-12|7.4e-10| 7.917000e-12 -7.267619e-10| 0:0:00|
stop: max(relative gap, infeasibilities) < 1.49e-08
-------------------------------------------------------------------
number of iterations = 13
primal objective value = 7.91700039e-12
dual objective value = -7.26761875e-10
gap := trace(XZ) = 7.36e-10
relative gap = 7.36e-10
actual relative gap = 7.35e-10
rel. primal infeas (scaled problem) = 4.59e-15
rel. dual " " " = 1.00e-12
rel. primal infeas (unscaled problem) = 0.00e+00
rel. dual " " " = 0.00e+00
norm(X), norm(y), norm(Z) = 7.4e-01, 6.1e-09, 3.2e+00
norm(A), norm(b), norm(C) = 1.3e+01, 2.9e+00, 4.2e+00
Total CPU time (secs) = 0.37
CPU time per iteration = 0.03
termination code = 0
DIMACS: 5.9e-15 0.0e+00 1.3e-12 0.0e+00 7.3e-10 7.4e-10
-------------------------------------------------------------------
------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): +7.91702e-12
3
0.0086
0.0143
0.0073
0.0186
0.0125
0.0115
0.1894
0.0055
0.0197
0.0269
0.0117
0.0179
0.0130
0.0136
0.0198
0.0095
0.0109
0.0098
0.0208
0.0130
0.0534
0.0115
0.0100
0.0182
0.0087
0.0259
0.0193
0.0123
0.0169
0.0148
0.0143
0.0485
0.0204
0.0178
0.0270
0.0114
0.0122
0.0119
0.0077
0.0047
0.0155
0.0133
0.0165
0.0073
0.0237
0.0237
0.0124
0.0089
0.0472
0.0101
Is there a reason for this? I checked almost everything to find a connection, but I failed. Am I missing something here?
As per my comments, having more stocks than observed returns per stocks results in a singular covariance matrix if this matrix is estimated as follows:
Let $r_t$ be a vector of $N$ returns at time $t$. If we estimate $\mathbf{E}[r_tr_t']$ by $S:=\frac{1}{T}\sum_{t=1}^T r_tr_t'.$ Then any column $i$ of $S$ is $\frac{1}{T}\sum_{t=1}^T r_tr_{i,t}$ with $r_{i,t}$ being the $i$th element of $r_t$. This means that column $i$ is a linear combination of $T$ vectors. Moreover, all columns are linear combinations of the same $T$ vectors. Hence, the rank of $S$ cannot exceed $min(T,N)$. And so, if $T<N$ this results in a singular covariance matrix.
This potentially transforms a "minimization of a quadratic form"-problem into a (partly) linear optimization problem, probably resulting in a (piecewise) linear Markovitz frontier instead of the usual curved once and which may have multiple solutions.
Sidenote, $S$ is the covariance matrix for demeaned return vectors. However, the rank argument holds true for non demeaned returns.