Hi,
Let $Y_1(t), \dots, Y_n(t)$ be a set of time series.
Let $P(t) = \sum_{i=1}^n Y_i(t)$. I will call this object a portfolio.
Suppose you also have a set of regressors, one for each time series: $X_1(t), \dots, X_n(t)$.
You can then build a regression model for each time series:
$$
Y_i(t) = f(X_i(t); \beta_i) + \epsilon_i(t)
$$
where $\epsilon_i(t)$ are the residuals and $\beta_i$ are the parameters to optimize ($f$ can be whatever you want: linear regression, BDT, neural network; the choice is not important).
Let $\hat{Y}_i(t) \doteq f(X_i(t); \beta_i)$.
The problem is to forecast $P(t)$ with a model that minimizes the MAE, but without fitting the portfolio as a whole. Namely, there is a constraint: I cannot build a model like $$ P(t) = F(X_1(t), \dots, X_n(t)) $$ The only thing I am allowed to do is build a separate model for each $i$ (i.e. $\hat{Y}_i$) and then define the portfolio forecast as $$ \hat{P}(t) \doteq \sum_{i=1}^n \hat{Y}_i(t) $$
Let $$ MAE_P = || P(t) - \hat{P}(t) ||_1 = \sum_t \left| \sum_{i=1}^n \left( Y_i(t) - \hat{Y}_i(t) \right) \right| = \sum_t \left| \sum_{i=1}^n \epsilon_i(t) \right| $$ As I said, the main goal is to minimize $MAE_P$.
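To make the definition concrete, here is a minimal sketch of how $MAE_P$ could be computed from per-series forecasts. The function name `portfolio_mae` and the toy numbers are made up for illustration; series are stored as lists indexed by time.

```python
def portfolio_mae(Y, Y_hat):
    """Sum over t of |sum_i (Y_i(t) - Y_hat_i(t))|.

    Y and Y_hat are lists of series; each series is a list over time.
    """
    n_times = len(Y[0])
    total = 0.0
    for t in range(n_times):
        # Residuals are summed across series BEFORE taking the absolute value,
        # so errors of opposite sign cancel inside the portfolio.
        total += abs(sum(y[t] - y_hat[t] for y, y_hat in zip(Y, Y_hat)))
    return total


# Hypothetical toy data: 2 series, 3 time steps.
Y = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0]]
Y_hat = [[1.5, 2.0, 2.0], [1.0, 1.0, 1.0]]

print(portfolio_mae(Y, Y_hat))  # 2.5
```

In this toy case the sum of the individual MAEs is 1.5 + 2.0 = 3.5, while the portfolio MAE is 2.5: by the triangle inequality the portfolio MAE can never exceed the sum of the single-series MAEs, and it is smaller whenever residuals of opposite sign cancel.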
I want to focus on what happens to $MAE_P$ when I build $\hat{Y}_i(t)$ using different loss functions.
MAE loss function
First I tried using the MAE loss function to fit each $\hat{Y}_i(t)$ as well.
That is, to find the best fit I solved the following problem:
$$
\min_{\beta_i}||Y_i(t) - f(X_i(t); \beta_i)||_1 = \min_{\beta_i} \sum_t |Y_i(t)- f(X_i(t); \beta_i)|
$$
I will call $\epsilon_i^{MAE}(t)$ the residuals obtained by this optimisation and $MAE_P^{MAE}$ the portfolio MAE using these models.
MSE loss function
I can repeat the optimisation, this time using the MSE loss function:
$$ \min_{\beta_i}||Y_i(t) - f(X_i(t); \beta_i)||_2^2 = \min_{\beta_i} \sum_t |Y_i(t)- f(X_i(t); \beta_i)|^2 $$
This second problem will generate a second family of models with residuals $\epsilon_i^{MSE}(t)$. The portfolio error obtained using this second family is $MAE_P^{MSE}$.
Note that even though the $\hat{Y}_i(t)$ are obtained using the MSE loss function, the goal is still to minimize the portfolio MAE ($MAE_P$).
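For intuition about how the two fits differ, consider the simplest possible case: an intercept-only model $f = \beta_i$. This is an assumption made only for illustration, since the models above are general. In that case the MAE minimizer is the sample median and the MSE minimizer is the sample mean, so on a skewed sample the two fits disagree. The data and the helper names `mae` and `bias` below are hypothetical.

```python
import statistics

# Hypothetical right-skewed sample for a single time series.
y = [1.0, 1.0, 2.0, 2.0, 14.0]

# Intercept-only model (assumption): the best constant under each loss.
beta_mae = statistics.median(y)  # minimizes sum_t |y(t) - b|
beta_mse = statistics.mean(y)    # minimizes sum_t (y(t) - b)^2


def mae(y, b):
    """Sum of absolute residuals for the constant forecast b."""
    return sum(abs(v - b) for v in y)


def bias(y, b):
    """Sum of signed residuals for the constant forecast b."""
    return sum(v - b for v in y)


print(beta_mae, mae(y, beta_mae), bias(y, beta_mae))  # 2.0 14.0 10.0
print(beta_mse, mae(y, beta_mse), bias(y, beta_mse))  # 4.0 20.0 0.0
```

The median fit has the lower absolute error (14 versus 20) but a nonzero sum of residuals (10), whereas the mean fit has zero summed residual by construction.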
Conclusion and question
As expected, the MAE of each single time series is lower when I use the MAE loss function (by construction):
$$ \sum_t |\epsilon^{MAE}_i(t)| < \sum_t |\epsilon^{MSE}_i(t)| $$
Also as expected, the bias is smaller in magnitude when I use the MSE loss function:
$$ \left| \sum_t \epsilon^{MAE}_i(t) \right| > \left| \sum_t \epsilon^{MSE}_i(t) \right| $$
When I aggregate my models to compute the portfolio error, this happens:
$$ MAE_P^{MSE} < MAE_P^{MAE} $$
by a lot. And also:
$$ BIAS_P^{MSE} < BIAS_P^{MAE} $$
While the second statement is fairly intuitive, the first one is not (to me).
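The observation can be reproduced in a toy setting, sketched below under strong assumptions that are not taken from my real data: intercept-only models, and targets that are i.i.d. exponential draws, independent across series and over time. It is an illustration of one possible mechanism, not a proof. With skewed targets, each median fit leaves a residual bias of the same sign, and these biases add up across the $n$ series, while independent noise only grows like $\sqrt{n}$.

```python
import random
import statistics

random.seed(0)
n_series, n_times = 50, 400

# Assumption: skewed (exponential) targets, independent across series and time.
Y = [[random.expovariate(1.0) for _ in range(n_times)] for _ in range(n_series)]

# Intercept-only fits (assumption): median for MAE loss, mean for MSE loss.
fit_mae = [statistics.median(y) for y in Y]
fit_mse = [statistics.mean(y) for y in Y]


def total_single_mae(Y, fits):
    """Sum over series of each single-series absolute error."""
    return sum(abs(v - b) for y, b in zip(Y, fits) for v in y)


def portfolio_mae(Y, fits):
    """Sum over t of |sum_i residual_i(t)|, i.e. MAE_P."""
    return sum(
        abs(sum(y[t] - b for y, b in zip(Y, fits)))
        for t in range(len(Y[0]))
    )


print("single-series MAE, MAE fit:", total_single_mae(Y, fit_mae))
print("single-series MAE, MSE fit:", total_single_mae(Y, fit_mse))
print("portfolio MAE,     MAE fit:", portfolio_mae(Y, fit_mae))
print("portfolio MAE,     MSE fit:", portfolio_mae(Y, fit_mse))
```

In this setup the MAE fits win on every single series but lose at the portfolio level, which matches the ordering I see in my data. Whether the same mechanism explains my case depends on how skewed and how correlated the real residuals are.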
I know that the $\epsilon_i(t)$ are highly correlated with respect to time, so my question is:
Is it true that, if the correlation of the $\epsilon_i(t)$ is large enough, then to optimize the portfolio MAE (in the sense specified above) it is better to fit the single time series that make up the portfolio so that the bias is minimized (i.e. using the MSE loss function) rather than the MAE?
Can you give me a hint about how to formalize it?