Let $L$ be a symbolic $n \times n$ lower triangular matrix and let $S$ be a real symmetric $n \times n$ positive semidefinite matrix. Fix a real symmetric $n \times n$ positive definite matrix $A$ with Cholesky decomposition $A=CC^T$, and fix a set $\mathcal{P}$ of index pairs $\{k,l\}$ selecting entries of $S$. I want to solve
$$ \begin{aligned} \min_{L_{i,j}\in \mathbb{R}} \quad &\sum_{i,j} (L_{i,j}-C_{i,j})^2\\ \text{s.t.} \quad & L_{i,i}>0\quad \forall i\\ & (LL^T)_{k,l} = S_{k,l} \quad \forall \{k,l\} \in \mathcal{P}\,. \end{aligned} $$
When I run experiments in Mathematica, I observe that the results depend on the choice of $A$. If I start with an $A$ that is close to $S$ rather than a random positive definite matrix, the optimization works better: it converges faster and the numerical deviation from the feasible set is smaller. In fact, starting with a random $A$ sometimes produces results whose constraint violation exceeds the tolerance.
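For concreteness, here is a minimal Python/SciPy sketch of the same experiment (the original runs in Mathematica; the solver, `SLSQP`, and the specific $S$ and $\mathcal{P}$ below are my own choices for illustration). It solves the constrained fit starting from $C=\operatorname{chol}(A)$, once with $A=S$ and once with a random positive definite $A$, and reports the objective value and the worst constraint violation in each case:

```python
# Sketch, not the original Mathematica setup: fit a lower triangular L to
# C = chol(A) subject to (L L^T)_{kl} = S_{kl} for {k,l} in P, L_ii > 0.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 4

# A random symmetric positive definite S (illustrative choice).
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)

# Illustrative constraint set P: entries of L L^T pinned to S.
P = [(0, 0), (1, 0), (2, 2)]

tril = np.tril_indices(n)  # positions of the free variables in L

def unpack(x):
    L = np.zeros((n, n))
    L[tril] = x
    return L

def solve(A):
    C = np.linalg.cholesky(A)
    c = C[tril]
    obj = lambda x: np.sum((x - c) ** 2)  # ||L - C||_F^2 on the triangle
    cons = [{"type": "eq",
             "fun": (lambda x, k=k, l=l:
                     (unpack(x) @ unpack(x).T)[k, l] - S[k, l])}
            for k, l in P]
    # Start at C itself; L_ii > 0 is enforced via lower bounds on the diagonal.
    bounds = [(1e-8, None) if i == j else (None, None)
              for i, j in zip(*tril)]
    res = minimize(obj, c, method="SLSQP", bounds=bounds, constraints=cons)
    L = unpack(res.x)
    viol = max(abs((L @ L.T)[k, l] - S[k, l]) for k, l in P)
    return res.fun, viol

# A close to S (here exactly S) versus a random positive definite A.
f_close, v_close = solve(S)
B = rng.standard_normal((n, n))
f_rand, v_rand = solve(B @ B.T + n * np.eye(n))
print("A = S:     objective", f_close, "violation", v_close)
print("A random:  objective", f_rand, "violation", v_rand)
```

With $A=S$ the starting point $C$ already satisfies every constraint, so the solver essentially terminates at the start; with a random $A$ it must move off the initial point to reach the feasible set, which is where the larger violations show up.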
Why does this happen? Is it because the feasible set is non-convex, so that gradient descent (?) is not guaranteed to converge? Can it get stuck in a local minimum or at a saddle point? And is that the same as saying the optimization problem is ill-conditioned?