Let $A$ be an arbitrary $n \times n$ real matrix and $P$ be a random perturbation matrix with zero-mean, i.i.d. entries. Can we say anything about the eigenvalues of $A+P$? In particular, are there any conditions under which $\mathbb{E}[\lambda(A + P)] = \lambda(A)$, where $\lambda$ maps a matrix to its spectrum?
What if $A$ and $P$ are symmetric?
I have found plenty of material on bounding the spectrum of $A + P$, but much less on its distribution or expectation.
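For the symmetric case, a quick Monte Carlo check is easy to set up. Here is a minimal sketch; the specific $A$ and the Gaussian scale of the perturbation are arbitrary choices for illustration, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 20000

# Arbitrary fixed symmetric A
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

# Batch of symmetric, zero-mean perturbations P
G = rng.standard_normal((trials, n, n)) / np.sqrt(n)
P = (G + np.transpose(G, (0, 2, 1))) / 2

# eigvalsh returns eigenvalues in ascending order, so sorted spectra
# are averaged component-wise
mean_eigs = np.linalg.eigvalsh(A + P).mean(axis=0)

print(np.linalg.eigvalsh(A))  # lambda(A)
print(mean_eigs)              # Monte Carlo estimate of E[lambda(A + P)]
```

One thing I do know: $\lambda_{\max}$ is convex on symmetric matrices, so by Jensen $\mathbb{E}[\lambda_{\max}(A+P)] \ge \lambda_{\max}(A)$ (typically strictly), which already suggests zero mean alone is not enough for equality of the whole expected spectrum.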
The context of this question is the following machine learning optimization problem:
\begin{equation}\begin{split} \min_\theta \; & L = L_1(A) + L_2(\lambda(A)) \end{split}\end{equation} where $A = f_\theta(z)$ is a matrix produced by a function $f$ with learned parameters $\theta$, which takes some input $z$ and returns a matrix $A$ that is symmetric by design, and $L_1, L_2$ are smooth loss functions. $L$ is minimized by stochastic gradient descent, i.e. via the update \begin{equation}\begin{split} \theta_{t+1} &= \theta_t - \alpha \frac{\partial L}{\partial \theta_t} \\ &= \theta_t - \alpha \left( \frac{\partial L_1}{\partial \theta_t} + \frac{\partial L_2}{\partial \lambda(A)} \frac{\partial \lambda(A)}{\partial A} \frac{\partial A}{\partial \theta_t} \right), \end{split}\end{equation} where $\alpha > 0$ and the gradient is evaluated at each step $t$ over a batch of inputs $z$ sampled i.i.d. from a fixed distribution.
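For concreteness: when $A$ is symmetric with simple eigenvalues, $\frac{\partial \lambda_i}{\partial A} = v_i v_i^\top$ for the corresponding unit eigenvector $v_i$, so the middle factor of the chain rule can be written explicitly. A minimal numerical sketch, using a hypothetical stand-in for $L_2$ (not the actual loss):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Arbitrary symmetric A; generically its eigenvalues are simple
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

# Hypothetical stand-in for L2: a smooth function of the spectrum
def L2(eigvals):
    return np.sum(eigvals ** 2)

# For simple eigenvalues, d lambda_i / dA = v_i v_i^T, hence
# dL2/dA = V diag(dL2/dlambda) V^T
lam, V = np.linalg.eigh(A)
grad_A = V @ np.diag(2 * lam) @ V.T

# Central-difference check of the directional derivative along a
# symmetric direction E, which should equal <grad_A, E>
i, j, eps = 0, 1, 1e-6
E = np.zeros((n, n))
E[i, j] = E[j, i] = 1.0
fd = (L2(np.linalg.eigvalsh(A + eps * E))
      - L2(np.linalg.eigvalsh(A - eps * E))) / (2 * eps)
print(fd, grad_A[i, j] + grad_A[j, i])  # should agree
```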
Under certain conditions, $f_\theta$ yields an $A$ with repeated or nearly repeated eigenvalues, which makes the factor $\frac{\partial \lambda(A)}{\partial A}$ in the chain-rule expansion of $g \triangleq \frac{\partial L_2}{\partial \theta}$ ill-defined (or numerically unstable). This is bad, but since we are already performing stochastic gradient descent over the distribution of $z$, we would be satisfied with any unbiased estimator $\hat g$ of $g$, i.e. one with $\mathbb{E}[\hat g] = g$.
Since $A+P$ almost surely has no repeated eigenvalues whenever the distribution of $P$ is absolutely continuous (TODO: theorem), it would be nice if we could construct a good estimator $\hat g$ from evaluations of $\frac{\partial \lambda(A + P)}{\partial A}$.
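The construction I have in mind is roughly the following. The smoothing scale $\sigma$, the degenerate choice of $A$, and the stand-in loss are all hypothetical, and whether $\mathbb{E}[\hat g] = g$ holds in general is exactly what I am unsure about:

```python
import numpy as np

rng = np.random.default_rng(2)
n, samples, sigma = 4, 500, 1e-3

A = np.eye(n)  # repeated eigenvalues: d lambda / dA is ill-defined here

def grad_spectrum_loss(M):
    """dL2/dM for the stand-in loss L2(lam) = sum(lam**2), valid for simple spectra."""
    lam, V = np.linalg.eigh(M)
    return V @ np.diag(2 * lam) @ V.T

# Smoothed estimator g_hat: average the gradient at A + P over small
# random symmetric perturbations P; A + P has simple eigenvalues a.s.
g_hat = np.zeros((n, n))
for _ in range(samples):
    G = sigma * rng.standard_normal((n, n))
    P = (G + G.T) / 2
    g_hat += grad_spectrum_loss(A + P)
g_hat /= samples

# For this particular L2, sum(lam**2) = trace(M @ M), whose gradient in M
# is 2M, so g_hat should be close to 2*I despite A being degenerate
print(g_hat)
```

This toy loss is smooth in $A$ even at the degeneracy, which is why the check works; the open question is what happens for a general smooth $L_2$ when the true gradient exists but the eigenvalue decomposition is unstable.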