I am trying to learn more about the mathematical concept of Identifiability (https://en.wikipedia.org/wiki/Identifiability).
In particular, I am trying to learn about the mathematical conditions that make a Probability Distribution Function "Identifiable" vs "Non-Identifiable". As I understand it, a Probability Distribution Function is "Non-Identifiable" when two or more different sets of parameter values produce exactly the same distribution, so that even with an infinite amount of data it would still be impossible to pin down a unique value for every parameter. In simple terms, I think this means that when a Probability Distribution Function is Non-Identifiable, the corresponding Maximum Likelihood equations can have multiple solutions that fit the data equally well.
A popular example of a Probability Distribution Function that is generally considered Non-Identifiable is a "Mixture of Normal Distributions" (https://en.wikipedia.org/wiki/Mixture_distribution). In general, we can define it as follows:
$$f(x) = \sum_{i=1}^{k} \pi_i \mathcal{N}(x | \mu_i, \sigma_i^2) = \sum_{i=1}^{k} \pi_i \mathcal{N}(x | \theta_i) $$
- $f(x)$ is the probability density function of the mixture model
- $k$ is the number of components in the mixture
- $\pi_i$ is the mixing proportion of the $i$-th component, with $\sum_{i=1}^{k} \pi_i = 1$
- $\mathcal{N}(x | \mu_i, \sigma_i^2)$ is the probability density function of a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$
In the above situation, if we were to estimate a set of $\theta_i$ and $\pi_i$, we could relabel the components, i.e. reorder the pairs $(\pi_i, \theta_i)$, and each of these reorderings would give exactly the same $f(x)$ and would therefore be equally valid. We would have "no way of identifying" which of these reorderings is the "true" one, thus resulting in a "Non-Identifiable" model.
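To make the relabeling issue concrete, here is a minimal sketch of what I mean. The two-component parameter values, the evaluation grid, and the helper names `normal_pdf` and `mixture_pdf` are all my own assumptions for illustration. It only checks numerically that swapping the component labels leaves $f(x)$ unchanged:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_pdf(x, weights, means, variances):
    """f(x) = sum_i pi_i * N(x | mu_i, sigma_i^2)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture (values chosen only for illustration).
params_a = ([0.3, 0.7], [-1.0, 2.0], [1.0, 0.5])
# The same two components with labels 1 and 2 swapped.
params_b = ([0.7, 0.3], [2.0, -1.0], [0.5, 1.0])

# Compare the two densities on a coarse grid from -4 to 4.
grid = [-4.0 + 0.5 * i for i in range(17)]
max_gap = max(abs(mixture_pdf(x, *params_a) - mixture_pdf(x, *params_b)) for x in grid)

print("parameter vectors differ:", params_a != params_b)
print("largest density difference on the grid:", max_gap)
```

The two parameter vectors are different, yet the densities agree at every grid point up to floating-point error, so no amount of data drawn from $f(x)$ could tell them apart.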
This leads me to my question:
Is it possible to know in advance if a "general type of model" (e.g. All Normal Mixture Distributions) will be Identifiable or Non-Identifiable? I remember from Linear Algebra that if a system of linear equations has more variables than equations, then it cannot have a unique solution: if it has any solution at all, it has multiple solutions (i.e. this is related to the Rank of the matrix). Can a similar "rule" be applied in this situation to understand if a general class of Probability Distributions is Identifiable or Non-Identifiable?
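For reference, this is the kind of Linear Algebra situation I have in mind. It is a toy sketch only: the system $A x = b$ below and the helper name `residual` are made up for illustration. There are 2 equations and 3 unknowns, and two different vectors both solve the system exactly:

```python
def residual(A, b, x):
    """Return A @ x - b for a small dense system stored as nested lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
            for row, b_i in zip(A, b)]

# Hypothetical system with 2 equations and 3 unknowns:
#   x + y + z = 6
#   x - y     = 0
A = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 0.0]]
b = [6.0, 0.0]

# Two different candidate solutions; they differ by the direction (1, 1, -2),
# which A maps to zero.
x1 = [1.0, 1.0, 4.0]
x2 = [2.0, 2.0, 2.0]

print(residual(A, b, x1))
print(residual(A, b, x2))
```

Both residuals are zero, so the equations alone cannot distinguish `x1` from `x2`. My question is whether an analogous check exists for a family of Probability Distributions.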
Although I do not fully understand this, I have heard that some "modifications" (i.e. constraints) can be made to the above Mixture Model which can in fact make it Identifiable. For example, requiring $\theta_i \neq \theta_j$ for all $i \neq j$, or requiring $\pi_i > 0$ for every component. I am still a bit confused about all this. Can someone please help me understand what "modifications" (i.e. constraints) can be added to the above Mixture Model to make it Identifiable (and explain how adding these modifications makes it Identifiable)?
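Here is my rough attempt at seeing why such constraints might matter. All parameter values and the helper names (`mixture_pdf`, `max_gap`, `canonical`) are my own assumptions for illustration. The sketch shows what goes wrong when a weight is zero or when two components share the same $\theta$, and how ordering the components by mean removes the relabeling ambiguity. It does not prove that these constraints are sufficient:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_pdf(x, components):
    """components: list of (pi_i, mu_i, sigma2_i) triples."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in components)

def max_gap(c1, c2):
    """Largest absolute density difference on a grid from -5 to 5."""
    grid = [-5.0 + 0.25 * i for i in range(41)]
    return max(abs(mixture_pdf(x, c1) - mixture_pdf(x, c2)) for x in grid)

# (a) pi_2 = 0: the second component's parameters can be anything.
zero_a = [(1.0, 0.0, 1.0), (0.0, 3.0, 1.0)]
zero_b = [(1.0, 0.0, 1.0), (0.0, -7.0, 2.0)]
gap_zero_weight = max_gap(zero_a, zero_b)

# (b) theta_1 = theta_2: the weight can be split between them arbitrarily.
dup_a = [(0.5, 0.0, 1.0), (0.5, 0.0, 1.0)]
dup_b = [(0.2, 0.0, 1.0), (0.8, 0.0, 1.0)]
gap_duplicate = max_gap(dup_a, dup_b)

# (c) Ordering constraint: sort components by mean (ties broken by variance,
# then weight) so that relabeled versions map to one representative.
def canonical(components):
    return sorted(components, key=lambda c: (c[1], c[2], c[0]))

original = [(0.3, -1.0, 1.0), (0.7, 2.0, 0.5)]
swapped = [(0.7, 2.0, 0.5), (0.3, -1.0, 1.0)]
same_after_sorting = canonical(original) == canonical(swapped)

print(gap_zero_weight, gap_duplicate, same_after_sorting)
```

In cases (a) and (b) the densities coincide (up to floating-point error) even though the parameters differ, which is why I suspect $\pi_i > 0$ and distinct $\theta_i$ are needed. In case (c) the two relabeled parameter sets become identical once sorted.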
Thanks!