Can a vector of random variables be separated into dependent and independent variation?


Is it possible to uniquely decompose a vector $\underset{d_x \times 1}{x}$ of $d_x$ random variables into dependent and independent sources of variation?

Suppose we know the distribution $P_x$ of a mean-zero vector $\underset{d_x \times 1}{x}$. Is it possible to write $x$ as the sum of two vectors of random variables $\underset{d_x \times 1}{f}$ and $\underset{d_x \times 1}{v}$, with the following conditions:

  • $ x = f + v $

  • The random variables constituting $v$ are mutually independent: $p(v) = \prod_{l=0}^{d_x-1} p(v_l)$

  • $f$ and $v$ are independent.

  • $\underset{d_x \times 1}{f}$ can be written as the function of a lower-dimensional vector: $\underset{d_x \times 1}{f} = z(\underset{d_a \times 1}{a})$, with the unknown function $z: \mathbb{R}^{d_a} \rightarrow \mathbb{R}^{d_x}$ and some $d_a < d_x$.

Under these conditions, are the distributions of $f$ and $v$, $P_f$ and $P_v$, uniquely identified?

I know the above conditions imply that $$p(x) = \int_{\mathbb{R}^{d_x}} p_f(x-v) \Big( \prod_{l=0}^{d_x-1} p_{v_l}(v_l) \Big) \, dv_{d_x-1} \cdots dv_0 \ \ \ \ \forall x \in \mathbb{R}^{d_x} $$

However, I have not been able to show that the distributions $P_f$ and $P_v$ have to be unique.
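To make the setup concrete, here is a small simulation of the conditions above (my own illustration, assuming a linear $z$ and Gaussian sources; nothing in the question requires these choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_x, d_a = 200_000, 2, 1

# Hypothetical linear z: f = z(a) = (1, 1)' a, with Gaussian sources.
a = rng.standard_normal((n, d_a))      # low-dimensional driver
f = a @ np.ones((d_a, d_x))            # f = z(a), rank d_a < d_x
v = rng.standard_normal((n, d_x))      # independent idiosyncratic components
x = f + v                              # condition 1: x = f + v

# Empirically, f and v are uncorrelated and Cov(x) = Cov(f) + Cov(v).
print(np.corrcoef(f[:, 0], v[:, 0])[0, 1])   # close to 0
print(np.cov(x, rowvar=False))               # close to [[2, 1], [1, 2]]
```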

1 Answer

In general, $P_f$ and $P_v$ do not have to be unique.

Simple counterexample

Suppose $a$ is a random variable and $v$ a 2-dimensional vector of random variables, with $f$ and $v$ independent. $$ \underset{2 \times 1}{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \underset{1 \times 1}{a} + \underset{2 \times 1}{v} $$ $$ v \sim N \begin{pmatrix} \underset{2 \times 1}{0}, & \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \end{pmatrix}, \ \ \ \ a \sim N(0, 1), \ \ \ \ v \perp\!\!\!\perp a $$ $$ \implies x \sim N \begin{pmatrix} \underset{2 \times 1}{0}, & \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \end{pmatrix} $$ Suppose we know that $z(\cdot)$ is linear in this example: $f = \begin{pmatrix} z_0 a \\ z_1 a \end{pmatrix} = \begin{pmatrix} z_0 \\ z_1 \end{pmatrix} a$. Then we have a problem with 4 unknown parameters but only 3 conditions (the distinct entries of the covariance matrix of $x$). The first two unknown parameters are $z_0$ and $z_1$, which generate the covariance between $x_0$ and $x_1$. The other two are the variances $\sigma_{v_0}^2$ and $\sigma_{v_1}^2$, which describe the amount of idiosyncratic variation in $x$. We can rewrite the system as:

$$ \begin{pmatrix} z_0^2 + \sigma_{v_0}^2 & z_0 z_1 \\ z_0 z_1 & z_1^2 + \sigma_{v_1}^2 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} $$

Given that $z_0^2 + \sigma_{v_0}^2 = z_1^2 + \sigma_{v_1}^2 = 2$, we know that $z_0, z_1 \leq \sqrt{2}$ (taking $z_0, z_1 > 0$ without loss of generality, since $a \sim N(0,1)$ is symmetric). Given $z_0 z_1 = 1$, this in turn implies $z_0, z_1 \geq \frac{1}{\sqrt{2}}$. Thus $P_f$ and $P_v$ are determined only up to one bounded free parameter: $$ z_0 = \sqrt{2 - \sigma_{v_0}^2}, \ \ \ \ z_1 = \frac{1}{\sqrt{2 - \sigma_{v_0}^2}}, \ \ \ \ \sigma_{v_1}^2 = \frac{3 - 2\sigma_{v_0}^2}{2 - \sigma_{v_0}^2}, \ \ \ \ \sigma_{v_0}^2 \in [0, 1.5] $$
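The non-uniqueness can be checked numerically: every $\sigma_{v_0}^2 \in [0, 1.5]$ in the family above reproduces the same covariance of $x$. A quick NumPy sketch:

```python
import numpy as np

def implied_cov(s0_sq):
    """Covariance of x implied by one member of the solution family."""
    z0 = np.sqrt(2 - s0_sq)
    z1 = 1 / z0
    s1_sq = (3 - 2 * s0_sq) / (2 - s0_sq)
    z = np.array([z0, z1])
    return np.outer(z, z) + np.diag([s0_sq, s1_sq])

target = np.array([[2.0, 1.0], [1.0, 2.0]])
for s0_sq in [0.0, 0.5, 1.0, 1.5]:                    # four distinct decompositions...
    print(np.allclose(implied_cov(s0_sq), target))    # ...all match the same P_x
```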

Example of Uniqueness

In a linear model, $P_f$ and $P_v$ can be uniquely identified as long as the dimension of $a$, $d_a$, is sufficiently small.

Suppose $a$ is a random variable and $v$ a 3-dimensional vector of random variables, with $f$ and $v$ independent. $$ \underset{3 \times 1}{x} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \underset{1 \times 1}{a} + \underset{3 \times 1}{v} $$ $$ v \sim N \begin{pmatrix} \underset{3 \times 1}{0}, & \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \end{pmatrix}, \ \ \ \ a \sim N(0, 1), \ \ \ \ v \perp\!\!\!\perp a $$ $$ \implies x \sim N \begin{pmatrix} \underset{3 \times 1}{0}, & \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} \end{pmatrix} $$ Suppose we know that $z(\cdot)$ is linear in this example: $f = \begin{pmatrix} z_0 a \\ z_1 a \\ z_2 a \end{pmatrix} = \begin{pmatrix} z_0 \\ z_1 \\ z_2 \end{pmatrix} a = \underset{3 \times 1}{z} a$, and that in addition the dimension of $a$ is one. Then we have a problem with 6 unknown parameters and 6 conditions. The first three unknown parameters are in $\underset{3 \times 1}{z}$, which describe the covariances of $\underset{3 \times 1}{x}$. The other three are the variances $\sigma_{v_l}^2$, $l = 0, 1, 2$, which describe the amount of idiosyncratic variation in $\underset{3 \times 1}{x}$. We can rewrite the system as:

$$ \begin{pmatrix} z_0^2 + \sigma_{v_0}^2 & z_0 z_1 & z_0 z_2 \\ z_0 z_1 & z_1^2 + \sigma_{v_1}^2 & z_1 z_2 \\ z_0 z_2 & z_1 z_2 & z_2^2 + \sigma_{v_2}^2 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} $$ Given that $a$ is one-dimensional by assumption, we know that this system has a unique solution: $$ z_0 = z_1 = z_2 = 1, \ \ \ \ \sigma_{v_0}^2 = \sigma_{v_1}^2 = \sigma_{v_2}^2 = 1 $$
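With $d_a = 1$ the solution can be read off the covariance matrix directly, since for distinct indices $j, k, l$ the off-diagonal entries give $\frac{(z_j z_k)(z_j z_l)}{z_k z_l} = z_j^2$. A sketch of that recovery (assuming positive loadings, as in the example):

```python
import numpy as np

S = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])   # observed covariance of x

# One-factor recovery: z_j^2 = S_jk * S_jl / S_kl for distinct j, k, l.
z0 = np.sqrt(S[0, 1] * S[0, 2] / S[1, 2])
z1 = np.sqrt(S[0, 1] * S[1, 2] / S[0, 2])
z2 = np.sqrt(S[0, 2] * S[1, 2] / S[0, 1])
z = np.array([z0, z1, z2])
sigma_v_sq = np.diag(S) - z**2    # idiosyncratic variances

print(z)            # [1. 1. 1.]
print(sigma_v_sq)   # [1. 1. 1.]
```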

General identifiability of the integral equation for linear systems

In both examples linearity was key to arguing identifiability by counting parameters. For a given dimension $d_x$ of $x$, we can find the largest dimension $d_a$ that $\underset{d_a \times 1}{a}$ may have: the number of unknown parameters must not exceed the number of observed parameters (the entries of $x$'s covariance matrix). The $d_x$ idiosyncratic variances $\sigma_{v_l}^2$ on the unknown side and the $d_x$ variances of $x$ on the observed side cancel, so only covariances need to be counted:

$$ \underbrace{\frac{1}{2}(d_a - 1) d_a}_{\text{covariances of } a} + \underbrace{d_x d_a}_{\text{linear effect of } a \text{ on } x} \leq \underbrace{\frac{1}{2} (d_x - 1) d_x}_{\text{covariances of } x} $$ $$ d_a \leq \frac{1 - 2 d_x + \sqrt{8 d_x^2 - 8 d_x + 1}}{2} $$ $$ \lim_{d_x \rightarrow \infty}\Big(\frac{d_a}{d_x}\Big) \leq \sqrt{2} - 1 \approx 0.414 $$ As $d_x$ grows, the ratio of the factor dimension $d_a$ to the observed dimension $d_x$ must stay below $\sqrt{2} - 1 \approx 0.414$. Equivalently, for any set of $k \leq d_a$ factors, more than $(1 + \sqrt{2})k \approx 2.414k$ components of $\underset{d_x \times 1}{x}$ must be associated with those $k$ factors for the linear parameters in $z$ and the covariances among the $k$ factors to be uniquely identified. When $d_x$ is small the exact ratio differs slightly.
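The bound is easy to tabulate (my own sketch; `max_da` simply evaluates the closed-form root of the quadratic in $d_a$ and rounds down):

```python
import math

def max_da(d_x):
    """Largest integer d_a with d_a(d_a-1)/2 + d_x*d_a <= d_x(d_x-1)/2."""
    return math.floor((1 - 2 * d_x + math.sqrt(8 * d_x**2 - 8 * d_x + 1)) / 2)

for d_x in [2, 3, 10, 100, 1000]:
    print(d_x, max_da(d_x), max_da(d_x) / d_x)
# d_x = 2 allows no factor (matching the counterexample above); d_x = 3 allows
# one factor (matching the uniqueness example); the ratio approaches sqrt(2) - 1.
```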

General identifiability of the integral equation for nonlinear systems

Two questions remain:

  1. When does low dimensionality of $a$ ensure identifiability (if ever)?
  2. When is $d_a$ sufficiently low?

The problem can be interpreted as a multidimensional integral equation with an unknown difference kernel. If $P_v$ (and hence the kernel) were known, we could apply the Fourier transform to $P_v$ and $P_x$ — equivalently, divide characteristic functions, since $x = f + v$ with $f \perp\!\!\!\perp v$ implies $\varphi_x = \varphi_f \varphi_v$ — to find $P_f$ (Polyanin, A.D. and Manzhirov, A.V., 2008. Handbook of Integral Equations. CRC Press, p. 586).
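For illustration, in the Gaussian counterexample this Fourier route works directly: with $P_v$ assumed known, $\varphi_f = \varphi_x / \varphi_v$, and the quotient is exactly the characteristic function of $f = (1, 1)' a$. A sketch with the 2-dimensional example above:

```python
import numpy as np

def phi_gauss(t, Sigma):
    """Characteristic function of a mean-zero Gaussian, evaluated at rows of t."""
    return np.exp(-0.5 * np.einsum('ij,jk,ik->i', t, Sigma, t))

Sigma_x = np.array([[2.0, 1.0], [1.0, 2.0]])
Sigma_v = np.eye(2)
Sigma_f = np.ones((2, 2))          # covariance of f = (1, 1)' a

rng = np.random.default_rng(0)
t = rng.standard_normal((50, 2))   # test points in frequency space

# If P_v were known, phi_f could be recovered by pointwise division.
phi_f_recovered = phi_gauss(t, Sigma_x) / phi_gauss(t, Sigma_v)
print(np.allclose(phi_f_recovered, phi_gauss(t, Sigma_f)))   # True
```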

Unfortunately, the problem here is not as straightforward: instead of a known kernel, we have two unknown multidimensional distributions, on which substantial dimension and independence restrictions are imposed. For a linear system I have provided an example of identifiability. I conjecture that, just as in linear systems, the dimension $d_a$ of $a$ plays a key role in the nonlinear counterpart. It would be fantastic to hear an expert's opinion on nonlinear identifiability from the integral-equations community.

Summary

In general, $P_f$ and $P_v$ do not have to be unique. In linear systems, $P_f$ and $P_v$ are uniquely identified if the dimension of $a$, $d_a$, is sufficiently small. In nonlinear systems we must solve an atypical integral equation in which, instead of a known kernel, dimension and independence restrictions are imposed on two unknown multidimensional distributions. I do not know the implications for the identifiability of $P_f$ and $P_v$ in a nonlinear system, but I conjecture that the dimension of $a$, $d_a$, again plays a crucial role.