I am interested in 2D convolutions. Specifically, I am interested in convex CNNs. A CNN is convex if (and only if?) its 2D convolutional layers are PSD. PSD layers are monotone, and the convolutional layers were the only non-monotone layers in the CNN. Making them PSD therefore leaves a network of only convex and monotone layers, so the network is convex.
Now, to the question itself: a 1D convolution can be seen as a Toeplitz matrix. Thus, if it is PSD, then according to Wikipedia its SVD is just $A = \sum_{k=1}^r d_k v(f_k) v(f_k)^\text{H} = VDV^\text{H}$, where $V$ is the Vandermonde matrix.
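This minimal numpy sketch makes the "1D convolution is a Toeplitz matrix" claim concrete. The helpers `conv_toeplitz` and `is_toeplitz` are hypothetical names, not from any library. The sketch builds the matrix of a zero-padded ("full") 1D convolution, checks it against `np.convolve`, and confirms that every diagonal is constant. This matrix is rectangular. PSD only makes sense for a square operator, such as the square Toeplitz part you get with "same" padding, so the sketch checks only the Toeplitz structure.

```python
import numpy as np

def conv_toeplitz(h, n):
    """(n + k - 1) x n matrix T with T @ x == np.convolve(h, x) for every length-n x."""
    k = len(h)
    T = np.zeros((n + k - 1, n))
    for j in range(n):
        T[j:j + k, j] = h      # column j is the kernel shifted down by j, so T[p, j] = h[p - j]
    return T

def is_toeplitz(M):
    """True if every diagonal of M is constant."""
    m, n = M.shape
    return all(np.allclose(np.diagonal(M, o), np.diagonal(M, o)[0]) for o in range(-(m - 1), n))

rng = np.random.default_rng(0)
h = rng.standard_normal(3)     # kernel
x = rng.standard_normal(5)     # input signal
T = conv_toeplitz(h, len(x))
print(T.shape, is_toeplitz(T), np.allclose(T @ x, np.convolve(h, x)))
```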
Now, in the 2D case: a 2D convolution is a doubly blocked Toeplitz matrix (and, with circular boundary conditions, a doubly block circulant matrix). Here we can see a nice demonstration of how to convert each 2D convolution into a doubly blocked Toeplitz matrix. I still can't figure out what this means for the SVD of PSD 2D convolutions. What does it look like?
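The circulant remark also gives a cheap way to test the monotonicity the question relies on. This is a small numpy sketch. It assumes circular (periodic) boundary conditions and true convolution. Most CNN libraries compute cross-correlation instead, but in the circular case that is the transpose of this matrix, which has the same symmetric part, so the monotonicity test is unaffected.

The sketch builds the doubly block circulant matrix $C$ explicitly and checks it against FFT-based convolution. It then checks a standard fact: for a real doubly block circulant $C$, the symmetric part $(C + C^T)/2$ has eigenvalues equal to the real parts of the 2D DFT of the zero-padded kernel. The map $x \mapsto Cx$ is monotone exactly when that symmetric part is PSD. The function names are hypothetical.

```python
import numpy as np

def circ_conv2d_matrix(h, n1, n2):
    """Doubly block circulant matrix C (n1*n2 x n1*n2) such that C @ x.ravel()
    equals the circular 2-D convolution of h with an n1 x n2 image x (row-major)."""
    hp = np.zeros((n1, n2))
    hp[:h.shape[0], :h.shape[1]] = h              # zero-pad the kernel to the image size
    C = np.empty((n1 * n2, n1 * n2))
    for p in range(n1):
        for q in range(n2):
            for i in range(n1):
                for j in range(n2):
                    # block (p, i) depends only on (p - i) mod n1,
                    # entry (q, j) inside a block only on (q - j) mod n2
                    C[p * n2 + q, i * n2 + j] = hp[(p - i) % n1, (q - j) % n2]
    return C, hp

def is_monotone_circ(hp, tol=1e-10):
    """x -> C x is monotone iff (C + C^T)/2 is PSD; its eigenvalues are Re(fft2(hp))."""
    return bool(np.all(np.fft.fft2(hp).real >= -tol))

rng = np.random.default_rng(1)
h = rng.standard_normal((3, 3))
x = rng.standard_normal((4, 5))
C, hp = circ_conv2d_matrix(h, *x.shape)

# circular convolution theorem: convolution <-> pointwise product of 2-D DFTs
y_fft = np.fft.ifft2(np.fft.fft2(hp) * np.fft.fft2(x)).real
sym_eigs = np.sort(np.linalg.eigvalsh((C + C.T) / 2))
print(np.allclose(C @ x.ravel(), y_fft.ravel()),
      np.allclose(sym_eigs, np.sort(np.fft.fft2(hp).real.ravel())),
      is_monotone_circ(hp))
```

For realistic image sizes the explicit matrix is unnecessary. `is_monotone_circ` only needs one FFT of the padded kernel.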
Your help will be highly appreciated! Thanks in advance!
Firstly, $A = \sum_{k=1}^r d_k v(f_k) v(f_k)^\text{H} = VDV^\text{H}$ is not the SVD of a Toeplitz matrix, nor does the Wikipedia page claim that. This is known as the "Vandermonde decomposition". The claim is that a PSD $n \times n$ Toeplitz matrix of rank $r < n$ can be uniquely factorized in this way (Carathéodory and Fejér, 1911). For a fairly simple proof of this decomposition under the stronger assumption that $A$ is PD and Hermitian, see the appendix of Bäckström. Note that there is a typo in equation 33: it should read $VA = \Lambda$.
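A minimal numpy sketch of the easy direction of this statement. Choosing $r$ distinct frequencies and positive weights and forming $\sum_k d_k v(f_k) v(f_k)^\text{H}$ always gives a Hermitian PSD Toeplitz matrix of rank $r$, because entry $(j, l)$ equals $\sum_k d_k e^{i2\pi(j-l) f_k}$. The frequencies and weights are arbitrary illustrative values. Recovering them from a given matrix (the hard direction of the theorem) is not attempted here.

```python
import numpy as np

def vandermonde(n, f):
    """n x len(f) matrix whose k-th column is v_n(f_k) = [1, e^{i2πf_k}, ..., e^{i2π(n-1)f_k}]^T."""
    return np.exp(2j * np.pi * np.outer(np.arange(n), f))

n = 6
f = np.array([0.10, 0.35, 0.80])   # r = 3 distinct frequencies in [0, 1)
d = np.array([2.0, 0.5, 1.0])      # positive weights d_k

V = vandermonde(n, f)
A = V @ np.diag(d) @ V.conj().T    # A = sum_k d_k v(f_k) v(f_k)^H = V D V^H

# A[j, l] depends only on j - l, so every diagonal should be constant.
toeplitz_ok = all(np.allclose(np.diagonal(A, o), np.diagonal(A, o)[0]) for o in range(-(n - 1), n))
hermitian_ok = np.allclose(A, A.conj().T)
psd_ok = np.linalg.eigvalsh(A).min() > -1e-10     # PSD up to rounding
rank = np.linalg.matrix_rank(A, tol=1e-8)
print(toeplitz_ok, hermitian_ok, psd_ok, rank)
```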
Now to answer your question. The SVD looks like it always does, so I assume you are interested in the Vandermonde decomposition. This decomposition can be generalized to multilevel Toeplitz matrices, i.e. $n$-times block Toeplitz matrices (see Yang et al., Theorem 1).
In your case, for a PSD doubly blocked Toeplitz matrix $T$ with $\operatorname{rank}(T) = r < \min \{n_1, n_2 \}$, we have
$$ T = \sum_{k=1}^r d_k v(f_{:k}) v(f_{:k})^\text{H} = V(f)DV(f)^\text{H} $$

where $f \in [0,1)^{2 \times r}$, $f_{:k}$ is the $k$th column of $f$, and $D = \operatorname{diag}(d_1, \dots, d_r)$ with $d_k > 0$. Furthermore, $v(f_{:k}) := v_{n_1}(f_{1k}) \otimes v_{n_2}(f_{2k}) \in \mathbb{C}^{n_1 n_2}$, where $\otimes$ denotes the Kronecker product. Thus $V(f) := [v(f_{:1}), v(f_{:2}), \dots, v(f_{:r})] = V_{n_1}(f_{1:}) \star V_{n_2}(f_{2:}) \in \mathbb{C}^{n_1 n_2 \times r}$, where $\star$ is the Khatri-Rao product (column-wise Kronecker product). Each $V_{n_j}(f_{j:}),\ j=1,2$, is a Vandermonde matrix, but $V(f)$ is not.
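The two-level construction can be checked numerically in the same spirit. This is a sketch assuming numpy. `khatri_rao` is a small hand-written helper (SciPy also provides `scipy.linalg.khatri_rao`, but it is not needed here). The frequencies and weights are arbitrary illustrative values with $r = 3 < \min\{n_1, n_2\} = 4$.

The sketch forms $V(f)$ as a Khatri-Rao product and builds $T = V(f) D V(f)^\text{H}$. It then views $T$ as an $n_1 \times n_1$ grid of $n_2 \times n_2$ blocks and checks two properties: the blocks are constant along block diagonals, and each block is itself Toeplitz.

```python
import numpy as np

def vandermonde(n, f):
    """n x len(f) matrix whose k-th column is [1, e^{i2πf_k}, ..., e^{i2π(n-1)f_k}]^T."""
    return np.exp(2j * np.pi * np.outer(np.arange(n), f))

def khatri_rao(A, B):
    """Column-wise Kronecker product: column k equals np.kron(A[:, k], B[:, k])."""
    return np.einsum('ik,jk->ijk', A, B).reshape(A.shape[0] * B.shape[0], -1)

def is_toeplitz(M):
    m, n = M.shape
    return all(np.allclose(np.diagonal(M, o), np.diagonal(M, o)[0]) for o in range(-(m - 1), n))

n1, n2 = 4, 5
f = np.array([[0.10, 0.40, 0.70],     # row 1: first-level frequencies f_{1k}
              [0.25, 0.60, 0.90]])    # row 2: second-level frequencies f_{2k}
d = np.array([1.0, 2.0, 0.5])         # positive weights d_k

V = khatri_rao(vandermonde(n1, f[0]), vandermonde(n2, f[1]))   # V(f), shape (n1*n2, r)
T = V @ np.diag(d) @ V.conj().T

# blocks[p, i] is the n2 x n2 block in block-row p, block-column i
blocks = T.reshape(n1, n2, n1, n2).transpose(0, 2, 1, 3)
block_toeplitz = all(np.allclose(blocks[p, i], blocks[p + 1, i + 1])
                     for p in range(n1 - 1) for i in range(n1 - 1))
inner_toeplitz = all(is_toeplitz(blocks[p, i]) for p in range(n1) for i in range(n1))
psd_ok = np.linalg.eigvalsh(T).min() > -1e-10
rank = np.linalg.matrix_rank(T, tol=1e-8)
print(block_toeplitz, inner_toeplitz, psd_ok, rank)
```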
A few notes:
$v_{n_1}(f_{1k}) = [1, e^{i2\pi f_{1k}}, \dots, e^{i2\pi(n_1-1) f_{1k}}]^T \in \mathbb{C}^{n_1}$, as in the regular (single-level) Toeplitz case; $v_{n_2}(f_{2k}) \in \mathbb{C}^{n_2}$ is defined analogously.
$n_j, \: j = 1,2$ is the dimension of level $j$. So, following the notation for doubly blocked Toeplitz matrices in your resource, $n_1$ is the number of block rows (and block columns) of $A$, and $n_2$ is the dimension of each block $A_{ij}$. Note that all $A_{ij}$ must have the same dimensions, otherwise $A$ is not block Toeplitz.
If $r \ge \min \{ n_1, n_2 \}$, the theorem no longer applies and the Vandermonde decomposition need not be unique.
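A concrete illustration of the last note, assuming numpy. The identity $I_{n_1 n_2}$ is a PSD two-level Toeplitz matrix of full rank $n_1 n_2 \ge \min\{n_1, n_2\}$. It has a whole family of decompositions, one for every shift $(c_1, c_2)$ of the uniform grid $f_1 = (k_1 + c_1)/n_1$, $f_2 = (k_2 + c_2)/n_2$ with equal weights $1/(n_1 n_2)$. This works because $V_{n_j} V_{n_j}^\text{H} = n_j I$ on such a grid. The example only shows non-uniqueness well above the threshold. It says nothing specific about the boundary case $r = \min\{n_1, n_2\}$. The helper name `grid_decomposition` is hypothetical.

```python
import numpy as np

def vandermonde(n, f):
    return np.exp(2j * np.pi * np.outer(np.arange(n), f))

def grid_decomposition(n1, n2, c1, c2):
    """V(f) and weights d for the shifted uniform grid f_1 = (k1 + c1)/n1, f_2 = (k2 + c2)/n2."""
    V1 = vandermonde(n1, (np.arange(n1) + c1) / n1)
    V2 = vandermonde(n2, (np.arange(n2) + c2) / n2)
    V = np.kron(V1, V2)   # column k1*n2 + k2 is v_{n1}(f_{1,k1}) ⊗ v_{n2}(f_{2,k2})
    d = np.full(V.shape[1], 1.0 / (n1 * n2))
    return V, d

n1, n2 = 3, 4
reconstructions = []
for c1, c2 in [(0.0, 0.0), (0.3, 0.7)]:       # two different frequency grids
    V, d = grid_decomposition(n1, n2, c1, c2)
    reconstructions.append(V @ np.diag(d) @ V.conj().T)
print([np.allclose(T, np.eye(n1 * n2)) for T in reconstructions])
```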