Let $ A $ be an $m \times n$ matrix of rank $ k \le \min(m,n) $. Then we decompose $ A = USV^T $, where:
$U$ is $m \times k$ is a semi-orthogonal matrix.
$S$ is $k \times k$ diagonal matrix , of which its diagonal entries are called singular values of $ A $. we denote them by $ \sigma _i = S_{ii} $.
- $V$ is $n \times k$ semi-orthogonal matrix.
- Definition: a semi-orthogonal matrix $ Q $ is a non-square matrix where $ Q^{T}Q=I $.
This is the singular value decomposition (SVD) of matrix $ A $. We define a function $ f_i: \mathbb R^{ m \times n} \to \mathbb R $ by $ f_i (A) = \sigma_i (A) $. I am interested in finding the gradient of $ f_i $ in order to practice matrix defferentiation.
I hope you can help me starting with the first steps. Here are the hints that I have been given in order to find the solution, and feel free to use them:
- Use the product rule of differentials to calculate $ dA $ where A is considered as function of $ U $, $ S $ and $ V $.
- The entries of the diagonal of anti-symmetric matrix are all zeros.
- The Hadamard product of two matrices $ A,B $ of the same size , is denoted by $$ (A \circ B )_{ij} = A_{ij} \cdot B_{ij} $$
Use the cyclic property of the trace operator. That is:
$$\mbox{Tr}(ABC) = \mbox{Tr}(CAB) = \mbox{Tr}(BCA)$$
The trace of a scalar is a scalar. That is, given $ a \in \mathbb R $:
$$ \mbox{Tr}(a) = a $$
I stuck right at the beginning, I found that the product rule is:
$$ dA = dUSV^{T} + UdSV^{T} + USdV^{T} $$
Also, I have tried to calculate $ A^{T}A $ as trying to find a useful manipulation where I can use it for the solution, and I got that it is equal to: $ VS^{T} SV^{T} $. First of all, is this what they meant by the product rule? And, second, how do I continue from here?
Let $\{e_i\}$ denote the standard basis vectors. Then $q_i=Qe_i$ is the $i^{th}$ column of $Q$.
The definition of semi-orthogonality says that the columns of $Q$ are orthonormal, i.e. $$\eqalign{ I &= Q^TQ \\ e_i^T(I)e_j &= e_i^T(Q^TQ)e_j \\ \delta_{ij} &= q_i^Tq_j \\ }$$ Multiply the SVD by the $i^{th}$ columns of $(U,V)$ to isolate the $i^{th}$ singular value. $$\eqalign{ A &= \sum_{j=1}^k \sigma_j u_j v_j^T \\ u_i^TAv_i &= \sum_{j=1}^k \sigma_j (u_i^Tu_j)(v_j^Tv_i) = \sum_{j=1}^k \sigma_j\,\delta_{ij}^2 \;=\; \sigma_i \\ }$$ Rearrange this result with the help of the trace/Frobenius product $\Big(A\!:\!B={\rm Tr}\!\left(A^TB\right)\Big)$
Then calculate the differential and gradient. $$\eqalign{ \sigma_i &= u_iv_i^T:A \\ d\sigma_i &= u_iv_i^T:dA \\ \frac{\partial\sigma_i}{\partial A} &= u_iv_i^T \\ }$$ Similarly, the singular vectors also vary with $A$. $$\eqalign{ \sigma_i u_i &= Av_i \\ \sigma_i u_i &= \left(v_i^T\otimes I_m\right){\rm vec}(A) \\ \sigma_i\,du_i &= \left(v_i^T\otimes I_m\right){\rm vec}(dA) \\ \frac{\partial u_i}{\partial{\rm vec}(A)} &= \frac{v_i^T\otimes I_m}{\sigma_i} \\ \\ \\ \sigma_i v_i^T &= u_i^TA \\ \sigma_i v_i &= \left(I_n\otimes u_i^T\right){\rm vec}(A) \\ \sigma_i\,dv_i &= \left(I_n\otimes u_i^T\right){\rm vec}(dA) \\ \frac{\partial v_i}{\partial{\rm vec}(A)} &= \frac{I_n\otimes u_i^T}{\sigma_i} \\ \\ }$$