Intuition beyond differentiating the Norm of a Matrix Vector Product

65 Views Asked by At

There is very similar question, the problem is that I do not understand the answer:)

Suppose you have a function $\,\,\,f(x) = \frac{1}{2}\left\lVert Ax-b \right\rVert_2^2$, where $x \in R^n$. Gradient of the function $\nabla_xf(x)=A^T(Ax - b)$.

Could you please explain what is the intuition beyond calculating $\nabla_xf(x)$ result?

As I know $\nabla_xf(x)=\begin{bmatrix} \frac{df}{dx_1} \\ \frac{df}{dx_2} \\ ... \\ \frac{df}{dx_n} \end{bmatrix}$

So everything I can do with my math background is to suppose $x \in R^2$ and pass all necessary coordinates to the function: $$ f(x) = \frac{1}{2} \left\lVert \begin{bmatrix}i_x & j_x \\ i_y & j_y\end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} - \begin{bmatrix}b_1 \\ b_2\end{bmatrix} \right\rVert_2^2 = \\[1ex] \frac{1}{2} \left\lVert \begin{bmatrix} i_xx_1 + j_xx_2 - b_1 \\ i_yx_1 + j_yx_2-b_2 \end{bmatrix} \right\rVert_2^2 = \\[1ex] \frac{1}{2} \bigl((i_xx_1 + j_xx_2 - b_1)^2 + (i_yx_1 + j_yx_2-b_2)^2\bigr) $$ Then I multiple all the stuff and calculate partial derivatives of the function. It's clear from my results that $\nabla_xf(x)=A^T(Ax - b)$ but my method takes up to 15 minutes. Is there any intuition to avoid calculating all the stuff?

1

There are 1 best solutions below

0
On

To catch the intuition beyond the way that gradient of the function $\,f(v) = \left\lVert Av-b \right\rVert_2^2$ is calculated, let look what the function represents graphicaly at $R^2$ space:

At the graph below, we have vectors v and b that represent respective vectors at the formulla $f(v)$. Vector Av represent transformation of v with the matrix A.

enter image description here

Decomposing formula to more simple concepts

Partial derivative by $x$: $\frac{df(v)}{dx}$ means how much $\left\lVert Av-b \right\rVert_2^2$ change with the smalest change of $X$ coordinate of vector v. We can decompose $\frac{df(v)}{dx}$ into more simple concepts:

$$ \frac{df(v)}{dx} = \frac{dAv}{dx}\frac{df(v)}{dAv} $$

The same for all othe directions. So we can write that:

$$ \nabla_{f(v)} = \nabla_{Av/dx} \nabla_{df(v)/dAv} $$

Calculating $\nabla_{df(v)/dAv}$

Let say $A \bar v = \bar a$. $\bar a = \begin{bmatrix} a_1 \\ a_2 \\ .. \\ a_n \end{bmatrix}$. $\bar b = \begin{bmatrix} b_1 \\ b_2 \\ .. \\ b_n \end{bmatrix}$. $\frac{df(v)}{dAv}$ derivative into $i_{th}$ direction equals to: $$ \lim \limits_{\delta_i \to 0} \frac{ \left\lVert \Biggl(\begin{bmatrix} a_1 \\ a_2 \\ .. \\ .. \\ .. \\ a_n \end{bmatrix} - \begin{bmatrix} b_1 \\ b_2 \\ .. \\ .. \\ .. \\ b_n \end{bmatrix}\Biggr) + \begin{bmatrix} 0 \\ 0 \\ .. \\ \delta_i \\ .. \\ a_n \end{bmatrix} \right\rVert_2^2 - \left\lVert a - b \right\rVert_2^2 }{\delta_i} = \lim \limits_{\delta_i \to 0} \frac{ (a_1 - b_1)^2 + (a_2 - b_2)^2 + .. + (a_i - b_i + \delta_i)^2 + .. + (a_n - b_n)^2 - \left\lVert a - b \right\rVert_2^2 }{\delta_i} = \lim \limits_{\delta_i \to 0} \frac{ (a_1 - b_1)^2 + (a_2 - b_2)^2 + .. + (a_i - b_i)^2 +2\delta_i(a_i - b_i) + \delta_i^2 + .. + (a_n - b_n)^2 - \left\lVert a - b \right\rVert_2^2 }{\delta_i} = \lim \limits_{\delta_i \to 0} \frac{ (a_1 - b_1)^2 + (a_2 - b_2)^2 + .. + (a_i - b_i)^2 + .. + (a_n - b_n)^2 + 2\delta_i(a_i - b_i) + \delta_i^2 - \left\lVert a - b \right\rVert_2^2 }{\delta_i} = \lim \limits_{\delta_i \to 0} \frac{ \left\lVert a - b \right\rVert_2^2 + 2\delta_i(a_i - b_i) + \delta_i^2 - \left\lVert a - b \right\rVert_2^2 }{\delta_i} = \lim \limits_{\delta_i \to 0} \frac{ 2\delta_i(a_i - b_i) + \delta_i^2}{\delta_i} = \lim \limits_{\delta_i \to 0} \frac{2\delta_i(a_i - b_i)}{\delta_i} = 2(a_i - b_i) $$

So $\nabla_{df(v)/dAv}$ in direction i equals to $2(a_i - b_i)$.

Due $\bar a = Av$:

$$ \nabla_{df(v)/dAv} = 2 \begin{bmatrix}a_1 - b_1 \\ a_2 - b_2 \\ .. \\ a_n - b_n\end{bmatrix} = 2 (Av - b) $$

Calculating $\frac{dAv}{dx_i}$

$\frac{dAv}{dx}$ is the vector that measure of how transformed vector Av will change when we add $\delta_x$ to X coordinate of vector v. In other words:

$$ \frac{dAv}{dx} = A\begin{bmatrix} v_x + \delta_x \\ v_y \end{bmatrix} - A\begin{bmatrix} v_x \\ v_y \end{bmatrix} $$

The same for Y direction:

$$ \frac{dAx}{dy} = A\begin{bmatrix} v_x \\ v_y + \delta_y \end{bmatrix} - A\begin{bmatrix} v_x \\ v_y \end{bmatrix} $$

To figure out what is $\frac{dAv}{dx}$, let think about what is matrix-vector product.

$$ Av = \begin{bmatrix} a_{11} & a_{12} & .. & a_{1n} \\ a_{21} & a_{22} & .. & a_{2n} \\ .. & .. & .. & .. \\ a_{n1} & a_{n2} & .. & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ .. \\ x_n \end{bmatrix} = \begin{bmatrix} x_1a_{11} & x_2a_{12} & .. & x_na_{1n} \\ x_1a_{21} & x_2a_{22} & .. & x_na_{2n} \\ .. & .. & .. & .. \\ x_1a_{n1} & x_2a_{n2} & .. & x_na_{nn} \\ \end{bmatrix} = \\[1ex] = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ .. \\ a_{n1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ .. \\ a_{n2} \end{bmatrix} + .. + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ .. \\ a_{nn} \end{bmatrix} = \text{linear combination of column vectors of matrix A} $$

From formula above it should be clear that partial derivative of Av in direction $x_i$ should equal to $i_{th}$ column vector of matrix A.

Calculating gradient for vector Av: $\nabla_{Av}$

$$ A = \begin{bmatrix} a_{11} & a_{12} & .. & a_{1n} \\ a_{21} & a_{22} & .. & a_{2n} \\ .. & .. & .. & .. \\ a_{n1} & a_{n2} & .. & a_{nn} \end{bmatrix} \\[1ex] \frac{dAv}{dx_i} = \begin{bmatrix} a_{1i} \\ a_{2i} \\ .. \\ a_{ni} \end{bmatrix} $$

From two facts above its clear that: $$ \nabla_{Av} = \begin{bmatrix} \frac{dAv}{dx_1} \\ \frac{dAv}{dx_2} \\ .. \\ \frac{dAv}{dx_n} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{21} & .. & a_{n1} \\ a_{12} & a_{22} & .. & a_{n2} \\ .. & .. & .. & .. \\ a_{1n} & a_{2n} & .. & a_{nn} \\ \end{bmatrix} = A^T $$

Final result

$$ \nabla_{f(v)} = \nabla_{Av/dx} \nabla_{df(v)/dAv} = 2A^T(Av - b) $$

If $\,f(v) = \frac{1}{2} \left\lVert Av-b \right\rVert_2^2$, than $\, \nabla_{f(v)} = A^T(Av - b)$