Dimension mismatch during derivation - simple


I want to differentiate the function $S(x) = ||a-x||^2$, where $a, x \in \mathbb R^{1 \times n}$ are $n$-dimensional row vectors. The norm is the Euclidean norm.

We also know that $x = hW+b$ where $h,b \in \mathbb R^{1 \times n}$, $W \in \mathbb R^{n \times n}$.

Specifically, I want to find $\frac{\partial S}{\partial W}$.

From the chain rule we have that $\frac{\partial S}{\partial W} = \frac{\partial S}{\partial x}\frac{\partial x}{\partial W}$.

If I am not mistaken, $\frac{\partial S}{\partial x}$ is simply $-2(a-x) \in \mathbb R^{1 \times n}$.

So overall, we have $\frac{\partial S}{\partial W} = -2(a-x)\frac{\partial x}{\partial W}$. And herein lies the problem.

$W$ is $n$ by $n$, so $\frac{\partial S}{\partial W}$ should also be $n$ by $n$. But $\frac{\partial S}{\partial W} = -2(a-x)\frac{\partial x}{\partial W}$ and $-2(a-x) \in \mathbb R^{1 \times n}$.

There is no way we can multiply a $1$ by $n$ vector by something on the right side, and get an $n$ by $n$ matrix. Where is my mistake here?


There are 2 best solutions below

0
On BEST ANSWER

According to https://en.wikipedia.org/wiki/Matrix_calculus#Identities: "The chain rule applies in some of the cases, but unfortunately does not apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives (in the latter case, mostly involving the trace operator applied to matrices)".

If you scroll down that section, you will see a formula for your type of problem involving the transpose and the trace. I think it'll work out if you use the transpose of $-2(a-x)$.
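To see where the dimensions go, it may help to write the chain rule in components. This is only a sketch: the indices $i, j, k$ are introduced here and are not part of the question, and the result is written in the layout where $\frac{\partial S}{\partial W}$ has the same shape as $W$. Since $x_k = \sum_i h_i W_{ik} + b_k$, the object $\frac{\partial x}{\partial W}$ has three indices, so it is not a matrix that a $1 \times n$ row vector can simply be multiplied by:

$$\eqalign{ \frac{\partial x_k}{\partial W_{ij}} &= h_i\,\delta_{kj} \cr \frac{\partial S}{\partial W_{ij}} &= \sum_k \frac{\partial S}{\partial x_k}\frac{\partial x_k}{\partial W_{ij}} \cr &= \sum_k -2(a_k-x_k)\,h_i\,\delta_{kj} \cr &= -2\,h_i\,(a_j-x_j) \cr }$$

Collecting the entries gives the outer product $-2h^T(a-x) \in \mathbb R^{n \times n}$, which has the expected shape.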

0
On

Let $$\eqalign{ x &= hW+b \cr y &= x-a \cr S &= y:y \cr }$$ where the colon denotes the Frobenius inner product, $A:B = \operatorname{tr}(A^TB)$.

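Finding the differential and gradient of $S$ is straightforward. Since $a$, $h$ and $b$ are constant, $dy = dx = h\,dW$, and the identity $A:BC = B^TA:C$ moves $h$ to the other side of the product: $$\eqalign{ dS &= 2y:dy \cr &= 2y:dx \cr &= 2y:h\,dW \cr &= 2h^Ty:dW \cr\cr \frac{\partial S}{\partial W} &= 2h^Ty \cr &= 2h^T(x-a) \cr }$$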
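The result above can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`S`, `grad_S`, `numerical_grad`), the size `n = 4`, and the random test data are all illustrative choices, not part of the original answer. It compares the closed form $2h^T(x-a)$ against central finite differences taken one entry of $W$ at a time.

```python
import numpy as np

def S(W, h, b, a):
    """Squared Euclidean distance ||a - (hW + b)||^2 for row vectors."""
    x = h @ W + b
    return float(np.sum((a - x) ** 2))

def grad_S(W, h, b, a):
    """Closed-form gradient 2 h^T (x - a), an n-by-n matrix."""
    x = h @ W + b
    return 2.0 * h.T @ (x - a)

def numerical_grad(W, h, b, a, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (S(W + E, h, b, a) - S(W - E, h, b, a)) / (2 * eps)
    return G

# Illustrative random data (assumed, not from the question).
rng = np.random.default_rng(0)
n = 4
h, b, a = (rng.standard_normal((1, n)) for _ in range(3))
W = rng.standard_normal((n, n))

analytic = grad_S(W, h, b, a)
numeric = numerical_grad(W, h, b, a)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
print(analytic.shape, max_abs_diff)
```

The analytic gradient has shape `(n, n)`, matching $W$, and the two gradients should agree up to floating-point rounding, because $S$ is quadratic in $W$ and central differences are exact for quadratics apart from rounding.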