Derivative of vector multiplication $\frac{d}{dx} x^T a$


I'm completing the Math for Machine Learning course on AWS training, and in one knowledge check I am asked to compute the derivative

$${dg(\boldsymbol{x})\over d\boldsymbol{x}}$$

where:

$$g(\boldsymbol{x}) = 10\boldsymbol{x}^T\boldsymbol{y}$$

$$\boldsymbol{y} = [4.5, 2.3, 9.1]^T$$

I thought I would get $\boldsymbol{y}$, but apparently the answer is $\boldsymbol{y}^T$.

Can someone help me understand? TIA




I think you are just dealing with an issue that arises from a notational convention. It is important to know the convention, but there is less to it than you think.

Note that $g(x+h)-g(x) = 10 h^T y$, from which we can read off the derivative directly (since the right-hand side is linear in $h$).

Hence the derivative ${\partial g(x) \over \partial x}$ is defined for any $h$ by ${\partial g(x) \over \partial x}( h) = 10h^Ty$. This is a function that takes a vector $h$ and 'returns' the value $10h^Ty$.
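The identity $g(x+h)-g(x) = 10h^Ty$ can be checked numerically. The following is a minimal sketch in plain Python; the values of `x` and `h` and the helper `dot` are arbitrary choices made for illustration, and only `y` comes from the question.

```python
import math

# y is taken from the question; x and h are arbitrary illustrative values.
y = [4.5, 2.3, 9.1]
x = [1.0, -2.0, 0.5]
h = [0.3, 0.7, -1.1]

def dot(a, b):
    """Inner product a^T b of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def g(v):
    """g(v) = 10 v^T y."""
    return 10 * dot(v, y)

x_plus_h = [xi + hi for xi, hi in zip(x, h)]

difference = g(x_plus_h) - g(x)   # g(x + h) - g(x)
linear_part = 10 * dot(h, y)      # 10 h^T y, the derivative applied to h

# The two values agree up to floating-point rounding, for any choice of x and h.
print(math.isclose(difference, linear_part, rel_tol=1e-9))
```

Because $g$ is linear, the agreement does not depend on $h$ being small, which is why the derivative can be read off without taking a limit.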

When dealing with matrices, vectors, etc, there is a preference for writing this as a matrix multiplication, that is, we want to find a matrix $A$ such that ${\partial g(x) \over \partial x}( h) =Ah$.

Since $10h^Ty = 10 y^T h$, we see that if we let $A= 10 y^T$, then we can write ${\partial g(x) \over \partial x}( h) = Ah$, and so the convention is that we say that $A= 10y^T$ is the derivative (even though we really mean the function $h \mapsto 10 y^Th$).
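As a worked illustration with the numbers from the question (the component labels $h = [h_1, h_2, h_3]^T$ are introduced here only for this example), the row vector and its action on $h$ would be:

$$A = 10\boldsymbol{y}^T = \begin{bmatrix} 45 & 23 & 91 \end{bmatrix}, \qquad Ah = 45h_1 + 23h_2 + 91h_3 = 10h^Ty.$$

Here $A$ is $1 \times 3$, which is what lets the product $Ah$ with the $3 \times 1$ vector $h$ come out as a scalar. The column vector $10y$ is $3 \times 1$ and cannot be multiplied with $h$ in this way, which is why the matrix convention gives a transposed answer.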

Aside: Another useful convention for representing the derivative of a scalar valued function is the gradient. This is a vector $\nabla g(x)$ such that ${\partial g(x) \over \partial x}( h) = \langle \nabla g(x) , h \rangle$, where $\langle \cdot , \cdot \rangle$ is the inner product. In the above case, since $\langle a , b \rangle = a^T b$, we can see that the gradient is $10 y$ (that is, the transpose of the derivative). The distinction is more acute in infinite dimensional spaces.
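To see that the two conventions carry the same numbers, the partial derivatives of $g$ can be estimated by central differences. This is a rough sketch in plain Python; the helper `numerical_gradient`, the step size `eps`, and the evaluation point `x` are assumptions made for illustration.

```python
import math

# y is taken from the question; x is an arbitrary evaluation point.
y = [4.5, 2.3, 9.1]
x = [1.0, -2.0, 0.5]

def g(v):
    """g(v) = 10 v^T y."""
    return 10 * sum(vi * yi for vi, yi in zip(v, y))

def numerical_gradient(f, point, eps=1e-6):
    """Estimate each partial derivative of f at point by central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += eps
        backward[i] -= eps
        grad.append((f(forward) - f(backward)) / (2 * eps))
    return grad

grad = numerical_gradient(g, x)
expected = [10 * yi for yi in y]  # the entries of 10 y

# Each estimated partial derivative matches the corresponding entry of 10 y.
print(all(math.isclose(a, b, rel_tol=1e-6) for a, b in zip(grad, expected)))
```

The entries are the same whether they are laid out as the row $10y^T$ or the column $10y$; the conventions differ only in orientation, not in the values.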