get gradient using vector notation


I don't understand how to compute the gradient with respect to $z$ of $z^Ty$, $z^Tz$, $z^TAz$, and of just $Az$, using vector notation, where

$z, y \in \Bbb R^n$ and $A \in \Bbb R^{n\times n}$.


There are 2 best solutions below

5
On BEST ANSWER

We can write the functions more explicitly and then see how it turns out: $$ z^Ty=\sum_{i=1}^n z_iy_i \\ z^Tz=\sum_{i=1}^n z_iz_i=\sum_{i=1}^n z_i^2 \\ z^TAz=\sum_{i,j=1}^n z_ia_{ij}z_j $$
Now differentiate with respect to $z_k$: $$ \frac{\partial}{\partial z_k}z^Ty=y_k \\ \frac{\partial}{\partial z_k}z^Tz=2z_k $$ For the last one, $z_k$ appears both as the factor $z_i$ (when $i=k$) and as the factor $z_j$ (when $j=k$), so we apply the product rule: $$ \frac{\partial}{\partial z_k}z^TAz=\frac{\partial}{\partial z_k}\sum_{i=1}^n \sum_{j=1}^n z_ia_{ij}z_j=\sum_{j=1}^n a_{kj}z_j + \sum_{i=1}^n a_{ik}z_i = (Az)_k + (A^Tz)_k $$ Overall, you will get:
$$ \nabla z^Ty=y \\ \nabla z^Tz=2z \\ \nabla z^TAz=(A^T+A)z $$ The function $z \mapsto Az$ is vector-valued, so it has no gradient, but its Jacobian matrix equals $A$.
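As a sanity check, the three gradient formulas can be compared against central finite differences. The following is a minimal sketch in plain Python; the helper names (`dot`, `matvec`, `transpose`, `numerical_gradient`) and the values of `A`, `y`, and `z` are illustrative assumptions, not part of the answer above.

```python
# Finite-difference check of the gradient formulas (pure Python, no libraries).
# All helper names and numeric values are illustrative assumptions.

def dot(u, v):
    """Euclidean inner product of two lists of floats."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, z):
    """Matrix-vector product A z, with A stored as a list of rows."""
    return [dot(row, z) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def numerical_gradient(f, z, h=1e-6):
    """Central-difference approximation of the gradient of f at z."""
    grad = []
    for k in range(len(z)):
        z_plus = z[:k] + [z[k] + h] + z[k + 1:]
        z_minus = z[:k] + [z[k] - h] + z[k + 1:]
        grad.append((f(z_plus) - f(z_minus)) / (2 * h))
    return grad

# A deliberately non-symmetric matrix, so that A^T + A differs from 2A.
A = [[1.0, 2.0, 0.0],
     [3.0, 4.0, -1.0],
     [0.5, 0.0, 2.0]]
y = [1.0, -2.0, 3.0]
z = [0.5, 1.5, -1.0]

grad_zy = numerical_gradient(lambda w: dot(w, y), z)              # compare with y
grad_zz = numerical_gradient(lambda w: dot(w, w), z)              # compare with 2z
grad_zAz = numerical_gradient(lambda w: dot(w, matvec(A, w)), z)  # compare with (A^T + A) z

# Closed-form value (A^T + A) z, computed as A z + A^T z.
expected_zAz = [a + b for a, b in zip(matvec(A, z), matvec(transpose(A), z))]

print("numerical:", grad_zAz)
print("formula:  ", expected_zAz)
```

Choosing a non-symmetric `A` matters here: for a symmetric matrix the check could not distinguish $(A^T+A)z$ from $2Az$.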

3
On

Assuming that you mean the entire calculation is to be done without resorting to coordinates, then we have to start with a definition of the gradient that is not with respect to coordinates. The normal introductory definition of $$\nabla f = \left(\frac{\partial f}{\partial z_1},\frac{\partial f}{\partial z_2}, ..., \frac{\partial f}{\partial z_n}\right)$$ is defined by coordinates, so any calculation with it is necessarily done by coordinates.

The coordinate-free definition of $\nabla f$ requires the definition of the directional derivative first: If $v \in \Bbb R^n$, then the directional derivative of $f$ at $z$ in the direction of $v$ is $$D_vf(z) := \left.\frac{d}{dt}\right|_0f(z + tv)$$ Then the gradient is defined to be the unique vector such that $$v\cdot \nabla f(z) = D_vf(z)$$for all vectors $v\in \Bbb R^n$.
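The defining property $v\cdot \nabla f(z) = D_vf(z)$ can also be probed numerically. Below is a rough sketch, assuming a central-difference approximation of the $t$-derivative; the function names, the test point, and the chosen directions are hypothetical, and the candidate gradient $2z$ for $f(z) = z\cdot z$ is used only as an example.

```python
# Numerical illustration of the coordinate-free definition of the gradient.
# Names and values are illustrative assumptions.

def dot(u, v):
    """Euclidean inner product of two lists of floats."""
    return sum(ui * vi for ui, vi in zip(u, v))

def directional_derivative(f, z, v, h=1e-6):
    """Central-difference approximation of d/dt f(z + t v) at t = 0."""
    z_fwd = [zi + h * vi for zi, vi in zip(z, v)]
    z_bwd = [zi - h * vi for zi, vi in zip(z, v)]
    return (f(z_fwd) - f(z_bwd)) / (2 * h)

def f(w):
    """Example function f(w) = w . w."""
    return dot(w, w)

z = [0.5, 1.5, -1.0]
candidate_gradient = [2 * zi for zi in z]  # the claim to test: grad f(z) = 2z

# The identity must hold for every direction v, not only the coordinate axes.
directions = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [2.0, -1.0, 0.5],
]

mismatches = [
    abs(directional_derivative(f, z, v) - dot(v, candidate_gradient))
    for v in directions
]
print("largest mismatch:", max(mismatches))
```

A finite set of directions cannot prove the identity, but since both sides are linear in $v$, agreement on a basis of $\Bbb R^n$ is already strong evidence.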

So let's examine your problems:

  • $f(z) = z\cdot y$.

Then $f(z + tv) = z\cdot y + tv\cdot y$, so $D_vf(z) = v\cdot y$. From which we see that $\nabla f = y$.

  • $f(z) = z\cdot z$.

Then $f(z + tv) = z\cdot z + t(z \cdot v) + t(v \cdot z) + t^2(v\cdot v)$, so $D_vf(z) = 2( v\cdot z) = v\cdot (2z)$, since $v\cdot z = z \cdot v$. Therefore $\nabla f = 2z$.

  • $f(z) = z\cdot Az$.

Then $f(z + tv) = z\cdot Az + t(z \cdot Av) + t(v \cdot Az) + t^2(v \cdot Av)$. So $D_vf(z) = (z \cdot Av) + (v \cdot Az) = (A^Tz \cdot v) + (v \cdot Az) = v \cdot (A^Tz + Az)$. Therefore $\nabla f = (A^T + A)z$.
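
As a small worked check of the last formula, take an arbitrary non-symmetric $2\times 2$ example (the matrix and the point are chosen purely for illustration): $$ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad f(z) = z\cdot Az = z_1^2 + 5z_1z_2 + 4z_2^2. $$ Differentiating the polynomial directly gives $$ \nabla f(z) = \begin{pmatrix} 2z_1 + 5z_2 \\ 5z_1 + 8z_2 \end{pmatrix}, $$ while the formula gives $$ (A^T + A)z = \begin{pmatrix} 2 & 5 \\ 5 & 8 \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} 2z_1 + 5z_2 \\ 5z_1 + 8z_2 \end{pmatrix}, $$ so the two agree. At $z = (1, 1)$ both give $(7, 13)$, whereas $2Az = (6, 14)$, which shows that the shortcut $\nabla f = 2Az$ is only valid when $A$ is symmetric.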