Answer checking - involved derivative under summation

I'd like someone smarter and more experienced than me to check my answer and give advice on how to do it better and derive a closed form for what I'm looking for.

Given known matrices $Y \in \mathbb R^{m \times k}$ and $X \in \mathbb R^{m \times n}$, and a matrix of unknown variables $\beta \in \mathbb R^{k \times n}$, we define a function $L(\beta) = \sum_{t = 1}^{m} ||Y_t-X_t\beta ^T||^2$, where $Y_t$ denotes the $t$'th row of $Y$ and $X_t$ denotes the $t$'th row of $X$.

My goal was to find $\frac{\partial L(\beta)}{\partial \beta_{ij}}$.

My answer

The function $L$ is just the summation of the squared norms of the difference between row vectors. So if we write it explicitly it should be:

$L(\beta) = (Y_{11}-\sum_{r = 1}^{n}X_{1r}\beta^T _{r1})^2+...+(Y_{1k}-\sum_{r = 1}^{n}X_{1r}\beta^T _{rk})^2+...+(Y_{m1}-\sum_{r = 1}^{n}X_{mr}\beta^T _{r1})^2+...+(Y_{mk}-\sum_{r = 1}^{n}X_{mr}\beta^T _{rk})^2$

this is because the $d$'th entry of $X_t\beta ^T$ is $\sum_{r = 1}^{n}X_{tr}\beta^T _{rd}$ from matrix multiplication.

To simplify things, let's work with $\beta$ rather than $\beta ^T$, using $\beta^T _{rd} = \beta _{dr}$:

$L(\beta) = (Y_{11}-\sum_{r = 1}^{n}X_{1r}\beta _{1r})^2+...+(Y_{1k}-\sum_{r = 1}^{n}X_{1r}\beta _{kr})^2+...+(Y_{m1}-\sum_{r = 1}^{n}X_{mr}\beta _{1r})^2+...+(Y_{mk}-\sum_{r = 1}^{n}X_{mr}\beta _{kr})^2$

We are only interested in terms where $\beta _{ij}$ appears. Those terms are $(Y_{1i}-\sum_{r = 1}^{n}X_{1r}\beta _{ir})^2$, $(Y_{2i}-\sum_{r = 1}^{n}X_{2r}\beta _{ir})^2$ and so on until $(Y_{mi}-\sum_{r = 1}^{n}X_{mr}\beta _{ir})^2$

So, by the chain rule, $\frac{\partial L(\beta)}{\partial \beta _{ij}}= \frac{\partial}{\partial \beta _{ij}}\sum_{t=1}^{m}(Y_{ti}-\sum_{r=1}^{n}X_{tr}\beta _{ir})^2 = -2\sum_{t=1}^{m}(Y_{ti}-\sum_{r=1}^{n}X_{tr}\beta _{ir})X_{tj}$
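
A quick way to sanity-check an entrywise formula like this is to compare it against a central finite difference of $L$ on small random matrices. The sketch below assumes NumPy; the helper names (`loss`, `grad_entry`, `finite_diff_entry`), the sizes and the step `h` are arbitrary choices for illustration, and indices are 0-based rather than 1-based.

```python
import numpy as np

def loss(Y, X, beta):
    # L(beta) = sum_t ||Y_t - X_t beta^T||^2, i.e. the sum of all squared residuals
    R = Y - X @ beta.T
    return float(np.sum(R ** 2))

def grad_entry(Y, X, beta, i, j):
    # Entrywise formula: -2 * sum_t (Y_ti - sum_r X_tr beta_ir) * X_tj
    residual_col = Y[:, i] - X @ beta[i, :]
    return float(-2.0 * np.sum(residual_col * X[:, j]))

def finite_diff_entry(Y, X, beta, i, j, h=1e-6):
    # Central difference in the single coordinate beta_ij
    E = np.zeros_like(beta)
    E[i, j] = h
    return (loss(Y, X, beta + E) - loss(Y, X, beta - E)) / (2.0 * h)

rng = np.random.default_rng(0)
m, k, n = 5, 3, 4  # illustrative sizes only
Y = rng.normal(size=(m, k))
X = rng.normal(size=(m, n))
beta = rng.normal(size=(k, n))

# Largest disagreement between the formula and the numerical derivative
max_gap = max(
    abs(grad_entry(Y, X, beta, i, j) - finite_diff_entry(Y, X, beta, i, j))
    for i in range(k)
    for j in range(n)
)
print(max_gap)
```

Since $L$ is quadratic in $\beta$, the central difference has no truncation error, so any gap should be down to floating-point rounding.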

Is this the correct answer? If it is, is it possible (using matrix multiplication) to find a simple closed form without sigma signs for the entire matrix $\frac{\partial L(\beta)}{\partial \beta}$?

Answer

Stripping away the sigmas yields the matrix formula $$\eqalign{ \frac{\partial L}{\partial \beta} &= 2\,(\beta X^TX-Y^TX) }$$
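
To connect this with the entrywise sum: $-2\sum_{t}(Y_{ti}-(X\beta^T)_{ti})X_{tj}$ is the $(i,j)$ entry of $-2\,(Y-X\beta^T)^TX$, which expands to $2\,(\beta X^TX-Y^TX)$. A minimal numerical check of the matrix formula, assuming NumPy and with hypothetical helper names (`loss`, `grad_matrix`, `grad_numeric`) chosen only for this sketch:

```python
import numpy as np

def loss(Y, X, beta):
    # L(beta) = sum_t ||Y_t - X_t beta^T||^2 (squared Frobenius norm of the residual)
    return float(np.sum((Y - X @ beta.T) ** 2))

def grad_matrix(Y, X, beta):
    # Closed form: 2 (beta X^T X - Y^T X), same k x n shape as beta
    return 2.0 * (beta @ X.T @ X - Y.T @ X)

def grad_numeric(Y, X, beta, h=1e-6):
    # Central finite differences, one entry of beta at a time
    G = np.zeros_like(beta)
    for i in range(beta.shape[0]):
        for j in range(beta.shape[1]):
            E = np.zeros_like(beta)
            E[i, j] = h
            G[i, j] = (loss(Y, X, beta + E) - loss(Y, X, beta - E)) / (2.0 * h)
    return G

rng = np.random.default_rng(1)
m, k, n = 6, 2, 3  # illustrative sizes only
Y = rng.normal(size=(m, k))
X = rng.normal(size=(m, n))
beta = rng.normal(size=(k, n))

G_closed = grad_matrix(Y, X, beta)
G_numeric = grad_numeric(Y, X, beta)
print(np.max(np.abs(G_closed - G_numeric)))
```

The printed value is the largest entrywise disagreement; it should be at rounding-error level if the closed form matches the definition of $L$.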