How does this formula describe the gradient?


Below is the gradient of the log likelihood of the logistic regression model:

$\sum_{i=1}^n (\alpha_i - y_i)x_{ij}$

It is equal to

$ X^T(\alpha - y)$

where $X$ is the $n \times p$ design matrix and $y$ is the target vector. Here $\alpha = \sigma(X\beta)$ is the vector of predicted probabilities, with $\sigma$ the logistic function applied elementwise, and $\beta$ is the vector of parameters.

How can I prove or demonstrate this? How does the summation describe a vector of first partial derivatives?


In your first line $$\sum_{i=1}^n (\alpha_i - y_i)x_{ij}$$ the sum runs over $i$, but there is a second, free index $j$. So the expression is a scalar for each column $j$ of $X$: writing $x_{\cdot j}$ for the $j$-th column, it equals $x_{\cdot j}^T(\alpha - y)$, which is the $j$-th partial derivative of the objective with respect to $\beta_j$. Stacking these $p$ scalars into a vector, one per column of $X$, gives exactly $X^T(\alpha - y)$. That is how the summation describes the gradient: each value of $j$ yields one component of the gradient vector.
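A quick numerical check can make the stacking argument concrete. The sketch below (using NumPy; the dimensions and variable names are illustrative, not from the question) computes the per-column scalars $\sum_i (\alpha_i - y_i)x_{ij}$ one $j$ at a time and compares them with the matrix form $X^T(\alpha - y)$:

```python
import numpy as np

# Hypothetical small problem: n observations, p parameters.
rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.normal(size=(n, p))        # design matrix
beta = rng.normal(size=p)          # parameter vector
y = rng.integers(0, 2, size=n)     # binary targets

# Predicted probabilities: sigmoid applied elementwise to X @ beta
alpha = 1.0 / (1.0 + np.exp(-X @ beta))

# Component-wise: one scalar sum_i (alpha_i - y_i) * x_ij per column j
grad_components = np.array([np.sum((alpha - y) * X[:, j]) for j in range(p)])

# Matrix form: X^T (alpha - y)
grad_matrix = X.T @ (alpha - y)

print(np.allclose(grad_components, grad_matrix))  # True
```

The two computations agree for every column $j$, which is exactly the statement that the $p$ scalar sums are the components of the single vector $X^T(\alpha - y)$.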