Below is the gradient of the log likelihood of the logistic regression model:
$\sum_{i=1}^n (\alpha_i - y_i)x_{ij}$
It is equal to
$ X^T(\alpha - y)$
where $X$ is the design matrix, $y$ is the target vector, and $\alpha = \sigma(X\beta)$ is the vector of fitted probabilities, with $\sigma$ the logistic function and $\beta$ the vector of parameters.
How can I prove or demonstrate this? How does the summation describe a vector of first partial derivatives?
In your first line $$\sum_{i=1}^n (\alpha_i - y_i)x_{ij}$$ the sum over $i$ does produce a scalar, but $j$ is a free index: you get one such scalar for each column $j$ of $X$, i.e. one partial derivative per parameter $\beta_j$. Stacking these scalars over $j$ gives exactly the vector $X^T(\alpha - y)$, because by the definition of matrix multiplication its $j$-th component is $$[X^T(\alpha - y)]_j = \sum_{i=1}^n x_{ij}(\alpha_i - y_i).$$
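You can check this identity numerically. The sketch below (a toy example with made-up dimensions $n=5$, $p=3$ and random data) computes each component $\sum_i (\alpha_i - y_i)x_{ij}$ with explicit loops and compares it to the matrix expression $X^T(\alpha - y)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small example: n = 5 observations, p = 3 features.
n, p = 5, 3
X = rng.normal(size=(n, p))        # design matrix
beta = rng.normal(size=p)          # parameter vector
y = rng.integers(0, 2, size=n)     # binary targets

# Fitted probabilities: alpha_i = sigma(x_i^T beta)
alpha = 1.0 / (1.0 + np.exp(-X @ beta))

# Component j of the gradient, as the explicit sum over i
grad_loops = np.array([sum((alpha[i] - y[i]) * X[i, j] for i in range(n))
                       for j in range(p)])

# The same quantity in matrix form
grad_matrix = X.T @ (alpha - y)

print(np.allclose(grad_loops, grad_matrix))  # True
```

The agreement holds for any $n$, $p$, and data, since the loop version is just the componentwise expansion of the matrix-vector product.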