Matrix Calculus: product rule between a row vector and a square matrix with respect to a row vector of variables


Problem:

I am very new to the world of matrix calculus. I am attempting to educate myself more on this vast topic. I am currently trying to tackle the following:

We have a row vector of shape $(1,M)$, which we call $\vec{\alpha}$. Furthermore, we have a square matrix of shape $(M,M)$, which we call $\mathbf{P}$. We assume that the entries of the vector and the matrix are functions of the parameters $\theta_i$. Hence, we have a row vector of shape $(1,N)$ called $\vec{\theta}$.

I would like to compute the following: $$\frac{\partial (\vec{\alpha}\mathbf{P})}{\partial \vec{\theta}} $$

I am going to give this a go, and I would appreciate it if you could confirm whether what I have done is valid. Please keep in mind that mine is a computer-science-leaning approach, so the solution might seem awkward to a mathematician.

My Attempt:

I assume that one can perform $\frac{\partial (\vec{\alpha}\mathbf{P})}{\partial \theta_i} $ for each $\theta_i$ separately. This would then result in a tensor of shape $(N,1,M)$. I will justify this shape shortly. Hence, we should have something that looks like this:

$$ \frac{\partial (\vec{\alpha}\mathbf{P})}{\partial \vec{\theta}} = \left[ \frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_1} ,\cdots,\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_N} \right] $$ This assumption allows us to compute each $\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_i} $ separately, and it is where either the success or the downfall of the approach lies. Let us focus on an arbitrary entry of this tensor, as we are going to apply the product rule to it.

$$\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_i} =\frac{\partial \vec{\alpha} }{\partial \theta_i} \mathbf{P} + \vec{\alpha}\frac{\partial \mathbf{P}}{\partial \theta_i} $$
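This scalar-parameter product rule can be checked numerically. Below is a minimal NumPy sketch: `alpha` and `P` are made-up illustrative functions of $\vec{\theta}$ (they are my assumptions, not from the question), and a central finite difference plays the role of $\partial/\partial \theta_i$.

```python
import numpy as np

M, N = 3, 2  # sizes chosen arbitrarily for the illustration

def alpha(theta):
    # Hypothetical (1, M) row vector depending on theta
    return np.array([[np.sin(theta[0]), theta[1] ** 2, theta[0] * theta[1]]])

def P(theta):
    # Hypothetical (M, M) matrix depending on theta
    return np.cos(theta[0]) * np.arange(1.0, M * M + 1).reshape(M, M) + theta[1] * np.eye(M)

def d_dtheta_i(f, theta, i, h=1e-6):
    # Central finite difference of f with respect to the scalar theta[i]
    tp, tm = theta.copy(), theta.copy()
    tp[i] += h
    tm[i] -= h
    return (f(tp) - f(tm)) / (2 * h)

theta = np.array([0.3, 0.7])
i = 0

# Left-hand side: differentiate the product alpha(theta) @ P(theta) directly
lhs = d_dtheta_i(lambda t: alpha(t) @ P(t), theta, i)          # shape (1, M)

# Right-hand side: product rule, (d alpha) P + alpha (d P)
rhs = d_dtheta_i(alpha, theta, i) @ P(theta) + alpha(theta) @ d_dtheta_i(P, theta, i)

assert lhs.shape == (1, M)
assert np.allclose(lhs, rhs, atol=1e-5)
```

The check agreeing to finite-difference accuracy is consistent with the product rule holding componentwise for a scalar parameter.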

Seeing as $\theta_i$ is a scalar variable, we don't have the scenario of things changing shape, e.g. the derivative of a vector with respect to a vector producing a full Jacobian.

We thus conclude that $\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_i} $ must have shape $(1,M)$, since it is the derivative of a matrix product between a row vector and a square matrix. This gives us a tensor of shape $(N,1,M)$. I will be discarding the second axis of size 1 to form a matrix of shape $(N,M)$, where row $i$ contains all the entries with respect to $\theta_i$. This step is not strictly necessary, but it is convenient for coding.
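The stacking into an $(N,M)$ matrix can be sketched the same way. Again, `alpha` and `P` are illustrative assumptions; each $(1,M)$ derivative is computed by finite differences and stacked so that row $i$ corresponds to $\theta_i$.

```python
import numpy as np

M, N = 3, 2  # illustrative sizes

def alpha(theta):
    # Hypothetical (1, M) row vector depending on theta
    return np.array([[np.sin(theta[0]), theta[1] ** 2, theta[0] * theta[1]]])

def P(theta):
    # Hypothetical (M, M) matrix depending on theta
    return np.cos(theta[0]) * np.arange(1.0, M * M + 1).reshape(M, M) + theta[1] * np.eye(M)

def d_dtheta_i(f, theta, i, h=1e-6):
    # Central finite difference with respect to theta[i]
    tp, tm = theta.copy(), theta.copy()
    tp[i] += h
    tm[i] -= h
    return (f(tp) - f(tm)) / (2 * h)

theta = np.array([0.3, 0.7])
f = lambda t: alpha(t) @ P(t)  # the product whose derivative we want

# Each d_dtheta_i(f, theta, i) is (1, M); vstack drops the size-1 axis
# and stacks them so that row i holds d(alpha P)/d theta_i.
jac = np.vstack([d_dtheta_i(f, theta, i) for i in range(N)])

assert jac.shape == (N, M)
```

Storing the result as an $(N,M)$ array is just a layout choice (the $(N,1,M)$ tensor carries the same numbers); the flat form is simply easier to index in code.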

To end off:

Please confirm whether this approach is correct. If it is wrong, please explain why and point me in the right direction. I would appreciate it greatly.

Thank you for your time!