Problem:
I am very new to the world of matrix calculus and am trying to educate myself on this vast topic. I am currently trying to tackle the following:
We have a row vector of shape $(1,M)$, which we call $\vec{\alpha}$. Furthermore, we have a square matrix of shape $(M,M)$, which we call $\mathbf{P}$. We assume that the entries of the vector and the matrix are functions of parameters $\theta_i$ for $i = 1, \dots, N$, collected into a row vector of shape $(1,N)$ called $\vec{\theta}$.
I would like to compute the following: $$\frac{\partial (\vec{\alpha}\mathbf{P})}{\partial \vec{\theta}} $$
I am going to give this a go, and I would ask you to confirm whether what I have done is valid. Please keep in mind that mine is a computer-science approach, so the solution might seem awkward to a mathematician.
My Attempt:
I assume that one can compute $\frac{\partial (\vec{\alpha}\mathbf{P})}{\partial \theta_i}$ for each $\theta_i$ separately. This would result in a tensor of shape $(N,1,M)$; I will justify this shape shortly. Hence, we should have something that looks like this:
$$ \frac{\partial (\vec{\alpha}\mathbf{P})}{\partial \vec{\theta}} = \left[ \frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_1} ,\cdots,\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_N} \right] $$ This assumption allows us to compute each of the $\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_i}$ separately, and it is where either the success or the downfall of the approach lies. Let us focus on an arbitrary entry of this tensor, as we are going to apply the product rule to it.
$$\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_i} =\frac{\partial \vec{\alpha} }{\partial \theta_i} \mathbf{P} + \vec{\alpha}\frac{\partial \mathbf{P}}{\partial \theta_i} $$
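Since the question leans towards a computer-science view, here is a quick numerical sanity check of this product rule in NumPy. The particular $\alpha(\theta)$ and $\mathbf{P}(\theta)$ below (with $M=3$ and a single scalar parameter) are made up purely for illustration; the point is that the analytic product-rule derivative matches a central finite difference.

```python
import numpy as np

M = 3  # size of the row vector / square matrix

def alpha(t):
    # hypothetical row vector (1, M) whose entries depend on a scalar theta
    return np.array([[np.sin(t), t**2, np.exp(-t)]])

def P(t):
    # hypothetical (M, M) matrix depending on the same theta
    return np.array([[t,   1.0, 0.0],
                     [0.0, t**2, 1.0],
                     [1.0, 0.0, np.cos(t)]])

def dalpha(t):
    # entrywise derivative of alpha with respect to theta
    return np.array([[np.cos(t), 2*t, -np.exp(-t)]])

def dP(t):
    # entrywise derivative of P with respect to theta
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 2*t, 0.0],
                     [0.0, 0.0, -np.sin(t)]])

t0 = 0.7
# Product rule: d(alpha P)/dtheta = (dalpha/dtheta) P + alpha (dP/dtheta)
analytic = dalpha(t0) @ P(t0) + alpha(t0) @ dP(t0)

# Independent check via central finite differences
h = 1e-6
numeric = (alpha(t0 + h) @ P(t0 + h) - alpha(t0 - h) @ P(t0 - h)) / (2 * h)

print(analytic.shape)                 # (1, 3), as expected for a scalar theta
print(np.allclose(analytic, numeric, atol=1e-6))
```

Both terms of the product rule keep shape $(1,M)$, so the sum does too, which is consistent with the shape argument below.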
Since $\theta_i$ is a scalar variable, nothing changes shape under differentiation; e.g., we avoid the case where the derivative of a vector with respect to a vector produces a full Jacobian.
We thus conclude that $\frac{\partial(\vec{\alpha}\mathbf{P})}{\partial \theta_i}$ must have shape $(1,M)$, since it is the matrix product of a row vector and a square matrix. This gives a tensor of shape $(N,1,M)$. I will discard the second axis of size 1 to form a matrix of shape $(N,M)$ in which row $i$ contains all the entries with respect to $\theta_i$. I don't think this step is strictly necessary, but it is convenient for coding.
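The stack-then-squeeze construction can be sketched in NumPy. The two-parameter toy model below ($N = M = 2$, with $\vec{\alpha}(\vec\theta) = [\theta_1, \theta_2]$ and $\mathbf{P}(\vec\theta) = \operatorname{diag}(\theta_1, \theta_2)$) is my own invented example, chosen because $\vec{\alpha}\mathbf{P} = [\theta_1^2, \theta_2^2]$ makes the resulting $(N,M)$ Jacobian easy to verify by hand.

```python
import numpy as np

def alpha(th):
    return th.reshape(1, -1)           # row vector, shape (1, M)

def P(th):
    return np.diag(th)                 # diagonal matrix, shape (M, M)

def d_alphaP_dtheta_i(th, i):
    # product rule for the single scalar parameter theta_i
    e = np.zeros_like(th)
    e[i] = 1.0
    dalpha = e.reshape(1, -1)          # d alpha / d theta_i
    dP = np.diag(e)                    # d P / d theta_i
    return dalpha @ P(th) + alpha(th) @ dP   # shape (1, M)

th = np.array([2.0, 3.0])
N = th.size

# Stack the N slices into an (N, 1, M) tensor, then drop the size-1 axis
slices = np.stack([d_alphaP_dtheta_i(th, i) for i in range(N)])
J = slices.squeeze(axis=1)             # shape (N, M)

print(slices.shape)                    # (2, 1, 2)
print(J)                               # [[4. 0.], [0. 6.]]
```

Row $i$ of `J` holds $\partial(\vec{\alpha}\mathbf{P})/\partial\theta_i$, and the hand computation $\partial[\theta_1^2, \theta_2^2]/\partial\vec\theta$ at $\vec\theta = (2,3)$ indeed gives $\begin{pmatrix}4 & 0\\ 0 & 6\end{pmatrix}$.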
To end off:
Please explain whether this approach is correct. If it is wrong, please explain why and guide me in the right direction. I would greatly appreciate it.
Thank you for your time!