I am struggling with backpropagation through BatchNormalization. To work it out, I would like to take the partial derivative of a vector-valued function with respect to a scalar. A simplified version of the function looks like this:
$$ \vec{f}(\vec{x}, y) = \vec{x} + (y,y,y) = \begin{bmatrix} x_{1} + y \\ x_{2} + y \\ x_{3} + y \end{bmatrix} $$
I can see that $$\frac{\partial{f_i}}{\partial{y}} = 1,$$ and, following this post, the partial derivative of the vector-valued function should be
$$\frac{\partial{\vec{f}}}{\partial{y}} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
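As a quick numerical sanity check, here is a minimal sketch assuming NumPy; the test point `x`, `y` and the step size `eps` are arbitrary illustrative values, not part of the original question. It estimates $\partial \vec{f}/\partial y$ by central finite differences and shows that the result has one entry per component $f_i$, i.e. shape $(3,)$ rather than the shape of the scalar $y$.

```python
import numpy as np

def f(x, y):
    # f(x, y) = x + (y, y, y); NumPy broadcasting adds the scalar y to every component
    return x + y

def df_dy_numeric(x, y, eps=1e-6):
    # Central finite difference of the vector-valued f with respect to the scalar y.
    # The result has one entry per output component f_i, so its shape is (3,).
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

x = np.array([0.5, -1.0, 2.0])  # illustrative values
y = 0.3
print(df_dy_numeric(x, y))  # approximately [1. 1. 1.]
```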
But from what I know about backprop, I thought the gradient with respect to a variable should have the same shape as that variable, which is clearly not the case here, because $\dim\left(\frac{\partial{\vec{f}}}{\partial{y}}\right) \neq \dim(y)$.
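But let's say we have a second, scalar-valued function $g(\vec{f}) = g(f_{1}(x_{1}, y), \cdots)$. Then the multivariable chain rule gives $$ \frac{\partial{g}}{\partial{y}} = \sum_{i=1}^{3}\frac{\partial{g}}{\partial{f_{i}}}\frac{\partial{f_{i}}}{\partial{y}} = \sum_{i=1}^{3}\frac{\partial{g}}{\partial{f_{i}}}, $$ which is a scalar, like $y$, and equals $3$ when every $\frac{\partial{g}}{\partial{f_{i}}} = 1$ (for example, $g = f_1 + f_2 + f_3$).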
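To see the chain rule produce a scalar gradient for $y$, here is a minimal sketch assuming NumPy. The downstream function `g` (a sum of squares) and the test values are hypothetical, chosen only for illustration; the sketch compares the chain-rule result, a sum of upstream gradient times $\partial f_i/\partial y = 1$, with a finite-difference estimate.

```python
import numpy as np

def f(x, y):
    # f(x, y) = x + (y, y, y), via NumPy broadcasting of the scalar y
    return x + y

def g(fv):
    # Hypothetical scalar-valued downstream function, for illustration only:
    # g(f) = f_1^2 + f_2^2 + f_3^2, so dg/df_i = 2 * f_i
    return np.sum(fv ** 2)

def dg_dy_chain_rule(x, y):
    fv = f(x, y)
    grad_f = 2 * fv              # upstream gradient dg/df_i, shape (3,)
    df_dy = np.ones_like(fv)     # local derivative df_i/dy = 1 for each i, shape (3,)
    # Chain rule: dg/dy = sum_i (dg/df_i) * (df_i/dy), a scalar like y
    return float(np.sum(grad_f * df_dy))

def dg_dy_numeric(x, y, eps=1e-6):
    # Central finite difference of the scalar g(f(x, y)) with respect to y
    return (g(f(x, y + eps)) - g(f(x, y - eps))) / (2 * eps)

x = np.array([0.5, -1.0, 2.0])  # illustrative values
y = 0.3
print(dg_dy_chain_rule(x, y), dg_dy_numeric(x, y))  # both approximately 4.8
```

If instead $g = f_1 + f_2 + f_3$, every upstream entry is 1 and the same sum gives 3. In BatchNorm the shift parameter $\beta$ is broadcast over the batch in the same way, which is why its gradient is typically computed by summing the upstream gradient over the batch axis.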
I would like to ask if my intuition is correct. Thanks in advance.
Comment: you cannot sum a vector $x(3 \times 1)$ and a scalar $y$.
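On the comment's point: strictly, a $3 \times 1$ vector and a scalar cannot be added, so $(y,y,y)$ is best read as $y\mathbf{1}$, where $\mathbf{1}$ is the all-ones vector. A minimal sketch, assuming NumPy and arbitrary illustrative values, shows that array libraries perform this expansion implicitly through broadcasting:

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # vector x with shape (3,)
y = 0.3                          # scalar y

broadcast = x + y                # NumPy broadcasts the scalar y to every entry
explicit = x + y * np.ones(3)    # x + (y, y, y), written as x + y * 1

print(np.array_equal(broadcast, explicit))  # True
```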