I'm trying to find the derivative of the function $$L = f \cdot y, \quad f = X \cdot W + b$$
Matrix shapes: $X.shape=(1, m), W.shape=(m,10), b.shape=(1, 10), y.shape=(10, 1)$. I'm looking for $\frac{\partial L}{\partial W}$.
According to the chain rule: $$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial f} \frac{\partial f}{\partial W} $$
Separately we can find: $$ \frac{\partial L}{\partial f} = y$$ $$ \frac{\partial f}{\partial W} = X$$
The problem is that, according to my formula, the dimension of $\frac{\partial L}{\partial W}$ is $(10, m)$. However, it should coincide with the dimension of $W$, which is $(m, 10)$.
Also I was advised to find differential of $L$:
$$ d(L) = d(f \cdot y) = d(f) \cdot y = d (X \cdot W + b)y = X \cdot dW \cdot y $$
But I do not understand how to get the derivative $\frac{\partial L}{\partial W}$ from this.
Let's use a convention where a lowercase Latin letter always represents a column vector, an uppercase Latin letter is a matrix, and a Greek letter is a scalar.
Using this convention, your equations are $$\eqalign{ f &= W^Tx + b \cr \lambda &= f^Ty \cr }$$
As you have noted, the differential of the scalar function is $$\eqalign{ d\lambda &= df^Ty = (dW^Tx)^Ty = x^TdW\,y \cr }$$
Let's develop that a bit further by introducing the Trace function. A scalar equals its own trace, and the trace is invariant under cyclic permutation of its arguments, so $$\eqalign{ d\lambda &= {\rm Tr}(x^TdW\,y) = {\rm Tr}(yx^TdW) \cr }$$
Then, depending on your preferred Layout Convention, the gradient is either $$\eqalign{ \frac{\partial\lambda}{\partial W} &=yx^T \quad{\rm or}\quad xy^T \cr }$$
Since you expected the dimensions of the gradient to be those of $W$, it sounds like your preferred layout is $xy^T$, which has shape $(m, 10)$.
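As a quick sanity check, the result can be compared against finite differences. The NumPy sketch below is only illustrative: the size `m = 4`, the random values, and the names `loss`, `analytic`, and `numeric` are assumptions made for the example, not part of the question. It uses the column-vector convention above, so `x` has shape `(m, 1)`.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(0)
m = 4
x = rng.normal(size=(m, 1))    # column vector, shape (m, 1)
W = rng.normal(size=(m, 10))   # weight matrix, shape (m, 10)
b = rng.normal(size=(10, 1))   # bias, shape (10, 1)
y = rng.normal(size=(10, 1))   # shape (10, 1)

def loss(W):
    """Scalar lambda = f^T y with f = W^T x + b."""
    f = W.T @ x + b
    return float((f.T @ y)[0, 0])

# Gradient claimed by the derivation, in the layout that matches W's shape.
analytic = x @ y.T

# Central finite differences, perturbing one entry of W at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(m):
    for j in range(10):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

max_abs_diff = float(np.max(np.abs(analytic - numeric)))
print(analytic.shape, max_abs_diff)
```

Because $\lambda$ is linear in $W$, the finite-difference estimate should agree with $xy^T$ up to floating-point rounding, and `analytic.shape` should equal `W.shape`.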
Also note that $\frac{\partial f}{\partial W}\neq X.\,$ The gradient of a vector with respect to a matrix is a 3rd order tensor, while $X$ is at most a 2nd order tensor (aka a matrix). The presence of such 3rd and 4th order tensors as intermediate quantities can make the chain rule difficult or impossible to use in practice.
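To make the tensor structure concrete, here is a sketch in index notation, using the column-vector convention $f = W^Tx + b$ (the index names $i, j, k$ and the Kronecker delta $\delta_{ik}$ are introduced only for this illustration): $$\eqalign{ f_i &= \sum_j W_{ji}\,x_j + b_i \cr \frac{\partial f_i}{\partial W_{jk}} &= \delta_{ik}\,x_j \cr \frac{\partial\lambda}{\partial W_{jk}} &= \sum_i y_i\,\frac{\partial f_i}{\partial W_{jk}} = \sum_i y_i\,\delta_{ik}\,x_j = x_j\,y_k \cr }$$ The middle quantity carries three indices, which is why it cannot equal a matrix. Contracting it with $y$ over the index $i$ recovers the $(j,k)$ entry of $xy^T$.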
The differential approach suggested by your advisor is often simpler because the differential of a matrix is just another matrix quantity, which is easy to handle.