Consider two differentiable functions, $f : \mathbb{R}^{n \times n} \to \mathbb{R}$ and $g : \mathbb{R}^2 \to \mathbb{R}^{n \times n}$. In general, what is the gradient $\nabla_x (f \circ g)(a)$ of the composition with respect to $x \in \mathbb{R}^2$, evaluated at a point $a \in \mathbb{R}^2$?
My answer is $$ \nabla_x (f \circ g)(a) = \begin{bmatrix} \left\langle \left[ \frac{df}{dY} \big|_{g(a)} \right]^\top, \frac{dg}{dx_1}\big|_a\right\rangle \\ \left\langle \left[ \frac{df}{dY} \big|_{g(a)} \right]^\top, \frac{dg}{dx_2}\big|_a \right\rangle \end{bmatrix} $$ The dimensions work out, but I am not convinced that this is correct.
It's not quite clear what you mean by ${df\over dY}$, but I guess you have the right thing in mind. Note that writing ${\mathbb R}^{n\times n}$ instead of ${\mathbb R}^{n^2}$ has no "matricial" effect; it just means that the components of $g$ are not numbered from $1$ to $n^2$, but are indexed by pairs $(i,k)\in[n]\times [n]$.
I'd write the gradient of $h:=f\circ g$ as follows: $$\nabla h(a)=\sum_{i,\,k}{\partial f\over \partial y_{ik}}\bigl(g(a)\bigr)\begin{bmatrix} g_{ik.1}(a)\\ g_{ik.2}(a)\end{bmatrix},$$ where $g_{ik.j}$ denotes the partial derivative $\partial g_{ik}/\partial x_j$.
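As a sanity check, the sum above can be compared numerically with a finite-difference gradient. The following is only a sketch: the particular `f` and `g`, the value `n = 3`, and the test point `a` are made-up examples, not part of the question. Here `df(Y)` is assumed to return the matrix whose $(i,k)$ entry is $\partial f/\partial y_{ik}$.

```python
import numpy as np

n = 3  # hypothetical size, chosen only for illustration

def g(x):
    # Hypothetical map R^2 -> R^{n x n}: g_ik(x) = sin((i+1) x_1) + (k+1) x_2^2
    i, k = np.indices((n, n))
    return np.sin(x[0] * (i + 1)) + x[1] ** 2 * (k + 1)

def dg(x):
    # Entrywise partial derivatives of g with respect to x_1 and x_2 (each n x n)
    i, k = np.indices((n, n))
    d1 = (i + 1) * np.cos(x[0] * (i + 1))
    d2 = 2.0 * x[1] * (k + 1)
    return d1, d2

def f(Y):
    # Hypothetical scalar function of a matrix: sum of cubed entries
    return np.sum(Y ** 3)

def df(Y):
    # Matrix of partials: entry (i, k) is df/dy_ik
    return 3.0 * Y ** 2

def grad_h(a):
    # Chain rule: component j is sum_{i,k} df/dy_ik(g(a)) * dg_ik/dx_j(a)
    G = df(g(a))
    d1, d2 = dg(a)
    return np.array([np.sum(G * d1), np.sum(G * d2)])

def grad_h_fd(a, eps=1e-6):
    # Central finite differences of h = f o g, used only for comparison
    h = lambda x: f(g(x))
    e = np.eye(2)
    return np.array([(h(a + eps * e[j]) - h(a - eps * e[j])) / (2 * eps)
                     for j in range(2)])

a = np.array([0.3, -0.7])
print(grad_h(a))
print(grad_h_fd(a))
```

Each component `np.sum(G * d)` is the entrywise (Frobenius) product of the matrix of partials of $f$ with the corresponding partial derivative of $g$. Whether a transpose belongs in the inner-product form therefore depends only on which layout convention is used for ${df\over dY}$ and for $\langle\cdot,\cdot\rangle$.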