I’d appreciate any help with this.
Given $\mathbf{X}\in\mathbb{R}^{m\times n}$ and $\mathbf{a},\mathbf{b}\in\mathbb{R}^{m\times 1}$, let $f(\mathbf{X}^T\mathbf{a})$ denote an element-wise operation, i.e., \begin{equation} f(\mathbf{X}^T\mathbf{a})=\left[ \begin{matrix} f(v_1) \\ \vdots \\ f(v_n) \end{matrix} \right]=\mathbf{w}, \end{equation}
where $\mathbf{v}=\mathbf{X}^T\mathbf{a}$ and $f$ is the sigmoid function $f(x)=1/(1+e^{-x})$. What is $\partial(\mathbf{w}^T\mathbf{w})/\partial\mathbf{X}$?
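To make the setup concrete, here is a minimal pure-Python sketch of the forward computation: $\mathbf{v}=\mathbf{X}^T\mathbf{a}$, then $\mathbf{w}=f(\mathbf{v})$ element-wise, then the scalar $\mathbf{w}^T\mathbf{w}$. The function names `sigmoid` and `w_t_w` and the nested-list storage of $\mathbf{X}$ (row by row) are illustrative assumptions, not part of the question.

```python
import math


def sigmoid(x: float) -> float:
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))


def w_t_w(X, a):
    """Compute the scalar w^T w, where v = X^T a and w = f(v) element-wise.

    X is an m x n nested list (m rows of length n); a is a length-m list.
    """
    m, n = len(X), len(X[0])
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]  # n entries
    w = [sigmoid(vg) for vg in v]                                   # n entries
    return sum(wg * wg for wg in w)


# With X = 0 we get v = 0, so every w_g = 1/2 and w^T w = n/4.
print(w_t_w([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1.0, 2.0]))  # → 0.75
```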
As Erik Miehling has helpfully pointed out, you need the product rule and the chain rule.
We have $\mathbf{w}^T\mathbf{w}=\sum_{k=1}^n f(v_k)^2$.
Only the $g$th term of this sum depends on $v_g$, so by the chain rule
$$\frac{d\mathbf{w}^T\mathbf{w}}{dv_g}=2f(v_g)\frac{df(v_g)}{dv_g}$$
Since $f(x)=1/(1+e^{-x})$,
$$\frac{df(v_g)}{dv_g}=\frac{e^{-v_g}}{(1+e^{-v_g})^2}=f(v_g)\bigl(1-f(v_g)\bigr).$$
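The sigmoid derivative can be sanity-checked numerically. This sketch compares the closed form $e^{-x}/(1+e^{-x})^2$ with a central finite difference and with the equivalent form $f(x)\,(1-f(x))$. The helper names and the step size `h=1e-6` are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # Closed form: e^{-x} / (1 + e^{-x})^2
    e = math.exp(-x)
    return e / (1.0 + e) ** 2


def central_diff(fn, x, h=1e-6):
    # Symmetric finite-difference approximation of fn'(x)
    return (fn(x + h) - fn(x - h)) / (2.0 * h)


for x in (-3.0, -0.5, 0.0, 1.2, 4.0):
    analytic = sigmoid_prime(x)
    # Agrees with the numerical derivative ...
    assert abs(analytic - central_diff(sigmoid, x)) < 1e-8
    # ... and with the equivalent form f(x) * (1 - f(x))
    assert abs(analytic - sigmoid(x) * (1.0 - sigmoid(x))) < 1e-12
```

The form $f(x)(1-f(x))$ is convenient in practice because it reuses the already computed $w_g=f(v_g)$.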
Substituting this back,
$$\frac{d\mathbf{w}^T\mathbf{w}}{dv_g}=2f(v_g)\frac{e^{-v_g}}{(1+e^{-v_g})^2}$$
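As a check on this step, the sketch below treats $\mathbf{w}^T\mathbf{w}$ as a function of $\mathbf{v}$ alone and evaluates the gradient $2f(v_g)\,e^{-v_g}/(1+e^{-v_g})^2$ for every $g$. The names `loss_of_v` and `grad_wrt_v` are hypothetical helpers; a finite-difference comparison per coordinate should agree with them to within rounding error.

```python
import math


def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))


def loss_of_v(v):
    # w^T w = sum_k f(v_k)^2
    return sum(sigmoid(vk) ** 2 for vk in v)


def grad_wrt_v(v):
    # d(w^T w)/dv_g = 2 f(v_g) * e^{-v_g} / (1 + e^{-v_g})^2, one entry per g
    out = []
    for vg in v:
        e = math.exp(-vg)
        out.append(2.0 * sigmoid(vg) * e / (1.0 + e) ** 2)
    return out


v = [0.3, -1.0, 2.0]
h = 1e-6
for g, analytic in enumerate(grad_wrt_v(v)):
    # Perturb only v_g; the other terms of the sum cancel in the difference
    vp, vm = list(v), list(v)
    vp[g] += h
    vm[g] -= h
    numeric = (loss_of_v(vp) - loss_of_v(vm)) / (2.0 * h)
    print(g, analytic, numeric)
```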
Now, the scalar $v_g$ is the inner product of row $g$ of $\mathbf{X}^T$ with the column vector $\mathbf{a}$, i.e.
$$v_g=\sum_{k=1}^mX_{k,g}a_k$$
where $X_{k,g}$ is the element of $\mathbf{X}$ in row $k\in \{1,2,\ldots,m\}$ and column $g\in \{1,2,\ldots,n\}$.
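The index bookkeeping in $v_g=\sum_k X_{k,g}a_k$ is easy to get backwards, so this sketch computes $\mathbf{X}^T\mathbf{a}$ two ways: by explicitly transposing $\mathbf{X}$ and dotting each row of $\mathbf{X}^T$ with $\mathbf{a}$, and by the index sum above. Indices are 0-based in the code, unlike the 1-based indices in the text; the function names are illustrative.

```python
def transpose(X):
    # (X^T)[g][k] = X[k][g]
    return [list(col) for col in zip(*X)]


def v_via_transpose(X, a):
    # Row g of X^T dotted with a
    return [sum(x * ak for x, ak in zip(row, a)) for row in transpose(X)]


def v_via_index_sum(X, a):
    # v_g = sum_k X_{k,g} a_k (0-based k and g)
    m, n = len(X), len(X[0])
    return [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # m = 2 rows, n = 3 columns
a = [2.0, 1.0]
print(v_via_transpose(X, a))   # → [6.0, 9.0, 12.0]
print(v_via_index_sum(X, a))   # → [6.0, 9.0, 12.0]
```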
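Thus, for any row index $i\in \{1,2,\ldots,m\}$,
$$\frac{dv_g}{dX_{i,g}}=a_i,$$
while $dv_j/dX_{i,g}=0$ for $j\neq g$, because $X_{i,g}$ appears only in column $g$ of $\mathbf{X}$ and hence only in $v_g$.
Therefore,
$$\frac{d\mathbf{w}^T\mathbf{w}}{dX_{i,g}}=\frac{d\mathbf{w}^T\mathbf{w}}{dv_g}\frac{dv_g}{dX_{i,g}}=2f(v_g)\frac{e^{-v_g}}{(1+e^{-v_g})^2}a_i$$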
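The single-entry result can be verified by perturbing one element of $\mathbf{X}$ and recomputing $\mathbf{w}^T\mathbf{w}$. In this sketch `row` and `col` are hypothetical 0-based names for the row and column of the perturbed entry; the closed-form derivative should match the central difference.

```python
import math


def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))


def loss(X, a):
    # w^T w with v = X^T a and w = f(v) element-wise
    m, n = len(X), len(X[0])
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]
    return sum(sigmoid(vg) ** 2 for vg in v)


def d_loss_d_entry(X, a, row, col):
    # Closed form: 2 f(v_col) e^{-v_col} / (1 + e^{-v_col})^2 * a_row
    v_col = sum(X[k][col] * a[k] for k in range(len(X)))
    e = math.exp(-v_col)
    return 2.0 * sigmoid(v_col) * e / (1.0 + e) ** 2 * a[row]


def finite_diff_entry(X, a, row, col, h=1e-6):
    # Perturb only X[row][col] and take a central difference of w^T w
    Xp = [r[:] for r in X]
    Xm = [r[:] for r in X]
    Xp[row][col] += h
    Xm[row][col] -= h
    return (loss(Xp, a) - loss(Xm, a)) / (2.0 * h)


X = [[0.2, -0.4, 1.0],
     [0.5, 0.3, -0.7]]
a = [1.5, -2.0]
print(d_loss_d_entry(X, a, 1, 2), finite_diff_entry(X, a, 1, 2))
```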
Having handled a single element of $\mathbf{X}$, we can collect these derivatives into a matrix:
$$\frac{d\mathbf{w}^T\mathbf{w}}{d\mathbf{X}}=\mathbf{AD}$$
where $\mathbf{A}$ is an $m\times n$ matrix whose $n$ columns are all equal to $\mathbf{a}=[a_1,a_2,\cdots,a_m]^T$, i.e.
$$\mathbf{A}=\left( \begin{array}{cccc} a_1 & a_1 & \cdots & a_1 \\ a_2 & a_2 & \cdots & a_2 \\ a_3 & a_3 & \cdots & a_3 \\ \vdots & \vdots & \ddots & \vdots \\ a_m & a_m & \cdots & a_m \end{array} \right)$$
and $\mathbf{D}$ is the $n\times n$ diagonal matrix whose $r$th diagonal element is $2f(v_r)\frac{e^{-v_r}}{(1+e^{-v_r})^2}$ for $r\in\{1,2,\ldots,n\}$:
$$\mathbf{D}=\left( \begin{array}{cccc} 2f(v_1)\frac{e^{-v_1}}{(1+e^{-v_1})^2} & 0 & 0 & 0 \\ 0 & 2f(v_2)\frac{e^{-v_2}}{(1+e^{-v_2})^2} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & 2f(v_n)\frac{e^{-v_n}}{(1+e^{-v_n})^2} \end{array} \right)$$
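Entry $(i,g)$ of $\mathbf{AD}$ is $a_i D_{gg}$, which matches the single-element result. Since every column of $\mathbf{A}$ equals $\mathbf{a}$, we have $\mathbf{A}=\mathbf{a}\mathbf{1}^T$, so $\mathbf{AD}=\mathbf{a}\mathbf{d}^T$ is an outer product, where $\mathbf{d}$ collects the diagonal of $\mathbf{D}$. The sketch below builds $\mathbf{A}$ and $\mathbf{D}$ literally and multiplies them, to mirror the formula; `grad_matrix` is a hypothetical name, and the code is pure Python only to stay dependency-free.

```python
import math


def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))


def grad_matrix(X, a):
    """Return d(w^T w)/dX as an m x n nested list, built as A @ D.

    A has every column equal to a; D is diagonal with
    D[g][g] = 2 f(v_g) e^{-v_g} / (1 + e^{-v_g})^2.
    """
    m, n = len(X), len(X[0])
    v = [sum(X[k][g] * a[k] for k in range(m)) for g in range(n)]
    diag = []
    for vg in v:
        e = math.exp(-vg)
        diag.append(2.0 * sigmoid(vg) * e / (1.0 + e) ** 2)
    A = [[a[k]] * n for k in range(m)]                            # m x n
    D = [[diag[r] if r == c else 0.0 for c in range(n)] for r in range(n)]  # n x n
    # Plain matrix product A D; entry (k, g) reduces to a_k * D[g][g]
    return [[sum(A[k][j] * D[j][g] for j in range(n)) for g in range(n)]
            for k in range(m)]


X = [[0.2, -0.4, 1.0],
     [0.5, 0.3, -0.7]]
a = [1.5, -2.0]
for row in grad_matrix(X, a):
    print(row)
```

Forming $\mathbf{A}$ and $\mathbf{D}$ explicitly costs $O(mn^2)$ operations; the outer-product form $\mathbf{a}\mathbf{d}^T$ gives the same matrix in $O(mn)$ (with NumPy, for instance, `np.outer(a, d)`).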