I have been trying my best to understand how gradient descent works. Below are some scratch notes from when I was attempting to derive the formulas by hand. My work assumes a neural network with 2 inputs, 1 hidden layer with 3 nodes, and 2 outputs. I would love to know whether I calculated the partial derivatives of the cost function with respect to $m_7$, $m_{10}$ and $m_1$ correctly (the last four equations below). Thank you in advance.
$$f(x) = \frac{1}{1+e^{-x}}, \qquad f^\prime(x)=f(x)\bigl(1-f(x)\bigr)$$
$$a_1 = f(m_1x_1+m_2x_2+b_1)$$
$$a_2 = f(m_3x_1+m_4x_2+b_2)$$
$$a_3 = f(m_5x_1+m_6x_2+b_3)$$
$$a_4 = f(m_7a_1+m_8a_2+m_9a_3+b_4)$$
$$a_5 = f(m_{10}a_1+m_{11}a_2+m_{12}a_3+b_5)$$
$$Cost=(a_4-t_1)^2+(a_5-t_2)^2$$
$$Cost=(f(m_7a_1+m_8a_2+m_9a_3+b_4)-t_1)^2+(f(m_{10}a_1+m_{11}a_2+m_{12}a_3+b_5)-t_2)^2$$
$$Cost=(f(m_7f(m_1x_1+m_2x_2+b_1)+m_8f(m_3x_1+m_4x_2+b_2)+m_9f(m_5x_1+m_6x_2+b_3)+b_4)-t_1)^2+(f(m_{10}f(m_1x_1+m_2x_2+b_1)+m_{11}f(m_3x_1+m_4x_2+b_2)+m_{12}f(m_5x_1+m_6x_2+b_3)+b_5)-t_2)^2$$
$$\frac{\partial Cost}{\partial m_{7}} = 2(a_4-t_1)\,a_4(1-a_4)\,a_1$$
$$\frac{\partial Cost}{\partial m_{10}} = 2(a_5-t_2)\,a_5(1-a_5)\,a_1$$
$$\frac{\partial Cost}{\partial m_{1}} = 2(a_4-t_1)\,a_4(1-a_4)\,m_7\,a_1(1-a_1)\,x_1 + 2(a_5-t_2)\,a_5(1-a_5)\,m_{10}\,a_1(1-a_1)\,x_1$$
$$\frac{\partial Cost}{\partial m_{1}} = \frac{a_1(1-a_1)\,x_1}{a_1}\left(\frac{\partial Cost}{\partial m_7}\,m_7+\frac{\partial Cost}{\partial m_{10}}\,m_{10}\right)$$
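One way to check derivations like this without a second opinion is a numerical gradient check: perturb one weight by a small $h$ and compare the central difference $(Cost(m_i+h)-Cost(m_i-h))/2h$ against the analytic formula. Below is a minimal sketch in plain Python that implements the exact network above ($m_1\dots m_{12}$ stored 0-indexed as `m[0]` through `m[11]`) and compares the three derivative formulas against finite differences; the weight, bias, input, and target values are made up purely to exercise the check.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cost(m, b, x, t):
    # forward pass; m[0] is m_1, ..., m[11] is m_12, b[0] is b_1, etc.
    a1 = sigmoid(m[0]*x[0] + m[1]*x[1] + b[0])
    a2 = sigmoid(m[2]*x[0] + m[3]*x[1] + b[1])
    a3 = sigmoid(m[4]*x[0] + m[5]*x[1] + b[2])
    a4 = sigmoid(m[6]*a1 + m[7]*a2 + m[8]*a3 + b[3])
    a5 = sigmoid(m[9]*a1 + m[10]*a2 + m[11]*a3 + b[4])
    return (a4 - t[0])**2 + (a5 - t[1])**2, (a1, a2, a3, a4, a5)

def analytic_grads(m, b, x, t):
    # the three circled formulas, written out term by term
    _, (a1, a2, a3, a4, a5) = cost(m, b, x, t)
    dm7  = 2*(a4 - t[0]) * a4*(1 - a4) * a1
    dm10 = 2*(a5 - t[1]) * a5*(1 - a5) * a1
    dm1  = (2*(a4 - t[0]) * a4*(1 - a4) * m[6] * a1*(1 - a1) * x[0]
          + 2*(a5 - t[1]) * a5*(1 - a5) * m[9] * a1*(1 - a1) * x[0])
    return dm7, dm10, dm1

def numeric_grad(m, b, x, t, i, h=1e-6):
    # central difference with respect to weight m[i]
    mp = list(m); mp[i] += h
    mm = list(m); mm[i] -= h
    return (cost(mp, b, x, t)[0] - cost(mm, b, x, t)[0]) / (2*h)

# arbitrary made-up values, only to exercise the comparison
m = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, 0.7, -0.8, 0.9, -0.1, 0.2, -0.3]
b = [0.05, -0.05, 0.1, -0.1, 0.15]
x = [0.5, -1.5]
t = [1.0, 0.0]

dm7, dm10, dm1 = analytic_grads(m, b, x, t)
for name, i, val in [("m7", 6, dm7), ("m10", 9, dm10), ("m1", 0, dm1)]:
    print(name, val, numeric_grad(m, b, x, t, i))  # the two columns should match closely
```

If a formula were missing a factor (say the $a_1(1-a_1)$ term in $\partial Cost/\partial m_1$), the analytic and numeric columns would visibly disagree, so this catches most chain-rule slips.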