In my stats textbook, they define the following function:
$\mathbf{f} = \frac{1}{2}(\mathbf{A}\mathbf{x} - \mathbf{b})^2$,
where $\mathbf{A}$ is a matrix and $\mathbf{x}$, $\mathbf{b}$ are vectors. They then say that:
$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \mathbf{A}^{T}(\mathbf{A}\mathbf{x} - \mathbf{b})$
I tried to do this derivative using index notation. So, I defined $f$ as:
$f = \frac{1}{2} (A_{ij}x^{j} - b_{i})^2$,
I then took the derivative with respect to $x^k$ (I use commas to denote partial derivatives):
$f_{,k} = \delta^{j}_{k} A_{ij} (A_{ij}x^{j} - b_{i})$
Applying the contraction, I get:
$f_{,k} = A_{i}^{k} (A_{ij}x^{j} - b_{i})$
But I do not know whether $A_{i}^{k}$ represents $\mathbf{A}^T$. Does it?
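Your second equation can be rewritten by taking its $k$th component, viz. $$f_{,k}=(A^T)_{ki}(Ax-b)_i=(A^T)_{ki}(A_{ij}x_j-b_i).$$ Comparing this with your final equation, $A_i^k=(A^T)_{ki}=A_{ik}$. So yes: summing over the first (row) index $i$ of $A$ is exactly what multiplication by $\mathbf{A}^T$ does.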
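If it helps, here is a small numerical sanity check of that identification. It is only a sketch under my own assumptions: the sizes (a $5\times 3$ matrix), the random test data, and the names `grad_matrix`, `grad_index`, and `grad_fd` are made up for illustration, and $f$ is read as the scalar $\frac{1}{2}\sum_i (A_{ij}x_j-b_i)^2$. It compares the matrix form $\mathbf{A}^T(\mathbf{A}\mathbf{x}-\mathbf{b})$, the index form $A_{ik}(A_{ij}x_j-b_i)$ written with `np.einsum`, and a central finite-difference gradient.

```python
import numpy as np

# Arbitrary test data (assumed shapes: A is 5x3, x has 3 entries, b has 5).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
b = rng.standard_normal(5)

def f(x):
    """Scalar objective: 0.5 * sum_i (A_ij x_j - b_i)^2."""
    r = A @ x - b
    return 0.5 * (r @ r)

residual = A @ x - b

# Matrix form from the textbook: A^T (A x - b).
grad_matrix = A.T @ residual

# Index form: f_{,k} = A_{ik} (A_{ij} x_j - b_i), summed over the row index i.
grad_index = np.einsum("ik,i->k", A, residual)

# Central finite differences along each coordinate direction e_k.
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])

print(np.allclose(grad_matrix, grad_index))
print(np.allclose(grad_matrix, grad_fd, atol=1e-6))
```

The `"ik,i->k"` subscript string is the contraction over $i$, which is why it agrees with `A.T @ residual`. When redoing the index derivation by hand, it is safer to give the differentiated factor its own dummy index so that no index appears more than twice in a term: $$f_{,k} = (A_{ij}x_j-b_i)\,A_{il}\,\delta_{lk} = A_{ik}(A_{ij}x_j-b_i).$$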