I know We have the chain rule if $f(x)=g(Ax)$, then $∂f(x)=A^T∂g(Ax)$. I cannot really understand that. Since if the $f(x) = g(Ax)$, then we have $∂f(x)=A^T∂g(Ax) = A^T∂f(x)$.
For example, we have $f(x) = |2x_1 + 3x_2|$, then by this rule we have $∂f(x)=A^T∂g(Ax) = [2,3]^T∂|2x_1 + 3x_2|$, it seems we get back to origin again. Could you calculate the subgradient of the example by this rule in detail?
The chain rule still applies with appropriate modifications and assumptions, however since the 'inner' function is affine one can compute the subgradient directly.
Let $n(x) = \|x\|$, whatever the norm in question is.
Note that $\nu \in \partial n(x)$ iff $n(x+h)-n(x) \ge \langle \nu, h \rangle $ for all $h$.
Then $f(x+h)-f(x) \ge \langle \xi, Ah \rangle = \langle A^T \xi, h \rangle$ for all $h$ and so we see that $\partial f(x) = A^T \partial n(Ax+b)$.
In the example above, this becomes $\partial f(x) = \begin{cases} \{ - \begin{bmatrix} 2 \\ 3 \end{bmatrix} \}, & 2x_1+3x_2 <0 \\ \{ t \begin{bmatrix} 2 \\ 3 \end{bmatrix} \}_{t \in [-1,1]}, & 2x_1+3x_2 =0 \\ \{ + \begin{bmatrix} 2 \\ 3 \end{bmatrix} \}, & 2x_1+3x_2 >0 \\ \end{cases}$.