Nonlinear PCA is based on minimizing, with respect to the matrix $W$, the function:
$$I = E \{ \|x-Wg(W^Tx)\|^2\}$$
where $g$ is an odd function.
However, what is $W$?
$W$ is essentially the parameter matrix that controls the embedding. Write $$ J(W) = \mathbb{E}_{x\sim D}\left[ \|x-Wg(W^Tx)\|^2 \right] $$ where $x\in\mathbb{R}^n$, $W\in\mathbb{R}^{n\times m}$, and $g:\mathbb{R}^m\rightarrow\mathbb{R}^m$ is a fixed non-linear function (typically applied componentwise). In this formulation $g$ has no parameters, so there is nothing in it to optimize; the transform is entirely controlled by the weight matrix $W$, which is what we optimize. (Note the map $x \mapsto W^Tx$ is linear, i.e. without a bias term, which is justified by e.g. assuming the data is centered.)
Essentially, $W$ performs a linear transformation of the data; $g$ then adds non-linearity to the transform. We "learn" the matrix $W$ by optimization.
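To make "learning $W$ by optimization" concrete, here is a minimal NumPy sketch that minimizes the empirical version of the criterion by plain gradient descent. The toy data, the choice $g=\tanh$ (an odd function), and the finite-difference gradient are my own illustration, not part of the original formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy centered data: n = 3 dimensions, anisotropic variance.
X = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.3])
X -= X.mean(axis=0)

def J(W, X, g=np.tanh):
    # Empirical E||x - W g(W^T x)||^2; g = tanh is an odd function.
    R = X - g(X @ W) @ W.T          # each row is x - W g(W^T x)
    return np.mean(np.sum(R**2, axis=1))

def num_grad(W, X, eps=1e-5):
    # Finite-difference gradient of J in W (fine for a small sketch).
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (J(W + E, X) - J(W - E, X)) / (2 * eps)
    return G

W = 0.1 * rng.standard_normal((3, 2))   # n = 3, m = 2 components
j0 = J(W, X)
for _ in range(200):
    W -= 0.01 * num_grad(W, X)
j1 = J(W, X)
```

After the descent loop the reconstruction error `j1` is well below the initial `j0`; in practice one would use an analytic gradient or automatic differentiation instead of finite differences.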
Note that this criterion tries to enforce $T^{-1}(v) = Wv$ as the (approximate) inverse of the embedding function $T(x)=g(W^Tx)$. Notice that if we "remove" the non-linearity, i.e. let $g(p)=p$, then we are minimizing $J(W) = \mathbb{E}_{x\sim D}\left[ \|x-WW^Tx\|^2 \right]$, which pushes $W$ towards having orthonormal columns and makes the dimensionality reduction a linear projection. This is (almost) just PCA: the optimal $W$ spans the top-$m$ principal subspace.
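The linear special case $g(p)=p$ can be checked numerically. The sketch below (toy data and dimensions are my own) takes $W$ to be the top-$m$ eigenvectors of the sample covariance, verifies $W^TW=I$, and confirms that the residual $\mathbb{E}\|x-WW^Tx\|^2$ equals the discarded variance, exactly as in PCA:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 5)) @ rng.standard_normal((5, 5))
X -= X.mean(axis=0)            # centered data, n = 5

# With g(p) = p the criterion is E||x - W W^T x||^2, minimized by
# any W whose columns are an orthonormal basis of the top-m
# principal subspace of the sample covariance C.
C = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
W = eigvecs[:, -2:]                    # top m = 2 principal directions

# Orthonormal columns: W^T W = I_m.
ortho = np.allclose(W.T @ W, np.eye(2))

# Residual equals the sum of the n - m smallest eigenvalues.
resid = np.mean(np.sum((X - X @ W @ W.T)**2, axis=1))
matches_pca = np.isclose(resid, eigvals[:-2].sum())
```

Here `ortho` and `matches_pca` both come out `True`, illustrating that the $g(p)=p$ criterion recovers the usual PCA reconstruction.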
See e.g. *The Nonlinear PCA Criterion in Blind Source Separation: Relations with Other Approaches* by Karhunen et al.