In the article by Ghimpeteanu G., et al., "A Decomposition Framework for Image Denoising Algorithms", I found the following:
Let $I : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ be a gray-level image, and $(x, y)$ be the standard coordinate system of $\mathbb{R}^2$. We denote by $I_x$ (resp. $I_y$) the derivative of $I$ with respect to $x$ (resp. $y$), and by $\nabla I$ the gradient of $I$. Our image decomposition model for $I$ is a two-stage approach: first, we construct an orthonormal moving frame $(Z_1, Z_2, N)$ of $(\mathbb{R}^3, \|\cdot\|_2)$ over $\Omega$ that encodes the local geometry of $I$. Then, we compute the components $(J^1, J^2, J^3)$ of the $\mathbb{R}^3$-valued function $(0, 0, I)$ in that moving frame. More precisely, we consider a scaled version $\mu I$ of $I$, for $\mu \in ]0, 1]$, and its graph, which is the surface $S$ in $\mathbb{R}^3$ parametrized by $\psi : (x, y) \mapsto (x, y, \mu I(x, y))$.
See the snapshot.
Could someone please explain (possibly with a geometric visualization) the 4th and 5th lines of the excerpt, i.e.:
Our image decomposition model for $I$ is a two-stage approach: first, we construct an orthonormal moving frame $(Z_1, Z_2, N)$ of $(\mathbb{R}^3, \|\cdot\|_2)$ over $\Omega$ that encodes the local geometry of $I$. Then, we compute the components $(J^1, J^2, J^3)$ of the $\mathbb{R}^3$-valued function $(0, 0, I)$ in that moving frame.

Each position in the image is assigned a local three-dimensional orthonormal (ON) coordinate system. This ON system varies with location (that is, with the pixel we are at). The transformation to this ON system is given by the matrix $P$, in which you can see the $\nabla$ operators, corresponding to partial differentiation with respect to the spatial dimensions of the image. I also assume that a subscript denotes a partial derivative with respect to that dimension, e.g. $I_x$ is the partial derivative of $I$ with respect to $x$. So the moving frame depends on the local partial derivatives of the image (treated as a scalar function).
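To make this concrete, here is a minimal Python sketch of one way such a frame can be built at a single pixel. The excerpt does not state the formulas for $Z_1, Z_2, N$, so the choice below is an assumption: $Z_1$ is the unit tangent to the graph of $\mu I$ above the gradient direction, $Z_2$ is the unit tangent above the level-line direction, and $N$ is the unit normal of the graph. The function names `moving_frame` and `components`, the fallback to the standard basis where $\nabla I = 0$, and the tolerance `eps` are all hypothetical choices made for illustration.

```python
import math

def moving_frame(ix, iy, mu=1.0, eps=1e-12):
    """Return (Z1, Z2, N) at one pixel, given the partial derivatives ix, iy.

    Assumed construction (not taken from the excerpt):
      Z1: unit tangent to the graph of mu*I in the gradient direction,
      Z2: unit tangent to the graph along the level line,
      N : unit normal to the graph.
    """
    g2 = ix * ix + iy * iy            # |grad I|^2
    g = math.sqrt(g2)                 # |grad I|
    if g < eps:
        # Flat region: the gradient direction is undefined, so fall back
        # to the standard basis of R^3 (an assumption of this sketch).
        return (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    s = math.sqrt(1.0 + mu * mu * g2)
    z1 = (ix / (g * s), iy / (g * s), mu * g / s)
    z2 = (-iy / g, ix / g, 0.0)
    n = (-mu * ix / s, -mu * iy / s, 1.0 / s)
    return z1, z2, n

def components(frame, intensity):
    """Components (J1, J2, J3) of the vector (0, 0, I) in an orthonormal frame.

    Because the frame is orthonormal, each component is a dot product.
    """
    v = (0.0, 0.0, intensity)
    return tuple(sum(a * b for a, b in zip(v, z)) for z in frame)

# Example: a pixel with I = 10, I_x = 3, I_y = 4 and mu = 1.
frame = moving_frame(3.0, 4.0, mu=1.0)
j1, j2, j3 = components(frame, 10.0)
```

With this choice $Z_2$ is horizontal, so $J^2$ vanishes and the intensity is split between $J^1$ (large where the image is steep) and $J^3$ (large where the image is flat). Geometrically, the vertical vector $(0, 0, I)$ is being re-expressed along the tangent plane and the normal of the surface at each pixel.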