My research area has "nothing to do with mathematics" but I still find it full of optimization problems. Therefore, I would like to learn to formulate and solve such problems, even though I am not encouraged to do it (at least at the moment; maybe the situation will change after I have proved my point :-).
So far I have tried to get familiar with gradient methods (gradient descent), and I think I understand some of the basic ideas now. Still, I find it difficult to express my problems as mathematical formulas, let alone solve them.
The ingredients I have for my optimization problem are:
1) My data: two vectors $x = (x_{0}, ..., x_{N})$ and $y = (y_{0}, ..., y_{N})$, each with $N+1$ samples.
2) A function $f(a, b)$ that tells me something about the relation between the vectors $a$ and $b$.
What I want to do is:
Find a square matrix $P$ (of size 2 x 2) such that the value of $f(z_{1}, z_{2})$, where $z = P [x, y]^{T}$, becomes minimal.
To clarify (sorry, I'm not sure if my notation is completely correct) I mean that $z$ is computed as:
$z_{1} = p_{11}x + p_{12}y\\ z_{2} = p_{21}x + p_{22}y$.
How would one turn all of this into a problem that can be solved with an optimization method like gradient descent? All help is appreciated. Please note that my mathematical background is not very solid; I know only some basic calculus and linear algebra.
The notation in the question looks fine. So, you have a function $F$ of four real variables $p_{11},\dots,p_{22}$, defined by $$F(p_{11},p_{12},p_{21},p_{22}) = f(p_{11}x+p_{12}y,\ p_{21}x+p_{22}y) \tag2$$ If $f$ is differentiable, then so is $F$, so gradient descent can be used; how successful it will be depends on $f$. From the question it's not clear what kind of function $f$ is. Some natural functions like $f(z_1,z_2)=\|z_1-z_2\|^2$ would make the problem easy, but also uninteresting: the minimum is attained, e.g., at $p_{11}=p_{21}=1$, $p_{12}=p_{22}=0$, since these values make $z_1=z_2$.
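As a concrete sketch of the setup (using, purely for illustration, the squared-difference example $f(z_1,z_2)=\|z_1-z_2\|^2$ just mentioned; the data here is made up for the example):

```python
import numpy as np

# Made-up stand-ins for the data vectors x and y from the question.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=5)

def f(z1, z2):
    # Illustrative choice of f: squared Euclidean distance between z1 and z2.
    return np.sum((z1 - z2) ** 2)

def F(p11, p12, p21, p22):
    # F is f evaluated at z1 = p11*x + p12*y, z2 = p21*x + p22*y,
    # i.e. f composed with the map P -> z = P [x, y]^T.
    z1 = p11 * x + p12 * y
    z2 = p21 * x + p22 * y
    return f(z1, z2)

# With p11 = p21 = 1 and p12 = p22 = 0 we get z1 = z2, so F = 0.
print(F(1.0, 0.0, 1.0, 0.0))  # → 0.0
```

This makes explicit that the optimization variables are the four entries of $P$, while $x$, $y$, and $f$ stay fixed.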
Using the chain rule, one can express the gradient of $F$ in terms of the gradient of $f$ and the vectors $x,y$. Let's write $f_{ik}$ for the partial derivative of $f(z_1,z_2)$ with respect to the $k$th component of $z_i$. Here the index $i$ takes values $1,2$ only, while $k$ ranges from $0$ to $N$. With this notation, $$\begin{split} \frac{\partial F}{\partial p_{11}}&=\sum_{k=0}^N x_k f_{1k}(p_{11}x+p_{12}y,\ p_{21}x+p_{22}y) \\ \frac{\partial F}{\partial p_{12}}&=\sum_{k=0}^N y_k f_{1k}(p_{11}x+p_{12}y,\ p_{21}x+p_{22}y) \\ \frac{\partial F}{\partial p_{21}}&=\sum_{k=0}^N x_k f_{2k}(p_{11}x+p_{12}y,\ p_{21}x+p_{22}y) \\ \frac{\partial F}{\partial p_{22}}&=\sum_{k=0}^N y_k f_{2k}(p_{11}x+p_{12}y,\ p_{21}x+p_{22}y) \\ \end{split} \tag1$$
The formulas (1) would be more compact if, instead of $x,y$, the data vectors were called $x^{(1)}$ and $x^{(2)}$. Then (1) becomes $$ \frac{\partial F}{\partial p_{ij}}=\sum_{k=0}^N x^{(j)}_k f_{ik}(p_{11}x^{(1)}+p_{12}x^{(2)},\ p_{21}x^{(1)}+p_{22}x^{(2)}) \tag{1*}$$
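A minimal numerical sketch of this chain-rule gradient and plain gradient descent, again assuming the illustrative choice $f(z_1,z_2)=\|z_1-z_2\|^2$, for which $f_{1k}=2(z_{1k}-z_{2k})$ and $f_{2k}=-2(z_{1k}-z_{2k})$ (the data and step size below are made up for the example):

```python
import numpy as np

# Made-up data: the rows of X play the roles of x^(1) and x^(2).
rng = np.random.default_rng(1)
X = np.stack([rng.normal(size=5), rng.normal(size=5)])

def grad_f(z1, z2):
    # Partials of the illustrative f(z1, z2) = ||z1 - z2||^2:
    # f_{1k} = 2(z1_k - z2_k),  f_{2k} = -2(z1_k - z2_k).
    d = 2.0 * (z1 - z2)
    return np.stack([d, -d])   # row i holds f_{ik} for k = 0..N

def grad_F(P):
    # Chain rule: dF/dp_ij = sum_k x^(j)_k f_{ik}(z1, z2).
    z = P @ X                  # z[0] = z1, z[1] = z2
    fg = grad_f(z[0], z[1])    # 2 x (N+1) array of partials of f
    return fg @ X.T            # entry (i, j) is sum_k f_{ik} x^(j)_k

# Plain gradient descent on F, starting from the identity matrix.
P = np.eye(2)
step = 0.01
for _ in range(2000):
    P = P - step * grad_F(P)

z = P @ X
print(np.sum((z[0] - z[1]) ** 2))  # value of F after descent; close to 0
```

As noted above, for this particular $f$ the descent simply drives $z_1 - z_2$ to zero. With your actual $f$, only `grad_f` needs to be replaced; the `grad_F` wrapper and the descent loop stay the same.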
For more concrete advice, it would help to know what kind of function $f$ you have in mind, and whether the matrix $P$ needs to be of any special kind (orthogonal, unit norm, etc.).