I encountered this problem trying to implement an image processing algorithm.
I'm trying to minimize a cost function for a matrix subject to some constraints. The cost function is as follows:
$$J(U)=\sum_{\mathbf{r}}\Bigl(U(\mathbf{r})-\sum_{\mathbf{s}\in N(\mathbf{r})}w_{\mathbf{rs}} U(\mathbf{s})\Bigr)^2$$
Where $w_{\mathbf{rs}}$ is a weighting function:
$$w_{\mathbf{rs}}\propto\left(1+{1\over \sigma_{\mathbf{r}}^2}(U(\mathbf{r})-\mu_{\mathbf{r}})(U(\mathbf{s})-\mu_\mathbf{r})\right)$$
Each point in the matrix contributes one term to $J(U)$. In the formulas above, $\mathbf{r}$ is a point in the matrix and $\mathbf{s}$ ranges over its neighbors in a $3 \times 3$ window centered on $\mathbf{r}$. The cost term for each point is a function of that point and its surrounding points. $\mu_{\mathbf{r}}$ and $\sigma_{\mathbf{r}}$ are constants for each point that I've already computed.
$U(\mathbf{r})$ is just the value of the parameter $U$ at point $\mathbf{r}$.
Some points are constrained to fixed values of $U$. The objective is to find values of $U$ for every point that minimize the cost function.
My environment includes SciPy, so I have access to its optimization algorithms if any of them would be helpful. I'm relatively new to this kind of math.
I think the problem can be converted into a sparse system of linear equations but I'm not sure how to do so.
Approximate solutions are fine.
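For concreteness, here is roughly how I evaluate $J$ directly at the moment (slow nested loops; storing the weights in a dict keyed by point and offset is just my own convention):

```python
import numpy as np

def cost(U, W):
    """Direct evaluation of J(U): for each interior point (r, c),
    compare U[r, c] against the weighted sum of its 8 neighbours in
    the 3x3 window. W maps (r, c, dr, dc) -> w_rs (my own storage
    convention, not from any library)."""
    H, Wd = U.shape
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    J = 0.0
    for r in range(1, H - 1):
        for c in range(1, Wd - 1):
            pred = sum(W[(r, c, dr, dc)] * U[r + dr, c + dc]
                       for dr, dc in offsets)
            J += (U[r, c] - pred) ** 2
    return J
```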
None of the constraints in this problem are inequalities; the fixed entries are simple equality constraints. This means we can write $J$ as a function of only those entries of the matrix $U$ whose values are variable (you mentioned that some entries of $U$ are fixed and some are variable).
Suppose there are $n$ entries of $U$ whose values are variable; call them $x_1, x_2, ..., x_n$. Then, using the two equations from your question, you can write out the cost function $J$ as a function of $x_1, x_2, ..., x_n$.
Once you have done this, you will want to find the set of candidate points (points with $n$ coordinates, in this case) for the global minimum of $J$. To do this, you will want to set the gradient of $J$ equal to the zero vector. The gradient of $J$ is itself a vector with $n$ entries, so setting the gradient of $J$ equal to the zero vector will result in a system of $n$ equations.
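To make the "set the gradient to zero" step concrete: if the weights $w_{\mathbf{rs}}$ are frozen at their current values (they depend on $U$, so in practice you would alternate between updating the weights and solving for $U$), then $J$ is quadratic in the flattened vector $u$ and the gradient equations form a sparse linear system, which matches your sparse-system hunch. A sketch on a hypothetical 1-D layout with made-up averaging weights, not your actual matrix:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# With the weights frozen, J(u) = ||(I - W) u||^2, so grad J = 0 gives
# the sparse linear system (A^T A) u = 0 with A = I - W. Fixed entries
# of u move to the right-hand side.

n = 6
W = sp.lil_matrix((n, n))
for i in range(n):              # hypothetical weights: each point is
    if i > 0:                   # predicted from its two neighbours
        W[i, i - 1] = 0.5       # with weight 1/2 each
    if i < n - 1:
        W[i, i + 1] = 0.5
A = (sp.identity(n) - W).tocsr()
M = (A.T @ A).tocsr()           # normal-equations matrix, symmetric PSD

fixed = {0: 0.0, n - 1: 5.0}    # entries of u pinned to given values
free = [i for i in range(n) if i not in fixed]

u_fix = np.zeros(n)
for i, v in fixed.items():
    u_fix[i] = v

# Restrict M u = -M u_fix to the free indices and solve sparsely.
rhs = -(M @ u_fix)[free]
M_ff = M[free][:, free]
u = u_fix.copy()
u[free] = spla.spsolve(M_ff.tocsc(), rhs)
```

The restricted solve keeps the system small: only the free entries are unknowns, and the fixed entries enter only through the right-hand side.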
The solution set of this system of $n$ equations gives the candidate points for the global minimum of $J$. Some of these points may be local minima, local maxima, or saddle points; we wish to eliminate those so that only the global minimum remains. To do this, you will need to find all of the second-order partial derivatives of $J$ and use them to compute the Hessian matrix of $J$.
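As a toy illustration of that classification step (not your $J$; the two-variable quadratic below is made up, so its Hessian is constant), you can classify a candidate point by the eigenvalues of the Hessian there:

```python
import numpy as np

# Second-derivative test on a hand-picked quadratic:
#   J(x, y) = (x - 1)^2 + (y + 2)^2 + 0.5 x y
# Its Hessian is the same at every point.

H = np.array([[2.0, 0.5],       # d2J/dx2,  d2J/dxdy
              [0.5, 2.0]])      # d2J/dydx, d2J/dy2

eigs = np.linalg.eigvalsh(H)    # eigvalsh: H is symmetric
if np.all(eigs > 0):
    verdict = "minimum"         # positive definite Hessian
elif np.all(eigs < 0):
    verdict = "maximum"         # negative definite Hessian
else:
    verdict = "saddle point"    # indefinite Hessian
```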
You can then use the method described on the last page of this document (the second-derivative test: classify each candidate point by the definiteness of the Hessian there) to determine which of your candidate points is your final answer.
The most tedious part of solving this problem will probably be writing $J$ in terms of $x_1, x_2, ..., x_n$. It is not difficult, but it is time-consuming.
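Since you have SciPy available and said approximate solutions are fine, a numerical fallback is to skip the hand-derived system entirely and hand $J$ (evaluated over the free entries only) to `scipy.optimize.minimize`. A sketch on the same hypothetical 1-D layout and averaging weights as above, not your real matrix:

```python
import numpy as np
from scipy.optimize import minimize

n = 6
fixed = {0: 0.0, n - 1: 5.0}    # pinned entries of u
free = [i for i in range(n) if i not in fixed]

def J(x_free):
    """Cost over the free entries only; fixed entries are filled in."""
    u = np.zeros(n)
    for i, v in fixed.items():
        u[i] = v
    u[free] = x_free
    r = np.empty(n)                              # residual per point:
    r[0] = u[0] - 0.5 * u[1]                     # u_i minus the
    r[-1] = u[-1] - 0.5 * u[-2]                  # weighted sum of
    r[1:-1] = u[1:-1] - 0.5 * (u[:-2] + u[2:])   # its neighbours
    return float(np.sum(r ** 2))

res = minimize(J, np.zeros(len(free)), method="L-BFGS-B")
u_opt = np.zeros(n)
for i, v in fixed.items():
    u_opt[i] = v
u_opt[free] = res.x
```

This avoids working out the gradient by hand (at the cost of finite-difference gradient evaluations), and it generalizes unchanged if you later decide not to freeze the weights inside $J$.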