I am trying to understand a colorization algorithm (converting a gray-scale image to a color image) presented in the paper Colorization using Optimization: https://www.researchgate.net/publication/2896183_Colorization_using_Optimization
For a still image, the algorithm takes a gray-scale image $Y(x,y)$ as input and finds estimates of the chrominance channels (as in the YUV color space) $U(x,y)$ and $V(x,y)$, subject to the constraint that a user has assigned fixed values to some subset of the $U(x,y)$ and $V(x,y)$ pixels.
Consider the $U$ channel. The goal is to minimize, for each pixel $(x,y)$, the difference between the color $U(x,y)$ and a weighted average of the colors at neighboring pixels (implemented as a $3 \times 3$ window with appropriate truncation at the image borders), according to the cost function $$C(U) = \sum \limits_{r}\Big(U(r) - \sum \limits_{s \in N(r)} w_{rs} \cdot U(s)\Big)^2$$ where $r$ ranges over all the pixels in the image and $N(r)$ is the neighbourhood of $r$. The cost function for $V$ is identical. The choice of affinity function $w_{rs}$, measuring how similar the gray-scale values of pixels $r$ and $s$ are, is up to the implementation; the authors choose $w_{rs} = \frac {d_{rs}} {\sum_{t \in N(r)} d_{rt}}$ for $s \in N(r)$, where $d_{rs} = e^{-\frac{(Y(s)-Y(r))^2}{2\sigma_r^2}}$ and $\sigma_r^2$ is the sample variance of intensity values in the window around $r$. For $s \notin N(r)$, $w_{rs}=0$.
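For concreteness, here is a minimal NumPy sketch of this weight computation for a single pixel; the small `eps` guard against zero variance in flat windows is a choice of this sketch, not specified in the paper:

```python
import numpy as np

def neighbour_weights(Y, i, j, eps=1e-6):
    """Affinity weights w_rs over the 3x3 window around pixel r = (i, j).

    A sketch of the Gaussian affinity above; `eps` keeps the variance
    positive in flat regions (this guard is an assumption of the sketch).
    """
    h, w = Y.shape
    r0, r1 = max(i - 1, 0), min(i + 2, h)   # truncate window at the borders
    c0, c1 = max(j - 1, 0), min(j + 2, w)
    window = Y[r0:r1, c0:c1]
    sigma2 = window.var() + eps             # sample variance around r
    d = np.exp(-(window - Y[i, j]) ** 2 / (2.0 * sigma2))
    d[i - r0, j - c0] = 0.0                 # r itself is not in N(r)
    return d / d.sum()                      # normalise so weights sum to 1
```

By construction the returned weights are nonnegative and sum to $1$, which is what makes the diagonal of $D$ below consist of ones.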
The authors state:
Since the cost functions are quadratic and the constraints are linear, this optimization problem yields a large, sparse system of linear equations, which may be solved using a number of standard methods.
So then, I believe that for the images $U(x,y)$ and $V(x,y)$, minimizing the cost function can be seen as solving for a vector $x$ that minimizes $||Ax-b||_2$, where $A = D-W$ and
- $W$ is the affinity matrix with $W_{rs} = w_{rs}$. Note $W$ is a sparse matrix.
- $D$ is the identity matrix ($1$'s on the main diagonal), since the weights $w_{rs}$ sum to $1$ over each neighbourhood
- $x$ is the flattened image vector we are solving for
$W, D$ are of size $p \times p$ where $p$ is the number of pixels in the image. $x$ is a vector of size $p \times 1$, and $b$ is the same size as $x$.
My question is what is $b$? I am imagining $b$ enforces the constraints, but I am not sure how. The pixel constraint values are floats in the range $[0,1]$. I would be happy to clarify anything, any insights appreciated.
One possible solution for $b$ is a flattened image vector where, if the user has assigned a $U$ (or $V$) channel value at a particular pixel, then $b_{x,y}$ (before flattening) $= U(x,y)$, and $b_{x,y} = 0$ otherwise.
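One common way to realise this guess concretely is to also replace the rows of $A$ belonging to constrained pixels with identity rows, so the constraints are enforced exactly rather than approximately. A hedged Python/SciPy sketch of that variant (function name, the inlined Gaussian affinity, and the `eps` variance guard are this sketch's choices, not the paper's code):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def colorize_channel(Y, marks, values, eps=1e-6):
    """Solve for one chrominance channel given the gray-scale image Y.

    marks:  boolean array, True where the user fixed the channel value.
    values: the fixed channel values (read only where marks is True).

    Rows of A for free pixels encode U(r) - sum_{s in N(r)} w_rs U(s) = 0
    with b = 0 there; rows for marked pixels are identity rows with b set
    to the fixed value, so the constraints hold exactly.
    """
    h, w = Y.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []
    rhs = np.zeros(n)
    for i in range(h):
        for j in range(w):
            r = idx[i, j]
            rows.append(r); cols.append(r); vals.append(1.0)  # the U(r) term
            if marks[i, j]:
                rhs[r] = values[i, j]      # identity row: U(r) = fixed value
                continue
            # 3x3 window around (i, j), truncated at the borders
            r0, r1 = max(i - 1, 0), min(i + 2, h)
            c0, c1 = max(j - 1, 0), min(j + 2, w)
            window = Y[r0:r1, c0:c1]
            sigma2 = window.var() + eps
            d = np.exp(-(window - Y[i, j]) ** 2 / (2.0 * sigma2))
            d[i - r0, j - c0] = 0.0        # exclude r itself from N(r)
            d /= d.sum()                   # normalise: weights sum to 1
            for a, b in zip(*np.nonzero(d)):
                s = idx[r0 + a, c0 + b]
                rows.append(r); cols.append(s); vals.append(-d[a, b])
    A = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    return spla.spsolve(A, rhs).reshape(h, w)
```

With this construction, $b$ is exactly the flattened constraint image described above, and the system has one unique solution as long as every free pixel is connected to at least one marked pixel.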
The linear system that the authors refer to is this system. So instead of minimizing $||Ax-b||_2$, you minimize $x^TQx + c^Tx$ subject to the constraints (with $Q = A^TA$ and $c = -2A^Tb$ the two objectives are equivalent up to a constant, but the quadratic form has an easier expression for the derivative, and $Q$ is sparse).
The dimension of $x$ is the number of pixels. Let me take your cost function and express it in $x$: $$C(x) = \sum_i\Big(x_i - \sum_j w_{ij} \cdot x_j\Big)^2 = \|(I-W)x\|_2^2 = x^T(I-W)^T(I-W)x,$$ so $Q=(I-W)^T(I-W)$, which is your $A^TA$ with $A = D-W$, and $c=0$. Note that $Q$ is still sparse: row $i$ of $I-W$ is nonzero only on the neighbourhood of pixel $i$, so $Q_{ij} \neq 0$ only when pixels $i$ and $j$ share a neighbour. This is why the paper can describe the cost as a quadratic form in $x$ (in the right column next to formula (1)).
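Since the equality between the summed cost and its matrix form is easy to get wrong (the cross and squared terms must all be kept, hence the transpose in $x^T(I-W)^T(I-W)x$), here is a quick numerical check on a tiny random instance, using a row-stochastic $W$ with zero diagonal to mimic the normalised affinities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny random instance: row-stochastic W with zero diagonal, random x.
n = 6
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)        # each row of weights sums to 1
x = rng.random(n)

C_sum = np.sum((x - W @ x) ** 2)         # the cost as a sum over pixels
Q = (np.eye(n) - W).T @ (np.eye(n) - W)  # Q = (I - W)^T (I - W)
C_quad = x @ Q @ x                       # the same cost as x^T Q x

assert np.isclose(C_sum, C_quad)
```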
The constraints fix certain entries of $x$, which can be modeled as $Ex=d$, where the number of rows of $E$ is the number of colored pixels and each row of $E$ has exactly one nonzero component. An alternative approach is to treat the fixed values as constants: the dimensions of $Q$ and $c$ become a bit smaller, and you need $c \neq 0$ to absorb the cross terms $w_{ij}x_i x_j$ in which $x_i$ is fixed.
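The elimination variant can be sketched as follows (assuming a symmetric $Q$; partitioning $Q$ into free/fixed blocks and setting the gradient over the free block to zero gives $Q_{ff}\,x_f = -Q_{fc}\,x_c$ — the right-hand side is the nonzero $c$; names are illustrative):

```python
import numpy as np

def solve_with_elimination(Q, fixed_idx, fixed_vals):
    """Minimise x^T Q x with some entries of x held fixed, by elimination.

    Q is assumed symmetric. The free block satisfies the smaller system
    Q_ff x_f = -Q_fc x_c, whose right-hand side plays the role of c.
    """
    n = Q.shape[0]
    free = np.setdiff1d(np.arange(n), fixed_idx)
    Q_ff = Q[np.ix_(free, free)]            # free-free block
    Q_fc = Q[np.ix_(free, fixed_idx)]       # free-fixed cross block
    x = np.zeros(n)
    x[fixed_idx] = fixed_vals
    x[free] = np.linalg.solve(Q_ff, -Q_fc @ fixed_vals)
    return x
```

With $Q = (I-W)^T(I-W)$ positive semidefinite and its null space spanned by the constant vector, the block $Q_{ff}$ is positive definite whenever at least one pixel is fixed, so the reduced system is uniquely solvable.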
The only tricky part is the constraints $0 \leq x_i \leq 1$. One way of handling those is via an interior point method. It goes beyond the scope of this answer to explain the details fully, but in essence it adds the term $-\mu \sum_i \log(x_i)$ to the objective and gradually lets $\mu \downarrow 0$. For each $\mu$, you solve: $$\begin{pmatrix}Q+S & E^T\\ E & O\end{pmatrix}\begin{pmatrix}x \\ \lambda \end{pmatrix} = \begin{pmatrix}e \\ d\end{pmatrix}$$ where $S$ is a diagonal matrix, and $S$ and $e$ are based on the current iterate of $x$.
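A sketch of one such step under stated assumptions: handling only the lower bound $x_i > 0$, with $S = \mu\,\mathrm{diag}(1/x_i^2)$ coming from the barrier's Hessian and $e = 2\mu/x_i$ from linearising its gradient around the current iterate. The factor $2Q$ comes from $\nabla(x^TQx) = 2Qx$, so the block structure matches the system above up to constant factors; this is one possible linearisation, not the paper's implementation:

```python
import numpy as np

def barrier_step(Q, E, d, x0, mu):
    """One Newton step for  min x^T Q x - mu * sum_i log(x_i)  s.t.  E x = d.

    Linearising the barrier gradient mu/x around the current iterate x0
    yields the block system

        [2Q + S  E^T] [x     ]   [e]
        [E       0  ] [lambda] = [d]

    with S = mu * diag(1 / x0^2) and e = 2 * mu / x0.
    """
    n, m = Q.shape[0], E.shape[0]
    S = np.diag(mu / x0**2)              # barrier Hessian, diagonal
    e = 2.0 * mu / x0                    # linearised barrier gradient terms
    K = np.block([[2.0 * Q + S, E.T],
                  [E, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([e, d]))
    return sol[:n]                       # new x (sol[n:] holds the multipliers)
```

Iterating this step at a fixed small $\mu$ drives $x$ toward the constrained minimiser while keeping each $x_i$ strictly positive; a full interior point method would also decrease $\mu$ between solves.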