$\newcommand{\delim}[3]{\left#1#3\right#2} \newcommand{\delimzero}[3]{#1#3#2} \newcommand{\delimone}[3]{\bigl#1#3\bigr#2} \newcommand{\delimtwo}[3]{\Bigl#1#3\Bigr#2} \newcommand{\delimthree}[3]{\biggl#1#3\biggr#2} \newcommand{\delimfour}[3]{\Biggl#1#3\Biggr#2}$More specifically,
The objective $f(p_1, \dotsc, p_N)$ is smooth and strictly convex;
Constraints are parametric linear equalities in $\delim\{\}{p_i}$: the normalization condition $$ \sum_{i=1}^N p_i = 1 $$ and the moment conditions $$\begin{gather} \sum_{i=1}^N p_i c_1(i) = C_1, \\ \vdots \\ \sum_{i=1}^N p_i c_M(i) = C_M, \end{gather}$$ where the $C_m$'s are parameters.
The constraints are not independent, in the sense that their parameters satisfy an equation $$ g(C_1, \dotsc ,C_M) = 0. $$
By the method of Lagrange multipliers, the stationary points of the constrained objective are obtained as the stationary points of the corresponding Lagrangian function. Since the objective is strictly convex and the constraints are affine, the Lagrangian is strictly convex in $\delimzero\{\}{p_i}$, so the constrained optimization problem has at most one solution. The (parametric) solution, if it exists, can then be obtained by setting the gradient of the Lagrangian to zero.
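For concreteness, here is a sketch of that stationarity condition (the multiplier names $\lambda_0$ for normalization and $\lambda_m$ for the $m$-th moment constraint are my own notation):

```latex
\begin{align}
L(p, \lambda)
  &= f(p_1, \dotsc, p_N)
   + \lambda_0 \Bigl( \sum_{i=1}^N p_i - 1 \Bigr)
   + \sum_{m=1}^M \lambda_m \Bigl( \sum_{i=1}^N p_i c_m(i) - C_m \Bigr), \\
0 = \frac{\partial L}{\partial p_i}
  &\iff
  \frac{\partial f}{\partial p_i} = -\lambda_0 - \sum_{m=1}^M \lambda_m c_m(i),
  \qquad i = 1, \dotsc, N .
\end{align}
```

Since $f$ is strictly convex, $\partial f / \partial p_i$ is invertible, which yields the parametric solution $p_i = p_i(\lambda_0, \lambda_1, \dotsc, \lambda_M)$.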
But the parameters in the constraints are not independent, so the parametric solution contains redundancy. How can this redundancy be characterized from the equation relating the parameters? In other words, how does one derive a parametric transformation that leaves the probability distribution $\delim\{\}{p_i}$ unchanged?
Context
In case the abstract question above misses details of the actual problem I encountered, let me give that problem as an example.
The number $N$ of random events is actually the number of configurations of $K$ random variables, each of which has $q$ possible states; therefore $N=q^K$. Denote a configuration of the $K$ random variables by $s = \delim(){s_1,\dots,s_K}$. Then the question in my problem reads:
The objective $f(p_1, \dotsc, p_N)$ is the negative Shannon entropy $f(p_1, \dotsc, p_N) = \sum_i p_i \log p_i$. Indexed by configurations of the $K$ random variables, the probabilities of the events can also be written as a function of configurations, $p(s)$. The objective then reads $$ \sum_s p(s) \log p(s) . $$
Constraints are one- and two-site statistics of the $K$ random variables; they are equalities of probabilities and read $$\begin{gather} \sum_{s} p(s) \delta(s_i,a) = f_i(a), \\ \sum_{s} p(s) \delta(s_i,a) \delta(s_j,b) = f_{ij}(a,b), \end{gather}$$ where $i$ and $j$ are integers in the range $[1,K]$ (designating random variables), and $a$ and $b$ are integers in the range $[1,q]$ (designating states of the random variables); the Kronecker delta $\delta(s_i,a)$ just counts the occurrence of state $a$ on site $i$; $f_i(a)$ is the frequency of site $i$ being in state $a$, and $f_{ij}(a,b)$ is the frequency of site $i$ being in state $a$ while site $j$ is in state $b$; the frequencies $\delimzero\{\}{f_i(a)}$ and $\delimzero\{\}{f_{ij}(a,b)}$ are given by observation and can thus be taken as parameters of the constrained optimization problem.
Then the parametric solution reads $$ P(s) = \frac 1 Z \exp\delimtwo[]{- \sum_i h_i(s_i) - \sum_{i<j} J_{ij}(s_i,s_j) } , \tag{1} $$ where $\delimone\{\}{h_i(a) : 1 \le i \le K, 1 \le a \le q}$ and $\delimone\{\}{J_{ij}(a,b) : 1 \le i<j \le K, 1 \le a,b \le q }$ are the Lagrange multipliers for the parametric constraints, and the normalization factor $Z = Z(\mathbf h, \mathbf J) = \sum_s \exp\delim[]{- \sum_i h_i(s_i) - \sum_{i<j} J_{ij}(s_i,s_j) }$ comes from the Lagrange multiplier for the normalization condition.
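For completeness, here is the step from the constraints to (1); the name $\mu$ for the normalization multiplier is my own choice, and it is absorbed into $Z$ at the end:

```latex
\begin{align}
L &= \sum_s p(s) \log p(s)
   + \mu \Bigl( \sum_s p(s) - 1 \Bigr)
   + \sum_{i,a} h_i(a) \Bigl( \sum_s p(s)\,\delta(s_i,a) - f_i(a) \Bigr) \\
  &\quad + \sum_{i<j}\sum_{a,b} J_{ij}(a,b)
     \Bigl( \sum_s p(s)\,\delta(s_i,a)\,\delta(s_j,b) - f_{ij}(a,b) \Bigr), \\
0 &= \frac{\partial L}{\partial p(s)}
   = \log p(s) + 1 + \mu + \sum_i h_i(s_i) + \sum_{i<j} J_{ij}(s_i,s_j) \\
&\Longrightarrow\quad
p(s) = e^{-1-\mu}\,\exp\Bigl[ -\sum_i h_i(s_i) - \sum_{i<j} J_{ij}(s_i,s_j) \Bigr],
\qquad Z = e^{1+\mu}.
\end{align}
```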
The constraints are not independent because the frequencies themselves are not: $$ \sum_{a=1}^q f_i(a) = 1, \quad \sum_{b=1}^q f_{ij}(a,b) = f_i(a). $$
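These identities are simply marginalization relations of the constraints: summing the one-site constraint over $a$, or the two-site constraint over $b$, and using $\sum_a \delta(s_i,a) = 1$ gives

```latex
\begin{align}
\sum_{a=1}^q f_i(a)
  &= \sum_s p(s) \sum_{a=1}^q \delta(s_i,a)
   = \sum_s p(s) = 1, \\
\sum_{b=1}^q f_{ij}(a,b)
  &= \sum_s p(s)\,\delta(s_i,a) \sum_{b=1}^q \delta(s_j,b)
   = \sum_s p(s)\,\delta(s_i,a) = f_i(a).
\end{align}
```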
Thus the parametric solution (1) has redundant parameters. This can be observed from the fact that the probability distribution is invariant under the transformation $$ \begin{align} J_{ij}(a,b) &\to J_{ij}(a,b) + K_{ij}(a) + K_{ji}(b) , \\ h_i(a) &\to h_i(a) + g_i - \sum_{j \neq i} K_{ij}(a) . \end{align} \tag{2} $$ Allow me to explain a bit further: the contribution of $\delim\{\}{g_i}$ adds an overall constant to the exponent, and thus has no effect after normalization; adding $K_{ij}(a)$ to $J_{ij}(a,b)$ while subtracting $K_{ij}(a)$ from $h_i(a)$ moves some contribution of the edge $(i,j)$ to the vertex $i$.
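As a sanity check, this gauge freedom can be verified numerically on a small instance. The sizes $K=3$ and $q=2$, the random draws, and all variable names below are my own illustrative choices: shifting every coupling $J_{ij}(a,b)$ by $K_{ij}(a) + K_{ji}(b)$ and compensating each field $h_i(a)$ by $g_i - \sum_{j\neq i} K_{ij}(a)$ leaves $p(s)$ unchanged.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

K, q = 3, 2  # 3 sites with 2 states each: 8 configurations

# Random fields h[i, a] and couplings J[(i, j)][a, b] for i < j.
h = rng.normal(size=(K, q))
J = {(i, j): rng.normal(size=(q, q)) for i in range(K) for j in range(i + 1, K)}

def distribution(h, J):
    """Normalized p(s) ∝ exp(-sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j))."""
    states = list(itertools.product(range(q), repeat=K))
    energy = np.array([
        sum(h[i, s[i]] for i in range(K))
        + sum(J[(i, j)][s[i], s[j]] for (i, j) in J)
        for s in states
    ])
    w = np.exp(-energy)
    return w / w.sum()

p_before = distribution(h, J)

# An arbitrary gauge: a full (asymmetric) matrix of functions Kmat[i, j, a]
# and per-site constants g[i].
Kmat = rng.normal(size=(K, K, q))
g = rng.normal(size=K)

# Transformed couplings and compensated fields.
J_new = {(i, j): J[(i, j)] + Kmat[i, j][:, None] + Kmat[j, i][None, :]
         for (i, j) in J}
h_new = np.array([
    h[i] + g[i] - sum(Kmat[i, j] for j in range(K) if j != i)
    for i in range(K)
])

p_after = distribution(h_new, J_new)
print(np.max(np.abs(p_after - p_before)))  # close to 0: the distribution is unchanged
```

The compensation works because, in the exponent, each site $i$ picks up exactly $\sum_{j \neq i} K_{ij}(s_i)$ from the shifted couplings, which the field shift cancels, while $g_i$ only rescales $Z$.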
So the question can be stated as: how does one obtain the transformation (2) from the equations relating the parameters?
I hope I have not missed any necessary details in the formulation of the mathematical question above; if I have, please let me know.