I am trying to solve the problem below using sequential quadratic programming (Newton's approach). The result is $x_1=5$, $x_2=5$, which cannot be the optimal point, since substituting $x_1=5$, $x_2=0$ returns a value greater than the one given by the technique. Is this some kind of limitation of SQP, or is there an error on my end? I want to maximize the sum $\log_{2}\left(1+\dfrac{x_1}{x_2+0.1}\right)+\log_{2}\left(1+\dfrac{x_2}{x_1+0.1}\right)$, but the technique is maximizing the individual terms instead. \begin{align} \max_{x_1,x_2}\quad &\log_{2}\left(1+\dfrac{x_1}{x_2+0.1}\right)+\log_{2}\left(1+\dfrac{x_2}{x_1+0.1}\right)\\ \text{s.t.}\quad &x_1\geq0,\; x_2\geq0,\\ &x_1\leq5,\; x_2\leq5. \end{align} I'm not a mathematician and am new to the topic of SQP, so kindly guide me even if you don't know the exact answer. Anything you say might lead me in the right direction. Thank you.
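A quick numeric check of the two candidate points supports the claim above (plain Python, just evaluating the stated objective; `f` is an illustrative helper name):

```python
import math

def f(x1, x2):
    # objective: log2(1 + x1/(x2+0.1)) + log2(1 + x2/(x1+0.1))
    return math.log2(1 + x1 / (x2 + 0.1)) + math.log2(1 + x2 / (x1 + 0.1))

print(f(5, 5))  # ≈ 1.97
print(f(5, 0))  # = log2(51) ≈ 5.67, clearly larger
```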
Is this a limitation of sequential quadratic programming?
324 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
There is 1 solution below.
Main issues with Newton's approach
This method can diverge in a fair number of situations if no precautions are taken. In general, we only use it when the function to optimize is close enough to linear, at least around the optimal points, and when the starting point is not too far from the actual optimum. It can be shown that if the optimal point is $x^*$, then the initial point $x_0$ has to be chosen such that
$$ \left|{\frac {f''(x_{0})}{f'(x_{0})}}\right|< 2\left|{\frac {f''(x^* )}{f'(x^* )}}\right| .$$
From the iteration $$x_{n+1}=x_{n}-{\frac {f(x_{n})}{f'(x_{n})}},$$ we can also see that whenever the derivative approaches zero, the numerical stability of the method suffers. The same is true when the second derivative (or Hessian matrix) is poorly conditioned.
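As an illustration, here is a minimal one-dimensional Newton iteration (a sketch, not the questioner's SQP code); the guard shows exactly where a vanishing derivative breaks the method:

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Basic Newton iteration x_{n+1} = x_n - f(x_n)/f'(x_n).

    Raises when f'(x_n) is (nearly) zero, which is one of the
    failure modes described above.
    """
    x = x0
    for _ in range(max_iter):
        d = df(x)
        if abs(d) < 1e-14:
            raise ZeroDivisionError("derivative vanished; Newton step undefined")
        x_new = x - f(x) / d
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# example: root of f(x) = x^2 - 2, starting reasonably close to it
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

Starting far from the root, or near a stationary point of $f$, the same loop stalls or blows up, which is exactly the motivation for the trust-region safeguards below.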
Common alternatives for global convergence
For nonlinear functions like yours, one usually prefers trust-region algorithms, which approximate the objective function with a descriptive model (typically a quadratic model derived from the Taylor expansion of the objective function) and guarantee global convergence.
The most commonly used ways to compute the descent direction of the next iterate are the Cauchy point method (used in the next section) and the dogleg method. The internet is full of good resources about these methods.
Last but not least, these methods on their own only solve unconstrained problems. They are incorporated into generalized methods, such as the augmented Lagrangian method, to solve constrained problems.
Solving a constrained problem using Cauchy's step method
1. Guaranteeing the global convergence of the algorithm
As mentioned above, the main problem with Newton's approach is that it can diverge in a lot of (pretty common) situations. To solve this issue, we will proceed by locally ensuring that we are still converging. A good way to do so is to approximate the objective function by a model that behaves as we want around the current iterate; "around" here means within a region of some radius $\Delta_k$ that we will have to compute. Let us take as our model, for instance, the second-order Taylor expansion of $f$ at the current iterate $x_k$:
$$ m_k(x_k + v) = q_k(v) = f(x_k) + \nabla f(x_k)^T v + \frac{1}{2} v^T \nabla^2 f(x_k) v.$$
As you can see, each iterate gives a new quadratic function, which we know behaves well in a neighborhood of $x_k$.
Our main goal in this part is to determine a trust region for this model, in other words the radius $\Delta_k$ of the region where we estimate that the model describes the objective function acceptably. For this, let us define the approximation ratio
$$\rho_k = \frac{f(x_k) - f(x_k + v_k)}{m_k(x_k) - m_k(x_k + v_k)}.$$
Thanks to this ratio, we can evaluate the quality of the model and adapt the trust region accordingly. By choosing $\Delta_{max} > 0$, $0 < \gamma_1 < 1 < \gamma_2$ and $0 < \eta_1 < \eta_2 < 1$, we can state the decision rule rigorously as follows.
$$ \Delta_{k+1} = \begin{cases} \min \{\gamma_2 \Delta_k, \Delta_{max}\} & \text{if} & \rho_k \geq \eta_2 \\ \Delta_k & \text{if} & \rho_k \in [\eta_1, \eta_2] \\ \gamma_1 \Delta_k & \text{otherwise.} \end{cases}$$
We should step forward if and only if the model is close to the function at the current iterate, otherwise we should only shrink the trust region and not move before we reach a satisfying $\rho_k$. The decision rule for the next iterate is thus:
$$ x_{k+1} = \begin{cases} x_k + v_k & \text{if} & \rho_k \geq \eta_1 \\ x_k & \text{otherwise.}\end{cases}$$
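The two decision rules above can be sketched as follows; the numeric defaults ($\gamma_1 = 0.5$, $\gamma_2 = 2$, $\eta_1 = 0.1$, $\eta_2 = 0.75$, $\Delta_{max} = 10$) are illustrative choices satisfying the stated inequalities, not prescribed values:

```python
def update_trust_region(rho, delta, delta_max=10.0,
                        gamma1=0.5, gamma2=2.0, eta1=0.1, eta2=0.75):
    """Radius update and step acceptance from the approximation ratio rho.

    Returns (new_delta, accept_step): the step is taken only when the
    model was close enough to the function (rho >= eta1).
    """
    if rho >= eta2:        # model very accurate: expand the region
        new_delta = min(gamma2 * delta, delta_max)
    elif rho >= eta1:      # acceptable: keep the radius
        new_delta = delta
    else:                  # poor model: shrink the region, do not move
        new_delta = gamma1 * delta
    accept_step = rho >= eta1
    return new_delta, accept_step
```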
2. Determining the next iterate
Now that we have decision rules ensuring we keep converging no matter where we start, we need to determine the next iterate, which means computing $v_k$. There exist many methods for this, but we will proceed with the Cauchy step method. It is an approximation method (which, as such, does not always return the optimal step): it restricts the search for the minimizer of the model to the direction of steepest descent, i.e. along $-\nabla f(x_k)$. One can prove that, even when not optimal, the decrease obtained this way is always at least sufficient.
The problem to solve is
$$ \begin{cases} \min & q_k(v) \\ \text{s.t.} & v = -t \cdot \nabla f(x_k) \\ & t > 0 \\ & \|v\| \leq \Delta_k.\end{cases}$$
Let $g = \nabla f(x_k)$ and $H = \nabla^2 f(x_k)$. This is equivalent to minimizing $\varphi(t) = q_k(-tg) = \frac{1}{2} t^2 g^T H g - t\, g^T g$, a polynomial of degree 2 in $t$ whose curvature is $c = g^T H g$. The maximal step we can take along the gradient without leaving the trust region is $t_{max} = \Delta_k / \|g\|$. Two cases are possible: if $c \leq 0$, then $\varphi$ is decreasing for $t > 0$ and we take $t_k = t_{max}$; if $c > 0$, the unconstrained minimizer of $\varphi$ is $t^* = g^T g / c$ and we take $t_k = \min(t^*, t_{max})$. The Cauchy step is then $v_k = -t_k\, g$.
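A compact NumPy sketch of this computation, with `g` and `H` the gradient and Hessian at the current iterate:

```python
import numpy as np

def cauchy_step(g, H, delta):
    """Cauchy point: minimize q(v) = g^T v + 0.5 v^T H v along v = -t g,
    subject to ||v|| <= delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)        # stationary point: no step
    t_max = delta / gnorm              # largest step inside the region
    c = g @ H @ g                      # curvature along -g
    if c <= 0:
        t = t_max                      # model unbounded along -g: go to the boundary
    else:
        t = min((g @ g) / c, t_max)    # unconstrained minimizer, clipped
    return -t * g
```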
Congratulations: you are now able to solve any unconstrained problem, provided you can compute the gradient and Hessian of the model you have chosen, without having to care about how to choose the initial point $x_0$ (even though choosing it smartly makes the algorithm quicker).
3. Extending to constrained problems (with equalities only)
We will use the augmented Lagrangian method, which has the main advantage of being very numerically stable. It is derived from what we call penalty methods. As you probably already know, the key idea behind the Lagrangian formalism is to transform constrained problems into unconstrained ones. In our case that would be sweet, because we could reuse our brand-new trust-region algorithm!
To achieve this, one can prove that it suffices to iteratively solve the unconstrained problem
$$ \min_{x \in \mathbb{R}^n} L(x, \lambda_k, \mu_k) = f(x) + \lambda_k^T c(x) + \frac{\mu_k}{2} \| c(x) \|^2, \tag{P} $$
stopping each inner minimization when $\| \nabla_x L(\cdot, \lambda_k, \mu_k) \| \leq \varepsilon$ for some $\varepsilon > 0$.
If the algorithm converged (that is to say, if the constraint violation $\| c(x_k) \|$ approached zero), then $x_k$ is an approximate solution of the constrained problem.
I won't go into the theoretical details here, as you can find them elsewhere. Numerical analysis of the method has shown that setting $\alpha = 0.1$, $\beta = 0.9$, $\mu_0 > 0$, $\varepsilon_0 = 1/\mu_0$, $\tau > 0$, $\hat{\eta}_0 = 0.1258925$ such that $\eta_0 = \hat{\eta}_0 / \mu_0^{\alpha} = 0.1$ leads to the best results. Using those variables, if the constraints are satisfied well enough, i.e. $\| c(x_{k+1}) \| \leq \eta_k$, update the multipliers:
$$ \begin{cases} \lambda_{k+1} & = & \lambda_k + \mu_k \cdot c(x_{k+1}) \\ \mu_{k+1} & = & \mu_k \\ \varepsilon_{k+1} & = & \varepsilon_k / \mu_k \\ \eta_{k+1} & = & \eta_k / \mu_k^{\beta}.\end{cases}$$
Otherwise, update the penalty parameters:
$$ \begin{cases} \lambda_{k+1} & = & \lambda_k \\ \mu_{k+1} & = & \tau\mu_k \\ \varepsilon_{k+1} & = & \varepsilon_0 / \mu_{k+1} \\ \eta_{k+1} & = & \hat{\eta}_0 / \mu_{k+1}^{\alpha}.\end{cases}$$
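One outer iteration of these updates can be sketched as follows; `c` is assumed to return the vector of equality-constraint values, and the defaults are the constants quoted above (with `eps0 = 1/mu0` for an assumed $\mu_0 = 10$):

```python
import numpy as np

def outer_update(x_next, c, lam, mu, eps, eta,
                 alpha=0.1, beta=0.9, tau=10.0,
                 eps0=0.1, eta_hat0=0.1258925):
    """One outer iteration of the augmented Lagrangian parameter updates."""
    if np.linalg.norm(c(x_next)) <= eta:
        # constraints satisfied well enough: update the multipliers
        lam = lam + mu * c(x_next)
        eps = eps / mu
        eta = eta / mu ** beta
    else:
        # otherwise increase the penalty parameter
        mu = tau * mu
        eps = eps0 / mu
        eta = eta_hat0 / mu ** alpha
    return lam, mu, eps, eta
```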
This method only works for equality constraints.
4. Extending to problems constrained with inequalities
The common way to solve this kind of problem is to introduce a slack vector $z$ and define its componentwise square $$z^2 := \begin{pmatrix}z_1^2 \\ \vdots \\ z_n^2 \end{pmatrix}.$$
This allows you to rewrite
$$ \begin{cases} \min & f(x) \\ s.t. & h(x) = 0 \\ & g(x) \leq 0 \end{cases} \tag{P'} $$
as
$$ \begin{cases} \min & f(x) \\ s.t. & h(x) = 0 \\ & g(x) + z^2 = 0 \end{cases} \tag{P''} $$
and solve $(P'')$ as a problem constrained with equalities.
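For instance, the box constraints of the original problem (after negating the objective to turn the maximization into a minimization) become equality constraints with slack vectors $z, w \in \mathbb{R}^2$:
$$ \begin{cases} -x_1 + z_1^2 = 0, & \quad -x_2 + z_2^2 = 0, \\ x_1 - 5 + w_1^2 = 0, & \quad x_2 - 5 + w_2^2 = 0, \end{cases}$$
after which the augmented Lagrangian method above applies directly.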
Going further: Newton's method is not always bad
Very often, the functions we have to deal with are actually well behaved. In such cases, the main practical difficulty to overcome is rather the dimension of the problem.
In Newton's method, one uses $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$. In trust-region methods, one minimizes $ f(x_k) + \nabla f(x_k)^T v + \frac{1}{2} v^T H_k v$, where $H_k$ approximates $\nabla^2 f(x_k)$.
For today's high-dimensional problems (between $10^6$ and $10^7$ variables), one must find efficient ways to approximate $\nabla^2 f(x_k)$ and to solve $\nabla^2 f(x_k) v = -\nabla f(x_k)$.
Quasi-Newton methods start from points $x_0, x_1$ and use $\nabla f(x_0), \nabla f(x_1)$ as well as an approximation $H_0 = \alpha I$ of $\nabla^2 f(x_0)$. Using Taylor's theorem with integral remainder, one has
$$ \begin{align}\nabla f(x_1) - \nabla f(x_0) & = \int_0^1 \nabla^2 f(x_0 + s(x_1 - x_0))\, (x_1 - x_0)\, ds \\ & = \left[ \int_0^1 \nabla^2 f(x_0 + s(x_1 - x_0))\, ds \right] (x_1 - x_0). \end{align}$$
Writing $y = \nabla f(x_1) - \nabla f(x_0)$ and $v = x_1 - x_0$, we then try to solve the optimization problem
$$ \begin{cases} \min \| \Delta H \|_F \\ \Delta H = \Delta H^T \\ (H_0 + \Delta H)v = y. \end{cases} \Longrightarrow H_1 = H_0 + \Delta H_{opt}.$$
One can show that the solution of this problem is given by
$$ \Delta H_{opt} = \frac{(y - H_0 v)v^T + v(y - H_0 v)^T}{v^T v} - \frac{v^T(y - H_0 v)}{(v^Tv)^2}\, v v^T. $$
This is numerically easy to compute, and we can directly deduce $x_2 = x_1 - \alpha H_1^{-1} \nabla f(x_1)$.
The last question to answer is: how do we compute $H_k^{-1}$ efficiently? The best way is to use the Sherman-Morrison formula:
$$(A+uv^{T})^{-1}=A^{-1}-{A^{-1}uv^{T}A^{-1} \over 1+v^{T}A^{-1}u}.$$
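A direct NumPy transcription of the formula, so that each rank-one update of $H_k$ only costs a few matrix-vector products instead of a full inversion:

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Given A^{-1}, return (A + u v^T)^{-1} via the Sherman-Morrison formula.

    Assumes the update is well defined, i.e. 1 + v^T A^{-1} u != 0.
    """
    Au = A_inv @ u                 # A^{-1} u
    vA = v @ A_inv                 # v^T A^{-1}
    denom = 1.0 + v @ Au           # 1 + v^T A^{-1} u
    return A_inv - np.outer(Au, vA) / denom
```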
For further reading, see: