I have the function \begin{equation} f(W)=\gamma \|W\|_2 \end{equation} What is the prox operator of this?
The Proximal Operator and the Subdifferential of the $ {L}_{2} $ Norm of a Matrix
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There is 1 solution below.
anything for a bounty ;-) (In fact, I was simply running out of time during the work week.)
From previous discussions we know that the author is actually referring to an elementwise 1-norm, not the induced 1-norm. That is, $\|W\|_1\triangleq \sum_{ij}|w_{ij}|$. This means that this function is separable across columns:
$$f(W) = \sum_{j=1}^T \left( \lambda \|w_j\|_1 + \gamma \|w_j\|_2 + \tfrac{1}{2} \|w_j - u_j\|_2^2 \right)$$ So we can restrict our attention to the vector function $$g(w) = \lambda \|w\|_1 + \gamma \|w\|_2 + \tfrac{1}{2} \|w - u\|_2^2$$ The optimality conditions are: $$\lambda v_1 + \gamma v_2 + w = u, \quad v_1\in\partial \|w\|_1, \quad v_2\in\partial \|w\|_2$$ $$\partial \|w\|_1 = \{v\,|\, \|v\|_\infty \leq 1, ~ \langle v, w \rangle = \|w\|_1\}$$ $$\partial \|w\|_2 = \{v\,|\, \|v\|_2 \leq 1, ~ \langle v, w \rangle = \|w\|_2\}$$ To solve, let's define the standard soft-thresholding operator: $$\mathop{\textrm{soft}}(x;\lambda)= \begin{cases} x - \lambda & x > \lambda \\ 0 & |x| \leq \lambda \\ x + \lambda & x < -\lambda \end{cases}$$ and extend it to apply elementwise to vectors. Then we choose $$[v_1]_i = \mathop{\textrm{sign}}(u_i)\min\{|u_i|/\lambda,1\}, \quad i=1,2,\dots,m$$ $$\quad\Longrightarrow u - \lambda v_1 = \mathop{\textrm{soft}}(u;\lambda)$$ This reduces the optimality conditions to $$\gamma v_2 + w = \mathop{\textrm{soft}}(u;\lambda)$$ Let's call that right-hand term $q$ and consider three cases. If $q=0$, take $w=0$ and $v_2=0$. If $0<\|q\|_2\leq\gamma$, take $w=0$ and $v_2=q/\gamma$; then $\|v_2\|_2\leq 1$ and $\langle v_2, w\rangle = 0 = \|w\|_2$, so $v_2\in\partial\|w\|_2$. If $\|q\|_2>\gamma$, take $w=(1-\gamma/\|q\|_2)q$ and $v_2=q/\|q\|_2=w/\|w\|_2$, which satisfies $v_2\in\partial\|w\|_2$ and $\gamma v_2 + w = q$.
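The soft-thresholding operator above is easy to implement elementwise. A minimal sketch in NumPy (the function name `soft` follows the notation in the derivation; it is not from any particular library):

```python
import numpy as np

def soft(x, lam):
    """Elementwise soft-thresholding: the prox of lam * ||.||_1.
    Shrinks each entry toward zero by lam, clipping to zero inside [-lam, lam]."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```

For example, `soft(np.array([3.0, -0.5, -2.0]), 1.0)` maps the entries to `2.0`, `0.0`, and `-1.0` respectively, matching the three cases in the piecewise definition.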
If you're familiar with the proximal operator for $\ell_2$ alone, you should recognize that we're doing the exact same operation here. Let's call it "shrink": $$\mathop{\textrm{shrink}}(q; \gamma) = \begin{cases} 0 & \|q\|_2 \leq \gamma \\ (1-\gamma/\|q\|_2) q & \|q\|_2 > \gamma \end{cases}$$ Therefore, we have $$\mathop{\textrm{arg min}} g(w) = \mathop{\textrm{shrink}}(\mathop{\textrm{soft}}(u;\lambda); \gamma).$$ That's right: to compute the prox for $\ell_1$ plus $\ell_2$, we simply apply the $\ell_1$ prox first, then the $\ell_2$!
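The composed prox can be sketched as follows; this is a minimal self-contained illustration of the shrink-after-soft recipe, with helper names (`soft`, `shrink`, `prox_l1_l2`) chosen here for illustration:

```python
import numpy as np

def soft(x, lam):
    """Elementwise soft-thresholding: prox of lam * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def shrink(q, gamma):
    """Block soft-thresholding: prox of gamma * ||.||_2 on a vector."""
    nq = np.linalg.norm(q)
    if nq <= gamma:
        return np.zeros_like(q)
    return (1.0 - gamma / nq) * q

def prox_l1_l2(u, lam, gamma):
    """Prox of lam*||.||_1 + gamma*||.||_2: apply the l1 prox, then the l2 prox."""
    return shrink(soft(u, lam), gamma)
```

For instance, with $u=(3,-0.5,0)$, $\lambda=\gamma=1$: soft-thresholding gives $q=(2,0,0)$ with $\|q\|_2=2>\gamma$, so the shrink step scales by $1-1/2$ and returns $(1,0,0)$.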
So for the original problem, $$\mathop{\textrm{arg min}} F(W) = \bar{W} = \begin{bmatrix} \bar{w}_1 & \dots & \bar{w}_T \end{bmatrix}, \quad \bar{w}_j = \mathop{\textrm{shrink}}(\mathop{\textrm{soft}}(u_j;\lambda); \gamma).$$
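Since the matrix problem separates across columns, the full prox just applies the vector formula column by column. A minimal sketch under the same assumptions (the name `prox_matrix` is illustrative):

```python
import numpy as np

def prox_matrix(U, lam, gamma):
    """Column-separable prox: apply shrink(soft(. ; lam); gamma) to each column of U."""
    def soft(x, l):
        return np.sign(x) * np.maximum(np.abs(x) - l, 0.0)
    def shrink(q, g):
        n = np.linalg.norm(q)
        return np.zeros_like(q) if n <= g else (1.0 - g / n) * q
    return np.column_stack([shrink(soft(U[:, j], lam), gamma)
                            for j in range(U.shape[1])])
```

Each column $\bar{w}_j$ depends only on the corresponding $u_j$, which is exactly why the matrix prox reduces to $T$ independent vector problems.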