Recently I read a paper and was confused by its optimization method. Below I try to abstract the problem. Suppose we have six variables $\boldsymbol{\theta}_{1},\boldsymbol{\theta}_{2},\boldsymbol{\theta}_{3}$ and $w_1,w_2,w_3$ (each $\boldsymbol{\theta}$ is a column vector and each $w$ is a scalar), and a continuous, differentiable function $f(x)$. The variables satisfy the relationship

$$ \theta_{i}^{t+1} = \sum_{n=1}^{3}w_n\theta_{n}^{t}, \quad i=1,2,3 $$

Here $t$ denotes a time step, with $t = 0,1,2,\dots,N$. For simplicity, treat $i$ as some fixed value. The goal is to find, at each time step $t$, the $w_1, w_2, w_3$ that minimize the following:

$$ \min_{w_1,w_2,w_3}f(\theta_i^{t+1})=\min_{w_1,w_2,w_3}f\left(\sum_{n=1}^{3}w_n\theta_{n}^{t}\right) $$

In my opinion, at time step $t+1$, because $w_1,w_2,w_3$ are the variables and $\theta_1^t,\theta_2^t,\theta_3^t$ are constant vectors, $\theta_i^{t+1}$ is a linear function of $w_1,w_2,w_3$. To minimize $f(\theta_i^{t+1})$ with gradient descent, we can find an approximate solution $\tilde{\theta}_i^{t+1}$ after $T$ iterations of

$$ (\theta_i^{t+1})^{T+1} = (\theta_i^{t+1})^{T} - \eta\nabla_{(\theta_{i}^{t+1})^T} f((\theta_i^{t+1})^{T}) $$

Here $\eta$ is the step size, and the outer superscript $T$ is the iteration counter, not a transpose.

Using the chain rule, we have:

$$ \nabla_{(\theta_{i}^{t+1})^T} f((\theta_i^{t+1})^{T})=\frac{\partial f((\theta_i^{t+1})^{T}) }{\partial w_1}\cdot \frac{\partial w_1 }{\partial \theta_{i}^{t+1}} + \frac{\partial f((\theta_i^{t+1})^{T}) }{\partial w_2}\cdot \frac{\partial w_2 }{\partial \theta_{i}^{t+1}} + \frac{\partial f((\theta_i^{t+1})^{T}) }{\partial w_3}\cdot \frac{\partial w_3 }{\partial \theta_{i}^{t+1}} $$

Let $\mathbf{w}=[w_1,w_2,w_3]^{H}$, where $H$ denotes transposition. Then we have:

$$ \nabla_{(\theta_{i}^{t+1})^T} f((\theta_i^{t+1})^{T})=\left[\frac{\partial w_1 }{\partial \theta_{i}^{t+1}},\frac{\partial w_2 }{\partial \theta_{i}^{t+1}},\frac{\partial w_3 }{\partial \theta_{i}^{t+1}}\right]^{H} \nabla_{\mathbf{w}} f((\theta_i^{t+1})^{T}) $$

So the final iteration formula should be

$$ (\theta_i^{t+1})^{T+1} = (\theta_i^{t+1})^{T} - \eta\left[\frac{\partial w_1 }{\partial \theta_{i}^{t+1}},\frac{\partial w_2 }{\partial \theta_{i}^{t+1}},\frac{\partial w_3 }{\partial \theta_{i}^{t+1}}\right]^{H} \nabla_{\mathbf{w}} f((\theta_i^{t+1})^{T}) $$

The above is my own derivation. In the paper, however, the final result seems to be

$$ (\theta_i^{t+1})^{T+1} = (\theta_i^{t+1})^{T} - \eta \mathbf{1}^{H} \nabla_{\mathbf{w}} f((\theta_i^{t+1})^{T}) $$

Here $\mathbf{1}$ is a size-3 vector of ones. Note that the paper seems to mix the time step $t$ and the iteration $T$, so the above is purely my personal understanding. Can someone tell me what went wrong in my derivation? If you are interested in the paper, you can search for "Personalized Federated Learning with First Order Model Optimization", though it may be hard to follow without some background in federated learning. Thank you.
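To make the setup concrete, here is a minimal NumPy sketch of the problem as I understand it. It is not the paper's algorithm. Everything in it is my own assumption for illustration: the quadratic choice of $f$, the names `Theta`, `target`, `grad_w` and `descend_on_w`, and the step size. The columns of `Theta` are $\theta_1^t,\theta_2^t,\theta_3^t$, so $\theta_i^{t+1}$ is `Theta @ w`. The sketch runs gradient descent directly on $\mathbf{w}$, using the standard chain rule for a linear map, $\nabla_{\mathbf{w}} f(\Theta\mathbf{w}) = \Theta^{H}\nabla_{\theta} f(\Theta\mathbf{w})$, and checks that gradient against finite differences.

```python
import numpy as np

def f(theta, target):
    # Assumed objective for illustration: 0.5 * ||theta - target||^2
    return 0.5 * np.sum((theta - target) ** 2)

def grad_f(theta, target):
    # Gradient of the assumed objective with respect to theta
    return theta - target

def grad_w(w, Theta, target):
    # Chain rule for theta = Theta @ w: grad_w f = Theta^T grad_theta f
    return Theta.T @ grad_f(Theta @ w, target)

def numeric_grad_w(w, Theta, target, eps=1e-6):
    # Central finite differences, used only to check grad_w
    g = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        g[k] = (f(Theta @ (w + step), target)
                - f(Theta @ (w - step), target)) / (2 * eps)
    return g

def descend_on_w(Theta, target, steps=500):
    # Step size 1/L, where L = ||Theta||_2^2 is the Lipschitz constant
    # of grad_w for this quadratic objective
    eta = 1.0 / np.linalg.norm(Theta, 2) ** 2
    w = np.full(Theta.shape[1], 1.0 / Theta.shape[1])  # start from equal weights
    for _ in range(steps):
        w = w - eta * grad_w(w, Theta, target)
    return w

rng = np.random.default_rng(0)
Theta = rng.normal(size=(5, 3))   # columns play the role of theta_1^t, theta_2^t, theta_3^t
target = rng.normal(size=5)
w0 = np.full(3, 1.0 / 3.0)
w_star = descend_on_w(Theta, target)

print("gradient check:", np.allclose(grad_w(w0, Theta, target),
                                     numeric_grad_w(w0, Theta, target), atol=1e-5))
print("f before:", f(Theta @ w0, target), "f after:", f(Theta @ w_star, target))
```

In this sketch the update acts on $\mathbf{w}$ and the new $\theta_i^{t+1}$ is then recomputed as `Theta @ w`, which is one way to keep the two sets of variables apart.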
2026-04-02 23:36:58
Question about a convex optimization using gradient descent
47 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail