I realize SGD is used for large data sets where the iterative solution could be approximated by a random sample's gradient instead of the sum over all samples. My question is suppose I have a function of mutiple variables say 'd', i,e just one sample, could one use stochastic gradient descent for just the function? I am asking this because in one of the homework problems in Gilbert Strang's Data science course asks you to compute a single step of gradient descent for a function of two variables, it is explicitly mentioned, full gradient descent not stochastic? I wonder why?
2026-04-06 18:58:03.1775501883
Stochastic gradient descent for a function of multiple variables?
293 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
1
There are 1 best solutions below
Related Questions in LINEAR-ALGEBRA
- An underdetermined system derived for rotated coordinate system
- How to prove the following equality with matrix norm?
- Alternate basis for a subspace of $\mathcal P_3(\mathbb R)$?
- Why the derivative of $T(\gamma(s))$ is $T$ if this composition is not a linear transformation?
- Why is necessary ask $F$ to be infinite in order to obtain: $ f(v)=0$ for all $ f\in V^* \implies v=0 $
- I don't understand this $\left(\left[T\right]^B_C\right)^{-1}=\left[T^{-1}\right]^C_B$
- Summation in subsets
- $C=AB-BA$. If $CA=AC$, then $C$ is not invertible.
- Basis of span in $R^4$
- Prove if A is regular skew symmetric, I+A is regular (with obstacles)
Related Questions in CONVEX-OPTIMIZATION
- Optimization - If the sum of objective functions are similar, will sum of argmax's be similar
- Least Absolute Deviation (LAD) Line Fitting / Regression
- Check if $\phi$ is convex
- Transform LMI problem into different SDP form
- Can a linear matrix inequality constraint transform to second-order cone constraint(s)?
- Optimality conditions - necessary vs sufficient
- Minimization of a convex quadratic form
- Prove that the objective function of K-means is non convex
- How to solve a linear program without any given data?
- Distance between a point $x \in \mathbb R^2$ and $x_1^2+x_2^2 \le 4$
Trending Questions
- Induction on the number of equations
- How to convince a math teacher of this simple and obvious fact?
- Find $E[XY|Y+Z=1 ]$
- Refuting the Anti-Cantor Cranks
- What are imaginary numbers?
- Determine the adjoint of $\tilde Q(x)$ for $\tilde Q(x)u:=(Qu)(x)$ where $Q:U→L^2(Ω,ℝ^d$ is a Hilbert-Schmidt operator and $U$ is a Hilbert space
- Why does this innovative method of subtraction from a third grader always work?
- How do we know that the number $1$ is not equal to the number $-1$?
- What are the Implications of having VΩ as a model for a theory?
- Defining a Galois Field based on primitive element versus polynomial?
- Can't find the relationship between two columns of numbers. Please Help
- Is computer science a branch of mathematics?
- Is there a bijection of $\mathbb{R}^n$ with itself such that the forward map is connected but the inverse is not?
- Identification of a quadrilateral as a trapezoid, rectangle, or square
- Generator of inertia group in function field extension
Popular # Hahtags
second-order-logic
numerical-methods
puzzle
logic
probability
number-theory
winding-number
real-analysis
integration
calculus
complex-analysis
sequences-and-series
proof-writing
set-theory
functions
homotopy-theory
elementary-number-theory
ordinary-differential-equations
circles
derivatives
game-theory
definite-integrals
elementary-set-theory
limits
multivariable-calculus
geometry
algebraic-number-theory
proof-verification
partial-derivative
algebra-precalculus
Popular Questions
- What is the integral of 1/x?
- How many squares actually ARE in this picture? Is this a trick question with no right answer?
- Is a matrix multiplied with its transpose something special?
- What is the difference between independent and mutually exclusive events?
- Visually stunning math concepts which are easy to explain
- taylor series of $\ln(1+x)$?
- How to tell if a set of vectors spans a space?
- Calculus question taking derivative to find horizontal tangent line
- How to determine if a function is one-to-one?
- Determine if vectors are linearly independent
- What does it mean to have a determinant equal to zero?
- Is this Batman equation for real?
- How to find perpendicular vector to another vector?
- How to find mean and median from histogram
- How many sides does a circle have?
I think you're mixing two things here. In Stochastic Gradient Descent (SGD) you utilize all the variables, but not all the data. In Coordinate Descent (CD), you utilize some of the variables, but do use all the data.
To give an example - consider you have a problem with two variables, $x,y$ and 10 data points. Then when using SGD you calculate the gradient with respect to $x,y$, but at each sub-iteration use only part of the data points to calculate the gradient - let's say 5 data points, then in the next sub-iteration you use the other 5 data points. In a "meta-iteration" you have used all the data. With CD, you will use all 10 data points, but at the first sub-iteration take a step only in the (negative) direction of the gradient with respect to $x$, and in the second sub-iteration use the gradient with respect to $y$. A "meta-iteration" have used all the variables.
Notice that this is a substantial difference! It might sound similar but in fact these algorithms are completely different and lead to different results.