I am trying to solve the optimization problem below using proximal gradient descent on a dataset in Python:
$\hat{\theta} = \arg\min_{\theta \in \mathbb{R}^d}\frac{1}{m}\sum_{i=1}^m\Big[\log(1+\exp(x_i\theta))-y_ix_i\theta\Big]+\frac{\lambda_2}{2}\|\theta\|^2_2 + \lambda_1\|\theta\|_1$
where $x_i \in \mathbb{R}^{1 \times d}$ is sample $i$, $\theta \in \mathbb{R}^d$, $y_i \in \{0,1\}$ is the label for sample $i$, and $\lambda_1, \lambda_2 > 0$.
The gradient of the smooth part $g(\theta)$ is
$\nabla g(\theta) = \frac{1}{m}\sum_{i=1}^m\Big[\frac{x_ie^{x_i\theta}}{1+e^{x_i\theta}} - x_iy_i\Big]+\lambda_2\theta$
The dataset contains 784 feature columns and 2000 data points; I use half of them for learning $\theta$ and the remaining half for evaluating the accuracy of the classifier. The learnt $\theta$ is used to predict labels via $\frac{1}{1+\exp(-x\theta)}$. I need to build the classifier using proximal operators and plot the accuracy per iteration.
This is the code I have come up with, but my accuracy is way off; it is expected to converge to >98% accuracy in one to five iterations at most.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from sklearn.metrics import accuracy_score

np.random.seed(100)
data = pd.read_csv("data.csv", header=None).values
A = data[:, 0:784]
y = data[:, -1]
A_train = A[:1000, ]
y_train = y[:1000, ]
A_test = A[1000:, ]
y_test = y[1000:, ]
lam_1 = 10
lam_2 = 5
m = len(A_train)

def prox_op(theta, t):
    a = np.maximum(theta - t, 0)
    b = np.minimum(theta + t, 0)
    return a + b

def func(theta):
    func_sum = 0
    for i in range(m):
        func_sum += np.log(1 + np.exp(A_train[i] * theta)) - y_train[i] * A_train[i] * theta
    term2 = lam_2 / 2 * np.linalg.norm(theta)
    term3 = lam_1 * np.linalg.norm([theta], ord=1)
    return func_sum / m + term2 + term3

def grad(theta):
    for i in range(m):
        grad_sum = (A_train[i] * np.exp(A_train[i] * theta) / (1 + np.exp(A_train[i] * theta))) - (A_train[i] * y_train[i])
    return grad_sum + (theta * lam_2)

t = 0.01
iterations = 100
theta_new = np.random.uniform(-1, 1, 784)
theta_list = [theta_new]
func_list = [func(theta_new)]
acc_sc = []
for i in range(iterations - 1):
    theta_current = theta_list[-1]
    grad_g = grad(theta_current)
    theta_new_gd = theta_current - t * grad_g
    theta_new = prox_op(theta_new_gd, lam_1 * t)
    theta_list.append(theta_new)
    func_list.append(func(theta_new))
    yy_tr = 1 / (1 + np.exp(np.dot(-A_train, theta_new)))
    acc_sc.append(accuracy_score(yy_tr > t * lam_1, y_train))

plt.plot(func_list)
plt.title('Proximal GD')
plt.xlabel('Iteration')
plt.ylabel('Function Value')
plt.show()
Could someone point out where I'm going wrong, please?
This is the plot of function value vs. number of iterations: [plot]
Before doing the proximal GD, you can (should?) check that your analytical gradient is correct. I suspect it should be *** grad_sum += *** instead of *** grad_sum = *** since you are summing over the examples. Also, the normalisation term (the $1/m$ factor) has disappeared...
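To make the check concrete, here is a minimal sketch of a finite-difference gradient check on a small random problem. The variable names (X, y, g, grad_g) and the corrected gradient are my own; the key points are the missing $1/N$ factor and the accumulation over examples mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 20, 5
X = rng.normal(size=(N, d))
y = rng.integers(0, 2, size=N).astype(float)
lam_2 = 0.1

def g(w):
    # smooth part of the objective; x_i theta computed as a dot product
    z = X @ w
    return np.mean(np.log1p(np.exp(z)) - y * z) + 0.5 * lam_2 * (w @ w)

def grad_g(w):
    # sigmoid(z) = e^z / (1 + e^z); note the 1/N factor and the sum over examples
    s = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (s - y) / N + lam_2 * w

# central finite differences along each coordinate
w0 = rng.normal(size=d)
eps = 1e-6
num = np.array([(g(w0 + eps * e) - g(w0 - eps * e)) / (2 * eps)
                for e in np.eye(d)])
print(np.max(np.abs(num - grad_g(w0))))
```

If the analytical gradient is right, the printed discrepancy should be tiny (numerical noise); a large value points to a bug like the ones above.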
Also, you give the gradient but not the proximal update, so it is not easy to pinpoint where your error is located.
Using slightly different notations, denote $$ \phi(\mathbf{w}) = g(\mathbf{w}) + \lambda_1 \|\mathbf{w}\|_1 $$ where $g(\mathbf{w}) =\frac{1}{N} \sum_{n=1}^N \left[ \log \left(1+e_n\right) - y_n \mathbf{x}_n^T \mathbf{w} \right] + \frac{\lambda_2}{2} \|\mathbf{w}\|^2_2 $ and the scalar $e_n = e^{\mathbf{x}_n^T \mathbf{w}}$.
The update requires the soft-thresholding operator $$ \mathbf{w} \leftarrow S_{\lambda_1 t} \left[ \mathbf{w} -t \nabla g(\mathbf{w}) \right] $$ where $\nabla g(\mathbf{w}) =\frac{1}{N} \sum_{n=1}^N \left[ \left( \frac{e_n}{1+e_n}-y_n \right) \mathbf{x}_n \right] + \lambda_2 \mathbf{w} $.
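Putting the pieces together, here is a minimal sketch of that proximal gradient loop on synthetic data. The names (S for the soft-thresholding operator, step size t) and the $\lambda_1$, $\lambda_2$, and step-size values are my own assumptions for illustration, not tuned for the asker's dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 10
X = rng.normal(size=(N, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
y = (1.0 / (1.0 + np.exp(-X @ w_true)) > 0.5).astype(float)

lam_1, lam_2, t = 0.01, 0.01, 0.1

def S(v, tau):
    # soft-thresholding: prox of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def grad_g(w):
    # sigmoid(x_n^T w) = e_n / (1 + e_n), with the 1/N normalisation
    s = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (s - y) / N + lam_2 * w

w = np.zeros(d)
for _ in range(100):
    # w <- S_{lam_1 t}[ w - t * grad g(w) ]
    w = S(w - t * grad_g(w), lam_1 * t)

acc = np.mean((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y)
print(acc)
```

A side remark: the predicted probability is compared against 0.5 here, rather than against $t\lambda_1$ as in the posted code.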