SVM derivative with respect to X


I am currently working my way through the cs231n course and got stuck on the partial derivative of the multiclass SVM loss with respect to $x$. The loss function is defined as:

$$L_i = \sum_{j \neq y_i} \max\left(0,\; w_j^T x_i - w_{y_i}^T x_i + \Delta\right)$$

The partial derivatives with respect to $w_{y_i}$ and $w_j$ make sense to me, and they also allow a fairly straightforward vectorized implementation:

$$\nabla_{w_{y_i}} L_i = -\left(\sum_{j \neq y_i} \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right)\right) x_i$$

and

$$\nabla_{w_j} L_i = \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right) x_i$$
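For reference, my vectorized implementation of these two formulas looks roughly like this (a sketch with my own variable names, assuming `X` of shape (N, D), `W` of shape (D, C), and margin $\Delta = 1$):

```python
import numpy as np

def svm_grad_W(X, y, W):
    """Vectorized gradient of the multiclass SVM loss with respect to W."""
    N = X.shape[0]
    scores = X.dot(W)                                   # (N, C)
    correct = scores[np.arange(N), y]                   # (N,)
    margins = np.maximum(0, scores - correct[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0                        # correct class contributes no loss
    # Indicator matrix: 1 wherever a margin is positive
    ind = (margins > 0).astype(X.dtype)
    # For the correct class, subtract the number of positive margins
    ind[np.arange(N), y] = -ind.sum(axis=1)
    # dW[:, j] = sum_i ind[i, j] * x_i, i.e. X^T @ ind
    return X.T.dot(ind) / N
```

The key observation is that both formulas are sums of $\pm x_i$ weighted by indicators, so the whole gradient collapses into a single matrix product $X^T \cdot \text{ind}$.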

Now to my problem: for the gradient calculation, I need to differentiate the loss function with respect to $x$, and I cannot figure out how this is supposed to work.

Unfortunately, I was not able to find an explanation or walkthrough, only a (code) solution online. I would like to go beyond that and understand why it is actually correct and how to get there. Note that `x` in the code refers to the computed scores, so it already contains the values $w_j^T x_i$.

I appreciate any help with the derivation with respect to $x$.
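Here is what I could piece together from the code, so that the target is at least stated precisely. Writing $s_{ij} = w_j^T x_i$ for the scores (the `x` in the code), the per-example loss is

$$L_i = \sum_{j \neq y_i} \max\left(0,\; s_{ij} - s_{iy_i} + 1\right)$$

and the partials with respect to the scores appear to be

$$\frac{\partial L_i}{\partial s_{ij}} = \mathbb{1}\left(s_{ij} - s_{iy_i} + 1 > 0\right) \quad (j \neq y_i), \qquad \frac{\partial L_i}{\partial s_{iy_i}} = -\sum_{j \neq y_i} \mathbb{1}\left(s_{ij} - s_{iy_i} + 1 > 0\right),$$

which would match the `dx[margins > 0] = 1` and `dx[np.arange(N), y] -= num_pos` lines below, but I don't see how to arrive at this.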

import numpy as np

def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    # Score of the correct class for each sample, shape (N,)
    correct_class_scores = x[np.arange(N), y]
    # Hinge margins max(0, s_j - s_{y_i} + 1) for every class j
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    # The correct class does not contribute to the loss
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    # Number of classes with a positive margin per sample
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    # Each positive margin contributes +1 to that class's score...
    dx[margins > 0] = 1
    # ...and -1 to the correct class's score
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
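One thing I did to convince myself the returned `dx` is at least numerically right is a finite-difference check (my own test code; the `svm_loss` body is copied from above so the snippet is self-contained):

```python
import numpy as np

def svm_loss(x, y):
    # Copy of svm_loss from above (loss and analytic gradient wrt scores)
    N = x.shape[0]
    correct = x[np.arange(N), y]
    margins = np.maximum(0, x - correct[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx

np.random.seed(0)
x = np.random.randn(10, 5)
y = np.random.randint(5, size=10)
loss, dx = svm_loss(x, y)

# Numerical gradient via central differences
h = 1e-5
num = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        xp = x.copy(); xp[i, j] += h
        xm = x.copy(); xm[i, j] -= h
        num[i, j] = (svm_loss(xp, y)[0] - svm_loss(xm, y)[0]) / (2 * h)

# The max difference should be tiny unless a margin sits exactly at the hinge kink
print(np.max(np.abs(dx - num)))
```

So the code does compute the correct gradient; what I am missing is the derivation that gets there.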