I am currently working my way through the cs231n class and I got stuck on the partial derivative of the SVM loss with respect to $x$. The loss function is defined as:

$$L_i = \sum_{j \neq y_i} \max\left(0,\; w_j^T x_i - w_{y_i}^T x_i + \Delta\right)$$
The partial derivatives with respect to $w_{y_i}$ and $w_j$ make sense to me, and they also allow a quite straightforward vectorized implementation:

$$\nabla_{w_{y_i}} L_i = -\left(\sum_{j \neq y_i} \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right)\right) x_i$$

and

$$\nabla_{w_j} L_i = \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right) x_i$$
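For context, this is roughly how I vectorized those two gradients (a sketch under assumed shapes: `X` is `(N, D)`, `W` is `(D, C)`, `y` is `(N,)`; the toy data and variable names are my own, not from the assignment):

```python
import numpy as np

# Assumed toy shapes; X, W, y are placeholders standing in for real data.
N, D, C = 5, 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
W = rng.standard_normal((D, C))
y = rng.integers(0, C, size=N)

scores = X @ W                                   # (N, C); row i holds w_j^T x_i for all j
correct = scores[np.arange(N), y][:, None]       # (N, 1) correct-class scores
margins = np.maximum(0, scores - correct + 1.0)  # delta = 1
margins[np.arange(N), y] = 0                     # correct class contributes no margin

# Indicator coefficients: 1 where a margin is violated (j != y_i); the
# correct-class column collects minus the count of violated margins per row.
coeff = (margins > 0).astype(float)
coeff[np.arange(N), y] = -coeff.sum(axis=1)

dW = X.T @ coeff / N                             # averaged gradient, shape (D, C)
```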
Now to my problem: for the gradient calculation, I am required to differentiate the loss function with respect to $x$, and I cannot figure out how this is supposed to work.
Unfortunately, I was not able to find an explanation or walk-through, only a (code) solution online. I would like to go beyond that and understand why it is actually correct and how to get there. Note that `x` in the code refers to the computed scores, i.e. it already holds $w_j^T x_i$.
I would appreciate any help deriving the gradient with respect to $x$.
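For reference, here is my current guess at the per-score derivative (treating the scores $s_{ij} = w_j^T x_i$ as the variables directly); I am not sure this reasoning is sound, which is exactly what I would like confirmed:

$$\frac{\partial L_i}{\partial s_{ij}} = \mathbb{1}\left(s_{ij} - s_{iy_i} + \Delta > 0\right) \quad \text{for } j \neq y_i$$

$$\frac{\partial L_i}{\partial s_{iy_i}} = -\sum_{j \neq y_i} \mathbb{1}\left(s_{ij} - s_{iy_i} + \Delta > 0\right)$$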
import numpy as np

def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0  # correct class contributes no margin
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)  # violated margins per example
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
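To at least convince myself that the code is numerically consistent, I ran a central-difference check on random toy data (the loop and data below are my own, repeated with the function so it runs standalone):

```python
import numpy as np

def svm_loss(x, y):
    # Same function as above, repeated so this check is self-contained.
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 4))   # 6 examples, 4 classes
y = rng.integers(0, 4, size=6)

loss, dx = svm_loss(x, y)

# Central-difference estimate of the gradient, entry by entry.
h = 1e-6
dx_num = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        xp = x.copy(); xp[i, j] += h
        xm = x.copy(); xm[i, j] -= h
        dx_num[i, j] = (svm_loss(xp, y)[0] - svm_loss(xm, y)[0]) / (2 * h)

# Maximum absolute difference between analytic and numeric gradient;
# on continuous random data (no margin exactly at the hinge kink) it is tiny.
err = np.max(np.abs(dx - dx_num))
```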


