In the "Derivatives of vector element-wise binary operators" section of the Matrix Calculus for Deep Learning article, it says:
Any time the general function is a vector, we know that $f_i(w)$ reduces to $f_i(w_i) = w_i$.
Why does $f_i(w)$ reduce to $f_i(w_i) = w_i$ for $y = w + x$?
Can someone show an example of why the partial derivatives are zero when $i \ne j$?
Is this answer satisfactory? Something like this: $\frac{\partial}{\partial w_j} f_i(w) = \frac{\partial w_i}{\partial w_j} = 0$ when $i \ne j$, since it is just a scalar derivative of a term that does not involve $w_j$.

The reason the off-diagonal elements are 0 is that the derivative of a constant is 0. The key bit is that $w_i$ acts like a constant when we take the derivative with respect to $w_j$ for $j \ne i$: $f_i(w) = w_i + x_i$ mentions only $w_i$, so every other $w_j$ might as well be a fixed number.
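As a quick sanity check (not from the article, just an illustrative sketch), we can approximate the Jacobian of $f(w) = w + x$ with finite differences and confirm it comes out as the identity matrix: ones on the diagonal ($i = j$) and zeros off the diagonal ($i \ne j$):

```python
import numpy as np

# f(w) = w + x, element-wise: f_i depends only on w_i.
rng = np.random.default_rng(0)
n = 4
w = rng.normal(size=n)
x = rng.normal(size=n)
h = 1e-6  # step size for the finite-difference approximation

def f(w):
    return w + x

# Build the Jacobian column by column: column j holds df/dw_j.
J = np.zeros((n, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = h
    J[:, j] = (f(w + e) - f(w)) / h

print(np.round(J, 3))
```

Each column $j$ perturbs only $w_j$, and only output $f_j$ moves, so the result is (numerically) the identity matrix, matching the claim that $\frac{\partial f_i}{\partial w_j} = 0$ for $i \ne j$.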
Here is more of the matrix calculus article I co-authored: