Suppose I have the function $y=\mu^Tx$ where $\mu$ is constant. I understand that $\frac{\partial y}{\partial x} = \mu$. However, if I were to restrict $x$ to a hypersphere, how would this change?
Thinking about it from first principles, it doesn't seem right to simply look at an arbitrary $\delta x$, since $x$ is constrained in which directions it can move. Any thoughts?
To give some context, I'm trying to do stochastic gradient descent, as used in neural nets. What I would ideally like is a rotation matrix $\Omega$ that rotates $x$ such that it maximises the objective function $y$. But since it is SGD, what I am doing right now is $x_{new} = x_{old} - \gamma\frac{\partial y}{\partial x_{old}}$ and then normalising $x_{new}$ to unit length.
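To make the procedure concrete, here is a minimal numpy sketch of the step-then-normalise update described above (the variable names and step count are mine; I use a plus sign because the goal is to *maximise* $y$, whereas the minus sign in the update above performs descent):

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])  # the constant vector mu

def sgd_step_on_sphere(x_old, gamma=0.1):
    # Unconstrained gradient of y = mu^T x with respect to x is just mu.
    grad = mu
    # Gradient ascent step (plus sign to maximise y).
    x_new = x_old + gamma * grad
    # Project back onto the unit hypersphere by normalising.
    return x_new / np.linalg.norm(x_new)

x = np.array([1.0, 0.0, 0.0])  # start somewhere on the unit sphere
for _ in range(100):
    x = sgd_step_on_sphere(x)
# x should approach mu / ||mu||, the maximiser of mu^T x on the unit sphere.
```

Empirically this iterate converges to $\mu/\|\mu\|$, which is indeed the point on the unit sphere maximising $\mu^Tx$, so the normalise-after-step heuristic does the right thing for this particular objective.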