Gradient of a Weighted Power Mean with Respect to the Parameterization of an ML Model


Let $H = \{h_{\theta}: \theta \in \Theta \subseteq \mathbb{R}^d\}$ be the hypothesis class that we will use to find the classifier for the classification task of interest, where each classifier $h \in H$ is parameterized by a vector of weights $\theta$.

We define a generic per-sample loss as \begin{equation*} L(x_i) = \ell(h_{\theta}(x_i), y_i) \end{equation*} where $\ell$ is the loss function used in training, $x_i$ is a sample image, and $y_i$ is its true label.

We define the empirical risk of a group/class $g$ as the sum of the losses over the samples in the group, divided by the group's cardinality $N_g$: \begin{equation*} \hat{R}_g = \frac{1}{N_g} \sum_{i \in g} L(x_i) \end{equation*}

This ultimately leads to a fair objective expressed as a weighted power mean of the group empirical risks: \begin{equation*} E = \left( \sum_{g=1}^{G} \omega_g \hat{R}_g^{\,p} \right)^{1/p} \end{equation*}
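As a small numerical sketch of this objective (the function and variable names here are my own, not from the question), the weighted power mean interpolates between the ordinary weighted average at $p = 1$ and the maximum group risk as $p \to \infty$:

```python
import numpy as np

def fair_objective(group_risks, weights, p):
    """Weighted power mean of per-group empirical risks:
    E = (sum_g w_g * R_g^p)^(1/p)."""
    group_risks = np.asarray(group_risks, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights @ group_risks**p) ** (1.0 / p)

# As p grows, E moves from the weighted average toward the worst group risk.
risks = [0.2, 0.8]
w = [0.5, 0.5]
print(fair_objective(risks, w, 1))   # 0.5 (plain weighted mean)
print(fair_objective(risks, w, 10))  # ≈ 0.746, pulled toward the worst group
```

This is why larger $p$ makes the objective "fairer": the worst-off group dominates the value being minimized.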

How would we define the gradient of this objective with respect to $\theta$? Applying the chain rule to the definitions above, it should be \begin{equation*} \frac{\partial E}{\partial \theta} = \left( \sum_{g=1}^{G} \omega_g \hat{R}_g^{\,p} \right)^{\frac{1}{p}-1} \sum_{g=1}^{G} \frac{\omega_g \hat{R}_g^{\,p-1}}{N_g} \sum_{i \in g} \frac{\partial \ell(h_{\theta}(x_i), y_i)}{\partial h_{\theta}(x_i)} \cdot \frac{\partial h_{\theta}(x_i)}{\partial \theta} \end{equation*}
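The chain rule above can be sanity-checked numerically. Below is a sketch that picks a concrete $\ell$ (squared error on a linear model, purely as a stand-in for the question's generic loss; all names are my own assumptions), computes the analytic gradient $E^{1-p} \sum_g \omega_g \hat{R}_g^{\,p-1} \, \partial \hat{R}_g / \partial \theta$, and compares it against central finite differences:

```python
import numpy as np

def group_risk_and_grad(theta, X, y):
    """Empirical risk R_g = mean squared error and its gradient in theta.
    (Squared loss on a linear model is a stand-in for the generic ell.)"""
    r = X @ theta - y
    R = np.mean(r**2)
    dR = 2.0 * X.T @ r / len(y)
    return R, dR

def fair_grad(theta, groups, weights, p):
    """E = (sum_g w_g R_g^p)^(1/p) and its chain-rule gradient."""
    Rs, dRs = zip(*(group_risk_and_grad(theta, X, y) for X, y in groups))
    Rs = np.array(Rs)
    S = weights @ Rs**p
    E = S ** (1.0 / p)
    # dE/dtheta = S^(1/p - 1) * sum_g w_g R_g^(p-1) dR_g/dtheta
    grad = S ** (1.0 / p - 1.0) * sum(
        w * R ** (p - 1) * dR for w, R, dR in zip(weights, Rs, dRs))
    return E, grad

# Finite-difference check of the analytic gradient on random data.
rng = np.random.default_rng(0)
groups = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(2)]
weights = np.array([0.3, 0.7])
theta = rng.normal(size=3)
E, g = fair_grad(theta, groups, weights, p=3)
eps = 1e-6
num = np.array([(fair_grad(theta + eps * e, groups, weights, 3)[0]
                 - fair_grad(theta - eps * e, groups, weights, 3)[0]) / (2 * eps)
                for e in np.eye(3)])
print(np.max(np.abs(g - num)))  # should be tiny: analytic and numeric agree
```

In practice an autodiff framework would compute this gradient for you, but the finite-difference agreement confirms the hand-derived expression is consistent with the objective.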