Gradient of a loss function

Consider the following loss function:
$$A{L_t}\left( {{\mathbf{w}_t}} \right) = \sum\nolimits_{j = 1}^k {{L_t}\left( {{b^j}} \right)w_t^j}$$ I want to calculate its gradient, $\nabla A{L_t}\left( {{\mathbf{w}_t}} \right)$. The definitions are:
${L_t}\left( {{b^j}} \right) = - \ln \left( {{b^j}\mathbf{x}_t^ \top } \right) + \gamma {\left\| {{b^j}} \right\|^2}$
$\begin{array}{l}\gamma \in \mathbb{R} \text{ (scalar)}\\{b^j} = \left( {b_j^1,b_j^2, \ldots ,b_j^m} \right) \in {\mathbb{R}^m}\\{\mathbf{x}_t} = \left( {x_t^1,x_t^2, \ldots ,x_t^m} \right) \in {\mathbb{R}^m}\\{\mathbf{w}_t} = \left( {w_t^1,w_t^2, \ldots ,w_t^k} \right) \in {\mathbb{R}^k}\\\text{Expert index } j \in \left\{ {1,2, \ldots ,k} \right\}\\\text{Stock index} \in \left\{ {1,2, \ldots ,m} \right\}\end{array}$
However, I don't know how to compute $\nabla A{L_t}\left( {{\mathbf{w}_t}} \right)$. I know that the gradient of $f(x)$ is:
$$\nabla f\left( x \right) = \frac{\partial f\left( x \right)}{\partial x}$$

Then we can say:
$$\nabla A{L_t}\left( \mathbf{w}_t \right) = \frac{\partial A{L_t}\left( \mathbf{w}_t \right)}{\partial \mathbf{w}_t} = \frac{\partial \sum\nolimits_{j = 1}^k L_t\left( b^j \right)w_t^j}{\partial \mathbf{w}_t}$$ Expanding the numerator:
$$\frac{\partial \left( L_t\left( b^1 \right)w_t^1 + L_t\left( b^2 \right)w_t^2 + \cdots + L_t\left( b^k \right)w_t^k \right)}{\partial \mathbf{w}_t} = \frac{\partial \left( L_t\left( b^1 \right)w_t^1 \right)}{\partial \mathbf{w}_t} + \frac{\partial \left( L_t\left( b^2 \right)w_t^2 \right)}{\partial \mathbf{w}_t} + \cdots + \frac{\partial \left( L_t\left( b^k \right)w_t^k \right)}{\partial \mathbf{w}_t}$$
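Each term on the right can be differentiated on its own. As a sketch, assuming $b^1, \ldots, b^k$ and $\mathbf{x}_t$ are held fixed (so each $L_t(b^j)$ is a constant with respect to $\mathbf{w}_t$), and writing $\mathbf{e}_j$ for the $j$-th standard basis vector of $\mathbb{R}^k$ (notation introduced here, not in the original):
$$\frac{\partial \left( L_t\left( b^j \right)w_t^j \right)}{\partial \mathbf{w}_t} = L_t\left( b^j \right)\frac{\partial w_t^j}{\partial \mathbf{w}_t} = L_t\left( b^j \right)\mathbf{e}_j, \qquad j = 1, \ldots, k.$$
Summing the $k$ terms therefore stacks the $L_t(b^j)$ into the entries of a vector instead of adding them into one number:
$$\sum\nolimits_{j = 1}^k L_t\left( b^j \right)\mathbf{e}_j = \left( L_t\left( b^1 \right), L_t\left( b^2 \right), \ldots, L_t\left( b^k \right) \right)^\top.$$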
I think the result should be:
$$\nabla A{L_t}\left( \mathbf{w}_t \right) = L_t\left( b^1 \right) + L_t\left( b^2 \right) + \cdots + L_t\left( b^k \right)$$
Am I right?

Best answer

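The gradient should be a vector, not a scalar. Since $L_t(b^j)$ does not depend on $\mathbf{w}_t$, and $\partial w_t^i / \partial w_t^j$ equals $1$ when $i = j$ and $0$ otherwise, a short derivation shows that the $j$-th component of the gradient is $L_t(b^j)$: $$\frac{\partial AL_t(\mathbf{w}_t)}{\partial w_t^j} = L_t(b^j), \qquad \nabla AL_t(\mathbf{w}_t) = \left( L_t(b^1), L_t(b^2), \ldots, L_t(b^k) \right)^\top.$$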
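As a numerical sanity check, here is a minimal NumPy sketch comparing the analytic gradient (the vector of expert losses) with a central finite-difference estimate. It assumes the $b^j$ are stacked as the rows of a hypothetical $k \times m$ array `B`, that every $b^j \mathbf{x}_t^\top > 0$ so the logarithm is defined, and it uses random test data; all function names are illustrative, not from the original post.

```python
import numpy as np

def expert_losses(B, x, gamma):
    """Vector (L_t(b^1), ..., L_t(b^k)); row j of B is b^j (assumed layout)."""
    inner = B @ x  # b^j x_t^T for every expert j
    if np.any(inner <= 0):
        raise ValueError("b^j x_t^T must be positive for the log to be defined")
    return -np.log(inner) + gamma * np.sum(B**2, axis=1)

def aggregated_loss(w, B, x, gamma):
    """AL_t(w_t) = sum_j L_t(b^j) w_t^j."""
    return float(expert_losses(B, x, gamma) @ w)

def aggregated_loss_grad(w, B, x, gamma):
    """Analytic gradient: entry j is L_t(b^j). AL_t is linear in w, so w is unused."""
    return expert_losses(B, x, gamma)

def finite_difference_grad(f, w, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at w."""
    g = np.zeros_like(w, dtype=float)
    for j in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        g[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
k, m, gamma = 3, 4, 0.1
B = rng.uniform(0.1, 1.0, size=(k, m))  # positive entries keep b^j x_t^T > 0
x = rng.uniform(0.5, 1.5, size=m)
w = rng.uniform(size=k)

analytic = aggregated_loss_grad(w, B, x, gamma)
numeric = finite_difference_grad(lambda v: aggregated_loss(v, B, x, gamma), w)
print(analytic)  # a length-k vector, not a scalar
print(np.allclose(analytic, numeric))  # expected to print True
```

The question's proposed scalar $L_t(b^1) + \cdots + L_t(b^k)$ corresponds to `analytic.sum()`, which is the derivative of $AL_t$ in the direction $(1, 1, \ldots, 1)$, not the gradient itself. Because $AL_t$ is linear in $\mathbf{w}_t$, its gradient does not depend on $\mathbf{w}_t$.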