Consider the following loss function:
$$A{L_t}\left( {{\mathbf{w}_t}} \right) = \sum\nolimits_{j = 1}^k {{L_t}\left( {{b^j}} \right)w_t^j}$$
I want to compute its gradient, $\nabla A{L_t}\left( {{\mathbf{w}_t}} \right)$. The quantities involved are:
${L_t}\left( {{b^j}} \right) = - \ln \left( {{b^j}\mathbf{x}_t^ \top } \right) + \gamma {\left\| {{b^j}} \right\|^2}$
$\begin{array}{l}\gamma \in \mathbb{R}\ \text{(scalar)}\\{b^j} = \left( {b_j^1,b_j^2, \ldots ,b_j^m} \right) \in {\mathbb{R}^m}\\{\mathbf{x}_t} = \left( {x_t^1,x_t^2, \ldots ,x_t^m} \right) \in {\mathbb{R}^m}\\{\mathbf{w}_t} = \left( {w_t^1,w_t^2, \ldots ,w_t^k} \right) \in {\mathbb{R}^k}\\\text{Expert index } j \in \left\{ {1,2, \ldots ,k} \right\}\ \text{(}k\text{ experts)}\\\text{Stock index} \in \left\{ {1,2, \ldots ,m} \right\}\ \text{(}m\text{ stocks)}\end{array}$
But I don't know how to calculate $\nabla A{L_t}\left( {{\mathbf{w}_t}} \right)$. I know that the gradient of $f(x)$ would be:
$$\nabla f\left( x \right) = \frac{{\partial f\left( x \right)}}{{\partial x}}
$$
Then we can say:
$$\nabla A{L_t}\left( {{{\bf{w}}_t}} \right) = \frac{{\partial A{L_t}\left( {{{\bf{w}}_t}} \right)}}{{\partial {{\bf{w}}_t}}} = \frac{{\partial \sum\nolimits_{j = 1}^k {{L_t}\left( {{b^j}} \right)w_t^j} }}{{\partial {{\bf{w}}_t}}}
$$
Expanding the sum in the numerator:
$$\frac{{\partial \left( {{L_t}\left( {{b^1}} \right)w_t^1 + {L_t}\left( {{b^2}} \right)w_t^2 + \cdots + {L_t}\left( {{b^k}} \right)w_t^k} \right)}}{{\partial {\mathbf{w}_t}}} = \frac{{\partial \left( {{L_t}\left( {{b^1}} \right)w_t^1} \right)}}{{\partial {\mathbf{w}_t}}} + \frac{{\partial \left( {{L_t}\left( {{b^2}} \right)w_t^2} \right)}}{{\partial {\mathbf{w}_t}}} + \cdots + \frac{{\partial \left( {{L_t}\left( {{b^k}} \right)w_t^k} \right)}}{{\partial {\mathbf{w}_t}}}
$$
I think the result should be:
$$\nabla A{L_t}\left( {{\mathbf{w}_t}} \right) = {L_t}\left( {{b^1}} \right) + {L_t}\left( {{b^2}} \right) + \cdots + {L_t}\left( {{b^k}} \right)
$$
Am I right?
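The gradient should be a vector, not a scalar. Each $L_t(b^j)$ is constant with respect to $\mathbf{w}_t$, so $AL_t$ is linear in $\mathbf{w}_t$ and the $j$-th component of the gradient is simply $L_t(b^j)$: $$\frac{\partial AL_t(\mathbf{w}_t)}{\partial w_t^j} = L_t(b^j), \qquad \nabla AL_t(\mathbf{w}_t) = \left( L_t(b^1), L_t(b^2), \ldots, L_t(b^k) \right)^\top$$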
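The component-wise result can be checked numerically. The following is a minimal sketch, not part of the original question: the function names (`expert_losses`, `aggregate_loss`, `numerical_gradient`), the sizes `k = 4` and `m = 3`, and the randomly drawn portfolios, price vector, and weights are all illustrative assumptions. It compares a central finite-difference gradient of $AL_t$ with the vector of per-expert losses.

```python
import numpy as np

def expert_losses(B, x, gamma):
    """L_t(b^j) = -ln(b^j . x_t) + gamma * ||b^j||^2 for every row b^j of B."""
    return -np.log(B @ x) + gamma * np.sum(B**2, axis=1)

def aggregate_loss(w, losses):
    """AL_t(w_t) = sum_j L_t(b^j) * w_t^j."""
    return float(losses @ w)

def numerical_gradient(f, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at w."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

# Illustrative data (assumed, not from the question).
rng = np.random.default_rng(0)
k, m = 4, 3                              # k experts, m stocks
B = rng.dirichlet(np.ones(m), size=k)    # row j is the portfolio b^j
x = rng.uniform(0.9, 1.1, size=m)        # positive vector x_t
w = rng.dirichlet(np.ones(k))            # expert weights w_t
gamma = 0.1

losses = expert_losses(B, x, gamma)
grad = numerical_gradient(lambda v: aggregate_loss(v, losses), w)

# The gradient is a length-k vector whose j-th entry is L_t(b^j).
print(np.allclose(grad, losses, atol=1e-6))
```

Because $AL_t$ is linear in $\mathbf{w}_t$, the finite-difference estimate should agree with the loss vector up to floating-point rounding, regardless of the particular weights chosen.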