$$\frac{d^{2} J(\boldsymbol{\alpha})}{d \alpha_{i}^{2}}=\lambda^{-1} \mathbf{x}_{i}^{\mathrm{T}} \mathbf{x}_{i}+\frac{1}{\alpha_{i}\left(1-\alpha_{i}\right)}$$
I cannot understand why there is no $y_i$ term in $\frac{d^{2} J(\boldsymbol{\alpha})}{d \alpha_{i}^{2}}$.
$$\mathbf{w}(\boldsymbol{\alpha})=\lambda^{-1} \sum_{i} \alpha_{i} y_{i} \mathbf{x}_{i}$$
$$J(\boldsymbol{\alpha})=\frac{1}{2 \lambda} \sum_{i j} \alpha_{i} \alpha_{j} y_{i} y_{j} \mathbf{x}_{j}^{\mathrm{T}} \mathbf{x}_{i}-\sum_{i} H\left(\alpha_{i}\right)$$
\begin{aligned} \frac{d J(\boldsymbol{\alpha})}{d \alpha_{i}} &=\lambda^{-1} y_{i} \sum_{j} \alpha_{j} y_{j} \mathbf{x}_{j}^{\mathrm{T}} \mathbf{x}_{i}+\log \frac{\alpha_{i}}{1-\alpha_{i}} \\ &=y_{i} \mathbf{w}(\boldsymbol{\alpha})^{\mathrm{T}} \mathbf{x}_{i}+\log \frac{\alpha_{i}}{1-\alpha_{i}} \end{aligned}
$ \def\J{{\cal J}} \def\a{\alpha}\def\b{\beta}\def\e{\varepsilon}\def\l{\lambda} \def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3}} \def\c#1{\color{red}{#1}} $The Frobenius product is a concise notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ This is also called the double-dot or double contraction product.
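As a quick sanity check, here is a minimal sketch (plain Python, with made-up matrices) verifying the two identities above numerically, computing $\operatorname{Tr}(A^TB)$ via an explicit matrix product:

```python
def frob(A, B):
    """Frobenius product A:B = sum of elementwise products."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def trace_AtB(A, B):
    """Tr(A^T B): form C = A^T B explicitly, then sum its diagonal."""
    m, n = len(A), len(A[0])
    C = [[sum(A[i][j] * B[i][k] for i in range(m)) for k in range(n)]
         for j in range(n)]
    return sum(C[j][j] for j in range(n))

# arbitrary 3x2 matrices for the check
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[0.5, -1.0], [2.0, 0.0], [1.0, 3.0]]

# A:B = Tr(A^T B)
assert abs(frob(A, B) - trace_AtB(A, B)) < 1e-12
# A:A = ||A||_F^2
assert abs(frob(A, A) - sum(x * x for row in A for x in row)) < 1e-12
```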
When applied to vectors $(n=\o)$ it reduces to the standard dot product.
We are given vectors $(a,y)$ with components $(\a_i,y_i)$ and a matrix $(X)$ with columns $(x_i)$.
In addition, it will prove convenient to define two diagonal matrices and a vector $$\eqalign{ A = \Diag{a}, \quad Y = \Diag{y},\quad b=\LR{y\odot a}=Ya \\ }$$ where $(y\odot a)$ denotes elementwise multiplication.
Similarly, $\LR{\frac ba}$ will be used to denote elementwise division.
The entropy is defined as $$\eqalign{ -H(a) = a:\log(a) + (\o-a):\log(\o-a) \\ }$$ where the
$\log()$ function is applied elementwise. Now the objective function can be written in matrix notation without explicit summations $$\eqalign{ w &= \l^{-1}Xb \;=\; \l^{-1}XYa \\ \J &= \frac{\l}{2}\LR{w^Tw} - H(a) \;\;\doteq\;\; \frac{\l}{2}\LR{w:w} + a:\log(a) + (\o-a):\log(\o-a) \\ }$$ Calculate the differential and the gradient of the function. $$\eqalign{ d\J &= \l w:dw + \log\LR{\frac{a}{\o-a}}:da \\ &= \LR{XYa}:\LR{\l^{-1}XY\,da} + \log\LR{\frac{a}{\o-a}}:da \\ &= \l^{-1}{YX^TXYa}:da + \log(a):da - \log(\o-a):da \\ \grad{\J}{a} &= \l^{-1}{YX^TXYa} + \log(a) - \log(\o-a) \;\doteq\; g \qquad\big({\rm the\;gradient}\big) \\ }$$ Now calculate the differential and the gradient of the gradient. $$\eqalign{ dg &= \l^{-1}{YX^TXY\,da} + \frac{da}{a} - \frac{d(\o-a)}{\o-a} \\ &= \l^{-1}\LR{YX^TXY}\,da + \LR{A-A^2}^{-1}da \\ \grad{g}{a} &= \l^{-1}\BR{\Diag{y}\;X^TX\;\Diag{y}} + \LR{A-A^2}^{-1} \;\doteq\; \hess{\J}{a}{a^T} \\ }$$ Therefore, $y$-terms are present in the Hessian.
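The gradient and the diagonal of the Hessian can be checked by finite differences. The following is a sketch in plain Python with small made-up data ($\lambda$, the labels, and the columns $x_i$, stored here as rows of `X`, are all arbitrary):

```python
import math

lam = 2.0
X = [[1.0, 0.5], [-0.3, 0.8], [0.2, -1.1]]   # x_i = X[i]
y = [1.0, -1.0, 1.0]                          # labels in {-1, +1}
a = [0.3, 0.6, 0.25]                          # interior point, 0 < a_i < 1
n = len(a)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def J(a):
    """Objective: quadratic term plus (negative) entropy."""
    quad = sum(a[i] * a[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n)) / (2.0 * lam)
    ent = sum(ai * math.log(ai) + (1 - ai) * math.log(1 - ai) for ai in a)
    return quad + ent

def grad_i(a, i):
    """Derived gradient: y_i w^T x_i + log(a_i / (1 - a_i))."""
    return (y[i] / lam) * sum(a[j] * y[j] * dot(X[j], X[i]) for j in range(n)) \
           + math.log(a[i] / (1 - a[i]))

def hess_ii(a, i):
    """Hessian diagonal: x_i^T x_i / lam + 1 / (a_i (1 - a_i))."""
    return dot(X[i], X[i]) / lam + 1.0 / (a[i] * (1 - a[i]))

h = 1e-4
for i in range(n):
    ap, am = a[:], a[:]
    ap[i] += h
    am[i] -= h
    # central differences for the first and second derivatives in a_i
    fd_grad = (J(ap) - J(am)) / (2 * h)
    fd_hess = (J(ap) - 2 * J(a) + J(am)) / h**2
    assert abs(fd_grad - grad_i(a, i)) < 1e-6
    assert abs(fd_hess - hess_ii(a, i)) < 1e-4
```

Note that `hess_ii` contains no label at all, because the $y_i^2$ factor on the diagonal is identically one.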
On the diagonal, however, the $y$-terms appear as $y_i^2$, and since the labels satisfy $y_i\in\{-1,+1\}$ we have $y_i^2=1$; this is why no $y_i$ appears in the quoted expression for $\frac{d^{2} J(\boldsymbol{\alpha})}{d \alpha_{i}^{2}}$. Furthermore, if you take one more derivative (i.e., the gradient of the Hessian), the quadratic term is constant in $a$, so the $y$-terms are annihilated entirely.
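This is easy to confirm numerically: flipping one label changes the off-diagonal entries of $\lambda^{-1}YX^TXY + (A-A^2)^{-1}$ but leaves the diagonal, which is the quoted second derivative, untouched. A minimal sketch with made-up data:

```python
lam = 2.0
X = [[1.0, 0.5], [-0.3, 0.8]]          # x_i = X[i]
a = [0.3, 0.6]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def hessian(y):
    """H = y_i y_j x_i^T x_j / lam, plus the entropy term on the diagonal."""
    n = len(a)
    H = [[y[i] * y[j] * dot(X[i], X[j]) / lam for j in range(n)]
         for i in range(n)]
    for i in range(n):
        H[i][i] += 1.0 / (a[i] * (1 - a[i]))
    return H

H1 = hessian([1.0, 1.0])
H2 = hessian([1.0, -1.0])              # flip the second label

# diagonal entries are identical (y_i^2 = 1) ...
assert all(abs(H1[i][i] - H2[i][i]) < 1e-12 for i in range(2))
# ... while the off-diagonal entry flips sign with y
assert abs(H1[0][1] + H2[0][1]) < 1e-12
```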