I'm trying to get the partial derivatives $\frac{\partial L}{\partial w}$ of a log-likelihood function
$$ L(w) = \sum_{n=1}^{N}\sum_{k=1}^{K}y_{nk}\cdot \log\left(\frac{e^{\sum_{i=1}^{D}w_{ki}x_{i}}}{\sum_{k'=1}^{K}e^{\sum_{i=1}^{D}w_{k'i}x_{i}}}\right) $$
with respect to $w$, where $y_{nk}$ is $1$ only if $n$ and $k$ are equal, and $0$ otherwise. So far I have managed to rewrite the function as
$$ L(w)=\sum_{n=1}^{N}\sum_{k=1}^{K}y_{nk}\cdot \log e^{\sum_{i=1}^{D}w_{ki}x_{i}} - \sum_{n=1}^{N}\sum_{k=1}^{K}y_{nk}\cdot \log\sum_{k'=1}^{K}e^{\sum_{i=1}^{D}w_{k'i}x_{i}} $$
using the rules of logarithms and splitting the sum in two. In the first term, the $\log$ and $e$ cancel out, leaving only
$$ L(w)=\sum_{n=1}^{N}\sum_{k=1}^{K}y_{nk}\cdot \sum_{i=1}^{D}w_{ki}x_{i} - \sum_{n=1}^{N}\sum_{k=1}^{K}y_{nk}\cdot \log\sum_{k'=1}^{K}e^{\sum_{i=1}^{D}w_{k'i}x_{i}} $$
but this is where I'm stuck. If I'm not mistaken, it is possible to differentiate each summand individually, but the double sums confuse me. How would I go about differentiating either of the two terms?
If $y_{nk}$ is $1$ when $n=k$ and $0$ otherwise, you can remove one of the summations and put $n$ in place of $k$:
$$ \sum_n \sum_k \delta_{n,k} F(n,k) = \sum_n F(n,n). $$
But even without this simplification: $\omega$ is a matrix here with dimension $K \times D$, so $L(\omega)$ is a multivariable function, and for its derivative you should calculate each $\frac{\partial L}{\partial \omega_{i,j}}$:
$$ DL_\omega = \Big[ \frac{\partial L}{\partial \omega_{i,j}}\Big]. $$
Differentiation is a linear operator, so it passes through the summations, no matter how many there are:
$$ \frac{\partial L}{\partial \omega_{i,j}} = \sum_n \sum_k y_{nk} \frac{\partial }{\partial \omega_{i,j}} \log\big(g(\omega)\big) = \sum_n \sum_k y_{nk} \frac{1}{g(\omega)} \frac{\partial }{\partial \omega_{i,j}} g(\omega), $$
where $g(\omega)$ is the function inside the argument of the $\log$; I used $d\log(u) = \frac{du}{u}$ for the second step. The remaining work is the differentiation of $g(\omega)$, which looks hard but is merely tedious and needs to be written out. With $A=\sum_{k'} \exp\big(\sum_l \omega_{k',l} x_l\big)$, the quotient rule gives
$$ \frac{\partial }{\partial \omega_{i,j}} g(\omega) = \frac{\partial }{\partial \omega_{i,j}} \frac{\exp \big(\sum_l \omega_{k,l} x_l\big)}{\sum_{k'} \exp \big(\sum_l \omega_{k',l} x_l\big)} = \frac{\Big(\frac{\partial }{\partial \omega_{i,j}}\exp \big(\sum_l \omega_{k,l} x_l\big)\Big)A - \exp \big(\sum_l \omega_{k,l} x_l\big) \Big(\frac{\partial }{\partial \omega_{i,j}}A\Big)}{A^2}. $$
The only relations you need are $\frac{d}{dx} \exp(f(x)) = \exp(f(x)) \frac{df}{dx}$ and $\frac{\partial }{\partial \omega_{i,j}} \omega_{l,m} = \delta_{i,l}\delta_{j,m}$, where $\delta_{i,j}$ is the Kronecker delta ($1$ if $i=j$, otherwise $0$).
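Applying these two rules to the numerator and to $A$ gives the piece derivatives, and the resulting quotient-rule expression can be sanity-checked numerically against finite differences. A minimal sketch with NumPy (sizes and all variable names are mine, purely illustrative):

```python
import numpy as np

# Check the quotient-rule derivative of g(omega) numerically.
# From the two stated rules: d(exp(w_k . x))/dw_{ij} = delta_{ik} x_j exp(w_k . x)
# and dA/dw_{ij} = x_j exp(w_i . x).
rng = np.random.default_rng(1)
K, D = 4, 3                      # illustrative sizes
w = rng.normal(size=(K, D))      # the weight matrix omega, K x D
x = rng.normal(size=D)

def g(w, k):
    """g(omega) for class k: exp(w_k . x) / A."""
    e = np.exp(w @ x)
    return e[k] / e.sum()

def dg(w, k, i, j):
    """Quotient rule: (d(numerator) * A - numerator * dA) / A^2."""
    e = np.exp(w @ x)
    A = e.sum()
    d_num = (i == k) * x[j] * e[k]   # delta_{ik} x_j exp(w_k . x)
    dA = x[j] * e[i]                 # x_j exp(w_i . x)
    return (d_num * A - e[k] * dA) / A**2

# Compare against central finite differences for every (k, i, j).
eps = 1e-6
for k in range(K):
    for i in range(K):
        for j in range(D):
            wp, wm = w.copy(), w.copy()
            wp[i, j] += eps
            wm[i, j] -= eps
            fd = (g(wp, k) - g(wm, k)) / (2 * eps)
            assert abs(fd - dg(w, k, i, j)) < 1e-8
```

If any sign or index in the derivation were off, one of the assertions would fail.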
$$ \frac{\partial }{\partial \omega_{i,j}}\exp \big(\sum_l \omega_{k,l} x_l\big) = \exp \big(\sum_l \omega_{k,l} x_l\big) \Big(\frac{\partial }{\partial \omega_{i,j}}\sum_l \omega_{k,l} x_l\Big) = \delta_{i,k}\, x_j \exp \big(\sum_l \omega_{k,l} x_l\big) $$
and
$$ \frac{\partial }{\partial \omega_{i,j}}A = \sum_{k'} \frac{\partial }{\partial \omega_{i,j}} \exp \big(\sum_l \omega_{k',l} x_l\big) = \sum_{k'} \delta_{i,k'}\, x_j \exp \big(\sum_l \omega_{k',l} x_l\big) = x_j \exp \big(\sum_l \omega_{i,l} x_l\big). $$
That is basically every calculation you need; you just have to put the pieces together.
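Putting the pieces together (this final step is my addition, not part of the answer above): substituting the two derivatives into the quotient rule gives
$$ \frac{\partial }{\partial \omega_{i,j}} g(\omega) = \frac{\delta_{i,k}\, x_j\, e^{\sum_l \omega_{k,l} x_l}\, A - e^{\sum_l \omega_{k,l} x_l}\, x_j\, e^{\sum_l \omega_{i,l} x_l}}{A^2} = g(\omega)\, x_j \big(\delta_{i,k} - g_i(\omega)\big), $$
where $g_i(\omega) = e^{\sum_l \omega_{i,l} x_l}/A$ (my notation for the softmax of class $i$). The $\frac{1}{g(\omega)}$ factor then cancels, and since $\sum_k y_{nk} = 1$,
$$ \frac{\partial L}{\partial \omega_{i,j}} = \sum_n \sum_k y_{nk}\, x_j \big(\delta_{i,k} - g_i(\omega)\big) = \sum_n \big(y_{ni} - g_i(\omega)\big)\, x_j, $$
the familiar softmax-regression gradient. A minimal numerical sanity check of this closed form, taking $y_{nk} = \delta_{nk}$ as in the question (all names in the code are mine):

```python
import numpy as np

# Check dL/dw_{ij} = sum_n (y_{ni} - g_i) x_j against finite differences.
# y is the identity (y_{nk} = delta_{nk}), so N = K, and a single x is
# shared across n, matching the question's setup.
rng = np.random.default_rng(0)
K, D = 4, 3
w = rng.normal(size=(K, D))      # the weight matrix omega, K x D
x = rng.normal(size=D)
y = np.eye(K)                    # y_{nk} = 1 iff n = k

def softmax(w):
    e = np.exp(w @ x)
    return e / e.sum()           # vector of g_k(omega)

def L(w):
    return float((y * np.log(softmax(w))).sum())

# Closed-form gradient: grad[i, j] = sum_n (y_{ni} - g_i) x_j
grad = np.outer((y - softmax(w)).sum(axis=0), x)

# Central finite differences, entry by entry
eps = 1e-6
fd = np.zeros_like(w)
for i in range(K):
    for j in range(D):
        wp, wm = w.copy(), w.copy()
        wp[i, j] += eps
        wm[i, j] -= eps
        fd[i, j] = (L(wp) - L(wm)) / (2 * eps)

assert np.allclose(fd, grad, atol=1e-8)
```

The same check works for any $0/1$ label matrix $y$ whose rows sum to $1$, not just the identity.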