Finding optimal weights for the cross-entropy loss function


I'm trying to understand how to find the optimal weights of a neural network under the cross-entropy loss in a closed-form ("classic") way. I use cross-entropy as the loss function, take its derivative with respect to $w_{pq}$, and set that derivative equal to zero. However, all the papers I have read use gradient descent to optimize this function, and I can't find any other approach, because in most cases the equation $\nabla f = 0$ cannot be solved symbolically. But I work with an online network, which means I can't use gradient descent, so I am trying to find the optimal weights the classic way. Here is the cost function:

$$j = -\sum_{i=1}^{t} net_{c_i}^{i} + \sum_{i=1}^{t} \log \sum_{c'=1}^{c} e^{net_{c'}^{i}}$$

where ${net_{c'}^i}$ is:

$${net_{c'}^i} = \sum_{k=1}^{K} w_{c'k} h_k^i$$

$i = 1$ to $N$ indexes the stream of data samples; since the learning is online, only the first $t$ of these samples have arrived at the network so far.
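To make the setup concrete, here is a minimal NumPy sketch of the cost $j$ on toy data (the names `h`, `w`, `labels` and the random values are my own assumptions for illustration, not from the question):

```python
import numpy as np

# Toy dimensions: t samples seen so far, c classes, K hidden units.
rng = np.random.default_rng(0)
t, c, K = 5, 3, 4
h = rng.standard_normal((t, K))      # hidden activations h_k^i
w = rng.standard_normal((c, K))      # weights w_{c'k}
labels = rng.integers(0, c, size=t)  # true class c_i of each sample

# net_{c'}^i = sum_k w_{c'k} h_k^i, arranged as a (t, c) matrix
net = h @ w.T

# j = -sum_i net_{c_i}^i + sum_i log sum_{c'} exp(net_{c'}^i)
j = -net[np.arange(t), labels].sum() + np.log(np.exp(net).sum(axis=1)).sum()
```

This is the usual softmax cross-entropy written out; in practice the log-sum-exp term would be computed in a numerically stable way (e.g. by subtracting the row maximum before exponentiating).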

The derivative of $j$ with respect to $w_{pq}$ is:

$$\frac{\partial j}{\partial w_{pq}} = -\sum_{i:\,c_i = p} h_q^i + \sum_{i=1}^{t} \frac{h_q^i \, e^{\sum_{k=1}^{K} w_{pk} h_k^i}}{\sum_{c'=1}^{c} e^{\sum_{k=1}^{K} w_{c'k} h_k^i}} = 0$$
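As a sanity check on that derivative, here is a NumPy sketch (again with hypothetical toy data; I read the first sum as running over the samples whose true class is $p$) that computes the full gradient in the equivalent softmax form $\partial j / \partial w_{pq} = \sum_i \big(\mathrm{softmax}_p(net^i) - [c_i = p]\big) h_q^i$ and compares one entry against a finite difference of $j$:

```python
import numpy as np

rng = np.random.default_rng(1)
t, c, K = 6, 3, 4
h = rng.standard_normal((t, K))
w = rng.standard_normal((c, K))
labels = rng.integers(0, c, size=t)

def cost(w):
    net = h @ w.T
    return -net[np.arange(t), labels].sum() + np.log(np.exp(net).sum(axis=1)).sum()

# Analytic gradient: (softmax probabilities - one-hot labels)^T @ h
net = h @ w.T
p_soft = np.exp(net) / np.exp(net).sum(axis=1, keepdims=True)
onehot = np.eye(c)[labels]
grad = (p_soft - onehot).T @ h           # shape (c, K); entry [p, q] is dj/dw_{pq}

# Finite-difference check of one entry
eps = 1e-6
p, q = 1, 2
w_pert = w.copy()
w_pert[p, q] += eps
num = (cost(w_pert) - cost(w)) / eps
```

The finite-difference value `num` agrees with `grad[p, q]` to several decimal places, which supports the formula; the hard part of the question, of course, is that setting this gradient to zero still has no closed-form solution in the weights.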

I need to find the roots of this equation, but it seems too hard to solve analytically. So here is my question: is there any method or approximation for finding the roots?