Are weights updated differently in a regression network vs. a classification network?


Are the weights of a neural network updated differently by backpropagation in a classification network than in a regression network, and if so, how?

My concern is that the two networks use different cost functions, so the updates should differ as well.

I am under the impression that logistic regression is typically used for classification, while linear regression is used for regression problems.

Hence the weight updates must also be different. Is that correct? If so, how does a classification network update its weights differently?

I know that the cost function is defined as

\begin{equation}\label{eq:gradient_softmax}
J(W,b,z) = -\frac{1}{N} \sum_{n=1}^{N}\sum_{i=1}^{M} \mathbb{1}(y_n = i)\, \log\left(\frac{e^{z_i^{(n)}}}{\sum_{k=1}^{M} e^{z_k^{(n)}}}\right)
\end{equation}

But from this, how are the weights and biases updated, and how is the gradient defined?

1 Answer


In principle, the weight updates are done the same way.

Yes, the cost function is usually different, and so is the final activation function. But this only means your derivatives are different. The algorithm itself (gradient descent) is still the same.
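For the softmax cross-entropy cost quoted in the question, for example, the derivative with respect to the logits takes the well-known "prediction minus target" form (writing $p_i^{(n)}$ for the softmax output and $\mathbb{1}(y_n = i)$ for the indicator):

\begin{equation}
\frac{\partial J}{\partial z_i^{(n)}} = \frac{1}{N}\left(p_i^{(n)} - \mathbb{1}(y_n = i)\right), \qquad p_i^{(n)} = \frac{e^{z_i^{(n)}}}{\sum_{k=1}^{M} e^{z_k^{(n)}}}
\end{equation}

This is the quantity that gets back-propagated; the update rule itself, $W \leftarrow W - \eta\, \partial J / \partial W$, is unchanged.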

One cost function that is widespread in the literature is the mean squared error (MSE):

\begin{align} E(X_i) &= {(p(X_i) - t(X_i))}^2\\ E(X) &= \sum_{i=1}^{|X|} E(X_i) \end{align}

where $X$ is your mini-batch, $p$ is your network, $p(X_i)$ is the network's prediction for the sample $X_i$, and $t(X_i)$ is the ground truth for that sample. I've never seen an error function that is not defined point-wise (at least none comes to mind right now).
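To make the point concrete, here is a minimal hypothetical sketch (not from the answer): a single linear layer trained by gradient descent, where the only piece that changes between regression (MSE) and classification (softmax cross-entropy) is the loss derivative passed in. All function names here (`sgd_step`, `grad_mse`, `grad_cross_entropy`) are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stabilized softmax over the class axis
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def grad_mse(z, t):
    # dE/dz for the point-wise squared error E = (z - t)^2
    return 2.0 * (z - t)

def grad_cross_entropy(z, t_onehot):
    # dJ/dz for softmax + cross-entropy: prediction minus one-hot target
    return softmax(z) - t_onehot

def sgd_step(W, b, X, t, grad_fn, lr=0.1):
    # One gradient-descent step on a single linear layer z = XW + b.
    # The update rule is identical for both tasks; only grad_fn differs.
    z = X @ W + b                  # forward pass
    dz = grad_fn(z, t) / len(X)    # loss gradient w.r.t. pre-activations
    W = W - lr * (X.T @ dz)        # chain rule through z = XW + b
    b = b - lr * dz.sum(axis=0)
    return W, b
```

A regression model would call `sgd_step(W, b, X, t, grad_mse)` and a classifier `sgd_step(W, b, X, t_onehot, grad_cross_entropy)`; the update itself is the same line of code, which is exactly the sense in which "the weight updates are done the same way."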