How can I optimize this neural network? - Basic Deep learning

94 Views Asked by At

I'm trying to understand deep learning and I think all has to do with least squares for non-linear algebra.

Let's say that we have inputs $X$ and outputs $Y$ in a 2-layer neural network, e.g deep learning. My goal is to find $a(i,j), b(i,j), c(i,j)$ where $j$ stands for which neural we are focusing on e.g $2$ or $5$ and $i$ stands for which line for the neural we are focusing on e.g line number $1$ or line number $3$

On other words $a(i,j), b(i,j), c(i,j)$ are matrices.

enter image description here

If we want to display this in beautiful linear algebra. We write:

$$Z1 = X^T a(i, j)$$ $$Z2 = Z1 b(i, j)$$ $$Y = Z2 c(i, j)$$

Or we can put all together: $$Y = X^T a(i, j) b(i, j) c(i, j)$$

Notice that $X$ and $Y$ changes for each data set.

Question:

How can I find $a(i, j) b(i, j) c(i, j)$ if I know $Y$ and $X$?

I'm assuming that I will get lots of solutions depending on which $Y$ and $X$ I'm using. What solution should I use then? The best fit? How?

1

There are 1 best solutions below

0
On BEST ANSWER

You will use the gradient descent method to backpropagate the error of your prediction to lower layers. Your goal is to adjust the $a(i,j), b(i,j)$, and $c(i,j)$ values to minimize the error. Minimizing the error is usually represented a minimizing a loss. A loss function usually used is $L= y\log(y_0)$ where $y_0$ will be the prediction you get from the model and $y$ is the actual values $Y$ should have. By gradient descent you will calculate $\frac{dL}{da(i,j)}, \frac{dL}{db(i,j)}$, and $\frac{dL}{dc(i,j)}$. Then you will update the parameters as follows;

$$ a(i,j)= a(i,j)-\alpha dL/da(i,j) \\ b(i,j)= b(i,j)-\alpha dL/db(i,j) \\ c(i,j)= c(i,j)-\alpha dL/dc(i,j)\ $$

where $\alpha$ is the learning rate. You will keep doing this for multiple data samples for multiple iterations until you get a model that shows a low loss.