I am playing around with simple neural networks in PyTorch and I am confused about something. A single linear layer should be able to solve the AND problem, since it is linearly separable, but during optimization the parameters keep growing and so does the error. It works when I add a sigmoid activation after the linear layer, but I don't think that should be necessary. Can someone explain? Here is the gist of the code:
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 1))
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
for epoch in range(num_epochs):
    pred_y = model(data_x)
    loss = loss_function(pred_y, data_y)
    model.zero_grad()
    loss.backward()
    optimizer.step()
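In case it helps to run it, here is a self-contained version of the same script. The data tensors are my reconstruction of the AND truth table, and learning_rate / num_epochs are placeholder values I picked, since the gist above omits all three:

```python
import torch
import torch.nn as nn

# AND truth table -- my reconstruction, since the gist does not
# show how data_x / data_y are built.
data_x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
data_y = torch.tensor([[0.], [0.], [0.], [1.]])

# Placeholder hyperparameters; the gist leaves these unspecified.
learning_rate = 0.1
num_epochs = 2000

model = nn.Sequential(nn.Linear(2, 1))
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    pred_y = model(data_x)
    loss = loss_function(pred_y, data_y)
    model.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())
```

Note that data_y is shaped (4, 1) rather than (4,), so it matches the output shape of nn.Linear(2, 1); otherwise MSELoss broadcasts and warns.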
