Neural network architecture capable of performing a sum by category?


I am wondering whether it would be possible to build a NN that can be trained on 2D examples (with a fixed number of rows) where the two columns represent an amount and a category, and that outputs a vector of the amounts aggregated by category (its length would be equal to the number of distinct categories).

For example:

Input

(amount) (category)
0.2 1
0.5 2
0.7 3
1.1 1
0.1 2

Output

[1.3, 0.6, 0.7] (the aggregated amounts wouldn't need to be in any specific order)

I have some basic knowledge of neural networks but can't think of an architecture that would work (I've unsuccessfully attempted some). Maybe it could be achieved with convolutions using some special filters...

I guess the question goes beyond NNs; what I'm really asking is whether it is possible to come up with a sequence of matrix operations and activation functions that performs a "SUM GROUP BY".
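(To make this concrete, here is the kind of fixed operation I have in mind, sketched in NumPy with the example data above: one-hot encoding the category column and taking a single matrix product performs exactly this aggregation. Whether a network can learn such weights is the open part.)

```python
import numpy as np

# Sum-by-category as pure matrix operations (a sketch, not a trained network):
# one-hot encode the category column, then one matrix product aggregates
# the amounts per category.
amounts = np.array([0.2, 0.5, 0.7, 1.1, 0.1])    # (n_rows,)
categories = np.array([1, 2, 3, 1, 2])           # (n_rows,)

n_categories = 3
one_hot = np.eye(n_categories)[categories - 1]   # (n_rows, n_categories)
sums = amounts @ one_hot                         # approximately [1.3, 0.6, 0.7]
```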

Thank you!


Okay, so I did some testing and came up with a partial solution to this inquiry. Given more resources and time, I think I could reach a full solution and an accurate NN.

We want to show that an NN can sum by groups. If we first write a program/algorithm that collects the values of each group into its own list/array/vector, the problem reduces to having an NN sum the values of that list/array/vector, which is much simpler. If such preprocessing is not possible, i.e. the data cannot be sorted by group beforehand, then reinforcement learning would probably have to be used. However, if the grouping can be done in preprocessing, then we have a solution.
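That preprocessing step is straightforward outside the network. A minimal NumPy sketch, using dummy data and assuming amounts are in column 0 and group ids in column 1:

```python
import numpy as np

# Collect each group's values into its own array before handing them
# to the network (hypothetical data layout: [amount, group_id]).
data = np.array([[0.2, 1], [0.5, 2], [0.7, 3], [1.1, 1], [0.1, 2]])
amounts, groups = data[:, 0], data[:, 1].astype(int)

# One array of amounts per group id
grouped = {g: amounts[groups == g] for g in np.unique(groups)}
# grouped[1] is the amounts belonging to group 1, and so on
```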

Suppose that there are 40 groups, each with 400 values to their name. (Groups could have varying numbers of values, but this would make things more difficult: you would either have to use a sparse tensor or pad each group with 0's, adding (length of the maximally sized group) - (length of current group) zeros.) We want an NN to accurately predict the sum of those 400 values for each group.
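The zero-padding option is cheap to do up front, since padding with zeros doesn't change any group's sum. A small sketch with three hypothetical groups of different sizes:

```python
import numpy as np

# Pad each group with zeros up to the size of the largest group, so the
# batch becomes a dense (n_groups, max_len, 1) tensor ready for an LSTM.
# Zeros don't affect the sums we want the network to learn.
groups = [np.array([0.2, 1.1]), np.array([0.5, 0.1, 0.9]), np.array([0.7])]
max_len = max(len(g) for g in groups)
padded = np.stack([np.pad(g, (0, max_len - len(g))) for g in groups])
padded = padded[..., np.newaxis]   # shape (3, 3, 1)
```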

For this task we can employ an LSTM, though there are probably other options. The code I have below is more a proof of concept.

This program and the NN it contains produce the output shown at the bottom.

import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
import random as r

np.random.seed(1234)

# 16000 random values in [0, 6), reshaped into 400 sequences of 40 values each
X = np.array([r.random() + r.randint(0, 5) for i in range(16000)])
X = X.reshape(400, 40, 1)

# Target: the sum of each sequence
Y = np.array([x.sum() for x in X])

# Split X and Y in a single call so each sequence stays paired with its sum
# (two independent train_test_split calls would shuffle them differently)
trn_x, val_x, trn_y, val_y = train_test_split(X, Y, test_size=0.2)

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(100, activation='relu', return_sequences=True, input_shape=(40, 1)))
model.add(tf.keras.layers.LSTM(75, activation='relu', return_sequences=True))
model.add(tf.keras.layers.LSTM(50, activation='relu', return_sequences=True))
model.add(tf.keras.layers.LSTM(25, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['accuracy']  # accuracy isn't meaningful for regression; MAE would be more informative
)

history = model.fit(
    trn_x,
    trn_y,
    validation_data=(val_x, val_y),
    epochs=8,
    verbose=1
)

tst = np.array([r.random()+r.randint(1,5) for i in range(40)]).reshape((1, 40, 1))
print(f'Actual: Sum of Tst {tst.sum()}')
tst_out = model.predict(tst, verbose=0)
print(f'Predicted: Sum of Tst {tst_out}')

model.save_weights(filepath='Epochs08.h5')

Now see the output:

Epoch 1/8
320/320 [==============================] - 11s 33ms/sample - loss: 14464.3957 - accuracy: 0.0000e+00 - val_loss: 12610.6732 - val_accuracy: 0.0000e+00
Epoch 2/8
320/320 [==============================] - 4s 11ms/sample - loss: 10540.1985 - accuracy: 0.0000e+00 - val_loss: 11923.0959 - val_accuracy: 0.0000e+00
Epoch 3/8
320/320 [==============================] - 4s 12ms/sample - loss: 13577.3292 - accuracy: 0.0000e+00 - val_loss: 8517.4146 - val_accuracy: 0.0000e+00
Epoch 4/8
320/320 [==============================] - 4s 11ms/sample - loss: 11339.2666 - accuracy: 0.0000e+00 - val_loss: 11675.0383 - val_accuracy: 0.0000e+00
Epoch 5/8
320/320 [==============================] - 4s 12ms/sample - loss: 5248.6647 - accuracy: 0.0000e+00 - val_loss: 1544.8824 - val_accuracy: 0.0000e+00
Epoch 6/8
320/320 [==============================] - 4s 11ms/sample - loss: 673.1250 - accuracy: 0.0000e+00 - val_loss: 170.6239 - val_accuracy: 0.0000e+00
Epoch 7/8
320/320 [==============================] - 4s 11ms/sample - loss: 238.3624 - accuracy: 0.0000e+00 - val_loss: 179.2275 - val_accuracy: 0.0000e+00
Epoch 8/8
320/320 [==============================] - 4s 11ms/sample - loss: 184.1248 - accuracy: 0.0000e+00 - val_loss: 188.9833 - val_accuracy: 0.0000e+00
Actual: Sum of Tst 146.00304187273503
Predicted: Sum of Tst [[113.81824]]

While somewhat off, our NN would most likely become more accurate with more data, better fine-tuning, and more layers or a better sequence of layers. I hope this helps in your quest. I will probably do some more testing for fun and to improve the accuracy; I wish you the best.