How do you get from a vector of partial derivatives to a scalar?


I am teaching myself AI / machine learning without using libraries.

I understand most of the derivative of the softmax activation function.

If I have 3 nodes in a layer,

then the softmax activation for node i becomes

s_i = e^(x_i) / Σ_j e^(x_j)
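Just so my starting point is clear, here is my forward pass sketched in plain Python (no libraries; the function name is my own):

```python
import math

def softmax(x):
    """Softmax for a list of pre-activations x: e^(x_i) / sum_j e^(x_j)."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]
```

The outputs always sum to 1, e.g. `softmax([0.0, 0.0])` gives `[0.5, 0.5]`.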

and its derivative returns a vector of partial derivatives per node, which (writing s_k for the softmax output of node k) can be described as

node 1 = {s_1(1 - s_1), s_1(0 - s_2), s_1(0 - s_3)}

node 2 = {s_2(0 - s_1), s_2(1 - s_2), s_2(0 - s_3)}

node 3 = {s_3(0 - s_1), s_3(0 - s_2), s_3(1 - s_3)}

which is a real result described by the equation

softmaxActivation(i) * (δ_ij - softmaxActivation(j))

where δ_ij is the Kronecker delta (1 if i = j, else 0). This means that in a layer containing 3 nodes, each node's derivative is made up of 3 partial derivatives (together the rows form a 3x3 Jacobian matrix).
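In code form, this is how I compute those partials (a sketch in the same no-library Python; the function names are mine):

```python
import math

def softmax(x):
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(x):
    """n x n matrix of partials: J[i][j] = s_i * (delta_ij - s_j)."""
    s = softmax(x)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]
```

Each row of this matrix sums to 0, which makes sense because the softmax outputs always sum to 1, so nudging one input cannot change the total.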

How do I get that back to a scalar? Is it true that I need a scalar for backpropagation? Can I use the directional derivative (a new concept to me), which can be described as gradient * vector?

Or, in a slightly more tangible description, the directional derivative for node 1 = e^x_1(1 - e^x_1) * e^x_1 + e^x_1(0 - e^x_2) * e^x_2 + e^x_1(0 - e^x_3) * e^x_3
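My attempt at that "gradient * vector" idea, sketched in the same no-library Python (the helper names are mine, and whether the direction vector v is the right thing to dot with is exactly what I am unsure about):

```python
import math

def softmax(x):
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(x):
    # J[i][j] = s_i * (delta_ij - s_j)
    s = softmax(x)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def directional_derivative(x, v):
    """Dot each node's row of partials with a direction vector v,
    giving one scalar per node (gradient . vector)."""
    J = softmax_jacobian(x)
    return [sum(J_ij * v_j for J_ij, v_j in zip(row, v)) for row in J]
```

One thing I noticed: because each Jacobian row sums to 0, dotting with the all-ones direction gives 0 for every node, so the choice of v clearly matters.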

I'm teaching myself, so any guidance would be appreciated. Thank you.