Problem 1:
The input is two values, each on a scale in the range [-3.89, 10.66].
I need to measure the difference between an old value (A) and a new value (B).
So I want to create a variable that expresses this difference.
Let's say:
A = -2
B = 3
diff = ?
If I do a simple subtraction, it doesn't capture the fact that the value switches sides from negative to positive.
For example, if A = 2 and B = 7, it shouldn't give the same "difference" as the case mentioned above, since the value stays positive.
What formula should I use?
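To make the issue concrete, here is a minimal Python sketch. The helper names `plain_diff` and `crosses_zero` are made up for illustration, and the sketch assumes that zero is the meaningful boundary on the scale. It shows that plain subtraction returns the same value for both cases, and that a separate sign-change flag is one possible way (not necessarily the right one) to keep the information that the value switched sides.

```python
def plain_diff(old, new):
    """Signed change from the old value to the new one."""
    return new - old


def crosses_zero(old, new):
    """Hypothetical helper: True when old and new lie on opposite sides of zero."""
    return old * new < 0


case_switch = (-2, 3)  # A is negative, B is positive
case_same = (2, 7)     # A and B are both positive

# Both cases give the same plain difference; only the flag tells them apart.
print(plain_diff(*case_switch), crosses_zero(*case_switch))  # 5 True
print(plain_diff(*case_same), crosses_zero(*case_same))      # 5 False
```

Keeping the flag as its own variable, instead of folding it into one number, leaves it to the model to decide how much the sign switch matters.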
Problem 2:
Problem 2 is similar to Problem 1 but a bit trickier.
I have:
- M, the mean of multiple values on this scale (probably not normally distributed)
- S, the standard deviation of those values
- X, a new value I want to compare to the mean/std of the other values
M and X can be treated as A and B in Problem 1, but I also want to use the standard deviation S.
The "difference" variable between X and M should be reduced by the standard deviation:
logically, the greater the standard deviation, the lower this "distance" variable for X.
So I was thinking of something like:
Distance = diff(X,M) - weight(S,X,M)
But I can't find a good formula. My mathematics and statistics knowledge is lacking. Maybe I'm overthinking this and missing a simple solution.
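For reference, one simple candidate that behaves the way described (the distance shrinks as S grows) is the standard score, which divides by S instead of subtracting a weight. This is a different technique from the `diff - weight` idea above, and the sketch below is only an illustration: the function name `standardized_distance`, the `eps` guard, and the S values 1.0 and 2.0 are all assumptions. Since the values are probably not normally distributed, the result should be read as a scaled distance rather than as a probability statement.

```python
def standardized_distance(x, mean, std, eps=1e-9):
    """Signed distance of x from the mean, in units of the standard deviation.

    eps is an assumed guard against division by zero when every value
    at the position is identical (std == 0).
    """
    return (x - mean) / max(std, eps)


# Same X and M, two hypothetical spreads: the larger S gives the smaller distance.
tight = standardized_distance(-3.89, 1.2, std=1.0)
wide = standardized_distance(-3.89, 1.2, std=2.0)
print(abs(tight) > abs(wide))  # True
```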
I hope I was clear enough about the issue, and thank you in advance for your help!
EDIT:
Background of this problem:
I want to use machine learning on a data set.
The scale mentioned above is the IDP-hydropathy scale of amino acids (AA), or simply put: their affinity for water.
(image: idp_scale)
I have an alignment of proteins, and at each position of this alignment I recorded the different amino acids and their hydropathy scores.
For example:
Position : 1-2-3-4-5-6-7
Protein 1 : A-L-Y-V-I-A-A
Protein 2 : A-I-Y-V-I-A-A
Protein 3 : A-I-Y-V-I-V-A
...
So, for example, let's say there is a mutation in Protein 1 at position 6:
A > P
The hydropathy score changes from A (0.91) to P (-3.89).
And let's say the mean of all hydropathy scores at this consensus position, (A(0.91) + A(0.91) + V(4.64) + ..) / number_of_AAs, is 1.2.
I want to create one variable that captures the change in hydropathy from the old AA to the new AA, and another variable that compares the new AA to the mean of all AAs at the consensus position, then use these 2 values as features for a machine learning model.
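As a starting point, the two features for the A > P example could be computed like this. It is only a sketch using plain differences: the function name `mutation_features` and the feature names are hypothetical, the dictionary holds only the three scores quoted in this post, and the mean 1.2 is the assumed value from the example above.

```python
# Hydropathy scores quoted in this post; the rest of the scale is omitted.
HYDROPATHY = {"A": 0.91, "P": -3.89, "V": 4.64}


def mutation_features(old_aa, new_aa, position_mean):
    """Return the two candidate features for one mutation (hypothetical names)."""
    old_score = HYDROPATHY[old_aa]
    new_score = HYDROPATHY[new_aa]
    return {
        # Change in hydropathy from the old AA to the new AA.
        "change_from_old": new_score - old_score,
        # New AA compared to the mean score at the consensus position.
        "change_from_mean": new_score - position_mean,
    }


features = mutation_features("A", "P", position_mean=1.2)
print(features)
```

For this example the two values come out at roughly -4.80 and -5.09. Plain subtraction is used here only because it is the simplest option; it still has the sign-switch limitation described in Problem 1.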