Predicting the increase/decrease of a number


I have entries in my database that look like this:

ItemID  Coefficient    Time 
14      1.74           "2014-12-21 17:13:29"
14      1.73           "2014-12-21 17:13:30"
78      2.55           "2014-12-21 17:17:11"
78      2.56           "2014-12-21 17:18:12"

For each item I have one entry per second over a period of about 5 minutes, so each item has about 300 entries. In total I have thousands of items, so let's say 300 000 entries.
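Before any trend can be measured, the rows need to become one time series per item. Below is a minimal standard-library sketch that groups rows shaped like the table above by `ItemID` and measures time in seconds since each item's first entry. The `rows` list simply reuses the four sample rows, `group_by_item` is a hypothetical helper name, and the timestamp format `%Y-%m-%d %H:%M:%S` is assumed from the sample values.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows in the same shape as the table: (ItemID, Coefficient, Time).
rows = [
    (14, 1.74, "2014-12-21 17:13:29"),
    (14, 1.73, "2014-12-21 17:13:30"),
    (78, 2.55, "2014-12-21 17:17:11"),
    (78, 2.56, "2014-12-21 17:18:12"),
]


def group_by_item(rows):
    """Return {item_id: [(seconds_since_first_entry, coefficient), ...]}."""
    parsed = defaultdict(list)
    for item_id, coefficient, time_str in rows:
        # Assumed timestamp format, taken from the sample values above.
        ts = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
        parsed[item_id].append((ts, coefficient))
    series = {}
    for item_id, points in parsed.items():
        points.sort()  # order each item's entries by timestamp
        t0 = points[0][0]
        series[item_id] = [((ts - t0).total_seconds(), c) for ts, c in points]
    return series


series = group_by_item(rows)
print(series)
# {14: [(0.0, 1.74), (1.0, 1.73)], 78: [(0.0, 2.55), (61.0, 2.56)]}
```

Measuring time relative to each item's own start puts all items on the same 0 to ~300 second axis, which is what makes their trends comparable.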

I would like to know whether, on average across all these items, the Coefficient tends to drop or increase within these 5 minutes, and how strong this tendency is.

I can see visually that an individual item's coefficient falls or rises, but what algorithm should I use to determine this across all of the items?

I am not strong in maths, but while googling I came across regression. I am not sure exactly how to use it, though.

Can you give me a hint or a suggestion as to which algorithm I should use to analyze this data?

Answer:

The resulting price of the $i^{th}$ item is a sum (or product, or some other combination) of many random variables. The only way to bound the possible outcome is to find the probability distributions of the most significant of these variables and then use statistical tests to work out how they are connected.

Statement: $N$ objects, each with an almost continuous function $P_i(t)$, where $P$ is the price and $t$ is time.

Question: what can we say about $P_i$ and how it changes over time?

I see 3 possible solutions:

  • Identify the most important random variables $(v_1, v_2, \dots)$ and try to find $f_i$ such that $P_i = f_i(v_1, v_2, \dots)$, where $P_i$ is the price of the $i^{th}$ item. You can do this by proposing and refuting hypotheses (a long, iterative process). Keep in mind that the $v_j$ may change periodically over time (for example, people usually start buying useless stuff between December 20 and 31). This is an R&D approach: it never really ends and may not give concrete results.

  • Find a mathematical model in papers, blogs, or conference materials and try to apply it to your data. Choose a model from a related field; there is no single good standard approach for all possible data, because an online shop differs from a shoe store. You may find approximations or known facts that help. For example, suppose we know how many visitors an online store has and how many purchases were made. What can we say about the efficiency of its visual design? I'm sure someone has already done some research on this.

  • Find a company or analytics service, pay them, and they will do both previous steps for you (most probably not as well as you expect).

=================================================

Regression is a method for estimating how a random variable $Y$ depends on $X$, using a mathematical model plus a sample of data. For example, linear regression assumes $$Y = KX + B$$ where $X$ is the difficulty of a problem and $Y$ is the time needed to solve it. Find good values for $K$ and $B$ from your data with the least-squares method. Then check your model with a suitable statistical test (Fisher's or Pearson's test) and decide whether the linear model is adequate (in this example it is not: the dependence is probably quadratic or even cubic). Moreover, $Y$ may depend on two or more random variables, $Y = f(X, Z, \dots)$.
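A minimal pure-Python sketch of the least-squares fit of $Y = KX + B$ described above, together with the F statistic used in Fisher's test of whether the linear term explains a meaningful share of the variance. Applying it per item to the question's data (fitting Coefficient against elapsed seconds, then averaging the slopes $K$ over items) is an assumption made for illustration, not something this answer prescribes. `fit_line` and the `series` values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = K*x + B.

    Returns (K, B, F), where F is Fisher's statistic for the simple linear
    model: (regression sum of squares / 1) / (residual SS / (n - 2)).
    """
    n = len(xs)
    if n < 3:
        # F needs n - 2 > 0 residual degrees of freedom.
        raise ValueError("need at least 3 points to compute F")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    k = sxy / sxx            # least-squares slope
    b = mean_y - k * mean_x  # least-squares intercept
    ss_res = sum((y - (k * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_reg = sum((k * x + b - mean_y) ** 2 for x in xs)
    # A perfect fit leaves no residual variance, so F is unbounded.
    f_stat = ss_reg / (ss_res / (n - 2)) if ss_res > 0 else float("inf")
    return k, b, f_stat


# Hypothetical per-item series: {item_id: [(seconds_elapsed, coefficient), ...]}
series = {
    14: [(0, 1.74), (1, 1.73), (2, 1.71), (3, 1.70)],
    78: [(0, 2.55), (1, 2.56), (2, 2.58), (3, 2.58)],
}

slopes = {}
for item_id, points in series.items():
    xs = [t for t, _ in points]
    ys = [c for _, c in points]
    k, b, f_stat = fit_line(xs, ys)
    slopes[item_id] = k

# Average trend across items and the share of items whose slope is positive.
mean_slope = sum(slopes.values()) / len(slopes)
share_rising = sum(1 for s in slopes.values() if s > 0) / len(slopes)
print(f"mean slope per second: {mean_slope:.5f}, rising share: {share_rising:.0%}")
```

Under this assumption, a negative `mean_slope` would suggest the Coefficient tends to drop, and `share_rising` shows what fraction of items trended upward. If SciPy is available, `scipy.stats.f.sf(f_stat, 1, n - 2)` gives the p-value for an item's F statistic. A straight line is only a first approximation; as noted above, the real dependence may be nonlinear.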