Determine Patterns


I have some weather data that I would like to analyze. I have about a million rows of data, and each row has about 100 attribute values. Each attribute value represents some measurement (e.g., temperature in that area, humidity, CO2 levels), which means that each attribute has a different scale and range for its values. Finally, there is a result column that has the amount of rain in the next 24 hours. What I want to do is determine which attribute patterns result in what likelihood of rain. For example, when attribute 1 is between values a and b and attribute 2 is between values c and d, there is a 90% chance of at least 1 inch of rain and a 30% chance of at least 6 inches of rain.

I am trying to figure out the best method to determine these patterns, recognizing that some of the attributes may have no bearing on the result, some may overlap with other attributes, and there could be multiple patterns involving different attribute sets, such as attributes 1, 2, 4 or 8, 11, 34.

Hope that makes sense. Any help is appreciated

Best Answer

This is a machine learning problem and there are many different techniques with varying levels of effectiveness, depending on the domain, and on exactly what question you want answered.

The most straightforward technique is linear regression, which assumes that there is some underlying linear mechanism connecting the input feature values to the output variable. It gives you an answer in the form of a linear function of the input variables, which predicts the value of the output variable for new input values.
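As a rough illustration of what such a linear fit looks like in practice, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for the example: the data is synthetic (three made-up attributes standing in for the ~100 real ones), and the function names `fit_linear_regression` and `predict` are hypothetical. Each attribute is standardized first, since the question notes that the attributes have different scales and ranges.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y ~ standardized(X) @ weights + intercept by ordinary least squares."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    X_std = (X - mean) / std
    # Append a column of ones so the solver also estimates an intercept.
    A = np.column_stack([X_std, np.ones(len(X_std))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1], mean, std


def predict(X, weights, intercept, mean, std):
    """Apply the fitted linear function to new rows."""
    return ((X - mean) / std) @ weights + intercept


# Synthetic stand-in for the weather table (assumption, not real data):
# columns loosely play the role of temperature, humidity, and CO2 level.
rng = np.random.default_rng(0)
X = rng.normal(loc=[20.0, 60.0, 400.0], scale=[5.0, 15.0, 30.0], size=(1000, 3))
# Hypothetical rule: rain depends on humidity and temperature, not on CO2.
y = 0.05 * X[:, 1] - 0.02 * X[:, 0] + rng.normal(scale=0.1, size=1000)

weights, intercept, mean, std = fit_linear_regression(X, y)
predicted = predict(X, weights, intercept, mean, std)
rmse = float(np.sqrt(np.mean((predicted - y) ** 2)))
print("weights per standardized attribute:", weights)
print("root-mean-square error:", rmse)
```

Because the attributes are standardized, the magnitudes of the fitted weights are roughly comparable, so a weight near zero hints that an attribute has little linear bearing on the result. This only captures linear relationships; it does not by itself produce range-based patterns or probabilities of the kind described in the question.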

It is usually a better idea to do this in an established numerical tool such as Matlab (or the free equivalent, Octave), Mathematica, or a similar tool than to write the application yourself. A hand-written implementation is likely to be incorrect, inefficient, or both (floating-point errors, for example, can quickly add up to a significant error).

It would also be a good idea to take an online course in machine learning (such as Stanford University's excellent course taught by Andrew Ng). Understanding the available techniques is the only way to determine which one is best for the problem you are really trying to solve.