Identifying distribution for scenarios


I'm trying to identify the most appropriate distribution to be used for variables given in a scenario for a bank.

There are 4 distributions to choose from: the Gaussian distribution, the Bernoulli distribution, the Binomial distribution and the Poisson distribution.

(a) Relationship status of the applicant (single or partnered) - My answer would be Binomial Distribution.

(b) Number of previous times defaulted on loan repayment? - My answer would be the Gaussian Distribution

(c) Income in last financial year. - My answer would be Poisson Distribution

(d) Number of dependents (children, spouse) of applicant? - My answer would be Bernoulli Distribution.

I was wondering whether I've identified the correct distribution for each scenario, and would like some feedback on my choices.

There is 1 answer below


Clues, comments: @Aaron Montgomery is right to disagree with some of your choices. Here are clues to get you started toward better choices.

(a) Bernoulli. One observation per applicant (result Single or Not).

(b) Not Gaussian. Integer number of defaults is 0, 1, 2, ... (with no set limit).

(d) Not Bernoulli. As in (b), integer number of dependents (no limit).


Notes:

A binomial random variable must have a number $n$ of independent trials, known in advance. An example might be the number of months, out of the last 12, in which the credit card balance was paid in full. Values are integers from $0$ through $n.$

Poisson random variables have integer values with no practical limit on the largest value.
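To make the difference in possible values concrete, here is a minimal sketch using only Python's standard library. The function names and the parameter values ($n = 12$, $p = 0.8$, rate $0.5$) are made up for illustration and are not part of the exercise. It shows that a Bernoulli variable takes only the values 0 and 1, that a binomial variable cannot exceed $n$, and that a Poisson variable gives positive (if tiny) probability to arbitrarily large counts.

```python
import math


def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for one yes/no observation; the only possible values are 0 and 1."""
    if k == 1:
        return p
    if k == 0:
        return 1.0 - p
    return 0.0


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) successes in n independent trials; possible values are 0, 1, ..., n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a count with mean lam; possible values are 0, 1, 2, ... (no cap)."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical parameters, chosen only for illustration.
n, p, lam = 12, 0.8, 0.5

# All binomial probability sits on 0..n, so these terms add up to 1.
binomial_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print("Bernoulli P(X = 2):        ", bernoulli_pmf(2, p))
print("Binomial total over 0..12: ", binomial_total)
print("Binomial P(X = 13):        ", binomial_pmf(n + 1, n, p))
print("Poisson  P(X = 20):        ", poisson_pmf(20, lam))
```

The point of the comparison is the set of possible values, not the particular numbers: a count with a fixed ceiling fits the binomial pattern, while a count with no ceiling fits the Poisson pattern.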

Normal applies to a quantity that can be considered as a continuous random variable, not an integer (except for rounding). Height, weight and bank balance might be viewed as continuous. But to be considered specifically as normal, the distribution should be symmetrical, so weights of people in general are not usually modeled as normal ['people' 250 lbs (about 113 kg) underweight would be dead and not in the population; some people 250 lbs overweight may be on diets, but are alive and in the population].

Similarly, incomes are often considered as continuous, but in real life they too are highly skewed, with a long tail of high values. Maybe the answer to (c) was supposed to be 'Gaussian' because that's the only continuous distribution on your list, but it's not really a good choice.
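The symmetry point can be checked numerically. The sketch below simulates one symmetric sample and one right-skewed sample and compares the mean with the median. Using a lognormal distribution as a stand-in for incomes is purely an assumption made here for illustration, as are all parameter values and the helper `sample_skewness`; none of them come from the exercise.

```python
import random
import statistics


def sample_skewness(xs: list[float]) -> float:
    """Average cubed z-score: near 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Symmetric stand-in (hypothetical parameters): normal draws.
normal_sample = [rng.gauss(170.0, 10.0) for _ in range(10_000)]

# Right-skewed stand-in for incomes (assumed lognormal, hypothetical parameters).
income_sample = [rng.lognormvariate(10.5, 0.8) for _ in range(10_000)]

for name, xs in (("normal", normal_sample), ("income-like", income_sample)):
    print(
        f"{name:12s} mean={statistics.fmean(xs):12.1f} "
        f"median={statistics.median(xs):12.1f} "
        f"skewness={sample_skewness(xs):6.2f}"
    )
```

For the symmetric sample the mean and median nearly coincide, while for the income-like sample the mean is pulled above the median by the long right tail, which is the feature a Gaussian model cannot reproduce.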