Using GPA and Class Rank/Percentile Data to create a regression based on the assumption of a normal distribution.

101 Views Asked by Bumbble Comm At 26 Mar 2026 - 10:15

I was interested in seeing whether I can use just a few individual data points, knowing the percentile of each of those GPA values, to construct a normal distribution that predicts all other GPA values.

For example, given:

- Top 10% of all GPAs are above 4.422
- Rank 12/1306 has a GPA of 4.664
- Rank 1/1306 has a GPA of 4.727

Is it possible to derive the mean and standard deviation of the normal distribution based only on that information?

Using z-scores, I get a system of three equations, one per data point, where $y$ is the standard deviation and $x$ is the mean.

But there is no single solution that satisfies all three equations, and each one on its own gives a distribution that does not seem accurate.
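For concreteness, here is a minimal sketch of what such a z-score system could look like and why it has no common solution. It is not the original set of equations. It assumes that rank $r$ out of 1306 means a fraction $1 - r/1306$ of the class lies below that GPA, and the names `points`, `solve_pair` and `least_squares` are made up for illustration.

```python
from statistics import NormalDist

N = 1306
# (GPA, assumed fraction of the class below that GPA).
# The rank-to-fraction convention 1 - rank/N is an assumption.
points = [
    (4.422, 0.90),        # top 10% are above 4.422
    (4.664, 1 - 12 / N),  # rank 12 of 1306
    (4.727, 1 - 1 / N),   # rank 1 of 1306
]

std_normal = NormalDist()  # mean 0, standard deviation 1

# Each point gives one equation (g - x) / y = z, i.e. g = x + y * z.
gpas = [g for g, _ in points]
z_scores = [std_normal.inv_cdf(p) for _, p in points]


def solve_pair(i, j):
    """Solve the equations of points i and j exactly; returns (mean x, sd y)."""
    y = (gpas[j] - gpas[i]) / (z_scores[j] - z_scores[i])
    x = gpas[i] - y * z_scores[i]
    return x, y


def least_squares():
    """Fit g = x + y * z through all points by ordinary least squares."""
    n = len(gpas)
    z_bar = sum(z_scores) / n
    g_bar = sum(gpas) / n
    s_zg = sum((z - z_bar) * (g - g_bar) for z, g in zip(z_scores, gpas))
    s_zz = sum((z - z_bar) ** 2 for z in z_scores)
    y = s_zg / s_zz
    x = g_bar - y * z_bar
    return x, y


pair_solutions = {
    (i, j): solve_pair(i, j) for i in range(3) for j in range(i + 1, 3)
}
for pair, (x, y) in pair_solutions.items():
    print(f"points {pair}: mean = {x:.3f}, sd = {y:.3f}")

x_ls, y_ls = least_squares()
print(f"least squares: mean = {x_ls:.3f}, sd = {y_ls:.3f}")
```

Under these assumptions each pair of points pins down a different mean and standard deviation, which is the inconsistency described above. The least-squares line through the $(z_i, g_i)$ pairs is one simple compromise, although it weights all three points equally.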


There is 1 best solution below

Bumbble Comm On 20 Dec 2021 - 10:48

The three equations are inconsistent because the real GPA values do not follow a perfect normal distribution; they are random variables that are only assumed to be normally distributed in aggregate. I think you need to formulate the task as a maximum likelihood problem: which values of $x$ and $y$ maximise the likelihood of observing the real GPAs, assuming they are normally distributed and sit at the given percentiles of the observed dataset? The likelihood of observing the data in this case would be:

$$\begin{aligned}L(x,y|\mathbf{g},\mathbf{p},g \sim N(x,y))&=\prod_{i=1}^{n}P(g_i,p_i|g\sim N(x,y))\\ &=\prod_{i=1}^{n} B(C(g_i|x,y),(1-p_i)N,N) \end{aligned}$$

where $\mathbf{g}$ is the vector of observed GPA values, $\mathbf{p}$ is the vector of percentiles associated with those GPA values, and $N(x,y)$ is a normal distribution with mean $x$ and standard deviation $y$. In the second line we use the binomial distribution $B(s,A,N)$, where $s$ is the probability of success, $A$ is the number of successes, and $N$ is the size of the dataset (not to be confused with the normal distribution $N(x,y)$). Here $s$ is the probability that a GPA lies to the left of $g_i$, which under normality is the cumulative normal distribution $C(g_i|x,y)$, and $A$ is the number of data points to the left of $g_i$, which equals $(1-p_i)N$ when $p_i$ is expressed as the fraction of the class at or above $g_i$. This is a nontrivial likelihood function and, since the cumulative normal distribution $C(g_i|x,y)$ has no closed-form expression, it can probably only be maximised numerically. I think an easier approach is to use the whole dataset with the standard sample mean and standard deviation estimators.
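To give an idea of what the numerical route might look like, here is a rough sketch that maximises the log of the binomial likelihood above using only the Python standard library. Several things in it are my own assumptions rather than part of the problem: the counts of students below each GPA (90% of 1306 rounded, then $1306-12$ and $1306-1$), the search bounds for $x$ and $y$, and the helper names `log_cdf`, `log_likelihood`, `golden_max` and `profile`. The nested golden-section search also assumes the log-likelihood has a single peak inside those bounds.

```python
import math

N = 1306
# (GPA, assumed number of students below that GPA).
observations = [
    (4.422, round(0.90 * N)),  # top 10% are above 4.422
    (4.664, N - 12),           # rank 12 of 1306
    (4.727, N - 1),            # rank 1 of 1306
]


def log_cdf(z):
    """Log of the standard normal CDF, via erfc for accuracy in the tails."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2)), 1e-300))


def log_likelihood(x, y):
    """Sum of binomial log-probabilities, dropping the constant coefficients."""
    total = 0.0
    for g, below in observations:
        z = (g - x) / y
        # log P(left of g) = log_cdf(z), log P(right of g) = log_cdf(-z)
        total += below * log_cdf(z) + (N - below) * log_cdf(-z)
    return total


def golden_max(f, lo, hi, tol=1e-6):
    """Maximise a single-peaked function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # the peak lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # the peak lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Assumed search bounds: mean in [3, 5], standard deviation in [0.05, 1].
X_LO, X_HI, Y_LO, Y_HI = 3.0, 5.0, 0.05, 1.0


def profile(y):
    """Best log-likelihood over the mean x for a fixed standard deviation y."""
    x_best = golden_max(lambda x: log_likelihood(x, y), X_LO, X_HI)
    return log_likelihood(x_best, y)


y_hat = golden_max(profile, Y_LO, Y_HI, tol=1e-5)
x_hat = golden_max(lambda x: log_likelihood(x, y_hat), X_LO, X_HI)
print(f"estimated mean = {x_hat:.3f}, estimated sd = {y_hat:.3f}")
```

The binomial coefficients are left out because they do not depend on $x$ or $y$. A general-purpose optimiser would do the same job; the nested one-dimensional search is used here only to keep the sketch free of dependencies.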

Hope this helps.
