Consider a set of starting salaries for engineers: $\$42000,$ $\$43412,$ $\$45500,$ and $\$53750.$
a) What percentile corresponds to a salary of $\$45500$?
b) What salary corresponds to the 50th percentile?
c) What salary do you require to be in the top third?
d) To what percentile does a salary of $\$49000$ correspond? Use linear interpolation.
I know how to do (a), but I am struggling with (b), (c), and (d). If you can, please explain the concept of linear interpolation. Any help would be highly appreciated.
There are numerous rules for calculating percentiles from a list of observations: at least nine different ways, none of which produce exactly the same results in general.
In some methods, the $P$th percentile is always one of the $N$ given data points. The graph at https://en.wikipedia.org/wiki/File:Percentile.png gives an example of one such method used in the Wikipedia article on percentiles.
In other methods, the $P$th percentile can be interpolated between data points. The graph at https://en.wikipedia.org/wiki/File:Percentile_interpolation.png illustrates three different schemes for doing this.
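To see that the interpolating schemes really disagree, here is a small Python sketch. It assumes that the three curves in that graph correspond to placing the $P$th percentile at the fractional rank $h = p(N+1-2C)+C$ with $C = 1, \frac12, 0$ (so $C=0$ gives the $(N+1)p$ rule discussed below). The function name `interpolated_percentile` is hypothetical, and clamping the rank to $[1, N]$ is my own simplification.

```python
import math

def interpolated_percentile(data, P, C):
    """Hypothetical helper: P-th percentile by linear interpolation at the
    1-based fractional rank h = p*(N + 1 - 2C) + C (assumed parameterization)."""
    xs = sorted(data)
    n = len(xs)
    p = P / 100
    h = p * (n + 1 - 2 * C) + C      # fractional rank for this choice of C
    h = min(max(h, 1), n)            # simplification: clamp to the observed range
    k = math.floor(h)                # rank of the observation at or below h
    if k == n:                       # at (or clamped to) the largest observation
        return float(xs[-1])
    return xs[k - 1] + (h - k) * (xs[k] - xs[k - 1])

salaries = [42000, 43412, 45500, 53750]
for C in (1, 0.5, 0):
    # prints "1 47562.5", "0.5 49625.0", "0 51687.5" on separate lines
    print(C, interpolated_percentile(salaries, 75, C))
```

Under these assumptions the $75$th percentile of the same four salaries comes out as $47562.5$, $49625$, or $51687.5$ depending on $C$, which is why it matters which rule the book uses.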
The general idea of interpolation is that rather than just describe the given observations, we assume these observations come from a much larger population (or a random distribution) in which values between these observations also occur. This seems like a reasonable assumption for the starting salaries of engineers, unless there is some law or other agreement that no company should ever offer any starting salary other than one of the four given values.
What is a little unclear is which method your book uses. From the formula "$100i/(n+1)$th percentile" in some comments, I would infer that the book uses an interpolating formula equivalent to the $C=0$ curve at https://en.wikipedia.org/wiki/File:Percentile_interpolation.png. Some other sources call this an $(N+1)p$ method, where $p=P/100$ for the $P$th percentile. According to that method, the observations $42000,$ $43412,$ $45500,$ and $53750$ should be (respectively) the $20$th, $40$th, $60$th, and $80$th percentiles of the population, and any other percentile between $20$ and $80$ gets interpolated. For example, the $30$th percentile value would be halfway between the $20$th percentile value and the $40$th percentile value. The $75$th percentile value would lie between the $60$th and $80$th percentile values, but closer to the $80$th. Since $20$ percent of all salaries supposedly are in that interval, which has a width of $53750 - 45500 = 8250,$ we assume $15$ percent are in the first $\frac34$ of the interval and $5$ percent are in the remaining $\frac14$; that is, we assume the salaries within that interval are uniformly distributed between its endpoints. The $75$th percentile should therefore be $\frac34$ of the way up from the $60$th percentile and $\frac14$ of the way down from the $80$th. That is, the $75$th percentile is $$45500 + \frac34(8250) = 53750 - \frac14(8250) = 51687.5.$$
Working out that $75$th percentile again using the book's method, we solve $75 = 100i/5$ (here $n+1 = 5$), so $i = 3.75.$ So again we are looking for something between $45500$ (the $3$rd observation) and $53750$ (the $4$th observation), but closer to $53750$; in fact, it should be $\frac34$ of the way up from $45500.$
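Here is a minimal Python sketch of that calculation, assuming (as inferred above) that the book places observation $i$ at the $100i/(n+1)$th percentile. The name `value_at_percentile` is hypothetical, and percentiles outside the range covered by the data are rejected rather than extrapolated.

```python
def value_at_percentile(data, P):
    """Hypothetical helper: P-th percentile under the (N+1)p rule (assumed to
    be the book's method), interpolating linearly between sorted observations."""
    xs = sorted(data)
    n = len(xs)
    i = P * (n + 1) / 100          # fractional 1-based rank, from P = 100i/(n+1)
    if not 1 <= i <= n:
        raise ValueError("percentile lies outside the observed data")
    k = int(i)                     # rank of the observation at or just below i
    if k == n:                     # exactly at the largest observation
        return float(xs[-1])
    lo, hi = xs[k - 1], xs[k]      # k-th and (k+1)-th smallest observations
    return lo + (i - k) * (hi - lo)

salaries = [42000, 43412, 45500, 53750]
print(value_at_percentile(salaries, 75))   # 51687.5
print(value_at_percentile(salaries, 50))   # 44456.0
```

The second call is one way to read part (b): $i = 2.5$, so the value is halfway between the $2$nd and $3$rd observations.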
To get a percentile from a salary between two of the observed salaries, we again interpolate between the observations. One nice property of linear interpolation between two points is that the inverse of a linear function is also linear, so we can use linear interpolation in both directions.
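Going the other way, here is a sketch under the same $(N+1)p$ assumption: find the two observations that bracket the salary, get the fractional rank $i$ by linear interpolation between them, and convert it with $P = 100i/(n+1)$. The name `percentile_of_value` is hypothetical.

```python
def percentile_of_value(data, x):
    """Hypothetical helper: percentile P of value x under the (N+1)p rule
    (assumed), by linear interpolation between the bracketing observations."""
    xs = sorted(data)
    n = len(xs)
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value lies outside the observed data")
    # Walk up the sorted list until x falls between xs[k-1] and xs[k].
    for k in range(1, n):
        lo, hi = xs[k - 1], xs[k]
        if x <= hi:
            frac = 0.0 if hi == lo else (x - lo) / (hi - lo)
            i = k + frac                  # fractional 1-based rank
            return 100 * i / (n + 1)      # invert i = P(n+1)/100
    return 100 * n / (n + 1)              # only reached when there is one observation

salaries = [42000, 43412, 45500, 53750]
print(percentile_of_value(salaries, 45500))             # 60.0
print(round(percentile_of_value(salaries, 49000), 2))   # 68.48
```

For $\$49000$ this gives $i = 3 + 3500/8250 \approx 3.424$, so $P = 100i/5 \approx 68.48$.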
If we're really supposed to use the $(N+1)p$ rule for percentiles, however, it's curious that the book finds it necessary to remind you to interpolate for part (d) but does not say to interpolate for part (c). (The top third of salaries would start at the $66\frac23$th percentile, and $66\frac23 = 100i/5$ is solved by $i=\frac{10}{3}\approx3.333.$)
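If part (c) is interpolated the same way as part (d) (an assumption, given the book's wording), the rank $i = \frac{10}{3}$ lies one third of the way from the $3$rd observation to the $4$th, which works out as $$x = 45500 + \left(\tfrac{10}{3} - 3\right)(53750 - 45500) = 45500 + \tfrac13(8250) = 45500 + 2750 = 48250.$$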