The Normal Distribution

Understanding the normal distribution is vital to understanding many statistical methods and signal detection theory, and it matters in a host of other applications.  This tutorial is a review of what we know about normal distributions.

Distributions

So first, what exactly do people mean when they refer to a distribution?   The answer is simple. A distribution is just a collection of scores arranged to indicate how common various scores are.  So for instance, after a test your instructor may list the class grades like this:
 
Score     Proportion of Students
95-100    .01
90-95     .02
85-90     .03
80-85     .08
75-80     .21
70-75     .15
65-70     .10
60-65     .20
55-60     .10
50-55     .10

For each range of test scores, a certain proportion of students achieved that score. What your instructors are doing when they list test scores like this is providing you with the distribution of test scores.  Notice that if you add up all of the proportions they add up to 1 (i.e. 100%).

You can also illustrate a distribution graphically if you want.  So your instructor may show the grades like this instead:

In fact when you encounter discussions of distributions in the psychological literature they are typically depicted graphically.  The important point is that all a distribution is telling you is the relative frequency of various scores.

Now remember that a distribution is simply a collection of scores.  Because of that, like any other collection of scores you can calculate descriptive statistics for those scores.
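
As a rough illustration of that point, here is a minimal Python sketch that computes the mean and standard deviation of the grade distribution listed earlier.  It rests on an assumption that is not in the original table: every score in a band is treated as if it sat at the band's midpoint (so 95-100 counts as 97.5).  The names grade_bands and describe are made up for this example.

```python
import math

# (low, high, proportion) for each score band, copied from the grade table.
grade_bands = [
    (95, 100, 0.01), (90, 95, 0.02), (85, 90, 0.03), (80, 85, 0.08),
    (75, 80, 0.21), (70, 75, 0.15), (65, 70, 0.10), (60, 65, 0.20),
    (55, 60, 0.10), (50, 55, 0.10),
]


def describe(bands):
    """Return (mean, standard deviation) of a binned distribution.

    Assumption: each band is represented by its midpoint.
    """
    total = sum(p for _, _, p in bands)  # should be 1 for proportions
    mean = sum(p * (lo + hi) / 2 for lo, hi, p in bands) / total
    variance = sum(p * ((lo + hi) / 2 - mean) ** 2 for lo, hi, p in bands) / total
    return mean, math.sqrt(variance)


mean, sd = describe(grade_bands)
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

Because the midpoints are only a stand-in for the real scores, the results are approximations of the class's actual mean and standard deviation.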

Z Scores

Sometimes it's necessary to compare scores that come from different distributions.  Imagine, for instance, that you work in a personnel office and need to rank job applicants based on their GPAs.  The applicants all come from different colleges and universities.  Some come from universities where the grading is very tough, there are a lot of flunk-out courses, and it's really hard to get an A in any course.  Other applicants come from universities where grade inflation is rampant: almost everyone in every class gets at least a B, and about half of the students in every class get an A.  Obviously, in a situation like this, you couldn't just look at the applicants' GPAs when deciding how to rank them.  A student from the first kind of university with a 3.0 is above average; a student from the second kind with a 3.0 is probably below average.  But how exactly would one account for the difference?

One of the easiest approaches to dealing with this problem is to use the standard deviation as your yardstick.  If you know the mean GPA at each university and you know the standard deviation of the GPAs at each university then you can provide each applicant with an adjusted score that indicates how many standard deviations above or below the mean each applicant is.

Say Fred has a GPA of 3.5.  He comes from a university where the average GPA is 2.0 and the standard deviation is .5.  Fred's GPA is three standard deviations above the mean, since (3.5 - 2.0) / .5 = 3.  Say Beevis also has a GPA of 3.5.  He comes from a university where the average GPA is 3.0 and the standard deviation is .25.  Beevis's GPA is two standard deviations above the mean, since (3.5 - 3.0) / .25 = 2.  Assuming the overall quality of the students at both universities is the same (which could be checked with incoming SATs and so on), it appears that Fred is the better applicant, even though they both have the same GPA.

Statisticians use the term Z score to refer to how many standard deviations above or below the mean a score is.  If someone's Z score equals 3, that means their score is three standard deviations higher than the average score.  If someone has a Z score of -2, that means their score is two standard deviations below the average score.  And so on.
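
The definition amounts to subtracting the mean and dividing by the standard deviation.  The short Python sketch below applies it to the Fred and Beevis GPAs from the example; the function name z_score is just a label chosen for this illustration.

```python
def z_score(score, mean, sd):
    """Number of standard deviations that score lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


# GPAs, means, and standard deviations from the Fred and Beevis example.
fred = z_score(3.5, mean=2.0, sd=0.5)
beevis = z_score(3.5, mean=3.0, sd=0.25)
print(fred, beevis)  # 3.0 2.0
```

The same raw GPA gives two different Z scores because each one is measured against its own university's mean and standard deviation.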

Z scores are important because they allow you to take things that are measured on different scales and equate them.  Consider, for instance, the SAT and ACT.  The SATs were designed to have means of about 500 for each subscale and standard deviations of about 100 (these have changed over time with the population, but this was true of the original test).  ACTs, as you know, have a much lower mean and standard deviation.  Imagine you wanted to compare two students, one who took the SAT and one who took the ACT.  How would you do it?  The answer is really pretty simple: calculate the Z score for each student.  The Z score puts both tests on the same scale.

Normal Distribution

Above we talked about distributions generally.  But it turns out that in nature a distribution with a particular kind of shape occurs quite frequently.  It's called the normal distribution.  Scores on things like height, weight, and IQ all seem to be distributed approximately normally.  Normal distributions are symmetrical about the mean and bell shaped.  Mathematicians have studied the normal distribution so much that there is even an equation that exactly describes it (although you don't need to know that equation).  This is what a normal distribution looks like:

Notice that this is plotted in terms of Z scores.  You can also plot one in terms of raw scores.  If you do so, it will maintain the same basic shape except that the X Axis will be either stretched or shrunk.  In fact, imagine that this is printed on a big rubber sheet and that it can be stretched or shrunk along the X Axis as much as you like.
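
The stretching of the rubber sheet can be written as raw score = mean + Z x standard deviation.  The sketch below relabels a Z score axis in raw-score units, using the original SAT figures mentioned earlier (mean 500, standard deviation 100); the function name raw_score is only illustrative.

```python
def raw_score(z, mean, sd):
    """Convert a Z score back to the raw-score scale."""
    return mean + z * sd


z_axis = [-3, -2, -1, 0, 1, 2, 3]
# Same tick marks, relabelled in SAT points (mean 500, SD 100).
sat_axis = [raw_score(z, mean=500, sd=100) for z in z_axis]
print(sat_axis)  # [200, 300, 400, 500, 600, 700, 800]
```

A larger standard deviation spreads the same tick marks over a wider range of raw scores, which is the stretch; a smaller one is the shrink.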

From Z Scores to Proportions

One of the most useful properties of the normal distribution is that for any Z score statisticians have been able to figure out the proportion of scores that fall above and below that score.  So if your Z score on a test is 2, and the scores are normally distributed, then about 97.72% of the test scores fall below yours.  Here are some Z scores and the proportion of cases that fall below or above each Z score in a normal distribution (rounded to five decimal places):

Z Score    Scores Below    Scores Above
-3         0.00135         0.99865
-2         0.02275         0.97725
-1         0.15866         0.84134
 0         0.50000         0.50000
 1         0.84134         0.15866
 2         0.97725         0.02275
 3         0.99865         0.00135
 
Before you go too far in this tutorial remind yourself of what all this means.  Z Scores are simply how many standard deviations from the mean a score is.  So when we say that 84% or so of scores fall below a Z score of 1, what we're really saying is that if your score is one standard deviation above the mean, then 84% of the scores are below you.  Or in other words when you're one standard deviation above the mean, your percentile rank is 84.
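
To make the percentile rank idea concrete, here is a small Python sketch using NormalDist from the standard library's statistics module (Python 3.8 or later).  The score of 600 is a made-up example on a scale with mean 500 and standard deviation 100, and percentile_rank is a name invented for this illustration.  It assumes the scores really are normally distributed.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Percentage of a normal distribution that falls below score."""
    z = (score - mean) / sd              # how many SDs above or below the mean
    return 100 * NormalDist().cdf(z)     # standard normal: mean 0, SD 1


# Hypothetical score one standard deviation above the mean.
print(round(percentile_rank(600, mean=500, sd=100)))  # 84
```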

You can find these proportions in the Z tables at the end of almost every stats book and many research methods books.  If you're familiar with Excel, there's also an easy function you can use to calculate the proportion of scores that fall below a particular Z score.  The function is called NORMSDIST.  Say you wanted to know what proportion of scores fall below a Z score of 1.5.  You would type the following into a cell in your Excel worksheet:

=normsdist(1.5)
 
When I did that the answer I got was 0.933192771.  So about 93% of all scores fall below a Z score of 1.5.  To calculate the proportion of scores above that Z score you would subtract that answer from 1.  So you could type in:
=1-normsdist(1.5)
 
The answer I got for that was 0.066807229.  So one of the most useful things about a normal distribution is that once you know a Z score you know the proportion of scores that fall above or below that score.
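
If you would rather work outside a spreadsheet, the same two calculations can be sketched in Python.  This is one possible equivalent, not the only one: it uses the cdf method of statistics.NormalDist (Python 3.8 or later), and the variable names are arbitrary.  The trailing digits may differ slightly from the spreadsheet values quoted above.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

below = standard_normal.cdf(1.5)   # plays the role of =normsdist(1.5)
above = 1 - below                  # plays the role of =1-normsdist(1.5)
print(f"{below:.4f} {above:.4f}")  # 0.9332 0.0668

# The same call gives the proportions for any other Z score.
for z in range(-3, 4):
    p = standard_normal.cdf(z)
    print(f"{z:>3}  below = {p:.5f}  above = {1 - p:.5f}")
```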

From Proportions to Z Scores

Say you are still working in the personnel office of a company and you need to hire about 15 people per year.  You get about 100 applicants a year.  The problem is that the applications don't come in all at once, but rather come in on a regular basis throughout the year.  For every applicant you determine an overall score that takes into account their grades, test scores, activities, letters of recommendation and so on.  From a very large study you know that the mean overall score is 5.6 and the standard deviation is .4.  So as every application comes in, you're able to calculate a Z score that places that applicant relative to all the other applications that are likely to come in during the year.

Your problem is that you want to accept about 15% of the applications, and you want to know where you should put the cutoff so that 15% of the applicants will be accepted and 85% will be rejected.  This problem, then, is just the reverse of the problem discussed above.  For what Z score will 15% of the scores be above that Z score and 85% be below it?

The answer to this problem is quite easy.  If knowing a Z score tells you the proportion of scores below that score, then the reverse must also be true.  If you know what proportion of scores fall above or below a score, you can derive the Z score.  It's just the inverse operation.  Again, you can use the tables in any stats book.  You can also use a function in Excel called NORMINV.  To use it you need to type in the proportion of scores below the Z score and then 0 and 1 for the mean and standard deviation (when you convert to Z scores the mean becomes 0 and the standard deviation becomes 1).  Here's how you would use the function if you wanted to know the Z score where 95% of the scores fall below that score:
 
=NORMINV(.95,0,1)

Notice that the number I typed in was the proportion .95, not the percentage 95.  It will only work if you use the proportion.  The answer I got from Excel was 1.644853.  So if someone's percentile rank is 95, then their score is about 1.64 standard deviations above the mean (i.e. they have a Z score of about 1.64).

This probably goes without saying, but you can also solve the problem if instead you're given the proportion of scores above a particular cutoff.  So say you want 8% of the scores to fall above the cutoff.  You would now enter:
 
=NORMINV(1-.08,0,1)

When I did that the Z score I got was 1.405073817.
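
The inverse lookup can also be sketched in Python with the inv_cdf method of statistics.NormalDist (Python 3.8 or later).  The helper names below are invented for this example.  The last lines apply the same idea to the personnel office problem posed at the start of this section, converting the Z cutoff back to the overall-score scale with mean 5.6 and standard deviation .4.  Trailing digits may differ slightly from the spreadsheet values quoted above.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def z_with_proportion_below(p):
    """Z score that has proportion p of the scores below it."""
    return standard_normal.inv_cdf(p)


def z_with_proportion_above(p):
    """Z score that has proportion p of the scores above it."""
    return standard_normal.inv_cdf(1 - p)


print(round(z_with_proportion_below(0.95), 4))  # 1.6449
print(round(z_with_proportion_above(0.08), 4))  # 1.4051

# Personnel office problem: accept the top 15% of applicants.
z_cut = z_with_proportion_above(0.15)
raw_cut = 5.6 + z_cut * 0.4  # mean + Z * standard deviation
print(round(z_cut, 3), round(raw_cut, 2))  # 1.036 6.01
```

Under these assumptions, an applicant would need an overall score of roughly 6.01 or better to land in the top 15%.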

Take Home Messages:

 

 
University of Arkansas
Department of Psychology
Graduate Program in Experimental Psychology
Lampinen Lab