Descriptive Statistics
There are a variety of ways to compare distributions, including dot plots (to get an idea of the spread of the data) as well as the more traditional summary statistics (mean, median, etc.). Moreover, descriptive statistics include bar charts and histograms, as well as finding relationships between variables using visual association and correlation.

Let's begin by assuming that we have a data set that contains hypothetical scores on an exam for 11 different introductory psychology classes. There were 100 points possible on the exam, and there were 50 students in each class. We want to compare the performance across classes.

Frequency Counts Using Dot Plots

Dot plots can be used to provide a simple frequency chart for each class, thus enabling us to compare the resulting distributions to each other. Creating these simple frequency charts is very easy and is best done using the Interactive Graphs menu. To begin, simply click on the Graphs menu and choose Interactive. In that submenu, choose Dot. The kind of chart we are going to create is called a "dotplot", in which each dot represents a number (or several numbers, if they happen to be the same) in the distribution. When the Create Dots panel appears, you will see a list of all the variables (the classes in our case) in the left window and spaces representing the vertical and horizontal axes in the right window.
All of the Interactive Graphs are done with a drag-and-drop technique (see the description of preparing histograms for more details). Simply highlight the class that you want to examine (Class A, in this case), and drag it into the box for the horizontal axis. When you have done that, click OK, and the chart will pop up in the Output window. The window also provides a number of options for adding titles, lines, and various other accessories to your chart. Now do the same thing for Classes B and C. (Notice that when you choose another variable to plot, it automatically replaces the one in the horizontal axis box.) Fortunately, SPSS makes it extremely easy for you to examine these charts and compare them. To do so, simply locate the small icon in the left panel that represents each of the charts. When you single-click on one, you will see it in the Output window.
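For readers who want to see what a dot plot is counting, the idea can be sketched outside SPSS in a few lines of Python. This is only an illustration under stated assumptions: the scores are made-up values rather than the tutorial's data set, the function name `dot_plot_rows` is hypothetical, and the text chart is rotated (one row per score) compared with the SPSS chart, which places scores along the horizontal axis.

```python
from collections import Counter

# Hypothetical exam scores for one class (illustrative values only,
# not the data set used in this tutorial).
class_a = [72, 85, 85, 90, 68, 72, 85, 77, 90, 72]


def dot_plot_rows(scores):
    """Return one text row per distinct score, with one dot per occurrence."""
    counts = Counter(scores)
    return [f"{score:3d} | {'.' * counts[score]}" for score in sorted(counts)]


# Print the rotated dot plot: scores run down the page, dots run across.
for row in dot_plot_rows(class_a):
    print(row)
```

Running the same function on the scores from a second class and reading the two printouts side by side gives a rough version of the chart-to-chart comparison described above.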
Numerical Summaries of Distributions

To produce numerical summaries of any distribution, go to the Analyze menu, choose Descriptive Statistics, and then choose Frequencies. When you get the Frequencies dialogue box, simply highlight each variable name, and then click on the arrow to put it in the right side of the box. You can do the variables one at a time or all at once. Then click on the box labeled Statistics. This will take you to a dialogue box labeled Frequencies: Statistics, where you can decide which descriptive statistics you would like to have calculated. When you have selected the statistics you want, click Continue. Then click OK, and SPSS will calculate the descriptive statistics.
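As a rough cross-check on the kind of numbers the Frequencies: Statistics box produces, the same summaries can be computed with Python's standard `statistics` module. This is a sketch, not SPSS output: the scores are hypothetical, the function name `summarize` is an assumption made for illustration, and the standard deviation shown is the sample version (n - 1 in the denominator).

```python
import statistics

# Hypothetical exam scores (illustrative values only).
scores = [72, 85, 85, 90, 68, 72, 85, 77, 90, 72]


def summarize(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every value tied for most frequent.
        "modes": statistics.multimode(values),
        # Sample standard deviation (divides by n - 1).
        "std_dev": statistics.stdev(values),
        "minimum": min(values),
        "maximum": max(values),
    }


for name, value in summarize(scores).items():
    print(f"{name}: {value}")
```

Calling `summarize` once per class and lining up the results gives a numerical counterpart to comparing the dot plots.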
Bar Charts

One of the best ways to present data is by using a bar chart or a histogram. Bar charts are appropriate when you have discrete categories of data (e.g., males, females), while histograms should be used when you have a continuous variable (e.g., grade point average). Below are instructions on how to produce each type using SPSS.

Consider a situation where we have data on Allegheny graduates in various majors for 1998 and we want to compare the numbers of graduates across majors. The easiest way to do that graphically is to create a bar chart. To do so, follow these steps. First, select Bar… under the Graphs menu. When the Bar Charts panel comes up, you will see that you have several choices for the kind of bar chart you want. (We have selected Simple.)
In this case, we want the bars to represent the number of graduates for each major, so we highlight that variable in the list on the left (by clicking on it), and then we click on the small arrow to move that variable into the Bars Represent box. Under Category Labels we are given two choices, Case Number or Variable; we want to see the major names under each bar (instead of the case numbers, which would simply be 1,2,3,4, etc.), so we click the button for Variable, highlight the "major" variable on the left and then click on the arrow to move it into the Category Labels box. The Define Simple Bar dialogue box should now look like this:
We have now finished defining our bar chart. Click OK in the upper right corner of the panel, and in a few moments, your finished graph should pop up on your screen!

There are a few things you should know about the SPSS output file that you have created. The information comes in an Output window, and it has two parts. On the right is the output from whatever procedures you have just completed. On the left is the Outline, an outline representation of all the recent SPSS procedures that you have run. You can explore this panel as you want, but the main thing to remember is that it is a good way to keep track of the procedures that you have run. You can also edit, rearrange, and delete items in your output file using this feature.
Histograms

In situations where the variable of concern is continuous, a histogram (in which the bars touch) is more appropriate. Consider a situation where we want to examine the lengths of the reigns of British rulers since 1066. Here is the data:

Reigns of British Rulers since 1066
Rather than simply presenting a figure where each king or queen is represented by a bar, we want a histogram that summarizes the data by grouping the information together into categories. In this case, the categories will be ranges of lengths of reigns.

To begin, open a new Data Editor in SPSS. We are going to enter all of the lengths of the reigns. However, since we are not going to use the individual kings’ and queens’ names in this analysis (we are going to group the data together into categories), we only need one column, which you should name reign and label as "Reigns of British Rulers." Define that variable (it is numerical, and we don’t need any decimal places), and enter the data into the Data Editor.

Once you have entered the data, we need to tell SPSS to create a histogram that displays it. There are two ways to do this, and we will try both. First, down towards the bottom of the Graphs menu you will find an item called Histogram… Click on this item. In the left portion of the panel you should see the description of the "reign" variable.
Simply highlight that variable label and then click on the arrow to insert it under Variable. Then click OK, and in a few moments, you should have a histogram.

Changing the Look of Histograms

One thing that you may have noticed is that SPSS decided how many intervals to create for your histogram and what the width of those intervals should be. However, you can change that if you want, and changing the interval width can change the look of the histogram significantly. To change the interval width we will use a slightly different feature of SPSS, which is called "Interactive Graphs". This part of SPSS will do much the same thing as the histogram program that we used above; the only difference is that with an interactive graph, we can control more of how the histogram will eventually look. Select Graphs, then Interactive, and Histogram to get the panel seen below.
This panel works on the principle of "drag and drop"; instead of typing in the variable names that you want, you can simply highlight the one you want and drag it to its proper place. For example, you see in the panel two arrows, representing the vertical and horizontal axes of the histogram. The vertical axis is already labeled "count," since it will simply show how many of the reigns fall into each interval. Now highlight the variable "reign" and drag it into the open space on the horizontal axis, since that is where we want it.

To customize the look of the histogram, we now simply click on the tab labeled Histogram. There are all kinds of things we can do at this point, such as superimposing a normal curve over the histogram, or giving it a 3-D look; you can experiment with those when you want. For now, the most important feature is the Interval Size box. Here you can either ask SPSS to set the interval size, or you can specify a particular number of intervals or width of intervals. To see how different the histogram might look with a different number of intervals, count how many intervals were in your original histogram, and use half that number here. Then click on the Titles tab, and type in a title for your histogram in the top space. Then click OK to create the histogram. Now choose a different interval width, and create another histogram to compare with the first.
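The effect of interval width on the shape of a histogram can also be seen in a small Python sketch. It rests on several assumptions: the reign lengths below are hypothetical whole numbers of years (not the table used in this tutorial), the function names `bin_counts` and `print_histogram` are invented for illustration, and the binning rule (intervals starting at 0 and including their lower bound) is one simple choice that need not match how SPSS places its intervals.

```python
from collections import Counter

# Hypothetical reign lengths in whole years (illustrative values only,
# not the table used in this tutorial).
reigns = [3, 7, 12, 15, 18, 22, 24, 25, 31, 33, 38, 44, 56, 59, 63]


def bin_counts(values, width):
    """Count values per interval [start, start + width), keyed by start."""
    counts = Counter((v // width) * width for v in values)
    starts = range(0, max(values) + 1, width)
    # Include empty intervals so gaps in the data stay visible.
    return {start: counts.get(start, 0) for start in starts}


def print_histogram(values, width):
    """Print a text histogram with one row per interval."""
    for start, count in bin_counts(values, width).items():
        print(f"{start:3d}-{start + width - 1:3d} | {'#' * count}")


print_histogram(reigns, 10)  # narrower intervals, more bars
print()
print_histogram(reigns, 20)  # wider intervals, fewer bars
```

The same values produce a noticeably different picture at each width, which is the reason for trying more than one setting in the Interval Size box.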
Visual Association

The relationships among a number of variables are often best examined by looking at scatter plots of pairs of variables. By looking at these plots, not only can relationships be observed, but outliers, problems of limited range, etc. can become obvious.

To prepare scatter plots, simply enter the data into SPSS, one variable per column of data. To make the output more readable, make sure that you have provided labels for the variables. Then pull down the Graphs menu and choose Interactive > Scatter. Put one variable on the Y (vertical) axis and another variable on the X (horizontal) axis. The Interactive Graphs make use of a drag-and-drop technique. Highlight the variable you want on the Y-axis and drag it to the vertical box. Do the same for the X-axis (drag the name of the variable to the horizontal box). The Create Scatterplot box should look something like this.
Unless you are doing a plot with more than two variables you will not need to make use of the Legend Variables options.
Bivariate Correlation

In addition to examining our data using scatter plots, we will often want to examine correlation coefficients among the variables. Once you have entered your data (one variable per column), simply click on the Analyze menu (labeled Statistics in older versions of SPSS), then go to Correlate, and choose Bivariate (because correlations are computed between pairs of variables). When you get to the Bivariate Correlations dialogue box, simply insert the variables into the Variables box, check the Pearson box (if your data are interval/ratio; otherwise choose Kendall's tau or Spearman's correlation), and click OK.
Your output will provide you with correlation coefficients between each pair of variables you selected (three in the above example). In addition, a measure of the significance of each correlation (that is, whether the correlation is significantly different from zero) is included. Note that the correlation between a particular pair of variables is given twice (Miles per Gallon vs. Time to Accelerate as well as Time to Accelerate vs. Miles per Gallon). Correlations between a variable and itself are shown as being 1.000.
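To make the two observations about the output table concrete (each pair appears twice, and a variable correlates 1.000 with itself), here is a minimal Python sketch of the Pearson coefficient. The paired values are hypothetical, the variable and function names are chosen only for illustration, and the sketch does not compute the significance test that SPSS reports.

```python
import math

# Hypothetical paired measurements (illustrative values only).
miles_per_gallon = [18.0, 15.0, 24.0, 30.0, 22.0, 27.0]
seconds_to_accelerate = [12.0, 11.5, 15.0, 17.0, 14.5, 16.0]


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Sum of cross-products and sums of squared deviations.
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cross / math.sqrt(ss_x * ss_y)


# The coefficient is symmetric, so both orderings give the same value.
print(round(pearson_r(miles_per_gallon, seconds_to_accelerate), 3))
print(round(pearson_r(seconds_to_accelerate, miles_per_gallon), 3))
# A variable correlated with itself gives 1.0.
print(round(pearson_r(miles_per_gallon, miles_per_gallon), 3))
```

Because the formula treats x and y identically, swapping them cannot change the result, which is why the table is a mirror image across its diagonal.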
5/00