Module #1. Calculations and observations based on a small dataset
Module #1, Interactive Exercise #1
In this exercise, you will learn how to use the WISE regression applet to deepen your understanding of regression. Using the very small set of data shown below, we will step through relevant regression values and see how they are calculated and how they are represented graphically. Answers are provided for all problems in Module 1 at the end of this handout. If you have a copy of this handout, go directly to the applet.
Set up the applet: From the ‘Select a Lesson:’ menu in the lower right-hand corner of the applet, choose ‘Regression.’ Remove the checks from all boxes except for the box: Show Regression Line.
X    Y
1    2
3    5
5    7
7    6
a. Correlation, slope, and Y-intercept. The applet provides these statistics, which are important for regression analysis. Find these values in the applet and enter each below. If you get stuck, you may check answer 1a.
r (correlation) = __________
slope (b_y) = __________
Y-intercept (a) = __________
b. The regression equation. The regression equation is a formula for the straight line that best fits the data. Later we will learn exactly how ‘best fit’ is defined. The regression equation can be used to predict the Y score (called Y´, or Y-prime) for each of our X values. The general form of the regression equation is Y´ = a + bX. Use the values from part a to compute the missing Y´ values in the table for this part.
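As a rough illustration of how the regression equation produces predicted scores, the following Python sketch fits a line to the four data points and applies Y´ = a + bX. It assumes the standard least-squares formulas for the slope and intercept, which this handout does not spell out; the variable names are illustrative and are not part of the WISE applet.

```python
# Data from the exercise table.
xs = [1, 3, 5, 7]
ys = [2, 5, 7, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Assumed least-squares formulas (not stated in the handout):
# slope = sum of cross-products / sum of squared X deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predicted score Y' = a + bX for each case.
predicted = [intercept + slope * x for x in xs]
for x, y, y_hat in zip(xs, ys, predicted):
    print(f"X={x}  Y={y}  Y'={y_hat:.1f}")
```

The predicted value for the first case should agree with the 2.9 already filled in for X = 1.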
X    Y    Y´
1    2    2.9
3    5    ____
5    7    ____
7    6    ____
SS Total (Total Variance)
SS Total is the sum of squared deviations of observed Y scores from the mean of Y. This is an indication of the error we expect if we predict every Y score to be at the mean of Y. (If X is not available or if X is not useful, then the mean of Y is our best prediction of Y scores.)
c. Calculations. Complete the calculations below using the mean of Y (Ȳ = 5).
X    Y    (Y - Ȳ)        (Y - Ȳ)^2
1    2    2 - 5 = -3     (-3)^2 = 9
3    5    ____           ____
5    7    ____           ____
7    6    ____           ____
                         Sum = ____
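The SS Total calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the table, using illustrative variable names; it is not how the applet itself is implemented.

```python
# Observed Y scores from the exercise.
ys = [2, 5, 7, 6]

mean_y = sum(ys) / len(ys)

# Deviation of each observed score from the mean of Y.
deviations = [y - mean_y for y in ys]

# Squared deviations; their sum is SS Total.
squared = [d ** 2 for d in deviations]
ss_total = sum(squared)

for y, d, sq in zip(ys, deviations, squared):
    print(f"Y={y}  deviation={d:+.1f}  squared={sq:.1f}")
print(f"SS Total = {ss_total:.3f}")
```

The first row reproduces the worked example in the table (a deviation of -3 and a squared deviation of 9).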
d. Now in the applet, place a check mark in the boxes titled Show SS Total and Show Mean of Y and remove all other checks. The vertical black lines represent the deviations of each case from the mean of Y. Verify the correspondence of the length of these lines with the values in the table for the column of deviations from the mean, (Y - Ȳ). Which case has the largest deviation from the mean?
The largest deviation from the mean is _____ for Case ___.
Hint: Look at the graph in the applet and at your calculations in the table. Check answer 1d.
e. Now check the box labeled Show Error as Squares. The sizes of the black squares correspond to the squared deviations from the mean, and the sum of the areas of these squares corresponds to SS Total. Notice how the deviations from the mean for the first and fourth cases are -3 and +1, while the squared deviations are 9 and 1. This shows how points farther from the mean contribute much more to SS Total than points closer to the mean. What is the contribution of the second case to SS Total? Why?
The contribution to SS Total for Case 2 is ______ because ______ (answer below).
f. Now calculate the sum of the squared deviations from the mean, Σ(Y - Ȳ)^2. You can do this by adding the values in the column headed (Y - Ȳ)^2.
Σ(Y - Ȳ)^2 = SS Total = ________.
In the applet, SS for Total = ________.
Check answer 1f.
g. Explain what SS Total means. How would the plot differ if SS Total were much smaller, say 2.00? What if SS Total were much larger, say 100? Check answer 1g.
SS Error
SS Error is the sum of squared deviations of observed Y scores from the predicted Y scores when we use information on X to predict Y scores with a regression equation. SS Error is the part of SS Total that CANNOT be explained by the regression.
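To make the definition concrete, here is a minimal Python sketch of SS Error for the exercise data. It assumes the regression line is the usual least-squares line; `fit_line` is a hypothetical helper written for this illustration, not something provided by the applet.

```python
xs = [1, 3, 5, 7]
ys = [2, 5, 7, 6]

def fit_line(xs, ys):
    """Hypothetical helper: return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]

# Residual: observed Y minus predicted Y' for each case.
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# SS Error is the sum of the squared residuals.
ss_error = sum(r ** 2 for r in residuals)

for case, r in enumerate(residuals, start=1):
    print(f"Case {case}: Y - Y' = {r:+.1f}, squared = {r ** 2:.2f}")
print(f"SS Error = {ss_error:.3f}")
```

The residuals printed here are the (Y - Y´) values that part h asks you to compute by hand, so the sketch can be used to check that work.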
h. Calculations. Complete the calculations below using the predicted scores (Y´) calculated in question 1b. The sample mean is 5.0 for every case. Check answer 1h.
Case   X    Y    Y´     (Y - Y´)           (Y - Y´)^2
1      1    2    2.9    2 - 2.9 = -0.9     (-0.9)^2 = 0.81
2      3    5    ____   ____               ____
3      5    7    ____   ____               ____
4      7    6    ____   ____               ____
Sum    16   20   ____   ____               ____
i. Now place check marks in the boxes titled Show Regression Line and Show SS Error, and remove checks from all other boxes. Deviations of the observed points from their predicted values on the regression line are shown in red.
The largest deviation is for Case ____, and the size of the deviation is ______.
The smallest deviation is for Case ____, and the size of the deviation is ______.
j. Now check the box titled Show Errors as Squares. The sizes of the red squares correspond to the squared deviations. In the table for part h, compare the squared deviations shown in the last column for Cases 2 and 3. Observe how the red boxes for Cases 2 and 3 correspond to these values. The sum of the squared deviations is the sum of the last column in the table.
Record your calculated value here _________. This is the Sum of Squares Error.
In the applet under Analysis of Variance, find the value for SS Error _____________
k. Explain in simple English what SS Error means. What would the plot look like if SS Error were very small compared to SS Total? What would the plot look like if SS Error were about as large as SS Total?
SS Predicted
SS Predicted is the part of SS Total that CAN be predicted from the regression. This corresponds to the sum of squared deviations of predicted values of Y from the mean of Y.
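A short Python sketch of the SS Predicted calculation follows. As before, it assumes the least-squares line, and the names are illustrative rather than taken from the applet.

```python
xs = [1, 3, 5, 7]
ys = [2, 5, 7, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Assumed least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in xs]

# Deviation of each predicted score from the mean of Y.
pred_deviations = [y_hat - mean_y for y_hat in predicted]

# SS Predicted is the sum of these squared deviations.
ss_predicted = sum(d ** 2 for d in pred_deviations)

for case, d in enumerate(pred_deviations, start=1):
    print(f"Case {case}: Y' - mean = {d:+.1f}, squared = {d ** 2:.2f}")
print(f"SS Predicted = {ss_predicted:.3f}")
```

If X carried no information about Y, the slope would be zero, every predicted score would equal the mean, and this sum would be zero.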
L. Calculations. Complete the calculations below using the predicted scores (Y´) calculated for each case in part 1b and the mean of Y (5). Check answer 1L.
Case   X    Y    Y´     (Y´ - Ȳ)              (Y´ - Ȳ)^2
1      1    2    2.9    2.9 - 5.0 = -2.1      (-2.1)^2 = 4.41
2      3    5    ____   ____                  ____
3      5    7    ____   ____                  ____
4      7    6    ____   ____                  ____
Sum    16   20   20.0   ____                  ____
m. Now click the boxes marked Show Mean of Y and Show Regression Line and remove the checks from all other boxes. Check Show SS Predicted to see the deviations of the regression line from the mean, shown in blue. The blue lines represent the differences between the mean and the predicted scores. If X were not useful in predicting Y, then the best prediction of Y would be simply the mean of Y for any value of X, and the blue lines would be zero in length. If X is useful in predicting Y, then the predicted values differ from the mean. The blue lines give an indication of how well X predicts Y.
Click the box marked Show Error as Squares to see the squared deviations of predicted scores from the mean. Compare these to the red squares for SS Error. (You can click Show SS Error if you would like to be reminded of the size of the red squares.) Is X useful for predicting Y in this plot? How do you know?
n. The sum of the squared deviations of the predicted scores from the mean is the sum of the last column in the table in part L.
Record the calculated value here _________. This is the Sum of Squares Predicted.
In the applet under Analysis of Variance, find the value for SS Predicted _____________.
o. Explain what SS Predicted means. What would the plot look like if SS Predicted were very small relative to SS Total?
p. Note that SS Total = SS Predicted + SS Error (14.000 = 9.800 + 4.200). Thus, with the regression model, we split SS Total into two parts, SS Predicted and SS Error. We can compute the proportion of SS Total that is in SS Predicted. In terms of sums of squares, this is the ratio of SS Predicted to SS Total.
Calculate [SS Predicted / SS Total] = _________ / __________ = ____________.
SS Total is the numerator of the variance of Y (i.e., Σ(Y - Ȳ)^2, where Ȳ is the mean of Y), so the calculated ratio can be interpreted as the proportion of variance in Y that can be predicted from X using the regression model. A useful fact in regression is that this ratio is equal to the correlation squared (r-squared). Thus, the correlation squared (r-squared) represents the proportion of variance in Y that can be explained by X, using the regression model.
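The two facts above (SS Total splits into SS Predicted plus SS Error, and the ratio SS Predicted / SS Total equals r-squared) can be checked numerically. The Python sketch below assumes the Pearson formula for r and the least-squares line; the variable names are illustrative only.

```python
import math

xs = [1, 3, 5, 7]
ys = [2, 5, 7, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # this is SS Total
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Assumed Pearson correlation and least-squares line.
r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]

ss_total = syy
ss_predicted = sum((y_hat - mean_y) ** 2 for y_hat in predicted)
ss_error = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
ratio = ss_predicted / ss_total

print(f"SS Total = {ss_total:.3f}")
print(f"SS Predicted + SS Error = {ss_predicted + ss_error:.3f}")
print(f"SS Predicted / SS Total = {ratio:.3f}")
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```

The first two printed values should match each other, and the last two lines should show the same proportion.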
What does the applet report for the correlation r and r-squared?
r = ______; r-squared = ________
Summarize the relationship between X and Y for this set of data in simple English.