Regression
Module #1, Exercise #2
Sums of Squares: Computation and Interpretation
Download a paper copy of this exercise if you do not already have one. You will
learn about the meaning of Sums of Squares and how they can be represented and
understood graphically. You will be asked to make some calculations using a very
small data set. These data, along with many calculated values, are represented
in the applet. If you have a copy of this page, proceed directly to the applet
for Module 1, Exercise 2.
| Case | X | Y | Y´ | (Y - Ȳ) | (Y - Ȳ)² | (Y - Y´) | (Y - Y´)² | (Y´ - Ȳ) | (Y´ - Ȳ)² |
| 1 | 1 | 4 | 3.40 | .75 | .5625 | .60 | .3600 | .15 | .0225 |
| 2 | 3 | 3 | | | | | | | |
| 3 | 5 | 2 | | | | | | | |
| 4 | 7 | 4 | | | | | | | |
| Sum | 16 | 13 | 13.00 | 0.00 | 2.7500 | 0.00 | 2.7000 | 0.00 | .0500 |
Here Ȳ is the mean of Y and Y´ is the predicted value of Y.
a. Look at the plot of the data. Does it appear that the regression model will
explain a large portion of the variance in Y? Check answer 2a.
b. Calculate values for all of the empty cells in the table. Values for Case 1
and for the Sum are shown so that you can check your work. Some useful
information can be found in the applet.
Hints: You can calculate Y´ with the formula shown in the applet:
Y´ = -.050X + 3.450. Calculate the mean of Y by dividing the Sum of Y by n.
(You should get 3.25.)
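The hints above can be sketched in a few lines of Python as a way to check hand calculations. This is only an illustrative sketch, not part of the applet: the names `predict`, `mean_y`, and `rows` are made up for this example, and the data and formula are the ones given in the table and the hint.

```python
# Data from the exercise table (Cases 1-4).
x = [1, 3, 5, 7]
y = [4, 3, 2, 4]


def predict(x_value):
    """Predicted Y from the formula in the hint: Y' = -.050X + 3.450."""
    return -0.050 * x_value + 3.450


# Mean of Y: the Sum of Y divided by n.
mean_y = sum(y) / len(y)

# One dictionary per case, holding the three kinds of deviation in the table.
rows = []
for case, (xi, yi) in enumerate(zip(x, y), start=1):
    y_pred = predict(xi)
    rows.append({
        "case": case,
        "y_pred": y_pred,
        "dev_total": yi - mean_y,     # Y minus the mean of Y
        "dev_error": yi - y_pred,     # Y minus Y'
        "dev_pred": y_pred - mean_y,  # Y' minus the mean of Y
    })

for row in rows:
    print(
        f"Case {row['case']}: Y'={row['y_pred']:.2f}  "
        f"Y-mean={row['dev_total']:+.2f}  "
        f"Y-Y'={row['dev_error']:+.2f}  "
        f"Y'-mean={row['dev_pred']:+.2f}"
    )
```

Squaring each deviation gives the remaining three columns of the table.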
c. Now place check marks only in the boxes titled Show SS Total and Show Mean of
Y. The four vertical black lines represent the deviations of each case from the
mean of Y. Check the correspondence of the length of these lines with the values
in the table that you calculated for the column (Y - Ȳ), the deviation of Y from
its mean. Which case has the largest deviation from the mean?
The largest deviation from the mean is _____ for Case ___.
Hint: Look at the graph in the applet and at your calculations in the table.
Check answer 2c.
d. Now check the box labeled Show Error as Squares. The black squares correspond
to the squared deviations from the mean, and the sum of the areas of these
squares corresponds to SS Total. Notice how the deviations from the mean for the
first and second cases are .75 and -.25 (a ratio of 3:1), while the squared
deviations are .5625 and .0625 (a ratio of 9:1). This shows how points farther
from the mean contribute much more to SS Total than points closer to the mean.
What is the contribution of the third case to SS Total? What is the ratio of
this contribution compared to the contribution of the second case?
The ratio of the contribution to SS Total for Case 3 vs Case 2 is ______:______.
Check answer 2d.
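The 3:1 versus 9:1 comparison for Cases 1 and 2 can be reproduced with a short Python sketch. The helper name `contribution_ratio` is hypothetical and chosen only for this illustration. It takes 1-based case numbers, so the same function can be used to check other pairs of cases.

```python
# Y values from the exercise table (Cases 1-4).
y = [4, 3, 2, 4]
mean_y = sum(y) / len(y)

# Deviation of each case from the mean, and its square.
deviations = [yi - mean_y for yi in y]
squared = [d ** 2 for d in deviations]

# SS Total is the sum of the squared deviations (the areas of the squares).
ss_total = sum(squared)


def contribution_ratio(case_a, case_b):
    """Ratio of the squared deviations of two cases (1-based case numbers)."""
    return squared[case_a - 1] / squared[case_b - 1]


# Cases 1 and 2: the deviations differ by a factor of 3 ...
print(abs(deviations[0] / deviations[1]))  # 3.0
# ... but their contributions to SS Total differ by a factor of 9.
print(contribution_ratio(1, 2))            # 9.0
```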
e. Now remove checks from all of the boxes, and check the boxes labeled Show
Regression Line and Show SS Error. The regression line allows us to find the
predicted value of Y for any value of X. The vertical red lines correspond to
the deviations of the observed values for Y from the predicted values on the
regression line (Y´). Which case has the largest deviation from the predicted
value?
The largest deviation from the predicted value is _____ for Case ___.
Hint: Look at the graph and your table. Check answer 2e.
f. Now check the box labeled Show Error as Squares. The red squares correspond
to the squared deviations of Y from the predicted values (Y´). The sum of these
areas corresponds to SS Error. You can compare SS Error with SS Total by also
checking the box labeled Show SS Total. How does SS Error compare to SS Total?
Do you think SS Error is much smaller than SS Total? Now check your table. What
are the values for SS Error and SS Total?
SS Error = _______
SS Total = _______
Check answer 2f.
g. Now remove checks from the boxes Show SS Error, Show SS Total, and Show Error
as Squares, and check the boxes labeled Show Mean of Y and Show SS Predicted.
(Show Regression Line is still checked.) The blue vertical lines show the
deviations between the predicted value of Y (Y´) and the mean of Y for each
case.
Now check Show Error as Squares. The sum of the areas of these squares
corresponds to SS Predicted. If the regression line is near to the mean, that
tells us that the regression model does not predict Y much better than the mean
does. Do you think SS Predicted is much smaller than SS Total? Now check your
table. What are the values for SS Predicted and SS Total?
SS Predicted = _______
SS Total = _______
Check answer 2g.
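For reference, the three sums of squares used in parts c through g can be written compactly as follows. The notation is an assumption made for this summary: $\bar{Y}$ stands for the mean of Y, $Y'_i$ for the predicted value for case $i$, and $n$ for the number of cases.

```latex
\begin{align*}
SS_{\text{Total}}     &= \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 \\
SS_{\text{Error}}     &= \sum_{i=1}^{n} \left(Y_i - Y'_i\right)^2 \\
SS_{\text{Predicted}} &= \sum_{i=1}^{n} \left(Y'_i - \bar{Y}\right)^2
\end{align*}
```

Each sum corresponds to one set of squares in the applet: black for SS Total, red for SS Error, and the squares built on the blue lines for SS Predicted.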
h. Verify that SS Total = SS Predicted + SS Error.
___________ = ___________ + ___________
SS Total    = SS Predicted + SS Error
Check answer 2h.
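The identity can also be checked numerically. The following is a minimal sketch that uses the data from the table and the regression formula from the applet hint. The variable names are illustrative only. The comparison uses `math.isclose` because floating-point sums are rarely exactly equal.

```python
import math

# Data from the exercise table (Cases 1-4).
x = [1, 3, 5, 7]
y = [4, 3, 2, 4]

mean_y = sum(y) / len(y)

# Predicted values from the formula given in the applet: Y' = -.050X + 3.450.
y_pred = [-0.050 * xi + 3.450 for xi in x]

# The three sums of squares, one per set of squares in the applet.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_error = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
ss_predicted = sum((yp - mean_y) ** 2 for yp in y_pred)

print(f"SS Total     = {ss_total:.4f}")
print(f"SS Predicted = {ss_predicted:.4f}")
print(f"SS Error     = {ss_error:.4f}")

# True when SS Total equals SS Predicted + SS Error up to rounding error.
decomposition_holds = math.isclose(ss_total, ss_predicted + ss_error)
print("SS Total = SS Predicted + SS Error:", decomposition_holds)
```

Note that the decomposition is exact only for the least-squares regression line. With a different line through the same data, SS Predicted and SS Error would generally not add up to SS Total.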
i. We can calculate r squared from our SS values. What portion of SS Total is
accounted for by SS Predicted?
[SS Predicted / SS Total] = [_________ / _________] = ___________
What is the value for r squared as reported in the applet? ___________
Check answer 2i.
Thus, we can say “r squared is the proportion of variance in Y that is explained
by X.” The total variability in Y is measured by the sum of the squared
deviations of the Y scores around the mean of Y, which is SS Total. When we use
linear regression, information on X is used to generate the regression line and
the predicted values of Y. The part of SS Total that is ‘explained’ by the model
is SS Predicted. The proportion explained is SS Predicted divided by SS Total.
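As a cross-check on this interpretation, the sketch below computes r squared in two ways: as SS Predicted divided by SS Total, and as the square of the Pearson correlation between X and Y computed directly from the data. The function name `pearson_r` is hypothetical and written out by hand here rather than taken from the applet. The two results should agree up to rounding.

```python
import math

# Data from the exercise table (Cases 1-4).
x = [1, 3, 5, 7]
y = [4, 3, 2, 4]


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of cross-products and squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


mean_y = sum(y) / len(y)

# Predicted values from the formula given in the applet: Y' = -.050X + 3.450.
y_pred = [-0.050 * xi + 3.450 for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_predicted = sum((yp - mean_y) ** 2 for yp in y_pred)

# Route 1: proportion of SS Total accounted for by SS Predicted.
r_squared_from_ss = ss_predicted / ss_total

# Route 2: square of the correlation between X and Y.
r_squared_from_r = pearson_r(x, y) ** 2

print(f"SS Predicted / SS Total = {r_squared_from_ss:.4f}")
print(f"r squared from r        = {r_squared_from_r:.4f}")
```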