Regression Module #1, Exercise #2          

Sums of Squares: Computation and interpretation

Download a paper copy of this exercise if you do not already have one. You will learn about the meaning of Sums of Squares and how they can be represented and understood graphically.  You will be asked to make some calculations using a very small data set. These data, along with many calculated values, are represented in the applet. If you have a copy of this page, proceed directly to the applet for Module 1, Exercise 2.

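Case   X    Y    Y’      (Y - Mean)   (Y - Mean)²   (Y - Y’)   (Y - Y’)²   (Y’ - Mean)   (Y’ - Mean)²
1      1    4    3.40    .75          .5625         .60        .3600       .15           .0225
2      3    3
3      5    2
4      7    4
Sum    16   13   13.00   0.00         2.7500        0.00       2.7000      0.00          .0500

Note: Mean is the mean of Y, and Y’ is the predicted value of Y.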

a. Look at the plot of the data. Does it appear that the regression model will explain a large portion of the variance in Y?  Check answer 2a.

 

b. Calculate values for all of the empty cells in the table. Values for Case 1 and for the Sum are shown so that you can check your work. Some useful information can be found in the applet.

Hints:    You can calculate Y’ with the formula shown in the applet: Y’ = -.050X + 3.450. 

            Calculate the mean of Y by dividing the Sum of Y by n. (You should get 3.25.)

            More hints 2b.
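If you would like to check your hand calculations for part b, the table can be reproduced with a short script. This is only a sketch: it assumes the four cases and the regression equation given in the hint, and the names `build_table`, `SLOPE`, and `INTERCEPT` are illustrative, not taken from the applet.

```python
# Data for the four cases in the exercise table.
x = [1, 3, 5, 7]
y = [4, 3, 2, 4]

# Regression equation from the hint: Y' = -.050X + 3.450.
SLOPE, INTERCEPT = -0.050, 3.450


def build_table(x, y):
    """Return one row per case: (case, X, Y, Y', Y-mean, Y-Y', Y'-mean)."""
    mean_y = sum(y) / len(y)
    rows = []
    for case, (xi, yi) in enumerate(zip(x, y), start=1):
        y_pred = SLOPE * xi + INTERCEPT
        rows.append((case, xi, yi, y_pred,
                     yi - mean_y, yi - y_pred, y_pred - mean_y))
    return rows


rows = build_table(x, y)
for case, xi, yi, y_pred, d_total, d_error, d_pred in rows:
    # Print each deviation followed by its square, in table column order.
    print(f"{case} {xi} {yi} {y_pred:.2f} "
          f"{d_total:6.2f} {d_total ** 2:.4f} "
          f"{d_error:6.2f} {d_error ** 2:.4f} "
          f"{d_pred:6.2f} {d_pred ** 2:.4f}")
```

The first printed row should agree with the Case 1 values already shown in the table.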

c. Now place check marks only in the boxes titled Show SS Total and Show Mean of Y. The four vertical black lines represent the deviations of each case from the mean of Y. Check that the lengths of these lines correspond to the values you calculated in the table for the column of deviations from the mean, (Y - Mean). Which case has the largest deviation from the mean?

The largest deviation from the mean is _____ for Case ___.

Hint: Look at the graph in the applet and at your calculations in the table.  Check answer 2c.

d. Now check the box labeled Show Error as Squares. The black squares correspond to the squared deviations from the mean, and the sum of the areas of these squares corresponds to SS Total. Notice that the deviations from the mean for the first and second cases are .75 and -.25 (a ratio of 3:1 in size), while the squared deviations are .5625 and .0625 (a ratio of 9:1). This shows how points farther from the mean contribute much more to SS Total than points closer to the mean. What is the contribution of the third case to SS Total? What is the ratio of this contribution to the contribution of the second case?

The ratio of the contribution to SS Total for Case 3 vs Case 2 is ______:______.    Check answer 2d.
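The effect of squaring described in part d can be seen in a few lines of code. The sketch below assumes the Y values from the table; `contribution_ratio` is a hypothetical helper name used only for illustration.

```python
# Observed Y values for Cases 1-4 from the exercise table.
y = [4, 3, 2, 4]
mean_y = sum(y) / len(y)  # 3.25


def contribution_ratio(case_a, case_b):
    """Ratio of squared deviations from the mean for two 1-based case numbers."""
    dev_a = y[case_a - 1] - mean_y
    dev_b = y[case_b - 1] - mean_y
    return dev_a ** 2 / dev_b ** 2


# Case 1 vs Case 2: deviations .75 and -.25 (3:1), squared deviations 9:1.
print(contribution_ratio(1, 2))  # prints 9.0
```

Calling the same function with other case numbers is one way to check your own ratio for this part.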

e. Now remove checks from all of the boxes, and check the boxes labeled Show Regression Line and Show SS Error. The regression line allows us to find the predicted value of Y for any value of X. The vertical red lines correspond to the deviations of the observed values of Y from the predicted values on the regression line (Y - Y’). Which case has the largest deviation from its predicted value of Y?

The largest deviation from the predicted value is _____ for Case ___.   

Hint: Look at the graph and your table. Check answer 2e.

f. Now check the box labeled Show Error as Squares. The red squares correspond to the squared deviations of Y from the predicted values (Y’). The sum of these areas corresponds to SS Error. You can compare SS Error with SS Total by also checking the box labeled Show SS Total. How does SS Error compare to SS Total? Do you think SS Error is much smaller than SS Total? Now check your table. What are the values for SS Error and SS Total?   Check answer 2f.

SS Error = _______   SS Total = _______

g. Now remove checks from the boxes Show SS Error, Show SS Total, and Show Error as Squares, and check the boxes labeled Show Mean of Y and Show SS Predicted. (Show Regression Line is still checked.) The blue vertical lines show the deviations between the predicted value of Y (Y’) and the mean of Y for each case.

Now check Show Error as Squares. The sum of the areas of these squares corresponds to SS Predicted. If the regression line is close to the mean of Y, the regression model does not predict Y much better than the mean does. Do you think SS Predicted is much smaller than SS Total? Now check your table. What are the values for SS Predicted and SS Total?

  

SS Predicted = _______   SS Total = _______            Check answer 2g.

 

 

h. Verify that SS Total = SS Predicted + SS Error.   Check answer 2h.

 

___________  =  ___________ +  _________

SS Total          =  SS Predicted   +  SS Error
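The identity in part h holds for a least-squares line because the cross-product of the two deviation components sums to zero. The sketch below checks this numerically; it assumes the data and regression equation given earlier in the exercise, and the variable names are illustrative.

```python
import math

# Data and regression equation from the exercise (Y' = -.050X + 3.450).
x = [1, 3, 5, 7]
y = [4, 3, 2, 4]
mean_y = sum(y) / len(y)
y_pred = [-0.050 * xi + 3.450 for xi in x]

# The three sums of squares from the table's Sum row.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
ss_predicted = sum((pi - mean_y) ** 2 for pi in y_pred)

# Since (Y - mean) = (Y - Y') + (Y' - mean), squaring and summing leaves a
# cross-product term; it is zero when Y' comes from the least-squares line.
cross_term = sum((yi - pi) * (pi - mean_y) for yi, pi in zip(y, y_pred))

print(f"SS Total     = {ss_total:.4f}")
print(f"SS Predicted = {ss_predicted:.4f}")
print(f"SS Error     = {ss_error:.4f}")
print(f"Cross term   = {abs(cross_term):.4f}")
print("Identity holds:", math.isclose(ss_total, ss_predicted + ss_error))
```

If a different line were used in place of the least-squares line, the cross term would generally not be zero and the identity would fail.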

 

i. We can calculate r squared from our SS values.

What portion of SS Total is accounted for by SS Predicted?

[SS Predicted / SS Total] = [_________ / _________]  =  ___________

What is the value for r squared as reported in the applet?  ___________   Check answer 2i.

Thus, we can say “r squared is the proportion of variance in Y that is explained by X.” The total variability in Y is measured by the sum of the squared deviations of the Y scores around the mean of Y, which is SS Total. When we use linear regression, information on X is used to generate the regression line and the predicted values of Y. The part of SS Total that is ‘explained’ by the model is SS Predicted. The proportion explained is SS Predicted divided by SS Total.
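As a cross-check on this interpretation, r squared can be computed two ways: as SS Predicted divided by SS Total, and as the square of the Pearson correlation between X and Y. The sketch below assumes the four cases from the table and fits the least-squares line from scratch rather than taking it from the applet; all variable names are illustrative.

```python
import math

# Data for the four cases in the exercise table.
x = [1, 3, 5, 7]
y = [4, 3, 2, 4]
n = len(y)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)  # this is SS Total
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# SS Predicted from the fitted line.
y_pred = [slope * xi + intercept for xi in x]
ss_predicted = sum((pi - mean_y) ** 2 for pi in y_pred)

# r squared as a proportion of SS Total, and as the squared correlation.
r_squared_from_ss = ss_predicted / s_yy
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared_from_r = r ** 2

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"r squared (SS Predicted / SS Total) = {r_squared_from_ss:.4f}")
print(f"r squared (squared correlation)     = {r_squared_from_r:.4f}")
```

The fitted slope and intercept should match the equation given in the hint for part b, and the two r squared values should agree with each other.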

Summary of Answers to 2a-2i

Go to the Applet