Answers for Module 1, Exercise 1:
Module 1, Exercise 1a:
The correlation (r) is .837, the slope of the line (b) is .700, and the intercept (a) is 2.200, taken from the right panel in the applet for Regression Module 1, Exercise 1.
Module 1, Exercise 1b:
The predicted values are 4.3, 5.7, and 7.1. The calculation for the last value is 2.2 + (.7)(7) = 2.2 + 4.9 = 7.1.
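As a quick check, the prediction step can be reproduced in a few lines of Python. This is a minimal sketch: the X values (1, 3, 5, 7) are not stated in this answer and are inferred here from the intercept, slope, and predicted values above.

    a, b = 2.2, 0.7                     # intercept and slope from Exercise 1a
    X = [1, 3, 5, 7]                    # assumed X values for the four cases
    predicted = [a + b * x for x in X]  # Y' = a + bX for each case
    print(predicted)                    # approximately [2.9, 4.3, 5.7, 7.1]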
Module 1, Exercise 1c:
SS Total = 14. The
squared deviations from the mean for the four cases are 9, 0, 4, and 1,
respectively.
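The same computation in Python, as a hedged sketch: the individual Y values (2, 5, 7, 6) are not listed in this answer and are reconstructed from the mean of 5 and the squared deviations above.

    Y = [2, 5, 7, 6]                         # assumed Y values (mean = 5)
    mean_y = sum(Y) / len(Y)
    sq_dev = [(y - mean_y) ** 2 for y in Y]  # [9.0, 0.0, 4.0, 1.0]
    ss_total = sum(sq_dev)                   # 14.0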
Module 1, Exercise 1d:
The largest deviation from the mean is -3, for Case 1.
Module 1, Exercise 1e:
The contribution of the second case to SS Total is zero,
because the Y value of 5 is exactly equal to the mean.
Module 1, Exercise 1f:
SS Total = 14. You can find this as the sum of the last
column in your table, and this value is also shown in the applet in the SS
column in the Analysis of Variance section.
Module 1, Exercise 1g:
SS Total is the sum of the squared deviations of Y scores from the mean of Y. If SS Total were much smaller, then all of the Y values would have to be close to the mean. SS Total could be much larger for several reasons: many of the Y values could be somewhat farther from the mean; a few values, or even one value, could be very far from the mean; or we could simply have many more Y values. Note that a single Y value that differed from the mean by 10 points would contribute 100 to SS Total.
There is a close relationship between SS Total and variance. An estimate of the population variance taken from a sample is calculated as the sum of the squared deviations from the mean divided by the degrees of freedom, which is (SS Total) / (n - 1) for a single sample. In our example, this is 14/3 = 4.667. The standard deviation is the square root of the variance, 2.16, the value shown in the applet as the std dev for the DV.
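The same arithmetic in Python, with the standard library used to confirm the result; the Y values are the same assumed scores as above.

    import statistics

    Y = [2, 5, 7, 6]                     # assumed Y values, consistent with SS Total = 14
    variance = 14.0 / (len(Y) - 1)       # SS Total / (n - 1) = 4.667
    std_dev = variance ** 0.5            # 2.16
    print(std_dev, statistics.stdev(Y))  # both print about 2.1602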
Module 1, Exercise 1h:
For the second case, Y′ = 4.3, (Y - Y′) = (5 - 4.3) = .7, and (Y - Y′)² = .49. For the third case, Y′ = 5.7, (Y - Y′) = (7 - 5.7) = 1.3, and (Y - Y′)² = 1.69.
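The residuals for all four cases can be computed the same way; a minimal sketch, reusing the assumed Y values and the predicted scores from part 1b.

    Y = [2, 5, 7, 6]                  # assumed observed scores
    predicted = [2.9, 4.3, 5.7, 7.1]  # Y' values from part 1b
    residuals = [y - yp for y, yp in zip(Y, predicted)]
    sq_resid = [e ** 2 for e in residuals]
    print(residuals)                  # approximately [-0.9, 0.7, 1.3, -1.1]
    print(sq_resid)                   # approximately [0.81, 0.49, 1.69, 1.21]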
Module 1, Exercise 1i:
The largest deviation is for Case 3, and the size of the deviation is 1.3. The smallest deviation is for Case 2, and the size of the deviation is .7.
Module 1, Exercise 1j:
The calculated value for the Sum of Squares Error (SS Error) is 4.200.
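Summing the squared residuals from part 1h reproduces this value; a one-line check in Python:

    sq_resid = [0.81, 0.49, 1.69, 1.21]  # squared residuals from part 1h
    ss_error = sum(sq_resid)             # about 4.20
    print(ss_error)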
Module 1, Exercise 1k:
SS Error is the sum of the squared deviations of observed scores from the predicted scores. If SS Error is very small, every observed score is close to its predicted score, so every observed score plots close to the regression line.
If SS Error is much smaller than SS Total, then the sum of squared deviations around the regression line is much smaller than the sum of squared deviations around the mean. Thus, the regression equation gives much more accurate predictions of scores than simply using the mean as the prediction for all scores. The plot would show a strong linear relationship between X and Y.
If SS Error is about the same size as SS Total, then the regression equation has not improved our prediction of Y scores. The regression line would be close to horizontal at the mean. The plot would not show any indication of a linear relationship between X and Y.
Module 1, Exercise 1L:
For the second case, the predicted score is 4.3, which is .7 below the mean of 5.0, so the squared deviation of the predicted score from the mean is .49. For the first case, the deviation is -2.1; for the third case, the deviation is +.7; and for the fourth case, the deviation is +2.1. The sum of the squared deviations is 9.80.
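A short sketch of the same calculation in Python, using the predicted scores from part 1b:

    predicted = [2.9, 4.3, 5.7, 7.1]  # Y' values from part 1b
    mean_y = 5.0
    ss_predicted = sum((yp - mean_y) ** 2 for yp in predicted)
    print(ss_predicted)               # 4.41 + .49 + .49 + 4.41 = about 9.80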
Module 1, Exercise 1m:
Yes, it appears that X is useful in predicting Y in our plot. The blue lines, which indicate predictive ability, are substantial: they are relatively long compared to the red lines we observed for error deviations, and the blue squares are relatively large compared to the red squares. Thus, it appears that SS Predicted is substantial.
Module 1, Exercise 1n:
The Sum of Squares Predicted from the Analysis of Variance table in the applet is 9.800, which is also the sum of the last column in the table in part 1L.
Module 1, Exercise 1o:
SS Predicted is the sum of the squared deviations of
predicted scores from the mean. If the regression model is not at all useful,
then the predicted score will be the mean for each case, and SS Predicted will
be zero. If the regression model is only slightly helpful, then the predicted
scores will be only slightly different from the mean, and SS Predicted will be
small relative to SS Total. This plot would show virtually no linear
relationship between X and Y, and the regression line would be close to the
horizontal line for the mean of Y.
If there is a strong linear relationship in the data, SS
Predicted is large relative to SS Error, and the observed data fall close to the
regression line.
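The three sums of squares are tied together by the identity SS Total = SS Predicted + SS Error, which the values in this exercise satisfy: 14.0 = 9.8 + 4.2. A quick check in Python:

    ss_total, ss_predicted, ss_error = 14.0, 9.8, 4.2
    assert abs(ss_total - (ss_predicted + ss_error)) < 1e-9  # decomposition holds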
Module 1, Exercise 1p:
SS Predicted / SS Total = 9.800 / 14.000 = .700. The applet reports r = .837 and r² = .700.
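A sketch of this final computation in Python; the sign of r is taken from the slope, which is positive here.

    ss_predicted, ss_total = 9.8, 14.0
    r_squared = ss_predicted / ss_total  # about .700
    r = r_squared ** 0.5                 # about .837; positive because b = .7 > 0
    print(r_squared, r)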
These sample data show a strong linear relationship, as measured by r = .837. The plot shows this strong positive relationship, with larger values of X generally associated with larger values of Y. In this sample, 70% of the variance in Y can be explained by the linear relationship with X.
We should note that this is an extremely small sample, and that we would not be able to generalize to the relationship in a population of X and Y values, even if these four cases were a random sample from that population.