Residuals and outliers

Remember that the predicted values are
    ŷ_i = β̂_0 + β̂_1 x_{1i} + ··· + β̂_m x_{mi},    i = 1, ..., n.
The residuals are e_1, ..., e_n, where
    e_i = y_i − ŷ_i,    i = 1, ..., n.
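As a minimal sketch of these definitions, the following Python snippet fits a model by least squares and computes the predicted values and residuals. The data, the variable names (x1, x2, y, X, beta_hat) and the use of NumPy are assumptions made for illustration; the notes themselves work in SPSS.

```python
import numpy as np

# Hypothetical toy data: n = 6 observations, m = 2 independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates of the coefficients.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat   # predicted values
e = y - y_hat          # residuals

print("coefficients:", beta_hat)
print("residuals:", e)
```

Because the model contains an intercept, the residuals computed this way sum to zero up to rounding error.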
Plots to consider:
1) Construct a histogram, boxplot or normal
probability plot of the residuals to check the
normality assumption.
2) Plot residuals against the predicted values.
This is a good plot for checking the equal
variances assumption.
3) If the independent variables are not highly
related, plot residuals against each independent variable.
4) If the data are collected over time, plot the
residuals against time. If time does not affect the response, this plot should show no
pattern. The Durbin-Watson test can be used
to test for a time effect. The Durbin-Watson
statistic can be obtained in SPSS via Regression → Linear → Statistics → Durbin-Watson.
Values of the statistic larger than 2.5 or less than 1.5 are indicative of a time
effect.
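The notes do not spell out the Durbin-Watson statistic itself. Assuming the standard definition, d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t², with the residuals listed in time order, a small Python sketch is given below. The function name durbin_watson and the residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    # Sum of squared differences between successive residuals.
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    # Sum of squared residuals.
    den = sum(r ** 2 for r in residuals)
    return num / den

# Hypothetical residuals that drift slowly over time.
e = [1.0, 0.8, 0.5, 0.1, -0.3, -0.7, -0.9, -0.6, -0.2, 0.3]
d = durbin_watson(e)

# Rule of thumb from the notes: outside [1.5, 2.5] suggests a time effect.
time_effect_suspected = d > 2.5 or d < 1.5
print(d, time_effect_suspected)
```

The statistic always lies between 0 and 4. Residuals that drift slowly, as in this example, push it toward 0, while residuals that alternate in sign push it toward 4.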
As in simple regression, outliers that occur near
the boundary of the x-region may not show up
in a residual plot. So, methods besides residuals are needed to spot outliers.
    DFFITS(i) = (ŷ_i − ŷ_(i)) / (scale factor),
where ŷ_i is as usual and ŷ_(i) is the ith predicted value obtained after removing the ith
observation from the data set.
A large value of DFFITS(i) indicates that the
ith observation may be an outlier. Values bigger than 2 in absolute value indicate potential
outliers.
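The notes leave the scale factor unspecified. The sketch below assumes the usual choice, s_(i)·sqrt(h_ii), where s_(i) is the residual standard deviation from the fit without observation i and h_ii is the ith leverage. The function name dffits, the data, and the use of NumPy are illustrative assumptions, and SPSS output may differ in detail.

```python
import numpy as np

def dffits(X, y):
    """DFFITS for each observation via explicit leave-one-out refits.

    X must already contain a column of ones for the intercept.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    # Leverages h_ii: diagonal of the hat matrix X (X'X)^{-1} X'.
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        y_hat_i = X[i] @ beta_i                          # prediction without obs. i
        resid_i = y[keep] - X[keep] @ beta_i
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))   # assumed scale: s_(i)
        out[i] = (y_hat[i] - y_hat_i) / (s_i * np.sqrt(h[i]))
    return out

# Hypothetical data: roughly y = 2 + 0.5 x, except the last point,
# which lies at the boundary of the x-region and far below the line.
x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 20])
y = np.array([2.6, 2.9, 3.6, 4.1, 4.4, 5.1, 5.4, 6.0, 6.6, 4.0])
X = np.column_stack([np.ones_like(x), x])

d = dffits(X, y)
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary residuals
flagged = np.flatnonzero(np.abs(d) > 2)            # rule of thumb from the notes
print("DFFITS:", np.round(d, 2))
print("flagged observations (0-based):", flagged)
```

In this made-up data set the last observation does not have the largest residual in magnitude, yet its DFFITS value is far beyond 2 in absolute value.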
The DFFITS statistics are obtained in SPSS
as follows: Regression → Linear → Save →
Standardized DfFit.
Plot DFFITS(i) against i or one of the independent variables to check for outliers.
Always plot both residuals and DFFITS.
• Residuals may miss outliers near boundary
of x-region.
• DFFITS may miss outliers in the "middle" of
the x-region.
What should one do with outliers?
• After spotting an outlier, check to see if an
error was made in recording the data. If an
error was made, correct it and re-estimate
the model using all the data.
• If no errors were made, there are at least
two courses of action:
– Throw out the outlier(s) and estimate
the model with the remaining data. Consult a statistician if you want to predict
the response at values of x near the ones
thrown out.
– Use an alternative to least squares analysis, such as robust regression.
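The notes name robust regression without specifying a method. One common choice, assumed here purely for illustration, is a Huber M-estimate computed by iteratively reweighted least squares. The function name huber_irls, the tuning constant 1.345, the MAD-based scale estimate and the data are all assumptions, not part of the original notes.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via IRLS (a sketch)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # start from least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0:
            break
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, c/u for large ones.
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        # Weighted least squares step: scale each row by sqrt(weight).
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical data: roughly y = 2 + 0.5 x, with one wild response at x = 5.
x = np.arange(1.0, 11.0)
y = np.array([2.6, 2.9, 3.6, 4.1, 15.0, 5.1, 5.4, 6.0, 6.6, 6.9])
X = np.column_stack([np.ones_like(x), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_huber = huber_irls(X, y)
print("least squares:", beta_ols)
print("Huber:        ", beta_huber)
```

Huber-type weights guard against unusual responses like the one in this example. They offer much less protection against outliers at extreme x values, so plots of residuals and DFFITS remain useful.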
